Curl and a missing slash

Monday, February 24, 2014

While I was playing around with proxy-mirror I noticed an interesting behaviour when testing the proxy with curl. The following command:
curl http://wp.pl --proxy http://localhost:8888/
will result in following output when the proxy is Fiddler:
[Fiddler] Response Header parsing failed.
Response Data:
<plaintext>
43 6F 6E 6E 65 63 74 69 6F 6E 3A 20 63 6C 6F 73 65 0D 0A 0D 0A 3C 48 54  Connection: close....<HT
4D 4C 3E 3C 48 45 41 44 3E 3C 54 49 54 4C 45 3E 34 30 30 20 42 61 64 20  ML><HEAD><TITLE>400 Bad
52 65 71 75 65 73 74 3C 2F 54 49 54 4C 45 3E 3C 2F 48 45 41 44 3E 0A 3C  Request</TITLE></HEAD>.<
42 4F 44 59 3E 3C 48 32 3E 34 30 30 20 42 61 64 20 52 65 71 75 65 73 74  BODY><H2>400 Bad Request
3C 2F 48 32 3E 0A 59 6F 75 72 20 72 65 71 75 65 73 74 20 68 61 73 20 62  </H2>.Your request has b
61 64 20 73 79 6E 74 61 78 20 6F 72 20 69 73 20 69 6E 68 65 72 65 6E 74  ad syntax or is inherent
6C 79 20 69 6D 70 6F 73 73 69 62 6C 65 20 74 6F 20 73 61 74 69 73 66 79  ly impossible to satisfy
2E 0A 3C 48 52 3E 0A 3C 41 44 44 52 45 53 53 3E 3C 41 20 48 52 45 46 3D  ..<HR>.<ADDRESS><A HREF=
22 68 74 74 70 3A 2F 2F 77 77 77 2E 77 70 2E 70 6C 2F 22 3E 61 72 69 73  "http://www.wp.pl/">aris
3C 2F 41 3E 3C 2F 41 44 44 52 45 53 53 3E 0A 3C 2F 42 4F 44 59 3E 3C 2F  </A></ADDRESS>.</BODY></
48 54 4D 4C 3E 0A                                                        HTML>.
while a simple implementation relaying on node.js core http module and http-proxy module outputs this:
An error has occurred: {"bytesParsed":0,"code":"HPE_INVALID_CONSTANT"}
Meanwhile without the proxy parameter the actual response is:
HTTP/1.1 301 Moved Permanently
Server: aris
Location: http://www.wp.pl
Content-type: text/html
Content-Length: 0
Connection: close

Curl forgiving behaviour

As it turns out the actual outgoing HTTP request is different depending on the presence of –-proxy parameter. Without it the target server receives and responds with:
GET / HTTP/1.1
User-Agent: curl/7.26.0
Host: wp.pl
Accept: */*

HTTP/1.1 301 Moved Permanently
Server: aris
Location: http://www.wp.pl
Content-type: text/html
Content-Length: 0
Connection: close
but when the proxy setting is present:
GET http://wp.pl HTTP/1.1
User-Agent: curl/7.26.0
Host: wp.pl
Accept: */*
Proxy-Connection: Keep-Alive

UNKNOWN 400 Bad Request
Server: aris
Content-Type: text/html
Date: Sun, 23 Feb 2014 16:01:36 GMT
Last-Modified: Sun, 23 Feb 2014 16:01:36 GMT
Accept-Ranges: bytes
Connection: close

<HTML><HEAD><TITLE>400 Bad Request</TITLE></HEAD>
<BODY><H2>400 Bad Request</H2>
Your request has bad syntax or is inherently impossible to satisfy.
<HR>
<ADDRESS><A HREF="http://www.wp.pl/">aris</A></ADDRESS>
</BODY></HTML>
As you may have noticed the difference in requests (apart from additional header) is in first line – where in the first case curl assumed we want to GET /. The former uses a relative URI while the later absolute. Now if we change the command line just a little bit so that the address looks like http://wp.pl/ the server will receive request with correct absolute URI and will respond with 301.

UNKNOWN Status Line

Careful readers may have already noticed the Status Line of response returned by server is malformed. According to the spec it should have following form:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
That's the reason why Fiddler warns users about protocol violation and the reason of error inside node.js proxy.

A less intrusive proxy

An example described above leads to other questions about behaviour of http proxy especially if we think about implementing HTTP debugging tool like proxy-mirror. It’s only a guess but I suspect that there are other subtle differences between requests sent by client and those received by target server when the proxy is implemented using http-proxy module or core http module. I am aware of at least one such case where HTTP Headers are lowercased. If I’m right a debugging tool relaying on them would make it really hard to spot certain problems – like protocol violations on both client and server side.
In my next blog post I’ll describe how to built even less intrusive proxy without building the http parser by hand.

An HTTPS system proxy with node

Thursday, January 2, 2014

Creating a simple http proxy in node.js is super easy thanks to an excellent module – http-proxy. With only the following code you’ll have a proxy that can be used as system or browser level proxy:

var httpProxy = require('http-proxy');
var proxyServer = httpProxy.createServer(function (req,res,proxy) {
	var hostNameHeader = req.headers.host,
	hostAndPort = hostNameHeader.split(':'),
	host = hostAndPort[0],
	port = parseInt(hostAndPort[1]) || 80;
	proxy.proxyRequest(req,res, {
		host: host,
		port: port
	});
});

proxyServer.listen(8888);

Adding support for HTTPS

The first thing we will need is a certificate that will be used by TLS implementation for encryption. A server has access to certificate private and public key thus it is able to get a clear text from a message encrypted with public key. It’s a common knowledge that asymmetric encryption is more expensive in terms of CPU cycles than its symmetric counterpart. That’s way when using HTTPS connection asymmetric encryption is only used during initial handshake to exchange a session key in a secure manner between client and server. This session key is then used by both client and server for traffic encryption using symmetric algorithm.

Naturally to get or rather to buy a proper certificate one would have to be verified by a valid certification authority. Fortunately for the sake of a demo we can generate a self signed certificate. That’s really easy, a nice description is available on heroku help pages:

openssl genrsa -des3 -passout pass:x -out proxy-mirror.pass.key 2048
echo "Generated proxy-mirror.pass.key"

openssl rsa -passin pass:x -in proxy-mirror.pass.key -out proxy-mirror.key
rm proxy-mirror.pass.key
echo "Generated proxy-mirror.key"

openssl req -new -batch -key proxy-mirror.key -out proxy-mirror.csr -subj /CN=proxy-mirror/emailAddress=piotr.mionskowski@gmail.com/OU=proxy-mirror/C=PL/O=proxy-mirror
echo "Generated proxy-mirror.csr"

openssl x509 -req -days 365 -in proxy-mirror.csr -signkey proxy-mirror.key -out proxy-mirror.crt
echo "Generated proxy-mirror.crt"

http-proxy support for https

http-proxy module supports various https configuration for example passing traffic from https to http and vice versa. Unfortunately I couldn’t find a way to get it working without preconfiguring target host and port – which was a problem while I was implementing proxy-mirror. Here is what curlwill print out when trying to use such proxy:

curl -vk --proxy https://localhost:8888/ https://pl-pl.facebook.com/
* timeout on name lookup is not supported
* About to connect() to proxy localhost port 8888 (#0)
* Trying 127.0.0.1...
* connected
* Connected to localhost (127.0.0.1) port 8888 (#0)
* Establish HTTP proxy tunnel to pl-pl.facebook.com:443
> CONNECT pl-pl.facebook.com:443 HTTP/1.1
> Host: pl-pl.facebook.com:443
> User-Agent: curl/7.26.0
> Proxy-Connection: Keep-Alive
>

I suspect the problem is inherently related to the way http-proxy uses nodejs core http(s) modules – I might be wrong here though. The workaround I’ve used in proxy-mirror was to listen to CONNECT event on http server, establish socket connection to a fake https server listening on different port. When the https connection is established the fake https server handler proxies requests further. The advantage of this approach is that we can use the same proxy address for both http and https. Here is the code:

	var httpProxy = require('http-proxy'),
    fs = require('fs'),
    https = require('https'),
    net = require('net'),
    httpsOptions = {
	    key: fs.readFileSync('proxy-mirror.key', 'utf8'),
	    cert: fs.readFileSync('proxy-mirror.crt', 'utf8')
    };
    
    var proxyServer = httpProxy.createServer(function (req, res, proxy) {
	    console.log('will proxy request', req.url);
	    var hostNameHeader = req.headers.host,
	    hostAndPort = hostNameHeader.split(':'),
	    host = hostAndPort[0],
	    port = parseInt(hostAndPort[1]) || 80;
	    proxy.proxyRequest(req, res, {
		    host: host,
		    port: port
	    });
    });
    
    proxyServer.addListener('connect', function (request, socketRequest, bodyhead) {
	    var srvSocket = net.connect(8889, 'localhost', function () {
		    socketRequest.write('HTTP/1.1 200 Connection Established\r\n\r\n');
		    srvSocket.write(bodyhead);
		    srvSocket.pipe(socketRequest);
		    socketRequest.pipe(srvSocket);
	    });
    });
    
    var fakeHttps = https.createServer(httpsOptions, function (req, res) {
	    var hostNameHeader = req.headers.host,
	    hostAndPort = hostNameHeader.split(':'),
	    host = hostAndPort[0],
	    port = parseInt(hostAndPort[1]) || 443;
	    
	    proxyServer.proxy.proxyRequest(req, res, {
		    host: host,
		    port: port,
		    changeOrigin: true,
		    target: {
		    	https: true
		    }
	    });
    });
    
    proxyServer.listen(8888);
    fakeHttps.listen(8889);

HTML5 WebSocket support

The above code has still problems handling WebSockets. This is because browsers, according to the spec, change the way they establish initial connection when they detect that they are behind an http proxy. As you can read on wikipedia more existing http proxy implementation suffer from this.  I still haven’t figured out how to handle this scenario elegantly with http-proxy and node.js, when I do I will post my findings here.

Building an HTTP sniffer with node.js

Wednesday, January 1, 2014

Whether to inspect a server response, optimize network usage or just to fiddle with new REST API it almost always make sense to use an application that can display http requests and responses for investigation. On Windows there is the great Fiddler. While it’s possible to use Fiddler on other platforms using virtualization software I think it’s an overkill. Fortunately there are alternatives. Charles looks like the most advanced one offering most if not all features available in Fiddler. There is also a little less popular HTTP Scoop. Both Charles and HTTP Scoop aren’t free but in my opinion they are worth the price especially if used often. Command line lovers might find mitmproxy suit their needs. If you only need basic features ngrok might serve you well. To dive a bit deeper and see http traffic on a tcp level WireShark is indispensable.

As you can see there are plenty of tools available to help understand what is happening on http level. As I was learning about http caching a question popped to my head. How hard would it be to actually build a simple http sniffer?

proxy-mirror – a simple http inspector

As it turned out it’s actually not that hard to built one thus proxy-mirror came to be. Of course this is a very simplistic http sniffer – nowhere near to tools I mentioned above both in terms of features and reliability – but it works (at least in most scenariosWinking smile). It’s open source and I learned couple of new things about HTTP while implementing it – more on that in future posts. Here are some screen shoots of the the tool: 

As you might have guessed from screenshots proxy-mirror is a web application. Right now it’s not very easy to try it out – the instructions are in the Readme – I’ll try to fix it soon.

How it works

I wanted to run proxy-mirror on many platforms and rely on a foundation that has both great support for http and building web applications. I picked node.js over java or ruby mainly because I wanted to sharpen my skills in it.

You can think of proxy-mirror as 2 logical components. The first is a regular http proxy that you can use by configuring your browser or system. I didn’t want to focus on building that first so I utilized a great node module call http-proxy. With it’s helped you can have a system wide proxy in couple of minutes. The http proxy component emits events whenever a request or response goes through it. Those events are consumed by the second logical component – a web application built with express.js. The information about http traffic received from events is then pushed to browser part through socket.io where a simple SPA built with my beloved AngularJS displays information about them.

Features

Right now proxy-mirror has only couple of features:

  • http and https support – although the latter one requires additional setup
  • simple session list – a grid with list of request/response pairs that you can inspect
  • a detailed view – both for request and response that can display headers, message body and a preview right in the browser

Although it still is a very simple (and buggy) application I actually found that it has most of features I frequently use – probably except for filtering. Maybe someday it will become a reasonable alternative for Charles - at least for me.

I think building an http sniffer is a great exercise in the process of learning how http(s) works. I guess the famous saying about journey being more more important than the destination fits nice here.