Curl and a missing slash

Monday, February 24, 2014

While I was playing around with proxy-mirror I noticed an interesting behaviour when testing the proxy with curl. The following command:
curl http://wp.pl --proxy http://localhost:8888/
results in the following output when the proxy is Fiddler:
[Fiddler] Response Header parsing failed.
Response Data:
<plaintext>
43 6F 6E 6E 65 63 74 69 6F 6E 3A 20 63 6C 6F 73 65 0D 0A 0D 0A 3C 48 54  Connection: close....<HT
4D 4C 3E 3C 48 45 41 44 3E 3C 54 49 54 4C 45 3E 34 30 30 20 42 61 64 20  ML><HEAD><TITLE>400 Bad
52 65 71 75 65 73 74 3C 2F 54 49 54 4C 45 3E 3C 2F 48 45 41 44 3E 0A 3C  Request</TITLE></HEAD>.<
42 4F 44 59 3E 3C 48 32 3E 34 30 30 20 42 61 64 20 52 65 71 75 65 73 74  BODY><H2>400 Bad Request
3C 2F 48 32 3E 0A 59 6F 75 72 20 72 65 71 75 65 73 74 20 68 61 73 20 62  </H2>.Your request has b
61 64 20 73 79 6E 74 61 78 20 6F 72 20 69 73 20 69 6E 68 65 72 65 6E 74  ad syntax or is inherent
6C 79 20 69 6D 70 6F 73 73 69 62 6C 65 20 74 6F 20 73 61 74 69 73 66 79  ly impossible to satisfy
2E 0A 3C 48 52 3E 0A 3C 41 44 44 52 45 53 53 3E 3C 41 20 48 52 45 46 3D  ..<HR>.<ADDRESS><A HREF=
22 68 74 74 70 3A 2F 2F 77 77 77 2E 77 70 2E 70 6C 2F 22 3E 61 72 69 73  "http://www.wp.pl/">aris
3C 2F 41 3E 3C 2F 41 44 44 52 45 53 53 3E 0A 3C 2F 42 4F 44 59 3E 3C 2F  </A></ADDRESS>.</BODY></
48 54 4D 4C 3E 0A                                                        HTML>.
while a simple implementation relying on the node.js core http module and the http-proxy module outputs this:
An error has occurred: {"bytesParsed":0,"code":"HPE_INVALID_CONSTANT"}
Meanwhile without the proxy parameter the actual response is:
HTTP/1.1 301 Moved Permanently
Server: aris
Location: http://www.wp.pl
Content-type: text/html
Content-Length: 0
Connection: close

Curl's forgiving behaviour

As it turns out, the actual outgoing HTTP request differs depending on whether the --proxy parameter is present. Without it, the target server receives the following request and responds with:
GET / HTTP/1.1
User-Agent: curl/7.26.0
Host: wp.pl
Accept: */*

HTTP/1.1 301 Moved Permanently
Server: aris
Location: http://www.wp.pl
Content-type: text/html
Content-Length: 0
Connection: close
but when the proxy setting is present:
GET http://wp.pl HTTP/1.1
User-Agent: curl/7.26.0
Host: wp.pl
Accept: */*
Proxy-Connection: Keep-Alive

UNKNOWN 400 Bad Request
Server: aris
Content-Type: text/html
Date: Sun, 23 Feb 2014 16:01:36 GMT
Last-Modified: Sun, 23 Feb 2014 16:01:36 GMT
Accept-Ranges: bytes
Connection: close

<HTML><HEAD><TITLE>400 Bad Request</TITLE></HEAD>
<BODY><H2>400 Bad Request</H2>
Your request has bad syntax or is inherently impossible to satisfy.
<HR>
<ADDRESS><A HREF="http://www.wp.pl/">aris</A></ADDRESS>
</BODY></HTML>
As you may have noticed, the difference between the requests (apart from the additional header) is in the first line: without the proxy, curl assumed we want to GET /. The former request uses a relative URI while the latter uses an absolute one, and that absolute URI carries no path at all. Now if we change the command line just a little so that the address is http://wp.pl/, the server receives a request with a correct absolute URI and responds with 301.
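The two request lines can be reproduced with a small JavaScript sketch. The helper name requestLine is hypothetical and is not part of curl or node.js; the sketch only assumes the URL class that recent node.js versions expose globally, which normalises an empty path to / and therefore never produces the slash-less form seen above.

```javascript
// Hypothetical helper: builds the first line of a GET request for a URL.
// viaProxy = true  -> absolute form, as sent to a proxy
// viaProxy = false -> path-only form, as sent directly to the server
function requestLine(rawUrl, viaProxy) {
  // The WHATWG URL parser turns an empty path into "/".
  const url = new URL(rawUrl);
  const target = viaProxy
    ? url.href                    // e.g. http://wp.pl/
    : url.pathname + url.search;  // e.g. /
  return `GET ${target} HTTP/1.1`;
}

console.log(requestLine('http://wp.pl', false)); // GET / HTTP/1.1
console.log(requestLine('http://wp.pl', true));  // GET http://wp.pl/ HTTP/1.1
```

With the path normalised before the request line is built, both forms end up asking for the same resource.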

UNKNOWN Status Line

Careful readers may have already noticed that the Status Line of the response returned by the server is malformed. According to the spec it should have the following form:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
That is why Fiddler warns users about a protocol violation, and it is also the reason for the error inside the node.js proxy.
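As a rough illustration, the grammar above can be turned into a regular expression. This is only a sketch: isValidStatusLine is a hypothetical name, the line is assumed to be passed without its trailing CRLF, and neither Fiddler nor the node.js parser is implemented this way.

```javascript
// Status-Line  = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
// HTTP-Version = "HTTP/" 1*DIGIT "." 1*DIGIT
// Status-Code  = 3DIGIT
// Reason-Phrase: any text without CR or LF (may be empty)
const STATUS_LINE = /^HTTP\/\d+\.\d+ \d{3} [^\r\n]*$/;

// Hypothetical helper: expects the status line without the trailing CRLF.
function isValidStatusLine(line) {
  return STATUS_LINE.test(line);
}

console.log(isValidStatusLine('HTTP/1.1 301 Moved Permanently')); // true
console.log(isValidStatusLine('UNKNOWN 400 Bad Request'));        // false
```

The second call fails because the line starts with UNKNOWN where the HTTP-Version is expected.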

A less intrusive proxy

The example described above raises other questions about the behaviour of an HTTP proxy, especially if we think about implementing an HTTP debugging tool like proxy-mirror. It’s only a guess, but I suspect there are other subtle differences between requests sent by the client and those received by the target server when the proxy is implemented using the http-proxy module or the core http module. I am aware of at least one such case, where HTTP header names are lowercased. If I’m right, a debugging tool relying on these modules would make it really hard to spot certain problems, like protocol violations on both the client and server side.
In my next blog post I’ll describe how to build an even less intrusive proxy without writing the HTTP parser by hand.
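The header case can be made visible with a short sketch. It assumes a node.js version whose request objects expose message.rawHeaders, a flat array of alternating names and values in their original casing, next to message.headers with lowercased names. The function name alteredHeaderNames is hypothetical.

```javascript
// Hypothetical helper: lists the header names whose casing would be lost
// once they are stored under lowercased keys.
// rawHeaders follows the layout of message.rawHeaders:
// [name1, value1, name2, value2, ...]
function alteredHeaderNames(rawHeaders) {
  const altered = [];
  for (let i = 0; i < rawHeaders.length; i += 2) {
    const name = rawHeaders[i];
    if (name !== name.toLowerCase()) {
      altered.push(name);
    }
  }
  return altered;
}

console.log(alteredHeaderNames([
  'User-Agent', 'curl/7.26.0',
  'host', 'wp.pl',
  'Proxy-Connection', 'Keep-Alive',
])); // [ 'User-Agent', 'Proxy-Connection' ]
```

A proxy that rebuilds outgoing requests from the lowercased names would hide exactly these differences from whoever is debugging the traffic.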