Self-Coded Proxy cannot retrieve image from wikipedia - c#-4.0

I'm trying to write a small proxy server in C#. It works for many of the web pages I tested (including google.com and microsoft.com). For testing, I started my proxy server and configured IE 10 on Windows 8 to use it.
But when I try wikipedia.org, it loads only the main page and no pictures. I tried to load a single picture (http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png). When I use IE without the proxy it works, but with the proxy I get a 404 response.
This is the GET request that IE issues (my proxy just forwards it):
GET http://upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: de-CH
User-Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)
Accept-Encoding: gzip, deflate
Host: upload.wikimedia.org
DNT: 1
Proxy-Connection: Keep-Alive
IMHO it looks correct. This is the response I get (some HTML tags omitted):
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
X-Varnish: 1427845074 1427806476, 274786836, 3671934588
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
Content-Length: 262
Accept-Ranges: bytes
Date: Mon, 01 Jul 2013 21:30:54 GMT
Age: 28
Connection: keep-alive
X-Cache: cp1063 hit (1), cp3004 miss (0), cp3003 frontend miss (0)
Access-Control-Allow-Origin: *
...404 Not Found\n The resource could not be found.\nRegexp failed to match URI: "http:/upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png"
The strange part is here:
Regexp failed to match URI: "http:/upload.wikimedia.org/wikipedia/commons/6/63/Wikipedia-logo.png"
-> the URL starts with http:/ (only one slash)
In the code I connect to upload.wikimedia.org like this:
// connect to upload.wikimedia.org
ServerSocket.Connect(RemoteHost, 80);
byte[] SendBuffer = Request.ToArray();
// send the clients request to the server
ServerSocket.Send(SendBuffer);
I have no idea why it doesn't work. Any help is appreciated. My full code is located on GitHub: Proxy_C_Sharp

I just found out why.
According to the HTTP/1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5), Section 5.1.2:
"To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies."
I tried it out with a small tool. If I make a request like this:
GET /wikipedia/commons/6/63/Wikipedia-logo.png HTTP/1.1
Host: upload.wikimedia.org
It works. So the reason is that Wikipedia does not conform to the standard: it should accept absolute URIs. It works when I visit the site without a proxy because the browser uses absolute URIs only when talking to a proxy. If no proxy is configured, it sends a relative one (just the path), with the host name in the Host header.
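One way a proxy could work around this is to rewrite the request line before forwarding it. The following is a minimal sketch in Python rather than C#, assuming the proxy holds the raw request bytes; the function name to_origin_form is hypothetical and not taken from the Proxy_C_Sharp code.

```python
from urllib.parse import urlsplit


def to_origin_form(raw_request: bytes) -> bytes:
    """Rewrite an absolute-URI request line to path-only form.

    Headers and body are passed through unchanged.
    """
    head, sep, rest = raw_request.partition(b"\r\n")
    method, target, version = head.decode("iso-8859-1").split(" ", 2)
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # Absolute URI (as sent to proxies): keep only path and query.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    new_head = f"{method} {target} {version}".encode("iso-8859-1")
    return new_head + sep + rest
```

Requests that already use a path-only target are returned unchanged, so the function can be applied to every forwarded request.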

Related

Websocket Connection Failed

I'm trying to create a WebSocket server using net in Node.js. In the Chrome console I'm getting an error that simply says "WebSocket Connection Failed!" and doesn't show an error code or any other details. As far as I can tell I've done the handshake correctly, but the connection still fails and I'm not certain why.
Here's the HTTP request my client sent (via the WebSocket API):
GET /chat HTTP/1.1
Host: localhost:3000
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Linux; Android 7.0; AGS-L03) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.91 Safari/537.36
Upgrade: websocket
Origin: http://localhost:3000
Sec-WebSocket-Version: 13
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,es;q=0.8,ar;q=0.7,hy;q=0.6,mi;q=0.5
Sec-WebSocket-Key: shhs88pGIFyzpQgczYc3uw==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
And here's the response my server sends back. I've followed each step according to the docs.
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: RDoXJ7C3/oDQLkUyHc2BFYdevY8=
I know there isn't anything wrong with the client code, since the WebSocket API handles the nitty-gritty for me. There must be something I'm missing in the handshake on the server's side, but I still don't see what. I'd appreciate it if anyone could point out anything I may have missed.
All fixed. According to the specs, each HTTP header must be followed by a line termination (\r\n). My mistake was that I didn't add the double line break (\r\n\r\n) that is supposed to come after the last header.
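For illustration, here is a minimal sketch of the server side of the handshake, written in Python rather than Node.js. The function names are hypothetical; the GUID constant is the fixed value defined in RFC 6455. Note the blank line (\r\n\r\n) that terminates the header block.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> bytes:
    """Build the 101 response, ending the headers with an empty line."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {accept_value(sec_websocket_key)}",
    ]
    # Each header line ends with \r\n; one extra \r\n closes the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Without the final empty line the browser keeps waiting for more headers and eventually reports a failed connection.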

Upgrading to Azure (Microsoft) websocket not acknowledged

Due to constraints of the current project, I have to write the WebSocket protocol by hand in C++. I am able to get the authorization key, but when I try to upgrade the next socket connection, the server stalls after receiving a complete MIME header. Then, when I send anything after it, I get a 400 error. I do not get an acknowledgement from the server that the connection has been upgraded to a WebSocket. Here is a dump:
=========================================================================
POST /sts/v1.0/issueToken HTTP/1.1
Accept: */*
Connection: Keep-Alive
Content-Length: 0
Content-Type: application/x-www-form-urlencoded
Host: api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: 21cedc8aaab847369294240b2122b08d
Origin: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapi.cognitive.microsoft.com&data=04%7C01%7Cv-lufil%40microsoft.com%7C427ffe760b7a4f6ffe6a08d4d8fa0613%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636372015735060988%7CUnknown%7CVW5rbm93bnx7IlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiT3RoZXIifQ%3D%3D%7C-1&sdata=823HpmiJeZ54tzq6CpX86ZS8B0yUiOYSNMXvrmDSunA%3D&reserved=0
User-Agent: Gideon/0.0.1
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 495
Content-Type: application/jwt; charset=us-ascii
Expires: -1
Server: Microsoft-IIS/8.5 Microsoft-HTTPAPI/2.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
apim-request-id: 00fe24bc-ba53-4d91-9363-ea7fddfe2a5a
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Operation-Location
Date: Tue, 01 Aug 2017 15:21:40 GMT
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzY29wZSI6Imh0dHBzOi8vc3BlZWNoLnBsYXRmb3JtLmJpbmcuY29tIiwic3Vic2NyaXB0aW9uLWlkIjoiZmMwOGVlNGM5ZmNkNGI0MWFmNTZiNzJmZDliZTE4ZWEiLCJwcm9kdWN0LWlkIjoiQmluZy5TcGVlY2guUHJldmlldyIsImNvZ25pdGl2ZS1zZXJ2aWNlcy1lbmRwb2ludCI6Imh0dHBzOi8vYXBpLmNvZ25pdGl2ZS5taWNyb3NvZnQuY29tL2ludGVybmFsL3YxLjAvIiwiYXp1cmUtcmVzb3VyY2UtaWQiOiIiLCJpc3MiOiJ1cm46bXMuY29nbml0aXZlc2VydmljZXMiLCJhdWQiOiJ1cm46bXMuc3BlZWNoIiwiZXhwIjoxNTAxNjAxNDk5fQ.2RQhid_B45fN5M2BmUlodhIe4Xxx71Ws1b03JylERUw
=========================================================================
POST /speech/recognition/dictation/cognitiveservices/v1?language=en-US HTTP/1.1
Accept: */*
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzY29wZSI6Imh0dHBzOi8vc3BlZWNoLnBsYXRmb3JtLmJpbmcuY29tIiwic3Vic2NyaXB0aW9uLWlkIjoiZmMwOGVlNGM5ZmNkNGI0MWFmNTZiNzJmZDliZTE4ZWEiLCJwcm9kdWN0LWlkIjoiQmluZy5TcGVlY2guUHJldmlldyIsImNvZ25pdGl2ZS1zZXJ2aWNlcy1lbmRwb2ludCI6Imh0dHBzOi8vYXBpLmNvZ25pdGl2ZS5taWNyb3NvZnQuY29tL2ludGVybmFsL3YxLjAvIiwiYXp1cmUtcmVzb3VyY2UtaWQiOiIiLCJpc3MiOiJ1cm46bXMuY29nbml0aXZlc2VydmljZXMiLCJhdWQiOiJ1cm46bXMuc3BlZWNoIiwiZXhwIjoxNTAxNjAxNDk5fQ.2RQhid_B45fN5M2BmUlodhIe4Xxx71Ws1b03JylERUw
Connection: upgrade
Content-Length: 8002
Content-Type: audio/wav; codec=audio/pcm; samplerate=16000
Host: speech.platform.bing.com
Path: audio
Sec-WebSocket-Key: Z2lkZW9ucm9ja3MK
Transfer-Encoding: chunked
Upgrade: websocket
User-Agent: Gideon/0.0.1
X-RequestId: 21cedc8aaab847369294240b2122b08d
X-Timestamp: 2017-08-01T15:21:40
HTTP/1.1 400 Bad Request
Exception: 4xx Client failure
Note that the server does not reply despite getting two "\r\n" to indicate an end of MIME header. When I send anything afterwards I get a 400 error.
According to your dump logs, it seems that you want to use Microsoft's Speech Service to convert speech to text.
With Microsoft's Speech Service, a 400 error means that you have not supplied all the required parameters and HTTP headers, or that some of their values are incorrect.
I found that your request is missing the X-ConnectionId header.
According to this article:
The Microsoft Speech Service requires that all clients include a unique id to identify the connection. Clients must include the X-ConnectionId header when starting a web socket handshake. The X-ConnectionId header value must be a universally unique identifier. Web socket upgrade requests that do not include the X-ConnectionId, that do not specify a value for the X-ConnectionId header, or that do not include a valid universally unique identifier value will be rejected by the service with a 400 Bad Request response.
So I suggest you add this identifier to the upgrade request and test again.
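A rough sketch of such an upgrade request, in Python rather than C++, might look like the following. The function name and parameters are hypothetical, and the textual UUID format (32 hex digits without dashes) is an assumption that should be checked against the service documentation. The request uses GET because RFC 6455 requires the opening handshake to be a GET request.

```python
import base64
import os
import uuid


def build_upgrade_request(host: str, path: str, bearer_token: str) -> bytes:
    """Build a WebSocket upgrade request that carries an X-ConnectionId header."""
    # Assumption: the service accepts a UUID written as 32 hex digits, no dashes.
    connection_id = uuid.uuid4().hex
    # RFC 6455: the key is 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Authorization: Bearer {bearer_token}",
        f"X-ConnectionId: {connection_id}",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```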

Uploading to another domain gives HTTP code 405

I'm trying to upload a file (which can be quite large) from the website of one server to the backend of another server using plupload. Let's say:
domain 1 = http://www.websitedomain.com/uploadform
domain 2 = http://www.backenddomain.com/uploadhandler
When I try to upload, the following request is sent:
OPTIONS /main/uploadnetwork.php HTTP/1.1
Host: backenddomain.com
Connection: keep-alive
Access-Control-Request-Method: POST
Origin: http://www.websitedomain.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4
Access-Control-Request-Headers: origin, content-type
Accept: */*
Referer: http://www.websitedomain.com/uploadform
Accept-Encoding: gzip,deflate,sdch
Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
DNT: 1
But when I try to start the upload the server returns the following:
HTTP/1.1 405 Method Not Allowed
Allow: GET, HEAD, OPTIONS, TRACE
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
X-Powered-By-Plesk: PleskWin
Date: Mon, 01 Oct 2012 12:41:57 GMT
Content-Length: 999
After doing some research I found out that a browser does this to check whether the server will accept the intended request. It looks like my server doesn't want to accept a simple POST call, even though I use POST all the time.
The Google Chrome console gives the following error:
XMLHttpRequest cannot load http://www.backenddomain.com/uploadhandler. Origin http://www.websitedomain.com is not allowed by Access-Control-Allow-Origin.
Does anyone know how to stop the browser from checking, or how I can tell my server to just accept the POST?
You seem to be facing a same-origin policy problem.
Adding a special header should help in some browsers:
http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
Answers to this question might also be helpful:
Cross-domain data access in JavaScript
You should also check the cross-domain tag: https://stackoverflow.com/questions/tagged/cross-domain
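As a sketch of what the backend needs to send in reply to the OPTIONS preflight, the following Python function computes the CORS response headers. The function name and the allow-list parameters are hypothetical, and echoing back the requested headers is a permissive simplification; the server would return these headers with a 200 or 204 status instead of the 405.

```python
def preflight_response_headers(request_headers, allowed_origins, allowed_methods=("POST",)):
    """Return CORS headers for a preflight request, or None if it is not allowed."""
    origin = request_headers.get("Origin")
    method = request_headers.get("Access-Control-Request-Method")
    if origin not in allowed_origins or method not in allowed_methods:
        return None
    headers = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
        # The response differs per origin, so caches must key on it.
        "Vary": "Origin",
    }
    requested = request_headers.get("Access-Control-Request-Headers")
    if requested:
        # Simplification: allow exactly the headers the browser asked for.
        headers["Access-Control-Allow-Headers"] = requested
    return headers
```

The actual POST response must carry Access-Control-Allow-Origin as well, otherwise the browser still blocks the script from reading it.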

How are cookies sent to a website

After you enter your name and password on a website, a cookie is stored on your computer. Your computer then sends information from that cookie to the website whenever you browse to another page on that site so that the site knows who you are.
How is information from the cookie sent? Does the browser append information from within the cookie to the html address?
The browser sends an HTTP (http://www.w3.org/Protocols/rfc2616/rfc2616.html) request, which includes the URL, the request method (GET, POST, etc.), cookies, and a whole bunch of other stuff. Here's the request from my browser to this SO page:
GET /questions/2575970/how-are-cookies-sent-to-a-website HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=140021253.1463780230058740000.12348924611.1279210754.1270438283.1398; __utmz=140222553.12686423964.1149.21...
If-Modified-Since: Sun, 04 Apr 2010 21:30:58 GMT
Note that the cookie doesn't normally contain the user name, just an index into a lookup table that's stored server-side.
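To make the round trip concrete, here is a small Python sketch of how the Set-Cookie values a server sends turn into the Cookie header the browser sends back. The function name and the session_id cookie are hypothetical, and the sketch ignores the domain, path, and expiry matching that real browsers perform.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build the Cookie request header for the given Set-Cookie response values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    # Only name=value pairs go back to the server; attributes such as Path stay local.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

For example, a response containing Set-Cookie: session_id=abc123; Path=/; HttpOnly leads to later requests carrying Cookie: session_id=abc123.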

Browser Cache Control, Dynamic Content

Problem: I can't seem to get Firefox to cache images sent from a dynamic server
Setup: Static Apache Server with reverse proxy to a dynamic server (mod_perl2) at backend.
Here are the request headers for the image. The request is sent to the dynamic server, where the cookie is used to validate access to the image:
Request Headers
Host: <OBSCURED>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: <OBSCURED>
Cookie: pz_cred=4KCNr0RM15%2FJCOt%2BEa6%2BL62z%2Fxvbp2xNQHY5pJw5d6Q
Pragma: no-cache
Cache-Control: no-cache
The dynamic server streams the image back to the server, and provides the following response:
Response Headers
Date: Tue, 24 Nov 2009 04:28:07 GMT
Server: Apache/2.2.11 (Ubuntu) mod_apreq2-20051231/2.6.0 mod_perl/2.0.4 Perl/v5.10.0
Cache-Control: public, max-age=31536000
Content-Length: 25496
Content-Type: image/jpeg
Via: 1.1 127.0.1.1:8081
Keep-Alive: timeout=15, max=75
Connection: Keep-Alive
So far, so good (me thinks). However, on reload of the page, the image does not appear cached, and a request is again sent:
Request Headers
Host: <OBSCURED>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: <OBSCURED>
Cookie: pz_cred=4KCNr0RM15%2FJCOt%2BEa6%2BL62z%2Fxvbp2xNQHY5pJw5d6Q
Cache-Control: max-age=0
It doesn't seem that request should happen as the browser should have cached the image. As it is, a 200 response is received, same as the first, and the image appears to be re-fetched (although the browser does appear to be using the cached images).
The problem appears to be hinted at by the Cache-Control: max-age=0 in the reload request header, above.
Does anyone know why this is happening? Perhaps it is the Via header in the response that is causing the problem?
The original request has
Cache-Control: no-cache
which tells all the intermediate HTTP caches (including Firefox's) that you don't want to use a cached response, you want to get the response from the origin web server itself.
The response says:
Cache-Control: public, max-age=31536000
which tells everyone that as far as the origin server is concerned, the response may be cached. The server seems to be configured to enable the image to be cached: HTTP 1.1 (section 14.21) says:
Note: if a response includes a Cache-Control field with the max-age directive (see section 14.9.3), that directive overrides the Expires field.
Your second request says:
Cache-Control: max-age=0
which tells all the intermediate HTTP caches that you won't take any cached response older than 0 seconds.
One thing to watch out for: if you hit the Reload button in Firefox, you are asking to reload from the origin web server. To test the caching of the image, navigate away from the page and back, or open it up in a new tab. Not sure why you saw no-cache the first time and max-age=0 the second though.
BTW, I like the Firebug plug-in for Firefox. You can take a look at the request and response headers with it and all sorts of other good stuff.
My previous answer was only partially correct.
The problem is the way Firefox 3 handles reload events. Apparently, it almost always requests content again from the origin server. Thus the Cache-Control: max-age=0 request header.
Firefox does use cached images to render a page on reload, but then it still makes all the requests to update them "in the background". It then replaces them as they come in.
Therefore, the page renders fast, YSlow reports cached content. But the server is still getting nailed.
The resolution is to inspect the incoming headers in the dynamic server script and determine whether an If-Modified-Since header is provided. If it is, and the content has not changed since that date, an HTTP_NOT_MODIFIED (304) response is returned without a body.
This is not optimal -- I'd rather Firefox not make the requests at all -- but it cuts the page load time in half, and greatly reduces bandwidth. Given the way Firefox works on reload, this appears the best solution.
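The check described above can be sketched as follows, in Python rather than mod_perl. The function name is hypothetical, and last_modified is assumed to be a timezone-aware datetime that the application already knows for the image.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for(if_modified_since, last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable date: ignore the header and send the full response.
        return 200
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```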
Other Comments: Jim Ferran's point about navigating away from the page and returning has merit -- the cache is always used, and no requests are outgoing (+1 to Jim). Also, content that is dynamically added (e.g. AJAX calls after the initial load) appears to use the cache as well.
Hope this helps someone besides me :)
Looks like I solved it:
Removed the proxy Via header
Added a Last-Modified header
Added a far-future Expires date
Firebug still shows 200 responses from the origin server, however, YSlow recognizes the images as cached. According to YSlow, total image download size when fresh is greater than 500K; with the cache primed, it shows 0K download size.
Here is the response header from the Origin server which does the trick:
Date: Tue, 24 Nov 2009 08:54:24 GMT
Server: Apache/2.2.11 (Ubuntu) mod_apreq2-20051231/2.6.0 mod_perl/2.0.4 Perl/v5.10.0
Last-Modified: Sun, 22 Nov 2009 07:28:25 GMT
Expires: Tue, 30 Nov 2010 19:00:25 GMT
Content-Length: 10883
Content-Type: image/jpeg
Keep-Alive: timeout=15, max=89
Connection: Keep-Alive
Because of the way I'm requesting the images, it really should not matter if these dates are static; my app knows the last mod time before requesting the image and appends this to the request URL on the client side to create a unique URL for each image version, e.g. http://myserver.com/img/125.jpg?20091122 (the info comes from an AJAX JSON feed). I could, for example, make the last modified date 01 Jan 2000, and the Expires date sometime in the year 2050.
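A minimal Python sketch of this scheme follows: a versioned URL on the client side plus Last-Modified and far-future Expires headers on the server side. Both function names and the one-year default lifetime are assumptions chosen for illustration; the datetimes must be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def versioned_url(base_url: str, last_modified: datetime) -> str:
    """Append the modification date so every image version gets its own URL."""
    return f"{base_url}?{last_modified:%Y%m%d}"


def far_future_headers(last_modified: datetime, now: datetime, lifetime_days: int = 365) -> dict:
    """Build Last-Modified and far-future Expires headers as HTTP dates."""
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(now + timedelta(days=lifetime_days), usegmt=True),
    }
```

Because the URL changes whenever the image changes, a stale cached copy is never requested under the new URL, which is why the far-future expiry is safe here.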
If YSlow is correct -- and performance testing implies it is -- then Firebug should really report these local cache hits instead of a 200 response.
