I was wondering if there is any way to re-order the HTTP headers our browser sends, before they reach the web server?
Since the order of the headers leaves a kind of "fingerprint" (see this post and this post), I was thinking about using mitmproxy (with inline scripting, I guess) to modify the headers on the fly. Is this possible?
How would one achieve that?
Note: I'm looking for a method that can be scripted, not a graphical tool like Burp Suite (although Burp is known to be able to re-order headers).
I'm open to suggestions. Perhaps NGINX might come to the rescue as well?
EDIT: I should be more specific by giving an example...
Let's say I'm using Firefox. Using a funky add-on, I'm spoofing my user-agent to "look" like a Chrome browser. But if I then test my browser with ip-check.info, the "signature" of my browser remains that of Firefox, even though my spoofed user-agent shows "Chrome".
So the solution, in this specific case, would be to re-order the HTTP headers in the same way Chrome does.
How can this be done?
For the record, the order of the HTTP headers should not matter at all according to RFC 7230. But now that you have asked... this can be done in mitmproxy as follows:
import random

def request(context, flow):
    # flow.request.headers.fields is a tuple of (name, value) header tuples.
    h = list(flow.request.headers.fields)
    random.shuffle(h)
    flow.request.headers.fields = tuple(h)
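With the old inline-script API shown here (the request(context, flow) signature), you would load this with something like mitmdump -s shuffle_headers.py, where the file name is just whatever you saved the script as; newer mitmproxy releases use a request(flow) hook without the context argument.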
See the mitmproxy documentation on netlib.http.Headers for more details.
There are tons of ways to reorder them however you wish:
def reorder(headers, header_order=["Host", "User-Agent", "Accept"]):
    fields = []
    for name in header_order:  # add existing headers in the specified order
        if name in headers:
            # get_all() returns only the values, so rebuild (name, value) pairs
            fields.extend((name, value) for value in headers.get_all(name))
            del headers[name]
    fields.extend(headers.fields)  # all other headers keep their original order
    return fields

flow.request.headers.fields = tuple(reorder(flow.request.headers))
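Putting it together as an inline script (using the same old-style hook as above; the Chrome-like order below is purely illustrative, so capture a real Chrome request to get the exact list):

CHROME_ORDER = ["Host", "Connection", "User-Agent", "Accept",
                "Accept-Encoding", "Accept-Language"]  # illustrative order only

def request(context, flow):
    # apply the reorder() helper defined above to every outgoing request
    flow.request.headers.fields = tuple(
        reorder(flow.request.headers, header_order=CHROME_ORDER))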
I'm trying to use Python's requests library to log in to a website. It's pretty simple code, and you can really get the gist of requests just by going to its website. However, I want to check whether I'm successfully logged in via the URL. The problem I've encountered is that when I make the POST request and assign the response to the variable p, print(p.url) always shows the same URL, whether the HTML has changed or not. Is there any way for me to refresh the browser, or to update the URL to whatever it's currently set to?
(I can add a line for checking the url against itself later, but for now I just want to get the correct url)
#!/usr/bin/env python3
import requests

payload = {'login': 'USERNAME',
           'password': 'PASSWORD'}

with requests.Session() as s:
    p = s.post('WEBSITE', data=payload)
    # print(p.text)
    print(p.url)
The usage of python-requests may not be as complex as you think: it automatically follows the redirects of your session.post() (or session.get()).
Here, the session.post() method returns a Response object:
r = s.post('website', data=payload)
which means r.url is the current (post-redirect) URL you are looking for.
If you still want to refresh the current page, just use:
s.get(r.url)
To verify whether you have logged in successfully, one solution is to first do the login in your browser.
Then, based on the title or content of the page returned (i.e., inspect r.text), you can judge whether you have made it.
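For example, a minimal sketch (the URL, field names, and the "Logout" marker are placeholders for whatever your site actually uses):

import requests

payload = {'login': 'USERNAME', 'password': 'PASSWORD'}

with requests.Session() as s:
    r = s.post('https://example.com/login', data=payload)
    print(r.url)      # final URL after any redirects
    print(r.history)  # the redirect chain, if there was one
    # Assumes the logged-in page contains a "Logout" link; adjust for your site.
    print('logged in' if 'Logout' in r.text else 'login failed')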
BTW, python-requests is a great library, enjoy it.
I've been scratching my head the whole day yesterday about this and, to my surprise, I can't seem to find an easy way to check it.
I am using Python's Requests library to pass my proxies, like so:
import requests
from requests.adapters import HTTPAdapter

def make_request(url):
    with requests.Session() as s:
        s.mount("http://", HTTPAdapter(max_retries=3))
        s.mount("https://", HTTPAdapter(max_retries=3))
        d.rotate(-1)  # d is a deque of proxy dicts; this rotates through the proxies every time make_request is called
        s.proxies = d[0]
        page = s.get(url, timeout=3)
        print('proxy used: ' + str(d[0]))
        return page.content
The problem is, I can't seem to make the request fail when the proxy is not expected to work. There always seems to be a fallback to my own internet IP when the proxy is not working.
For example, if I pass a random proxy IP like 101.101.101.101:8800, or remove the IP authentication that my proxies need, the request still goes through, even though it shouldn't.
I thought adding the timeout parameter to the request would do the trick, but obviously it didn't.
So:
Why does this happen?
How can I check which IP a request is being made from?
From what I have seen so far, you should use the form
s.get(url, proxies = d)
This should use the proxies in the dict d to make a connection.
This form allowed me to check the status_code with both working and non-working proxies:
r = s.get(url, proxies=d)
print(r.status_code)
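For instance, a quick way to confirm the proxy is actually used (and that a dead proxy fails instead of silently falling back to your own IP) is to hit an IP-echo service; httpbin.org/ip and the proxy address below are just placeholders:

import requests

d = {"https": "https://101.101.101.101:8800"}  # the known-bad proxy from the question

try:
    r = requests.get("https://httpbin.org/ip", proxies=d, timeout=3)
    print(r.status_code, r.json())  # the "origin" field should be the proxy's IP
except requests.exceptions.RequestException as e:  # e.g. ProxyError or ConnectTimeout
    print("proxy failed, no fallback to your own IP:", e)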
I will update once I find out whether it just cycles over the proxies in the dict until it finds a working one, or whether you can actually select which one is used.
[UPDATE]
I tried to work around the dict in proxies, to use a different proxy whenever I wanted to. However, proxies must be a dict to work, so I used a dict in the form of:
d = {"https" : 'https://' + str(proxy_ips[n].strip('\n'))}
This seems to work and lets me use the IP I want, although it feels quite clunky. I hope someone might come along and help!
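Putting that together, a minimal rotation sketch (the file name and its contents are hypothetical; one "ip:port" entry per line):

import requests

with open('proxies.txt') as f:  # hypothetical file of proxies
    proxy_ips = f.readlines()

for n in range(len(proxy_ips)):
    d = {"https": "https://" + proxy_ips[n].strip('\n')}
    try:
        r = requests.get('https://example.com', proxies=d, timeout=3)
        print('proxy used:', d, r.status_code)
    except requests.exceptions.RequestException as e:
        print('proxy failed:', d, e)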
The proxies configured in your environment (which requests falls back to) can be seen through:
requests.utils.getproxies()
or
requests.utils.get_environ_proxies(url)
I hope that helps, obviously quite old question, but still!
I'm developing a web crawler in Node.js. I've created a unique list of the URLs found in the crawled page body. But some of them have extensions like jpg, mp3, mpeg... and I want to avoid crawling those. Is there any simple way to do that?
Two options stick out.
1) Use path to check every URL
As stated in the comments, you can use path.extname to check for a file extension. For example:
var path = require('path');

var test = "http://example.com/images/banner.jpg";
path.extname(test); // '.jpg'
This would work, but it feels like you'll wind up having to maintain a list of file types to crawl or to avoid. That's work.
Side note -- be careful using path. Typically, url is your best tool for parsing links, because path is aimed at files/directories, not URLs. On some systems (Windows), using path to manipulate a URL can result in drama because of the slashes involved. Fair warning!
2) Get the HEAD for each link & see if content-type is set to text/html
You may have reasons to avoid making more network calls. If so, this isn't an option. But if it is OK to make additional calls, you could grab the HEAD for each link and check the MIME type stored in content-type.
Something like this:
var http = require('http');

var headersOptions = {
    method: "HEAD",
    host: "example.com", // host name only, no protocol prefix
    path: "/articles/content.html"
};

var req = http.request(headersOptions, function (res) {
    // you will probably need to also do things like check
    // HTTP status codes so you handle 404s, 301s, and so on
    if (res.headers['content-type'].indexOf("text/html") > -1) {
        // do something like queue the link up to be crawled
        // or parse the link or put it in a database or whatever
    }
});
req.end();
One benefit is that you only grab the HEAD, so even if the file is a gigantic video or something, it won't clog things up. You get the HEAD, see the content-type is a video or whatever, then move along because you aren't interested in that type.
Second, you don't have to keep track of file names because you're using a standard MIME type to differentiate html from other data formats.
I'm very new to Linux and working with Node.js. It's just my second day. I use node-curl for cURL requests. In the link below I found an example with a GET request. Can anybody provide me a POST request example using node-curl?
https://github.com/jiangmiao/node-curl/blob/master/examples/low-level.js
You need to use setopt in order to specify POST options for a cURL request. The options you should start looking at first are CURLOPT_POST and CURLOPT_POSTFIELDS. From the libcurl documentation linked from node-curl:
CURLOPT_POST
A parameter set to 1 tells the library to do a regular HTTP post. This will also make the library use a "Content-Type: application/x-www-form-urlencoded" header. (This is by far the most commonly used POST method).
Use one of CURLOPT_POSTFIELDS or CURLOPT_COPYPOSTFIELDS options to specify what data to post and CURLOPT_POSTFIELDSIZE or CURLOPT_POSTFIELDSIZE_LARGE to set the data size.
Optionally, you can provide data to POST using the CURLOPT_READFUNCTION and CURLOPT_READDATA options but then you must make sure to not set CURLOPT_POSTFIELDS to anything but NULL. When providing data with a callback, you must transmit it using chunked transfer-encoding or you must set the size of the data with the CURLOPT_POSTFIELDSIZE or CURLOPT_POSTFIELDSIZE_LARGE option. To enable chunked encoding, you simply pass in the appropriate Transfer-Encoding header, see the post-callback.c example.
CURLOPT_POSTFIELDS
... [this] should be the full data to post in a HTTP POST operation. You must make sure that the data is formatted the way you want the server to receive it. libcurl will not convert or encode it for you. Most web servers will assume this data to be url-encoded.
This POST is a normal application/x-www-form-urlencoded kind (and libcurl will set that Content-Type by default when this option is used), which is the most commonly used one by HTML forms. See also the CURLOPT_POST. Using CURLOPT_POSTFIELDS implies CURLOPT_POST.
If you want to do a zero-byte POST, you need to set CURLOPT_POSTFIELDSIZE explicitly to zero, as simply setting CURLOPT_POSTFIELDS to NULL or "" just effectively disables the sending of the specified string. libcurl will instead assume that you'll send the POST data using the read callback!
With that information, you should be able to add the following options to the low-level example to have it make a POST request:
var fieldsStr = '{}'; // replace with the data you want to POST

curl.setopt('CURLOPT_POST', 1);
curl.setopt('CURLOPT_POSTFIELDS', fieldsStr);
You will need to tweak the contents of fieldsStr to match the format the server is expecting. Per the documentation you may also need to url-encode the data - which should be as simple as using encodeURIComponent according to this post.
I'm accessing GMail via IMAP using OAuth2 authentication and Zend_Mail_Protocol_Imap.
It all works great.
What I need to do is present emails in thread form, just like the GMail interface. Google makes this really easy because they have an X-GM-THRID header that links a conversation with a 64-bit unsigned integer.
My problem is: when presented with a single email, how do I find out what X-GM-THRID it belongs to?
First off Google says that there is a server extension X-GM-EXT-1 which is active. You can check it is there using the CAPABILITY command (and I have).
All the information suggests that if this is active then the X-GM-THRID will simply be returned as a header, but it isn't.
Perhaps I need to ask Google to return it via the fetch command. Google does describe a simple fetch process here:
https://developers.google.com/google-apps/gmail/imap_extensions
My code is sending TAG5 FETCH 3673 (FLAGS RFC822.HEADER X-GM-THRID) but the headers do not include an entry for X-GM-THRID.
I've even simplified it to TAG6 FETCH 3673 (X-GM-THRID) to be exactly as described in the Google example. In this case no headers are returned at all.
I'm not massively familiar with IMAP commands, and I'm not sure if Zend_Mail_Protocol_Imap is abstracting away some handling in a way that strips this header.
But I do know that this is driving me mad.
Am I missing something? Is it not a header?
Okay, so it looks like it is not a header. It is an attribute in the IMAP command and response.
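You can confirm this outside Zend with a raw IMAP session, e.g. with Python's imaplib (the credentials are placeholders, and the question's setup would authenticate via OAuth2 rather than a password):

import imaplib

M = imaplib.IMAP4_SSL('imap.gmail.com')
M.login('user@gmail.com', 'password')  # placeholder login
M.select('INBOX')
# X-GM-THRID comes back as a FETCH response attribute, not a message header.
typ, data = M.fetch('3673', '(X-GM-THRID)')
print(data)  # e.g. [b'3673 (X-GM-THRID 1278455344230334865)']
M.logout()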
The standard fetch command sent by Zend_Mail_Protocol_Imap is "TAG5 FETCH 3673 (FLAGS RFC822.HEADER)"
The code that handles the response only expects to be dealing with 'FLAGS' and 'RFC822.HEADER'. It passes this information to a Zend_Mail_Message object which extends Zend_Mail_Part.
Zend_Mail_Part parses the flag information. It also parses the header.
The additional 'X-GM-THRID' attribute that I added does actually get a response, but since it is not passed back to Zend_Mail_Message there is no way for me to use it. It gets lost in the ether (at around line 171 of Zend_Mail_Storage_Imap in my Zend library, to be exact).
So I've hacked the core... Zend_Mail_Storage_Imap::getMessage now expects $data['X-GM-THRID'] and passes it to the Zend_Mail_Part constructor. And I now have a method Zend_Mail_Part::getXGmThrid, which solves all my problems. I'll obviously refactor this into my own classes extending Zend_Mail_Storage_Imap and Zend_Mail_Part in the not-too-distant future... but for now I know this works.