How to scrape the HTTP header? (Redirect URL?)

How to scrape the HTTP header? (Redirect URL?) - excel

I've installed an add-on in my browser which let me see all the HTTP headers that go back and forth from my Firefox and a web server.
This is a simple code I've used:
Set WinHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
WinHttpReq.Open "GET", "Initial_URL.com", False
WinHttpReq.Send
Below is the http header obtained:
https://Initial_URL.com
Host: Initial_URL.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
GET: HTTP/1.1 302 Moved Temporarily
content-length: 2361
content-type: text/html
date: Wed, 09 Jan 2019 19:46:06 GMT
location: Redirected_to.com
Set-Cookie: PD-S-SESSION-ID=1_2_0_ud2NmPnh++62VQCPVkwxb8xp0wuMBDhqmfZiqltbPrgksUAf;
First part of the header is what the HTTP header has sent to the server, and then the answer from the server.
I need to get some of these fields (e.g. location, Set-Cookie).
I've parsed the WinHttpReq.ResponseText, but it is pure HTTP code, no headers at all.
Any guidance?

If you want the header information, you can get that from the getAllResponseHeaders method.
Here is a small demo:
Sub GetHeaders()
Dim headers As String
With CreateObject("MSXML2.ServerXMLHTTP")
.Open "GET", "https://stackoverflow.com/questions/54118535/excel-vba-how-to-scrape-the-http-header-webserver-answer-redirect-url"
.send
headers = .getAllResponseHeaders
Debug.Print headers
End With
End Sub

Related

How can I specify the exact http method with python requests?

In Burp Suite the first line of a captured request is usually GET / HTTP/1.1. However, I am currently practicing Host Header injection using the method of supplying an absolute URL in order to something like this:
GET https://vulnerable-website.com/ HTTP/1.1
Host: bad-stuff-here
In python I am using the requests library and am unable to specify the exact GET request I need.
import requests
burp0_url = "https://vulnerable-website.com:443/"
burp0_cookies = {[redacted]}
burp0_headers = {"Host": "bad-stuff-here", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://vulnerable-website.com/", "Connection": "close", "Upgrade-Insecure-Requests": "1"}
output = requests.get(burp0_url, headers=burp0_headers, cookies=burp0_cookies)
print(output, output.text)
I have tried specifying the GET request in the header dictionary (header = {"GET":" / HTTP/1.1", ...}), however this only results in a GET Header not r
Request on the 6th line being sent:
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection: close
GET: /
Host: bad-stuff-here
Accept-Language: en-US,en;q=0.5
Referer: https://vulnerable-website.com/
Upgrade-Insecure-Requests: 1
Cookie: [redacted]
This is a very specific problem and I'm not sure if anyone has had the same issues but any help is appreciated. Maybe a workaround with urllib or something I'm missing. Thanks.

requests uses urllib3 under the hood.
You have to craft the request yourself because of non of the clients [urlib, requests, http.client] won't allow you to insert a control character by design.
You can use a plain socket for this
msg = 'GET / HTTP/1.1\r\n\r\n'
s = socket.create_connection(("vulnerable-website.com", 80))
with closing(s):
s.send(msg)
buf = ''.join(iter(partial(s.recv, 4096), ''))

Downloading files from protected website with VBA-Excel

I'm trying to download a file from a protected webpage (from my job, so I cannot post the url).
When I save the bytes (I'm using a WinHTTP request), in an .xls, the file appears to be corrupted.
My code (without sensitive information) and a little code taken by "HTTP Header Live".
option explicit
Sub SaveFileFromURL()
Dim FileNum As Long
Dim FileData() As Byte
Dim WHTTP As Object
Dim SetCookieString As String
Dim mainUrl As String
Dim fileUrl As String
Dim SavePath As String
Dim strAuthenticate As String
'this is the url to login, extracted with HTTP Header Live.
'however, the url shown in the browser is: "https://www.website.com/retro/default.asp?idioma=ES"
mainUrl = "https://www.website.com/retro/logincheck.asp"
'this is the url for to download the file, but need to send a cookie as credentials
'(if you login via browser and paste the link, it will popup the saveas dialog, but if you don't login, it will return an internal error)
fileUrl = "https://www.website.com/retro/VerBordero.asp?id=27348&p=3º Trimestre 2019&n=0&m=UNKNOWN&con=CIRCULAR&fmt=xls"
'the path where to save the file with the extension I know it will have.
SavePath = "C:\Users\Victor\Desktop\bordero.xls"
'authetication should be:
strAuthenticate = "txtUser=MYUSER&txtpwd=MYPASS&lg=es"
'I login to catch the cookie that it is suppose to allow me to download the file.
Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
WHTTP.Open "POST", mainUrl, False
WHTTP.Send strAuthenticate
'if it is correct:
If WHTTP.Status = 200 Then
'I get the cookie
SetCookieString = WHTTP.getResponseHeader("Set-Cookie")
'Then you have to GET direct file url
WHTTP.Open "GET", fileUrl, False
WHTTP.setRequestHeader "Content-Type", "application/x-msexcel"
WHTTP.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WHTTP.setRequestHeader "Accept-Language", "es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3"
WHTTP.setRequestHeader "Cookie", SetCookieString
WHTTP.Send
'if i get the file
If WHTTP.Status = 200 Then
'download bytes
FileData = WHTTP.responseBody
Set WHTTP = Nothing
'Save the bytes into file
FileNum = FreeFile
Open SavePath For Binary Access Write As #FileNum
Put #FileNum, 1, FileData
Close #FileNum
End If
End If
End Sub
The code runs. I got "OK" in all checked status but when I try to open the file (that have 622kb, the same as if I download it manually), Excel prompts me with a message saying that there is too much data for a cell. If I click on "open anyway", I can read some parts.
It looks like the file is being saved in different encoding.
Some code from HTTP Header Live.
LOGIN:
https://www.website.com/retro/logincheck.asp
Host: www.website.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 38
Origin: https://www.website.com
Connection: keep-alive
Referer: https://www.website.com/retro/default.asp?idioma=ES
Cookie: cc_social=yes; cc_analytics=yes; cc_advertising=yes; cc_necessary=yes; _ga=GA1.2.859443177.1570897852; __utma=136336428.859443177.1570897852.1570897852.1570897852.1; __utmz=136336428.1570897852.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic| utmctr=(not%20provided); ASPSESSIONIDQWSBSQAD=LDIIIENCGKDHMEJEGKJKBDMM; cookieconsent=dismiss; ASPSESSIONIDQWQCRRBD=PCOIENHDNLHCKNODMJLKAFGM
Upgrade-Insecure-Requests: 1
txtUser=MYUSER&txtpwd=MYPASS&lg=es
POST: HTTP/2.0 302 Found
date: Wed, 04 Dec 2019 22:20:17 GMT
server: Microsoft-IIS/8.5
cache-control: private
content-type: text/html
expires: Wed, 04 Dec 2019 22:20:18 GMT
location: retro.asp
content-length: 130
via: 2.0 www.website.com
X-Firefox-Spdy: h2
FILE:
https://www.website.com/retro/VerBorderoGRxls.asp?id=27348&p=3º Trimestre 2019&n=0&m=UNKNOWN&con=CIRCULAR&fmt=xls
Host: www.website.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://www.website.com/retro/borderos_resumen.asp
Cookie: cc_social=yes; cc_analytics=yes; cc_advertising=yes; cc_necessary=yes; _ga=GA1.2.859443177.1570897852; __utma=136336428.859443177.1570897852.1570897852.1570897852.1; __utmz=136336428.1570897852.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic| utmctr=(not%20provided); ASPSESSIONIDQWSBSQAD=LDIIIENCGKDHMEJEGKJKBDMM; cookieconsent=dismiss; ASPSESSIONIDQWQCRRBD=PCOIENHDNLHCKNODMJLKAFGM
Upgrade-Insecure-Requests: 1
GET: HTTP/2.0 200 OK
date: Wed, 04 Dec 2019 22:21:40 GMT
server: Microsoft-IIS/8.5
cache-control: private
content-type: application/x-msexcel
expires: Wed, 04 Dec 2019 22:21:40 GMT
content-length: 637440
via: 2.0 www.website.com
X-Firefox-Spdy: h2

Python POST request to retrieve base64 encode File

Im trying to POST request using Python to retreive a specific File. Since the URL is behind a server with authorized access theres no use posting it here
However the form data contains a field called base64 and lengthy which I cant figure out if its a form data value or base64 encoding of post request
Here are browser parameters
General:
Request URL: http://exampleapi.com/api/Document/Export
Request Method: POST
Status Code: 200 OK
Remote Address: XX.XXX.XXX.XX:XX
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Access-Control-Allow-Origin: http://example.com
Cache-Control: no-cache
Content-Disposition: attachment; filename=location-downloads.xlsx
Content-Length: 7148
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Date: Tue, 23 Jul 2019 21:00:18 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Request Headers :
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 10162
Content-Type: application/x-www-form-urlencoded
Cookie: abcConnection=!UA7tkC3iZCmVNGRUyRpDWARVBWk/lY6SZvgxLlaygsQKk+vuwA1NxvhwE9ph4i+3NZlKeepIfuHhUvyQjl68fhhrT9ueqMx/3mBKUDcT
DNT: 1
Host: exampleapi.com
Origin: http://example.com
Referer: http://example.com/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36
Form Data:
fileName: location-downloads.xlsx
contentType: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
base64: UEsDBAoAAAAAAAh4904AAAAAAAAAAAAAAAAJAAAAZG9jUHJvcHMvUEsDBAoAAAAIAAh490(shortened for simplicity)
Here is what I tried
url='http://example.com'
urllib3.disable_warnings()
headers = {
"Content-Type": "application/x-www-form-urlencoded",
"User-Agent": "Mozilla/5.0",
}
with requests.session() as s:
r=s.get(url,headers={"User-Agent":"Mozilla/5.0"},verify=False)
data=r.content
soup=BeautifulSoup(data,'html.parser')
form_data = {
"fileName":"location-downloads.xlsx",
"contentType":"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
}
r2=s.post('http://exampleapi.com/api/Document/Export',data=json.dumps(form_data,ensure_ascii=True).encode('utf-8'),headers=headers,verify=False)
print(r2.status_code)
Any idea how i should proceed. My status code also shows 500 for the post here

Expressjs Route contains weird characters

What could possibly be the reason for expressjs route to contain the following data? I am expecting it to return JSON data. I am making an ajax call to the server(expressjs) which gives me the below data with weird characters. Is this data gzipped? I have set the headers and contentType as follows:
headers: {"Access-Control-Allow-Origin":"*"}
contentType: 'application/json; charset=utf-8'
�=O�0�b��K�)�%7�܈9���G��%NOU���O'6��k�~6��S.���,��/�wأ%6�K�)��e�
The HTTP response is as follows:
General:
Request URL: http://localhost/expressRoute.js
Request Method: GET
Status Code: 200 OK
Remote Address: [::1]:80
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Length: 29396
Content-Type: application/javascript
Date: Thu, 22 Nov 2018 00:50:36 GMT
ETag: "72d4-57b124e0c372e"
Keep-Alive: timeout=5, max=100
Last-Modified: Tue, 20 Nov 2018 05:57:12 GMT
Server: Apache/2.4.34 (Win32) OpenSSL/1.1.0i PHP/7.2.10
Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Host: localhost
Pragma: no-cache
Referer: http://localhost/index.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

Access to a web page via a robot

I need to occasionally access an HTML page to update a database. This page is easily accessible via a web browser, but when I try to access it via a node.js application it doesn't work (the website detect that the request is made by a robot). However,
The robot request contains the same headers (including the
user-agent) that the web browser request.
The robot request doesn't contains referer header or cookie header, but the browser request either.
The IP of the robot is the same that the IP that I use
to browse the website.
In my eyes the robot request and the browser request are strictly identical. Nevertheless they are processed differently.
I'm running out of ideas... Maybe the request contains metadata like "this request was sent by node.js" but it would be really weird.
EDIT, here is a code sample :
// callback (error, responseContent)
function getPage (callback){
let options = {
protocol : 'https:',
hostname : 'xxx.yyy.fr',
port : 443,
path : '/abc/def',
agent : false,
headers : {
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding' : 'gzip, deflate, br',
'Accept-Language' : 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
'Cache-Control' : 'no-cache',
'Connection' : 'keep-alive',
'DNT' : '1',
'Host' : 'ooshop.carrefour.fr',
'Pragma' : 'no-cache',
'Upgrade-Insecure-Requests' : '1',
'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'
}
};
https.get (options, function (res){
if (res.statusCode !== 200){
res.resume ();
callback ('Error : res code != 200, res code = ' + res.statusCode);
return;
}
res.setEncoding ('utf-8');
let content = '';
res.on ('data', chunk => content += chunk);
res.on ('end', () => callback (null, content));
}).on ('error', e => callback (e));
}
EDIT : here is a comparison of the requests/responses :
Mozilla Firefox
request headers :
GET /3274080001005/eau-de-source-cristaline HTTP/1.1
Host: ooshop.carrefour.fr
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
response headers :
HTTP/2.0 200 OK
date: Wed, 11 Jul 2018 21:25:25 GMT
server: Unknown
content-type: text/html; charset=UTF-8
age: 0
x-varnish-cache: MISS
accept-ranges: bytes
set-cookie: visid_incap_1213048=G8a0mWzmQYi0GKuT2Ht7YeQ9QVsAAAAAQkIPAAAAAADvVZnsZHK18dQQxHakBprg; expires=Thu, 11 Jul 2019 11:17:56 GMT; path=/; Domain=.carrefour.fr
incap_ses_466_1213048=/2NKHS4HXU0T7FpkwpJ3BsV1RlsAAAAAAY3wbUkXacAceu2NkgUrhw==; path=/; Domain=.carrefour.fr
x-iinfo: 7-11020186-11020187 NNNN CT(1 2 0) RT(1531344324722 0) q(0 0 0 0) r(4 4) U12
x-cdn: Incapsula
content-encoding: gzip
X-Firefox-Spdy: h2
response content : expected HTML page
Node.js robot
request headers :
Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding : gzip, deflate, br
Accept-Language : fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Cache-Control : no-cache
Connection : keep-alive
DNT : 1
Host : ooshop.carrefour.fr
Pragma : no-cache
Upgrade-Insecure-Requests : 1
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
response headers :
content-type : text/html
connection : close, close
cache-control : no-cache
content-length : 210
x-iinfo : 1-17862634-0 0NNN RT(1531344295049 65) q(0 -1 -1 0) r(0 -1) B10(4,314,0) U19
set-cookie : incap_ses_466_1213048=j34jMBWkPFYT7FpkwpJ3Bqd1RlsAAAAAVBfoZBShAvoun/M8UFxPPA==; path=/; Domain=.carrefour.fr
response content :
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to scrape the HTTP header? (Redirect URL?) - excel

Related

How can I specify the exact http method with python requests?

Downloading files from protected website with VBA-Excel

Python POST request to retrieve base64 encode File

Expressjs Route contains weird characters

Access to a web page via a robot

Categories

Resources