I am dealing with a server that does not accept uncapitalized headers, and unfortunately I can't do much about it.
var headers = {};
headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36';
headers['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8';
headers['Connection'] = 'keep-alive';
headers['Cache-Control'] = 'max-age=0';
headers['Upgrade-Insecure-Requests'] = '1';
headers['Accept-Encoding'] = 'gzip, deflate';
headers['Accept-Language'] = 'en-US,en;q=0.9,ru;q=0.8,hy;q=0.7';
request.post({url: 'http://10.10.10.10/login', headers: headers, ...
This in fact sends out the following:
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,ru;q=0.8,hy;q=0.7
DNT: 1
host: 10.10.10.10
cookie: vvv=765936875155218941
cookie and host are lower-cased. How can I alter request to send out capitalized headers?
This is not a Node.js issue but behavior of a particular library, request. In fact, it is not an issue at all, because HTTP headers are case-insensitive. request uses the caseless package to enforce lower-cased headers, so it's expected that user-supplied headers will be lower-cased where consistency is required.
These headers may be left as is; they should be handled correctly by the remote server according to the specs.
It may be necessary to use specific header case if a request is supposed to mimic a real client request. In that case, the header object can be traversed manually before the request is sent, e.g.:
const normalizeHeaderCase = require("header-case-normalizer");
const req = request.post('...', { headers: ... });
// Rewrite each header name to its canonical capitalization in place;
// Object.entries takes a snapshot, so mutating req.headers inside the loop is safe.
for (const [name, value] of Object.entries(req.headers)) {
    delete req.headers[name];
    req.headers[normalizeHeaderCase(name)] = value;
}
req.on('response', function(response) {...});
From the Chrome browser I fetched some headers and values, as below:
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
user-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
cookies:_gcl_au=<some number>; _ga=<some number>; _gid=<some string>;
csrftoken=<some string>; sessionid=<some string>
I want to fetch (HTTP GET) the same URL using the Python requests library and use the same cookies and session ID. My csrftoken and sessionid are already defined, and I want to use them with requests.get. Is it possible, and if so, how do I do it?
Thanks
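Yes, it's possible: requests accepts a cookies argument, so the csrftoken and sessionid captured from the browser can be passed along directly. A minimal sketch, assuming placeholder token values and a hypothetical target URL:

import requests

# Values copied from the browser's devtools (placeholders)
cookies = {
    "csrftoken": "<csrf token from browser>",
    "sessionid": "<session id from browser>",
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# Hypothetical URL; use the page the cookies were captured from
response = requests.get("https://example.com/page", headers=headers, cookies=cookies)
print(response.status_code)

For repeated calls, attaching the cookies to a requests.Session() keeps them (and any server-side updates to them) across requests.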
Here is my attempt to download the Excel file. How do I make it work? Can someone please help me fix the last call?
import requests
from bs4 import BeautifulSoup

url = "http://lijekovi.almbih.gov.ba:8090/SpisakLijekova.aspx"
useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.76"
headers = {
    "User-Agent": useragent
}
session = requests.Session()  # session that persists cookies across calls
r = session.get(url, headers=headers)  # request to get cookies
soup = BeautifulSoup(r.text, "html.parser")  # parse the hidden form fields
viewstate = soup.find('input', {'id': '__VIEWSTATE'}).get('value')
viewstategenerator = soup.find('input', {'id': '__VIEWSTATEGENERATOR'}).get('value')
eventvalidation = soup.find('input', {'id': '__EVENTVALIDATION'}).get('value')
cookies = session.cookies.get_dict()
cookie = ""
for k, v in cookies.items():
    cookie += k + "=" + v + ";"
cookie = cookie[:-1]
#headers copied from the browser requests
headers = {
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36 Edg/97.0.1072.76',
    'X-KL-Ajax-Request': 'Ajax_Request',
    'X-MicrosoftAjax': 'Delta=true',
    'X-Requested-With': 'XMLHttpRequest',
    'Cookie': cookie
}
#post request data submission
data = {
    'ctl00$smMain': 'ctl00$MainContent$ReportGrid$ctl103$ReportGrid_top_4',
    '__EVENTTARGET': 'ctl00$MainContent$ReportGrid$ctl103$ReportGrid_top_4',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstategenerator,
    '__EVENTVALIDATION': eventvalidation,
    '__ASYNCPOST': 'true'
}
#the async postback is a POST, and it should go through the session
result = session.post(url, headers=headers, data=data)
print(result.headers)
data = {
    "__EVENTTARGET": 'ctl00$MainContent$btnExport',
    '__VIEWSTATE': viewstate,
}
#remove the ajax headers for the last call, which downloads the excel file
del headers['X-KL-Ajax-Request']
del headers['X-MicrosoftAjax']
del headers['X-Requested-With']
result = session.post(url, headers=headers, data=data, allow_redirects=True)
print(result.headers)
print(result.status_code)
#print(result.text)
with open("test.xlsx", "wb") as f:
    f.write(result.content)
I am trying to export the Excel file without Selenium's help, but I am not able to get the last step working. I need help converting the XMLHttpRequest calls to pure Python requests, without any Selenium.
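One thing worth knowing for the last step: because the postback sends __ASYNCPOST, the server replies with a pipe-delimited MicrosoftAjax "delta" rather than a full page, and refreshed values such as __VIEWSTATE travel inside hiddenField entries of that delta. A hedged sketch of pulling them out, assuming the standard ...|hiddenField|<name>|<value>|... layout:

def extract_hidden_fields(delta_text):
    # MicrosoftAjax delta responses are pipe-delimited; updated hidden
    # fields appear as ...|hiddenField|<name>|<value>|... entries.
    # Base64 values like __VIEWSTATE contain no '|' themselves.
    parts = delta_text.split("|")
    fields = {}
    for i, token in enumerate(parts):
        if token == "hiddenField" and i + 2 < len(parts):
            fields[parts[i + 1]] = parts[i + 2]
    return fields

# e.g. refresh the tokens after the AJAX postback, before the export POST:
# updated = extract_hidden_fields(result.text)
# data['__VIEWSTATE'] = updated.get('__VIEWSTATE', viewstate)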
I want to get information about documents when entering the company ID 0000000155.
Here is my pseudo-code; I don't know where I should pass the company ID.
url = "https://ekrs.ms.gov.pl/rdf/pd/search_df"
payload={}
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
First of all, you forgot to close the string after the 'Accept' dictionary value. That is to say, your headers should look like this:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}
As for the payload, after checking the website you linked, I noticed that the ID is sent in the unloggedForm:krs2 parameter. You can add this to the payload like so:
payload={
    'unloggedForm:krs2': '0000000155'  # as a string, to preserve the leading zeros
}
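Putting both fixes together, the request might look like the sketch below. It keeps the GET-with-payload shape of the original code; whether the endpoint honors it is another matter, per the caveat that follows:

import requests

url = "https://ekrs.ms.gov.pl/rdf/pd/search_df"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}
payload = {
    'unloggedForm:krs2': '0000000155'
}
response = requests.get(url, headers=headers, data=payload)
print(response.status_code)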
However, in reality, it's nearly impossible to scrape the website this way, because reCAPTCHA is built into it. Your only options now are either to use Selenium and hope that reCAPTCHA doesn't block you, or to somehow reverse-engineer reCAPTCHA (unlikely).
I have a function getBody, which gets the body from a URL. On some URL (I don't know exactly which one) I always get this error:
_http_outgoing.js:494
throw new TypeError('The header content contains invalid characters');
Those URLs mostly contain Danish accented characters, which may be the problem. I have set the header 'Content-Type': 'text/plain; charset=UTF-8', which sets the charset to UTF-8. Probably the host header is the problem.
I have tried using punycode and url, which convert the URL to ASCII, but the converted URLs did not work.
function getBody(n) {
    var url = n; //urls[n];
    // prepend a scheme if the url has none
    url = (url.indexOf('http://') == -1 && url.indexOf('https://') == -1) ? 'http://' + url : url;
    instance.get(url, {
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
            'Content-Type': 'text/plain; charset=UTF-8'
        }
    });
}
I try to open a URL with Python 3:
import urllib.request
fp = urllib.request.urlopen("http://lebed.com/")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
But it hangs on the second line.
What's the reason for this problem, and how do I fix it?
I suppose the reason is that the site does not allow robot visits. You need to fake a browser visit by sending browser headers along with your request:
import urllib.request

url = "http://lebed.com/"
req = urllib.request.Request(
    url,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
f = urllib.request.urlopen(req)
I tried this on my system and it works.
Agree with Arpit Solanki. Shown below is the output for a failed request vs. a successful one.
Failed
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.lebed.com
Connection: close
User-Agent: Python-urllib/3.5
Success
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.lebed.com
Connection: close
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
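For reference, a comparison like the one above can be captured with the standard library alone: building the opener with a debug-enabled HTTPHandler makes urllib print the raw request and response headers. A small sketch:

import urllib.request

# HTTPHandler with debuglevel=1 prints the outgoing request and the
# incoming response headers to stdout, enough to compare a "robot"
# request with a browser-like one.
opener = urllib.request.build_opener(urllib.request.HTTPHandler(debuglevel=1))
req = urllib.request.Request(
    "http://lebed.com/",
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
with opener.open(req) as f:
    print(f.status)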