Web-Scraping in Node.js passing in Cookies/Headers - node.js

Not sure if this is really a Stack Overflow-style question, but here goes:
I'm trying to scrape a page that requires you to be logged in to view the extra data. I already have an account, and I'm trying to replicate the login page's POST request to log in and get a cookie to use for the rest of the pages.
If I inspect the POST in Chrome's developer tools, the request headers are:
POST /authenticate/login?ReturnUrl=%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3Dc82cb4c9-7cfa-4483-938b-2d3c61efabea%26redirect_uri%3Dhttps%253A%252F%252Fthenuel.com%252Fsignin-nuel%26scope%3Didentity%2520offline%26state%3Db9lB42PxZ6AZU-cmP-zOOLjaPHsif9z5yI7mVfxlLtiv00R_4O-FDtsh-GmFMYvZa7-mw6WdJMGWd1owC2SABiQOKJXdGvPC7XahPStVpsdoVypeGl3Rk-oDSNU7V4700LRV2D9URjtmIpCfAwE1WjLeUXJOZZ2GjQzF_UdBz9BtdlHR7hdR4iUMmebLKpZU2Y-vGuocOhT1D3G_gxcJ0aE9jw_PhPuFe3IGnpno86XKEsQlcviK5aYj2vyhXXTs HTTP/1.1
Host: login.thenuel.com
Connection: keep-alive
Content-Length: 177
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://login.thenuel.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://login.thenuel.com/authenticate/login?ReturnUrl=%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3Dc82cb4c9-7cfa-4483-938b-2d3c61efabea%26redirect_uri%3Dhttps%253A%252F%252Fthenuel.com%252Fsignin-nuel%26scope%3Didentity%2520offline%26state%3Db9lB42PxZ6AZU-cmP-zOOLjaPHsif9z5yI7mVfxlLtiv00R_4O-FDtsh-GmFMYvZa7-mw6WdJMGWd1owC2SABiQOKJXdGvPC7XahPStVpsdoVypeGl3Rk-oDSNU7V4700LRV2D9URjtmIpCfAwE1WjLeUXJOZZ2GjQzF_UdBz9BtdlHR7hdR4iUMmebLKpZU2Y-vGuocOhT1D3G_gxcJ0aE9jw_PhPuFe3IGnpno86XKEsQlcviK5aYj2vyhXXTs
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: ARRAffinity=f02c8a40711ffa249ac8dcf17e82c47021b4939e86cdd8caa8a1729b4a81838d; _ga=GA1.2.592576149.1458220828; _gat=1; __RequestVerificationToken=-t6-ReUs7KPoo2ioYs4h3OQ-2VLJYE5IRq5GcEGBG5YKGe84VXNdJO4taMK4CCV_HXbFJI_ZflWZqALWjrA29pLZcjWakodi19rtT0sJwiQ1; ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba
and I also need to pass in some form data:
__RequestVerificationToken=tTi9aRzgeb0zA1z3QMZ1iWbGuC4ajR9Ke2VctCLnUlaTKFg1m-70WSOEsZf3PLEUgRdr4n1rEPVvwmfvN6RwrGMKvqnQvjP_gWsAAxAHPY1&UserName=email%40email.com&Password=asdas
I know this is a bit complicated, but I am completely stuck, so I'll post my code and try to run through it as cleanly as I can:
// I am using superagent-cache and cheerio
var request = require('superagent-cache')();
var cheerio = require('cheerio');
var url = "https://login.thenuel.com/authenticate/login?ReturnUrl=%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3Dc82cb4c9-7cfa-4483-938b-2d3c61efabea%26redirect_uri%3Dhttps%253A%252F%252Fthenuel.com%252Fsignin-nuel%26scope%3Didentity%2520offline%26state%3DiNj9r1juVrmUK5DyLfXnjq6bM6Fci5E1seI-faOadJQsfBKC9PQJJA-wve3TrusBfhrcjNk8C932FDA_vgQIyrlg36K6ucoC3HZkAO-Yn-mRXmaVqZcdKPRvgwYr55UkeETK4ZsjyuOXNixzk0Z3AslC2ZVN2dqiqoPfpoYtz_n-xgtJlvN5WwRt6cEAvzSwhHkFX4UPUF_1OalC8J4aYO-FHfUjTp8Bv4xBe7w0j0exmjcsMIjpmnp4qbN3qz7u";
request
.get(url)
.end(function(err, res) {
if (err) {
console.log(err);
} else {
// This is for getting the response headers from the GET page
console.log("~~~GET~~~~~~~~~~~~~~~~~~~~~~~~")
$ = cheerio.load(res.headers);
console.log(res.headers);
var secondRequestVToken = res.headers['set-cookie'][0]
console.log("~~~Second RequestVerificationToken: " + secondRequestVToken);
var firstARRAffinity = "ARRAffinity=f02c8a40711ffa249ac8dcf17e82c47021b4939e86cdd8caa8a1729b4a81838d; _ga=GA1.2.592576149.1458220828; _gat=1;"
var secondARRAffinity = res.headers['set-cookie'][1]
console.log("~~~Second ARRAfinnity: " + secondARRAffinity);
// This is the getting of the post Request data from the login page
console.log("~~~POST~~~~~~~~~~~~~~~~~~~~~~~~");
$ = cheerio.load(res.text)
var postUrl = "https://login.thenuel.com" + $('body > div > div > div.content-pane.login > form').attr("action");
console.log("~~~URL POST: " + postUrl);
var firstRequestVToken = $('body > div > div > div.content-pane.login > form > input[type="hidden"]:nth-child(1)').attr("value");
console.log("~~~First POST RequestVerificationToken: " + firstRequestVToken);
var formData= {
UserName:'email@email.com',
Password:'Password',
__RequestVerificationToken:firstRequestVToken
}
console.log(formData);
console.log("~~~COOKIE STUFF~~~")
secondRequestVToken = secondRequestVToken.slice(0, -7);
console.log(secondRequestVToken);
secondARRAffinity = secondARRAffinity.slice(0, -31)
console.log(secondARRAffinity);
var finalCookieString = firstARRAffinity + " "
+ secondRequestVToken + " " + secondARRAffinity;
console.log("~~~Final Cookie String: " + finalCookieString);
request
.post(postUrl)
.send(formData)
.set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
.set("Accept-Encoding", "gzip, deflate")
.set("Accept-Language", "en-US,en;q=0.8")
.set("Cache-Control", "max-age=0")
.set("Connection", "keep-alive")
// Content-Length is left for superagent to compute from the actual body
.set("Content-Type", "application/x-www-form-urlencoded")
.set("Cookie", finalCookieString)
.set("Host", "login.thenuel.com")
.set("Origin", "https://login.thenuel.com")
.set("Referer", postUrl)
.set("Upgrade-Insecure-Requests", 1)
.set("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
.end(function(err, res) {
// Do response here
if (err) {
console.log("~~~Hello failure world!");
console.log(err);
} else {
console.log("Hello success world!");
console.log(res);
}
});
}
});
So I first GET the login page to grab the '__RequestVerificationToken' cookie and the second ARRAffinity string, both of which I add to the cookie string I send with the POST.
I then get the POST URL from the login form's action attribute, along with the '__RequestVerificationToken' hidden field that goes into the form data with my username and password.
I then strip the cookie attributes (path, domain) from the Set-Cookie values so the cookie string matches what my browser sends with that POST, and make the POST call.
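As a rough sketch of that trimming step (not what the code above does), the name=value part of each Set-Cookie value can be taken by splitting on the first ';' instead of slicing off a fixed number of characters. The helper name cookieHeaderFromSetCookie and the sample cookie values are made up for illustration:

```javascript
// Hypothetical helper: turn an array of Set-Cookie header values into a
// single Cookie request header by keeping only each "name=value" pair.
function cookieHeaderFromSetCookie(setCookieValues) {
  return (setCookieValues || [])
    .map(function (value) {
      // Everything after the first ';' is an attribute (path, domain, ...)
      return value.split(';')[0].trim();
    })
    .filter(function (pair) {
      return pair.length > 0;
    })
    .join('; ');
}

// Sample values in the same shape as the ones logged by the GET request
var example = [
  '__RequestVerificationToken=abc123; path=/',
  'ARRAffinity=def456;Path=/;Domain=login.thenuel.com'
];
console.log(cookieHeaderFromSetCookie(example));
```

With superagent, the result could then be passed as .set("Cookie", cookieHeaderFromSetCookie(res.headers['set-cookie'])), assuming the server sends its cookies in that header.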
Here is my log with the console.logs:
Server Running...
~~~GET~~~~~~~~~~~~~~~~~~~~~~~~
{ 'cache-control': 'private',
'content-length': '1777',
'content-type': 'text/html; charset=utf-8',
'content-encoding': 'gzip',
vary: 'Accept-Encoding',
server: 'Microsoft-IIS/8.0',
'set-cookie':
[ '__RequestVerificationToken=SL9MYCWkPdY3dI66vBq7BKt4wxfzmNQCO6IEg8EteTdCIe-BCiKbBNCIbWtb3jD9ZbNSRZmUIlVxzICnKGX5PpPOsQvp5me7NJoc4BHu1Ew1; path=/',
'ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba;Path=/;Domain=login.thenuel.com' ],
'x-aspnetmvc-version': '5.2',
'x-aspnet-version': '4.0.30319',
date: 'Wed, 30 Mar 2016 13:48:55 GMT',
connection: 'close',
prev: null,
next: null,
root:
{ type: 'root',
name: 'root',
attribs: {},
children: [ [Circular] ],
next: null,
prev: null,
parent: null },
parent: null }
~~~Second RequestVerificationToken: __RequestVerificationToken=SL9MYCWkPdY3dI66vBq7BKt4wxfzmNQCO6IEg8EteTdCIe-BCiKbBNCIbWtb3jD9ZbNSRZmUIlVxzICnKGX5PpPOsQvp5me7NJoc4BHu1Ew1; path=/
~~~Second ARRAfinnity: ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba;Path=/;Domain=login.thenuel.com
~~~POST~~~~~~~~~~~~~~~~~~~~~~~~
~~~URL POST: https://login.thenuel.com/authenticate/login?ReturnUrl=%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3Dc82cb4c9-7cfa-4483-938b-2d3c61efabea%26redirect_uri%3Dhttps%253A%252F%252Fthenuel.com%252Fsignin-nuel%26scope%3Didentity%2520offline%26state%3DiNj9r1juVrmUK5DyLfXnjq6bM6Fci5E1seI-faOadJQsfBKC9PQJJA-wve3TrusBfhrcjNk8C932FDA_vgQIyrlg36K6ucoC3HZkAO-Yn-mRXmaVqZcdKPRvgwYr55UkeETK4ZsjyuOXNixzk0Z3AslC2ZVN2dqiqoPfpoYtz_n-xgtJlvN5WwRt6cEAvzSwhHkFX4UPUF_1OalC8J4aYO-FHfUjTp8Bv4xBe7w0j0exmjcsMIjpmnp4qbN3qz7u
~~~First POST RequestVerificationToken: bqsa4HwniG2PeJVTBKWjg2ux0S4zUJ-Y1U4C_YF93za33dPnNortSOTeHZyyWWdT_WECqgr44IbJ_FjUSfu9_N3ITdjnJuJhuMvBp_dYQ4c1
{ UserName: 'email@email.com',
Password: 'Password',
__RequestVerificationToken: 'bqsa4HwniG2PeJVTBKWjg2ux0S4zUJ-Y1U4C_YF93za33dPnNortSOTeHZyyWWdT_WECqgr44IbJ_FjUSfu9_N3ITdjnJuJhuMvBp_dYQ4c1' }
~~~COOKIE STUFF~~~
__RequestVerificationToken=SL9MYCWkPdY3dI66vBq7BKt4wxfzmNQCO6IEg8EteTdCIe-BCiKbBNCIbWtb3jD9ZbNSRZmUIlVxzICnKGX5PpPOsQvp5me7NJoc4BHu1Ew1;
ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba;
~~~Final Cookie String: ARRAffinity=f02c8a40711ffa249ac8dcf17e82c47021b4939e86cdd8caa8a1729b4a81838d; _ga=GA1.2.592576149.1458220828; _gat=1; __RequestVerificationToken=SL9MYCWkPdY3dI66vBq7BKt4wxfzmNQCO6IEg8EteTdCIe-BCiKbBNCIbWtb3jD9ZbNSRZmUIlVxzICnKGX5PpPOsQvp5me7NJoc4BHu1Ew1; ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba;
Hello success world!
{ body: {},
text: '<!doctype html>\r\n<html>\r\n<head>\r\n <link rel="stylesheet" type="text/css" href="//nuel-ui-playgroundofwonders.azurewebsites.net/Content/nuel-reset.css" async />\r\n <link rel="stylesheet" type="text/css" href="//nuel-ui-playgroundofwonders.azurewebsites.net/Content/nuel-base.css" async />\r\n <title>Sorry :( - The NUEL</title>\r\n</head>\r\n<body>\r\n <div class="stretch" style="width: 23rem; padding: 1.3rem 1rem; background-color: #ffffff; box-shadow: #e0e0e0 0px 0px 1px; margin: 3rem auto 0px;">\r\n <h1 class="non-content">Something\'s up!</h1>\r\n <p style="line-height: 1.3rem;">An error occured during your last request, if you were logging in - try clearing your cookies and then retrying. You can <a href="https://support.google.com/chrome/answer/95647?hl=en-GB" target="_blank">find out how to do that here</a>.</p>\r\n <p style="line-height: 1.3rem;">If that doesn\'t work then just drop us an email at issues@thenuel.com and we\'ll get back to you ASAP.</p>\r\n <a class="button minimal primary small" href="https://thenuel.com/">Go to thenuel.com</a>\r\n </div>\r\n</body>\r\n</html>',
headers:
{ 'content-length': '765',
'content-type': 'text/html',
'content-encoding': 'gzip',
'last-modified': 'Wed, 28 Oct 2015 23:22:44 GMT',
'accept-ranges': 'bytes',
etag: '"1c315c8cd711d11:0"',
vary: 'Accept-Encoding',
server: 'Microsoft-IIS/8.0',
'set-cookie': [ 'ARRAffinity=e310baf6f2079f1b7c40c521ea7e13fd41184f9683f30fea9f5312b081e077ba;Path=/;Domain=login.thenuel.com' ],
date: 'Wed, 30 Mar 2016 13:48:57 GMT' },
statusCode: 200,
status: 200,
ok: true }
I would have hoped that the response would set a cookie showing that I was logged in.
The verification tokens change on each request. The correct verification token is being sent in the form data and the correct one is being sent in the cookie string.
I know this is super complicated, but I'm completely stumped.
The website is login.thenuel.com.
Any help is appreciated, but I think this might just be a keep on trying sort of thing! And if there's any other info I need to provide just ask - and it might be easier to just try and do it yourself and see if you can do it!
Thanks

Related

recreating cURL request with -I, -H flags in nodeJS

On the command line, I can do a request like: curl -I -H "Fastly-Debug: 1"
and it will return a lot of helpful information from the CDN serving that URL, in this case, Fastly:
cache-control: public, max-age=0, must-revalidate
last-modified: Tue, 20 Apr 2021 21:17:46 GMT
etag: "4c5cb3eb0ddb001584dad329b8727a9a"
content-type: text/html
server: AmazonS3
surrogate-key: /v3.6/tutorial/nav/alerts-and-monitoring/
accept-ranges: bytes
date: Fri, 30 Apr 2021 20:50:15 GMT
via: 1.1 varnish
age: 0
fastly-debug-path: (D cache-lga21923-LGA 1619815815) (F cache-lga21940-LGA 1619815815)
fastly-debug-ttl: (M cache-lga21923-LGA - - 0)
fastly-debug-digest: 04c3e527819b6a877de6577f7461e132b97665100a63ca8f667d87d049092233
x-served-by: cache-lga21923-LGA
x-cache: MISS
x-cache-hits: 0
x-timer: S1619815815.944515,VS0,VE136
vary: Accept-Encoding
content-length: 65489
How can I do this in node?
This is my attempt:
const headers = {
  'Fastly-Key': environment.getFastlyToken(),
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Fastly-Debug': 1
};

async retrieveSurrogateKey(url) {
  console.log("this is the url: ", url)
  try {
    request({
      method: `GET`,
      url: url,
      headers: headers,
    }, function(err, response, body) {
      if (err){
        console.trace(err)
      }
      console.log(request.headers)
    })
  } catch (error) {
    console.log("error in retrieval: ", error)
  }
}
Is there a way for me to pass in the -I and -H flags?
The -H (header) flag allows you to specify a custom header in cURL. Your code already does this - great! All that's left is emulating the -I (head) flag.
From the cURL man page for the -I option:
Fetch the headers only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document.
To use the HEAD method, you need to specify it instead of GET:
method: `HEAD`
And finally, the headers returned by the server in the response can be obtained from response.headers.
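Putting that together, a minimal sketch that uses only Node's built-in http/https modules (instead of the request package from the question) might look like this. The function names buildHeadOptions and fetchHeaders are hypothetical, and the URL in the usage comment is a placeholder:

```javascript
const https = require('https');
const http = require('http');

// Build request options roughly equivalent to: curl -I -H "Name: value" <url>
function buildHeadOptions(targetUrl, extraHeaders) {
  const parsed = new URL(targetUrl);
  return {
    method: 'HEAD', // -I: ask for the headers only
    protocol: parsed.protocol,
    hostname: parsed.hostname,
    port: parsed.port || undefined, // empty string means the default port
    path: parsed.pathname + parsed.search,
    headers: Object.assign({}, extraHeaders), // -H: custom headers
  };
}

// Send the HEAD request and resolve with the response headers
function fetchHeaders(targetUrl, extraHeaders) {
  const options = buildHeadOptions(targetUrl, extraHeaders);
  const client = options.protocol === 'https:' ? https : http;
  return new Promise(function (resolve, reject) {
    const req = client.request(options, function (res) {
      res.resume(); // a HEAD response carries no body; drain to free the socket
      resolve(res.headers);
    });
    req.on('error', reject);
    req.end();
  });
}

// Usage (placeholder URL):
// fetchHeaders('https://example.com/', { 'Fastly-Debug': '1' }).then(console.log);
```

Header names in the resolved object are lower-cased by Node, so a debug header would show up under a key such as 'fastly-debug-path'.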

Python requests text only returning ï»¿ï»¿ instead of HTML

I'm trying to scrape the link to a file to download later from a website.
My code:
import requests
outage_page = 'https://www.oasis.oati.com/cgi-bin/webplus.dll?script=/woa/woa-planned-outages-report.html&Provider=MISO'
s = requests.Session()
req = s.get(outage_page, stream=True, verify='my cert path is here')
print(req, '\n', req.headers, '\n', req.raw, '\n', req.encoding, '\n', req.content, '\n', req.text)
This is the output I get:
{'Content-Type': 'text/html', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'Server': 'Microsoft-IIS/7.5', 'X-Powered-By': 'ASP.NET', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'Date': 'Mon, 26 Aug 2019 15:48:39 GMT', 'Content-Length': '136'}
ISO-8859-1
b'\xef\xbb\xbf\xef\xbb\xbf\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n \r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n'

Process finished with exit code 0
I expected req.text to return the HTML I could scrape, but it only returns ï»¿ï»¿ followed by blank lines. The other print statements are just for reference here. What am I doing wrong?
I'm going to go ahead and post my solution. I converted my certificate file from .cer to .pem, set the certificate on the session instead of passing it to get, and added headers to the request. I also changed verify to False, because verify controls validation of the server-side certificate, not the client-side one.
# create the connection
s = requests.Session()
s.cert = 'path/to/cert.pem'
head = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'
}
req = s.get(outage_page, headers=head, verify=False)

Python POST request to retrieve base64 encode File

I'm trying to send a POST request using Python to retrieve a specific file. Since the URL is behind a server with authorized access, there's no use posting it here.
However, the form data contains a lengthy field called base64, and I can't figure out whether it's a form data value or a base64 encoding of the POST request.
Here are the browser parameters:
General:
Request URL: http://exampleapi.com/api/Document/Export
Request Method: POST
Status Code: 200 OK
Remote Address: XX.XXX.XXX.XX:XX
Referrer Policy: no-referrer-when-downgrade
Response Headers:
Access-Control-Allow-Origin: http://example.com
Cache-Control: no-cache
Content-Disposition: attachment; filename=location-downloads.xlsx
Content-Length: 7148
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Date: Tue, 23 Jul 2019 21:00:18 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Request Headers :
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 10162
Content-Type: application/x-www-form-urlencoded
Cookie: abcConnection=!UA7tkC3iZCmVNGRUyRpDWARVBWk/lY6SZvgxLlaygsQKk+vuwA1NxvhwE9ph4i+3NZlKeepIfuHhUvyQjl68fhhrT9ueqMx/3mBKUDcT
DNT: 1
Host: exampleapi.com
Origin: http://example.com
Referer: http://example.com/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36
Form Data:
fileName: location-downloads.xlsx
contentType: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
base64: UEsDBAoAAAAAAAh4904AAAAAAAAAAAAAAAAJAAAAZG9jUHJvcHMvUEsDBAoAAAAIAAh490(shortened for simplicity)
Here is what I tried
import json

import requests
import urllib3
from bs4 import BeautifulSoup

url = 'http://example.com'
urllib3.disable_warnings()
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/5.0",
}
with requests.session() as s:
    r = s.get(url, headers={"User-Agent": "Mozilla/5.0"}, verify=False)
    data = r.content
    soup = BeautifulSoup(data, 'html.parser')
    form_data = {
        "fileName": "location-downloads.xlsx",
        "contentType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    }
    r2 = s.post('http://exampleapi.com/api/Document/Export', data=json.dumps(form_data, ensure_ascii=True).encode('utf-8'), headers=headers, verify=False)
    print(r2.status_code)
Any idea how I should proceed? The POST here also returns status code 500.
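One way to check what the base64 field holds is to decode the start of it. The sketch below (in Node.js, purely as an illustration) decodes the first characters of the value shown above and compares them with the ZIP signature, since .xlsx files are ZIP containers. It then builds a URL-encoded body with URLSearchParams to show what an application/x-www-form-urlencoded payload looks like, as opposed to a JSON string. The variable names are made up, and the base64 value is only the truncated prefix from the question:

```javascript
// Truncated prefix of the base64 form field from the question
var base64Prefix = 'UEsDBAoAAAAAAAh4904A';

// Decode it and compare the first four bytes with the ZIP local file
// header signature ("PK" followed by 0x03 0x04)
var firstBytes = Buffer.from(base64Prefix, 'base64').slice(0, 4);
var looksLikeZip = firstBytes.equals(Buffer.from([0x50, 0x4b, 0x03, 0x04]));
console.log('looks like a ZIP/xlsx payload:', looksLikeZip);

// A form-urlencoded body is key=value pairs joined by '&', not JSON
var formBody = new URLSearchParams({
  fileName: 'location-downloads.xlsx',
  contentType: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  base64: base64Prefix // placeholder: a real request would carry the full value
}).toString();
console.log(formBody);
```

If the signature check succeeds, the base64 field is most likely the file content itself, sent as an ordinary form value alongside fileName and contentType.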

Getting the API Key from ServiceStack request

I have a simple GET Customers API that returns a list of customers fine.
I'm now setting up service-to-service authentication. If I mark the service [Authenticate] and try to use ApiKeyAuthProvider, req.GetApiKey() returns null and I get this error:
Microsoft.AspNetCore.Hosting.Internal.WebHost:Information: Request starting HTTP/1.1 POST https://localhost:44347/api/customers application/json 0
Microsoft.AspNetCore.Hosting.Internal.WebHost:2019-07-01 16:50:34,004 [16] INFO Microsoft.AspNetCore.Hosting.Internal.WebHost - Request starting HTTP/1.1 POST https://localhost:44347/api/customers application/json 0
The thread 0x42cc has exited with code 0 (0x0).
The thread 0x302c has exited with code 0 (0x0).
ServiceStack.ServiceStackHost:2019-07-01 17:01:14,601 [16] ERROR ServiceStack.ServiceStackHost - ServiceBase<TRequest>::Service Exception
System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at ServiceStack.Host.HttpRequestAuthentication.GetBasicAuth(IRequest httpReq) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\Host\HttpRequestAuthentication.cs:line 45
at ServiceStack.Host.HttpRequestAuthentication.GetBasicAuthUserAndPassword(IRequest httpReq) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\Host\HttpRequestAuthentication.cs:line 50
at ServiceStack.Auth.ApiKeyAuthProvider.PreAuthenticate(IRequest req, IResponse res) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\Auth\ApiKeyAuthProvider.cs:line 232
at ServiceStack.AuthenticateAttribute.PreAuthenticate(IRequest req, IEnumerable`1 authProviders) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\AuthenticateAttribute.cs:line 96
at ServiceStack.AuthenticateAttribute.ExecuteAsync(IRequest req, IResponse res, Object requestDto) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\AuthenticateAttribute.cs:line 74
at ServiceStack.Host.ServiceRunner`1.ExecuteAsync(IRequest req, Object instance, TRequest requestDto) in C:\BuildAgent\work\3481147c480f4a2f\src\ServiceStack\Host\ServiceRunner.cs:line 127
Microsoft.AspNetCore.Hosting.Internal.WebHost:Information: Request finished in 640574.8754ms 400 application/json; charset=utf-8
Microsoft.AspNetCore.Hosting.Internal.WebHost:2019-07-01 17:01:14,607 [16] INFO Microsoft.AspNetCore.Hosting.Internal.WebHost - Request finished in 640574.8754ms 400 application/json; charset=utf-8
Clearly I have missed something obvious...any pointers appreciated.
// Register ORMLite connection
container.Register<IDbConnectionFactory>(dbFactory);
//Tell ServiceStack you want to persist User Auth Info in SQL Server
container.Register<IAuthRepository>(c => new OrmLiteAuthRepository(dbFactory));
// See https://docs.servicestack.net/api-key-authprovider
Plugins.Add(new AuthFeature(() => new AuthUserSession(),
new IAuthProvider[] {
new ApiKeyAuthProvider(AppSettings) {
SessionCacheDuration = TimeSpan.FromMinutes(10),
AllowInHttpParams = true, // Whether to allow API Keys in 'apikey' QueryString or FormData (e.g. `?apikey={APIKEY}`)
RequireSecureConnection = true,
},
}
) {
IncludeRegistrationService = true,
});
GlobalRequestFilters.Add((req, res, dto) =>
{
LastApiKey = req.GetApiKey();
});
Request
POST https://localhost:44347/api/customers HTTP/1.1
Host: localhost:44347
Connection: keep-alive
Content-Length: 2
Accept: application/json
Origin: https://localhost:44347
Authorization: yDOr26HsxyhpuRB3qbG07qfCmDhqutnA-yDOr26HsxyhpuRB3qbG07qfCmDhqutnA-yDOr26HsxyhpuRB3qbG07qfCmDhqutnA
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Content-Type: application/json
Referer: https://localhost:44347/swagger-ui/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en;q=0.9,en-US;q=0.8
{}
Response
HTTP/1.1 400 ArgumentOutOfRangeException
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8
Vary: Accept,Origin
Server: Microsoft-IIS/10.0
X-Powered-By: ServiceStack/5.50 NetCore/Windows
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, PATCH, OPTIONS
Access-Control-Allow-Headers: Content-Type
X-Startup-Errors: 1
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Content-Disposition
X-SourceFiles: =?UTF-8?B?QzpcUmVwb3NcTUJXZWJccnZhcGlcUnZXZWJcUnZBcGlcYXBpXGN1c3RvbWVycw==?=
X-Powered-By: ASP.NET
Date: Wed, 03 Jul 2019 08:07:40 GMT
13e
{"responseStatus":{"errorCode":"ArgumentOutOfRangeException","message":"Length cannot be less than zero.\r\nParameter name: length","errors":[{"errorCode":"ArgumentOutOfRangeException","fieldName":"length","message":"Length cannot be less than zero.\r\n"}]},"responseCreatedUtcDateTime":"2019-07-03T08:07:40.7955827Z"}
0
Your client is sending an invalid Authorization header; it needs to use the Authorization Bearer Token format:
Authorization: Bearer {Token}
If you're sending an Authenticated API Key or JWT Request via Open API, it needs to have the Bearer prefix, as per the Open API docs.
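As an illustration only (a client-side sketch in JavaScript, not ServiceStack code), the header value could be built like this. The helper name bearerAuthHeader and the sample key are hypothetical:

```javascript
// Hypothetical helper: format an API key or JWT as a Bearer Authorization header
function bearerAuthHeader(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('token must be a non-empty string');
  }
  // The scheme name and the token are separated by a single space
  return { Authorization: 'Bearer ' + token.trim() };
}

console.log(bearerAuthHeader('my-api-key'));
```

The request in the question sent the raw key with no scheme, which is the part this prefix is meant to fix.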
OK, I had manually created a User and APIKey in the underlying tables and had used a UserAuthId of 'SomeAuthId', i.e. one containing letters, whereas the ORM repository code expects these to be integers. It's cool that I can see the code on GitHub and debug this myself. Thanks for the comment, as it got me thinking and looking into my Auth setup.

Access to a web page via a robot

I need to occasionally access an HTML page to update a database. This page is easily accessible via a web browser, but when I try to access it from a Node.js application it doesn't work (the website detects that the request is made by a robot). However:
The robot request contains the same headers (including the user-agent) as the web browser request.
The robot request doesn't contain a referer header or a cookie header, but neither does the browser request.
The IP of the robot is the same as the IP that I use to browse the website.
As far as I can see, the robot request and the browser request are strictly identical. Nevertheless, they are processed differently.
I'm running out of ideas... Maybe the request contains metadata like "this request was sent by node.js", but that would be really weird.
EDIT, here is a code sample :
const https = require ('https');

// callback (error, responseContent)
function getPage (callback){
    let options = {
        protocol : 'https:',
        hostname : 'xxx.yyy.fr',
        port : 443,
        path : '/abc/def',
        agent : false,
        headers : {
            'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding' : 'gzip, deflate, br',
            'Accept-Language' : 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
            'Cache-Control' : 'no-cache',
            'Connection' : 'keep-alive',
            'DNT' : '1',
            'Host' : 'ooshop.carrefour.fr',
            'Pragma' : 'no-cache',
            'Upgrade-Insecure-Requests' : '1',
            'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'
        }
    };
    https.get (options, function (res){
        if (res.statusCode !== 200){
            res.resume ();
            callback ('Error : res code != 200, res code = ' + res.statusCode);
            return;
        }
        res.setEncoding ('utf-8');
        let content = '';
        res.on ('data', chunk => content += chunk);
        res.on ('end', () => callback (null, content));
    }).on ('error', e => callback (e));
}
EDIT : here is a comparison of the requests/responses :
Mozilla Firefox
request headers :
GET /3274080001005/eau-de-source-cristaline HTTP/1.1
Host: ooshop.carrefour.fr
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
response headers :
HTTP/2.0 200 OK
date: Wed, 11 Jul 2018 21:25:25 GMT
server: Unknown
content-type: text/html; charset=UTF-8
age: 0
x-varnish-cache: MISS
accept-ranges: bytes
set-cookie: visid_incap_1213048=G8a0mWzmQYi0GKuT2Ht7YeQ9QVsAAAAAQkIPAAAAAADvVZnsZHK18dQQxHakBprg; expires=Thu, 11 Jul 2019 11:17:56 GMT; path=/; Domain=.carrefour.fr
incap_ses_466_1213048=/2NKHS4HXU0T7FpkwpJ3BsV1RlsAAAAAAY3wbUkXacAceu2NkgUrhw==; path=/; Domain=.carrefour.fr
x-iinfo: 7-11020186-11020187 NNNN CT(1 2 0) RT(1531344324722 0) q(0 0 0 0) r(4 4) U12
x-cdn: Incapsula
content-encoding: gzip
X-Firefox-Spdy: h2
response content : expected HTML page
Node.js robot
request headers :
Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding : gzip, deflate, br
Accept-Language : fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Cache-Control : no-cache
Connection : keep-alive
DNT : 1
Host : ooshop.carrefour.fr
Pragma : no-cache
Upgrade-Insecure-Requests : 1
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
response headers :
content-type : text/html
connection : close, close
cache-control : no-cache
content-length : 210
x-iinfo : 1-17862634-0 0NNN RT(1531344295049 65) q(0 -1 -1 0) r(0 -1) B10(4,314,0) U19
set-cookie : incap_ses_466_1213048=j34jMBWkPFYT7FpkwpJ3Bqd1RlsAAAAAVBfoZBShAvoun/M8UFxPPA==; path=/; Domain=.carrefour.fr
response content :
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>
