Downloading files through puppeteer, need to maintain cookies/state - node.js

I'm currently downloading files using the request package using the following code:
request
.get({
url,
headers: {
Authorization: "base64"
}
})
.pipe(fs.createWriteStream('test.xlsx'))
This works fine when the authentication is a simple username/password or when there is none at all, but once two-factor authentication comes in, it becomes a real hassle, since this method doesn't keep track of cookies or login state (or however that is tracked).
So how would I get the buffer/data during puppeteer's run time and then pipe it into another file stream? (Note that I will need to do this repeatedly for several files.)

I think you can construct the cookies header from puppeteer like this:
const cookies = await page.cookies();
let cookie_str = "";
for (let i = 0; i < cookies.length; i += 1) {
  const a = cookies[i];
  cookie_str += a.name + "=" + a.value + ";";
}
and then use request with a cookie header:
request.get({
url: download_link,
headers: {
"cookie": cookie_str,
}
}).pipe(fs.createWriteStream("ofname"))
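Since the question mentions several files, here is a rough sketch of the same idea wrapped in a loop. It assumes Node 18+ (global fetch, Readable.fromWeb) instead of the request package, and the names buildCookieHeader, downloadAll and the links array are made up for illustration. Note that page.cookies() returns cookies for the page's current URL, so it may need the download URL passed in if the files live on another domain.

```javascript
const fs = require("fs");
const { Readable } = require("stream");
const { pipeline } = require("stream/promises");

// Turn Puppeteer's cookie objects ({ name, value, ... }) into a Cookie header value.
function buildCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// Download each link in turn, reusing the cookies of the logged-in page.
// `links` is a hypothetical array of { url, file } pairs.
async function downloadAll(page, links) {
  const cookie = buildCookieHeader(await page.cookies());
  for (const { url, file } of links) {
    const res = await fetch(url, { headers: { cookie } });
    if (!res.ok) throw new Error(`Download failed (${res.status}): ${url}`);
    // Stream the response body straight to disk.
    await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(file));
  }
}
```

Usage would be something like await downloadAll(page, [{ url: download_link, file: "test.xlsx" }]) after the login flow has finished in puppeteer.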

Related

POST request successful with Postman - unsuccessful with fetch-api

I have been bashing my head against the wall for the last 2 days with the following problem.
This is the scenario: when I make a GET request by browsing to a particular website, it sends a cookie called PHPSESSID="xyz". It then prompts the user to enter a password and makes a POST request to the same URL, sending that cookie along with a hidden form element for verification. On success, it returns a PDF.
I can successfully replicate this in Postman.
I make a GET request, which sets the cookie. I fill the password into the form-data request body, manually add the secret string that the form includes for verification, and send. I get the PDF, so far so good.
However, I would like to automate this process so that I don't have to painstakingly extract the value of the hidden form field by hand. I want to use Node.js to make these requests, so I wrote the following code:
// making the get request to the URL above
// extract the cookie PHPSESSION value
const sessionString = String(response.headers.get('set-cookie')).substring(10,36)
// parse the body
const htmlBody = await response.text()
let doc = new DOMParser().parseFromString(htmlBody)
// extract the verification token from the form
const formToken = await doc.getElementById('verification__token').getAttribute('value')
let formData = new FormData();
formData.append('verification[char_1]',0)
formData.append('verification[char_2]',6)
formData.append('verification[char_3]',4)
formData.append('verification[char_4]',5)
formData.append('verification[char_5]',8)
formData.append('verification[char_6]',1)
formData.append('verification[char_7]',7)
formData.append('verification[char_8]',6)
formData.append('verification[_token]',formToken)
const obj = {
headers:{
"Cookie" : `PHPSESSID=${sessionString};`,
"Content-Type": "application/x-www-form-urlencoded",
"User-Agent": "PostmanRuntime/7.29.2",
"Accept-Encoding": "gzip, deflate, br",
"credentials": "include"
},
method: "POST",
body: formData
}
const postResponse = await fetch("https://url...",obj)
const r = await postResponse.text()
Unfortunately, the POST request fails in Node.js: the website simply redirects me back to the form where I have to type in the password.
I suspect it has something to do with the headers or the cookie, but I simply don't know.
Does anyone spot an obvious mistake?
Thank you
Solved... after sacrificing the entire weekend to this lovely task.
If anyone comes across a similar problem, here is the solution, or at least what helped me.
https://reqbin.com/curl
https://curlconverter.com
So basically make your request work with curl and then port it.
In my case that looked like this:
const x = await fetch('https://yourURL', {
method: 'POST',
headers: {
'Cookie': 'PHPSESSID=lfjdd2uba1bmecr064rt7chvu3; Path=/; Secure; HttpOnly;',
'Content-Type': 'application/x-www-form-urlencoded'
},
body: 'verification[_token]=5d5e4d8783daf952d5.UZ661yMyOUtJSQeG1Td7cUtxWqnI2Oaot-xMQevly4o.acH9hXR9SRkwGm30kE9WIggDNpqdl6Ln2rQnOIG9pcEp1tOiYnNLJggZcA&verification[char_1]=0&verification[char_2]=6&verification[char_3]=4&verification[char_4]=5&verification[char_5]=8&verification[char_6]=1&verification[char_7]=7&verification[char_8]=6'
});
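For anyone who would rather not hand-write the encoded body string, a sketch along these lines may help. The function names and arguments are placeholders, not part of the original code. One likely reason the first attempt failed is that a FormData body is sent as multipart/form-data, which contradicts the explicit application/x-www-form-urlencoded header; a URLSearchParams body is sent URL-encoded instead.

```javascript
// Build an application/x-www-form-urlencoded body for the verification form.
// `token` and `digits` stand in for the values scraped from the GET response.
function buildVerificationBody(token, digits) {
  const params = new URLSearchParams();
  digits.forEach((digit, i) => {
    params.append(`verification[char_${i + 1}]`, String(digit));
  });
  params.append("verification[_token]", token);
  return params;
}

// Hypothetical wrapper around the POST request from the answer above.
async function submitVerification(url, sessionId, token, digits) {
  return fetch(url, {
    method: "POST",
    headers: { Cookie: `PHPSESSID=${sessionId}` },
    // With a URLSearchParams body, fetch sets the Content-Type header to
    // application/x-www-form-urlencoded on its own.
    body: buildVerificationBody(token, digits),
  });
}
```

The brackets in the field names are percent-encoded automatically, so the token can be passed exactly as it appears in the hidden form field.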

How to add custom headers in getDocument() request of pdfjs

I am trying to add custom headers to the pdfjs getDocument request.
I added them based on a suggestion on GitHub.
The header shows up while debugging, but the request still does not work and I am not sure why.
Below is my JS code:
var parameters = {
  url: this.url,
  httpHeaders: { Authorization: `password` },
  withCredentials: true,
};
var loadingTask = pdfjsLib.getDocument(parameters);
This is my chrome network request
I believe the problem is that you've defined the Authorization header incorrectly. Instead of passing just the password, you must include the authorization scheme you are using. For instance, if you are using Basic authentication, the Authorization header should be:
{
Authorization: 'BASIC <BASE64 ENCODED OF USERNAME:PASSWORD>'
}
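As a minimal sketch of how that value could be produced in Node.js (the helper name basicAuthHeader, the sample credentials and the URL are made up for illustration; in a browser, btoa would play the role of Buffer):

```javascript
// Hypothetical helper: builds the value of a Basic Authorization header
// from a username and password.
function basicAuthHeader(username, password) {
  const encoded = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${encoded}`;
}

// Passed to PDF.js the same way as in the question.
const parameters = {
  url: "https://example.com/file.pdf", // placeholder URL
  httpHeaders: { Authorization: basicAuthHeader("user", "pass") },
  withCredentials: true,
};

console.log(parameters.httpHeaders.Authorization); // Basic dXNlcjpwYXNz
```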
If you are already setting it correctly, check whether your version of PDF.js includes the patch that fixed the problem. Open pdf.worker.js, find the NetworkManager object, and verify that it defines the httpHeaders and withCredentials properties. Something like this:
function NetworkManager(url, args) {
this.url = url;
args = args || {};
this.isHttp = /^https?:/i.test(url);
this.httpHeaders = (this.isHttp && args.httpHeaders) || {};
this.withCredentials = args.withCredentials || false;
...
}

Amazon MWS SubmitFeed Content-MD5 HTTP header did not match the Content-MD5 calculated by Amazon

I know this question is not new, but all the solutions I have found are in PHP or address a different issue.
I am using the MWS Feeds API to submit a flat file for Price and Quantity Updates and always get the following error:
the Content-MD5 HTTP header you passed for your feed did not match the
Content-MD5 we calculated for your feed
I would like to ask 3 questions here:
1) The ContentMD5Value parameter is optional according to the docs, but if I do not pass it, the API says that I must enter ContentMD5Value.
2) According to the docs, Amazon computes the Content-MD5 of the feed content we submit and compares it with the ContentMD5Value we send. If both match, everything is OK; otherwise it throws an error. But even if I do not send the file at all, I get the same MD5 mismatch error. How is that possible? Which file are they calculating the MD5 for, given that I haven't sent a file in FeedContent?
3) If I send the Content-MD5 in a header as well as in a parameter, and send the feed content in the body, I still get the error.
Note: I am sending the Content-MD5 in a header as well as in the form parameters using the request module, calculating the signature with it, and then passing the feed content in the body.
I am using JavaScript (Meteor), and I calculate the MD5 using the crypto module.
At first I thought my MD5 was wrong, but then I tried an online website that computes the MD5 of a file. The values for my file are:
MD5 value: d90e9cfde58aeba7ea7385b6d77a1f1e
Base64Encodevalue: ZDkwZTljZmRlNThhZWJhN2VhNzM4NWI2ZDc3YTFmMWU=
I downloaded the flat file template for Price and Quantity Updates from:
https://sellercentral.amazon.in/gp/help/13461?ie=UTF8&Version=1&entries=0&
I also calculated the signature with ContentMD5Value included.
FeedType:'_POST_FLAT_FILE_PRICEANDQUANTITYONLY_UPDATE_DATA_'
Following the documentation, I passed the Content-MD5 header in the headers and also sent it as a parameter.
Amazon doc says:
Previously, Amazon MWS accepted the MD5 hash as a Content-MD5 header
instead of a parameter. Passing it as a parameter ensures that the MD5
value is part of the method signature, which prevents anyone on the
network from tampering with the feed content.
Amazon MWS will still accept a Content-MD5 header whether or not a
ContentMD5Value parameter is included. If both a header and parameter
are used, and they do not match, you will receive an
InvalidParameterValue error.
I am using the request module for http requests.
I am passing all the required keys, seller id, etc. in form of request module and passing the FeedContent in body.
I tried sending the file as follows:
Method for submitFeed is:-
submitFeed : function(){
console.log("submitFeedAPI running..");
app = mwsReport({auth: {sellerId:'A4TUFSCXD64V3', accessKeyId:'AKIAJBU3FTBCJUIZWF', secretKey:'Eug7ZbaLljtrnGKGFT/DTH23HJ' }, marketplace: 'IN'});
app.submitFeedsAPI({FeedType:'_POST_FLAT_FILE_PRICEANDQUANTITYONLY_UPDATE_DATA_'},Meteor.bindEnvironment(function(err,response){
if(err){
console.log("error in submit feed...")
console.log(err)
}
else{
console.log("success submit feed....")
console.log(response);
}
}))
}
Method that call Amazon submitFeedAPI is:-
var submitFeedsAPI = function(options, callback){
console.log("submitFeedsAPI running...");
var fileReadStream = fs.createReadStream('/home/parveen/Downloads/test/testting.txt');
var contentMD5Value = crypto.createHash('md5').update(file).digest('base64');
var reqForm = {query: {"Action": "SubmitFeed", "MarketplaceId": mpList[mpCur].id, "FeedType":options.FeedType,"PurgeAndReplace":false,"ContentMD5Value":contentMD5Value}};
mwsReqProcessor(reqForm, 'submitFeedsAPI', "submitFeedsAPIResponse", "submitFeedsAPIResult", "mwsprod-0000",false,file, callback);
}
I also tried:
var fileReadStream = fs.createReadStream('/home/parveen/Downloads/test/testting.txt');
var base64Contents = fileReadStream.toString('base64');
var contentMD5Value = crypto.createHash('md5').update(base64Contents).digest('base64');
mwsReqProcessor function is as below:-
mwsReqProcessor = function mwsReqProcessor(reqForm, name, responseKey, resultKey, errorCode,reportFlag,file, callback) {
reqOpt = {
url: mwsReqUrl,
method: 'POST',
timeout: 40000,
body:{FeedContent: fs.readFileSync('/home/parveen/feedContentFile/Flat.File.PriceInventory.in.txt')},
json:true,
form: null,
headers: {
// 'Transfer-Encoding': 'chunked',
//'Content-Type': 'text/xml',
// 'Content-MD5':'ZDkwZTljZmRlNThhZWJhN2VhNzM4NWI2ZDc3YTFmMWU=',
// 'Content-Type': 'text/xml; charset=iso-8859-1'
'Content-Type':'text/tab-separated-values;charset=UTF-8'
},
}
reqOpt.form = mwsReqQryGen(reqForm);
var r = request(reqOpt, function (err, res, body){
console.log(err)
console.log(res)
})
// var form = r.form();
//form.append('FeedContent',fs.createReadStream('/home/parveen/feedContent//File/Flat.File.PriceInventory.in.txt'))
}
Method for mwsReqQryGen generation:-
mwsReqQryGen = function mwsReqQryGen(options) {
var method = (options && options.method) ? ('' + options.method) : 'POST',
host = (options && options.host) ? ('' + options.host) : mwsReqHost,
path = (options && options.path) ? ('' + options.path) : mwsReqPath,
query = (options && options.query) ? options.query : null,
returnData = {
"AWSAccessKeyId": authInfo.accessKeyId,
"SellerId": authInfo.sellerId,
"SignatureMethod": "HmacSHA256",
"SignatureVersion": "2",
"Timestamp": new Date().toISOString(),
"Version":"2009-01-01",
},
key;
if(query && typeof query === "object")
for(key in query)
if(query.hasOwnProperty(key)) returnData[key] = ('' + query[key]);
if(authInfo.secretKey && method && host && path) {
// Sort query parameters
var keys = [],
qry = {};
for(key in returnData)
if(returnData.hasOwnProperty(key)) keys.push(key);
keys = keys.sort();
for(key in keys)
if(keys.hasOwnProperty(key)) qry[keys[key]] = returnData[keys[key]];
var sign = [method, host, path, qs.stringify(qry)].join("\n");
console.log("..................................................")
returnData.Signature = mwsReqSignGen(sign);
}
//console.log(returnData); // for debug
return returnData;
};
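The mwsReqSignGen helper is not shown in the post. For SignatureVersion 2 with HmacSHA256 (the values set in returnData above), the signature is the base64-encoded HMAC-SHA256 of the string to sign, keyed with the secret key. A stand-in could look roughly like this; the name signRequest and the sample inputs are assumptions, not the poster's code:

```javascript
const crypto = require("crypto");

// Hypothetical stand-in for mwsReqSignGen: signs the newline-joined
// "method, host, path, query" string with the account's secret key.
function signRequest(stringToSign, secretKey) {
  return crypto
    .createHmac("sha256", secretKey)
    .update(stringToSign)
    .digest("base64");
}

// Made-up example inputs, only to show the shape of the call.
const exampleStringToSign = ["POST", "mws.amazonservices.in", "/", "Action=SubmitFeed"].join("\n");
const exampleSignature = signRequest(exampleStringToSign, "example-secret");
```

Because ContentMD5Value is one of the signed query parameters, any change to the file contents means both the MD5 and the signature have to be recomputed.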
I also tried with following:-
reqOpt = {
url: mwsReqUrl,
method: 'POST',
timeout: 40000,
json:true,
form: null,
body: {FeedContent: fs.createReadStream('/home/parveen/feedContentFile/Flat.File.PriceInventory.in.txt')},
headers: {
// 'Transfer-Encoding': 'chunked',
//'Content-Type': 'text/xml',
// 'Content-MD5':'ZDkwZTljZmRlNThhZWJhN2VhNzM4NWI2ZDc3YTFmMWU=',
// 'Content-Type': 'text/xml; charset=iso-8859-1'
},
}
I also tried without JSON and directly send the file read stream in the
body, i.e:
reqOpt = {
url: mwsReqUrl,
method: 'POST',
timeout: 40000,
form: null,
body: fs.createReadStream('/home/parveen/feedContentFile/Flat.File.PriceInventory.in.txt'),
headers: {
// 'Transfer-Encoding': 'chunked',
//'Content-Type': 'text/xml',
// 'Content-MD5':'ZDkwZTljZmRlNThhZWJhN2VhNzM4NWI2ZDc3YTFmMWU=',
// 'Content-Type': 'text/xml; charset=iso-8859-1'
},
}
But I get the same error every time:
the Content-MD5 HTTP header you passed for your feed did not match the
Content-MD5 we calculated for your feed
I want to know what I am doing wrong, or what the right way is to call the SubmitFeed API and send the file using the request module.
I also tried the code given by MWS to generate the MD5, but the same error occurred each time.
My .txt file as follows:
sku price quantity
TP-T2-00-M 2
Any help is much appreciated
I finally got the solution, as Ravi suggested. There are a few points I want to clear up for everyone who is facing the same issue:
1) The Amazon MWS documentation does not give proper information or examples, and I suspect it is out of date. The documentation page says the ContentMD5Value parameter is optional, but if you do not pass it, you get an error saying you must pass the Content-MD5 value. So the documentation is wrong: ContentMD5Value is a required attribute.
2) The same documentation says you need to send the file data, whether it is XML or a flat file, under the field key name FeedContent. That is not needed either: you can send the file under any name, you just need to send the file as a stream.
3) You get the same Content-MD5 mismatch error whether you send the file or not, because if no file is found, the Content-MD5 you sent will not match anything. So if you are getting the Content-MD5 mismatch error, check the following:
Check that you are generating the correct MD5 for your file. You can verify it against the Java code given in the documentation.
Don't trust online websites for generating the MD5 hash and base64 encoding.
If your MD5 matches the one generated by that Java code, your MD5 is right and there is nothing to change there.
If your MD5 is correct and you still get the same error:
Amazon MWS SubmitFeed Content-MD5 HTTP header did not match the
Content-MD5 calculated by Amazon
In that case you only need to check your file upload mechanism, because the file you are sending to Amazon is either incorrect or not being sent in the right way.
Check for file upload
To check whether you are sending the file correctly, verify the following:
You need to send the required parameters such as sellerId, marketplaceId, AWSAccessKey, etc. as query params.
You need to send the file in the form-data as multipart. If you are using the request module of Node.js, see the code given by Ravi.
You need to set only this header:
'Content-Type': 'application/x-www-form-urlencoded'
There is no need to send headers such as chunked or tab-separated values. I did not need them, and they only confused me, because different sources recommend different headers.
In the end, when I was able to submit this API call, I did not need any header other than application/x-www-form-urlencoded.
Example:-
reqOpt = {
url: mwsReqUrl,
method: 'POST',
formData: {
my_file: fs.createReadStream('file.txt')
},
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
},
qs: { } // all the parameters that you are using while creating signature.
}
Code for creating the contentMD5 is:-
var fileData= fs.readFileSync('/home/parveen/Downloads/test/feed.txt','utf8');
var contentMD5Value = crypto.createHash('md5').update(fileData).digest('base64');
I was facing the issue because I was using form and form-data simultaneously via the request module, so I moved my form data into qs (query string) and sent the file in form-data as multipart.
This way you can successfully call the SubmitFeed API.
Amazon requires the md5 hash of the file in base64 encoding.
Your code:
var fileReadStream = fs.createReadStream('/path/to/file.txt');
var file = fileReadStream.toString('base64'); //'[object Object]'
var contentMD5Value = crypto.createHash('md5').update(file).digest('base64');
wrongly assumes that a readStream's toString() will produce the file contents, when in fact this method is inherited from Object and produces the string '[object Object]'.
Hashing that string always produces the same base64 MD5, the 'FEGnkJwIfbvnzlmIG534uQ==' that you mentioned, regardless of the file.
If you want to properly read and encode the hash, you can do the following:
var fileContents = fs.readFileSync('/path/to/file.txt'); // produces a byte Buffer
var contentMD5Value = crypto.createHash('md5').update(fileContents).digest('base64'); // properly encoded
which provides results equivalent to the following PHP snippet:
$contentMD5Value = base64_encode(md5_file('/path/to/file.txt', true));
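A related pitfall is base64-encoding the hex digest instead of the raw digest bytes. The 44-character Base64Encodevalue in the question appears to have that shape (it decodes back to the 32-character hex string). The sketch below contrasts the two; the function names are made up for illustration:

```javascript
const crypto = require("crypto");

// Correct for Content-MD5: base64 of the raw 16-byte MD5 digest.
function contentMd5(data) {
  return crypto.createHash("md5").update(data).digest("base64");
}

// Shown only for comparison: base64 of the 32-character hex string,
// which is what many "MD5 then base64" online tools end up producing.
function base64OfHexDigest(data) {
  const hex = crypto.createHash("md5").update(data).digest("hex");
  return Buffer.from(hex, "utf8").toString("base64");
}

console.log(contentMd5("").length);        // 24 characters
console.log(base64OfHexDigest("").length); // 44 characters
```

A quick sanity check, then, is the length: a valid Content-MD5 value is always 24 characters long and ends in "==".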
Sorry for the late reply, but why don't you try sending the file as multipart in the form-data request, and the other query strings in the 'qs' property of the request module?
You can submit the request as follows:-
reqOpt = {
url: mwsReqUrl,
method: 'POST',
formData: {
my_file: fs.createReadStream('file.txt')
},
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
},
qs: {
AWSAccessKeyId: '<your AWSAccessKeyId>',
SellerId: '<your SellerId>',
SignatureMethod: '<your SignatureMethod>',
SignatureVersion: '<your SignatureVersion>',
Timestamp: '<your Timestamp>',
Version: '<your Version>',
Action: 'SubmitFeed',
MarketplaceId: '<your MarketplaceId>',
FeedType: '_POST_FLAT_FILE_PRICEANDQUANTITYONLY_UPDATE_DATA_',
PurgeAndReplace: 'false',
ContentMD5Value: '<your file.txt ContentMD5Value>',
Signature: '<your Signature>'
}
}
request(reqOpt, function(err, res){
})
Probably, I'm too late, but here are key points for C#:
1) Multipart form-data didn't work at all. Finished with the following (simplified):
HttpContent content = new StringContent(xmlStr, Encoding.UTF8, "application/xml");
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.PostAsync(requestUrl, content); // requestUrl is built in 2)
2) About query:
UriBuilder builder = new UriBuilder("https://mws.amazonservices.com/");
NameValueCollection query = HttpUtility.ParseQueryString(builder.Query);
query["AwsAccessKeyId"] = your_key_str;
query["FeedType"] = "_POST_ORDER_FULFILLMENT_DATA_";
... other required params
query["ContentMD5Value"] = Md5Base64(xmlStr);
builder.Query = query.ToString();
string requestUrl = builder.ToString();
3) About Md5base64
public static string Md5Base64(string xmlStr)
{
byte[] plainTextBytes = Encoding.UTF8.GetBytes(xmlStr);
MD5CryptoServiceProvider provider = new MD5CryptoServiceProvider();
byte[] hash = provider.ComputeHash(plainTextBytes);
return Convert.ToBase64String(hash);
}

What is the correct syntax for winjs.xhr data

I tried to use winjs.xhr to POST some data to a URL with no success. I got it working by essentially doing the same thing with XMLHttpRequest. This just doesn't feel right, as winjs.xhr, I thought, wraps XMLHttpRequest anyway. Can anyone explain how I do this in winjs.xhr?
Not working winjs.xhr code
Passing everything in as a URL encoded string
var url = "http://localhost/paramecho.php";
var targetUri = "http://localhost/paramecho.php";
var formParams = "username=foo&password=bar" //prefixing with a '?' makes no difference
//ends up with same response passing an object(below) or string (above)
//var formObj = {username: "foo", password: "bar"}
WinJS.xhr({
type: "post",
url: targetUri,
data: formParams
}).then(function (xhr) {
console.log(xhr.responseText);
});
I end up with my receiving PHP file getting none of the parameters, as though I'd sent no data in the first place.
I tried a few things but the code above is the simplest example. If I was to pass an object into the data parameter it behaves the same way (commented out). I've used a FormData object as well as a plain JSON object.
I changed my app manifest to have the correct network capabilities - and the working example below was done in the same app, so I'm confident it's not capability-related.
Working version using XMLHttpRequest
var username = "foo";
var password = "bar";
var request = new XMLHttpRequest();
try {
request.open("POST", "http://localhost/paramecho.php", false);
request.setRequestHeader('Content-type', "application/x-www-form-urlencoded");
request.send("username=" + encodeURIComponent(username) + "&password=" + encodeURIComponent(password));
console.log(request.responseText);
} catch (e) {
console.log("networkError " + e.message + " " + e.description);
}
And this successfully calls my PHP server-side function, with the parameters I expected.
So, the question is: how do I achieve what I have working in XMLHttpRequest with winjs.xhr? It feels like winjs.xhr is the way this is supposed to work (I'm a newbie at Windows 8 app development, so I'm happy to be corrected).
You're completely right WiredPrairie. Passing in the header is all that is needed - I think I'd assumed that was the default for a post.
Working version:
WinJS.xhr({
type: "post",
url: targetUri,
data: formParams,
headers: {"Content-type": "application/x-www-form-urlencoded"}
}).then(function (xhr) {
console.log(xhr.responseText);
});
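Note that the header only tells the server how to read the body; WinJS.xhr passes data through to XMLHttpRequest.send without encoding it, so a plain object still has to be turned into a string first. A small helper along these lines could do that (toFormUrlEncoded and postForm are made-up names, not part of WinJS):

```javascript
// Hypothetical helper: encodes a plain object as an
// application/x-www-form-urlencoded string.
function toFormUrlEncoded(fields) {
  return Object.keys(fields)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
    })
    .join("&");
}

// Sketch of how it would plug into the working call above.
function postForm(targetUri, fields) {
  return WinJS.xhr({
    type: "post",
    url: targetUri,
    data: toFormUrlEncoded(fields),
    headers: { "Content-type": "application/x-www-form-urlencoded" }
  });
}
```

With this, postForm(targetUri, { username: "foo", password: "bar" }) sends the same body as the formParams string, and values containing & or spaces are escaped correctly.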

Reddit api session cookie format?

I'm trying to use reddit api to login with a bot using node.js and request.js. I've managed to figure out that I need both the modhash and the session cookie that are returned from the /api/login request. I have accessed both with the following code:
request.post({
uri: 'http://www.reddit.com/api/login',
json: true,
headers: agentheader,
qs : logincreds
},function(err,response,body){
if(err){
throw err;
} else {
if(response.statusCode == 200){
mhash = body.json.data.modhash;
session_cookie = body.json.data.cookie;
console.log(body.json.data.cookie);
console.log("login OK, modhash: "+mhash);
console.log("Session cookie: "+session_cookie);
agentheader = {
'user-agent': 'base10bot made by /u/01011110',
'X-Modhash' : mhash,
'Cookie' : 'reddit_session='+session_cookie
};
...
and in that last bit I set the custom headers with the modhash and the cookie in the way that someone on /r/redditdev told me it should be done. The session cookie doesn't work at all without the "reddit_session=" part, but with it everything I do returns a 403 forbidden.
I'm pretty sure the "cookie" is formatted wrong because when I log it, it shows up as a numerical id, a time-stamp, and a hash all comma delimited. Can someone help me figure out the right way to send this cookie header? Everything I find on google is either using python requests or bash.
I guess you are not parsing body as JSON. Try this -
data = JSON.parse(body);
session_cookie = data.json.data.cookie;
