How can I create an authenticated scraper for Amazon product details in Node.js?

I'm creating a script that grabs all the shipped items from Amazon and notifies me.
Authentication is needed to see the products though.
I've already tried sending a POST request with the "request" module, but it returns an error because of the cookies and extra parameters the sign-in form requires.
It would be easy to extract the data with cheerio afterwards if the authentication worked.
Does anyone have any idea on how we can authenticate successfully?
The link from the email is: https://www.amazon.com/ap/signin/185-3199906-8918341?_encoding=UTF8&accountStatusPolicy=P1&openid.assoc_handle=usflex&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.ns.pape=http%3A%2F%2Fspecs.openid.net%2Fextensions%2Fpape%2F1.0&openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.amazon.com%2Fgp%2Fyour-account%2Forder-details%2F185-3199906-8918341%3Fie%3DUTF8%26eoid%3D1%253A1%253Arv%252FYwjiYmnOZY9MYltVnDyf2l6p5pMkMx9deoUeiiw%252FKpPrtZrWqs5l1GGQPVb%2520qaJqHXyCkPEpLZnmDZamKkVDWhtu3dKlW5Gx7Uvxtzs0xlPJ25vduijJrPpHt79P%2520RRZHopOtAyOP4s82VLoeeiDQgq%2520FCP540H%2520UYAV7goZQxB29WObWAVh8VveTwEeWenY3sTx8ZI9%252FBLM2BSqS3IUIURW8mzMnAB9t7wglUiAcoR%252FcUhSIx%25201eNV4MspVAp7fLkeANag72BxgmsjFfRhnsxfji1VhZXLawqFeK9SBnvbUfkNWUC%2520IXWh6VcuoStBG3x%2520ZUkzGHw1ORi4J%2520Hg%253D%253D%26orderID%3D105-6914722-5422613%26ref_%3DTE_simp_on_T1&pageId=webcs-yourorder&showRmrMe=1

You cannot rely on any of the form input values on the sign-in page staying the same, so you must also scrape the login form.
Here is the process:
1. From your server, make a request to the URL in your question.
2. Using Cheerio, parse the DOM and grab all of the form fields from "#ap_signin_form".
3. Add in your data (username and password), then make a POST request to the form action, "https://www.amazon.com/ap/signin" (this should also be scraped rather than hard-coded).
Hopefully that will get you past the login screen. You will need to ensure that all future requests send the cookies set during login.
Note that this kind of thing is clearly against most sites' terms of service, so I would urge caution about doing it often.
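As a rough sketch of the form-field and cookie handling described above (plain Node.js, no network calls): it assumes the login form has already been scraped into an array of { name, value } pairs, for example with Cheerio's serializeArray() on the form element. The field names email and password, the sample hidden field, and the cookie values are assumptions for illustration, not values confirmed from Amazon's page.

```javascript
// Merge the scraped form fields with your credentials into a urlencoded body.
// `fields` is an array of { name, value } pairs taken from the scraped form.
// ASSUMPTION: the credential inputs are named "email" and "password".
function buildLoginBody(fields, email, password) {
  const body = new URLSearchParams();
  for (const { name, value } of fields) {
    // Keep every scraped field (hidden tokens etc.) except the credentials
    if (name !== 'email' && name !== 'password') body.append(name, value);
  }
  body.set('email', email);
  body.set('password', password);
  return body.toString();
}

// Turn the Set-Cookie headers of a response into a Cookie request header,
// keeping only the name=value part of each cookie.
function toCookieHeader(setCookieHeaders) {
  return setCookieHeaders.map((c) => c.split(';')[0].trim()).join('; ');
}
```

The string returned by buildLoginBody would be sent as the POST body with Content-Type application/x-www-form-urlencoded, and the result of toCookieHeader would go in the Cookie header of every later request. A cookie jar from your HTTP client does the second part for you if it has one.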

CakePHP 3 Stripe payment form

I'm hoping someone out there can help me understand how Stripe should work with CakePHP 3.
I have a form with the Stripe payment fields and a couple of fields for my cake app. From the Stripe documentation this seems to be an acceptable way to set it up.
The HTML is fairly standard, but note how every input for sensitive
data—number, CVC, expiration, and postal code—omits the name
attribute. By omitting a name, the user-supplied data in those fields
won't be passed to your server when the form is submitted. Each
element also includes a data-stripe attribute, to be discussed later.
I'm using CakePHP 3 now, which doesn't seem to let me remove attributes from the HTML generated by the form helper; I can only make the 'name' attribute blank. I raised this with Stripe support and they were a little noncommittal. They are generally very good, but in this instance the answer seemed to be 'better safe than sorry.'
My main question is this: does it really matter if I don't use the form helper for the Stripe fields? The main benefit I can find in the Cake docs is that the CSRF component will act on those fields. I am using the CSRF component throughout my app, but since the Stripe fields aren't even sent to the server, the CSRF component is irrelevant to them, isn't it?
Here's an excerpt from the CakePHP manual:
The CsrfComponent works by setting a cookie to the user’s browser.
When forms are created with the Cake\View\Helper\FormHelper, a hidden
field is added containing the CSRF token. During the
Controller.startup event, if the request is a POST, PUT, DELETE, PATCH
request the component will compare the request data & cookie value. If
either is missing or the two values mismatch the component will throw
a Cake\Network\Exception\InvalidCsrfTokenException.
Can I still use the form helper for the few fields that do get submitted to the database, and just add the Stripe fields with plain HTML?
Does that make sense?
Stripe support did suggest having two separate forms, one for the Cake data and one for the Stripe data, but since their docs say you can add the Stripe fields to a form that gets submitted to the server, that seems a bit odd.
I would really appreciate some input on this, as it seems even Stripe themselves aren't sure how to structure a CakePHP payment form!
Yes, raw HTML appears to be the way to go.
Here's what I did:
1. Used the form helper to start and end the form (this means form tampering protection and CSRF will work for your non-Stripe fields).
2. Added the Stripe fields within the Cake form using plain HTML (I haven't tested whether form tampering protection works on those HTML fields. I'll test that later and post back).
3. Used the form helper to unlock the stripeToken field so it could be added to the form without the form tampering protection blackholing the request.
Once I had set all this up, I used echo debug($_POST) in my controller to see what the form was submitting to the server, and the only Stripe field that showed up was stripeToken.
So it appears to me that this is working as it should.
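That result matches how browsers build a form submission: under the HTML form submission rules, a control whose name attribute is missing or empty is left out of the submitted data. The snippet below is only a small model of that rule, written in JavaScript rather than PHP, and the control list and values are made up for illustration.

```javascript
// Model of which form controls end up in a submission: a control is included
// only if it has a non-empty name. This imitates the browser rule; it is not
// browser code and ignores details such as disabled or unchecked controls.
function submittedFields(controls) {
  const out = {};
  for (const control of controls) {
    if (typeof control.name === 'string' && control.name !== '') {
      out[control.name] = control.value;
    }
  }
  return out;
}

// Hypothetical payment form: card inputs without a usable name, plus the
// token and one ordinary application field.
const exampleControls = [
  { value: '4242424242424242' },          // card number, no name attribute
  { name: '', value: '123' },             // CVC, blank name attribute
  { name: 'stripeToken', value: 'tok_example' },
  { name: 'email', value: 'ann@example.com' },
];

console.log(submittedFields(exampleControls));
```

Under this rule a blank name behaves the same as a missing one, so only stripeToken and the named application fields would reach the server.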

How does body-parser know if we sign up or log in?

I'm building my user authentication, and the tutorials don't cover every detail.
The examples always show how you get logged in if you submit data, assuming it's in the database already.
But on many HTML pages you see sign up and log in buttons next to each other. How can I make body-parser call the right function depending on which button is pressed? Both cases involve submitted data, after all.
Thanks a lot!
PS. I'm using EJS view engine.
Body-parser has nothing to do with logging in or signing up.
Its purpose is to parse the request body: it checks the Content-Type request header and uses the appropriate method to parse it. For example, if the content is JSON it will use JSON.parse and assign the result to req.body.
The buttons could be anchors with href set to /login or /signup, so on your server you need login and signup routes to handle each of the actions.
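To make the split concrete, here is a minimal sketch using only Node's built-in http module instead of Express and body-parser. Parsing the body and choosing the action are separate steps: the parser only turns the raw body into an object, and the route decides what happens. The route names /login and /signup, the username field, and the response strings are assumptions for illustration.

```javascript
const http = require('http');

// The parsing step: turn an application/x-www-form-urlencoded body into an
// object. This is the only job a body parser does.
function parseForm(rawBody) {
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// The routing step: the path the form was posted to picks the action.
function handleAuth(pathname, rawBody) {
  const { username } = parseForm(rawBody);
  if (pathname === '/login') return `logging in ${username}`;
  if (pathname === '/signup') return `creating account for ${username}`;
  return null; // unknown route
}

const server = http.createServer((req, res) => {
  let raw = '';
  req.on('data', (chunk) => { raw += chunk; });
  req.on('end', () => {
    const result = req.method === 'POST' ? handleAuth(req.url, raw) : null;
    res.statusCode = result ? 200 : 404;
    res.end(result || 'not found');
  });
});
// server.listen(3000); // uncomment to try it locally
```

In the page itself, each button would then sit in its own form (or link to its own page) whose action points at the matching route.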

DocuSign API - on cancel, get a URL to return to

Using the API, I generate the document and place my tags. The user goes to sign and decides to finish later. DocuSign passes event="cancelled" to my callback, and I make a note in my system that they cancelled. But how can I get them back to the form later? There is no "authentication" with this; I do all the authentication on my end. So I need a URL to send them back to, but the URL the API gives me to load in an iframe expires. So how do I send them back to the document?
I use PHP by the way.
Here are 2 possible causes (there may be others):
A 'Signing ceremony UX URL' has a short time to live (300 seconds or less I believe) so ensure you are making the call to retrieve a URL right when the user intends to perform the signing session.
A 'Signing ceremony UX URL' can only be rendered once. Verify that you are in fact retrieving a new URL each time and not reusing the original URL.
See this recipe from the DocuSign Developer Center. Repeat step 3 to retrieve a new URL to present to your user.
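In code, the idea is to store only the envelope ID when the user cancels and to ask the API for a new URL at the moment they return. The sketch below is JavaScript rather than PHP and makes no API calls itself: requestSigningUrl is a hypothetical function standing in for the call that retrieves a signing URL, and the in-memory Map stands in for your own database.

```javascript
// userId -> envelopeId for signing sessions the user has not finished.
const unfinished = new Map();

// Called from your callback handler with the event passed back to you.
// Any event other than "cancelled" is treated as finished here, which is a
// simplification for the sake of the example.
function recordSigningEvent(userId, envelopeId, event) {
  if (event === 'cancelled') unfinished.set(userId, envelopeId);
  else unfinished.delete(userId);
}

// Called when the user comes back. The URL is never stored or reused; a new
// one is requested every time, right before it is shown.
// `requestSigningUrl` is HYPOTHETICAL: it wraps whatever API call you already
// use to obtain the signing URL for an envelope.
function resumeSigning(userId, requestSigningUrl) {
  const envelopeId = unfinished.get(userId);
  return envelopeId ? requestSigningUrl(envelopeId) : null;
}
```

In real use requestSigningUrl would be asynchronous, so resumeSigning would hand back a promise for the URL.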

How do I submit and process request params with an OAuth request?

I am using the MeanJS stack to develop a web application. The issue I'm having is that my regular signup process has some unique parameters that are not part of a typical OAuth user profile. So, when a user goes to sign up with Facebook, I send them to a new signup form where they fill in the extra parameters and then click "signup with facebook."
The routes are the same as the common MeanJS routes found here:
https://github.com/meanjs/mean/blob/master/app/routes/users.server.routes.js
Specifically these lines:
app.route('/auth/facebook').get(passport.authenticate('facebook', {
  scope: ['email']
}));
app.route('/auth/facebook/callback').get(users.oauthCallback('facebook'));
What I would like to do, is have the extra parameters attached to the request object, so that when the auth process reaches the exports.saveOAuthUserProfile inside of: https://github.com/meanjs/mean/blob/master/app/controllers/users/users.authentication.server.controller.js
this function will be able to access those parameters and save them as part of the user model.
I have tried attaching parameters to the GET request and accessing
req.params.paramId
but this will not work, because you cannot register a param-loaded request URL with the Facebook API (or at least it seems to be that way).
I have also read elsewhere on StackOverflow that you can load the data into the request's state parameter, but that seems really odd to me. Here's the link for that: Facebook OAuth: custom callback_uri parameters
So, any guidance on how to load the extra data into the OAuth request, so that I can access it and save it when I save the user profile, would be great.
Thanks guys.
It's not really clear to me what you want to achieve. What are the "extra" fields, and why don't you derive them from the user's profile directly? About 98% of the FB Login implementations I've seen just use the "Login with Facebook" button and get the user's data during the OAuth process itself, rather than having the user enter it manually.
With passport-facebook, for example, you can:
a) Configure custom scopes
https://github.com/jaredhanson/passport-facebook#extended-permissions
b) Enrich the profile object with the custom scope's data
https://github.com/jaredhanson/passport-facebook#profile-fields
The profile object will then contain the additional requested fields automatically, and you don't need to tweak the request object itself.
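A minimal sketch of what that could look like: the options object follows the shape shown in the passport-facebook README (including profileFields), and the mapping function reads the normalized profile that Passport passes to the verify callback. The environment variable names, the callback URL, and the user fields are placeholders, not taken from the MeanJS code.

```javascript
// Options in the shape passport-facebook's Strategy constructor accepts.
// ASSUMPTION: credentials come from these environment variables.
const facebookOptions = {
  clientID: process.env.FACEBOOK_APP_ID,
  clientSecret: process.env.FACEBOOK_APP_SECRET,
  callbackURL: '/auth/facebook/callback',
  // Ask Facebook to include these fields in the returned profile
  profileFields: ['id', 'displayName', 'email'],
};

// Map the normalized Passport profile onto the fields a user model might
// store. The target field names are hypothetical.
function profileToUser(profile) {
  const emails = profile.emails || [];
  return {
    provider: profile.provider,
    providerId: profile.id,
    displayName: profile.displayName,
    email: emails.length ? emails[0].value : undefined,
  };
}
```

facebookOptions would be passed as the first argument when constructing the strategy, and profileToUser would be called inside the verify callback before saving the user.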

Is it advisable to configure a URL with both post and get?

I have a link on my page that redirects to another page, but the request is sent through the POST method. When the user refreshes the new page, the request is sent through the GET method. The URL is just used to display a page. My question is: is it advisable to use both POST and GET for the same URL, or will it cause security or other problems? If so, please explain how.
No. Use POST if the link is executing an unsafe action (e.g. logout, change password) or it is sending sensitive data you do not want displayed in the URL (URLs are logged by default in the browser history and by many appliances, proxies and web servers). POST can also be used when a large amount of information is to be sent.
Otherwise use GET.
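To illustrate the point about sensitive data in URLs, here is a small sketch that builds the same request both ways without sending anything. The host, path, and field values are made up.

```javascript
const params = new URLSearchParams({ user: 'ann', password: 'secret' });

// GET: the data becomes part of the URL, so it ends up in browser history
// and in any log that records URLs.
const getRequest = {
  method: 'GET',
  url: `https://example.com/login?${params}`,
};

// POST: the URL stays clean and the data travels in the request body.
const postRequest = {
  method: 'POST',
  url: 'https://example.com/login',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: params.toString(),
};

console.log(getRequest.url);  // https://example.com/login?user=ann&password=secret
console.log(postRequest.url); // https://example.com/login
```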
