I know this sounds a lot like other questions here on Stack Overflow, but bear with me: as far as I can tell, it isn't a duplicate.
I have a scraping app (using Puppeteer) that I use to scrape an Amazon public page.
It works great locally: I debugged it with headless: false, watched it run, and it gives me back the expected result.
The same app fails on Heroku. The problem is not with launching or using Puppeteer (I have several indications of that); more likely I'm being identified as a robot.
The error returned is:
waiting for selector `#link_continue input` failed: timeout 30000ms exceeded
Note that this is a generic Puppeteer error: it indicates that the selector I'm waiting for never appears on the page.
I know it should, as it's a selector on the first page I navigate to, and it works locally (as mentioned before). The selector always exists if the page loads.
I had exactly the same error when I tried to run the scraper on my local machine before setting a User-Agent header. At that time I could use headless: false, so I saw with my own eyes that I was being rejected for robot-like operations on their page and redirected to an error page that doesn't contain this selector.
For this reason I suspect that Amazon recognizes me as a robot, but I don't know how to debug it, and it's driving me crazy.
Now, if you'd like to reproduce the problem:
You need to wait for the mentioned selector on this site:
https://sellercentral.amazon.com/hz/fba/profitabilitycalculator/index
and then deploy it to Heroku and try to run it maybe 2-3 times
Two questions:
How can I proceed from here? I'm 99.9% sure it's the same issue I had previously, but I can't verify it. Any suggestions?
Assuming this is actually the problem, can anyone suggest an easy-to-use, easy-to-deploy hosting service that also allows easy VPN configuration? I think Heroku doesn't let you do that unless you have an enterprise account.
Thanks
I would like to point out that Amazon is very good at blocking IPs. It is very likely that they have already blacklisted the IP ranges of cloud services like Heroku, Azure, etc. I have previously seen services like Cloudflare and Akamai blacklist these known IPs.
In this scenario, rotating proxies could help you avoid getting blocked.
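As a rough illustration of the rotating-proxy idea, here is a minimal sketch. The proxy addresses and the helper names (createProxyRotator, buildLaunchOptions) are made up for this example; the only real piece is Chromium's --proxy-server switch, which Puppeteer accepts through the args launch option. Whether this gets past Amazon's detection is not something the sketch can promise.

```javascript
// Hypothetical proxy endpoints: replace with addresses from your own provider.
const PROXIES = [
  'http://proxy-1.example.com:8000',
  'http://proxy-2.example.com:8000',
  'http://proxy-3.example.com:8000',
];

// Returns a function that hands out the proxies in round-robin order.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error('at least one proxy is required');
  }
  let next = 0;
  return function nextProxy() {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length;
    return proxy;
  };
}

// Builds the options object for puppeteer.launch().
// --proxy-server is a Chromium command-line switch.
function buildLaunchOptions(proxy) {
  return {
    headless: true,
    args: [`--proxy-server=${proxy}`],
  };
}

// Usage (assumes the puppeteer package is installed):
// const puppeteer = require('puppeteer');
// const nextProxy = createProxyRotator(PROXIES);
// const browser = await puppeteer.launch(buildLaunchOptions(nextProxy()));
```

Each scraping run would then launch the browser through a different proxy, so repeated requests do not all come from the same Heroku IP.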
I'm using Heroku to host a Node.js server with a variable that stores how many times users of the site have clicked something on it. On each click, the variable is increased by 1. However, Heroku puts the site to sleep after 15 minutes of inactivity, and everything is reset. I tried writing the value to a file from Node.js, but it seems the files are reset as well. Does anyone know a way to keep the data even after Heroku declares the app inactive?
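There is no way around it on Heroku itself: its filesystem is ephemeral, so files written at runtime are discarded whenever the dyno restarts, including after it sleeps. You need external storage, such as a MongoDB instance hosted somewhere else.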
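A minimal sketch of what the counter could look like with external storage, assuming the official MongoDB Node.js driver. The function name recordClick and the counter id 'clicks' are invented for this example; updateOne with $inc and upsert, and findOne, are standard driver calls.

```javascript
// Increments a persistent click counter and returns the new total.
// `collection` is expected to behave like a MongoDB driver Collection
// (updateOne and findOne both return promises).
async function recordClick(collection, counterId = 'clicks') {
  // $inc adds 1 atomically; upsert creates the document on the first click.
  await collection.updateOne(
    { _id: counterId },
    { $inc: { count: 1 } },
    { upsert: true }
  );
  const doc = await collection.findOne({ _id: counterId });
  return doc.count;
}

// Usage (assumes the mongodb package and a MONGODB_URL environment variable):
// const { MongoClient } = require('mongodb');
// const client = await MongoClient.connect(process.env.MONGODB_URL);
// const total = await recordClick(client.db().collection('counters'));
```

Because the count lives in the database rather than in a variable or a local file, it survives the dyno going to sleep and restarting.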
I need to find the best possible way to do the following task.
I have many users (over 500), and each user has a scheduled function that needs to run twice every day. But if a user's phone is off, that function won't run, since its code lives on the client side.
What I want to do now is run the scheduled function in the backend using Node.js, but I don't know how to run it for every user (note: every user has a different schedule). That's why I wrote it on the client side in the first place, but since the phone might be switched off, that approach is unreliable.
What should I do in this scenario? Any leads?
I am randomly getting a login prompt asking me to log in to localhost:3000 while working on my project (in localhost). Sometimes it does not show up the whole day and next morning I get 10 prompts in a row (canceling through pressing escape), and then it starts working properly again.
These are the errors I get in the console when it has been canceled:
I suspect one of the following, but really I have no idea:
Webpack is not configured correctly
There is some resource that requires authentication (but I can't figure out which), I am running everything on localhost
There is something wrong with the node.js server thingy...
Any tips on what might be going on are very welcome.
Edit: My backend is mocked in localhost.
Edit 2: I found this: "Receiving login prompt using integrated windows authentication". It is possible that the cause is some faulty configuration in my IIS, but that is difficult to verify since the prompt only shows up randomly.
Webpack has nothing to do with your API calls.
401 errors mean that the auth credentials were rejected.
In other words, your backend is asking you to refresh the token(s) used for auth in your API calls.
Before asking, I searched a lot, and there are many articles about this. But my question goes a little deeper.
I have an application using Node.js, Express.js, MongoDB, Reddit, PM2 in cluster mode, a Bitcoin and card payment gateway, and an API system.
My problem is that I'm developing this application in real (production) mode, and it's really awful. Sometimes I release small code updates, run "pm2 log", and it shows a syntax error or something else; I try to fix it and release again. During this time, the application, which has many users, is down.
I should also say that some things, such as Bitcoin payments, need real tests: they need requests to and responses from the blockchain. How can I get a test environment where I can test everything exactly as in real mode and then, if everything is fine, deploy to real mode?
I want an environment that is easy to code and test in and easy to deploy from. Can Mocha give me exactly what I need? I'm using PM2 in cluster mode.
Your question is not a proper question, but rather a layer of questions, some opinion-based, some too broad to answer. But let's try breaking it down.
The stated problem is "when I'm developing this application in real mode.... I release updates ... it shows me syntax error... application is down". I read that as: the main problem is that you're developing in a production environment. Let's forget for a while how lousy a practice that is and focus on something constructive.
Let's define rough steps to take.
Live environment
The most pressing problem seems to be that you work on the live app, where crashing it during development means crashing it for your users as well. Let's deal with that.
Immediately change all your access codes, keys, usernames and passwords so that they are stored in an environment file (which is safe to encrypt and back up, but must not be committed to source control), say, environment-prod.env.
Then create a second set of credentials for all the services that you use. For MongoDB, for example, it's easy: just create a local database instance called, say, test_database. For Reddit, create a second app and call it, for example, my-app-test. Some services have an option to create a set of test credentials right there in the app; with others you'll simply have one app for test and one for production.
Create a new environment file, e.g. test-environment.env, with all the same keys (e.g. REDDIT_APPID, REDDIT_SECRET, MONGODB_URL, BLOCKCHAIN_GATEWAY_KEY etc.), but new values.
Now, for one, you have a test environment. Make an alias, e.g. alias dev="cd $HOME/projects/my-reddit-bitcoin-app && set -a && source test-environment.env && set +a" (set -a exports the sourced variables so that child processes such as pm2 and node can see them; alternatively, prefix each line in the file with export). Every day you come to work on the app, type dev, then start pm2 etc. and work safely in the dev environment. Your users will never see your crashes.
Only when you're sure you have a new feature or bugfix completed, switch environments (source environment-prod.env), deploy the new app to the server where it runs, and pm2 restart or whatever you use for these deployments. Switch back to the test environment immediately, before working on the code again.
Read up more on how to separate test/prod environments. Read a bit on git workflows (e.g. branch off of latest master to a feature-branch or bugfix branch, when tested, merge it back in. Then tag it "release-" and deploy to production. Then go automate all that if possible.)
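To make the environment split concrete on the Node side, here is a minimal sketch of reading those keys at startup. The helper name loadConfig is made up for this example; the key names are the ones listed above. It assumes the variables were exported into the process environment before the app started.

```javascript
// Keys that both environment files are expected to define.
const REQUIRED_KEYS = [
  'REDDIT_APPID',
  'REDDIT_SECRET',
  'MONGODB_URL',
  'BLOCKCHAIN_GATEWAY_KEY',
];

// Reads the required keys from the given environment (process.env by default)
// and fails fast if any of them is missing or empty.
function loadConfig(env = process.env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const config = {};
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key];
  }
  // Frozen so that nothing changes the credentials at runtime by accident.
  return Object.freeze(config);
}

// Usage, somewhere near the top of the app's entry file:
// const config = loadConfig();
// mongoose.connect(config.MONGODB_URL) // or whichever client you use
```

The same code then runs unchanged against test or production, and a forgotten key stops the app at startup instead of failing halfway through a payment.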
Testing
Mocha is perfectly suitable for running tests for a Node/Express app. It's the tests that matter.
You say bitcoin payment....Needs request and response. Let's see how to do that.
Add nock (https://www.npmjs.com/package/nock) to your app (npm i -D nock).
Import it at the top of your test file, e.g. at the top of the some-test.spec.js file:
const nock = require('nock')
Start recording requests, e.g. add this in the before() block of your test file:
describe('My tests', function () {
  before(function () {
    // Print every outgoing HTTP request and its response to the console.
    nock.recorder.rec();
  });
  // ... tests
});
Now, run one test at a time (e.g. write one test that does one specific task from your app) and check what's in the console. E.g. if you make a request (request.post('http://reddit.com/api/submit', jsonData)), you'll see nock printing the exact response (in JSON format) in console, as the test runs. Copy that in the test file, e.g. put it at the bottom as:
var testResponse = <whatever was in the console in json format. Or string, whatever>. // homework is to find out why var and not const, if this is at the end of the file.
Now stop the recorder (comment it out), and in your actual test, run this instead:
const pipe = nock('http://www.example.com')
.get('/resource')
.reply(200, testResponse);
Do that for all your requests.
Now what you have is a test setup so that when you change the code, it should not run against the real Reddit api, or real payment gateway api, but get your mocked responses instead. Pair it up with some good assertions and you should be fine. Make sure you mock everything. If you add new types of requests, make sure to record them, and add them to your procedure.
Now, all this is very vague. Broad. Just one way to do it. Lengthy process. Probably not the best one. Not tailored to your specific conditions. But it should get you started. Take those things, step by step, and if you get stuck, come back to Stackoverflow. But do start working on it, because your current method seems to be unsustainable in the long run.