I've been experimenting a little with Node.js, writing a web scraper that uses proxies.
What I'm trying to do is load all proxies from a proxy file, "proxies.txt" in this case.
What I don't understand is that while the web scraper is running, it gives me the following message:
(node:21470) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
This does not happen if I manually make an array of several proxies.
Here is the code where I load the proxies from the file into an array.
var proxy_array = [];

if (!proxy_array.length) {
    var proxy_file = fs.readFileSync('./proxies.txt', 'utf8');
    var split = proxy_file.split('\n');
    for (var i = 0; i < split.length; i++) {
        var trimmed_proxy = split[i].replace('\r', ''); // strips the \r left over from Windows-style (CRLF) line endings after splitting on '\n'
        proxy_array.push(trimmed_proxy);
    }
}
console.log(proxy_array); //it does return all proxies.
Thanks for any help in advance!
Greetings
When I use worker_threads to handle a lot of complex logic that is unrelated to the main thread, I found that memory usage on the server gets very high.
Below is part of my simplified code.
main.js
const { Worker } = require("worker_threads")

const worker = new Worker(process.cwd() + "/worker.js")

// My business repeats this cycle infinitely; this code is just an example
for (let i = 0; i < 1000000; i++) {
    worker.postMessage(i)
}
worker.js
const { parentPort } = require("worker_threads")

parentPort.on("message", async data => {
    // a lot of logic....
})
When I run this script on the server, the memory keeps increasing. Maybe the thread is not shut down?
I tried using worker.removeAllListeners(), the resourceLimits option, and the third-party library workerpool, but none of them solved my problem.
What should I do, or what other approach should I use to solve this problem? Thanks for answering me!
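One possible cause worth checking (an assumption on my part, not something stated above): worker.postMessage clones each message into the worker's incoming queue, so posting a million messages in a tight synchronous loop keeps all of them in memory until the worker drains them. A minimal backpressure sketch, assuming worker.js is changed to post a reply (parentPort.postMessage(result)) when it finishes each item:

const { Worker } = require("worker_threads")

const worker = new Worker(process.cwd() + "/worker.js")

// Resolve once the worker posts a reply for the message we just sent
function send(i) {
    return new Promise(resolve => {
        worker.once("message", resolve)
        worker.postMessage(i)
    })
}

async function run() {
    for (let i = 0; i < 1000000; i++) {
        await send(i) // only one message in flight at a time
    }
    await worker.terminate() // shut the thread down when the cycle is done
}

run()

A pool library or a bounded queue would give the same effect while keeping more than one message in flight.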
I would like to test code that reports when a worker_thread OOMs. Or at least that the parent is OK despite the thread crashing due to OOM reasons. I would like to specifically test Node.js killing a worker_thread.
I'm not even sure this is particularly testable since the environment in which Node.js is running seems to make a difference. Setting low old generation size (docs) using resource limits does not behave the way I thought it would, it seems Node.js and the OS are doing a lot of clever things to keep the process from blowing up in memory. The closest I have gotten is that both Node.js AND the worker are killed. But again, this is not the behaviour I was expecting - Node.js should be killing the worker.
Is there any way to reliably test for something like this? Currently running v16.13.2.
[EDIT]
Here is some sample JS for the worker:
const { isMainThread } = require('worker_threads');

if (!isMainThread) {
    let as = [];
    for (let i = 0; i < 100; i++) {
        let a = '';
        for (let j = 0; j < 9000000; j++) a += j.toString();
        as.push(a);
    }
}
OK, it looks like adding maxYoungGenerationSizeMb in my case is resulting in the behaviour I was looking for: e.g. { maxOldGenerationSizeMb: 10, maxYoungGenerationSizeMb: 10 }.
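A sketch of the parent-side wiring that goes with this (the file name and limits are placeholders; Node documents ERR_WORKER_OUT_OF_MEMORY as the error code when a worker is terminated for exceeding its resourceLimits, which is what the 'error' handler below checks for):

const { Worker } = require('worker_threads');

const worker = new Worker('./worker.js', {
    resourceLimits: { maxOldGenerationSizeMb: 10, maxYoungGenerationSizeMb: 10 },
});

worker.on('error', (err) => {
    // Expected when the worker blows its heap limit; the parent keeps running
    console.log('worker error:', err.code); // e.g. ERR_WORKER_OUT_OF_MEMORY
});

worker.on('exit', (code) => {
    console.log('worker exited with code', code, '- parent is still alive');
});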
I'm trying to list a bunch of products, and I wanted to request the data in Node and build the page statically, so the homepage would be faster.
The problem appears when I make over 80 requests in getStaticProps.
The following code with 80 items does work:
const urlList = [];
for (let i = 1; i <= 80; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}
const promises = urlList.map(url => axios.get(url));
const responses = await Promise.all(promises);
return responses;
The following code with 880 items does not work
(note that it does work outside of getStaticProps):
const urlList = [];
for (let i = 1; i <= 880; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}
const promises = urlList.map(url => axios.get(url));
const responses = await Promise.all(promises);
return responses;
Error in the console:
Uncaught at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20)
webpage error:
Server Error
Error
This error happened while generating the page. Any console logs will be displayed in the terminal window.
TLSWrap.onStreamRead
internal/stream_base_commons.js (209:20)
Is there a way to handle a large number of requests like that?
I'm new to HTTP requests; is there a way for me to optimize this?
There are limits to how many connections you can create to fetch content. What you're seeing is that a method like Promise.all() isn't "smart" enough to avoid running into such limits.
Basically, when you call Promise.all() you tell the computer "do all these things simultaneously, the order does not matter, and give me all the output when done. And by the way, if a single one of those operations fails, stop everything and throw away all other results". It's very useful in many contexts, but perhaps not when trying to fetch over 800 things from the net.
So yes, unless you can tweak the requirements, like the number of allowed simultaneous connections or the memory the script gets to use, you'll likely have to do this in batches. Perhaps one Promise.all() for slices of 100 jobs at a time, then the next slice. You could look at the async library and its mapLimit method, or roll your own way to slice the list of jobs into batches.
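For instance, a rough sketch of that slicing approach (the helper name and batch size are made up, and it assumes the same axios and urlList as in the question):

import axios from 'axios';

async function fetchInBatches(urlList, batchSize = 100) {
    const results = [];
    for (let i = 0; i < urlList.length; i += batchSize) {
        const batch = urlList.slice(i, i + batchSize);
        // One Promise.all per slice, then move on to the next slice
        const responses = await Promise.all(batch.map(url => axios.get(url)));
        results.push(...responses);
    }
    return results;
}

// usage inside getStaticProps:
// const responses = await fetchInBatches(urlList);

Each slice only starts after the previous one has fully settled, so at most batchSize connections are open at a time.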
This could be a problem depending on the Node version it's using,
but for await...of could also be an option for you, processing the requests one at a time...
You can leverage axios.all instead of Promise.all.
const urlList = [];
for (let i = 1; i <= 80; i++) {
    const url = `myApiUrl`;
    urlList.push(url);
}

const promises = urlList.map(url => axios.get(url));
const responses = await axios.all(promises);
return responses;
https://codesandbox.io/s/multiple-requests-axios-forked-nx1z9?file=/src/index.js
As a first step, for debugging purposes I would use Promise.allSettled instead of Promise.all. This should help you understand what error is returned by the HTTP socket. If you don't control the external API, it is likely that a firewall is blocking you for this "DDoS"-like burst of requests.
As you said, batching the calls doesn't solve the issue (if you queue 80 requests followed by another 80, and so on, you may hit a rate limit in any case).
You should check for throttling issues and use a module to rate-limit your HTTP calls, like throttle-debounce.
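A quick sketch of the Promise.allSettled step mentioned above (assumes the same urlList and axios as in the question; the exact shape of r.reason depends on how axios rejects):

const results = await Promise.allSettled(urlList.map(url => axios.get(url)));

// Log why each failed request failed instead of letting one rejection hide the rest
results
    .filter(r => r.status === 'rejected')
    .forEach(r => console.error(r.reason.code || r.reason.message));

// Keep only the successful responses
const responses = results
    .filter(r => r.status === 'fulfilled')
    .map(r => r.value);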
I have streaming data coming in with an IP address. I want to translate the IP to longitude and latitude before putting the data into my database.
This is what I was doing, but it is causing some issues. I also tried putting locationObject outside the for loop; that, weirdly, uses a lot of memory. I know this is blocking code, but it should be fast. Still, I am seeing memory issues, since the data objects come from a stream continuously and each data object is huge.
for (var i = 0; i < data.length; i++) {
    if (data.client_ip !== null) {
        var locationLookup = maxmind.openSync('./GeoIP2-City.mmdb');
        var ip = data.client_ip;
        var maxmindObj = locationLookup.get(ip);
        locationObject.country = maxmindObj.country.names.en;
        locationObject.latitude = maxmindObj.location.latitude;
        locationObject.longitude = maxmindObj.location.longitude;
    }
}
Again, trying to put maxmind.openSync('./GeoIP2-City.mmdb'); outside the for loop causes a huge increase in memory.
The other option is to use non-blocking code:
maxmind.open('/path/to/GeoLite2-City.mmdb', (err, cityLookup) => {
    var city = cityLookup.get('66.6.44.4');
});
But I don't think it is a good idea to put this inside a loop.
How can I handle this? I am getting a data object every minute.
https://github.com/runk/node-maxmind
I'm not sure why you think reading the database file on each iteration of the loop would be fast ("blocking code" doesn't equal "fast code"). It's much better to read the database file once and then loop over data.
maxmind.openSync() will read the entire database into memory, which is mentioned in the README:
Be careful with sync version! Since mmdb files are quite large
(city database is about 100Mb) fs.readFileSync blocks whole
process while it reads file into buffer.
If you don't have memory to spare, the only other option would be to open the file asynchronously. Again, not inside the loop, but outside of it:
maxmind.open("./GeoIP2-City.mmdb", (err, locationLookup) => {
    for (var i = 0; i < data.length; i++) {
        if (data.client_ip !== null) {
            var ip = data.client_ip;
            var maxmindObj = locationLookup.get(ip);
            locationObject.country = maxmindObj.country.names.en;
            locationObject.latitude = maxmindObj.location.latitude;
            locationObject.longitude = maxmindObj.location.longitude;
        }
    }
});
The only thing I am worried about is that, over time, I call this function many times: every time my consumers read a jsonObject from Kafka, which happens every minute. Is there a better way to optimize this, given that I call the function every minute?
function processData(jsonObject) {
    maxmind.open('./GeoIP2-City.mmdb', function(err, locationLookup) {
        if (err) {
            logger.error('something went wrong on maxmind fetch', err);
        }
        for (var i = 0; i < jsonObject.length; i++) { ... }
    });
}
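One way to avoid re-opening the database every minute (a sketch; cachedLookup is just a name I made up, and console.error stands in for your logger) is to open it once when the process starts and reuse the lookup object inside processData:

const maxmind = require('maxmind');

// Open the database once at startup and cache the lookup object
let cachedLookup = null;

maxmind.open('./GeoIP2-City.mmdb', function(err, lookup) {
    if (err) {
        console.error('something went wrong on maxmind open', err);
        return;
    }
    cachedLookup = lookup;
});

function processData(jsonObject) {
    if (!cachedLookup) {
        // DB not loaded yet; you could queue the batch or retry instead
        return;
    }
    for (var i = 0; i < jsonObject.length; i++) {
        // ... same per-record lookup as before, using cachedLookup.get(...)
    }
}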
Edited Question
I was trying to understand why there is a memory leak in a simple function call, and why Node does not release the memory once the local scope has ended.
Thanks in advance
function somefunction() {
    var n = 20000;
    var x = {};
    for (var i = 0; i < n; i++) {
        x['some' + i] = { "abc": ("abc#yxy.com" + i) };
    }
}
// Memory Leak
var init = process.memoryUsage();
somefunction();
var end = process.memoryUsage();
console.log("memory consumed 2nd Call : "+((end.rss-init.rss)/1024)+" KB");
PREVIOUS ANSWER before the question was edited to correct a code error:
The results are invalid because this code doesn't invoke the function:
(function(){
somefunction();
});
The anonymous function is declared but not invoked. So it does not use much in the way of resources.
You need to invoke the function:
(function(){
somefunction();
}());
@Mohit, both strategies show the same memory consumption. Run each piece of code separately and check for yourself.
EDIT:
Wait for the GC. When the GC runs, the memory should be freed. Try to call the GC explicitly and then check.
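For example (a sketch; global.gc is only available when Node is started with the --expose-gc flag, and heapUsed reflects the collection more directly than rss, which rarely shrinks):

// Run with: node --expose-gc yourscript.js
function somefunction() {
    var n = 20000;
    var x = {};
    for (var i = 0; i < n; i++) {
        x['some' + i] = { "abc": ("abc#yxy.com" + i) };
    }
}

var init = process.memoryUsage();
somefunction();
global.gc(); // force a collection before measuring
var end = process.memoryUsage();
console.log("rss diff      : " + ((end.rss - init.rss) / 1024) + " KB");
console.log("heapUsed diff : " + ((end.heapUsed - init.heapUsed) / 1024) + " KB");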