I'm trying to get all of the paginated pages from an API that allows a max of 250 records per call/page.
But I can't make it work 100%: I get the data, but the promise does not resolve the right way; it hangs.
I use https.request and I don't want to use fetch or any other kind of fetching.
The code:
moreThan250(nextLink) {
  return new Promise((resolve, reject) => {
    try {
      let body = []
      options.path = "/admin/api/2022-07/" + nextLink
      let time = parseInt(new Date().getTime() / 100000) // divide so the file name doesn't change mid-run but rolls over roughly every third minute (should be enough time to fetch all products)
      console.log("moreThan250 time", time);
      const req = https.request(options, (res) => {
        res.on('data', (d) => {
          body.push(d)
        })
        res.on("end", () => {
          body = Buffer.concat(body).toString()
          fs.appendFile("src/data/allProducts.json", body + ",", (e) => console.log(e))
          const headerLink = res.headers.link
          if (headerLink) {
            const match = headerLink.match(/<[^;]+\/(\w+\.json[^;]+)>;\srel="next"/);
            const nextLink = match ? match[1] : false;
            if (nextLink) {
              this.moreThan250(nextLink)
            } else {
              console.log("moreThan250 end inside", body.length);
              resolve()
            }
          } else {
            resolve()
          }
        })
      });
      req.end()
    } catch (error) {
      console.log("moreThan250 error", error);
      reject(false)
    }
  })
}
async function go() {
  let tmp = await this.moreThan250(`products.json?limit=250`)
  if (tmp) console.log("getAllProducts fetched moreThan250", tmp);
  else console.log("no tmp");
}
go()
This logs out "no tmp".
What I want is the collected data/body, so how do I return all the data to tmp? (The appendFile call got all the right data.)
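A minimal sketch of one way to make this resolve with the collected data: await the recursive call instead of firing it and ignoring its promise, and pass the accumulated pages along. This reuses the question's options object and link-header regex; it is only a sketch, not a drop-in replacement.

// Sketch: accumulate each page body and only settle once there is no rel="next" link.
async moreThan250(nextLink, collected = []) {
  options.path = "/admin/api/2022-07/" + nextLink
  const page = await new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      const chunks = []
      res.on("data", (d) => chunks.push(d))
      res.on("end", () => resolve({ text: Buffer.concat(chunks).toString(), link: res.headers.link }))
    })
    req.on("error", reject)
    req.end()
  })
  collected.push(page.text)
  const match = page.link && page.link.match(/<[^;]+\/(\w+\.json[^;]+)>;\srel="next"/)
  if (match) {
    // returning (and thereby awaiting) the recursion is what keeps the outer await in go() waiting
    return this.moreThan250(match[1], collected)
  }
  return collected // every page fetched; the caller receives one raw JSON string per page
}

With that shape, tmp in go() would be an array with one body per fetched page, and the file write could be done once by the caller at the end.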
Related
I am writing an API endpoint which searches Google Books' API to add book titles to the notebooks in my MongoDB database, based on their IDs. I have it working properly, but sometimes my endpoint's response array comes back missing the last item. My console logs seem to indicate that my endpoint's response is being sent before the HTTPS requests have finished fetching all the data. I need to solve the endpoint sometimes returning an incomplete array. I think I may need to include an async/await somewhere, but I'm not sure where. Any help is much appreciated!
Here is my code:
//GET a user's books with notes
const getNotebooks = async (req, res) => {
  ...
  const books = await Book.find({ userId }).sort({createdAt: -1}) //gets all notebooks as array
  let results = []
  for (let i = 0; i < books.length; i++) {
    console.log(i + " - " + books[i].gId)
    const item = {
      id: books[i].userId,
      gId: books[i].gId,
      title: '',
    }
    const request = https.request(apiUrl + books[i].gId, async (response) => {
      let data = '';
      response.on('data', (chunk) => {
        data = data + chunk.toString();
      });
      response.on('end', () => {
        const body = JSON.parse(data);
        item.title = body.volumeInfo.title
        results.push(item)
        if (i == books.length - 1) {
          res.status(200).json(results)
        }
      });
    })
    request.on('error', (error) => {
      console.log('An error', error);
      res.status(400).json({error: error})
    });
    request.end()
  }
}
You can use async/await only with Promises. Since Node's core https module does not have any built-in promise support, you will have to convert it into the promise form first; then you can use async/await with it. I am not familiar with the standard https module, but I will make an example with your code:
const getNotebooks = async (req, res) => {
  //...
  const books = await Book.find({ userId }).sort({createdAt: -1}) //gets all notebooks as array
  let results = []
  for (let i = 0; i < books.length; i++) {
    console.log(i + " - " + books[i].gId)
    const item = {
      id: books[i].userId,
      gId: books[i].gId,
      title: '',
    }
    const options = apiUrl + books[i].gId
    // now you can use await on this function, which returns a promise
    await makeRequest(options, res, item, results)
  }
  // every request has been awaited, so the array is complete here
  res.status(200).json(results)
}
// Turning the request into a promise outside of the scope of your getNotebooks function.
function makeRequest(options, res, item, results) {
  return new Promise((resolve, reject) => {
    const request = https.request(options, (response) => {
      let data = '';
      response.on('data', (chunk) => {
        data = data + chunk.toString();
      });
      response.on('end', () => {
        const body = JSON.parse(data);
        item.title = body.volumeInfo.title
        results.push(item)
        resolve(body);
      });
    });
    request.on('error', (error) => {
      console.log('An error', error);
      res.status(400).json({error: error})
      reject(error);
    });
    request.end()
  });
};
Alternatively, you can use a more modern way to request data, for example fetch, which is now available in Node, or even the Axios library. If you don't want any dependency, fetch should be the way to go. An example, if there is no mistake and assuming it is a GET request:
const getNotebooks = async (req, res) => {
  //...
  try {
    const books = await Book.find({ userId }).sort({createdAt: -1}) //gets all notebooks as array
    let results = []
    for (let i = 0; i < books.length; i++) {
      console.log(i + " - " + books[i].gId)
      const item = {
        id: books[i].userId,
        gId: books[i].gId,
        title: '',
      }
      // using fetch on Node.js is now possible without any package to install
      const response = await fetch(apiUrl + books[i].gId)
      const data = await response.json();
      item.title = data.volumeInfo.title
      results.push(item)
      if (i == books.length - 1) {
        res.status(200).json(results)
      }
    }
  } catch (error) {
    res.status(400).json({error: error})
  };
};
As you can see, you end up writing two lines of code instead of 20+.
I am trying to fetch a list of all companies listed on the stock market from an external API, and after getting the list, I am trying to fetch all details regarding each individual company, including graph data. It was all working fine; however, today I am getting a socket hang up error. I have tried going through other posts here on Stack Overflow, but none of them works.
const request = require('request');

const fetchAPI = apiPath => {
  return new Promise(function (resolve, reject) {
    request(apiPath, function (error, response, body) {
      if (!error && response.statusCode == 200) {
        resolve(body);
      } else {
        reject(error);
      }
    });
  });
}

// get list of all companies listed in
const fetchCompanyDetails = () => {
  return new Promise(function (resolve, reject) {
    let details = [];
    fetchAPI('https://api//')
      .then(res => {
        res = JSON.parse(res)
        details.push(res);
        resolve(details);
      })
      .catch(err => {
        console.log("error at fetchcompany details" + err);
      })
  });
}

const getDateAndPriceForGraphData = (graphData) => {
  let res = []
  graphData.forEach(data => {
    let d = {}
    d["x"] = new Date(data.businessDate).getTime() / 1000
    d["y"] = data.lastTradedPrice
    res.push(d)
  })
  return res
}

// get graph data for individual assets
const getGraphDataForAssets = (assetID) => {
  return new Promise((resolve, reject) => {
    let details = {};
    fetchAPI(`https://api/${assetID}`)
      .then(async (res) => {
        res = JSON.parse(res)
        let data = await getDateAndPriceForGraphData(res)
        details = data
        resolve(details);
      })
      .catch(err => {
        console.log("error at getGraphDataForAssets" + err);
      })
  });
}

// fetch data about individual assets
const fetchAssetDetailsOfIndividualCompanies = (assetID) => {
  return new Promise((resolve, reject) => {
    let details = {"assetData": {}, "graphData": {}};
    fetchAPI(`https://api/${assetID}`)
      .then(async (res1) => {
        res1 = JSON.parse(res1)
        details["assetData"] = res1
        // get graph data
        var graphData = await getGraphDataForAssets(assetID)
        details["graphData"] = graphData
        resolve(details);
      })
      .catch(err => {
        console.log("error at fetchAssetDetailsOfIndividualCompanies" + err);
        reject(err)
      })
  });
}

// returns list of details of all tradeable assets (Active and Suspended but not delisted)
const fetchDetailsForEachCompany = async (companyList) => {
  let result = []
  await Promise.all(companyList.map(async (company) => {
    try {
      // return data for active and suspended assets
      if (company.status != "D") {
        let companyData = await fetchAssetDetailsOfIndividualCompanies(company.id)
        result.push(companyData)
      }
    } catch (error) {
      console.log('error at fetchDetailsForEachCompany' + error);
    }
  }))
  return result
}

exports.fetchAssetDetails = async () => {
  let companyDetails = await fetchCompanyDetails()
  let det = await fetchDetailsForEachCompany(companyDetails[0])
  return det
}
To expand on what I meant about not needing those new Promise()s, this would be an idiomatic async function refactoring of the above code.
I eliminated getGraphDataForAssets, since it was eventually not used; fetchAssetDetailsOfIndividualCompanies fetched the same data (based on URL, anyway), and then had getGraphDataForAssets fetch it again.
const request = require("request");
function fetchAPI(apiPath) {
return new Promise(function (resolve, reject) {
request(apiPath, function (error, response, body) {
if (!error && response.statusCode === 200) {
resolve(body);
} else {
reject(error);
}
});
});
}
async function fetchJSON(url) {
return JSON.parse(await fetchAPI(url));
}
async function fetchCompanyDetails() {
return [await fetchAPI("https://api//")];
}
function getDateAndPriceForGraphData(graphData) {
return graphData.map((data) => ({
x: new Date(data.businessDate).getTime() / 1000,
y: data.lastTradedPrice,
}));
}
// fetch data about individual assets
async function fetchAssetDetailsOfIndividualCompanies(assetID) {
const assetData = await fetchJSON(`https://api/${assetID}`);
const graphData = getDateAndPriceForGraphData(assetData);
return { assetID, assetData, graphData };
}
// returns list of details of all tradeable assets (Active and Suspended but not delisted)
async function fetchDetailsForEachCompany(companyList) {
const promises = companyList.map(async (company) => {
if (company.status === "D") return null;
return fetchAssetDetailsOfIndividualCompanies(company.id);
});
const results = await Promise.all(promises);
return results.filter(Boolean); // drop nulls
}
async function fetchAssetDetails() {
const companyDetails = await fetchCompanyDetails();
return await fetchDetailsForEachCompany(companyDetails[0]);
}
exports.fetchAssetDetails = fetchAssetDetails;
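A minimal usage sketch (the ./assets module path is hypothetical, just for illustration):

// Hypothetical caller for the refactored module above.
const { fetchAssetDetails } = require("./assets");

fetchAssetDetails()
  .then((details) => console.log("fetched details for", details.length, "companies"))
  .catch((err) => console.error("fetchAssetDetails failed", err));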
I have code to fetch directory names from a first API. For every directory, I need to get the file name from a second API. I am using something like this in my Node.js code -
async function main_function(req, res) {
  const response = await fetch(...)
    .then((response) => {
      if (response.ok) {
        return response.text();
      } else {
        return "";
      }
    })
    .then((data) => {
      dirs = ...some logic to extract number of directories...
      const tempPromises = [];
      for (i = 0; i < dirs.length; i++) {
        tempPromises.push(getFilename(i));
      }
      console.log(tempPromises); // Prints [ Promise { <pending> } ]
      Promise.all(tempPromises).then((result_new) => {
        console.log(result_new); // This prints "undefined"
        res.send({ status: "ok" });
      });
    });
}

async function getFilename(inp_a) {
  const response = await fetch(...)
    .then((response) => {
      if (response.ok) {
        return response.text();
      } else {
        return "";
      }
    })
    .then((data) => {
      return new Promise((resolve) => {
        resolve("Temp Name");
      });
    });
}
What am I missing here?
Your getFilename() doesn't seem to be returning anything, i.e. it's returning undefined. Try returning response at the end of the function:
async function getFilename(inp_a) {
  const response = ...
  return response;
}
Thanks to Mat J for the comment. I was able to simplify my code and also learn when not to use chaining.
Also thanks to Shadab's answer, which helped me understand that an async function always returns a promise, and that it was that default promise being returned, not the actual string. I wasn't aware of that. (I am pretty new to JS.)
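To illustrate that point, a small standalone sketch (not part of the original code):

// An async function with no return statement still returns a promise,
// but that promise resolves to undefined - which is what Promise.all was collecting.
async function noReturn() {
  await Promise.resolve("Temp Name"); // awaited, but never returned
}

async function withReturn() {
  return "Temp Name"; // becomes the resolved value of the returned promise
}

noReturn().then((v) => console.log(v));   // undefined
withReturn().then((v) => console.log(v)); // "Temp Name"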
Here's my final code/logic which works -
async function main_function(req, res) {
  try {
    const response = await fetch(...)
    const resp = await response.text();
    dirs = ...some logic to extract number of directories...
    const tempPromises = [];
    for (i = 0; i < dirs.length; i++) {
      tempPromises.push(getFilename(i));
    }
    Promise.all(tempPromises).then((result_new) => {
      console.log(result_new);
      res.send({ status: "ok" });
    });
  } catch (err) {
    console.log(err)
    res.send({ "status": "error" })
  }
}

async function getFilename(inp_a) {
  const response = await fetch(...)
  respText = await response.text();
  return ("Temp Name");
}
I'm new to Node.js and I'm creating a simple pagination page. The REST API works fine, but consuming it has left me in limbo.
Here is the REST API (other parts have been taken out for brevity)
const data = req.query.pageNo;
const pageNo =
  (typeof data === 'undefined' || data < 1) ? 1 : parseInt(req.query.pageNo);
let query = {};
const total = 10;
query.skip = (total * pageNo) - total;
query.limit = total;
try {
  const totalCount = await Users.countDocuments();
  const pageTotal = Math.ceil(totalCount / total);
  const users = await Users.find({}, {}, query);
  return res.status(200).json(users);
} catch (error) {
  console.log('Error ', error);
  return res.status(400).send(error)
};
};
When I return the JSON with just the users object, like so: return res.status(200).json(users);, the page renders correctly. But when I pass in other objects like what I have in the code, it fails. This is how I'm consuming the API:
const renderHomepage = (req, res, responseBody) => {
  let message = null;
  if (!(responseBody instanceof Array)) {
    message = 'API lookup error';
    responseBody = [];
  } else {
    if (!responseBody.length) {
      message = 'No users found nearby';
    }
  }
  res.render('users-list', {
    title: 'Home Page',
    users: responseBody,
    message: message
  });
}

const homelist = (req, res) => {
  const path = '/api/users';
  const requestOptions = {
    url: `${apiOptions.server}${path}`,
    method: 'GET',
    json: true,
  };
  request(
    requestOptions,
    (err, {statusCode}, body) => {
      if (err) {
        console.log('There was an error ', err);
      } else if (statusCode === 200 && body.length) {
        renderHomepage(req, res, body);
      } else if (statusCode !== 200 && !body.length) {
        console.log('error ', statusCode);
      }
    }
  );
}
I've searched extensively both here and on other resources, but none of the solutions quite answers my question. I hope someone can help.
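For reference, the failure is consistent with the consumer assuming a bare array: renderHomepage checks responseBody instanceof Array and homelist checks body.length, so wrapping users in an object breaks both checks. A minimal sketch of one way to send extra pagination data anyway (the exact response shape below is an assumption):

// API side: wrap the array instead of sending it bare (illustrative shape).
return res.status(200).json({ users, pageTotal, totalCount });

// Consumer side: unwrap before the array checks.
const homelist = (req, res) => {
  const requestOptions = {
    url: `${apiOptions.server}/api/users`,
    method: 'GET',
    json: true,
  };
  request(requestOptions, (err, { statusCode }, body) => {
    if (err) {
      console.log('There was an error ', err);
    } else if (statusCode === 200) {
      // body.users is the array; body.pageTotal stays available for pagination links
      renderHomepage(req, res, body.users || []);
    } else {
      console.log('error ', statusCode);
    }
  });
};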
I have written a scraper in TypeScript, running on node:10.12.0.
Issue: the code goes to sleep after a few hours, randomly, and I have to restart it. My best guess is that it gets stuck on a URL request.
Tools/packages used:
Puppeteer
Cheerio
TypeScript
Code:
import * as cheerio from "cheerio";
import * as request from "request";
import * as fs from "fs";
import * as shell from "shelljs";
import pup = require("puppeteer");

class App {
  // @ts-ignore
  public browser: pup.Browser;

  public appendToFile(file: string, content: string): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      try {
        fs.appendFileSync(file, content);
        resolve("DONE");
      } catch (e) {
        reject(e);
      }
    });
  }

  public loadPage(url: string): Promise<any> {
    return new Promise<any>((resolve, reject) => {
      request.get(url, async (err, res, html) => {
        if (!err && res.statusCode === 200) {
          resolve(html);
        } else {
          if (err) {
            reject(err);
          } else {
            reject(res);
          }
        }
      });
    });
  }
  public step1(url: string): Promise<string> {
    return new Promise<string>(async (resolve, reject) => {
      let page: pup.Page | undefined;
      try {
        let next = false;
        let urlLink = url;
        let first = true;
        let header = "unknown";
        let f = url.split("/");
        let folder = f[f.length - 3];
        folder = folder || header;
        let path = "data/" + folder;
        shell.mkdir("-p", path);
        page = await this.browser.newPage();
        await page.goto(url, {
          timeout: 0
        });
        let count = 1;
        do {
          next = false;
          let res = await page.evaluate(() => {
            let e = document.querySelectorAll(".ch-product-view-list-container.list-view li ul > li > h6 > a");
            let p: string[] = [];
            e.forEach((v) => {
              p.push(("https://www.link.com") + (v.getAttribute("href") as string));
            });
            return p;
          });
          // for (const l of res) {
          //   try {
          //     await this.step2(l, "", "")
          //   } catch (er) {
          //     this.appendToFile("./error.txt", l + "::" + url + "\n").catch(e => e)
          //   }
          // }
          let p = [];
          let c = 1;
          for (const d of res) {
            p.push(await this.step2(d, folder, c.toString()).catch((_e) => {
              console.log(_e);
              fs.appendFileSync("./error-2.txt", urlLink + " ### " + d + "\n");
            }));
            c++;
          }
          await Promise.all(p);
          await this.appendToFile("./processed.txt", urlLink + ":" + count.toString() + "\n").catch(e => e);
          count++;
          console.log(urlLink + ":" + count);
          let e = await page.evaluate(() => {
            let ele = document.querySelector("#pagination-next") as Element;
            let r = ele.getAttribute("style");
            return r || "";
          });
          if (e === "") {
            next = true;
            await page.click("#pagination-next");
            // console.log('waitng')
            await page.waitFor(1000);
            // console.log('done wait')
            // await page.waitForNavigation({waitUntil: 'load'}).catch(e => console.log(e));
            // await Promise.all([
            //   page.click("#pagination-next"),
            //   page.waitForNavigation({ waitUntil: 'networkidle0'}),
            // ]);
          }
        } while (next);
        // await page.close();
        resolve("page all scrapped");
      } catch (errrr) {
        reject(errrr);
      } finally {
        if (page !== undefined) {
          await page.close().catch(e => e);
        }
      }
    });
  }
  public step2(url: string, folder: string, file: string): Promise<string> {
    return new Promise<string>(async (resolve, reject) => {
      try {
        let html = await this.loadPage(url).catch(e => reject(e));
        let $ = cheerio.load(html);
        let ress: any = {};
        let t = $(".qal_title_heading").text();
        if (t) {
          ress.header = t.replace(/"/g, "'").replace(/\n|\r|\t/g, "");
        }
        let d = $("div.ch_formatted_text.qal_thread-content_text.asker").html();
        if (d) {
          ress.body = d.replace(/"/g, "'").replace(/\n|\r|\t/g, "");
        }
        // let sprit = "-------------------------------";
        let filename = "data" + file + ".json"; // ((t.replace(/[^\w\s]/gi, "")).substring(0,250)+".txt")
        let data = JSON.stringify(ress) // t + sprit + d + "\n---end---\n";
        await this.appendToFile("./data/" + folder + "/" + filename, data + ",\n")
          .then((r) => {
            resolve(r);
          });
      } catch (err) {
        reject(err);
      }
    });
  }
}
async function main() {
  process.on("SIGTERM", () => {
    console.log("SigTerm received");
    process.exit(1);
  });
  process.on("SIGINT", () => {
    console.log("SigInt received");
    process.exit(1);
  });
  let path = "data/unknown";
  shell.mkdir("-p", path);
  let c = new App();
  let list: string[] = [];
  console.log(process.argv[2]);
  require("fs").readFileSync(process.argv[2], "utf-8").split(/\r?\n/).forEach((line: string) => {
    list.push(line);
  });
  console.log("total links->" + list.length);
  c.browser = await pup.launch({
    headless: true
  });
  for (const l of list) {
    await c.step1(l).then(e => {
      fs.appendFileSync("./processed.txt", l);
    }).catch(e => {
      fs.appendFileSync("./error.txt", l);
    });
  }
}
main();
Let me know if you need anything else from me. Also, this is all of the code.
So, I figured out two problems.
Chrome (under Puppeteer) consumes a lot of CPU, with a trend like this: at the start it's at moderate usage, and it gradually increases. In my case it started off at 4% usage and after a day it reached 100%. I've submitted an issue on their Git repo.
I did not specify a timeout in request.
was:
request.get(url, async (err, res, html) => {
should be:
request.get(url, { timeout: 1500 }, async (err, res, html) => {
So far my code has been running fine for more than a day now. The only issue is the high CPU usage, but that's not a concern for me for now.
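Applied to the loadPage method from the scraper above, the change would look roughly like this (a sketch; 1500 ms is just the value from the answer and may need tuning):

// Sketch of loadPage with a request timeout, so a stalled request rejects
// instead of hanging the whole scrape.
public loadPage(url: string): Promise<any> {
  return new Promise<any>((resolve, reject) => {
    request.get(url, { timeout: 1500 }, (err, res, html) => {
      if (!err && res.statusCode === 200) {
        resolve(html);
      } else {
        // a timeout surfaces here as err.code === "ETIMEDOUT" or "ESOCKETTIMEDOUT"
        reject(err || res);
      }
    });
  });
}

Since step2 already rejects when loadPage fails and step1 logs those rejections to error-2.txt, a timed-out request would be recorded there instead of stalling the loop.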