How do I pass parameters to Apify BasicCrawler handleRequestFunction? - node.js

I'm trying to migrate an existing function to use it inside an Apify actor.
Originally, the function loads a given URL, reads its JSON response, and according to some supplied parameters, extracts some data and returns an object with results.
If you ask, it's not scraping anything "final" at this point. Its results are temporary and will be used to create other URLs which will be scraped then (with another crawler) for actual, useful results.
The current function that executes the crawler is something like this:
let url = new URL('/content', someBaseURL);
url.searchParams.set('search', someKeyword);
const reqList = new apify.RequestList({
sources: [ { url: url.toString() } ]
});
await reqList.initialize();
const crawler = new apify.BasicCrawler({
requestList: reqList,
handleRequestFunction: reqHandler
});
// How do I set the inputs for reqHandler() here ?
await crawler.run();
// How do I get the output from reqHandler() here ?
And the reqHandler code is something like this:
async function reqHandler(options) {
const response = await apify.utils.requestAsBrowser({
url: options.request.url
});
// How do I read parameters from the caller here ?
let searchResults = JSON.parse(response.body);
// ... result object creation logic goes here ...
// How do I return a result to the caller here ?
}
I am pretty new to this Apify thing and lost in the documentation.
Thanks for your help.

handleRequestFunction doesn't take any external input or produce any outputs. Simply use it as a closure and capture inputs from the surrounding code or you can wrap it in a different function.
Normally we do it like this:
const context = {}; // put your inputs here
const crawler = new apify.BasicCrawler({
requestList: reqList,
handleRequestFunction: async () => {
// use context here
// output data
await Apify.pushData(results);
}
});
EDIT: I forgot to mention a use-case on how to pass input. You need to do it via the request.userData object when adding to a queue or a list.
// The same userData is available in request list.
await requestQueue.addRequest({
url: 'https://example.com',
userData: { myInput: 'any-data' }
});
// Then in handleRequestFunction
handleRequestFunction: async (( request }) => {
const { myInput } = request.userData;
// ...
}

Related

Dynamic Slash Command Options List via Database Query?

Background:
I am building a discord bot that operates as a Dungeons & Dragons DM of sorts. We want to store game data in a database and during the execution of certain commands, query data from said database for use in the game.
All of the connections between our Discord server, our VPS, and the VPS' backend are functional and we are now implementing slash commands since traditional ! commands are being removed from support in April.
We are running into problems making the slash commands though. We want to set them up to be as efficient as possible which means no hard-coded choices for options. We want to build those choice lists via data from the database.
The problem we are running into is that we can't figure out the proper way to implement the fetch to the database within the SlashCommandBuilder.
Here is what we currently have:
const {SlashCommandBuilder} = require('#discordjs/builders');
const fetch = require('node-fetch');
const {REST} = require('#discordjs/rest');
const test = require('../commonFunctions/test.js');
var options = async function getOptions(){
let x = await test.getClasses();
console.log(x);
return ['test','test2'];
}
module.exports = {
data: new SlashCommandBuilder()
.setName('get-test-data')
.setDescription('Return Class and Race data from database')
.addStringOption(option =>{
option.setName('class')
.setDescription('Select a class for your character')
.setRequired(true)
for(let op of options()){
//option.addChoice(op,op);
}
return option
}
),
async execute(interaction){
},
};
This code produces the following error when start the npm for our bot on our server:
options is not a function or its return value is not iterable
I thought that maybe the function wasn't properly defined, so I replaced the contents of it with just a simple array return and the npm started without errors and the values I had passed showed up in the server.
This leads me to think that the function call in the modules.exports block is immediatly attempting to get the return value of the function and as the function is async, it isn't yet ready and is either returning undefined or a promise or something else not iteratable.
Is there a proper way to implement the code as shown? Or is this way too complex for discord.js to handle?
Is there a proper way to implement the idea at all? Like creating a json object that contains the option data which is built and saved to a file at some point prior to this command being registered and then having the code above just pull in that file for the option choices?
Alright, I found a way. Ian Malcom would be proud (LMAO).
Here is what I had to do for those with a similar issues:
I had to basically re-write our entire application. It sucks, I know, but it works so who cares?
When you run your index file for your npm, make sure that you do the following things.
Note: you can structure this however you want, this is just how I set up my js files.
Setup a function that will setup the data you need, it needs to be an async function as does everything downstream from this point on relating to the creation and registration of the slash commands.
Create a js file to act as your application setup "module". "Module" because we're faking a real module by just using the module.exports method. No package.jsons needed.
In the setup file, you will need two requires. The first is a, as of yet, non-existent data manager file; we'll do that next. The second is a require for node:fs.
Create an async function in your setup file called setup and add it to your module.exports like so:
module.exports = { setup }
In your async setup function or in a function that it calls, make a call to the function in your still as of yet non-existent data manager file. Use await so that the application doesn't proceed until something is returned. Here is what mine looks like, note that I am writing my data to a file to read in later because of my use case, you may or may not have to do the same for yours:
async function setup(){
console.log('test');
//build option choice lists
let listsBuilt = await buildChoiceLists();
if (listsBuilt){
return true;
} else {
return false;
}
}
async function buildChoiceLists(){
let classListBuilt = await buildClassList();
return true;
}
async function buildClassList(){
let classData = await classDataManager.getClassData();
console.log(classData);
classList = classData;
await writeFiles();
return true;
}
async function writeFiles(){
fs.writeFileSync('./CommandData/classList.json', JSON.stringify(classList));
}
Before we finish off this file, if you want to store anything as a property in this file and then get it later on, you can do so. In order for the data to return properly though, you will need to define a getter function in your exports. Here is an example:
var classList;
module.exports={
getClassList: () => classList,
setup
};
So, with everything above you should have something that looks like this:
const classDataManager = require('./DataManagers/ClassData.js')
const fs = require('node:fs');
var classList;
async function setup(){
console.log('test');
//build option choice lists
let listsBuilt = await buildChoiceLists();
if (listsBuilt){
return true;
} else {
return false;
}
}
async function buildChoiceLists(){
let classListBuilt = await buildClassList();
return true;
}
async function buildClassList(){
let classData = await classDataManager.getClassData();
console.log(classData);
classList = classData;
await writeFiles();
return true;
}
async function writeFiles(){
fs.writeFileSync('./CommandData/classList.json', JSON.stringify(classList));
}
module.exports={
getClassList: () => classList,
setup
};
Next that pesky non-existent DataManager file. For mine, each data type will have its own, but you might want to just combine them all into a single .js file for yours.
Same with the folder name, I called mine DataManagers, if you're combining them all into one, you could just call the file DataManager and leave it in the same folder as your appSetup.js file.
For the data manager file all we really need is a function to get our data and then return it in the format we want it to be in. I am using node-fetch. If you are using some other module for data requests, write your code as needed.
Instead of explaining everything, here is the contents of my file, not much has to be explained here:
const fetch = require('node-fetch');
async function getClassData(){
return new Promise((resolve) => {
let data = "action=GetTestData";
fetch('http://xxx.xxx.xxx.xx/backend/characterHandler.php', {
method: 'post',
headers: { 'Content-Type':'application/x-www-form-urlencoded'},
body: data
}).then(response => {
response.json().then(res => {
let status = res.status;
let clsData = res.classes;
let rcData = res.races;
if (status == "Success"){
let text = '';
let classes = [];
let races = [];
if (Object.keys(clsData).length > 0){
for (let key of Object.keys(clsData)){
let cls = clsData[key];
classes.push({
"name": key,
"code": key.toLowerCase()
});
}
}
if (Object.keys(rcData).length > 0){
for (let key of Object.keys(rcData)){
let rc = rcData[key];
races.push({
"name": key,
"desc": rc.Desc
});
}
}
resolve(classes);
}
});
});
});
}
module.exports = {
getClassData
};
This file contacts our backend php and requests data from it. It queries the data then returns it. Then we format it into an JSON structure for use later on with option choices for the slash command.
Once all of your appSetup and data manager files are complete, we still need to create the commands and register them with the server. So, in your index file add something similar to the following:
async function getCommands(){
let cmds = await comCreator.appSetup();
console.log(cmds);
client.commands = cmds;
}
getCommands();
This should go at or near the top of your index.js file. Note that comCreator refers to a file we haven't created yet; you can name this require const whatever you wish. That's it for this file.
Now, the "comCreator" file. I named mine deploy-commands.js, but you can name it whatever. Once again, here is the full file contents. I will explain anything that needs to be explained after:
const {Collection} = require('discord.js');
const {REST} = require('#discordjs/rest');
const {Routes} = require('discord-api-types/v9');
const app = require('./appSetup.js');
const fs = require('node:fs');
const config = require('./config.json');
async function appSetup(){
console.log('test2');
let setupDone = await app.setup();
console.log(setupDone);
console.log(app.getClassList());
return new Promise((resolve) => {
const cmds = [];
const cmdFiles = fs.readdirSync('./commands').filter(f => f.endsWith('.js'));
for (let file of cmdFiles){
let cmd = require('./commands/' + file);
console.log(file + ' added to commands!');
cmds.push(cmd.data.toJSON());
}
const rest = new REST({version: '9'}).setToken(config.token);
rest.put(Routes.applicationGuildCommands(config.clientId, config.guildId), {body: cmds})
.then(() => console.log('Successfully registered application commands.'))
.catch(console.error);
let commands = new Collection();
for (let file of cmdFiles){
let cmd = require('./commands/' + file);
commands.set(cmd.data.name, cmd);
}
resolve(commands);
});
}
module.exports = {
appSetup
};
Most of this is boiler plate for slash command creation though I did combine the creation and registering of the commands into the same process. As you can see, we are grabbing our command files, processing them into a collection, registering that collection, and then resolving the promise with that variable.
You might have noticed that property, was used to then set the client commands in the index.js file.
Config just contains your connection details for your discord server app.
Finally, how I accessed the data we wrote for the SlashCommandBuilder:
data: new SlashCommandBuilder()
.setName('get-test-data')
.setDescription('Return Class and Race data from database')
.addStringOption(option =>{
option.setName('class')
.setDescription('Select a class for your character')
.setRequired(true)
let ops = [];
let data = fs.readFileSync('./CommandData/classList.json','utf-8');
ops = JSON.parse(data);
console.log('test data class options: ' + ops);
for(let op of ops){
option.addChoice(op.name,op.code);
}
return option
}
),
Hopefully this helps someone in the future!

Using global variables in node js in different async functions

I'm trying to get more familiar with best practices in NodeJS. Currently, I have a asynchronous function that scrapes some data off a website and stores the values retrieved in an object. What I would like to do, is use the value in a different function to extract data from Yahoo Finance to retrieve specific values. I am unsure how to pass in this value to the other functions. I'm thinking possibly setting the value, that is passed in to to other functions, as a global variable. Would that be best practice in the NodeJS world of programming? Any opinions or advice would helpful. Below is the code I currently have:
const cheerio = require('cheerio');
const axios = require("axios");
async function read_fortune_500() {
try {
const { data } = await axios({ method: "GET", url: "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies", })
const $ = cheerio.load(data)
const elemSelector = '#constituents > tbody > tr > td:nth-child(1)'
const keys = ['symbol']
$(elemSelector).each((parentIndex, parentElem) => {
let keyIndex = 0
const stockObject = {}
if (parentIndex <= 9){
$(parentElem).children().each((childIndex, childElem) => {
const tdValue = $(childElem).text()
if (tdValue) {
stockObject[keys[keyIndex]] = tdValue
}
})
console.log(stockObject)
}
})
} catch (err) {
console.error(err)
}
}
async function getCurrentPrice() {}
read_fortune_500()
Sounds more like a JavaScript question then a NodeJS specific question.
NodeJS: I would say you can store the result of scraping the website in session data. Or pass it along in the response and call next(). Or create some middleware to scrape the website before calling the Yahoo route.
Javascript: You can call a async function to scrape the data on your site and await the response. once it is done you can call your next function passing the data retrieved from async result. See below for basic example.
async function scrapeWebsite(){
let webScrapeReults;
// logic to scrape site
return webScrapeReults;
}
function getYahooMarket(){
let results;
let webData = await scrapeWebsite();
// use webData to get reults for yahooMarket
return results;
}

Compare API response against itself

I am trying to:
Poll a public API every 5 seconds
Store the resulting JSON in a variable
Store the next query to this same API in a second variable
Compare the first variable to the second
Print the second variable if it is different from the first
Else: Print the phrase: 'The objects are the same' if they haven't changed
Unfortunately, the comparison part appears to fail. I am realizing that this implementation is probably lacking the appropriate variable scoping but I can't put my finger on it. Any advice would be highly appreciated.
data: {
chatters: {
viewers: {
},
},
},
};
//prints out pretty JSON
function prettyJSON(obj) {
console.log(JSON.stringify(obj, null, 2));
}
// Gets Users from Twitch API endpoint via axios request
const getUsers = async () => {
try {
return await axios.get("http://tmi.twitch.tv/group/user/sixteenbitninja/chatters");
} catch (error) {
console.error(error);
}
};
//Intended to display
const displayViewers = async (previousResponse) => {
const usersInChannel = await getUsers();
if (usersInChannel.data.chatters.viewers === previousResponse){
console.log("The objects are the same");
} else {
if (usersInChannel.data.chatters) {
prettyJSON(usersInChannel.data.chatters.viewers);
const previousResponse = usersInChannel.data.chatters.viewers;
console.log(previousResponse);
intervalFunction(previousResponse);
}
}
};
// polls display function every 5 seconds
const interval = setInterval(function () {
// Calls Display Function
displayViewers()
}, 5000);```
The issue is that you are using equality operator === on objects. two objects are equal if they have the same reference. While you want to know if they are identical. Check this:
console.log({} === {})
For your usecase you might want to store stringified version of the previousResponse and compare it with stringified version of the new object (usersInChannel.data.chatters.viewers) like:
console.log(JSON.stringify({}) === JSON.stringify({}))
Note: There can be issues with this approach too, if the order of property changes in the response. In which case, you'd have to check individual properties within the response objects.
May be you can use npm packages like following
https://www.npmjs.com/package/#radarlabs/api-diff

various axios call relative at previous one

i have an array of variable number of urls and i must merge the data get with axion
the problem is then every axios call is relative to the data of the previus
if i have a fixed number of ulrs i can nest axion calls and live with that
i think to use something like this
var urls = ["xx", "xx", "xx"];
mergeData(urls);
function mergeData(myarray, myid = 0, mydata = "none") {
var myurl = "";
if (Array.isArray(mydata)) {
myurl = myarray[myid];
// do my stuff with data and modify the url
} else {
myurl = myarray[myid];
}
axios.get(myurl)
.then(response => {
// do my stuff and get the data i need and put on an array
if (myarray.length < myid) {
mergeData(myarray, myid + 1, data);
} else {
// show result on ui
}
})
.catch(error => {
console.log(error);
});
}
but i dont like it
there is another solution?
(be kind, i'm still learning ^^)
just to be clear
i need to optain
http request to "first url", parse the the json, save some data(some needed for the output)
another http request to "second url" with one or more parameter from previous data, parse the the json, save some data(some needed for the output)
... and so on, for 5 to 10 times
If your goal is to make subsequent HTTP calls based on information you get from previous calls, I'd utilize async/await and for...of to accomplish this instead of relying on a recursive solution.
async function mergeData(urls) {
const data = [];
for (const url of urls) {
const result = await axios.get(url).then(res => res.data);
console.log(`[${result.id}] ${result.title}`);
// here, do whatever you want to do with
// `result` to make your next call...
// for now, I am just going to append each
// item to `data` and return it at the end
data.push(result);
}
return data;
}
const items = [
"https://jsonplaceholder.typicode.com/posts/1",
"https://jsonplaceholder.typicode.com/posts/2",
"https://jsonplaceholder.typicode.com/posts/3"
];
console.log("fetching...")
mergeData(items)
.then(function(result) {
console.log("done!")
console.log("final result", result);
})
.catch(function(error) {
console.error(error);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/axios/0.19.2/axios.min.js"></script>
Using async/await allows you to utilize for...of which will wait for each call to resolve or reject before moving onto the next one.
To learn more about async/await and for...of, have a look here:
developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function
developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of
Hope this helps.

How to I extract the contents of a variable and place them into a constant? Node.js

Im trying to extract the contents of variable topPost and place it into const options under url. I cant seem to get it to work. Im using the snoowrap/Reddit API and image-downloader.
var subReddit = r.getSubreddit('dankmemes');
var topPost = subReddit.getTop({time: 'hour' , limit: 1,}).map(post => post.url).then(console.log);
var postTitle = subReddit.getTop({time: 'hour' , limit: 1 }).map(post => post.title).then(console.log);
const options = {
url: topPost,
dest: './dank_memes/photo.jpg'
}
async function downloadIMG() {
try {
const { filename, image } = await download.image(options)
console.log(filename) // => /path/to/dest/image.jpg
} catch (e) {
console.error(e)
}
}
the recommended formatting for the image downloader is as follows:
const options = {
url: 'http://someurl.com/image.jpg',
dest: '/path/to/dest'
}
async function downloadIMG() {
try {
const { filename, image } = await download.image(options)
console.log(filename) // => /path/to/dest/image.jpg
} catch (e) {
console.error(e)
}
}
downloadIMG()
so it looks like i have to have my url formatted in between ' ' but i have no idea how to get the url from var topPost and place it in between those quotes.
any ideas would be greatly appreciated.
Thanks!
topPost is a Promise, not the final value.
Promises existence is to work with asynchronous data easily. Asynchronous data is data that returns at a point in the future, not instantly, and that's why they have a then method. When a Promise resolves to a value, the then callback is called.
In this case, the library will connect to Reddit and download data from it, which is not something that can done instantly, so the code will continue running and later will call the then callback, when the data has finished downloading. So:
var subReddit = r.getSubreddit('dankmemes');
// First we get the top posts, and register a "then" callback to receive all these posts
subReddit.getTop({time: 'hour' , limit: 1,}).map(post => post.url).then((topPost) => {
// When we got the top posts, we connect again to Reddit to get the top posts title.
subReddit.getTop({time: 'hour' , limit: 1 }).map(post => post.title).then((postTitle) => {
// Here you have both topPost and postTitle (which will be both arrays. You must access the first element)
console.log("This console.log will be called last");
});
});
// The script will continue running at this point, but the script is still connecting to Reddit and downloading the data
console.log("This console.log will be called first");
With this code you have a problem. You first connect to Reddit to get the top post URL, and then you connect to Reddit again to get the post Title. Is like pressing F5 in between. Simply think that if a new post is added between those queries, you will get the wrong title (and also you are consuming double bandwidth consumption, which is not optimal too). The correct way of doing this is to get both the title and the url on the same query. How to do so?, like this:
var subReddit = r.getSubreddit('dankmemes');
// We get the top posts, and map BOTH the url and title
subReddit.getTop({time: 'hour' , limit: 1,}).map(post => {
return {
url: post.url,
title: post.title
};
}).then((topPostUrlAndTitle) => {
// Here you have topPostUrlAndTitle[0].url and topPostUrlAndTitle[0].title
// Note how topPostUrlAndTitle is an array, as you are actually asking for "all top posts" although you are limiting to only one.
});
BUT this is also weird to do. Why don't you just get the post data directly? Like so:
var subReddit = r.getSubreddit('dankmemes');
// We get the top posts
subReddit.getTop({time: 'hour' , limit: 1,}).then((posts) => {
// Here you have posts[0].url and posts[0].title
});
There's a way to get rid of JavaScript callback hell with async/await, but I'm not going to enter into matter because for a newbie is a bit difficult to explain why is not synchronous code although it seems to look like so.

Resources