Handling Async Functions and Routing in Formidable / Extracting text in PDFReader - node.js

I'm creating an application where users upload a PDF and the app extracts the text into JSON format. I am able to access the text, but I can't hold the response until the PDF extraction is complete. I'm unfamiliar with Formidable and I may be missing something entirely.
I am using Formidable for uploading and PDFReader for text extraction. The front-end and back-end are on separate servers, and the app is only intended for local use, so that shouldn't be an issue. I'm able to console.log the text perfectly. I would like to work with the text in JSON format and append it to the response sent back to the front-end, but I can't seem to delay the response until the text is ready.
const IncomingForm = require("formidable").IncomingForm;
const { PdfReader } = require('pdfreader');
const test = new PdfReader(this,1);
module.exports = function upload(req, res) {
let str = ''
let form = new IncomingForm();
form.parse(req, () => {
console.log('parse')
});
form.on("file", (field, file) => {
test.parseFileItems(file.path, (err, item) => {
if (err){
console.log(err)
}
else if (item){
if (item.text){
console.log(item.text)
str += item.text
}
}
})
});
form.on("end", () => {
console.log("reached end/str: ", str)
});
};
I've attempted a number of different ways of handling the async functions, primarily within form.on('file'). The following attempts at form.on('file') produce the same effect (the text is console.logged correctly, but only after form.on('end') is hit):
//Making the callback to form.on('file') async then traditional await
form.on("file", async (field, file) => {
//...
await test.parseFileItems(...)
//...
console.log(str) //After end of PDFReader code, shows blank
//Making cb async, then manually creating promise
form.on("file", async (field, file) => {
//...
let textProm = await new Promise ((res, rej) => //...
I've also attempted to convert the text manually from the Buffer using fs.readFile, but this also produces the same effect; I can only access text after form.end is hit.
One thing I see is that form.on('file') is hit first, then form.parse. It seems I may be parsing the document twice (once with Formidable and once with PDFReader), but this is probably necessary.
Also, after reading through the docs and Stack Overflow, I think I'm mixing the built-in form.parse/form.on/form.end handling with manual callbacks, but I was unsure how to stick with just one approach while still being able to access the text.
Finally, PDFReader accesses text one line at a time, so parseFileItems is run for every line. I've attempted to resolve a Promise.all with the PdfReader instance, but I couldn't get it to work.
Any help would be greatly appreciated!
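One way to hold the response is to wrap the per-item callback in a Promise that resolves only when the parser signals the end of the file. The sketch below assumes the convention shown in the pdfreader README, where the callback is invoked with a falsy item once the file is finished. `extractText` and `fakeReader` are names made up for this example; the stand-in reader only imitates the callback pattern so the snippet runs without a PDF.

```javascript
// Collects every text item into one string and resolves when the parser
// reports the end of the file (assumed: the callback fires with a falsy item).
function extractText(reader, filePath) {
  return new Promise((resolve, reject) => {
    let text = '';
    reader.parseFileItems(filePath, (err, item) => {
      if (err) reject(err);
      else if (!item) resolve(text);          // end of file
      else if (item.text) text += item.text;  // one text item at a time
    });
  });
}

// Hypothetical stand-in for `new PdfReader()`, for demonstration only.
const fakeReader = {
  parseFileItems(_filePath, callback) {
    setTimeout(() => {
      callback(null, { text: 'Hello ' });
      callback(null, { text: 'PDF' });
      callback(null, undefined); // signals end of file
    }, 10);
  },
};

extractText(fakeReader, 'ignored.pdf')
  .then((text) => console.log(JSON.stringify({ text }))); // {"text":"Hello PDF"}
```

In the upload handler, one option would be to push `extractText(reader, file.path)` into an array inside form.on("file"), and in form.on("end") wait for `Promise.all` on that array before calling res.json. This is untested against the setup in the question.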

Related

Dynamic Slash Command Options List via Database Query?

Background:
I am building a discord bot that operates as a Dungeons & Dragons DM of sorts. We want to store game data in a database and during the execution of certain commands, query data from said database for use in the game.
All of the connections between our Discord server, our VPS, and the VPS' backend are functional and we are now implementing slash commands since traditional ! commands are being removed from support in April.
We are running into problems making the slash commands though. We want to set them up to be as efficient as possible which means no hard-coded choices for options. We want to build those choice lists via data from the database.
The problem we are running into is that we can't figure out the proper way to implement the fetch to the database within the SlashCommandBuilder.
Here is what we currently have:
const {SlashCommandBuilder} = require('@discordjs/builders');
const fetch = require('node-fetch');
const {REST} = require('@discordjs/rest');
const test = require('../commonFunctions/test.js');
var options = async function getOptions(){
let x = await test.getClasses();
console.log(x);
return ['test','test2'];
}
module.exports = {
data: new SlashCommandBuilder()
.setName('get-test-data')
.setDescription('Return Class and Race data from database')
.addStringOption(option =>{
option.setName('class')
.setDescription('Select a class for your character')
.setRequired(true)
for(let op of options()){
//option.addChoice(op,op);
}
return option
}
),
async execute(interaction){
},
};
This code produces the following error when we start the bot with npm on our server:
options is not a function or its return value is not iterable
I thought that maybe the function wasn't properly defined, so I replaced its contents with a simple array return. The bot then started without errors and the values I had passed showed up in the server.
This leads me to think that the function call in the module.exports block immediately attempts to use the return value of the function and, as the function is async, the value isn't ready yet: it is undefined, a promise, or something else that is not iterable.
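That suspicion can be checked in isolation. The snippet below is a minimal sketch with the database call removed: calling an async function always returns a Promise, and a Promise has no iterator, so for...of over it throws. The array only becomes available inside then (or after await).

```javascript
// Stripped-down version of the options function from the question.
async function getOptions() {
  return ['test', 'test2'];
}

const result = getOptions();
console.log(result instanceof Promise);      // true
console.log(typeof result[Symbol.iterator]); // undefined, so for...of cannot iterate it

// The array is only available once the promise has resolved.
getOptions().then((options) => {
  for (const op of options) {
    console.log(op);
  }
});
```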
Is there a proper way to implement the code as shown? Or is this way too complex for discord.js to handle?
Is there a proper way to implement the idea at all? Like creating a json object that contains the option data which is built and saved to a file at some point prior to this command being registered and then having the code above just pull in that file for the option choices?
Alright, I found a way. Ian Malcolm would be proud (LMAO).
Here is what I had to do, for those with a similar issue:
I had to basically re-write our entire application. It sucks, I know, but it works so who cares?
When you run your index file for your npm, make sure that you do the following things.
Note: you can structure this however you want, this is just how I set up my js files.
Setup a function that will setup the data you need, it needs to be an async function as does everything downstream from this point on relating to the creation and registration of the slash commands.
Create a js file to act as your application setup "module". "Module" because we're faking a real module by just using the module.exports method. No package.jsons needed.
In the setup file, you will need two requires. The first is a, as of yet, non-existent data manager file; we'll do that next. The second is a require for node:fs.
Create an async function in your setup file called setup and add it to your module.exports like so:
module.exports = { setup }
In your async setup function, or in a function that it calls, make a call to the function in your still non-existent data manager file. Use await so that the application doesn't proceed until something is returned. Here is what mine looks like. Note that I am writing my data to a file to read in later because of my use case; you may or may not have to do the same for yours:
async function setup(){
console.log('test');
//build option choice lists
let listsBuilt = await buildChoiceLists();
if (listsBuilt){
return true;
} else {
return false;
}
}
async function buildChoiceLists(){
let classListBuilt = await buildClassList();
return true;
}
async function buildClassList(){
let classData = await classDataManager.getClassData();
console.log(classData);
classList = classData;
await writeFiles();
return true;
}
async function writeFiles(){
fs.writeFileSync('./CommandData/classList.json', JSON.stringify(classList));
}
Before we finish off this file, if you want to store anything as a property in this file and then get it later on, you can do so. In order for the data to return properly though, you will need to define a getter function in your exports. Here is an example:
var classList;
module.exports={
getClassList: () => classList,
setup
};
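To see why the getter matters, here is a small single-file sketch (the two export objects are stand-ins for module.exports, not part of the original code). Putting the variable itself on the exports object copies whatever value it holds at that moment, while a getter function reads the variable each time it is called.

```javascript
let classList; // filled in later, as setup() does in the real file

// Copies the current value of classList (undefined) into the object.
const exportsByValue = { classList };

// Reads the variable on every call, so later assignments are visible.
const exportsByGetter = { getClassList: () => classList };

// Simulates what setup() does after the data has been fetched.
classList = [{ name: 'Wizard', code: 'wizard' }];

console.log(exportsByValue.classList);      // undefined
console.log(exportsByGetter.getClassList()); // [ { name: 'Wizard', code: 'wizard' } ]
```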
So, with everything above you should have something that looks like this:
const classDataManager = require('./DataManagers/ClassData.js')
const fs = require('node:fs');
var classList;
async function setup(){
console.log('test');
//build option choice lists
let listsBuilt = await buildChoiceLists();
if (listsBuilt){
return true;
} else {
return false;
}
}
async function buildChoiceLists(){
let classListBuilt = await buildClassList();
return true;
}
async function buildClassList(){
let classData = await classDataManager.getClassData();
console.log(classData);
classList = classData;
await writeFiles();
return true;
}
async function writeFiles(){
fs.writeFileSync('./CommandData/classList.json', JSON.stringify(classList));
}
module.exports={
getClassList: () => classList,
setup
};
Next that pesky non-existent DataManager file. For mine, each data type will have its own, but you might want to just combine them all into a single .js file for yours.
Same with the folder name, I called mine DataManagers, if you're combining them all into one, you could just call the file DataManager and leave it in the same folder as your appSetup.js file.
For the data manager file all we really need is a function to get our data and then return it in the format we want it to be in. I am using node-fetch. If you are using some other module for data requests, write your code as needed.
Instead of explaining everything, here are the contents of my file; not much needs explaining:
const fetch = require('node-fetch');
async function getClassData(){
return new Promise((resolve) => {
let data = "action=GetTestData";
fetch('http://xxx.xxx.xxx.xx/backend/characterHandler.php', {
method: 'post',
headers: { 'Content-Type':'application/x-www-form-urlencoded'},
body: data
}).then(response => {
response.json().then(res => {
let status = res.status;
let clsData = res.classes;
let rcData = res.races;
if (status == "Success"){
let text = '';
let classes = [];
let races = [];
if (Object.keys(clsData).length > 0){
for (let key of Object.keys(clsData)){
let cls = clsData[key];
classes.push({
"name": key,
"code": key.toLowerCase()
});
}
}
if (Object.keys(rcData).length > 0){
for (let key of Object.keys(rcData)){
let rc = rcData[key];
races.push({
"name": key,
"desc": rc.Desc
});
}
}
resolve(classes);
}
});
});
});
}
module.exports = {
getClassData
};
This file contacts our backend PHP script and requests data from it. The backend queries the data and returns it. We then format the result into a JSON structure for later use as option choices for the slash command.
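One thing to be aware of in the file above: the promise is only resolved when the status is "Success", so on any other status the caller would wait forever. A possible variant using async/await is sketched below. The `fetchImpl` and `url` parameters and the `fakeFetch` stub are additions for this example so that it runs without a network; the original calls node-fetch directly with its own URL.

```javascript
// Variant of getClassData that rejects instead of hanging on a non-success status.
// `fetchImpl` and `url` are parameters introduced for this sketch only.
async function getClassData(fetchImpl, url) {
  const response = await fetchImpl(url, {
    method: 'post',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: 'action=GetTestData',
  });
  const res = await response.json();
  if (res.status !== 'Success') {
    throw new Error(`Backend returned status: ${res.status}`);
  }
  // Same output shape as the original: one { name, code } entry per class key.
  return Object.keys(res.classes).map((key) => ({
    name: key,
    code: key.toLowerCase(),
  }));
}

// Hypothetical stub that imitates the backend response used in the answer.
const fakeFetch = async () => ({
  json: async () => ({ status: 'Success', classes: { Wizard: {}, Rogue: {} }, races: {} }),
});

getClassData(fakeFetch, 'http://backend.invalid/characterHandler.php')
  .then((classes) => console.log(classes));
```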
Once all of your appSetup and data manager files are complete, we still need to create the commands and register them with the server. So, in your index file add something similar to the following:
async function getCommands(){
let cmds = await comCreator.appSetup();
console.log(cmds);
client.commands = cmds;
}
getCommands();
This should go at or near the top of your index.js file. Note that comCreator refers to a file we haven't created yet; you can name this require const whatever you wish. That's it for this file.
Now, the "comCreator" file. I named mine deploy-commands.js, but you can name it whatever. Once again, here is the full file contents. I will explain anything that needs to be explained after:
const {Collection} = require('discord.js');
const {REST} = require('@discordjs/rest');
const {Routes} = require('discord-api-types/v9');
const app = require('./appSetup.js');
const fs = require('node:fs');
const config = require('./config.json');
async function appSetup(){
console.log('test2');
let setupDone = await app.setup();
console.log(setupDone);
console.log(app.getClassList());
return new Promise((resolve) => {
const cmds = [];
const cmdFiles = fs.readdirSync('./commands').filter(f => f.endsWith('.js'));
for (let file of cmdFiles){
let cmd = require('./commands/' + file);
console.log(file + ' added to commands!');
cmds.push(cmd.data.toJSON());
}
const rest = new REST({version: '9'}).setToken(config.token);
rest.put(Routes.applicationGuildCommands(config.clientId, config.guildId), {body: cmds})
.then(() => console.log('Successfully registered application commands.'))
.catch(console.error);
let commands = new Collection();
for (let file of cmdFiles){
let cmd = require('./commands/' + file);
commands.set(cmd.data.name, cmd);
}
resolve(commands);
});
}
module.exports = {
appSetup
};
Most of this is boilerplate for slash command creation, though I did combine the creation and registering of the commands into the same process. As you can see, we are grabbing our command files, converting them to JSON, registering them, building a collection from them, and then resolving the promise with that collection.
You might have noticed that this collection is then used to set client.commands in the index.js file.
Config just contains your connection details for your discord server app.
Finally, how I accessed the data we wrote for the SlashCommandBuilder:
data: new SlashCommandBuilder()
.setName('get-test-data')
.setDescription('Return Class and Race data from database')
.addStringOption(option =>{
option.setName('class')
.setDescription('Select a class for your character')
.setRequired(true)
let ops = [];
let data = fs.readFileSync('./CommandData/classList.json','utf-8');
ops = JSON.parse(data);
console.log('test data class options: ' + ops);
for(let op of ops){
option.addChoice(op.name,op.code);
}
return option
}
),
Hopefully this helps someone in the future!

"Failed to pipe. The response has been emitted already" when reading a stream (nodejs)

So my code is supposed to read some lines from a CSV file, convert them to an array of JSON objects, and return that array.
To read the file as a stream, I am using got, and then using it in fast-csv.
In order to return the resulting array, I put the entire thing into a Promise like this:
async GetPage() : Promise<{OutputArray:any[], StartingIndex:number}>{
return new Promise(async (resolve, reject) => {
const output:any[] = [];
const startingIndex = this.currentLocation;
try{
parseStream(this.source, {headers:true, maxRows:this.maxArrayLength, skipRows:this.currentLocation, ignoreEmpty:true, delimiter:this.delimiter})
.on('error', error => console.log(`parseStream: ${error}`))
.on('data', row => {
const obj = this.unflatten(row); // data is flattened JSON, need to unflatten it
output.push(obj); // append to output array
this.currentLocation++;
})
.on('end', (rowCount: number) => {
console.log(`Parsed ${this.currentLocation} rows`);
resolve({OutputArray:output, StartingIndex:startingIndex});
});
}
catch(ex){
console.log(`parseStream: ${ex}`);
throw new Error(ex);
}
})
}
Now when I call this once (await GetPage()) it works perfectly fine.
The problem is when I'm calling it a second time in a row. I'm getting the following:
UnhandledPromiseRejectionWarning: Error: Failed to pipe. The response has been emitted already.
I've seen a similar case over here: https://github.com/sindresorhus/file-type/issues/342 but from what I gather this is a different case, or rather if it's the same I don't know how to apply the solution here.
The GetPage is a method inside a class CSVStreamParser which is given a Readable in the constructor, and I create that Readable like this: readable:Readable = got.stream(url)
What confuses me is that my first version of GetPage did not include a Promise, but rather accepted a callback (I just sent console.log to test it) and when I called it several times in a row there was no error, but it could not return a value so I converted it to a Promise.
Thank you! :)
EDIT: I have managed to make it work by re-opening the stream at the start of GetPage(), but I am wondering if there is a way to achieve the same result without having to do so? Is there a way to keep the stream open?
First, remove both async keywords, since you are already returning a Promise.
Then remove the try/catch block and throw since you shouldn't throw in a promise. Instead use the reject function.
GetPage() : Promise<{OutputArray:any[], StartingIndex:number}>{
return new Promise((resolve, reject) => {
const output:any[] = [];
const startingIndex = this.currentLocation;
parseStream(this.source, {headers:true, maxRows:this.maxArrayLength, skipRows:this.currentLocation, ignoreEmpty:true, delimiter:this.delimiter})
.on('error', error => reject(error))
.on('data', row => {
const obj = this.unflatten(row); // data is flattened JSON, need to unflatten it
output.push(obj); // append to output array
this.currentLocation++;
})
.on('end', (rowCount: number) => {
console.log(`Parsed ${this.currentLocation} rows`);
resolve({OutputArray:output, StartingIndex:startingIndex});
});
});
}
Here are some resources to help you learn about async functions and promises.

Node JS: How to catch the individual errors while reading files, in case multiple files are read on Promise.all?

I have 10 different files and I need to read their content and merge it into one object (in NodeJS). I am successfully doing that with the code below:
const fs = require('fs');
const path = require('path');
const { promisify } = require("util");
const readFileAsync = promisify(fs.readFile);
let filePathArray = ['path/to/file/one', ... , 'path/to/file/ten'];
Promise.all(
  filePathArray.map(filePath => {
    return readFileAsync(filePath);
  })
).then(responses => { // array of 10 responses
  let combinedFileContent = {};
  responses.forEach((itemFileContent, index) => {
    let tempContent = JSON.parse(itemFileContent);
    // merge tempContent into combinedFileContent
  });
});
But what I wonder is, how to catch if there is some error while trying to read the files? When reading a single file, this works like:
fs.readFile(singleFilePath, (singleFileErr, singleFileContent) => {
if (singleFileErr) {
//do something on error, while trying to read the file
}
});
So my question is: how can I access the error in the first code snippet that corresponds to singleFileErr in the second code snippet?
The issue I am facing is that if one of the files does not exist, I want to check the error and skip that file. Since I cannot detect the error with the current implementation, the whole block crashes and I am not able to merge the other 9 files because of this one. I want to use the error check I mentioned in the second snippet.
Check out the Promise.allSettled function, which will run every Promise passed to it, and will tell you at the end which ones succeeded and which ones failed.
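A minimal sketch of that approach follows. The helper name `readAndMerge`, the use of Object.assign as the merge step, and the temporary demo files are all assumptions made for this example; the question leaves the merge logic unspecified. Each settled result has a status of 'fulfilled' (with a value) or 'rejected' (with a reason), and the results keep the order of the input paths.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Reads every path, merges the files that could be read (assumed to be JSON),
// and reports the paths that failed instead of aborting the whole batch.
async function readAndMerge(filePaths) {
  const results = await Promise.allSettled(
    filePaths.map((filePath) => fs.promises.readFile(filePath, 'utf8'))
  );
  const merged = {};
  const failed = [];
  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      Object.assign(merged, JSON.parse(result.value)); // merge strategy is an assumption
    } else {
      failed.push({ path: filePaths[index], code: result.reason.code });
    }
  });
  return { merged, failed };
}

// Demo: one readable file and one missing file in a temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'merge-'));
const goodFile = path.join(demoDir, 'one.json');
const missingFile = path.join(demoDir, 'missing.json');
fs.writeFileSync(goodFile, JSON.stringify({ a: 1 }));

readAndMerge([goodFile, missingFile]).then(({ merged, failed }) => {
  console.log(merged);                     // { a: 1 }
  console.log(failed.map((f) => f.code));  // [ 'ENOENT' ]
});
```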
Maybe try something like this:
In the map() callback, return a promise that resolves to null if the file is not found.
Introduce a middle stage in the promise chain filtering out null responses.
This would look something like this:
Promise.all(
  filePathArray.map(filePath => {
    return readFileAsync(filePath).catch(function (error) {
      // Treat a missing file as "no content"; rethrow anything else
      if (isErrorFileDoesNotExist(error)) return null;
      throw error;
    });
  })
).then(responses => {
  return responses.filter(response => response != null);
})
.then(filteredResponses => {
  // .. do something
});
Would that work for you? Note that this presupposes you are able to distinguish missing-file errors from the other errors that the promise returned by readFileAsync() may reject with, presumably via the isErrorFileDoesNotExist() function in this snippet.
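The answer leaves isErrorFileDoesNotExist() undefined. One possible implementation, sketched here as an assumption rather than the answerer's own code, relies on the `code` property that Node's fs errors carry: 'ENOENT' means the path does not exist.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Possible implementation of the helper named in the answer:
// true only for "no such file or directory" errors.
function isErrorFileDoesNotExist(error) {
  return Boolean(error) && error.code === 'ENOENT';
}

// Demo: reading a path that does not exist rejects with an ENOENT error.
const absentFile = path.join(os.tmpdir(), 'no-such-dir-for-demo', 'absent.json');
fs.promises.readFile(absentFile)
  .catch((error) => console.log(isErrorFileDoesNotExist(error))); // true
```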

How to I extract the contents of a variable and place them into a constant? Node.js

I'm trying to extract the contents of the variable topPost and place them into const options under url. I can't seem to get it to work. I'm using the snoowrap/Reddit API and image-downloader.
var subReddit = r.getSubreddit('dankmemes');
var topPost = subReddit.getTop({time: 'hour' , limit: 1,}).map(post => post.url).then(console.log);
var postTitle = subReddit.getTop({time: 'hour' , limit: 1 }).map(post => post.title).then(console.log);
const options = {
url: topPost,
dest: './dank_memes/photo.jpg'
}
async function downloadIMG() {
try {
const { filename, image } = await download.image(options)
console.log(filename) // => /path/to/dest/image.jpg
} catch (e) {
console.error(e)
}
}
the recommended formatting for the image downloader is as follows:
const options = {
url: 'http://someurl.com/image.jpg',
dest: '/path/to/dest'
}
async function downloadIMG() {
try {
const { filename, image } = await download.image(options)
console.log(filename) // => /path/to/dest/image.jpg
} catch (e) {
console.error(e)
}
}
downloadIMG()
So it looks like I have to have my url formatted in between ' ', but I have no idea how to get the url from var topPost and place it between those quotes.
Any ideas would be greatly appreciated.
Thanks!
topPost is a Promise, not the final value.
Promises exist to make it easy to work with asynchronous data. Asynchronous data is data that becomes available at some point in the future, not instantly, which is why promises have a then method. When a Promise resolves to a value, the then callback is called.
In this case, the library will connect to Reddit and download data from it, which is not something that can be done instantly, so the code keeps running and calls the then callback later, once the data has finished downloading. So:
var subReddit = r.getSubreddit('dankmemes');
// First we get the top posts, and register a "then" callback to receive all these posts
subReddit.getTop({time: 'hour' , limit: 1,}).map(post => post.url).then((topPost) => {
// When we got the top posts, we connect again to Reddit to get the top posts title.
subReddit.getTop({time: 'hour' , limit: 1 }).map(post => post.title).then((postTitle) => {
// Here you have both topPost and postTitle (which will be both arrays. You must access the first element)
console.log("This console.log will be called last");
});
});
// The script will continue running at this point, but the script is still connecting to Reddit and downloading the data
console.log("This console.log will be called first");
This code has a problem, though. You first connect to Reddit to get the top post's URL, and then you connect to Reddit again to get the post's title. It is like pressing F5 in between: if a new post is added between those two queries, you will get the wrong title (and you also use twice the bandwidth, which is not optimal either). The correct way of doing this is to get both the title and the URL in the same query, like this:
var subReddit = r.getSubreddit('dankmemes');
// We get the top posts, and map BOTH the url and title
subReddit.getTop({time: 'hour' , limit: 1,}).map(post => {
return {
url: post.url,
title: post.title
};
}).then((topPostUrlAndTitle) => {
// Here you have topPostUrlAndTitle[0].url and topPostUrlAndTitle[0].title
// Note how topPostUrlAndTitle is an array, as you are actually asking for "all top posts" although you are limiting to only one.
});
BUT this is also weird to do. Why don't you just get the post data directly? Like so:
var subReddit = r.getSubreddit('dankmemes');
// We get the top posts
subReddit.getTop({time: 'hour' , limit: 1,}).then((posts) => {
// Here you have posts[0].url and posts[0].title
});
There's a way to get rid of JavaScript callback hell with async/await, but I'm not going to go into it here, because it is a bit difficult to explain to a newcomer why the code is still not synchronous even though it looks like it is.
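For readers who do want to see the async/await form, here is a minimal sketch. `getTopPostOptions` and `fakeSubreddit` are names invented for this example; the stub stands in for the object returned by r.getSubreddit('dankmemes') so the snippet runs without Reddit credentials, and it assumes getTop() resolves to an array-like list of posts as in the snippets above. The function still runs asynchronously: await pauses only this function, not the rest of the script.

```javascript
// Builds the options object for image-downloader from the top post.
// `subreddit` is anything with a getTop() method that returns a promise of posts.
async function getTopPostOptions(subreddit) {
  const posts = await subreddit.getTop({ time: 'hour', limit: 1 });
  if (posts.length === 0) {
    throw new Error('No posts returned');
  }
  const { url, title } = posts[0];
  return { title, options: { url, dest: './dank_memes/photo.jpg' } };
}

// Hypothetical stand-in for the snoowrap subreddit object.
const fakeSubreddit = {
  getTop: async () => [{ url: 'http://someurl.com/image.jpg', title: 'Example post' }],
};

getTopPostOptions(fakeSubreddit).then(({ title, options }) => {
  console.log(title, options.url);
});
console.log('This line is logged first');
```

The resulting `options` object has the shape that download.image(options) expects in the question.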

JSON file can't be modified

I am currently creating a website that uses a variable that I store in JSON format. My plan is to modify the value in the JSON file every time there's a connection to a certain route. The problem is that it just won't write.
I have tried fs.writeFile and fs.writeFileSync, but neither seems to work.
// Code I Have tried
const kwitansi = require('./no_kwitansi.json')
app.get('', async (req, res) => {
kwitansi.no_kwitansi += await 1
await fs.writeFile('../no_kwitansi.json', JSON.stringify(kwitansi, null, 2), function (e) {
if (e) {
throw new Error
} else {
console.log('Wrote to file')
}
})
await console.log(kwitansi)
await res.send(kwitansi)
})
// An Example of my JSON File
{
"no_kwitansi":4
}
You are probably writing to a different location than the one you read from, possibly one where you do not have write permission. Note that you loaded ./no_kwitansi.json, but you are trying to write to ../no_kwitansi.json (one directory up). If you are sure you want to overwrite the original file, remove the extra . in the write call.
If the error persists, you also need to be sure that you have the proper permissions to write the file. If you are using *nix or mac, you can check this link.
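Two details make the paths easy to mix up: require() resolves a relative path against the directory of the file that calls it, while fs functions resolve a relative path against the process's current working directory. Building an absolute path once and using it for both reading and writing avoids the mismatch. The sketch below uses a hypothetical helper `incrementCounter` and a temporary demo file. It also uses fs.promises, because the callback form of fs.writeFile does not return a promise, so await has no effect on it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Reads the counter file, increments no_kwitansi, and writes it back
// to the same absolute path.
async function incrementCounter(filePath) {
  const data = JSON.parse(await fs.promises.readFile(filePath, 'utf8'));
  data.no_kwitansi += 1;
  await fs.promises.writeFile(filePath, JSON.stringify(data, null, 2));
  return data;
}

// Demo against a temporary file. In the app, the path could instead be
// path.join(__dirname, 'no_kwitansi.json') to target the file next to the script.
const counterDir = fs.mkdtempSync(path.join(os.tmpdir(), 'kwitansi-'));
const counterFile = path.join(counterDir, 'no_kwitansi.json');
fs.writeFileSync(counterFile, JSON.stringify({ no_kwitansi: 4 }));

incrementCounter(counterFile).then((data) => console.log(data.no_kwitansi)); // 5
```

Reading the file on each request, rather than relying on the object cached by require(), also keeps the in-memory value and the file contents in step.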

Resources