Getting error testing readstream response with Jest - node.js

I am getting a timeout error while testing a function that returns a readStream from Google Cloud Storage.
// Test case
import { createResponse } from 'node-mocks-http';
it('getImage from google cloud storage', async done => {
  let response: Response = createResponse();
  let data = await controller.getImage({ image: "text data" }, object, response);
});
// Function that calls Google Cloud Storage and returns the piped stream.
public async getImage(imageReqDto, baseDto, response) {
  let bucket = this.storage.bucket(process.env.IMAGE_BUCKET);
  let file = bucket.file(imageReqDto.image);
  return file.createReadStream().pipe(response);
}
Is there any way to make the test case pass successfully after receiving the buffer data?

It depends on what you're testing for. If you just want to make sure a ReadStream is returned, try using .toBeInstanceOf():
it('readstream is returned, when .getImage() is invoked with image data', async () => {
  const response: Response = createResponse();
  const stream = await controller.getImage({ image: "text data" }, object, response);
  expect(stream).toBeInstanceOf(ReadStream);
});
However, a better test would be to know the expected response and assert that the ReadStream output matches an expected value. If it's a small image, you could load it into a buffer and compare. If the image is large, you may consider storing it as base64 and comparing the actual with the expected.
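For the buffer approach, a minimal sketch could look like the following. It assumes a small fixture image on disk (the path is hypothetical), and it swaps the mocked response for a PassThrough stream so the streamed bytes can be collected and compared; the controller still needs its bucket mocked or a real test bucket available, and `object` is the same placeholder DTO as in the snippets above.

import { readFileSync } from 'fs';
import { join } from 'path';
import { PassThrough } from 'stream';

it('getImage() streams the expected image bytes', async () => {
  // hypothetical fixture: a local copy of the same small image stored in the bucket
  const expected = readFileSync(join(__dirname, 'fixtures', 'small.png'));

  const sink = new PassThrough(); // stand-in for the HTTP response stream
  const chunks = [];
  sink.on('data', chunk => chunks.push(chunk));
  const finished = new Promise((resolve, reject) => {
    sink.on('end', resolve);
    sink.on('error', reject);
  });

  await controller.getImage({ image: 'small.png' }, object, sink);
  await finished;

  expect(Buffer.concat(chunks).equals(expected)).toBe(true);
});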

Related

Why does my AWS Lambda function randomly fail when using private ElastiCache network calls as well as external API calls?

I am trying to write a caching function that returns cached ElastiCache data or makes an API call to retrieve that data. However, the Lambda function seems to be very unreliable and often times out.
It seems that having Redis calls as well as public API calls causes the issue. I can confirm that I have set up AWS correctly, with a subnet with an internet gateway and a private subnet with a NAT gateway. The function works, but only 10% of the time. The remaining times, execution stops right before making the API call.
I have also noticed that the API calls fail after creating the Redis client. If I make the external API call prior to the Redis check, the function is a lot more reliable and doesn't time out.
Not sure what to do. Is it best practice to separate these two tasks, or am I doing something wrong?
let data = null;

module.exports.handler = async (event) => {
  //context.callbackWaitsForEmptyEventLoop = false;
  let client;
  try {
    client = new Redis(
      6379,
      "redis://---.---.ng.0001.use1.cache.amazonaws.com"
    );
    client.get(event.token, async (err, result) => {
      if (err) {
        console.error(err);
      } else {
        data = result;
        await client.quit();
      }
    });
    if (data && new Date().getTime() / 1000 - eval(data).timestamp < 30) {
      res.send(`({
        "address": "${token}",
        "price": "${eval(data).price}",
        "timestamp": "${eval(data).timestamp}"
      })`);
    } else {
      getPrice(event); //fetch api data
    }
There are a lot of misunderstandings in your code. I'll try to guide you through fixing them and explain how to do it correctly.
You are mixing asynchronous and synchronous code in your function.
You should use JSON.parse instead of eval to parse the data, because eval allows arbitrary code to be executed in your function.
You're using res.send to return the response to the client instead of the callback. Remember that res.send only exists in Express; in a Lambda you need to return the result to the client through the callback function.
To help you with this, I have rewritten your code with these misunderstandings resolved.
const Redis = require('ioredis');

module.exports.handler = async (event, context, callback) => {
  // prefer to use Lambda environment variables instead of hard-coding values
  const client = new Redis(
    process.env.REDIS_PORT,
    process.env.REDIS_HOST
  );
  const data = await client.get(event.token);
  await client.quit();

  const parsedData = JSON.parse(data);
  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    callback(null, {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp
    });
  } else {
    const dataFromApi = await getPrice(event);
    callback(null, dataFromApi);
  }
};
There is another pattern where the Lambda handler returns an object instead of passing one to the callback, but I think you get the idea and understand the mistakes.
Follow the docs on the correct usage of Lambda:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/using-lambda-functions.html
To understand more about async and sync in JavaScript:
https://www.freecodecamp.org/news/synchronous-vs-asynchronous-in-javascript/
JSON.parse vs. eval: JSON.parse vs. eval()
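For completeness, a rough sketch of that other pattern, returning directly from the async handler instead of calling the callback (the environment variable names are assumptions you would configure on the function):

const Redis = require('ioredis');

module.exports.handler = async (event) => {
  // assumed env vars; set REDIS_PORT and REDIS_HOST in the Lambda configuration
  const client = new Redis(Number(process.env.REDIS_PORT), process.env.REDIS_HOST);

  const data = await client.get(event.token);
  await client.quit();

  const parsedData = data ? JSON.parse(data) : null;
  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    // whatever an async handler returns becomes the invocation result
    return {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp,
    };
  }
  return getPrice(event); // fall back to fetching fresh data from the API
};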

How do I pass parameters to Apify BasicCrawler handleRequestFunction?

I'm trying to migrate an existing function to use it inside an Apify actor.
Originally, the function loads a given URL, reads its JSON response, and according to some supplied parameters, extracts some data and returns an object with results.
In case you're wondering, it's not scraping anything "final" at this point. Its results are temporary and will be used to build other URLs, which will then be scraped (with another crawler) for the actual, useful results.
The current function that executes the crawler is something like this:
let url = new URL('/content', someBaseURL);
url.searchParams.set('search', someKeyword);

const reqList = new apify.RequestList({
  sources: [{ url: url.toString() }]
});
await reqList.initialize();

const crawler = new apify.BasicCrawler({
  requestList: reqList,
  handleRequestFunction: reqHandler
});

// How do I set the inputs for reqHandler() here?
await crawler.run();
// How do I get the output from reqHandler() here?
And the reqHandler code is something like this:
async function reqHandler(options) {
  const response = await apify.utils.requestAsBrowser({
    url: options.request.url
  });
  // How do I read parameters from the caller here?
  let searchResults = JSON.parse(response.body);
  // ... result object creation logic goes here ...
  // How do I return a result to the caller here?
}
I am pretty new to this Apify thing and lost in the documentation.
Thanks for your help.
handleRequestFunction doesn't take any external input or produce any outputs. Simply use it as a closure and capture inputs from the surrounding code, or wrap it in a different function.
Normally we do it like this:
const context = {}; // put your inputs here
const crawler = new apify.BasicCrawler({
requestList: reqList,
handleRequestFunction: async () => {
// use context here
// output data
await Apify.pushData(results);
}
});
EDIT: I forgot to mention how to pass per-request input. You need to do it via the request.userData object when adding a request to a queue or a list.
// The same userData option is available when adding to a request list.
await requestQueue.addRequest({
  url: 'https://example.com',
  userData: { myInput: 'any-data' }
});

// Then in handleRequestFunction
handleRequestFunction: async ({ request }) => {
  const { myInput } = request.userData;
  // ...
}
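Putting the two halves together, a rough end-to-end sketch (assuming the Apify SDK v1 helpers openRequestList, pushData and openDataset, and reusing the url and someKeyword from your snippet) might look like this:

const Apify = require('apify');

Apify.main(async () => {
  const requestList = await Apify.openRequestList('start-urls', [
    { url: url.toString(), userData: { searchKeyword: someKeyword } }, // input goes in userData
  ]);

  const crawler = new Apify.BasicCrawler({
    requestList,
    handleRequestFunction: async ({ request }) => {
      const { searchKeyword } = request.userData;                      // read the input
      const response = await Apify.utils.requestAsBrowser({ url: request.url });
      const searchResults = JSON.parse(response.body);
      await Apify.pushData({ searchKeyword, searchResults });          // write the output
    },
  });

  await crawler.run();

  // read the output back out of the default dataset after the run
  const dataset = await Apify.openDataset();
  const { items } = await dataset.getData();
  console.log(items);
});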

Firebase function Node.js transform stream

I'm creating a Firebase HTTP Function that makes a BigQuery query and returns a modified version of the query results. The query potentially returns millions of rows, so I cannot store the entire query result in memory before responding to the HTTP client. I am trying to use Node.js streams, and since I need to modify the results before sending them to the client, I am trying to use a transform stream. However, when I try to pipe the query stream through my transform stream, the Firebase Function crashes with the following error message: finished with status: 'response error'.
My minimal reproducible example is as follows. I am using a buffer, because I don't want to process a single row (chunk) at a time, since I need to make asynchronous network calls to transform the data.
return new Promise(async (resolve, reject) => { // async executor so the awaits below are legal
  const buffer = new Array(5000)
  let bufferIndex = 0
  const [job] = await bigQuery.createQueryJob(options)
  const bqStream = job.getQueryResultsStream()

  const transformer = new Transform({
    writableObjectMode: true,
    readableObjectMode: false,
    transform(chunk, enc, callback) {
      buffer[bufferIndex] = chunk
      if (bufferIndex < buffer.length - 1) {
        bufferIndex++
      }
      else {
        this.push(JSON.stringify(buffer).slice(1, -1)) // Transformation should happen here.
        bufferIndex = 0
      }
      callback()
    },
    flush(callback) {
      if (bufferIndex > 0) {
        this.push(JSON.stringify(buffer.slice(0, bufferIndex)).slice(1, -1))
      }
      this.push("]")
      callback()
    },
  })

  bqStream
    .pipe(transformer)
    .pipe(response)
  bqStream.on("end", () => {
    resolve()
  })
})
I cannot store the entire query result in memory before responding to the HTTP client
Unfortunately, when using Cloud Functions, this is precisely what must happen.
There is a documented limit of 10MB for the response payload, and that is effectively stored in memory as your code continues to write to the response. Streaming of requests and responses is not supported.
One alternative is to write your response to an object in Cloud Storage, then send a link or reference to that file to the client so it can read the response fully from that object.
If you need to send a large streamed response, Cloud Functions is not a good choice. Neither is Cloud Run, which is similarly limited. You will need to look into other solutions that allow direct socket access, such as Compute Engine.
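A rough sketch of that alternative, piping the transformed results into a Cloud Storage object and returning a short-lived signed URL instead of the data itself (the bucket and object names are placeholders):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function writeResultsAndRespond(bqStream, transformer, response) {
  // placeholder bucket/object names
  const file = storage.bucket('my-results-bucket').file(`query-results/${Date.now()}.json`);

  // stream BigQuery rows through the transform and into Cloud Storage
  await new Promise((resolve, reject) => {
    bqStream
      .pipe(transformer)
      .pipe(file.createWriteStream({ metadata: { contentType: 'application/json' } }))
      .on('finish', resolve)
      .on('error', reject);
  });

  // hand the client a link to the full result instead of the payload itself
  const [url] = await file.getSignedUrl({
    action: 'read',
    expires: Date.now() + 60 * 60 * 1000, // 1 hour
  });
  response.json({ resultsUrl: url });
}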
I tried to implement the workaround as suggested by Doug Stevenson and got the following error:
@firebase/firestore: Firestore (9.8.2):
Connection GRPC stream error.
Code: 3
Message: 3 INVALID_ARGUMENT: Request payload size exceeds the limit: 11534336 bytes.
I created a workaround to store data in Firestore first. It works fine when the content size is below 10MB.
import * as firestore from "firebase/firestore";
import { initializeApp } from "firebase/app";
import { firebaseConfig } from '../conf/firebase'

// Initialize Firebase
const app = initializeApp(firebaseConfig);
const fs = firestore.getFirestore(app);

export async function storeStudents(data, context) {
  const students = await api.getTermStudents()
  const batch = firestore.writeBatch(fs);
  students.forEach((student) => {
    const ref = firestore.doc(fs, 'students', student.studentId)
    batch.set(ref, student)
  })
  await batch.commit()
  return 'stored'
}

exports.getTermStudents = functions.https.onCall(storeStudents);
UPDATE:
To bypass Firestore's limit when using the batch function, I just looped through the array and set (add/update) each document individually. setDoc() creates or overwrites a single document.
export async function storeStudents(data, context) {
  const students = await api.getTermStudents({ images: true })
  students.forEach((student: Student) => {
    const ref = firestore.doc(fs, 'students', student.student_id)
    firestore.setDoc(ref, student)
  })
  return 'stored'
}
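For reference, a sketch of keeping batched writes but splitting them into chunks is shown below; Firestore batches are capped at 500 operations, and smaller chunks also keep each request payload well under the gRPC size limit (the chunk size here is arbitrary, and the field names mirror the snippet above):

export async function storeStudentsInChunks(students, chunkSize = 400) {
  for (let i = 0; i < students.length; i += chunkSize) {
    const batch = firestore.writeBatch(fs);
    students.slice(i, i + chunkSize).forEach((student) => {
      const ref = firestore.doc(fs, 'students', student.student_id);
      batch.set(ref, student);
    });
    await batch.commit(); // commit each chunk before starting the next one
  }
  return 'stored';
}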

Lambda function only putting one data point into InfluxDB

I have a Lambda function that is designed to take a message from an SQS queue and then write a value called perf_value, which is just an integer, into InfluxDB. The CloudWatch logs show it firing each time and logging Done, as seen in the .then() block of my write point. Even though it fires each time, I am still only seeing a single data point in InfluxDB Cloud. I can't figure out why it is only inputting a single value and then nothing after that. I don't see a backlog in SQS and no error messages in CloudWatch either. I'm guessing it is a code issue or an InfluxDB Cloud setup issue, though I used the defaults, which you would expect to work for multiple data points.
'use strict';
const { InfluxDB, Point, HttpError } = require('@influxdata/influxdb-client')

const InfluxURL = 'https://us-west-2-1.aws.cloud2.influxdata.com'
const token = '<my token>=='
const org = '<my org>'
const bucket = '<bucket name>'
const writeApi = new InfluxDB({ url: InfluxURL, token }).getWriteApi(org, bucket, 'ms')

module.exports.perf = function (event, context, callback) {
  context.callbackWaitsForEmptyEventLoop = false;
  let input = JSON.parse(event.Records[0].body);
  console.log(input)

  const point = new Point('elapsedTime')
    .tag(input.monitorID, 'monitorID')
    .floatField('elapsedTime', input.perf_value)
    // .timestamp(input.time)

  writeApi.writePoint(point)
  writeApi
    .close()
    .then(() => {
      console.log('Done')
    })
    .catch(e => {
      console.error(e)
      if (e instanceof HttpError && e.statusCode === 401) {
        console.log('Unauthorized request')
      }
      console.log('\nFinished ERROR')
    })
  return true
};
EDIT:
I have still been unable to resolve the issue. I can get one data point to go into InfluxDB and then nothing shows up after that.
@Joshk132 -
I believe the problem is here:
writeApi
  .close() // <-- here
  .then(() => {
    console.log('Done')
  })
You are closing the API client object after the first write, so you are only able to write once. You can use flush() instead if you want to force sending the point immediately while keeping the client open for later writes.
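A minimal sketch of that change, reusing the configuration constants from the question, keeping the write API open across invocations and flushing after each point (the handler is switched to async for the await; note the tag name goes first, then the value):

const { InfluxDB, Point } = require('@influxdata/influxdb-client');

const writeApi = new InfluxDB({ url: InfluxURL, token }).getWriteApi(org, bucket, 'ms');

module.exports.perf = async (event) => {
  const input = JSON.parse(event.Records[0].body);

  const point = new Point('elapsedTime')
    .tag('monitorID', input.monitorID)   // tag name first, then value
    .floatField('elapsedTime', input.perf_value);

  writeApi.writePoint(point);
  await writeApi.flush();                // push the buffered point now, keep the client open
  return true;
};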

Handling Async Functions and Routing in Formidable / Extracting text in PDFReader

I'm creating an application where users upload a PDF and its text is extracted into JSON format. I am able to access the text, but I can't hold the response until the PDF extraction is complete. I'm unfamiliar with Formidable, and I may be missing something entirely.
I am using Formidable for uploading and PDFReader for text extraction. The front-end and back-end are on separate servers, and the app is only intended for local use, so that shouldn't be an issue. I'm able to console.log the text perfectly. I would like to work with the text in JSON format in some way. I would like to append the text to the response back to the front-end, but I can't seem to hold it until the response is sent.
const IncomingForm = require("formidable").IncomingForm;
const { PdfReader } = require('pdfreader');
const test = new PdfReader(this, 1);

module.exports = function upload(req, res) {
  let str = ''
  let form = new IncomingForm();

  form.parse(req, () => {
    console.log('parse')
  });

  form.on("file", (field, file) => {
    test.parseFileItems(file.path, (err, item) => {
      if (err) {
        console.log(err)
      }
      else if (item) {
        if (item.text) {
          console.log(item.text)
          str += item.text
        }
      }
    })
  });

  form.on("end", () => {
    console.log("reached end/str: ", str)
  });
};
I've attempted a number of different ways of handling the async functions, primarily within form.on('file'). The following attempts at form.on('file') produce the same effect (the text is console.logged correctly, but only after form.on('end') is hit):
// Attempt 1: making the callback to form.on('file') async, then a traditional await
form.on("file", async (field, file) => {
  // ...
  await test.parseFileItems(...)
  // ...
  console.log(str) // After the end of the PdfReader code, shows blank

// Attempt 2: making the callback async, then manually creating a promise
form.on("file", async (field, file) => {
  // ...
  let textProm = await new Promise((res, rej) => // ...
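Fleshed out, that promise-based attempt looks roughly like this (a sketch of the idea; it assumes PdfReader signals end-of-file by invoking the callback with no item, as its README shows, and an Express-style res for the reply):

form.on("file", (field, file) => {
  const textProm = new Promise((resolve, reject) => {
    let text = '';
    test.parseFileItems(file.path, (err, item) => {
      if (err) return reject(err);
      if (!item) return resolve(text);   // no item means end of file
      if (item.text) text += item.text;
    });
  });

  textProm
    .then(text => res.json({ text }))    // respond only once extraction is complete
    .catch(err => res.status(500).json({ error: err.message }));
});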
I've also attempted to convert the text manually from the Buffer using fs.readFile, but this also produces the same effect; I can only access text after form.end is hit.
A few things I see are that form.on('file') is hit first, then form.parse. It seems I may be parsing the document twice (once with Formidable and once with PdfReader), but this is probably necessary.
Also, after reading through the docs and Stack Overflow, I think I'm mixing the built-in form.parse/form.on/form.end handling with manual callbacks, but I was unsure of how to stick with just one, and I'm still able to access the text.
Finally, PDFReader accesses text one line at a time, so parseFileItems is run for every line. I've attempted to resolve a Promise.all with the PdfReader instance, but I couldn't get it to work.
Any help would be greatly appreciated!
