I'm creating a Firebase HTTP Function that runs a BigQuery query and returns a modified version of the query results. The query potentially returns millions of rows, so I cannot store the entire query result in memory before responding to the HTTP client. I am trying to use Node.js streams, and since I need to modify the results before sending them to the client, I am trying to use a transform stream. However, when I try to pipe the query stream through my transform stream, the Firebase Function crashes with the following error message: finished with status: 'response error'.
My minimal reproducible example is as follows. I am using a buffer because I don't want to process a single row (chunk) at a time; I need to make asynchronous network calls to transform the data.
return new Promise(async (resolve, reject) => {
  const buffer = new Array(5000)
  let bufferIndex = 0

  const [job] = await bigQuery.createQueryJob(options)
  const bqStream = job.getQueryResultsStream()

  const transformer = new Transform({
    writableObjectMode: true,
    readableObjectMode: false,
    transform(chunk, enc, callback) {
      buffer[bufferIndex] = chunk
      if (bufferIndex < buffer.length - 1) {
        bufferIndex++
      }
      else {
        this.push(JSON.stringify(buffer).slice(1, -1)) // Transformation should happen here.
        bufferIndex = 0
      }
      callback()
    },
    flush(callback) {
      if (bufferIndex > 0) {
        this.push(JSON.stringify(buffer.slice(0, bufferIndex)).slice(1, -1))
      }
      this.push("]")
      callback()
    },
  })

  bqStream
    .pipe(transformer)
    .pipe(response)

  bqStream.on("end", () => {
    resolve()
  })
})
I cannot store the entire query result in memory before responding to the HTTP client
Unfortunately, when using Cloud Functions, this is precisely what must happen.
There is a documented limit of 10MB for the response payload, and that is effectively stored in memory as your code continues to write to the response. Streaming of requests and responses is not supported.
One alternative is to write your response to an object in Cloud Storage, then send a link or reference to that file to the client so it can read the response fully from that object.
If you need to send a large streamed response, Cloud Functions is not a good choice. Neither is Cloud Run, which is similarly limited. You will need to look into other solutions that allow direct socket access, such as Compute Engine.
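A rough sketch of the Cloud Storage approach could look like this (untested; RESULTS_BUCKET and writeResultsToGcs are hypothetical names, and transformer is the Transform from the question):

const { Storage } = require("@google-cloud/storage")

const storage = new Storage()
// Hypothetical bucket configured via an environment variable.
const bucket = storage.bucket(process.env.RESULTS_BUCKET)

async function writeResultsToGcs(job, transformer) {
  const file = bucket.file(`results/${job.id}.json`)

  // Stream the transformed rows into the Cloud Storage object
  // instead of into the HTTP response.
  await new Promise((resolve, reject) => {
    job.getQueryResultsStream()
        .pipe(transformer)
        .pipe(file.createWriteStream({ contentType: "application/json" }))
        .on("finish", resolve)
        .on("error", reject)
  })

  // Hand the client a time-limited download link instead of the payload itself.
  const [url] = await file.getSignedUrl({
    action: "read",
    expires: Date.now() + 60 * 60 * 1000, // 1 hour
  })
  return url
}

The function then only needs to send the URL (well under the 10MB limit), and the client downloads the full result directly from Cloud Storage.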
I tried to implement the workaround as suggested by Doug Stevenson and got the following error:
@firebase/firestore: Firestore (9.8.2):
Connection GRPC stream error.
Code: 3
Message: 3
INVALID_ARGUMENT: Request payload size exceeds the limit: 11534336 bytes.
I created a workaround to store data in Firestore first. It works fine when the content size is below 10MB.
import * as firestore from "firebase/firestore";
import { initializeApp } from "firebase/app";
import { firebaseConfig } from '../conf/firebase'

// Initialize Firebase
const app = initializeApp(firebaseConfig);
const fs = firestore.getFirestore(app);

export async function storeStudents(data, context) {
  const students = await api.getTermStudents()

  const batch = firestore.writeBatch(fs);
  students.forEach((student) => {
    const ref = firestore.doc(fs, 'students', student.studentId)
    batch.set(ref, student)
  })
  await batch.commit()

  return 'stored'
}

exports.getTermStudents = functions.https.onCall(storeStudents);
UPDATE:
To bypass Firestore's request size limit when using the batch function, I just looped through the array and set (add/update) each document individually. setDoc() creates or overwrites a single document.
export async function storeStudents(data, context) {
  const students = await api.getTermStudents({images: true})

  // Collect the write promises and await them so the function
  // doesn't return before the writes have completed.
  const writes = students.map((student: Student) => {
    const ref = firestore.doc(fs, 'students', student.student_id)
    return firestore.setDoc(ref, student)
  })
  await Promise.all(writes)

  return 'stored'
}
Related
I am trying to write a caching function that returns cached ElastiCache data or makes an API call to retrieve that data. However, the Lambda function seems to be very unreliable and times out often.
It seems that the issue is having Redis calls as well as public API calls in the same function. I can confirm that I have set up AWS correctly, with a subnet with an internet gateway and a private subnet with a NAT gateway. The function works, but only 10% of the time. The remaining times, execution stops right before making the API call.
I have also noticed that the API calls fail after creating the Redis client. If I make the external API call prior to making the Redis check, the function is a lot more reliable and doesn't time out.
Not sure what to do. Is it best practice to separate these two tasks, or am I doing something wrong?
let data = null;

module.exports.handler = async (event) => {
  //context.callbackWaitsForEmptyEventLoop = false;
  let client;
  try {
    client = new Redis(
      6379,
      "redis://---.---.ng.0001.use1.cache.amazonaws.com"
    );
    client.get(event.token, async (err, result) => {
      if (err) {
        console.error(err);
      } else {
        data = result;
        await client.quit();
      }
    });
    if (data && new Date().getTime() / 1000 - eval(data).timestamp < 30) {
      res.send(`({
        "address": "${token}",
        "price": "${eval(data).price}",
        "timestamp": "${eval(data).timestamp}"
      })`);
    } else {
      getPrice(event); //fetch api data
    }
  } catch (err) {
    console.error(err);
  }
};
There are a lot of misunderstandings in your code. I'll try to guide you through fixing them and explain how to do this correctly.
You are mixing asynchronous and synchronous code in your function.
You should use JSON.parse instead of eval to parse the data, because eval allows arbitrary code to be executed in your function.
You're using res.send to return the response to the client instead of the callback. Remember that res.send only exists in Express; you're using a Lambda, and to return the result to the client you need to use the callback function.
To help you with this, I have completely rewritten your code to resolve these misunderstandings.
const Redis = require('ioredis');

module.exports.handler = async (event, context, callback) => {
  // Prefer Lambda environment variables instead of hard-coding values
  const client = new Redis(
    process.env.REDIS_PORT,
    process.env.REDIS_HOST
  );

  const data = await client.get(event.token);
  client.quit();

  const parsedData = JSON.parse(data);

  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    callback(null, {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp
    });
  } else {
    const dataFromApi = await getPrice(event);
    callback(null, dataFromApi);
  }
};
There is another style with Lambdas where you return an object from the handler instead of passing an object to the callback, but I think you get the idea and understand your mistakes.
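As a rough sketch of that object-returning style (reusing the same Redis client, parsedData and getPrice names as above):

module.exports.handler = async (event) => {
  const client = new Redis(process.env.REDIS_PORT, process.env.REDIS_HOST);

  const data = await client.get(event.token);
  await client.quit();

  const parsedData = JSON.parse(data);
  if (parsedData && new Date().getTime() / 1000 - parsedData.timestamp < 30) {
    // The resolved value of an async handler becomes the invocation result.
    return {
      address: event.token,
      price: parsedData.price,
      timestamp: parsedData.timestamp,
    };
  }
  return getPrice(event); // fall back to the external API
};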
Follow the docs on the correct usage of Lambda:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/using-lambda-functions.html
To understand more about async and sync in JavaScript:
https://www.freecodecamp.org/news/synchronous-vs-asynchronous-in-javascript/
JSON.parse vs. eval: JSON.parse vs. eval()
I'm having a little problem that I'm not able to debug. I wrote a small Firebase Function to get data from a JSON object and store it in a Firestore document. Simple.
It works, except the first time I run it after deploying (or after a long time has passed since the last execution). I have to run it once (without it working), and then the subsequent tries always work, and I can see the new document being created with all the data inside it.
In the first attempt, the only log is: Function execution took 601 ms, finished with status code: 200. Despite that, no document is created and no changes are made.
In the second and subsequent attempts, if I request the function execution with an HTTP POST to https://cloudfunctions/functionName?id=12345, then the document '12345' is created inside the collection with all the data inside it.
The collection where the documents are stored (scenarios) already exists in the database before any function call is executed.
This is the code:
const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();
db.settings({ignoreUndefinedProperties: true});
const fetch = require("node-fetch");

let scenarioData;

const fetchScenarioJSON = async (scenarioId) => {
  try {
    const response = await fetch(`https://url/api/scenarios/single/${scenarioId}`);
    const scenarioText = await response.text();
    scenarioData = JSON.parse(scenarioText);
  } catch (err) {
    return ("not valid json");
  }
  return scenarioData;
};

/**
 * Add data to Firestore.
 * @param {JSON} scenario JSON array containing the scenario data.
 */
async function addDataToFirestore(scenario) {
  const data = {
    id: scenario.scenario._id,
    name: scenario.scenario.name,
    description: scenario.scenario.description,
    language: scenario.scenario.language,
    author: scenario.scenario.author,
    draft: scenario.scenario.draft,
    last_modified: scenario.scenario.last_modified,
    __v: scenario.scenario.__v,
    duration: scenario.scenario.duration,
    grade: scenario.scenario.grade,
    deleted: scenario.scenario.deleted,
    view_count: scenario.scenario.view_count,
    comments_count: scenario.scenario.comments_count,
    favorites_count: scenario.scenario.favorites_count,
    activities_duration: scenario.scenario.activities_duration,
    activities: scenario.scenario.activities,
    outcomes: scenario.scenario.outcomes,
    tags: scenario.scenario.tags,
    students: scenario.scenario.students,
    created: scenario.scenario.created,
    subjects: scenario.scenario.subjects,
  };
  const res = await db.collection("scenarios").doc(scenario.scenario._id).set(data);
}

exports.functionName =
  functions.https.onRequest((request, response) => {
    return fetchScenarioJSON(request.query.id).then((scenario) => {
      if (typeof scenario === "string") {
        if (scenario.includes("not valid json")) {
          response.send("not valid json");
        }
      } else {
        addDataToFirestore(scenario);
        response.send(`Done! Added scenario with ID ${request.query.id} to the app database.`);
      }
    });
  });
My question is whether I am doing anything wrong in the code that makes the execution not work on the first call after it is deployed, but work on subsequent calls.
It is most probably because you don't wait for the asynchronous addDataToFirestore() function to complete before sending back the response.
By doing
addDataToFirestore(scenario);
response.send()
you actually indicate (with response.send()) to the Cloud Functions platform that it can terminate and clean up the Cloud Function (see the doc for more details). Since you don't wait for the asynchronous addDataToFirestore() function to complete, the doc is not written to Firestore.
The "erratic" behaviour (sometimes it works, sometimes not) can be explained as follows:
In some cases, your Cloud Function is terminated before the write to Firestore is fully executed, as explained above.
But, in some other cases, it may be possible that the Cloud Functions platform does not immediately terminate your CF, giving enough time for the write to Firestore to be fully executed. This is most probably what happens after the first call: the instance of the Cloud Function is still running and then the docs are written with the "subsequent calls".
The following modifications should do the trick (untested). I've refactored the Cloud Function with async/await, since you use it in the other functions.
// ....

async function addDataToFirestore(scenario) {
  const data = {
    id: scenario.scenario._id,
    name: scenario.scenario.name,
    description: scenario.scenario.description,
    language: scenario.scenario.language,
    author: scenario.scenario.author,
    draft: scenario.scenario.draft,
    last_modified: scenario.scenario.last_modified,
    __v: scenario.scenario.__v,
    duration: scenario.scenario.duration,
    grade: scenario.scenario.grade,
    deleted: scenario.scenario.deleted,
    view_count: scenario.scenario.view_count,
    comments_count: scenario.scenario.comments_count,
    favorites_count: scenario.scenario.favorites_count,
    activities_duration: scenario.scenario.activities_duration,
    activities: scenario.scenario.activities,
    outcomes: scenario.scenario.outcomes,
    tags: scenario.scenario.tags,
    students: scenario.scenario.students,
    created: scenario.scenario.created,
    subjects: scenario.scenario.subjects,
  };
  await db.collection("scenarios").doc(scenario.scenario._id).set(data);
}

exports.functionName =
  functions.https.onRequest(async (request, response) => {
    try {
      const scenario = await fetchScenarioJSON(request.query.id);
      if (typeof scenario === "string") {
        if (scenario.includes("not valid json")) {
          response.send("not valid json");
        }
      } else {
        await addDataToFirestore(scenario); // See the await here
        response.send(`Done! Added scenario with ID ${request.query.id} to the app database.`);
      }
    } catch (error) {
      // ...
    }
  });
I am getting a timeout error while testing a function that returns a readStream from Google Cloud Storage.
// Test case
import { createResponse } from 'node-mocks-http';

it('getImage from google cloud storage', async done => {
  const response: Response = createResponse();
  const data = await controller.getImage({ image: "text data" }, object, response);
});

// Function that calls Google Cloud Storage and returns an object.
public async getImage(imageReqDto, baseDto, response) {
  const bucket = this.storage.bucket(process.env.IMAGE_BUCKET);
  const file = bucket.file(imageReqDto.image);
  return file.createReadStream().pipe(response);
}
Any solution for how to make the test case pass after receiving the buffer data?
It depends on what you're testing for. If you just want to make sure a ReadStream is returned, try using .toBeInstanceOf():
it('readstream is returned, when .getImage() is invoked with image data', async () => {
  const response: Response = createResponse();
  const stream = await controller.getImage({ image: "text data" }, object, response);
  expect(stream).toBeInstanceOf(ReadStream);
});
However a better test would be to know the expected response and assert that the ReadStream matches an expected value. If it's a small image you could load into a buffer and compare. If the image is large, you may consider storing it as base64 and comparing the actual with the expected.
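For instance, a sketch of that kind of content check (bypassing the node-mocks-http response and piping into a PassThrough instead, with a hypothetical expected-image.png fixture):

import { PassThrough } from 'stream';
import { readFileSync } from 'fs';

it('getImage() streams the expected image bytes', async () => {
  // A PassThrough stands in for the response so the piped bytes can be read back.
  const response = new PassThrough();
  await controller.getImage({ image: "text data" }, object, response);

  // Collect the streamed chunks into one buffer.
  const chunks: Buffer[] = [];
  for await (const chunk of response) {
    chunks.push(Buffer.from(chunk));
  }
  const actual = Buffer.concat(chunks);

  // Hypothetical fixture holding the expected bytes.
  const expected = readFileSync(`${__dirname}/expected-image.png`);
  expect(actual.equals(expected)).toBe(true);
  // For a large image, comparing base64 strings works too:
  // expect(actual.toString('base64')).toEqual(expected.toString('base64'));
});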
I have a Lambda function that is designed to take a message from an SQS queue and then write a value called perf_value, which is just an integer, to InfluxDB. The CloudWatch logs show it firing each time and logging Done, as seen in the .then() block of my write point. Even though it fires each time, I am still only seeing a single data point in InfluxDB Cloud. I can't figure out why it is only inputting a single value and then nothing after that. I don't see a backlog in SQS and there are no error messages in CloudWatch either. I'm guessing it is a code issue or an InfluxDB Cloud setup issue, though I used the defaults, which you would expect to work for multiple data points.
'use strict';

const {InfluxDB, Point, HttpError} = require('@influxdata/influxdb-client')

const InfluxURL = 'https://us-west-2-1.aws.cloud2.influxdata.com'
const token = '<my token>=='
const org = '<my org>'
const bucket = '<bucket name>'
const writeApi = new InfluxDB({url: InfluxURL, token}).getWriteApi(org, bucket, 'ms')

module.exports.perf = function (event, context, callback) {
  context.callbackWaitsForEmptyEventLoop = false;

  let input = JSON.parse(event.Records[0].body);
  console.log(input)

  const point = new Point('elapsedTime')
      .tag(input.monitorID, 'monitorID')
      .floatField('elapsedTime', input.perf_value)
      // .timestamp(input.time)

  writeApi.writePoint(point)

  writeApi
      .close()
      .then(() => {
        console.log('Done')
      })
      .catch(e => {
        console.error(e)
        if (e instanceof HttpError && e.statusCode === 401) {
          console.log('Unauthorized request')
        }
        console.log('\nFinished ERROR')
      })
  return true
};
EDIT:
I have still been unable to resolve the issue. I can get one data point to go into InfluxDB, and then nothing shows up after that.
@Joshk132 -
I believe the problem is here:
writeApi
    .close() // <-- here
    .then(() => {
      console.log('Done')
    })
You are closing the API client object after the first write, so you are only able to write once. You can use flush() instead if you want to force sending the point immediately.
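A sketch of what that could look like (keeping the shared writeApi open across invocations and only flushing; note that Point.tag() takes the tag name first, then the value):

module.exports.perf = async function (event) {
  const input = JSON.parse(event.Records[0].body);

  const point = new Point('elapsedTime')
      .tag('monitorID', input.monitorID) // tag(name, value)
      .floatField('elapsedTime', input.perf_value);

  writeApi.writePoint(point);

  // flush() sends the buffered points now, but keeps the client
  // usable for the next invocation, unlike close().
  await writeApi.flush();
  return true;
};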
I have a very basic understanding of the TypeScript language, but I would like to know how I can copy multiple documents from one Firestore collection to another collection.
I know how to send the request from the app's code along with the relevant data (a string and the Firebase Auth user ID), but I'm unsure about the TypeScript code to handle the request...
That's a very broad question, but something like this can move moderate amounts of data from one collection to another:
import * as _ from 'lodash';
import {firestore} from 'firebase-admin';

export async function moveFromCollection(collectionPath1: string, collectionPath2: string): Promise<void> {
  try {
    const db = firestore();
    const collectionSnapshot1Ref = db.collection(collectionPath1);
    const collectionSnapshot2Ref = db.collection(collectionPath2);

    // Here we get all the snapshots from collection 1. This is ok if you only need
    // to move moderate amounts of data (since all data will be stored in memory).
    const collectionSnapshot1Snapshot = await collectionSnapshot1Ref.get();

    // Now let's use lodash chunk to insert data in batches of 500
    const chunkedArray = _.chunk(collectionSnapshot1Snapshot.docs, 500);
    // chunkedArray is now an array of arrays, with max 500 in each

    for (const chunk of chunkedArray) {
      const batch = db.batch();
      // Use the batch to insert many Firestore docs
      chunk.forEach(doc => {
        // You might need some business logic to handle the new address,
        // but maybe something like this is enough
        const newDocRef = collectionSnapshot2Ref.doc(doc.id);
        batch.set(newDocRef, doc.data(), {merge: false});
      });
      // Commit the batch
      await batch.commit();
    }

    console.log('Done!');
  } catch (error) {
    console.log(`something went wrong: ${error.message}`);
  }
}
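A hypothetical caller, for example a callable function, might then look like this (the collection names are placeholders):

import * as functions from 'firebase-functions';

export const copyStudents = functions.https.onCall(async () => {
  // Placeholder collection paths; replace with your own.
  await moveFromCollection('students', 'studentsArchive');
  return 'done';
});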
But maybe you can tell us more about the use case?