Exporting data from Firestore to GCS - node.js

I was trying to export data from Firestore to Google Cloud Storage using this code snippet:
const functions = require('firebase-functions');
const firestore = require('@google-cloud/firestore');
const client = new firestore.v1.FirestoreAdminClient();

const bucket = 'gs://BUCKET_NAME';

exports.scheduledFirestoreExport = functions.pubsub.schedule('every 24 hours').onRun(async () => {
  const projectId = process.env.GCP_PROJECT || process.env.GCLOUD_PROJECT;
  const databaseName = client.databasePath(projectId, '(default)');
  const response = await client.exportDocuments({
    name: databaseName,
    outputUriPrefix: bucket,
    collectionIds: [],
  });
  console.log(`Backup Successful: ${response}`, {response});
  // here I am trying to import the data to BigQuery
});
The problem I am facing is that client.exportDocuments completes a few milliseconds before the files are actually created in the Google Cloud Storage bucket. So when I try to access them for the import, it says no such file exists / the URL is wrong.
Any suggestions on this?

Here's the underlying method: databases.exportDocuments.
The response is an Operation which is a potentially long-running process on GCP.
You'll need to poll the Operation endpoint (I don't think there's a way to subscribe to it) until the job succeeds or fails.
If it completes, you could then begin the BigQuery job.
See: Managing Export and Import operations
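As a rough sketch of waiting for completion (assuming the v1 admin client returns a google-gax long-running Operation handle, which is the usual behavior of the generated clients; verify against your installed version):

const firestore = require('@google-cloud/firestore');
const client = new firestore.v1.FirestoreAdminClient();

// Sketch only: start the export and wait for the long-running Operation to finish.
async function exportAndWait(projectId, bucket) {
  const databaseName = client.databasePath(projectId, '(default)');
  // exportDocuments() is assumed to resolve to [operation], a google-gax LRO handle.
  const [operation] = await client.exportDocuments({
    name: databaseName,
    outputUriPrefix: bucket,
    collectionIds: [],
  });
  // promise() resolves once the operation completes, or rejects if it fails.
  const [result] = await operation.promise();
  console.log('Export finished, files written under:', result.outputUriPrefix);
  // Only at this point is it safe to start the BigQuery import.
}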
However, this will likely exceed the timeout on the Cloud Function and should probably not be attempted during a single function invocation.
You may want to consider creating another process that's triggered once the export completes. I've not done this myself, but it may be possible to create a background function that's triggered by the GCS event, along the lines of the sketch below.
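An untested sketch of that idea follows. The bucket, dataset, and table names are placeholders, and it assumes the export writes its per-collection *.export_metadata files under the prefix you passed (those are the files BigQuery can load with the DATASTORE_BACKUP source format):

const functions = require('firebase-functions');
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

// Sketch: background function that runs whenever an object is finalized in the export bucket.
exports.importBackupToBigQuery = functions.storage
  .bucket('BUCKET_NAME') // placeholder bucket
  .object()
  .onFinalize(async (object) => {
    // Firestore exports produce per-collection "*.export_metadata" files;
    // only react to those instead of every intermediate object.
    if (!object.name.endsWith('.export_metadata')) {
      return null;
    }
    const file = storage.bucket(object.bucket).file(object.name);
    // BigQuery understands Firestore/Datastore export files via DATASTORE_BACKUP.
    const [job] = await bigquery
      .dataset('my_dataset') // placeholder dataset
      .table('my_table')     // placeholder table
      .load(file, {
        sourceFormat: 'DATASTORE_BACKUP',
        writeDisposition: 'WRITE_TRUNCATE',
      });
    console.log(`BigQuery load job ${job.id} finished for gs://${object.bucket}/${object.name}`);
    return null;
  });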

Related

Can I schedule a Google Cloud Task on the client side to call a cloud function with a payload?

I want to make sure I'm thinking about Cloud Tasks right conceptually, and not sure that I am.
The examples I've been looking at seem to trigger a cloud function first that then schedules a task, that then calls a cloud function again.
(Or at least this is what I'm understanding, I could be wrong).
I'd like to set up something so that when a user clicks a button, it schedules a cloud task for some time in the future (anywhere from 1 minute to an hour and half). The cloud task then triggers the cloud function to upload the payload to the db.
I tried to set this up client side but I've been getting the error "You need to pass auth instance to use gRPC-fallback client in browser or other non-Node.js environments."
I don't want the user to have to authenticate if that's what this is saying (not sure why I'd have to do that for my use case).
This is the code that gives that error.
const {CloudTasksClient} = require('@google-cloud/tasks');
const client = new CloudTasksClient();
// import { Plugins } from '@capacitor/core';
// const { RemotePlugin } = Plugins;

const scheduleTask = async (seconds) => {
  async function createHttpTask() {
    const project = 'spiral-productivity';
    const queue = 'spiral';
    const location = 'us-west2';
    const url = 'https://example.com/taskhandler';
    const payload = 'Hello, World!';
    const inSeconds = 5;

    // Construct the fully qualified queue name.
    const parent = client.queuePath(project, location, queue);

    const task = {
      httpRequest: {
        httpMethod: 'POST',
        url,
      },
    };

    if (payload) {
      task.httpRequest.body = Buffer.from(payload).toString('base64');
    }

    if (inSeconds) {
      // The time when the task is scheduled to be attempted.
      task.scheduleTime = {
        seconds: inSeconds + Date.now() / 1000,
      };
    }

    // Send create task request.
    console.log('Sending task:');
    console.log(task);
    const request = {parent: parent, task: task};
    const [response] = await client.createTask(request);
    console.log(`Created task ${response.name}`);
  }

  createHttpTask();
  // [END cloud_tasks_create_http_task]
};
More recently I set up a service account and downloaded a .json file and all of that. But doesn't this mean my users will have to authenticate?
That's why I stopped. Maybe I'm on the wrong track, but if anyone wants to answer what I need to do to schedule a cloud task from the client side without making the user authenticate, it would be a big help.
As always, I'm happy to improve the question if anything isn't clear. Just let me know, thanks!
Yes.
Your understanding is mostly accurate. Cloud Tasks is a way to queue "tasks". The examples likely use Cloud Functions as a stand-in for "some app" (a web app), i.e. your Node.js (web) app can submit tasks to Cloud Tasks. To access Google Cloud Platform services (e.g. Cloud Tasks), you need to authenticate and authorize.
Since your app is the "user" of the GCP services, you're correct in using a Service Account.
See Application Default Credentials to understand authenticating (code) as a service account.
Additionally, see Controlling access to webapps.
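A common pattern, sketched below, is to keep the Cloud Tasks client on the server side behind a callable Cloud Function, so the browser never holds GCP credentials. This is only a sketch: the function names, project/queue/location, and handler URL are placeholders.

const functions = require('firebase-functions');
const {CloudTasksClient} = require('@google-cloud/tasks');
const tasksClient = new CloudTasksClient();

// Callable function: the web app invokes this via the Firebase SDK, and the
// function's own service account (not the end user) authenticates to Cloud Tasks.
exports.scheduleUpload = functions.https.onCall(async (data, context) => {
  const delaySeconds = data.delaySeconds || 60; // e.g. anywhere from 1 minute to 1.5 hours
  const parent = tasksClient.queuePath('MY_PROJECT', 'us-west2', 'MY_QUEUE'); // placeholders

  const task = {
    httpRequest: {
      httpMethod: 'POST',
      url: 'https://REGION-MY_PROJECT.cloudfunctions.net/taskHandler', // placeholder target
      headers: {'Content-Type': 'application/json'},
      body: Buffer.from(JSON.stringify(data.payload || {})).toString('base64'),
    },
    scheduleTime: {seconds: Math.floor(Date.now() / 1000) + delaySeconds},
  };

  const [response] = await tasksClient.createTask({parent, task});
  return {taskName: response.name};
});

The button click in the client then calls firebase.functions().httpsCallable('scheduleUpload')({delaySeconds, payload}), so the only authentication involved is the (optional) Firebase Authentication of the end user, never a GCP service account in the browser.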

Firebase scheduled cloud function not deleting nodes

I've been having several issues with my Firebase scheduled function. I'm reviewing the logs and documentation, and it runs and logs "Success" and OK, but the desired nodes of my Realtime Database are not being updated/deleted. Here is my index.js code...
// The Cloud Functions for Firebase SDK to create Cloud Functions and setup triggers.
const functions = require('firebase-functions');
// The Firebase Admin SDK to access the Firebase Realtime Database.
const admin = require('firebase-admin');
admin.initializeApp();

exports.scheduledFunction = functions.pubsub.schedule('every 1 minutes').onRun((context) => {
  const ref = admin.database().ref('messages/{pushId}');
  var now = Date.now();
  var cutoff = now - 24 * 60 * 60 * 1000;
  var oldItemsQuery = ref.orderByChild('timestamp').endAt(cutoff);
  return oldItemsQuery.once('value')
    .then(snapshot => {
      // create a map with all children that need to be removed
      var updates = {};
      snapshot.forEach(function (child) {
        updates[child.key] = null;
      });
      // execute all updates in one go and return the result to end the function
      return ref.update(updates);
    });
});
Here is how I've structured the data in my database reference that I want deleted at "cutoff"...
I've updated my firebase functions to the latest version. The only other issue that may be causing this is a Warning that I should
Consider adding ".indexOn": "timeStamp" at /messages/{pushId} to your
security rules for better performance.
Since this was only a warning, and the function worked fine when it was .onWrite, I'm not sure this is the cause.
The problem is here:
const ref = admin.database().ref('messages/{pushId}');
The {pushId} in there makes no sense, and is causing you to query the wrong node.
You'll want to query and update:
const ref = admin.database().ref('messages');
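Put together, the corrected function would look roughly like this (same logic as the original, only the reference changed):

exports.scheduledFunction = functions.pubsub.schedule('every 1 minutes').onRun((context) => {
  // Query the parent node; the matching children come from orderByChild/endAt.
  const ref = admin.database().ref('messages');
  const cutoff = Date.now() - 24 * 60 * 60 * 1000;
  return ref.orderByChild('timestamp').endAt(cutoff).once('value')
    .then((snapshot) => {
      const updates = {};
      snapshot.forEach((child) => {
        updates[child.key] = null;
      });
      // A multi-location update with null values deletes the matched children.
      return ref.update(updates);
    });
});

Separately, adding ".indexOn": "timestamp" under the messages node in your database rules (matching the field name actually used in orderByChild) would address the performance warning.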

how to get files from Cloud Storage for Firebase that created at a certain time?

I need to delete images in the eventPoster folder of my Cloud Storage bucket whose last-modified metadata is more than 6 months old.
So I want to make a cron job using a Cloud Function, and I need to 'query' the files that were last modified more than 6 months ago.
How do I do that using the Admin SDK?
I think I found the solution.
According to this answer, there is no way to query by metadata.
So, for now, I use the code below to get the out-of-date files:
import * as admin from "firebase-admin";
import * as moment from "moment";

const app = admin.initializeApp(); // I am using Google Cloud Functions, so initializing the app is much simpler
const bucket = app.storage().bucket();

const response = await bucket.getFiles({prefix: "eventPoster/"}); // set your folder name in prefix here
const files = response[0];

const sixMonthAgo = moment().subtract(6, "months").toDate();
const outOfDateFiles = files.filter((file) => {
  const createdTime = new Date(file.metadata.timeCreated);
  return moment(createdTime).isBefore(sixMonthAgo);
});

console.log(`number of outdated files: ${outOfDateFiles.length}`);
Read the documentation here for further reading.
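To actually delete what the filter found, a minimal follow-up (sketch only; double-check the filter on a test bucket before running it against production data) could be:

// Delete each out-of-date file; Promise.all rejects if any single deletion fails.
await Promise.all(outOfDateFiles.map((file) => file.delete()));
console.log(`deleted ${outOfDateFiles.length} outdated files`);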

Is it safe to export Firestore data multiple times to the same storage folder? Could the overwrite break the exported data?

Here is how I'm exporting my Firestore data:
import { today } from "@utils/dates/today";
import * as admin from "firebase-admin";

admin.initializeApp({
  credential: admin.credential.cert(
    MY_SERVICE_ACCOUNT as admin.ServiceAccount
  ),
});

const client = new admin.firestore.v1.FirestoreAdminClient();

const BUCKET = "gs://MY_PROJECT_ID.appspot.com/firestore-backup";
const PROJECT_ID = "MY_PROJECT_ID";
const DB_NAME = client.databasePath(PROJECT_ID, "(default)");

export const backupData = async (): Promise<void> => {
  const todayDate = today(); // THIS IS A YYYY-MM-DD STRING
  // const hashId = generateId().slice(0, 5);

  const responses = await client.exportDocuments({
    name: DB_NAME,
    outputUriPrefix: `${BUCKET}/${todayDate}`,
    collectionIds: [],
  });

  const response = responses[0];
  console.log(`Operation Name: ${response['name']}`);
  return;
};
You see I'm exporting to the following path:
/firestore-backup/YYYY-MM-DD/
If I'm going to back up multiple times on the same day, can I use the same date folder? Is it safe to do so, or should I add a hash to the folder name to avoid overwriting the previous export?
PS: The overwrite on a single day is not a problem. I just don't want to break the exported data.
If you go to the bucket and check the exports, you'll see that the exported files seem to follow the same naming pattern every time. If we rely only on the write/update semantics of Cloud Storage, a write to a location where a file already exists simply overwrites it. So, at first glance, it doesn't seem like it would cause data corruption.
However, that assumption relies on the internal behavior of the export operations, which may change in the future (leaving aside that I can't even guarantee it as of now). Therefore, the best practice would be to append a hash to the folder name to prevent any unexpected behavior.
As an additional side note, exports can incur significant costs depending on the size of your Firestore data.
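For example, a minimal tweak to the export above (just a sketch; a millisecond timestamp is one of many ways to make the prefix unique) could be:

// Append a time-based suffix so each run writes to its own folder,
// e.g. firestore-backup/YYYY-MM-DD/<millisecond timestamp>/
const uniqueSuffix = Date.now().toString();

const responses = await client.exportDocuments({
  name: DB_NAME,
  outputUriPrefix: `${BUCKET}/${todayDate}/${uniqueSuffix}`,
  collectionIds: [],
});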

How to get inner child in cloud function for Firebase?

Here is my database, and I want to trigger an onWrite event on children of PUBLISHED_CONTENT_LIKES. When I add another userId under publishedContentId1, I can identify the contentId as publishedContentId1 in my cloud function using event.params.pushId.
exports.handleLikeEvent = functions.database.ref('/USER_MANAGEMENT/PUBLISHED_CONTENT_LIKES/{pushId}')
  .onWrite(event => {
    // Grab the current value of what was written to the Realtime Database.
    // const userId = event.data.child(publishedContentId);
    // const test = event.params.val();
    const publishedContentId = event.params.pushId;
    var result = {"publishedContentId": "saw"};
    // You must return a Promise when performing asynchronous tasks inside a Function, such as
    // writing to the Firebase Realtime Database.
    // Setting an "uppercase" sibling in the Realtime Database returns a Promise.
    return event.data.ref.parent.parent.child('PUBLISHED_CONTENTS/' + publishedContentId).set(result);
  });
However, I want to get the newly added userId as well. How do I get that userId using the above event?
You can get the data that is being written under event.data. To determine the new user ID:
event.data.val().userID
I recommend watching the latest Firecast on writing Database functions as it covers precisely this topic.
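If each like is stored as its own child under the content ID, another option worth noting (a sketch of a slightly different approach, not taken from the answer above, written against the same pre-v1.0 functions API the question uses) is to add a second wildcard so the user ID arrives as a route parameter:

exports.handleLikeEvent = functions.database
  .ref('/USER_MANAGEMENT/PUBLISHED_CONTENT_LIKES/{pushId}/{userId}')
  .onWrite(event => {
    const publishedContentId = event.params.pushId; // e.g. publishedContentId1
    const userId = event.params.userId;             // the newly written user ID
    console.log(`User ${userId} liked ${publishedContentId}`);
    // ...update PUBLISHED_CONTENTS (or whatever you need) here and return the Promise...
    return null;
  });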
