Retrieve the result of a Foxx activity scheduled as a job - arangodb

What is the best way to retrieve, at a later time, the result of a scheduled job executed by a Foxx activity?
The only way I have found is to store the job result from within the success callback, either saving it into a dedicated collection or adding a property to the document that represents the completed job in the "_job" system collection.
I have tried invoking the Foxx endpoint with the "x-arango-async: store" HTTP header: ArangoDB returns the job ID in the "X-Arango-Async-Id" response header, but I cannot retrieve the job result through the endpoint PUT _api/job/{jobId}.
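For reference, the flow I am attempting looks roughly like this (a sketch using fetch against a hypothetical Foxx route, /my-foxx/compute, on the _system database):

// 1. Invoke the Foxx endpoint asynchronously; with "x-arango-async: store"
//    ArangoDB queues the request and keeps its result for later retrieval.
const ARANGO = "http://localhost:8529/_db/_system";

async function scheduleAndFetch(): Promise<unknown> {
  const res = await fetch(`${ARANGO}/my-foxx/compute`, {
    method: "POST",
    headers: { "x-arango-async": "store" },
  });
  const jobId = res.headers.get("x-arango-async-id");

  // 2. Later: PUT /_api/job/{jobId} should return the stored result once
  //    the job has finished (a 204 response means it is still pending).
  const result = await fetch(`${ARANGO}/_api/job/${jobId}`, { method: "PUT" });
  if (result.status === 204) return undefined; // still running
  return result.json();
}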
Thanks a lot for any response.

Related

Is it possible to add a callback method in the backend to run once a query finishes?

I'm setting up a service that will email a user the data generated by a Cubejs query. I'd like to have Cubejs notify the email-sending service (perhaps through SNS) that new data is available for sending. Is this possible? Are there better options for allowing asynchronous access to query results?
Perhaps you could look into WebSocketTransport, part of the real-time data fetch mechanism?
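For illustration, wiring it up might look like this (the token, WebSocket URL, and query below are placeholders, not your actual setup):

import cubejs from "@cubejs-client/core";
import WebSocketTransport from "@cubejs-client/ws-transport";

const cubejsApi = cubejs({
  transport: new WebSocketTransport({
    authorization: "CUBEJS_TOKEN",  // placeholder API token
    apiUrl: "ws://localhost:4000/", // placeholder WebSocket URL
  }),
});

// subscribe() re-delivers the result set whenever the underlying data
// changes, which is the hook you could use to notify the email service.
cubejsApi.subscribe(
  { measures: ["Orders.count"] }, // hypothetical query
  {},
  (error, resultSet) => {
    if (!error && resultSet) {
      // e.g. publish a message to SNS here
    }
  }
);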

Firebase Cloud Express queue for storage resource to be generated

I have a large dataset stored in a Firestore collection and a Node.js Express app (exposed as a Firebase functions.https.onRequest) with an endpoint that allows users to query this dataset and download large amounts of data.
I need to return the data in CSV format from the endpoint. Because there is a lot of data, I want to avoid doing large database reads each time the endpoint is hit.
My current endpoint does this:
1. User hits the endpoint with a database query requesting documents within a range
2. The query is hashed into a filename, e.g. query_"startRange"_"endRange".csv
3. Check Firebase Storage to see whether this query has been run before
4. If the CSV already exists: return a 302 redirect to the CSV file with a signed URL
5. If the CSV doesn't exist:
   - Run the query on the Firestore collection
   - Transform the data into the appropriate CSV format
   - Upload the new CSV to Firebase Storage
   - Return a 302 redirect to the newly generated CSV file with a signed URL
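For illustration, that flow might be sketched like this with the Firebase Admin SDK (the default bucket, the range parameters, and the CSV-building helper are placeholders, not my actual implementation):

import { initializeApp } from "firebase-admin/app";
import { getStorage } from "firebase-admin/storage";

initializeApp();

// Hypothetical helper that runs the Firestore query and builds the CSV.
declare function buildCsvFromFirestore(start: string, end: string): Promise<string>;

async function csvUrlForQuery(start: string, end: string): Promise<string> {
  // Steps 2-3: hash the query into a filename and check Storage for it.
  const file = getStorage().bucket().file(`query_${start}_${end}.csv`);
  const [exists] = await file.exists();

  if (!exists) {
    // Miss path: run the query, build the CSV, and upload it.
    const csv = await buildCsvFromFirestore(start, end);
    await file.save(csv, { contentType: "text/csv" });
  }

  // Either way, hand back a signed URL for the 302 redirect.
  const [url] = await file.getSignedUrl({
    action: "read",
    expires: Date.now() + 60 * 60 * 1000, // valid for one hour
  });
  return url;
}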
This process is currently working really well, except that I can already foresee an issue. The CSV generation stage takes roughly 20s for large queries, and there is a high chance of the same request being made by multiple users at the same time.
I want to build in some sort of queuing system so that if X number of users hit the endpoint at once, only the first request triggers the generation of the new CSV and the other (X-1) requests will be queued and then resolved once the CSV is generated.
So far I have looked into firebase-queue, which appears to be deprecated and not intended for use with Cloud Functions.
I have also seen other libraries like p-queue, but I'm not sure I understand how that would work with Firebase Cloud Functions, given how separate instances are booted for many requests.
I think that in your scenario the queue approach wouldn't work well with Cloud Functions. The queue cannot be implemented in a function, as multiple instances won't know about each other, so it would need to be implemented on some kind of dedicated server, which IMO defeats the purpose of using Cloud Functions, since both the queue and the processing could then run on the same server.
I would suggest having a collection in Firestore that keeps track of the queries that have been requested. That way, even if the CSV file isn't yet saved in Storage, you can check whether some function instance is already creating it, then sleep the function until the operation completes and return the signed URL. Overall the algorithm might look somewhat like this:
# Python PseudoCode
def get_csv_url():
    # Fast path: the CSV was already generated by an earlier request.
    if csv_in_storage():
        return signed_url()

    # Another instance has already claimed this query: poll Storage
    # until its CSV appears.
    if query_in_firestore():
        while True:
            sleep(X)
            if csv_in_storage():
                return signed_url()

    try:
        # Claim the query; this write fails if another instance claimed
        # it between the check above and now.
        add_query_firestore()
        csv = create_csv()
        upload_csv(csv)
        return signed_url()
    except Exception:
        # We lost the race: wait for the winning instance to finish.
        while True:
            sleep(X)
            if csv_in_storage():
                return signed_url()
The final try/except is there because the add_query_firestore operation may fail if two functions attempt to write the same document to Firestore simultaneously. Nonetheless, this is also good news, since it tells you the CSV creation is already in progress and you can simply wait for it to complete.
Please keep in mind the pseudocode above is just to illustrate the idea; having the while True as it is may lead to an infinite loop and a function timeout, which is plain bad :).
I ended up solving this using a solution similar to what Happy-Monad suggested. I'm using the Node.js admin SDK, but the idea is similar.
There is a collection in Firestore, Queries, which keeps track of executed queries. When a user hits the endpoint, I call the admin doc("Queries/<queryId>").create() method. This method only creates the query doc if it doesn't already exist, so I avoid the race condition between parallel requests that I would have if I checked for existing queries first.
Next, the request starts an onSnapshot listener on the query doc it attempted to create. The query has a status field which starts as created. The onSnapshot only resolves once that status has changed to complete.
I have an onCreate database trigger listening on "Queries/*". This trigger handles the requested query and updates the query status to complete. In the case where the query already existed, its status is already complete, so the onSnapshot resolves instantly.
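A condensed sketch of that pattern (field names and the CSV worker are illustrative, not my exact code):

import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import * as functions from "firebase-functions";

initializeApp();
const db = getFirestore();

// Hypothetical worker that generates and uploads the CSV for a query.
declare function generateCsv(queryId: string): Promise<void>;

// Request side: claim the query, then wait for it to complete.
async function waitForQuery(queryId: string): Promise<void> {
  const ref = db.doc(`Queries/${queryId}`);

  // create() throws if the doc already exists, so exactly one request
  // "wins" the race and fires the onCreate trigger below.
  await ref.create({ status: "created" }).catch(() => {
    /* already claimed by a parallel request: just listen */
  });

  // Resolve once the trigger flips status to "complete"; if the query
  // was already complete, the first snapshot resolves instantly.
  await new Promise<void>((resolve, reject) => {
    const unsubscribe = ref.onSnapshot((snap) => {
      if (snap.get("status") === "complete") {
        unsubscribe();
        resolve();
      }
    }, reject);
  });
}

// Worker side: runs once per query document, then marks it complete.
export const processQuery = functions.firestore
  .document("Queries/{queryId}")
  .onCreate(async (snap) => {
    await generateCsv(snap.id);
    await snap.ref.update({ status: "complete" });
  });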

Run a Google Cloud Function at specific time only once

I have a job (script) written in Node.js.
I have another API which writes data (an ID and a time t1) to Cloud Spanner. As soon as the API is hit, I want to run that job at the given time t1, passing the ID as a parameter.
Can I write some code in my API that will trigger the job at the given time? (Note: for a single hit on the API, the job should run only once.) I tried searching the net but could only find periodic schedulers.
To schedule a task for a specific, dynamically chosen time, you can use Google Cloud Tasks together with Google Cloud Functions.
Read it here:
https://cloud.google.com/tasks/docs/tutorial-gcf
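Following that tutorial, your API could enqueue a one-shot HTTP task at t1 along these lines (the project, location, queue, and function URL are placeholders):

import { CloudTasksClient } from "@google-cloud/tasks";

const client = new CloudTasksClient();

async function scheduleJob(id: string, t1: Date): Promise<void> {
  const parent = client.queuePath("my-project", "us-central1", "my-queue");

  await client.createTask({
    parent,
    task: {
      // Cloud Tasks fires this HTTP request exactly once at scheduleTime,
      // targeting the Cloud Function that runs the job.
      scheduleTime: { seconds: Math.floor(t1.getTime() / 1000) },
      httpRequest: {
        httpMethod: "POST",
        url: "https://us-central1-my-project.cloudfunctions.net/runJob",
        headers: { "Content-Type": "application/json" },
        body: Buffer.from(JSON.stringify({ id })).toString("base64"),
      },
    },
  });
}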

Azure scheduler - dynamic content in POST

Is it possible to use dynamic content in the POST body for a scheduled job in Azure scheduler?
I am writing a Logic App that I would like to pass a start time and a look-back minute count to, so that a failed invocation can be re-run over the same underlying data by passing the same parameters. I'm wondering whether there are functions or operations, similar to those available in Logic Apps, for values such as utcNow().
We do not support dynamic content in Scheduler; you may, however, find a timestamp in the request headers of the calls Scheduler makes.
Why are you not using Logic Apps when it can perform what you need?

Spark Machine learning design model from web application

I have developed a web application where the user can choose a machine learning framework, the number of iterations, and some other tuning parameters. How can I invoke a Spark job from the user interface, passing all the inputs, and display the response to the user? Depending on the framework (DL4J, Spark MLlib, H2O), the user can either upload an input CSV or the data can be read from Cassandra.
How can I call the Spark job from the user interface?
How can I display the result back to the user?
Please help.
You can take a look at this GitHub repository.
In it, as soon as a GET request arrives, the data is read from Cassandra, collected, and sent back as the response.
So in your case: as soon as you receive a POST request, you can take the parameters from the request, perform the operations accordingly, collect the result on the master, and then send it back to the user as the response.
P.S.: Collecting on the master is a bit tricky, and a lot of data can cause an OOM. What you can do instead is save the results to Hadoop and send back a URL to the results, or something like that.
For more info, look at this blog post related to that GitHub repository:
https://blog.knoldus.com/2016/10/12/cassandra-with-spark/
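The linked repository is a Scala/Spark project, but as a minimal illustration of the request/response shape, an HTTP handler could hand the user's parameters to spark-submit and return the output (the paths, class name, and parameters here are hypothetical, not the repository's code):

import { execFile } from "node:child_process";
import express from "express";

const app = express();
app.use(express.json());

app.post("/train", (req, res) => {
  const { framework, iterations } = req.body; // user-chosen tuning inputs

  // Hand the parameters to a pre-packaged Spark job via spark-submit.
  execFile(
    "spark-submit",
    ["--class", "com.example.TrainJob", "/jobs/train.jar",
     String(framework), String(iterations)],
    (err, stdout) => {
      if (err) return res.status(500).send(err.message);
      // Per the caveat above: for large results, have the job write to
      // HDFS/S3 and return a URL instead of streaming everything back.
      res.send(stdout);
    }
  );
});

app.listen(3000);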
