What is the best way to have a schedule of events in HaxeFlixel? In other words, I would like to have a function onTime(time:Int, callback:Data->Void, data:Data):Void which causes callback(data) to be executed at time time (function is a reserved keyword in Haxe, so it can't be used as a parameter name). I would then make a .json file with a whole list of timestamped functions and data, and iterate through it so that each function is called with the specified data at the specified time.
Related
Let's say, I have 1000 documents in a Firestore collection.
How do I execute the same Cloud Function 10 times in parallel, each processing 100 documents, say every 5 minutes?
I am aware I can use a Scheduler for the "every 5 minutes" part. The objective here is to distribute the load using multiple executions of the same function in parallel to handle the tasks. When the collection grows, I would like to add more instances. For example, let's say 1 execution per 100 documents.
I don't mind having another (or more) function to handle the distribution itself, and I don't mind the number of executions. I just don't want to loop through a large collection and process the tasks in a single function execution.
The numbers given above are examples. I am also open to using other services within GCP.
If you want to execute the Cloud Function every time changes occur in the Firestore documents, you can use a Cloud Firestore trigger in Cloud Functions. The Cloud Function essentially waits for changes, fires when an event occurs, and performs its tasks. You can go through these documents on Firestore triggers: Google Cloud Firestore Trigger, Cloud Firestore Triggers.
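For illustration, here is a minimal sketch of such a trigger, assuming the firebase-functions v1 API and a hypothetical tasks collection:

const functions = require('firebase-functions');

// Fires on every create, update, or delete of a document under 'tasks'.
exports.onTaskWrite = functions.firestore
  .document('tasks/{docId}')
  .onWrite((change, context) => {
    console.log('Changed document:', context.params.docId);
    return null;
  });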
In case you are concerned that Cloud Functions will not be able to process the requests in parallel, you should check out this document. Cloud Functions handles incoming requests by assigning each one to an instance; if the volume of requests increases, Cloud Functions will start new instances to handle them.
Let's assume you have a function that, when called, processes a single document and does anything you need with it. Let's call that function doSomething and assume it takes the document's path as a parameter.
Then, you can create a function that will be scheduled every 5 minutes. In this function, you'll retrieve all the documents, holding them in an array (let's call it documents), and do something like:
// Assumes the modular Firebase client SDK, where 'functions' is the value
// returned by getFunctions(app) and 'doSomething' is the deployed callable.
const doSomething = httpsCallable(functions, 'doSomething');

// One call per document; map returns the array of pending promises.
const calls = documents.map((document) => doSomething({ path: document.path }));

// Wait for all the parallel executions to complete.
await Promise.all(calls);
This creates one call per document, firing them all without waiting in between; Promise.all then waits for every parallel execution of the same function to complete.
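For completeness, here is a hedged sketch of what the scheduled wrapper might look like, assuming the firebase-functions v1 API; the collection name documents and the function name distributeWork are illustrative:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.distributeWork = functions.pubsub
  .schedule('every 5 minutes')
  .onRun(async () => {
    const snapshot = await admin.firestore().collection('documents').get();
    const documents = snapshot.docs.map((doc) => ({ path: doc.ref.path }));
    // ... fan out one doSomething call per document, as shown above ...
  });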
I am trying to retrieve the input of a specific Step Functions execution from the past, using the list_executions and describe_execution functions in boto3: first to retrieve all the executions, and then to get the execution input (I can't use describe_execution directly, as I do not know the full execution ARN). However, list_executions does not accept a filter argument (such as "name"), so there is no way to return partial matches; it returns all (successful) executions.
The workaround for now has been to list all the executions and then loop over the list to select the right one. The issue is that this call returns at most the 1,000 newest records (as per the documentation), which will soon be a problem as there will be more than 1,000 executions and I will need to reach older ones.
Is there a way to specify a filter in list_executions/describe_execution to retrieve executions with a partial match, e.g. by name prefix?
import boto3

client = boto3.client("stepfunctions")
# Returns at most the 1,000 most recent matching executions.
response = client.list_executions(
    stateMachineArn="arn:aws:states:something-something",
    statusFilter="SUCCEEDED",
    maxResults=1000,
)
You are right that the SFN APIs like ListExecutions do not expose other filtering options. Nonetheless, here are two ideas to make your task of searching execution inputs easier:
Use the ListExecutions Paginator to help with looping through the response items (a sketch follows after this list).
If you know in advance which inputs are of interest, add a Task to the State Machine to persist execution inputs and ARNs to, say, a DynamoDB table, in a manner that makes subsequent searches easier.
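As a sketch of the first idea: the paginator loops through every page of results, so you are not capped at the 1,000 newest executions. The state machine ARN and name prefix below are placeholders:

import boto3

client = boto3.client("stepfunctions")
paginator = client.get_paginator("list_executions")

matching = []
for page in paginator.paginate(
    stateMachineArn="arn:aws:states:something-something",
    statusFilter="SUCCEEDED",
):
    for execution in page["executions"]:
        # The API has no name filter, so match the prefix client-side.
        if execution["name"].startswith("my-prefix-"):
            matching.append(execution)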
If I schedule a timer-triggered Azure Function to run every second and my function takes 2 seconds to execute, will I just get back-to-back executions, or will some execution queue eventually overflow?
Background:
We have a timer-triggered Azure Function that currently executes every 30 seconds and checks for new rows in a database table. If there are new rows, the data is processed and the rows are marked as handled.
If there are no new rows, the execution is very fast. If there are 500 new rows (the maximum we fetch at the moment), the execution takes about 20-25 seconds.
We would like to decrease the interval to one second to reduce the latency of row processing.
Update: I want back-to-back executions and I want to avoid overlapping executions.
Multiple Azure Functions can run concurrently. This means you can trigger the function again while the previously triggered execution is still running, and both will run concurrently. Executions will only queue up if you set options to run only one function at a time on one instance, but it doesn't look like you want that.
With concurrency, two executions may read the same table in the DB at the same time. So you should read your table with the UPDLOCK option LINK. This prevents a subsequently triggered execution from reading the same rows that were read by the previous one.
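A hedged sketch of that read in Node, assuming the mssql package and an illustrative dbo.PendingRows table with a Handled flag; READPAST is added here so a concurrent execution skips locked rows instead of blocking on them:

const sql = require('mssql');

async function fetchUnhandledRows(pool) {
  const tx = new sql.Transaction(pool);
  await tx.begin();
  // UPDLOCK holds update locks on the selected rows until the transaction
  // ends; READPAST lets a concurrent execution skip those locked rows.
  const result = await new sql.Request(tx).query(`
    SELECT TOP (500) *
    FROM dbo.PendingRows WITH (UPDLOCK, READPAST)
    WHERE Handled = 0
  `);
  // ... process the rows and mark them handled within the same transaction ...
  await tx.commit();
  return result.recordset;
}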
In short, the answer to your question is neither. If your functions overlap, by default, you will get multiple functions running at the same time. LINK
To achieve back-to-back execution for timer triggers, set WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT and FUNCTIONS_WORKER_PROCESS_COUNT to 1 in the application settings configuration. This ensures only one function execution runs at a time. See this LINK.
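For illustration, this is how the two settings might appear in the Values block of a local.settings.json (the same keys go into the Function App's application settings in the Azure portal):

{
  "Values": {
    "WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT": "1",
    "FUNCTIONS_WORKER_PROCESS_COUNT": "1"
  }
}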
I have a Lambda written in Node.js which executes every 15 minutes. I need to compare the records processed in the first execution (a list of strings identifying all the records) with those in the next execution, and avoid processing the same records based on string comparison. So basically the first execution will store the info as a list of strings, and in the second execution I would first compare the records about to be processed against each string in the collection from the first execution. Once the fresh records in the second execution are processed, I will replace the string collection with the new records, for the same comparison in the third execution.
I figured that we should not use any global variables, as they are not reliably preserved between executions.
So is there a way to achieve it?
No, you can't persist variables like that with Lambda. You can, however, save the list in a text file on S3, read that file during the next execution, and make the necessary edits for the execution after that.
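A minimal sketch of that pattern using the AWS SDK v3 for JavaScript; the bucket name, object key, and fetchRecords helper are assumptions for illustration:

const { S3Client, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});
const BUCKET = 'my-state-bucket';      // hypothetical bucket name
const KEY = 'processed-records.json';  // hypothetical object key

async function loadProcessed() {
  try {
    const res = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: KEY }));
    return JSON.parse(await res.Body.transformToString());
  } catch (err) {
    if (err.name === 'NoSuchKey') return [];  // first run: nothing saved yet
    throw err;
  }
}

async function saveProcessed(records) {
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: KEY,
    Body: JSON.stringify(records),
    ContentType: 'application/json',
  }));
}

exports.handler = async () => {
  const previous = new Set(await loadProcessed());
  const incoming = await fetchRecords();  // hypothetical: however you load records
  const fresh = incoming.filter((record) => !previous.has(record));
  // ... process `fresh` here ...
  await saveProcessed(incoming);          // replace the list for the next run
};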
Background
I have a Node and React based application. I'm using Firebase for my storage and database. In my application users can fill out a form where they upload an image and select a time for the image to be added to their website. I save each image update as an object in my Firebase database like so. Images are arranged in order of ascending update time.
user-name: {
  images: [
    {
      src: 'image-src-url',
      updateTime: 1503953587727
    },
    {
      src: 'image-src-url',
      updateTime: 1503958424838
    }
  ]
}
Scale
My application's db could potentially get very large, with a lot of users and images. I'd like to ensure scalability.
Issue
How do I check when a specific image object's time has been reached and then execute a function? (I do not need assistance with the actual function being run, just with checking the db for a specific time.)
Attempts
I've thought about using a cron job via node-cron that checks the entire database every 60s (users can only specify the minute the image will update, not the seconds). Then, if it finds a matching updateTime, it executes my function. My concern is that at a large scale the cron job will take a while to search the db and could potentially miss a time.
I've also thought about dynamically creating a specific cron job for that time whenever the user schedules a new update. I'm unsure how to accomplish this.
Any other methods that might work? Or are my concerns about node-cron unfounded?
There are two approaches I can think of:
Keep track of the last timestamp you processed
Keep the "things to process" in a queue
Keep track of the last timestamp you processed
When you process items, you use the current timestamp as the cut-off point for your query. Something like:
// Query everything that is due up to now.
var now = Date.now();
var query = ref.orderByChild("updateTime").endAt(now);
Now make sure to store this now somewhere (e.g. in your database) so that you can re-use it next time to retrieve the next batch of items:
var previous = ...; // the stored value of now from the previous run
var now = Date.now();
var query = ref.orderByChild("updateTime").startAt(previous).endAt(now);
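For instance, the cut-off could live in the same database. This sketch assumes a lastProcessed node at the database root, matching the validation rule shown below:

var lastProcessedRef = ref.root.child("lastProcessed");
lastProcessedRef.once("value").then(function(snapshot) {
  var previous = snapshot.val() || 0;  // 0 on the very first run
  var now = Date.now();
  var query = ref.orderByChild("updateTime").startAt(previous).endAt(now);
  // ... process the query results ...
  // then store the new cut-off for the next run
  return lastProcessedRef.set(now);
});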
With this you're only processing a single slice at a time. The only tricky bit is that somebody might insert a new node with an updateTime that you've already processed. If this is a concern for your use-case, you can prevent them from doing so with a validation rule on updateTime:
".validate": "newData.val() >= root.child('lastProcessed').val()"
As you add more items to the database, you will indeed be querying more items. So there is a scalability limit to this approach, but this approach should work well for anything up to a few hundreds of thousands of nodes (I haven't tested in a while so ymmv).
For a few previous questions on list size:
Firebase Performance: How many children per node?
Firebase Scalability Limit
How many records / rows / nodes is alot in firebase?
Keep the "things to process" in a queue
An alternative approach is to keep a queue of items that still need to be processed. Clients add the items they want processed to the queue, with an updateTime for when they want them processed. Your server picks items from the queue, performs the necessary updates, and removes each item from the queue:
var now = Date.now();
var query = ref.orderByChild("updateTime").endAt(now);
query.once("value").then(function(snapshot) {
  snapshot.forEach(function(child) {
    // TODO: process the child node
    // remove the child node from the queue
    child.ref.remove();
  });
});
The difference with the earlier approach is that a queue's stable state is going to be empty (or at least quite small), so your queries will run against a much smaller list. That's also why you won't need to keep track of the last timestamp you processed: any item in the queue up to now is eligible for processing.