I know and have used the standard NiFi GenerateFlowFile processor running on a cron schedule. As an example, let's assume my NiFi flow will execute a SQL DELETE and INSERT statement for two groups of tables on two schedule times.
The simplest implementation will have two GenerateFlowFile processors running on the different schedules,
NiFi Cron Expression -> Group
0 0/15 * * * ? -> 15min tables schedule
0 0/30 * * * ? -> 30min tables schedule
which load the list of tables from a JSON file, which is then split so that each table gets reloaded:
{
"tablesToLoad":
[
{
"sourceTable": "TableA",
"targetTable": "TableB",
},
{
"sourceTable": "TableC",
"targetTable": "TableD",
}
]
}
My problem is that this doesn't scale well: if I need to add other schedules and more table groups, I have to add more GenerateFlowFile processors and schedules to the flow for each reconfiguration.
My preference is to have a single GenerateFlowFile processor which runs on a 5-minute schedule. The exact schedule on which each table runs will be defined as part of the externally loaded JSON config blob:
{
"tablesToLoad":
[
{
"sourceTable": "TableA",
"targetTable": "TableB",
"schedule": "0 0/15 * * * * ?"
},
{
"sourceTable": "TableC",
"targetTable": "TableD",
"schedule": "0 0/30 * * * * ?"
},
{
"sourceTable": "TableC",
"targetTable": "TableD",
"schedule": "0 0 0/5 * * * ?"
}
]
}
Given that the FlowFile will be generated with a current timestamp, is there a simple or recommended way in which I could evaluate the schedule expression above against that current timestamp to determine whether the cron expression matches or not?
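The kind of check I have in mind would look something like this (a sketch only, not a built-in NiFi feature): NiFi's cron syntax is Quartz-style, so if the Quartz jar is available (e.g. via an ExecuteScript/InvokeScriptedProcessor module directory, or inside a custom processor), org.quartz.CronExpression can test whether an expression fires at a given instant. CronExpression and isSatisfiedBy are standard Quartz API; the wrapper class and names below are illustrative only.
// Sketch: test whether a table's cron schedule fires at the FlowFile timestamp.
// Assumes the Quartz jar is on the classpath.
import java.text.ParseException;
import java.util.Calendar;
import java.util.Date;
import org.quartz.CronExpression;

public class ScheduleMatcher {
    // Truncate to the minute so a FlowFile generated a few seconds past the
    // boundary still satisfies expressions whose seconds field is 0.
    public static boolean matches(String schedule, Date flowFileTimestamp) throws ParseException {
        Calendar cal = Calendar.getInstance();
        cal.setTime(flowFileTimestamp);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return new CronExpression(schedule).isSatisfiedBy(cal.getTime());
    }
}
Each table entry from the split JSON would then be routed onward only when matches(entry.schedule, timestamp) returns true; with the generator firing every 5 minutes, any schedule whose minute field is a multiple of 5 lines up cleanly.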
I'm trying to create a CRON expression for every minute in my Azure Function timer trigger.
As per the documentation I found this: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-timer#cron-examples
"0 */1 * * * *" doesn't run at all.
"*/1 * * * * *" does run every second.
Where am I going wrong?
function.json looks like this:
{
"bindings": [
{
"name": "myTimer",
"type": "timerTrigger",
"direction": "in",
"schedule": "0 */1 * * * *"
}
],
"scriptFile": "../dist/TriggerWork/index.js"
}
I could reproduce your issue when the AzureWebJobsStorage connection string entry in local.settings.json is somehow mismatched in its format, for example left as a literal placeholder:
{
"IsEncrypted": false,
"Values": {
"AzureWebJobsStorage": "{AzureWebJobsStorage}",
"FUNCTIONS_WORKER_RUNTIME": "dotnet"
}
}
For all triggers except HTTP, a valid AzureWebJobsStorage connection string is required. The reason behind this has to do with scaling out to multiple VMs: if the function scales out and has multiple instances, a storage account is needed for coordination, to ensure that only one instance of the timer trigger is running at a time. This poses some difficulty if you are trying to develop locally, but, unfortunately, it is currently a limitation of the timer trigger.
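For local development, a working local.settings.json would look something like this (a sketch: UseDevelopmentStorage=true is valid when the storage emulator/Azurite is running; otherwise paste a real storage account connection string):
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet"
  }
}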
For more details, you could refer to this similar issue.
I have a MongoDB datastore set up with location data stored like this:
{
"_id" : ObjectId("51d3e161ce87bb000792dc8d"),
"datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
"loc" : {
"coordinates" : [
0.297716,
18.050614
],
"type" : "Point"
},
"vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example but instead on a time range, i.e. retrieve all points recorded between 12 PM and 4 PM (can be done with 1200 and 1600 in 24-hour time as well).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({'datetime_recorded': {
$gte: Date(1200 hours),
$lt: Date(1600 hours)}
});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?
Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
{ $project: {
loc: 1,
vid: 1,
datetime_recorded: 1,
minutes: { $add: [
{ $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
{ $minute: '$datetime_recorded' }
] }
} },
{ $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first stage, $project, we calculate the minutes since midnight as hour * 60 + min, which we then match against in the second stage, $match.
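For example, the sample point recorded at 12:35 yields 12 * 60 + 35 = 755 minutes, which falls inside the range [720, 960) and is kept, while the 05:35 point yields 5 * 60 + 35 = 335 and is filtered out.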
Adding an answer since I disagree with the other answers: even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request due to aggregation operations being allowed in "update" statements now.
db.collection.updateMany(
{},
[{ "$set": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
batch.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
}
}
}
});
// write once only per reasonable batch size
if ( batch.length >= 1000 ) {
db.collection.bulkWrite(batch);
batch = [];
}
})
if ( batch.length > 0 ) {
db.collection.bulkWrite(batch);
batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
{ "$addFields": {
"timeOfDay": {
"$mod": [
{ "$subtract": [ "$datetime_recorded", Date(0) ] },
1000 * 60 * 60 * 24
]
}
}},
{ "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
{ "$addFields": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}},
{ "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
Taking the modulo of the numeric milliseconds since epoch (which is actually the value internally stored as a BSON Date) is a simple way to extract the current milliseconds within the day.
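As a quick sanity check against the question's sample document: ISODate("2013-07-03T05:35:13Z") is 1372829713000 milliseconds since epoch, and 1372829713000 % 86400000 = 20113000, i.e. 5 h 35 m 13 s into the day.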
Query
Querying is then really simple, and as per the question example:
db.collection.find({
"timeOfDay": {
"$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
}
})
This of course uses the same time-scale conversion, from hours into milliseconds, to match the stored format. But just like before, you can make this whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this negate the run-time overhead of calculating, but with an index you can also avoid collection scans, as outlined on the linked page on indexing for MongoDB.
For optimal performance you never want to calculate such things at query time: at any real-world scale, it simply takes an order of magnitude longer to process all documents in the collection just to work out which ones you want than to reference an index and fetch only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.
I have a function app with the following code
public static void Run([TimerTrigger("*/5 * * * * *")]TimerInfo myTimer, TraceWriter log)
This executes my function every 5 seconds. In production I want the interval to be 30 seconds, but after I publish the function to Azure it still runs every 5 seconds.
On the top of the Integrate -page in the Function settings there is a message "Your app is currently in read-only mode because you have published a generated function.json. Changes made to function.json will not be honored by the Functions runtime" and the page is greyed out.
So how do I have a different schedule for my timer function in development and production?
Make your schedule configurable. Declare it like this in code:
[TimerTrigger("%schedule%")]
Then add a development setting named schedule with the value */5 * * * * * and a production setting with the value */30 * * * * *.
This should sum up the other answers given here:
Configure local settings
add a local.settings.json file to your project.
insert the following code:
{
"Values": {
"AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=XXXXXXXXXX;AccountKey=XXXXXXXXXX",
"AzureWebJobsDashboard": "DefaultEndpointsProtocol=https;AccountName=XXXXXXXXXX;AccountKey=XXXXXXXXXX",
"schedule": "*/5 * * * * *",
"//": "put additional settings in here"
},
"Host": {
"LocalHttpPort": 7071,
"CORS": "*"
},
"ConnectionStrings": {
"SQLConnectionString": "XXXXXXXXXX"
}
}
set the trigger attribute like
[TimerTrigger("%schedule%")]
Configure Azure
go to the Azure Portal, go to Functions, click on your function, and select Application settings
in the application settings, select Add new setting
enter schedule as the key and */30 * * * * * as the value
click save on the top left
redeploy your function
I am trying to use an Azure Functions timer trigger to invoke the same function at different times and with a different param (a URL).
I didn't find any good example that shows how to pass some data. I want to pass a link to the function.
My code is:
var rp = require('request-promise');
var request = require('request');
module.exports = function (context /* need to get the url in here */) {
and the function.json:
{
"bindings": [
{
"name": "myTimer",
"type": "timerTrigger",
"direction": "in",
"schedule": "0 0 */1 * * *"
}
],
"disabled": false
}
If your settings are relatively static (but you still don't want to hard-code them), you may use app settings to store them and then read them, e.g.:
let url = process.env['MyUrl'];
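For local development that setting could live in local.settings.json (a sketch; MyUrl is just the example name used above, and the URL value is a placeholder):
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "MyUrl": "https://example.com/feed"
  }
}
In Azure itself, the same key would be added under the Function App's application settings.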
If the URL should be determined per request, you may use an HTTP trigger instead and read the URL from the query parameters:
let url = req.query.myurl;
I'm not sure what exactly you are trying to achieve with a parameterized timer-triggered function.
Another possibility is that your parameters are stored somewhere such as Azure DocumentDB (Cosmos DB).
You could still use a TimerTrigger to invoke the function, and include a DocumentDB input binding allowing you to query for the specific parameter values needed to execute the function.
Here's a C# example triggered by Timer, with a DocumentDB input binding. Note: I'm using the latest VS2017 Preview tooling for Azure functions.
[FunctionName("TimerTriggerCSharp")]
public static void Run(
[TimerTrigger("0 */1 * * * *")]TimerInfo myTimer, TraceWriter log,
[DocumentDB("test-db-dev", "TestingCollection", SqlQuery = "select * from c where c.doc = \"Test\"")] IEnumerable<dynamic> incomingDocuments)
{..}
With the following binding json:
{
"bindings": [
{
"name": "myTimer",
"type": "timerTrigger",
"direction": "in",
"schedule": "0 */5 * * * *"
},
{
"type": "documentDB",
"name": "incomingDocuments",
"databaseName": "test-db-dev",
"collectionName": "TestingCollection",
"sqlQuery": "select * from c where c.docType = \"Test\"",
"connection": "my-testing_DOCUMENTDB",
"direction": "in"
}
],
"disabled": false
}
I am using apiDoc for documentation in a Sails.js app. And, last week I saw someone define responses used by multiple controllers in a file named api_definitions.js.
Example
/*
 * @apiDefine UserSuccessExample
 * @apiSuccessExample Success-Response:
 * HTTP/1.1 201 OK
 * {
 *   "message": "User Created successfully",
 *   "user" : {
 *     "displayname": "somedisplayname",
 *     "lastname": "ALastName",
 *     "firstname": "AFirstName",
 *     "email": "sososo@soos.so",
 *     "phonenumber": "0839293288"
 *   },
 *   "token" : "ey.jkernekrerkerkeekwewekwbejwbewbebewbwkebebbwbeibwubfebfebwiee"
 * }
 */
And, in each of the controllers, I referenced it using the normal use parameter, @apiUse UserSuccessExample. But when I tried it, I was getting an error in my console saying it wasn't defined:
Error
error: Referenced groupname does not exist / it is not defined with @apiDefine.
{ File: 'api/controllers/UserController.js',
  Block: 2,
  Element: '@apiUse',
  Groupname: 'UserSuccessExample',
  Definition: '@apiUse group',
  Example: '@apiDefine MyValidGroup Some title\n@apiUse MyValidGroup' }
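For reference, the controller-side usage I describe above looks like this (the route, name, and group here are placeholders, not my real controller):
/**
 * @api {post} /user Create user
 * @apiName CreateUser
 * @apiGroup User
 *
 * @apiUse UserSuccessExample
 */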