Getting dates for historical data copy in Azure Data Factory

I have to copy historical data from a REST API for one year, e.g. from March 1, 2019 to March 1, 2020. The REST API takes a start and end date as params.
To limit the load, however, I have to call the API in pieces: copy with the start and end date as March 1, 2019 to March 30, 2019; once that's done, April 1, 2019 to April 30, 2019; and so on until March 1, 2020, automatically and without manual intervention.
I was able to use utcnow() and addDays() to copy the previous day's data up to the current start of day, but I am unable to figure out the copy of historical data. Any idea if this is possible?

You can try something like this:
1. Create two variables named start_date and end_date.
2. Create an Until activity with this expression: @greaterOrEquals(variables('end_date'), '2020-03-01T00:00:00Z')
3. Create a Copy activity and configure its source settings, source dataset, and sink dataset.
4. Create two Set variable activities to advance start_date and end_date.
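For the Set variable steps, a minimal sketch of the expressions (assuming both variables are strings holding ISO 8601 dates; note that a Set variable activity cannot reference the variable it is setting, which is why start_date is updated from end_date first and end_date is then recomputed from the new start_date):
Set start_date: @variables('end_date')
Set end_date: @addToTime(variables('start_date'), 1, 'Month')
Each pass through the Until loop copies one start_date–end_date window and then slides the window forward a month; the loop stops once end_date reaches 2020-03-01. Check the boundary condition carefully so the final window is not skipped.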
By the way, you can change the date format to suit your needs; see https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-date-and-time-format-strings.

Related

Cognos scheduled report e-mail with current date

I want to schedule the mailing of a Cognos report, always using the current date. So, for example: today, I want to e-mail the report with data up until the 28th, but tomorrow it should contain data up until the 29th.
How can I achieve this?
Thanks in advance!
If you're expecting code, you didn't provide enough information, but let me try...
Assuming the "date" data against which you want to filter is in a query item named [Date] which is in a query subject named [Time] which is in a namespace named [namespace], create a filter like this:
[namespace].[Time].[Date] = current_date
If you want everything up to and including the current date (i.e. the days leading up to it), you can use what dougp posted, slightly modified.
[namespace].[Time].[Date] <= current_date
To ensure the WHERE clause is pushed down to the database, I personally like to use a macro for current_date, so the expression becomes:
[namespace].[Time].[Date] <= # timestampMask ( $current_timestamp , 'yyyy-mm-dd' ) #
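For illustration, if the report runs on 9 June 2024 (a hypothetical run date), the macro is expanded by Cognos before the SQL is generated, so the database receives a plain literal comparison, roughly:
[namespace].[Time].[Date] <= 2024-06-09
which is what ensures the clause is pushed down to the database.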

Copy data every 1 minute from Data Lake with Data Factory

I have a Data Lake storage with the following folder structure:
{YEAR}
  - {MONTH}
    - {DAY}
      - {HOUR}
        - {sometext}_{YEAR}_{MONTH}_{DAY}_{HOUR}_{Minute}_{someuuid}.json
Could you please help me configure the Data Factory Copy data activity?
I need to run a trigger every 1 minute to copy the previous minute's data from the Data Lake to Cosmos DB.
I've tried this, where the first expression is
@formatDateTime(utcnow(),'yyyy/MM/dd/HH')
and the second one
@{formatDateTime(utcnow(),'yyyy')}_@{formatDateTime(utcnow(),'MM')}_@{formatDateTime(utcnow(),'dd')}_@{formatDateTime(utcnow(),'HH')}_@{formatDateTime(addMinutes(utcnow(), -1),'mm')}*.json
But it can skip some data, especially when the hour changes.
I'm new to Data Factory and don't know the most efficient way to do this. Please help.
The Pipeline Expression Language has a number of Date functions built in. You can use the addMinutes function to add 1 minute.
To avoid clock skew, I would capture the utcnow() value and store it in a variable without any formatting.
In another variable, add a minute to the captured value rather than executing utcnow() again.
Once you have those variables, just use them to format the date string(s).
NOTE: use concat with formatDateTime to build the wildcard value you want.
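Putting that together, a minimal sketch of the expressions (the variable name window is illustrative; it holds the previous minute's timestamp):
Set the window variable: @addMinutes(utcnow(), -1)
Folder path: @formatDateTime(variables('window'), 'yyyy/MM/dd/HH')
File wildcard: @concat(formatDateTime(variables('window'), 'yyyy_MM_dd_HH_mm'), '*.json')
Because both the folder path and the file pattern are derived from the same captured timestamp, the minute, hour, day, and month all roll over consistently and no window is skipped.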

How to specify query for Azure Data Factory Source (Azure Table Storage) for yesterday's records

I am copying records from an Azure Storage Table (source) to another Azure Storage Table (sink) every day. So if I am executing the query on December 24th 2019 (UTC), for instance, then I want to copy records for December 23rd 2019 (UTC). The query works and does what I intend it to do. Here is the query:
Timestamp ge datetime'2019-12-23T00:00Z' and Timestamp lt datetime'2019-12-24T00:00Z'
In the query above, the Timestamp column is automatically stamped in the Azure Storage Table when a new record is inserted in it. That is how Azure Storage Table works.
And here is the screenshot of the Data Factory Pipeline:
I want to parameterize the query now. That is: if the query is run on 24th December 2019, then it should copy 23rd December 2019's records, and keep sliding as it executes every day on a schedule. I don't know how to do that. I know that there is a utcNow function and a subtractFromTime function; I just don't know how to put them together.
@4c74356b41, thank you for your kind support. Based on your answers and some more googling, I was able to piece it together. Here is the final expression:
Timestamp ge @{concat('datetime','''',addDays(startOfDay(utcNow()), -1),'''')} and Timestamp lt @{concat('datetime','''',startOfDay(utcNow()),'''')}
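For illustration, when the pipeline runs on 24 December 2019 this interpolates to a filter roughly like the following (the exact fractional-second formatting depends on how the timestamp is serialised):
Timestamp ge datetime'2019-12-23T00:00:00.0000000Z' and Timestamp lt datetime'2019-12-24T00:00:00.0000000Z'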
You can do something like this:
addDays(startOfDay(utcNow()), -1)
This would find the start of the previous day.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#date-functions

Hive query manipulation for missing data not produced on non-business days (weekends & holidays)

I have a question regarding tweaking my Hive query for the requirement defined below; I couldn't get my head around it.
Case: The data gets generated only on business days, i.e. weekdays and non-holiday dates. I load this data into Hive. The source and target are both HDFS.
Stringent process: The data should be replicated for every day. So, for Saturday and Sunday, I'll copy the same data as Friday. The same applies to public holidays.
Current process: As of now I'm executing it manually to load the weekend data.
Requirement: I need to automate this in the query itself.
Any suggestions? A solution in spark for the same is also welcome if feasible.
Though the issue is clear, it is unclear what you mean by "in the query itself".
Two options:
When querying results, look for data using a scalar subquery (e.g. in Impala) that first finds the max date relative to a given select date, i.e. the max date less than or equal to the given select date; thus no replication is needed (see the sketch below).
Otherwise use scheduling: when the job runs, a) check whether the date is a weekend via Linux or via SQL, and b) maintain a table of holiday dates and check for existence. If either or both conditions are true, copy from the existing data as per bullet 1, with the select date being today; else do your regular processing.
Note you may need to allow for running processing to catch up after some error. This implies some control logic but is more robust.
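Since a Spark solution is also welcome, here is a minimal sketch of option 1 in PySpark; the table name events and the date column event_date are hypothetical:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Pick the data for a given select date; on weekends and holidays this falls back
# to the most recent business day that has data, so nothing needs to be replicated.
select_date = "2024-06-09"  # e.g. a Sunday
result = spark.sql(f"""
    SELECT *
    FROM events
    WHERE event_date = (SELECT MAX(event_date)
                        FROM events
                        WHERE event_date <= DATE '{select_date}')
""")
result.show()
The same scalar-subquery pattern works directly in Impala, as the first bullet suggests.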

Azure Table Storage date time comparison with UTC and local time difference

I am trying to store and retrieve three datetime columns in Azure Table Storage:
Start date, End date, and Last Executed date.
I have another, fourth column called timeIntervalInMinutes.
My code will execute on an Azure VM which might be in any US time zone.
I am going to use these values to execute some task based on the current time:
start date <= current date <= end date
Current time = Last Executed Date (time) + intervalInMinutes
There are a few doubts that I have:
The user is going to call this worker service on Azure from their app, from any US time zone. Do I need to get their time zone in the request and store it along with the datetime, which will be stored in UTC format in Table Storage?
If yes, then when the user retrieves this configuration information, should I convert it from UTC back to local time?
I need to execute my custom task based on the local time of the user. So when converting to UTC and storing it in the Azure table, do I need to apply UTC plus or minus the offset to match the user's local time zone?
Please help.
The OP wrote in a comment:
I have solved this by adding entity properties with DateTimeOffset type and created separate fields for holding UtcOffset timespan value as ticks (long type). With this I can compare the UTC format timestamp with UtcOffset values and date as well for my scheduler to work. So it's now easy for me to query against date and as well as retrieve and present the user back with his / her own specified local time.
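As a rough illustration of that approach in plain Python (property names are hypothetical), the entity stores the comparable UTC instant plus the user's offset as ticks, so the scheduler can work in UTC while the UI shows the user's local time:
from datetime import datetime, timedelta, timezone

TICKS_PER_SECOND = 10_000_000  # .NET DateTimeOffset ticks are 100 ns units

# The user submits a local start time together with their UTC offset (e.g. US Central, UTC-6).
local_start = datetime(2020, 3, 1, 9, 0)
utc_offset = timedelta(hours=-6)

# Store the UTC instant (comparable regardless of time zone) and the offset as a long.
entity = {
    "StartDateUtc": (local_start - utc_offset).replace(tzinfo=timezone.utc),
    "UtcOffsetTicks": int(utc_offset.total_seconds() * TICKS_PER_SECOND),
}

# The scheduler compares StartDateUtc against the current UTC time; for display,
# the user's local time is rebuilt from the stored offset.
offset = timedelta(seconds=entity["UtcOffsetTicks"] / TICKS_PER_SECOND)
print(entity["StartDateUtc"].astimezone(timezone(offset)))  # 2020-03-01 09:00:00-06:00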
