I want to check if a file name contains a date pattern (dd-MMM-yyyy) using an If Condition activity in Azure Data Factory. For example: the file name I have is somestring_23-Apr-1984.csv, which has a date pattern in it.
I get the file name using a Get Metadata activity and pass it to the If Condition activity, where I want to check whether the file name contains the date pattern and, based on the result, perform different tasks. The only way I know to do this is to use a regex to check whether the pattern is present in the file name string, but ADF does not have a regex solution mentioned in the documentation.
Is there any other way to achieve my requirement in ADF? Your help is much appreciated.
You're right, there is no regex support in ADF expressions. There is another way to do this, but it is rather complex.
First, get the date string (23-Apr-1984) from the output of Get Metadata.
Then, split the date string and check whether each part matches the date pattern.
Below is my test pipeline:
First Set variable:
name: fileName
value: @split(split(activity('MyGetMetadataActivity').output.itemName,'_')[1],'.csv')[0]
Second Set variable:
name: fileArray
value: @split(variables('fileName'),'-')
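For the example name somestring_23-Apr-1984.csv, this gives fileName = 23-Apr-1984 and fileArray = ["23","Apr","1984"].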
If Condition:
Expression: @and(contains(variables('DateArray'),variables('fileArray')[0]),contains(variables('MonthArray'),variables('fileArray')[1]))
By the way, I originally wanted to compare the day part against 0 and 30, but greaterOrEquals() doesn't support nested properties, so I use contains() instead.
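For reference, the expression above relies on two pre-built array variables holding the valid day and month values. A minimal sketch of the pipeline's variable declarations (the array contents are my assumption and are abridged here; list every zero-padded day from "01" to "31"):

"variables": {
    "fileName": { "type": "String" },
    "fileArray": { "type": "Array" },
    "DateArray": { "type": "Array", "defaultValue": ["01", "02", "03", "...", "30", "31"] },
    "MonthArray": { "type": "Array", "defaultValue": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] }
}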
Hope this can help you.
I have a CSV file in the following format, which I want to copy from an external share to my data lake:
Test; Text
"1"; "This is a text
which goes on on a second line
and on on a third line"
"2"; "Another Test"
I now want to load it with a Copy Data task in an Azure Synapse pipeline. The result is the following:
Test; Text
"1";" \"This is a text"
"which goes on on a second line";
"and on on a third line\"";
"2";" \"Another Test\""
So, you see, it is not handling the multi-line text correctly. I also do not see an option to handle multi-line text within a Copy Data task. Unfortunately I'm not able to use a Data Flow task, because it does not allow running with an external Azure runtime, which I'm forced to use due to security reasons.
In fact, I'm of course not speaking about this single test file; instead I have thousands of files.
My settings for the CSV file are as follows:
CSV Connector Settings
Can someone tell me how to handle this kind of multiline data correctly?
Do I have any other options within Synapse (apart from the Dataflows)?
Thanks a lot for your help
Well, it turns out this is not possible with a CSV file.
The pragmatic solution is to use "binary" files instead to transfer the CSV files, and to only load and transform them later with a Python notebook in Synapse.
You can achieve this in Azure Data Factory by iterating through all lines and checking for the delimiter in each line. Then use string manipulation functions with Set Variable activities to convert the multi-line data to single lines.
Look at the following example. I have a Set Variable activity that initializes the req variable with an empty value (taken from a parameter).
In a Lookup activity, create a dataset over the multiline CSV with the following configuration, so that each physical line is read as a single column (Prop_0) with no header:
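A sketch of what that dataset JSON might look like (container and file names are placeholders; the column delimiter is an assumption, any character that never occurs in the data will do):

"type": "DelimitedText",
"typeProperties": {
    "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": "multiline.csv",
        "container": "data"
    },
    "columnDelimiter": "\u0001",
    "quoteChar": "",
    "firstRowAsHeader": false
}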
In a ForEach activity (set to Sequential, so the concatenation happens in order), I iterate over every row except the last, which is handled separately below, by setting the items value to @range(0,sub(activity('Lookup1').output.count,1)). Inside the ForEach, I have an If Condition activity with the following condition:
@contains(activity('Lookup1').output.value[item()]['Prop_0'],';')
If this is true, the current row starts a new record, and I concatenate it onto the req variable using two Set Variable activities (ADF does not allow a Set Variable activity to reference the variable it is setting, so a helper variable is needed in between).
temp (sets the helper variable val): @if(contains(activity('Lookup1').output.value[add(item(),1)]['Prop_0'],';'),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],decodeUriComponent('%0D%0A')),concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' '))
actual (sets the req variable): @variables('val')
For the false branch, I handled the concatenation in the following way:
temp1 (sets the helper variable val2): @concat(variables('req'),activity('Lookup1').output.value[item()]['Prop_0'],' ')
actual1 (sets the req variable): @variables('val2')
Now, I use a final variable to handle the last line of the file, with the following dynamic content:
@if(contains(last(activity('Lookup1').output.value)['Prop_0'],';'),concat(variables('req'),decodeUriComponent('%0D%0A'),last(activity('Lookup1').output.value)['Prop_0']),concat(variables('req'),last(activity('Lookup1').output.value)['Prop_0']))
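For the sample file above, req then contains one line per logical record (ignoring trailing spaces):

Test; Text
"1"; "This is a text which goes on on a second line and on on a third line"
"2"; "Another Test"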
Finally, I use a Copy Data activity with a dummy source file of 1 column and 1 row (it only serves as a vehicle to copy our actual data).
Now, set up the source file configuration as shown below:
Create an additional column whose value is the final variable:
Create a sink with the following configuration and map only the column created above:
When I run the pipeline, I get the data as required. The following is an output image for reference.
I am trying to "beautify" the data I receive from some windows logs on Graylog. My idea is to change the windows log ID from a number to the actual definition for that ID. For example: I receive a log with ID 4625, I want to show in my widget "An account failed to log on".
To do that, I am using a pipeline and a lookup table, which reads the IDs and the respective definitions in natural language from a .csv that I've uploaded on the server.
This is the rule that I wrote for my pipeline, that doesn't seem to work:
rule "eventid_windows_rule"
when
has_field("winlogbeat_winlog_event_id")
then
let winlogbeat_winlog_italiano = lookup("winlogbeat_winlog_event_id", to_string($message.winlogbeat_winlog_event_id));
set_field("winlogbeat_winlog_italiano", winlogbeat_winlog_italiano);
end
I think my problem is specifically in this rule, because Graylog allows testing lookup tables, and if I manually enter an ID, the lookup table finds the corresponding description.
I solved the issue myself. The first argument of lookup() must be the name of the lookup table, not the name of the field. This is the correct code for the rule:
rule "eventid_windows_rule"
when
has_field("winlogbeat_winlog_event_id")
then
let winlogbeat_winlog_italiano = lookup("eventid_widget_windows_lookup", $message.winlogbeat_winlog_event_id);
set_field("winlogbeat_winlog_italiano", winlogbeat_winlog_italiano);
end
This rule checks whether the log has the field "winlogbeat_winlog_event_id"; if so, it looks up the numeric value of "winlogbeat_winlog_event_id" in the .csv-backed lookup table and puts the corresponding natural-language description into the new field "winlogbeat_winlog_italiano".
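For completeness, the backing .csv for such a lookup table could look like this (the column names are an assumption; they just have to match the key and value columns configured in the CSV file adapter):

"event_id","description"
"4624","An account was successfully logged on"
"4625","An account failed to log on"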
I'm trying to use the system variable '@pipeline().TriggerTime' in a dynamic content field.
I have a 'Copy Data' activity which has a sink dataset to a folder.
Inside this Sink dataset, I try to set the filepath to
@concat('Trigger_',formatDateTime(@pipeline().TriggerTime, 'ddMMyyyyHHmmss'), '.trg')
But I get the following error message.
The activity is contained in an 'If Condition' block, which itself is contained in a 'ForEach', but this variable should be global in the pipeline, so I don't see why it shouldn't work.
Thanks for any help.
As Joel comments, just change "@pipeline" to "pipeline".
@concat('Trigger_',formatDateTime(pipeline().TriggerTime, 'ddMMyyyyHHmmss'), '.trg')
If you use multiple nested functions, you still add only one @ at the beginning of the expression.
If you want the literal string of an expression, you need a double @: for example, "Answer is: @@{pipeline().parameters.myNumber}" returns the string Answer is: @{pipeline().parameters.myNumber}.
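As a side note, the same file name can also be written with string interpolation (equivalent to the concat version above):

Trigger_@{formatDateTime(pipeline().TriggerTime, 'ddMMyyyyHHmmss')}.trg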
For more detail, you can refer to this documentation.
I need to add dynamic content in ADF so that it reads files from a folder whose name is 'StartDateOfMonth-EndDateOfMonth', in the following format.
Result: 20190601-20190630
Here are a few steps for how you can achieve that:
in DataSet:
create parameter "Date"
set up a connection to one selected file
now, replace "File" field with expression, similar to the following:
#concat('filedata_',dataset().Date,'.csv')
in Pipeline:
when using the above DataSet, you just have to pass the value, which you can set with a 'Set Variable' activity
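If the Date value itself has to be computed for the current month, one possible expression (a sketch; it assumes the current UTC month is wanted) is:

@concat(formatDateTime(startOfMonth(utcnow()),'yyyyMMdd'),'-',formatDateTime(adddays(addToTime(startOfMonth(utcnow()),1,'Month'),-1),'yyyyMMdd'))

which yields, for example, 20190601-20190630.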
I am using Data Factory v2 and I currently have a simple copy activity which copies files from an FTP server to blob storage. The file names on this server are of the following form :
File_{Year}{Month}{Day}.zip
In order to download the most recent file I add this filter to my input dataset json file :
"fileName": {
"value": "#concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.zip')",
"type": "Expression"
}
I now want to be able to download yesterday's file as well, which is possible using adddays().
However, I would like to do this in the same copy activity, and it seems that Data Factory v2 does not allow the following kind of regex-like OR logic:
@concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.zip') || @concat('File_', formatDateTime(adddays(utcnow(), -1), 'yyyyMMdd'), '.zip')
Is this possible or do I need a separate activity ?
It would seem strange to need a second activity, since a Copy activity can only take a single input; but if the pattern is simple enough, multiple files are treated as a single input, and if not, multiple files are treated as multiple inputs.
The '||' won't work, since the whole thing will be evaluated as a single string.
But I can provide two solutions for this.
Using a tumbling window trigger with daily frequency and the start time set to yesterday, so it will trigger two pipeline runs.
Using a ForEach activity plus a Copy activity: the ForEach iterates an array that passes yesterday's and today's dates to the Copy activity, as sketched below.
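A minimal sketch (the fileName expression reuses the one from the question, with utcnow() replaced by the current item):

ForEach items: @createArray(utcnow(), adddays(utcnow(), -1))
fileName: @concat('File_', formatDateTime(item(), 'yyyyMMdd'), '.zip')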
Btw, you could just use a string interpolation expression instead of concat; they are the same.
File_@{formatDateTime(utcnow(), 'yyyyMMdd')}.zip
I would suggest you read about the Get Metadata activity; I think it can be helpful in your scenario.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity
It has the itemName and lastModified properties; check them out.
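For example, those properties are requested through the activity's fieldList; a sketch (the dataset reference name is a placeholder):

"typeProperties": {
    "dataset": { "referenceName": "FtpFileDataset", "type": "DatasetReference" },
    "fieldList": ["itemName", "lastModified"]
}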