Azure Stream Analytics Output

Azure Stream Analytics has changed how it stores output directories. Earlier we used a date path
(2022/04/14) in the path pattern, and the output was stored in separate directories in the data lake. Now the output is stored in a directory named like
'2022%2F4%2F19'. How can this be solved?

When you configure an Azure Stream Analytics output, you can specify a Path pattern, a Date format, and a Time format. All three are optional properties that control how the output is organized.
Path pattern
The path pattern is used to locate the blobs within the specified container.
Do not include a path pattern if you wish to read blobs from the container's root.
You can specify one or more instances of the following three variables within the path: {date}, {time}, or {partition}.
Example : cluster1/logs/{date}/{time}/{partition}
Date format
The format in which the date folders are organized when you use the {date} variable in the path.
Example:
YYYY/MM/DD
Time format
The format in which the time folders are organized when you use the {time} variable in the path. Currently, only HH (hours) is supported.
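Putting these together: with the example path pattern above, Date format YYYY/MM/DD, and Time format HH, output for 14 April 2022 at 08:00 would land under a folder like (the partition number is illustrative):
cluster1/logs/2022/04/14/08/0
For comparison, the '2022%2F4%2F19' folder from the question is the same kind of path with the '/' separators percent-encoded as %2F, so blob storage treats it as a single folder name instead of nested folders.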
References:
Configure the blob storage as a stream
SO Thread for Dynamic Path pattern.
Stream Analytics custom path pattern

Related

Azure Data Factory removing spaces from column names of csv file

I'm a bit new to Azure Data Factory, so apologies if I'm missing anything obvious. I've done several searches and I can't find anything that quite fits.
The situation is that we have an existing pipeline that takes the path to a csv file and passes it in as a delimited dataset. As a sink it uses a parquet dataset. This is a generic process: we can pass any delimited file into it and it will output it as parquet.
This has been working well, but now we have started receiving files with spaces and special characters in the headers, which causes the output to parquet to fail. Unfortunately we don't have control over the format of the files we receive, so I can't handle this at the source.
What I would like to do is, on ingestion of the file, replace any spaces and other special characters in the header with an underscore. If I were doing this on-premises I could quickly create a PowerShell script to do it. I had thought about creating a custom task in ADF to call a PowerShell script to do this in blob storage, but that seems more complicated than it should be. Is there something else I can do to get this process working while keeping it generic?
As @Joel Cochran mentioned, you can use the expression below in a Select transformation to replace spaces and special characters in the header:
regexReplace($$,'[^a-zA-Z]','_')
In the Select transformation, remove the auto mappings and add a new rule-based mapping that uses this expression.
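Note that the pattern [^a-zA-Z] replaces digits as well as spaces and special characters. If digits are acceptable in your column names (an assumption about your naming requirements), a variant of the same expression is:
regexReplace($$,'[^a-zA-Z0-9]','_')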
You cannot change the output filename directly in the Copy activity, assuming you are using that activity.
The workaround is to use a parameter for the output filename whose value you can clean up.
You can use the Get Metadata activity to get all filenames from the source csv files.
Then loop over these files with a ForEach activity.
Within the ForEach activity you can set the output filename to the cleaned-up value.
The function could look like this:
@replace(item().name, ' ', '_')
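Since the headers can also contain special characters, the calls can be nested to handle more than one character; for example (the extra character handled here is an assumption):
@replace(replace(item().name, ' ', '_'), '-', '_')
Only the outermost expression carries the @ prefix; inner calls are written without it.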
More information on the replace function

Azure adds timestamp at the beginning of logs

I have a problem with the logs retrieved from my Docker containers with Azure Log Analytics. All the logs are retrieved fine, but Azure adds a date at the beginning of each line of the log, which means that an entry is created for each line and I can't analyze my logs correctly because they are split up...
For example, in this image the black rectangle shows a date added (by Azure, I think) and the red rectangle shows the date appearing in my logs:
Also, if there is no date on a line of my logs, a date is still added to every line, even the empty ones.
The problem is that Azure cuts my log file line by line, adding a date to each line, when I would like it to delimit entries using the dates already present in my log files.
Do you have any solutions?
One solution I can think of is that, when you query the logs, you can use the replace() function to remove the redundant date (replace it with an empty string, etc.). You will need to write a suitable regular expression for your purpose.
A skeleton query would look like this:
ContainerLog
| extend new_logEntry=replace(@'xxx', @'xxx', LogEntry)
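For instance, assuming the prefix Azure adds is an ISO-8601 timestamp at the start of each line (the pattern below is an assumption to adapt to your logs), a concrete version might be:
ContainerLog
// strip an assumed leading ISO-8601 timestamp from each entry
| extend new_logEntry=replace(@'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z\s*', @'', LogEntry)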
Currently Azure Monitor for containers doesn’t support multi-line logging, but there are workarounds available. You can configure all the services to write in JSON format and then Docker/Moby will write them as a single line.
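Illustratively, a service writing JSON emits a multi-line message as one physical line, with the line breaks escaped inside the string (the field names here are assumptions):
{"time":"2022-04-14T08:00:00Z","level":"ERROR","msg":"request failed\n  at Foo.bar()\n  at Main.main()"}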
https://learn.microsoft.com/fr-fr/azure/azure-monitor/insights/container-insights-faq#how-do-i-enable-multi-line-logging

Azure Data Factory Dataset Dynamic Folder Path

I have a data set that resides under a folder path where the date is dynamic (e.g. rootfolder/subfolder/yyyy/mm/dd/subfolder/subfolder), and I am trying to pull it with a copy activity. So far I cannot get Data Factory to recognize that my date is dynamic...
This is the code that I have tried so far:
["rootfolder/subfolder/subfolder/subfolder/subfoler/#{formatDateTime(utcnow(),'yyyy')}/#{formatDateTime(utcnow(),'MM')}/#{formatDateTime(utcnow(),'dd')}/subfolder/file"]
You need to make use of the concat function provided by Data Factory:
@concat('rootfolder/subfolder/subfolder/subfolder/subfolder/',formatDateTime(utcnow(),'yyyy'),'/',formatDateTime(utcnow(),'MM'),'/',formatDateTime(utcnow(),'dd'),'/subfolder/file')
The concat function works as in other programming languages: it concatenates strings.
More details: Azure Data Factory Loop Through Files
Just to build on Anish K's answer, you can also shorten this a bit by using formatting:
formatDateTime(utcnow(),'yyyy/MM/dd')
So the final answer would be:
@concat('rootfolder/subfolder/subfolder/subfolder/subfolder/',formatDateTime(utcnow(),'yyyy/MM/dd'),'/subfolder/file')
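As an alternative to concat, ADF string fields also support inline expression interpolation with @{...}, so the same value could be written as:
rootfolder/subfolder/subfolder/subfolder/subfolder/@{formatDateTime(utcnow(),'yyyy/MM/dd')}/subfolder/file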
In case you want to learn a bit more about parameterization in ADF, feel free to check out this video: https://youtu.be/pISBgwrdxPM

Unable to copy file from SFTP in Azure Data Factory when using wildcard(*) in the filename

I am unable to copy csv files from an SFTP connection to blob storage when using the wildcard(*) in the filename.
More specifically, I receive csv files in the SFTP on a daily basis, and they are of the format "ddMMyyyyxxxxxx.csv", where "xxxxxx" is the timestamp. More concretely, my csv file for the 13th of March is "13032019083647.csv", while for the 14th of March it is "14032019083556.csv". Obviously, the timestamp is different for every day, so I want to copy the file independently of whatever string exists between the date and the file extension.
In the "File" subfield of the "File path" of the "Connection" tab of my dataset, I give as input "13032019*.csv", as instructed by the help icon next to the field:
When I do so, my Debug run fails with:
{"errorCode": "2200", "message":
"ErrorCode=UserErrorInvalidCopyBehaviorBlobNameNotAllowedWithPreserveOrFlattenHierarchy,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot
adopt copy behavior PreserveHierarchy when copying from folder to a
single file.,Source=Microsoft.DataTransfer.ClientLibrary}
I receive a similar error no matter which type of copy behaviour I choose. I have also tried experimenting with the fileFilter parameter (even though ADF warns that the same behaviour can be achieved with the fileName option), but I still end up getting the same error.
For further clarification, I am attaching the Code segment that ADF produces for this configuration:
I should also mention, that when using the full fileName in the corresponding field, namely the value: "13032019083647.csv", copying works normally.
Any help would be greatly appreciated!
My guess is that the wildcard operation might be matching more than one file.
In such cases we need to use a Get Metadata activity, a Filter activity, and a ForEach activity to copy the files, as sketched below:
1. Get Metadata activity: use a dataset in this activity that points to the location of the files, and pass Child Items as the field to retrieve.
2. Filter activity: use the filter to select the files based on your needs.
3. ForEach activity: take the items from the previous activity and add a Copy activity inside the ForEach.
In the Copy activity the source dataset filename should be @item().name.
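A minimal sketch of the Filter activity settings, assuming the Get Metadata activity is named 'Get Metadata1' (an assumption) and using the date prefix from the question:
Items: @activity('Get Metadata1').output.childItems
Condition: @startswith(item().name, '13032019')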
I hope this will solve your issue.
What worked for me was the following: I kept the same regex for the input file, but I set "Copy behaviour: Merge Files". Since, as mentioned, only one file satisfies the regex condition, only one file was created as output. I am aware that this is a somewhat "dirty" solution, but it did the trick for me.

Pentaho create archive folder with MM-YYYY

I would like to archive every file in a folder by putting it in another archive folder with a name like this: "Archive/myfolder-06-2014"
My problem is how to retrieve the current month and year, and then how to create a folder (if it does not already exist) named with these values.
This solution may be a little awkward (due to the required fuss) but it seems to work. The idea is to precompute the target filename in a separate transformation and store it as a system variable (TARGET_ZIP_FILENAME):
The transformation consists of the following steps:
1. Get the current time.
2. Provide the pattern of the target filename as a string constant.
3. Extract the month and year as formatted integers.
4. Replace the month in the pattern (the year works equivalently).
5. Set the resulting filename as a system variable.
The main job will call the transformation and use the system variable as the zip target filename.
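If you prefer a single scripting step over the separate components above, a Modified Java Script Value step could compute the same value; a minimal sketch (the folder prefix matches the question, the rest is an assumption):
// build "Archive/myfolder-MM-YYYY" from the current date
var d = new Date();
var month = ('0' + (d.getMonth() + 1)).slice(-2); // getMonth() is 0-based, so shift and zero-pad
var target = 'Archive/myfolder-' + month + '-' + d.getFullYear();
// promote the value so the parent job can read it as TARGET_ZIP_FILENAME
setVariable('TARGET_ZIP_FILENAME', target, 'r');
The 'r' scope makes the variable visible at the root job level, matching how the main job reads it.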
Also you have to make sure that the setting Create Parent folder is active.
