How to delete files older than a specified date in Azure Data Lake - azure

I have data folders created on a daily basis in the data lake. The folder path is dynamic and comes from JSON configuration.
Source Folder Structure
SAPBW/Master/Text
Destination Folder Structure
SAP_BW/Master/Text/2019/09/25
SAP_BW/Master/Text/2019/09/26
SAP_BW/Master/Text/2019/09/27
..
..
..
SAP_BW/Master/Text/2019/10/05
SAP_BW/Master/Text/2019/09/06
SAP_BW/Master/Text/2019/09/07
..
..
SAP_BW/Master/Text/2019/09/15
SAP_BW/Master/Text/2019/09/16
SAP_BW/Master/Text/2019/09/17
I want to delete the folders created more than 5 days ago for each SinkTableName folder.
So, in Data Factory, I have built the folder path in a ForEach loop as
@concat(item().DestinationPath,item().SinkTableName,'/',item().LoadTypeName,'/',formatDateTime(adddays(utcnow(),-5),item().LoadIntervalFormat),'/')
I need the syntax to delete the files in each folder based on the JSON.
I am unable to find a way to delete folder by folder and to set up the Delete activity based on dates more than five days before now.

I see that you are doing a concatenation, which I think is the way to go. But I see that you are using the expression formatDateTime(adddays(utcnow(),-5)), which will give you something like 2019-10-15T08:23:18.9482579Z, which I don't think is desired. I suggest trying @formatDateTime(adddays(utcnow(),-5),'yyyy/MM/dd'). Let me know how it goes.
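Putting the question's expression together with this format string, a hedged sketch of the full folder-path dynamic content (the property names DestinationPath, SinkTableName and LoadTypeName are reused from the question) could be:
@concat(item().DestinationPath, item().SinkTableName, '/', item().LoadTypeName, '/', formatDateTime(adddays(utcnow(),-5),'yyyy/MM/dd'), '/')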

Related

Azure Synapse Analytics - deleting pipeline Folder

I am new to Synapse and I have to make a pipeline that will delete files from folders in a hierarchy like the attached image (expected hierarchy). The red half circles mark the files I would like to delete, for example files older than 2 months.
For now I have made a pipeline for a single folder, and using a ForEach loop I can get to the files and delete the corresponding ones. It works, but since I have about 60-70 folders and even more files, I wanted to go a level higher up and make a pipeline that executes for each folder. And with this there is a problem: when I use a Get Metadata activity on the top folder and use a ForEach loop to take the folder names, I can only access the folders, not the files inside them. Could someone help me solve this?
deleting pipeline for a single folder using a for each loop
We can achieve this using nested ForEach activities with the help of the Execute Pipeline activity. As mentioned, Get Metadata with wildcards returns all files without folders, and the Delete activity is unable to recognize wildcard folder paths (Folder/*).
I have created a similar folder structure for a demo. In my pipeline, I have first created an array parameter req_files (sample1.csv and sample2.csv) with the names of the required files.
Note: If you want to do this dynamically, you can use an Append variable activity to build the required file names (file09/22 and file08/22), as sketched below.
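For instance, a hedged sketch of one such Append variable value, assuming the names really follow the fileMM/yy pattern mentioned above, could be:
@concat('file', formatDateTime(utcnow(),'MM/yy'))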
I used one Get Metadata activity to get the folder names (which are inside the root folder). I am iterating through the output of this Get Metadata in my ForEach activity (the items value is @activity('root folder contents').output.childItems).
Inside my ForEach, I used another Get Metadata activity to loop through each of the sub folders (to get the file contents).
Now I have the folder name and the list of files inside it. I am going to use Execute Pipeline to implement the nested ForEach. Create 3 parameters in a new pipeline called delete_pipeline (where I perform the delete): current_folder, folder_files and files_needed.
Pass the following dynamic content for each of them from the parent pipeline.
current_folder: @item().name
folder_files: @activity('sub folder contents').output.childItems
files_needed: @pipeline().parameters.req_files
Now in delete_pipeline, I have a ForEach loop to loop through the list of files we are passing (the items value is @pipeline().parameters.folder_files).
Inside this ForEach, I am using an If Condition activity, because I want to delete the files which are not in my req_files parameter (the array from the parent pipeline which we passed to the files_needed parameter in delete_pipeline). The condition for the If Condition activity will be the following:
@contains(pipeline().parameters.files_needed,item().name)
We need to delete a file only when it is not present in req_files (files_needed). So, when the condition is false, we perform the delete.
I have created 2 parameters, file_namepath_of_file_to_delete and file_name_to_delete, in the dataset I am using for the Delete activity, with the following dynamic content.
file_namepath_of_file_to_delete: Folder/@{pipeline().parameters.current_folder}
file_name_to_delete: @item().name
When I run the pipeline, it keeps the required files and deletes the rest. The following are output images for reference.
Debug output: https://i.imgur.com/E6GNVHW.png
My folder after I run the pipeline: https://i.imgur.com/bqN00Dw.png

rename a file from multiple directories in linux

I am trying to rename a file from multiple directories
I have a file called slave.log in multiple directories like slave1, slave2, ..., slave17. A daily log rotation creates a new file with the current date format in its name, whereas the file's data contains the previous day's data. I want to rename those files with the previous day's date format.
I have written a shell script which works fine, but the problem here is that I need to pass the path as a parameter, and since I have 17 directories I can't schedule 17 cron entries to run it. I have only basic knowledge of scripting. Please help me with the best solution for this scenario.
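One hedged sketch of a way around the 17 cron entries: keep the existing script as-is and schedule a single wrapper that calls it once per directory. The script name rename_slave.sh and the base path /var/log are assumptions here, not taken from the question.
#!/bin/bash
# Wrapper: run the existing rename script once for every slave directory.
# /path/to/rename_slave.sh and /var/log are hypothetical; substitute the real script and path.
for dir in /var/log/slave{1..17}; do
    /path/to/rename_slave.sh "$dir"    # pass each directory as the path parameter
done
With this, only one cron entry is needed, for the wrapper itself.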

How to get files from a subfolder present under a nested parent folder in Azure Data Factory?

My folder structure is like below,
Container/xx56585/DST_1/2021-03-26/xxxxxxxx.csv
Container/xx56585/DST_1/2021-03-26/xxxxxxxx.ctl
Container/xx56585/DST_2/2021-03-26/yyyyyyyyy.csv
Container/xx56585/DST_2/2021-03-26/yyyyyyyyy.ctl
Container/xx56585/DST_3/2021-03-26/zzzzzzzzz.csv
Container/xx56585/DST_3/2021-03-26/zzzzzzzzz.ctl
Container/xx56585/DST_4/2021-03-26/sssssssssss.csv
Container/xx56585/DST_4/2021-03-26/sssssssssss.ctl
I need to copy the .csv and .ctl files to an SFTP target and then move these files to an archive folder (in the blob storage, after the copy activity).
Please help me with this.
Update:
We can use Get Metadata1 to check whether the ctl file exists.
Add the dynamic content @concat('xx56585/',item(),'/',substring(adddays(utcnow(),-3),0,10),'/') to the path.
I created a simple test to copy the files under the <rundate> folders to a target folder.
My folder structure
Input/xx56585/DST_1/2021-03-26/xxxxxxxx.csv
Input/xx56585/DST_2/2021-03-26/yyyyyyyyy.csv
Input/xx56585/DST_3/2021-03-26/zzzzzzzzz.csv
Input/xx56585/DST_4/2021-03-26/sssssssssss.csv
Define an Array type variable Array1 and assign the value ["DST_1","DST_2","DST_3","DST_4"].
In the ForEach1 activity, we can add the dynamic content @variables('Array1') to traverse this array.
Inside the ForEach1 activity, we can use a Copy activity to copy the files under the dynamic path via the expression @concat('xx56585/',item(),'/',substring(adddays(utcnow(),-3),0,10),'/').
My current date is 2021-03-29, so I use adddays(utcnow(),-3) to get 2021-03-26 in the above steps (adddays(utcnow(),-3) returns a full timestamp such as 2021-03-26T08:23:18Z, and substring(...,0,10) keeps just the 2021-03-26 date portion).
That's all.
I think we can add a Filter activity before the Copy activity, in which we can use a string function to find whether the file name contains .ctl or .csv, as sketched below.
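A hedged sketch of such a Filter activity condition, assuming the items come from a Get Metadata childItems list and using endswith rather than substring:
@or(endswith(item().name,'.csv'),endswith(item().name,'.ctl'))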

Node : What is the right way to delete all the files from a directory?

So I was trying to delete all the files inside a folder using Node.
I came across 2 methods.
Method 1
Delete the folder using rmdir. But if I plan on adding the images to the same folder, then I use mkdir to create the same folder again and append the files to it.
Example: I have an Add Files button and a Delete All button. When I click Delete All, the folder gets deleted. And when I click Add, the folder gets created and the file gets added to that folder.
Method 2
Using readdir, I loop through the files, store them in an array, and then delete only the files instead of the folder.
Which is the best way to do it? If it's not among these, then please advise me of a better solution.
The rm function of ShellJS will do the trick. It works as a one-liner, it works cross-platform, and it is well tested and documented. It even supports recursive deletes.
Basically, something such as:
const { rm } = require('shelljs');
rm('-rf', '/tmp/*');
(Sample code taken from ShellJS' documentation.)
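For comparison, here is a minimal sketch of Method 2 using only Node's built-in fs module, assuming a flat folder with no subdirectories (the folder path below is hypothetical):
const fs = require('fs');
const path = require('path');

function deleteAllFiles(dir) {
  // Read every entry in the folder, then remove each entry that is a file.
  for (const name of fs.readdirSync(dir)) {
    const fullPath = path.join(dir, name);
    if (fs.statSync(fullPath).isFile()) {
      fs.unlinkSync(fullPath); // delete the file but keep the folder itself
    }
  }
}

deleteAllFiles('/tmp/uploads'); // hypothetical folder path
This keeps the folder in place, so there is no need to recreate it before adding new files.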

How to copy the content of a folder whose name is partially known in CentOS

I have a folder which has another folder inside it (let's say test and insidetest-some random number). What I am trying to do is copy the content of the insidetest-... folder into another folder. The problem is that I only know half of the name of the folder inside the test folder and I do not know the random number attached to it. (For more explanation: I get a zip file from the Bitbucket API, and after unzipping it, it has this structure, so I can never know the exact name of the folder inside test.) If I knew it, I could simply use something like this:
cp home/test/* /home/myfolder/
But I cannot do it in this situation. Can anyone help?
If some part of the name is constant, then use a command like this:
cp home/test/halfname* /home/folder/ -r
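Applied to the question's example, a hedged one-liner, assuming insidetest- is the constant part and only one folder inside test matches:
cp -r /home/test/insidetest-*/* /home/myfolder/
The shell expands insidetest-* to the matching directory regardless of the random suffix, and the trailing /* copies its contents rather than the directory itself.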
