How to get the string between two wildcards? - azure

In ADF, I want to pass a file type with wildcards as a parameter. It should pick up files of a similar type from the source folder and copy them into the corresponding folder (created based on the file type) in the target.
For example:
I am passing xyz00r37 as a parameter with the wildcard character *. To copy the files to the target folder xyz00r37, I need to get the string between the * wildcards.
In the future, the file type name and its length may vary. To copy the files dynamically regardless of file type, I need to extract the string between the * symbols.

You can use Azure Data Factory expression language functions to get rid of the "*" wildcard.
Here's a sample expression where I split the input parameter (input=xyz00r37*) and got rid of the wildcard.
@split(pipeline().parameters.input, '*')[0]

There are many ways this can be achieved. You have not shared the exact file name format, so here are a few guesses:
wildcardxyz00r37wildcard - @replace(pipeline().parameters.filename,'*','')
wildcardxyz00r37wildcard - @split(pipeline().parameters.filename,'*')[1]
wildcardxyz00r37 - @split(pipeline().parameters.filename, '*')[1]
xyz00r37wildcard - @split(pipeline().parameters.filename, '*')[0] (as called out before).
I think the replace option takes care of all these scenarios.
HTH
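To see why the split index depends on where the wildcard sits while replace does not, the same logic can be sketched in Python. This is only an illustration of the string behavior, not ADF code; the function names strip_wildcards and first_segment are hypothetical, and it assumes Python's str.split and str.replace behave like the ADF functions for these inputs.

```python
def strip_wildcards(pattern: str) -> str:
    """Counterpart of replace(pattern, '*', ''): drop every '*'."""
    return pattern.replace("*", "")


def first_segment(pattern: str) -> str:
    """Return the first non-empty piece after splitting on '*'."""
    return next((part for part in pattern.split("*") if part), "")


if __name__ == "__main__":
    for pattern in ("*xyz00r37*", "*xyz00r37", "xyz00r37*"):
        # The list shows why the index into the split result varies.
        print(pattern, pattern.split("*"), strip_wildcards(pattern))
```

Note that the two approaches differ when a wildcard sits in the middle of the value: removing every * joins the pieces, while taking one split segment keeps only that piece.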

Related

adf function to read only certain portion of filename based on pattern

I have file names such as SMP_ACC_STG_20210987654.txt and SMP_ACC_STG_BS_20210987654.txt. I can use @substring(item().name,0,11) and I get SMP_ACC_STG for the first file, which is correct, but for the second file I need to get SMP_ACC_STG_BS, and it returns the same name as the first because I have hardcoded the length in substring. I tried using indexOf, but it didn't give me the expected result.
I need to extract the text before _20210987654.txt and use that as the filename.
I have used the expression below, and got my file names:
@substring(item().name,0,lastIndexOf(item().name,'_'))
Which gave me:
SMP_ACC_STG
SMP_ACC_STG_BS
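The idea of cutting at the last underscore can be checked outside ADF with a short Python sketch. It is an illustration only; the function name prefix_before_last_underscore is hypothetical, and str.rfind stands in for the ADF lastIndexOf function.

```python
def prefix_before_last_underscore(name: str) -> str:
    """Return everything before the last '_' in name."""
    cut = name.rfind("_")  # -1 when there is no underscore
    return name if cut == -1 else name[:cut]


if __name__ == "__main__":
    for file_name in ("SMP_ACC_STG_20210987654.txt",
                      "SMP_ACC_STG_BS_20210987654.txt"):
        print(prefix_before_last_underscore(file_name))
```

Because the cut position is found at run time, the prefix can have any number of underscore-separated parts.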

How to use 'keys' in a dictionary as wildcard patterns in matching filenames?

I'm trying to open the files whose name contains a pattern - which is the key value stored in a dictionary (key values in the dictionary are the patterns I'm trying to match in the file name).
I'm currently using glob.glob to match the pattern in the file name. The name of my dictionary is "xd". So, I want to implement something like this:
for key in xd :
    for name in glob.glob(*key*):
        file = open ('name','w')
I'm getting an invalid syntax error here.
I want to be able to open all the files which have the 'key' in their name and perform text addition in those files. Could someone please tell me if there is a way of doing this?
glob.glob() expects a str parameter, so you'll need to make a string out of the wildcards with key included. I'd also suggest opening files using with so you don't forget to close the file descriptor.
import glob

for key in xd:
    for name in glob.glob(f"*{key}*"):
        # 'w' truncates the file; use 'a' instead to append text
        with open(name, 'w') as f:
            ...
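Since the question asks for text to be added to the matching files, a fuller sketch might open them in append mode. The following is a minimal example under stated assumptions: the function name append_to_matching, its directory argument, and the sample keys are hypothetical, and glob.escape is used so that keys containing glob metacharacters are matched literally.

```python
import glob
import os


def append_to_matching(directory, patterns, text):
    """Append text to every file in directory whose name contains a key.

    Returns the sorted base names of the files written, one entry per match.
    """
    touched = []
    for key in patterns:
        # Escape the key so characters like '[' or '*' in it are literal.
        wildcard = os.path.join(directory, f"*{glob.escape(key)}*")
        for name in glob.glob(wildcard):
            with open(name, "a") as handle:  # 'a' keeps existing content
                handle.write(text)
            touched.append(os.path.basename(name))
    return sorted(touched)
```

A call such as append_to_matching(".", xd, "extra line\n") would iterate over the dictionary keys, because iterating a dict yields its keys. A file whose name contains two different keys is written once per matching key.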

How do I extract a string between two characters using ADFs expression builder?

I'm trying to extract part of a file name using expressions in ADF expression builder. The part I'm trying to extract is dynamic in size but always appears between "_" and "-".
How can I go about doing this extraction?
Thanks!
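Suppose there's a pipeline parameter named fileName. You could use the expression below to extract the value between '_' and '-'; for example, the input 'ab_cd-' gives 'cd' as output. The third argument of substring is a length, so it is computed as the distance between the two delimiters minus one:
@{substring(pipeline().parameters.fileName, add(indexOf(pipeline().parameters.fileName, '_'),1), sub(sub(indexOf(pipeline().parameters.fileName, '-'), indexOf(pipeline().parameters.fileName, '_')),1))}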
You may want to check the documentation of Expressions and functions in Azure Data Factory for more details: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#string-functions
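The index arithmetic behind extracting text between two delimiters can be sketched in Python for quick experimentation. This is not ADF code; the function name between is hypothetical, and unlike a plain first-occurrence lookup it searches for the closing delimiter only after the opening one.

```python
def between(text: str, start: str = "_", end: str = "-") -> str:
    """Return the text between the first start and the next end delimiter.

    Raises ValueError if either delimiter is missing.
    """
    begin = text.index(start) + 1   # first character after the start delimiter
    stop = text.index(end, begin)   # first end delimiter at or after begin
    return text[begin:stop]         # slice length is stop - begin


if __name__ == "__main__":
    print(between("ab_cd-"))
```

The length of the extracted part is always the position of the closing delimiter minus the position of the first character after the opening one, whatever the size of the surrounding text.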

U-SQL Error - Change the identifier to use at least one lower case letter

I am fairly new to U-SQL and trying to run a U-SQL script in Azure Data Lake Analytics to process a parquet file using the Parquet extractor functionality. I am getting the below error and I don't find a way to get around it.
Error - Change the identifier to use at least one lower case letter. If that is not possible, then escape that identifier (for example: '[ACTIVITY]'), or embed it in a CSHARP() block (e.g CSHARP(ACTIVITY)).
Unfortunately, all the fields generated in the Parquet file are capitalized and I don't want to escape these identifiers. I have tried wrapping the identifier in a CSHARP block, and that fails as well (E_CSC_USER_RESERVEDKEYWORDASIDENTIFIER: Reserved keyword CSHARP is used as an identifier.) Is there any way I could extract the Parquet file? Thanks for your help!
Code Snippet:
SET @@FeaturePreviews = "EnableParquetUdos:on";
@var1 =
    EXTRACT ACTIVITY string,
            AUTHOR_NAME string,
            AFFLIATION string
    FROM "adl://xxx.azuredatalakestore.net/Abstracts/FY2018_028"
    USING Extractors.Parquet();
@var2 =
    SELECT *
    FROM @var1
    ORDER BY ACTIVITY ASC
    FETCH 5 ROWS;
OUTPUT @var2
TO "adl://xxx.azuredatalakestore.net/Results/AbstractsResults.csv"
USING Outputters.Csv();
Based on your description, you are trying to write something like:
EXTRACT ALLCAPSNAME int FROM "/data.parquet" USING Extractors.Parquet();
In U-SQL, we reserve all caps identifiers so we can add new keywords in the future without invalidating old scripts.
To work around, you just have to quote the name (escape it) like in any other SQL dialect:
EXTRACT [ALLCAPSNAME] int FROM "/data.parquet" USING Extractors.Parquet();
Note that this is not changing the name of the field. It is just the syntactic way to address the field.
Also note, that in most SQL communities, it is considered a best practice to always quote identifiers to avoid reserved keyword clashes.
If all fields in the Parquet file are all caps, you will have to quote them all... In a future update you will be able to say EXTRACT * FROM … for Parquet (and Orc) files, but you still will need to quote the columns when you refer to them explicitly.
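When a file has many all-caps columns, typing the brackets by hand gets tedious. One option is to generate the quoted column list with a small script and paste it into the EXTRACT statement. The sketch below is only an illustration: the helper name quoted_extract_columns is hypothetical, the column names and types are taken from the question, and it assumes no column name itself contains a ] character.

```python
def quoted_extract_columns(columns: dict) -> str:
    """Build a U-SQL column list with every identifier wrapped in [ ]."""
    return ",\n".join(
        f"[{name}] {usql_type}" for name, usql_type in columns.items()
    )


if __name__ == "__main__":
    schema = {
        "ACTIVITY": "string",
        "AUTHOR_NAME": "string",
        "AFFLIATION": "string",
    }
    print(quoted_extract_columns(schema))
```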

Search for a string that appears in any file whose path contains a given word in Atom

I'd love to know how to search for a string like join on any file in which at any level in the path a given word, workspace for example, is present.
So it would match all the following:
app/js/workspaces/foo.js
css/project_a/some-workspace-awesome-/bar.js
scripts/open-workspaces.sh
I tried this using *workspace*; however, it only finds the last of the examples given (it only matches the file name).
Find -> Find in Project. This returns all paths and references to the keyword.
