Azure Search indexing CSV files

When I try to add an indexer to index a CSV file from blob storage, it automatically picks up some of the fields from the CSV file as field names in the index schema but leaves out others. That is probably because the left-out fields contain spaces or characters that are not allowed in Azure Search field names. Is there any way I can index all of the fields defined in the CSV without changing the file itself?

Since some column names are not valid index field names, they'll need to be explicitly renamed, and the indexer has to be told how the columns map to the index fields.
Create the index with valid field names corresponding to each column you're interested in (https://learn.microsoft.com/en-us/rest/api/searchservice/create-index)
Create the data source to the storage account containing the CSV files (https://learn.microsoft.com/en-us/rest/api/searchservice/create-data-source)
Create an indexer using the above data source and index, and also set the fieldMappings (https://learn.microsoft.com/en-us/rest/api/searchservice/create-indexer). The create-indexer documentation has sample JSON for renaming fields.
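A minimal sketch of such an indexer definition, assuming hypothetical CSV columns "Product Name" and "Unit Price" that should map to index fields product_name and unit_price already defined in the index (the indexer, data source and index names are placeholders):

{
  "name": "csv-indexer",
  "dataSourceName": "csv-datasource",
  "targetIndexName": "csv-index",
  "parameters": {
    "configuration": {
      "parsingMode": "delimitedText",
      "firstLineContainsHeaders": true
    }
  },
  "fieldMappings": [
    { "sourceFieldName": "Product Name", "targetFieldName": "product_name" },
    { "sourceFieldName": "Unit Price", "targetFieldName": "unit_price" }
  ]
}

Columns whose names already match index fields are mapped implicitly, so only the columns with invalid names need explicit fieldMappings entries.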

Related

How to compare the file names that are inside a folder (Datalake) using ADF

I have a list of files inside a Data Lake folder and a list of file names stored in a .CSV file.
My requirement is to compare the file names in the Data Lake folder with the file names in the .CSV file; if the file names match, I want to copy those files, and if they are not matching, I want to send an email listing the files missing from the Data Lake.
I have used the Get Metadata activity (child items) to get the list of files in the Data Lake folder, and I'm stuck here. Now I want to compare these file names with the file names stored in the .CSV file and do the further operations.
Kindly help.
A Get Metadata activity is taken, and a dataset is created for the Data Lake folder. Child Items is given as the field list argument.
The output of the Get Metadata activity is passed into a ForEach activity:
@activity('Get Metadata1').output.childItems
Inside the ForEach, a Lookup activity is taken, and the CSV file which contains the list of file names is referred to.
An If Condition activity is taken, with the expression given as (a JSON sketch of this activity appears after these steps):
@contains(string(activity('Lookup1').output.value),item().name)
In the True case, a Copy activity is added to copy the matched file into the SQL database.
Edit: to copy from one location to another within the Data Lake, follow the two steps below.
A source dataset is taken, and in its file path the file name is given as @{item().name}.
In the sink dataset, the file path is given the same way. This dynamically creates the file name to match the source.
In the False case, an Append Variable activity is added, and all values that do not match the lookup are appended to a variable of type Array.
Refer to the Microsoft document "How to send email - Azure Data Factory & Azure Synapse | Microsoft Learn" for sending the email.
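A trimmed JSON sketch of that If Condition activity (inside the ForEach) is below. The activity and variable names are assumptions, and the Copy activity's source/sink settings are omitted, so treat it as an outline of the approach rather than a deployable definition:

{
  "name": "If Condition1",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@contains(string(activity('Lookup1').output.value), item().name)",
      "type": "Expression"
    },
    "ifTrueActivities": [
      { "name": "Copy matched file", "type": "Copy" }
    ],
    "ifFalseActivities": [
      {
        "name": "Append missing file name",
        "type": "AppendVariable",
        "typeProperties": { "variableName": "missingfiles", "value": "@item().name" }
      }
    ]
  }
}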

How to rename column names from lookup in ADF?

I have metadata in my Azure SQL DB / CSV file as below, which has the old column names and data types along with the new column names.
I want to rename each old field and change its data type based on that metadata in ADF.
The idea is to store the metadata file in a cache and use it in a lookup, but I am not able to do it in the data flow expression builder. Any idea which transformation to use, or how I should do it?
I have reproduced the above and was able to change the column names and data types as below.
This is the sample CSV file I have taken from blob storage, which has the metadata of the table.
In your case, take care with the new data types: if the types don't fit the data already inside the table, the ALTER will generate an error.
Create a dataset for this file, give it to a Lookup activity, and don't check the first row option (so the lookup returns every metadata row).
This is my sample SQL table:
Give the lookup output array to a ForEach activity.
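For reference, each item the ForEach receives would look roughly like the row below, assuming the metadata source uses columns named OldName, NewName and Newtype (the names the script below references); the sample values are made up:

{
  "OldName": "custname",
  "NewName": "CustomerName",
  "Newtype": "varchar(100)"
}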
Inside the ForEach, use a Script activity to execute the script that changes the column name and data type.
Script:
EXEC SP_RENAME 'mytable2.@{item().OldName}', '@{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN @{item().NewName} @{item().Newtype};
Execute this, and below is my SQL table with the changes.

How to check and compare the file names that are inside a folder (Datalake) using ADF

My requirement is to compare the file names in the Data Lake folder with the file names in the .CSV file; if the file names match, I want to copy those files, and if they do not match, I want to store those file names in a .CSV file in the Data Lake.
Kindly help.
You can achieve the requirement in the following three steps: get the file names from the CSV file and the ADLS folder, filter the matching and unmatched file names (from the folder), and finally do the respective copy operations.
Step-1:
I used a Get Metadata activity to get the list of file names from the ADLS folder (sample1.csv, sample2.csv, sample3.csv, sample4.csv). Create a dataset pointing to your folder and use Child Items as the field list.
And a Lookup activity to get the file names (sample1.csv, sample2.csv, sample5.csv, sample6.csv) from the CSV file.
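With "First row only" unchecked, the lookup output is an array of row objects, roughly like the sketch below (the property name depends on the header setting of the dataset; Prop_0 is what ADF assigns when there is no header row). Stringifying that array is what lets contains() check whether a given file name appears anywhere in it:

{
  "count": 4,
  "value": [
    { "Prop_0": "sample1.csv" },
    { "Prop_0": "sample2.csv" },
    { "Prop_0": "sample5.csv" },
    { "Prop_0": "sample6.csv" }
  ]
}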
Step-2:
Now, using a Filter activity, get the matching file names. I used the following items and filter condition:
items - @activity('list of files in folder').output.childItems
condition - @contains(string(activity('filenames present in csv').output.value),item().name)
To get the unmatched filenames from the ADLS folder, I used the following items and filter condition:
items - @activity('list of files in folder').output.childItems
condition - @not(contains(string(activity('filenames present in csv').output.value),item().name))
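A trimmed JSON sketch of the first Filter activity is below (the second one is identical except the condition is wrapped in not()); the activity names match the ones used in the expressions above:

{
  "name": "getting matching files",
  "type": "Filter",
  "typeProperties": {
    "items": {
      "value": "@activity('list of files in folder').output.childItems",
      "type": "Expression"
    },
    "condition": {
      "value": "@contains(string(activity('filenames present in csv').output.value), item().name)",
      "type": "Expression"
    }
  }
}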
Step-3:
Now, use a ForEach activity to copy each file to another location. I used the items value in the first ForEach as @activity('getting matching files').output.Value. Inside this, I have configured a Copy activity to copy the current ForEach item (i.e., the file name).
I have created a parameter in the dataset called filename. I passed its value (@item().name) from the copy data source settings as shown below.
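A trimmed sketch of how that parameter is passed through the Copy activity's dataset reference; the dataset name is a placeholder and the source/sink typeProperties are omitted (the sink dataset, if parameterized, receives the parameter the same way):

{
  "name": "Copy matched file",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "SourceFilesDataset",
      "type": "DatasetReference",
      "parameters": { "filename": "@item().name" }
    }
  ]
}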
Now, for the unmatched file names from the folder, I used a ForEach and an Append Variable activity to create an array of file names like ["sample3.csv", "sample4.csv"]. The items value in this ForEach is @activity('getting unmatched files').output.Value.
Inside the ForEach, I used Append Variable with the value @item().name.
Now, we have to create a new CSV file with all the unmatched file names from the folder. Use a Copy Data activity and take a sample CSV file as the source (its content does not matter; we just need a file to use as the source).
Now add an additional column called filenames with the dynamic content value below. (Make sure the filenames value in the pipeline JSON matches the sketch shown after these notes.)
@join(variables('filenames'),'
')
The values will be joined using a newline (\n). Using \n directly in the dynamic content editor would not work, as it would be taken as \\n, so change it in the pipeline JSON instead.
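A sketch of what the copy source should end up looking like in the pipeline JSON, assuming a delimited text source (store and format settings omitted); note the single \n inside the expression, which is what the editor keeps escaping to \\n:

"source": {
  "type": "DelimitedTextSource",
  "additionalColumns": [
    {
      "name": "filenames",
      "value": {
        "value": "@join(variables('filenames'),'\n')",
        "type": "Expression"
      }
    }
  ]
}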
Configure the sink as follows: select the path and file name, and make sure to set the quote character to "No quote character".
Output:
When I run the pipeline, it generates the required output: the matched files are copied to the target location, and the unmatched file names are written to the new CSV file.

Create a list of files in Azure Storage and send it to a SQL table using ADF

I need to copy the file names of the Excel files that are stored as blobs in my Azure Storage and then put these names into a SQL Server table using ADF. The file path can serve as the name of a file, but the hardest part is that the dataset which takes all the files from one specific folder requires me to select a sheet name, and these sheet names are different for each file, so it returns an error. Is there a way to create a collective dataset without indicating the sheet name?
So, if I understand your question correctly, you are looking for a way to write all Excel file names to a SQL database using ADF.
You can use the generic Get Metadata activity with a binary dataset as the source (a binary dataset does not require a sheet name, so it works for any file). Select Child Items as the field to retrieve; this will list all files in the folder. Then add a Filter activity to select only the Excel file types.
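A possible sketch of such a Filter activity, assuming the Excel files use the .xlsx extension and the Get Metadata activity is named 'Get Metadata1' (both are assumptions):

{
  "name": "Filter excel files",
  "type": "Filter",
  "typeProperties": {
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "condition": {
      "value": "@endswith(item().name, '.xlsx')",
      "type": "Expression"
    }
  }
}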
Hope that this gets you on the right track.

Ingest multiple linked CSV files in Solr

I am new to Solr. My problem is to link multiple CSV files together via a single field in Solr.
I have indexed a CSV file of more than 5 GB containing more than 250 fields per document (one field being taxonomyid) and am querying it successfully. Now I have to add one more CSV file with the fields (taxonomyid, taxonomyvalue, description) and link it to the already indexed CSV file via the taxonomyid field. Kindly point me in the right direction for what I should look into for my Solr R&D.
