How to fetch the column count from a .dat file in Azure Data Lake Analytics

I have various .dat and CSV files. They contain more than 255 columns, with '|' or tab as the delimiter. How can I fetch the column count? Could anyone please share sample U-SQL code?

I know this was downvoted, so I hope it is still OK to supply an answer.
Extract just the first row of your file (using FETCH 1 ROWS) into a single-column rowset. You should then be able to use String.Split to get a column count.
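To make the idea concrete, here is a rough sketch of that split-and-count step in plain Python rather than U-SQL (the file path and candidate delimiters below are placeholders, not values from your environment):

# Read only the first row of the file, split it on each candidate
# delimiter, and report the resulting column count.
candidates = ["|", "\t"]
with open("sample.dat", encoding="utf-8") as f:
    first_row = f.readline().rstrip("\r\n")
for delim in candidates:
    print(f"delimiter {delim!r}: {len(first_row.split(delim))} columns")

Whichever delimiter yields the larger, more plausible count is the one the file actually uses; in U-SQL the same split would be done with String.Split on the single extracted column.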

Related

Delete bottom two rows in Azure Data Flow

I would like to delete the bottom two rows of an Excel file in ADF, but I don't know how to do it.
The flow I am thinking of is shown in my screenshot: I intend to filter, then delete, the rows marked in yellow.
The file has over 40,000 rows of data and is updated once a month. (The number of rows changes with each update, so the condition must be specified with a function.)
The contents of the file are also shown in a screenshot; the bottom two lines contain spaces and asterisks.
Any help would be appreciated.
I'm new to Azure and having trouble.
I need your help.
Add a surrogate key transformation to put a row number on each row. Add a new branch to duplicate the stream and in that new branch, add an aggregate.
Use the aggregate transformation to find the max() value of the surrogate key counter.
Then subtract 2 from that max number and filter for just the rows up to that max-2.
Let me provide a more detailed answer here ... I think I can get it in here without writing a separate blog.
The simplest way to filter out the final 2 rows is a pattern depicted in the screenshot here. Instead of the new branch, I just created 2 sources both pointing to the same data source. The 2nd stream is there just to get a row count and store it in a cached sink. For the aggregation expression I used this: "count(1)" as the row count aggregator.
In the first stream, that is the primary data processing stream, I add a Surrogate Key transformation so that I can have a row number for each row. I called my key column "sk".
Finally, set the Filter transformation to only allow rows with a row number <= the max row count from the cached sink minus 2.
The Filter expression looks like this: sk <= cachedSink#output().rowcount-2
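Purely to illustrate the pattern outside of ADF (this is not a data-flow artifact, and the file names are placeholders), the same row-number / max / filter logic looks like this in pandas:

import pandas as pd

df = pd.read_excel("monthly_report.xlsx")   # placeholder file name
df["sk"] = range(1, len(df) + 1)            # surrogate key: a row number per row
row_count = df["sk"].max()                  # aggregate: max() of the surrogate key
trimmed = df[df["sk"] <= row_count - 2]     # filter: keep everything but the bottom two rows
trimmed.drop(columns="sk").to_csv("trimmed.csv", index=False)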

Delimited File with Varying Number of Columns in Azure Data Factory

I have a file delimited by hashes that looks somewhat like this:
value#value#value#value#value#value##value
value#value#value#value##value#####value#####value
value#value#value#value###value#value####value##value
As you can see, when separated by hashes, there are more columns in the 2nd and 3rd rows than there are in the first. I want to be able to ingest this into a database using an ADF Data Flow after some transformations. However, whenever I try to do any kind of mapping, I only ever see 7 columns (the number of columns in the first row).
Is there any way to get all of the values, with as many columns as there are in the row with the most items? I do not mind the nulls.
Note: I do not have a header row for this.
Azure Data Factory will not be able to directly import the schema from the row with the maximum number of columns. Hence, it is important to make sure every row in your file has the same number of columns.
You can use Azure Functions to validate your file and update it so that all rows have an equal number of columns (a sketch of that step is below).
You could also try keeping a local copy of the file whose first row has the maximum number of columns and importing the schema from that file; otherwise you will have to go with an Azure Function that converts the file and then triggers the pipeline.
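A minimal sketch of what that normalization step could look like, in plain Python that could also run inside a Python Azure Function (the file names and the '#' delimiter are assumptions taken from the question):

def pad_rows(in_path, out_path, delimiter="#"):
    # Read every row, find the widest one, and pad the shorter rows with
    # empty fields so that all rows end up with the same number of columns.
    with open(in_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split(delimiter) for line in f]
    width = max(len(row) for row in rows)
    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            row += [""] * (width - len(row))
            f.write(delimiter.join(row) + "\n")

pad_rows("input.dat", "normalized.dat")

After this step, the schema can be imported from any row, since every row carries the full column count.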

Azure Data Factory Copy Activity is dropping columns on the floor

first time, long time.
I'm running an import of a CSV file that has 734 columns in an Azure Data Factory Copy Activity. Data Factory is not reading the last 9 columns and is populating them with NULL. Even in the preview I can see that those columns have no values, although their schema is detected. Is there a 725-column limit in Copy Activity?
As Joel said, there is no restriction at 725 or so columns. I suggest the following:
Go to the mapping tab and pick only the 726th column (if you have a header it will be easy; otherwise ADF will most probably generate a header like Prop_726), and copy the data to a blob as the sink. If the blob has the field, that means you have a data type issue on the table.
Let me know how it goes; if you are still facing the issue, please share some dummy data for the 726th column.
Here is what happened. I had the file in zip folders, and I thought I had to unzip the files first to process them. It turns out that when unzipping through ADF, it stripped the quotation marks from my columns, and then one of the columns had an escape character in it. That escape character shifted everything over, and resulted in me losing nine columns.
But I did learn a bunch of things NOT to do, so it wasn't a total waste of time. Thanks for the answers!

Import 2 or more columns from Excel into 1 column in Access

I have an Excel report that is the output of an opinion tool. In this Excel file I have all the responses that people submitted for my quiz. For multiple-choice questions, the tool outputs one column per option, and only the selected option's column contains data. For example, if my quiz is like this:
Q1 Your name:
R1 =
Q2 Options
opt 1
opt 2
opt 3
The Excel report then looks like the attached "Excel Report" screenshot.
So I want, when I import the Excel file into Access, to automatically merge those columns so that there are only two headers in the Access table: "Q1 Your name:" and "Q2 Options".
Also, for context: I will make some other edits to that imported table and then copy it to another Access table (table 2). So even if there is a way to merge those Access columns before copying to the other table, I will accept it, something like "insert from this column, and if it is empty, insert from that column". I'm not good at writing queries, sorry. Only table 2 will keep information permanently; the first table is just a temporary one, so I will delete its data daily and preserve the important data in table 2.
Thanks for the support
The simplest way I can see to achieve your goal is to concatenate the three columns, since by the sound of it you will only ever have a value in one column per question per record. You could do this in Excel prior to the import, you could use a calculated field on the table, or you could build a query that concatenates all your questions. My suggestion would be Excel, since the =CONCATENATE() function is probably going to be the easiest option for you.
If you do import your raw data into Access, you will need to assign unique column names, i.e. Q2_Op1, Q2_Op2, Q2_Op3.
The query syntax to concatenate these fields into one would be something like:
SELECT Q1_Name, [Q2_Op1] & [Q2_Op2] & [Q2_Op3] AS Q2_Options
FROM Table1;
where Q1_Name, Q2_Op1, Q2_Op2, and Q2_Op3 are the column names in the imported data table.

Need to compare data in two Excel files

I need to compare two Excel files. One is extracted from a database and saved as CSV. The other file is a cumulative report containing all records for that day. I need to check whether all the data in the cumulative report is also in the CSV file extracted from the database. I know VLOOKUP, but I am not sure VLOOKUP can compare entire file records. Many files have 4 to 5 thousand records with 50 columns. Is there any other option? Any free ETL tools?
I decided to use the Beyond Compare tool to compare the two files:
Sort both files on an ID field.
Then compare them using Beyond Compare.
It's a nice tool.
You can indeed use VLOOKUP.
A simple way to solve this is to
1. Create a database connection for the saved CSV data in Sheet1.
2. Copy and link the second file (the cumulative report) into this workbook as Sheet2.
Then use VLOOKUP or simple IF statements to compare the data.
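If a free, scripted option is acceptable, a minimal pandas sketch of the same check could look like this (the file names and the "ID" key column are assumptions; adjust them to the real report layout):

import pandas as pd

db_extract = pd.read_csv("db_extract.csv")          # file extracted from the database
cumulative = pd.read_csv("cumulative_report.csv")   # cumulative report for the day

# Left-join the report against the extract and keep rows with no match,
# i.e. report records that are missing from the database CSV.
check = cumulative.merge(db_extract, how="left", on="ID",
                         suffixes=("", "_db"), indicator=True)
missing = check[check["_merge"] == "left_only"].drop(columns="_merge")

print(f"{len(missing)} of {len(cumulative)} report rows are not in the CSV extract")
missing.to_csv("missing_rows.csv", index=False)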
