Azure Data Factory Copy Activity is dropping columns on the floor

first time, long time.
I'm running an import of a CSV file that has 734 columns through an Azure Data Factory Copy Activity. Data Factory is not reading the last 9 columns and is populating them with NULL. Even in the preview I can see that those columns have no values, although their schema is detected. Is there a 725-column limit in Copy Activity?

As Joel said, there is no restriction at 725 or so columns. I suggest the following:
Go to the mapping tab and pick only the 726th column (if you have a header this will be easy; otherwise ADF will generate a header like Prop_726, most probably), then copy the data to a blob as the sink. If the blob has the field, that means you have a data type issue on the table.
Let me know how it goes. If you are still facing the issue, please share some dummy data for the 726th column.

Here is what happened. I had the file in zip folders, and I thought I had to unzip the files first to process them. It turns out that when unzipping through ADF, it stripped the quotation marks from my columns, and then one of the columns had an escape character in it. That escape character shifted everything over, and resulted in me losing nine columns.
But I did learn a bunch of things NOT to do, so it wasn't a total waste of time. Thanks for the answers!
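For anyone who hits the same thing, here is a rough Python sketch (with made-up data, not the actual file) of the general failure mode: once the quoting is gone, a special character left inside a field (here just an embedded comma) splits that field and shifts every later column over by one.

# Illustration only: the same row parsed with and without its quoting.
import csv
import io

quoted = 'a,"hello, world",c,d\n'    # field protected by quotes
unquoted = 'a,hello, world,c,d\n'    # same row after the quotes are stripped

print(next(csv.reader(io.StringIO(quoted))))    # ['a', 'hello, world', 'c', 'd'] -> 4 columns
print(next(csv.reader(io.StringIO(unquoted))))  # ['a', 'hello', ' world', 'c', 'd'] -> 5 columns, shifted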

Related

Delimited File with Varying Number of Columns per Row - Azure Data Factory

I have a delimited file separated by hashes that looks somewhat like this:
value#value#value#value#value#value##value
value#value#value#value##value#####value#####value
value#value#value#value###value#value####value##value
As you can see, when separated by hashes, there are more columns in the 2nd and 3rd rows than there are in the first. I want to be able to ingest this into a database using an ADF Data Flow after some transformations. However, whenever I try to do any kind of mapping, I always see only 7 columns (the number of columns in the first row).
Is there any way to get all of the values, with as many columns as there are in the row with the most items? I do not mind the nulls.
Note: I do not have a header row for this.
Azure Data Factory by itself will not import the schema from the row with the maximum number of columns. Hence, it is important to make sure you have the same number of columns in every row of your file.
You can use Azure Functions to validate your file and update it so that all rows have an equal number of columns.
You could also try keeping a local file that contains a row with the maximum number of columns and import the schema from that file; otherwise you have to go with Azure Functions, where you convert the file and then trigger the pipeline.
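As a rough sketch of that pre-processing step (for example inside an Azure Function), something like the Python below pads every row out to the widest row so ADF sees a consistent number of columns. The file names are placeholders; the '#' delimiter comes from the question.

# Pad every row to the width of the widest row.
def pad_rows(in_path, out_path, delimiter="#"):
    with open(in_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split(delimiter) for line in f]
    width = max(len(row) for row in rows)              # widest row wins
    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            row += [""] * (width - len(row))           # fill short rows with empty values
            f.write(delimiter.join(row) + "\n")

pad_rows("input.dat", "padded.dat")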

How to merge 2 BIG Tables into 1 adding up existing values with PowerQuery

I have 2 big tables (one has 690K rows, the 2nd one has 890K rows).
They have the same format and columns:
Username - Points - Bonuses - COLUMN D ... COLUMN K.
Let's say in the first table I have the "Original" usernames and in the 2nd table I have the "New" usernames plus some of the "Original" usernames (so people who are still playing plus people who are new to the game).
What I'm trying to do is merge them so I can have their summed-up values in a single table.
I've already made my tables proper System Tables.
I created their connection in the workbook.
I've tried to merge them but I keep getting fewer rows than I expect, so some records are being left out or not being summed.
I've tried Left Outer, Right Outer, and Full Outer joins with no success.
This is where I'm standing:
As @Jenn said, I had to append the tables instead of merging them, and I also used a filter inside PowerQuery to remove all blanks/zeros before loading it into Excel. I was left with 500K unique rows instead of 1.6 million. Thanks for the comment!
I would append the tables, as indicated above. First load each table separately into PowerQuery, and then append one table into the other one. The column names look a little long and it may make sense to simplify the column names so that the system doesn't read them as different columns due to an inadvertent typo.
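Not PowerQuery, but to show the shape of the append-then-aggregate logic described above, here is a quick pandas sketch; the column names are taken from the question and the data is made up.

import pandas as pd

original = pd.DataFrame({"Username": ["a", "b"], "Points": [10, 5], "Bonuses": [1, 0]})
new = pd.DataFrame({"Username": ["b", "c"], "Points": [7, 3], "Bonuses": [2, 1]})

combined = pd.concat([original, new], ignore_index=True)                     # append, don't merge
combined = combined[(combined["Points"] != 0) | (combined["Bonuses"] != 0)]  # drop all-zero rows
totals = combined.groupby("Username", as_index=False).sum()                  # one row per username
print(totals)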

How to fetch the column count from dat files in Azure Data Lake Analytics

I have various dat and CSV files containing more than 255 columns, with '|' and tab as the delimiters. How do I fetch the column count? Could anyone please share sample U-SQL code?
I know this was down voted, so I hope it is still OK to supply an answer (although I'm not including a code sample).
Extract just the first row of your file (using FETCH 1 ROWS) into a single-column rowset. You should then be able to use String.Split to get a column count.
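If you just need the count and don't mind stepping outside U-SQL for a moment, the same read-one-line-and-split idea is a few lines of Python; the paths below are placeholders and the delimiters come from the question.

# Count the columns in the first line of a delimited file.
def column_count(path, delimiter):
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")
    return len(first_line.split(delimiter))

print(column_count("sample.dat", "|"))    # pipe-delimited
print(column_count("sample.csv", "\t"))   # tab-delimited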

Custom parallel extractor - U-SQL

I am trying to create a custom parallel extractor, but I have no idea how to do it correctly. I have big files (more than 250 MB) where the data for each row is stored in 4 lines; one file line stores the data for one column. Is it possible to create an extractor that works in parallel on large files? I am afraid that the data for one row will end up in different extents after the file is split.
Example:
...
Data for first row
Data for first row
Data for first row
Data for first row
Data for second row
Data for second row
Data for second row
Data for second row
...
Sorry for my English.
I think you can process this data using U-SQL sequentially, not in parallel. You have to write a custom applier that takes one or more rows and returns one or more rows, and then you can invoke it with CROSS APPLY. You can take help from this applier.
U-SQL Extractors by default are scaled out to work in parallel over smaller parts of the input files, called extents. These extents are about 250MB in size each.
Today, you have to upload your files as row-structured files to make sure that the rows are aligned with the extent boundaries (although we are going to provide support for rows spanning extent boundaries in the near future). Either way, though, the extractor UDO model would not know whether your 4 lines are all inside the same extent or spread across several.
So you have two options:
Option 1: Mark the extractor as operating on the whole file by adding the following line before the extractor class:
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
Now the extractor will see the full file, but you lose the scale-out of the file processing.
Option 2: Extract one row per line and use a U-SQL statement (e.g. using window functions or a custom REDUCER) to merge the four lines into a single row; see the sketch below for the shape of that merge.
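Not U-SQL, but the merge in option 2 boils down to something like the Python sketch below (the file name is a placeholder): read one value per line and fold every 4 consecutive lines into one output row. In U-SQL you would express the same grouping with window functions or a custom reducer, as described above.

# Fold every 4 consecutive lines into a single row.
def merge_lines(path, lines_per_row=4):
    with open(path, encoding="utf-8") as f:
        values = [line.rstrip("\n") for line in f]
    return [values[i:i + lines_per_row] for i in range(0, len(values), lines_per_row)]

for row in merge_lines("input.txt"):
    print(row)    # each row holds the 4 column values that belong together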
I have discovered that I can't use a static method to get an instance of my IExtractor implementation in the USING statement if I want AtomicFileProcessing set to true.

Importing CSV to Cassandra DB

I am trying to import a .csv file into Cassandra using the COPY command in CQL. The problem is that, after executing the command, the console reports '327 rows imported in 0.589 seconds', but only the last row from the csv file ends up in the Cassandra table, and the table contains just one row.
Here is the command that I am using for the copy:
copy test from '/root/Documents/test.csv' with header=true;
And when I run select * from test; it shows only one row.
Any help would be appreciated.
Posting my comment here as an answer for future reference.
One way this could happen (where all rows are reported as imported but only one remains in the table) is if every row had the same primary key. Each insert is really doing a replacement, hence the result of just a single row of data.
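If you want to confirm that diagnosis before re-running COPY, a quick check like the one below counts the distinct values in the primary-key column of the CSV; the column name 'id' is just a placeholder for whatever your key column is.

# Count distinct primary-key values in the CSV.
import csv
from collections import Counter

with open("/root/Documents/test.csv", newline="", encoding="utf-8") as f:
    keys = [row["id"] for row in csv.DictReader(f)]

counts = Counter(keys)
print(len(keys), "rows,", len(counts), "distinct primary keys")
print(counts.most_common(3))    # if one key appears 327 times, that's the problem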
