Azure Data Factory delimited text file ignoring imported schema - azure

I get a weekly file which has up to 34 columns, but sometimes the first line of the file only has 29 columns. I have imported a schema with 34 columns, but when I preview the data, Data Factory just ignores the schema I've made for the file and shows the first 29 fields.
Apparently we can't ask for headers to be added to the file. How do I force Data Factory to read the file as having 34 columns, given that I've provided the schema? Adding the five missing pipes (the delimiter) fixes the issue, but I don't want to have to do that every week.
Kind Regards.

I have repro'd this with some sample data using a data flow.
Create the delimited text dataset and set the column delimiter to "no delimiter" so the file is read as single-column data.
In the source, the first row contains 3 columns delimited by pipe (|) and the second row has 5 columns when delimited by |.
Using a derived column transformation, split the column into multiple columns based on |, e.g.:
split(Column_1, '|')[1]
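The same "read as one column, then split" idea can be sketched outside Data Factory; here is a minimal Python version (the file name and the 34-column width are assumptions for illustration):

# Minimal sketch: read each row as a single string, split on the pipe,
# and pad short rows (such as a 29-column first line) out to 34 columns.
# 'weekly_file.txt' and NUM_COLS are assumptions, not from the post.
import pandas as pd

NUM_COLS = 34
rows = []
with open('weekly_file.txt') as f:
    for line in f:
        parts = line.rstrip('\n').split('|')
        parts += [''] * (NUM_COLS - len(parts))
        rows.append(parts)
df = pd.DataFrame(rows)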

Related

Data missing when importing a text file into Excel

I'm trying to import a text file of CSV data into Excel. The data contains mostly integers, but there's one column with strings. I'm using the Data tab of Excel Professional Plus 2019. However, when I select comma as the delimiter I lose 5 of the 16 columns, starting with the one containing strings. The data looks like the below; the date and the 7 numbers are in their own columns (just whitespace separated). Can anyone help or explain? Many thanks.
2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6
[Screenshots: before and after the import]
Full data is at https://github.com/CH220/textfileforexcel
Your problem stems from the very first line of data in your text file:
40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7
As you can see, there are only eleven "segments". Hence, when you try to use the import dialog to separate by comma, there will only be 11 columns even though subsequent rows have 16 columns.
Possible solutions:
- Correct the text file so the first line has the desired number of segments.
- Change the Import Dialog, as you did, to comma, then Transform. In the Advanced Editor, edit the second line of the generated M-code to change Columns=11 to Columns=16:
  Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\new 2.txt"),[Delimiter=",", Columns=16, Encoding=1252]),
- Change the Fixed Width "argument" from 0,23 => 0, then Transform and Split Column by delimiter (using the comma) in Power Query.
To me, the "best" way would be to correct the text file.
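The same "force the column count" idea also works for anyone scripting this instead of using Excel; a minimal pandas sketch (the file name is taken from the M-code above, the rest is an assumption):

import pandas as pd

# names=range(16) forces 16 columns even though the first data line
# has only 11 fields; missing trailing cells in short rows become NaN.
df = pd.read_csv('new 2.txt', header=None, names=range(16),
                 skipinitialspace=True)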

Combine multiple csv files into one using awk

I want to combine two .csv files based on the unique id that exists in both files.
The first file consists of 17 columns and the second one of 2 columns, where in both files the first column is the same unique id.
In the to-be-created file 3 I would like 18 columns.
I have been trying paste
paste -d ' ' SPOOL1.csv SPOOL2.csv > MERGED.csv
but that of course does not take the unique id column into consideration.
I'm not proficient in awk, so all help is appreciated.
Thanks
Sounds like, if the files are sorted, then
join -t, SPOOL1.csv SPOOL2.csv > MERGED.csv
should get you closer; join splits on whitespace by default, so -t, sets the comma delimiter.
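If the shell route gets fiddly, a small pandas sketch does the keyed merge without pre-sorting (file names are from the question; the column counts follow its description):

import pandas as pd

# Column 0 is the shared unique id in both files.
left = pd.read_csv('SPOOL1.csv', header=None)    # 17 columns
right = pd.read_csv('SPOOL2.csv', header=None)   # 2 columns: id + value
merged = left.merge(right, on=0, how='inner')    # keep ids present in both
merged.to_csv('MERGED.csv', index=False, header=False)  # 18 columns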

Copy Activity missing column in the output

I have a copy activity which takes the output of a procedure and writes it to a temp CSV file. I needed the headers in double quotation marks, so after that I have a Data Flow task that takes the temp file and adds the quotes, all in the sink settings. Yet the output is not what is expected: it looks like the last column is missing in some of the records, due to a comma in the data.
Is there a way to use only copy activity but still have the column names in double quotes?
When we set the column delimiter, Data Factory considers the first row as the schema according to the number of delimiters. If your data contains values with the same character as the column delimiter, you will miss some columns.
For now, Data Factory can't solve this. The only workaround is to set a different column delimiter, for example '|'.
[Output example screenshot]
We also can't make the header wrapped in double quotes for the output .csv file; it's not supported in Data Factory.
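For reference, the quoting the asker is after (which also sidesteps the comma-in-data problem) is easy to produce outside Data Factory; a minimal Python sketch with made-up file and column names:

import csv

# QUOTE_ALL wraps every field, including the header row, in double quotes,
# so commas inside values no longer collide with the delimiter.
rows = [['id', 'name', 'comment'],
        ['1', 'Alice', 'likes a, b and c']]
with open('quoted.csv', 'w', newline='') as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)
# Output:
# "id","name","comment"
# "1","Alice","likes a, b and c"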
HTH.

How can I add comments at the top of a data file that I have created using the savetxt function in Python 3.0?

Using the savetxt function, I created a data file named 'output.dat' to which two arrays were written as two different columns, so the file output.dat contains 2 columns of data. Now I want to add headings at the top of each column to remind me what the file contains when I refer back to it later. Say, I want to put the heading 'Time' at the top of the first column and 'Voltage' at the top of the second. How can I do this?
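numpy's savetxt accepts a header string that it writes as a commented first line, which covers this case; a minimal sketch with assumed array contents:

import numpy as np

time = np.array([0.0, 0.1, 0.2])
voltage = np.array([1.0, 0.9, 0.7])
# header= writes a first line prefixed by the comment character
# ('# ' by default), so the file starts with "# Time Voltage".
np.savetxt('output.dat', np.column_stack((time, voltage)),
           header='Time Voltage')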

Read only 2 columns from a '~' delimited text file into dataframe and store second column as string

I have a very large text file (3.33 GB) which has 47 columns separated by delimiter ~. I just need the first and the last column to work with. The last column is a 17 digit number which may contain leading zeros. I have to store this column as a string (so as to not remove the leading zeros). An example of the first and last column is shown below:
      id             Number
0      0  10030040125198660
1  12345  60034046122158670
My question is whether it's possible to read just these two columns alone, and store the second one as a string. The reason I ask is that loading the 3.3 GB file as a dataframe takes a lot of time, and converting the column to string takes even longer. I want to know if I can save time by reading only the columns I need.
My code as of now (shown the column names as numbers for easy understanding):
df = pd.read_csv('myfile.txt', low_memory=False, sep='~', header=None)
df.drop(columns=[1,2,3...,45], inplace=True)  # keeping only columns 0 and 46 (the 1st and 47th)
df[46] = df[46].astype(str)  # with header=None the column labels are integers, not strings
Any help is highly appreciated!
You should use the "usecols" parameter; check out the read_csv official documentation. In fact, that is the first thing you should check.
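A minimal sketch of that suggestion, using the 0-based positions of the 1st and 47th columns:

import pandas as pd

# usecols loads only the two needed columns, and dtype keeps the
# 17-digit numbers as strings so leading zeros survive.
df = pd.read_csv('myfile.txt', sep='~', header=None,
                 usecols=[0, 46], dtype={46: str})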
