I'm trying to import a text file of CSV data into Excel. The data contains mostly integers, but there's one column with strings. I'm using the Data tab of Excel Professional Plus 2019. However, when I select comma as the delimiter I lose 5 of the 16 columns, starting with the one containing strings. The data looks like the line below. The date and the 7 numbers are in their own columns (just whitespace separated). Can anyone help or explain? Many thanks.
2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6
The full data is at https://github.com/CH220/textfileforexcel
Your problem stems from the very first line of data in your text file:
40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7
As you can see, there are only eleven "segments". Hence, when you try to use the import dialog to separate by comma, there will only be 11 columns even though subsequent rows have 16 columns.
Possible solutions:
1. Correct the text file so the first line has the desired number of segments.
2. Change the Import Dialog, as you did, to comma, then:
   - Transform
   - Edit the second line of the generated M-code to change from Columns=11 to Columns=16. You do this in the Advanced Editor:
     Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\new 2.txt"),[Delimiter=",", Columns=16, Encoding=1252]),
3. Change the Fixed Width "argument" from 0,23 => 0, then:
   - Transform
   - Split Column by delimiter (using the comma) in Power Query.
To me, the "best" way would be to correct the text file.
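Before correcting the file by hand, it may help to see which rows are short. The following is a minimal Python sketch, not part of the original answer; the function names `field_counts` and `short_rows` are made up for illustration, and it assumes plain comma-separated lines like the two quoted above.

```python
import csv
from collections import Counter


def field_counts(lines):
    """Map 'number of comma-separated fields' to 'how many rows have it'."""
    return Counter(len(row) for row in csv.reader(lines) if row)


def short_rows(lines):
    """Return (row_number, field_count) for rows narrower than the widest row.

    Row numbers are 1-based and count only non-empty rows.
    """
    rows = [row for row in csv.reader(lines) if row]
    widest = max(len(row) for row in rows)
    return [(i, len(row)) for i, row in enumerate(rows, start=1) if len(row) < widest]


# The two lines quoted in this thread: the file's first line and a later line.
sample = [
    "40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7",
    "2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6",
]

print(field_counts(sample))
print(short_rows(sample))
```

For a real file, passing an open file object (for example `open("data.txt", newline="")`, where the filename is a placeholder) instead of `sample` should work the same way.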
I want to combine two .csv files based on the unique id that exists in both files.
The first file consists of 17 columns and the second of 2 columns; in both files the first column is the same unique id.
In the resulting third file I would like 18 columns.
I have been trying paste:
paste -d ' ' SPOOL1.csv SPOOL2.csv > MERGED.csv
but that of course does not take the unique columns into consideration.
I'm not proficient in awk, so all help is appreciated.
Thanks
It sounds like, if both files are sorted on the id column, then
join SPOOL1 SPOOL2 > MERGED
should get you closer, provided you deal with the delimiters (not shown here).
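If sorting the files is inconvenient, the same key-based merge can be sketched in Python. This is only an illustration under assumptions the question does not confirm: the delimiter is taken to be a comma, ids are assumed to appear at most once in the second file, and the helper name `merge_on_first_column` and the sample rows are invented.

```python
import csv
import io


def merge_on_first_column(left_rows, right_rows):
    """Inner-join two lists of rows on their first field.

    Rows from the left file keep their order; ids missing from the
    right file are dropped, as the join utility does by default.
    """
    lookup = {row[0]: row[1:] for row in right_rows if row}
    return [row + lookup[row[0]] for row in left_rows if row and row[0] in lookup]


# Tiny stand-ins for SPOOL1.csv (many columns) and SPOOL2.csv (2 columns).
left = list(csv.reader(io.StringIO("id1,a,b\nid2,c,d\nid3,e,f\n")))
right = list(csv.reader(io.StringIO("id2,x\nid1,y\n")))

merged = merge_on_first_column(left, right)
for row in merged:
    print(",".join(row))
```

With 17 columns on the left and 2 on the right, each merged row would have 17 + 2 - 1 = 18 fields, since the shared id is written only once.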
I have a copy activity which takes the output of a procedure and writes it to a temp CSV file. I need the headers in double quotation marks, so after that I have a Data Flow task that takes the temp file and applies "quote all" in the sink settings. Yet the output is not what I expected. It looks like the last column is missing in some of the records because of commas in the data.
Is there a way to use only copy activity but still have the column names in double quotes?
When we set the column delimiter, Data Factory treats the first row as the schema and derives the number of columns from the delimiters in it. If your data contains values that include the column delimiter, some columns will be lost.
For now, this can't be solved in Data Factory. The only workaround is to use a different column delimiter, for example '|'.
We also can't wrap the header in double quotes in the output .csv file. That's not supported in Data Factory.
HTH.
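The delimiter collision described above can be reproduced outside Data Factory. The snippet below is a small Python illustration with made-up header and row values; it says nothing about how Data Factory parses files internally, it only shows why an unquoted comma inside a value changes the field count and why '|' avoids that.

```python
# Hypothetical three-column data; the name value contains a comma.
header_comma = "id,name,city"
row_comma = "1,Smith, John,Paris"

# Splitting on ',' yields 4 fields for a 3-column header.
fields_header = header_comma.split(",")
fields_row = row_comma.split(",")
print(len(fields_header), len(fields_row))

# The same data with '|' as the column delimiter splits cleanly.
header_pipe = "id|name|city"
row_pipe = "1|Smith, John|Paris"
pipe_fields = row_pipe.split("|")
print(len(header_pipe.split("|")), len(pipe_fields))
```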
Using the savetxt function, I created a data file named 'output.dat' to which two arrays were written as two different columns, so output.dat contains 2 columns of data. Now I want to add a heading at the top of each column to remind me what the file contains when I refer back to it later. Say I want to put the heading 'Time' at the top of the first column and 'Voltage' at the top of the second. How can I do this?
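One possible approach, assuming the file is written with NumPy's `savetxt`, is its `header` parameter. The arrays below are invented sample values; only the filename and the two headings come from the question.

```python
import numpy as np

# Invented sample data standing in for the two arrays in the question.
time = np.array([0.0, 0.1, 0.2])
voltage = np.array([1.5, 1.7, 1.6])

# header= writes one line before the data; by default it is prefixed
# with "# " so that np.loadtxt treats it as a comment when reading back.
np.savetxt("output.dat", np.column_stack((time, voltage)), header="Time Voltage")

loaded = np.loadtxt("output.dat")  # the header line is skipped
print(loaded.shape)
```

Passing `comments=""` to `savetxt` drops the `# ` prefix, but then the header line has to be skipped explicitly when loading, for example with `skiprows=1`.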
I have a very large text file (3.33 GB) which has 47 columns separated by delimiter ~. I just need the first and the last column to work with. The last column is a 17 digit number which may contain leading zeros. I have to store this column as a string (so as to not remove the leading zeros). An example of the first and last column is shown below:
      id             Number
0      0  10030040125198660
1  12345  60034046122158670
My question is whether it's possible to read just these two columns and store the second one as a string. The reason I ask is that loading the 3.3 GB file as a dataframe takes a lot of time, and converting the column to string takes even longer. I want to know if I can save time by choosing only the columns I need.
My code as of now (column names shown as numbers for easy understanding):
df=pd.read_csv('myfile.txt',low_memory=False,sep='~',header=None)
df.drop(columns=[2,3,4...,46],inplace=True) #Keeping only column 1 and 47
df['47']=df['47'].astype(str)
Any help is highly appreciated!
You should use the "usecols" parameter. Check out the official read_csv documentation; in fact, that is the first thing you should check.
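A minimal sketch of that suggestion follows. It combines `usecols` with the `dtype` parameter of `read_csv` so the last column is read as text directly and leading zeros survive. The sample lines are generated here only so the snippet runs on its own; the helper `make_line` and the filler values are invented, and the first sample number is altered to start with zeros.

```python
import io

import pandas as pd


def make_line(first, last, n_cols=47):
    """Build one '~'-separated line with filler in the middle columns."""
    return "~".join([str(first)] + ["x"] * (n_cols - 2) + [last])


sample = io.StringIO(
    "\n".join([
        make_line(0, "00030040125198660"),
        make_line(12345, "60034046122158670"),
    ])
)

# With header=None the columns are labelled 0..46, so the first and last
# columns are 0 and 46. dtype={46: str} keeps the last column as text.
df = pd.read_csv(sample, sep="~", header=None, usecols=[0, 46], dtype={46: str})
print(df)
```

For the real file, replacing `sample` with `'myfile.txt'` should give the same two-column frame without the later `drop` and `astype` steps.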