Data missing when importing a text file into excel - excel

I'm trying to import a text file of CSV data into Excel. The data contains mostly integers, but there's one column with strings. I'm using the Data tab of Excel Professional Plus 2019. However, when I select comma as the delimiter I lose 5 of the 16 columns, starting with the one containing strings. The data looks like the line below. The date and the 7 numbers are in their own columns (just white space separated). Can anyone help or explain? Many thanks.
2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6
The full data is at https://github.com/CH220/textfileforexcel

Your problem stems from the very first line of data in your text file:
40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7
As you can see, it has only eleven comma-separated "segments". Hence, when you use the import dialog to separate by comma, you get only 11 columns even though subsequent rows have 16.
Possible solutions:
1. Correct the text file so the first line has the desired number of segments.
2. Change the Import Dialog, as you did, to comma, then choose Transform. In the Advanced Editor, edit the second line of the generated M-code to change Columns=11 to Columns=16:
Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\new 2.txt"),[Delimiter=",", Columns=16, Encoding=1252]),
3. Change the Fixed Width "argument" from 0,23 to 0, choose Transform, then use Split Column by delimiter (using the comma) in Power Query.
To me, the "best" way would be to correct the text file.
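Before correcting the file, it can help to check which lines have an unexpected number of segments. The following is a minimal sketch in Python, not part of the Excel workflow: the helper names are made up for illustration, and the sample reuses the two lines quoted above (the 16-field line is repeated so that it forms the majority).

```python
import csv
import io
from collections import Counter


def field_counts(text, delimiter=","):
    """Return (line_number, number_of_fields) for every non-empty row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [(i, len(row)) for i, row in enumerate(reader, start=1) if row]


def odd_rows(text, delimiter=","):
    """Return the most common field count and the rows that deviate from it."""
    counts = field_counts(text, delimiter)
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    return expected, [(i, n) for i, n in counts if n != expected]


# Sample built from the two lines quoted in this thread.
sample = (
    "40,03,52,02,07,20,14, 13137760, 1, Lancelot , 7\n"
    "2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6\n"
    "2143, Wed, 6,Jul,2016, 38,20,03,39,01,24,04, 2198488, 0, Lancelot , 6\n"
)

expected, bad = odd_rows(sample)
print(expected, bad)
```

Here the first line is reported as having 11 fields where 16 are expected, which is the line to fix in the text file.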

Related

Notepad to Excel column conversion - How to Parse one mixed string to 2 different column in excel

I have data in Notepad with more than 1000 entries, which I need to convert to Excel, with column breaks at particular positions based on length. Can someone help?
011000015FRB-BOS FEDERAL RESERVE BANK OF BOSTON MABOSTON Y Y20040910
File format is as below
Position Field
1-9 Routing number
1 Office code
I tried the Delimited option, but it didn't work.
If your data always has the routing number in columns 1-9, then a fixed-width import is the way to go. Choose Import From Text, then select Fixed Width and click Next. On Step 2, click at each character position that should be a column break. E.g., click at character 9 to split the line into two columns, with the first column holding the first nine characters and the second column holding the rest. Step 3 lets you set the data format. I'd recommend setting the first column to Text so Excel doesn't drop leading zeros or use scientific notation on your routing numbers.
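The same split by position can be sketched outside Excel. This is a minimal Python sketch, assuming only the one break after character 9 that the answer describes; the function name is hypothetical, and every field stays a string so the leading zero of the routing number survives.

```python
def split_fixed_width(line, widths):
    """Cut a line into fields of the given widths; the last field is the remainder."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    fields.append(line[pos:])  # whatever is left after the last break
    return fields


# Sample line from the question; a single break after the 9-character routing number.
record = "011000015FRB-BOS FEDERAL RESERVE BANK OF BOSTON MABOSTON Y Y20040910"
routing_number, rest = split_fixed_width(record, [9])
print(routing_number)
print(rest)
```

More widths can be added to the list once the remaining field positions are known.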

Python 3 pandas dataframe merge without caring for random spaces or enters on cell string

I'm having an issue with pandas merges producing inconsistent results when the CSV file I'm using has stray line breaks at the end of cell values. This usually creates 2 rows for the same cell name, because one CSV has the right format (no leading or trailing junk line or space) and the other has the invisible characters.
I tried .strip on the column before merging, but the extra line from "enter" still gets through. By enter/extra line, I don't mean an extra row: if you open the file in Excel and click on the cell, there's an extra blank line under the word and the cell expands to fit. I'm not sure whether .strip has extra settings that can catch these cases or whether another filter layer is needed.
I have attached an image of the output CSV and the error. In effect, the extra blank line is not removed by .strip() on the column, so the merge on that column treats the 2 rows as different strings. Is there a good method of stripping extra whitespace in the data cells, or of merging without caring about the exact spacing or lines following a string?
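import pandas as pd

# File names must be passed as strings
df = pd.read_csv('datalists_1.csv')
df2 = pd.read_csv('datalists_2.csv')
# Trim leading and trailing whitespace from the merge key
df['chem_name'] = df['chem_name'].str.strip()
df2['chem_name'] = df2['chem_name'].str.strip()
merged_df = df.merge(df2, how='outer')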
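With no arguments, str.strip() already removes leading and trailing spaces, tabs, carriage returns and line feeds, so if rows still fail to match, the stray characters may be inside the string rather than at its ends. One possible extra filter layer is sketched below, assuming the junk is ordinary whitespace. The helper name and the sample frames are hypothetical and not taken from the asker's files.

```python
import pandas as pd


def normalize_key(column: pd.Series) -> pd.Series:
    """Collapse every run of whitespace (spaces, tabs, \\r, \\n) to one space, then trim."""
    return column.str.replace(r"\s+", " ", regex=True).str.strip()


# Hypothetical data: the same names with different stray whitespace.
left = pd.DataFrame(
    {"chem_name": ["water", "ethanol\n", "sodium\nchloride"], "a": [1, 2, 3]}
)
right = pd.DataFrame(
    {"chem_name": ["water\r\n", " ethanol", "sodium chloride"], "b": [10, 20, 30]}
)

left["chem_name"] = normalize_key(left["chem_name"])
right["chem_name"] = normalize_key(right["chem_name"])

merged = left.merge(right, on="chem_name", how="outer")
print(merged)
```

If the keys still differ after this, printing them with repr() shows exactly which characters remain.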

read an ASCII text file

How do I read this ASCII text file correctly?
I can download it as a zip file here: https://www.irs.gov/pub/irs-soi/eo2016.zip
When I extract it from the zip file, add ".txt" to the file name, and open it in Excel, many numbers are displayed that make no sense (screenshot attached).
I have also opened it in MATLAB and RStudio, but the same numbers are displayed there.
Does anybody know how to do this correctly?
As discussed in the comments, the file is in fixed-width format (line length: 9444), and column positions have been specified in a separate Excel sheet.
Here are 3 possibilities to import such a file in Excel.
1. Excel's 'Convert Text to Columns' wizard
There's a 'Text to Columns' button in the 'Data' tab of Excel's ribbon.
It supports fixed-width files, but manually placing 833 column separators will be an incredibly tedious job.
And there seems to be no way to save the column definitions for subsequent imports.
2. Use Excel formulas
From the specs sheets (EO990_16), copy columns C and D, and paste them to another Excel sheet, transposed; use Paste Special - Transpose. This should populate rows 1 and 2 as follows:
1 13 22 26 27 102 162 ...
12 9 4 1 75 60 2 ...
Now fill the rest of the sheet starting from row 3 with formulas referencing the data sheet, like you see below.
This is a straightforward duplication of any single cell horizontally and vertically.
=MID(Data!$A3, A$1, A$2) =MID(Data!$A3, B$1, B$2) =MID(Data!$A3, C$1, C$2) ...
=MID(Data!$A4, A$1, A$2) =MID(Data!$A4, B$1, B$2) =MID(Data!$A4, C$1, C$2) ...
=MID(Data!$A5, A$1, A$2) =MID(Data!$A5, B$1, B$2) =MID(Data!$A5, C$1, C$2) ...
... ... ...
Source:
https://www.wizardofexcel.com/2011/09/28/saving-a-fixed-width-import-layout/
3. Convert to CSV
CSV is easy to import.
This command-line approach may help:
convert a fixed width file from text to csv
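A conversion along those lines can also be sketched in Python. This is a minimal sketch, not the linked command-line approach: it assumes the start positions and lengths are taken from the specs sheet (only the first four pairs from the two rows shown in option 2 are used here), and the sample record is made up for illustration.

```python
import csv
import io


def fixed_width_to_rows(lines, starts, lengths):
    """Slice each line like Excel's MID(text, start, length); starts are 1-based."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield [line[s - 1:s - 1 + n].strip() for s, n in zip(starts, lengths)]


def to_csv_text(lines, starts, lengths):
    """Return the sliced fields as CSV text, one output row per input line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(fixed_width_to_rows(lines, starts, lengths))
    return buffer.getvalue()


# First four (start, length) pairs from the transposed spec rows.
starts = [1, 13, 22, 26]
lengths = [12, 9, 4, 1]

# Made-up record: 12 + 9 + 4 + 1 characters followed by unused filler.
demo = ["AAAAAAAAAAAA" + "123456789" + "2016" + "X" + "rest of the record"]
csv_text = to_csv_text(demo, starts, lengths)
print(csv_text)
```

For the real file, the lines would come from the opened text file and the two lists would hold all 833 pairs. The fields are kept as text, so leading zeros are preserved in the CSV.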
As a solution, I used Excel and simply split the data with formulas, using the field lengths given in the accompanying Excel specification sheet.
If it's a text document why don't you open it in a text editor?

Read only 2 columns from a '~' delimited text file into dataframe and store second column as string

I have a very large text file (3.33 GB) which has 47 columns separated by delimiter ~. I just need the first and the last column to work with. The last column is a 17 digit number which may contain leading zeros. I have to store this column as a string (so as to not remove the leading zeros). An example of the first and last column is shown below:
id Number
0 0 10030040125198660
1 12345 60034046122158670
My question is whether it's possible to read just these two columns alone, and store the second column as string ? The reason I ask is because loading 3.3GB file as a dataframe takes a lot of time, converting it into string takes an even longer amount. I want to know if I can save time by choosing only the columns I need.
My code as of now (shown the column names as numbers for easy understanding):
df=pd.read_csv('myfile.txt',low_memory=False,sep='~',header=None)
df.drop(columns=[2,3,4...,46],inplace=True) #Keeping only column 1 and 47
df['47']=df['47'].astype(str)
Any help is highly appreciated!
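You should use the "usecols" parameter of read_csv to load only the columns you need, and the "dtype" parameter to read the last column as a string from the start. Check out the official read_csv documentation; in fact, that is the first thing you should check.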
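As a minimal sketch of that suggestion, assuming the file has no header row and exactly 47 columns as described in the question: usecols selects the first and last column by position, and dtype keeps the last one as text so leading zeros are not lost. The function name and the in-memory sample (including its values) are hypothetical.

```python
import io

import pandas as pd


def read_first_and_last(source, n_columns=47, sep="~"):
    """Read only the first and last column; keep the last one as a string."""
    last = n_columns - 1  # zero-based position of the last column
    return pd.read_csv(
        source,
        sep=sep,
        header=None,
        usecols=[0, last],
        dtype={last: str},
    )


# Hypothetical two-line sample with 47 '~'-separated fields per line.
filler = "~".join(["x"] * 45)
sample = io.StringIO(
    f"0~{filler}~00030040125198660\n"
    f"12345~{filler}~60034046122158670\n"
)

df = read_first_and_last(sample)
print(df)
```

For the real file, pass the path (for example 'myfile.txt') instead of the in-memory sample. The resulting columns are labelled 0 and 46.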

Sorting txt data files while importing in Excel Data Query

I am trying to enter approximately 190 txt datafiles in Excel using the New Query tool (Data->New Query->From File->From Folder). In the Windows explorer the data are properly ordered: the first being 0summary, the second 30summary etc.
However, when entering them through the query tool the files are sorted as shown in the picture (see line 9 for example, you will see that the file is not in the right position):
The files are sorted by their first digit instead of by the value the number represents. Is there a solution to this issue? I have tried putting a space between the number and the word summary, but that didn't work either. I saw online that Excel doesn't recognize text within "" or after /, but Windows does not allow me to save text files with those symbols in their names. Even removing the word summary didn't fix the problem. Any suggestions?
If all your names include the word Summary:
You can add a column with "Extract" / "Text Before Delimiter", enter "Summary" as the delimiter, change the column type to a number, and sort on that column.
If the only numbers are those you wish to sort on, you can:
1. Add a custom column with just the numbers.
2. Change the data type to whole number.
3. Sort on that column.
The formula for the custom column:
Text.Select([Name],{"0".."9"})
If the alpha portion varies, and you need to sort on that also, you can do something similar adding another column for the alpha portion, and sorting on that.
If there might be digits after the leading digits upon which you want to sort, then use the following formula for the added column which will extract only the digits at the beginning of the file name:
=Text.Middle([Name],0,Text.PositionOfAny([Name],{"A".."z"}))
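The idea behind both formulas, sorting on the numeric value of the leading digits rather than on the text, can be illustrated outside Power Query. This is a minimal Python sketch: "0summary" and "30summary" follow the names mentioned in the question, while the other names and the .txt extension are made up.

```python
import re


def leading_number(name):
    """Return the digits at the start of the name as an int; names without digits sort last."""
    match = re.match(r"\d+", name)
    return int(match.group()) if match else float("inf")


# Plain text sorting puts "120..." before "30..." because it compares character by character.
names = ["0summary.txt", "120summary.txt", "30summary.txt", "60summary.txt"]
ordered = sorted(names, key=leading_number)
print(ordered)
```

Sorting on the converted number places 30 and 60 before 120, which is what changing the extracted column's type to a whole number achieves in the query.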
