How to read text data and convert it to a pandas DataFrame - python-3.x

I am reading a text file from a URL:
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_csv(url, delimiter= '\t')
But when I export the DataFrame, all the columns are combined into a single column.
I tried different separators, but none of them worked. How can I get a properly structured DataFrame?
Please help.

I just opened this file in a text editor. It is not a tab-delimited file, or a delimited file of any kind. It is a fixed-width file, with the data on lines 9 to 48. You should use pd.read_fwf instead, skipping the lines before and after the table.
This would work:
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_fwf(url, widths=(31,8,8,8,8,8,8,8,8,8,8,8,8,8), skiprows=8, skipfooter=16, header=(0,1))
Data
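To see what the widths, skiprows and skipfooter arguments do without downloading anything, here is a minimal sketch on a made-up fixed-width sample. The sample text, column names and widths are assumptions for illustration only, not the layout of the Treasury file.

```python
import io
import pandas as pd

# Hypothetical rows, padded to fixed widths of 16, 8 and 8 characters.
rows = [
    ("Country", "Jan", "Feb"),
    ("Japan", "100.5", "101.2"),
    ("United Kingdom", "90.1", "89.7"),
]
body = "\n".join(f"{name:<16}{a:>8}{b:>8}" for name, a, b in rows)

# Two title lines before the table and one footnote line after it.
sample = "REPORT TITLE\n(illustrative units)\n" + body + "\nFootnote: illustrative only"

# Skip the title lines and the footnote; the first remaining row becomes the header.
df = pd.read_fwf(io.StringIO(sample), widths=[16, 8, 8], skiprows=2, skipfooter=1)
print(df)
```

For a real file, the widths and the number of lines to skip have to be read off the file itself, as was done in the answer above.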

The file you're linking is not in the CSV/TSV format. You need to transform the data to look something like this before loading it in this way.

Related

How to transform a report that is not a table format into a table format with Python

I've been struggling for a while to solve the following issue.
I have some reports in .txt format where the first 3 rows hold metadata and the rest of the report is a normal table.
Report layout viewed in Excel
Because of the layout of the report, I have to use the following code to read the file:
import pandas as pd
df=pd.read_csv('R1.txt', sep='delimiter',header=None)
I use sep='delimiter' because my .txt file is tab-separated. I get the following: (image: dataframe after reading)
How can I take only the "dates" value and repeat it down a new column? With that, I can drop the metadata rows and leave the dataframe laid out as a table. See the example:
desired dataframe layout
If your separator is a tab, you should use the following:
import pandas as pd
df=pd.read_csv('R1.txt', sep='\t',header=None)
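For the "dates" part of the question, one possible approach is to read the metadata rows separately and then skip them when parsing the table. The sketch below assumes a layout I made up (three "key, tab, value" metadata rows followed by a header row); the real report is only shown as an image, so the keys and column names here are hypothetical.

```python
import io
import pandas as pd

# Hypothetical report: three metadata rows, then a tab-separated table.
raw = (
    "Report\tSales summary\n"
    "Dates\t2021-01-01 to 2021-01-31\n"
    "Region\tNorth\n"
    "Item\tQty\n"
    "Apples\t3\n"
    "Pears\t5\n"
)

# Collect the metadata rows as key -> value pairs.
lines = raw.splitlines()
meta = dict(line.split("\t", 1) for line in lines[:3])

# Parse only the table, skipping the three metadata rows.
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3)

# Repeat the "Dates" value on every row of the table.
df["Dates"] = meta["Dates"]
print(df)
```

With a file on disk, the same idea would apply by reading the first lines with open() and passing the path plus skiprows=3 to pd.read_csv.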

Convert pandas Data frame to existing Excel keeping the worksheet format

I have a data frame that I want to write into an existing Excel file using openpyxl. The file already exists and has formatting (shown in the image) that I want to keep once the data is transferred from the data frame.
import pandas as pd
import openpyxl
dataframe=pd.read_excel('info.xlsx')
with pd.ExcelWriter('file.xlsx', engine='openpyxl', if_sheet_exists='replace', mode='a', keep_format=True) as writer:
    dataframe.to_excel(writer, sheet_name='DATAFRAME INFO', startrow=1, index=None)
I can't find a way to do it. I have tried passing something like keep_format=True as a keyword argument, but it still does not work: the existing formatting is always removed.
Thank you very much. (image of the format)
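One possible workaround, sketched under assumptions: if_sheet_exists='replace' discards the existing sheet, so the formatting goes with it. Writing the values cell by cell with openpyxl itself leaves existing cell styles alone. The workbook built at the top of this sketch is a hypothetical stand-in for the real file.xlsx, and the sample data frame is made up.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

# Stand-in for the existing, already formatted workbook (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "file.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "DATAFRAME INFO"
ws["A2"].font = Font(bold=True)  # pre-existing formatting we want to keep
wb.save(path)

dataframe = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Reopen the workbook and set only the cell values, starting at Excel row 2
# (the equivalent of startrow=1 in DataFrame.to_excel).
wb = load_workbook(path)
ws = wb["DATAFRAME INFO"]
for r, row in enumerate(dataframe_to_rows(dataframe, index=False, header=True), start=2):
    for c, value in enumerate(row, start=1):
        ws.cell(row=r, column=c).value = value
wb.save(path)
```

Only the value of each cell is assigned, so fonts, fills and borders already set on those cells are not touched.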

Switching between columns in a csv file

I have a folder with a lot of CSV files. I want to move column A to C, leave column A empty, and push all other columns to the right.
I tried looking for something similar, but all the other examples I saw refer to specific CSV files rather than iterating over a folder.
Thank you.
Here is code to iterate over the CSV files in your folder. Fill in the ellipses with your own code:
import pathlib
import shutil
import pandas as pd
root_dir = pathlib.Path('your_directory_path_here')
for csvfile in root_dir.glob('*.csv'):
    df = pd.read_csv(csvfile, ...)  # read your csv
    # modify the order of your columns here
    ...
    shutil.copyfile(csvfile, f"{csvfile}.bak")  # back up your csv
    df.to_csv(csvfile, ...)  # write back your dataframe
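For the reordering step itself, here is a minimal sketch on an in-memory frame. It assumes one reading of the question: the old first column ends up third, a blank column takes its place in first position, and the remaining columns keep their relative order. The helper name and the blank header are my own choices, and the frame needs at least two columns.

```python
import pandas as pd

def move_first_column_to_third(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an empty first column and the old first column in third position."""
    out = df.copy()
    first = out.columns[0]
    moved = out.pop(first)       # take old column A out of the frame
    out.insert(0, "", "")        # new empty column A with a blank header
    out.insert(2, first, moved)  # old column A becomes column C
    return out

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
result = move_first_column_to_third(df)
print(result)
```

Inside the loop, this would be called on each df before writing it back, with index=False passed to to_csv if the index should not be saved.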

Editing CSV files with Pandas

My CSV file contains double quotes (""), which break the import: when I read it with pandas, it treats all the columns as one value.
What I want is to change the following value in the column
4.7,3.2,1.3,.2,"Setosa"
to
4.7,3.2,1.3,.2,'Setosa'
Can't you use something like the below?
string.replace('"', "'")
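A minimal sketch of that idea applied to the whole text before parsing. The second data row is made up for illustration, and passing quotechar="'" is my assumption about how the result would then be read, so that the single quotes are not kept as part of the value.

```python
import io
import pandas as pd

# First row is from the question; the second is a hypothetical extra row.
raw = '4.7,3.2,1.3,.2,"Setosa"\n4.6,3.1,1.5,.2,"Setosa"\n'

# Swap every double quote for a single quote, as suggested above.
fixed = raw.replace('"', "'")

# Tell pandas that single quotes now delimit quoted fields.
df = pd.read_csv(io.StringIO(fixed), header=None, quotechar="'")
print(df)
```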

Pandas: Not separating rows into columns

First Pandas Project
I'm starting to learn pandas and wanted to test it with a dataset of my weightlifting, which I exported in CSV format. The goal was to analyze my progression, but I have run into an issue: each data row is stored in a single column instead of being split across the columns, even though the imported header looks correct.
I've tried passing the separator argument while importing the CSV, but looking at the data, it is "," that separates the values (I guess CSV takes a comma as the default anyway).
I am using the following code:
import pandas as pd
data = pd.read_csv("strong.csv")
Data from CSV looks like this:
Date,Workout Name,Exercise Name,Set Order,Weight,Reps,Distance,Seconds,Notes,Workout Notes
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",1,50,12,0,0,"",""
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",2,50,10,0,0,"",""
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",3,110,1,0,0,"",""
See the image for the data.head() result:
( https://i.imgur.com/qQtw66S.png )
EDIT: See link to CSV file with first columns.
https://github.com/Trools/StrongProject
It seems there is an error in the CSV export.
I just tried creating a new file with the same data and suddenly got an error about 11 fields instead of the expected 10 in row 134. Going through the file, I found that one of the last data entries (a weight) was stored as 72,5 instead of 72.5, which produced an additional separated value.
I am, however, a little confused about why pandas didn't give this error when I loaded the data in a Jupyter notebook.
How would one handle a CSV export that is not correctly formatted?
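One way to locate such rows before handing the file to pandas is to count the fields per line with the standard csv module. This is only a sketch: the helper name is mine, and the last sample row is a made-up line containing an unquoted 72,5 to mimic the problem described above.

```python
import csv
import io

def find_bad_rows(lines):
    """Return (line number, field count) for rows whose field count differs from the header's."""
    reader = csv.reader(lines)
    expected = len(next(reader))  # number of fields in the header row
    bad = []
    for row in reader:
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad

sample = (
    'Date,Workout Name,Exercise Name,Set Order,Weight,Reps,Distance,Seconds,Notes,Workout Notes\n'
    '2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",1,50,12,0,0,"",""\n'
    '2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",2,72,5,10,0,0,"",""\n'
)
bad = find_bad_rows(io.StringIO(sample))
print(bad)
```

With a real file, an open file handle (opened with newline="") can be passed in place of the StringIO object. Once the offending lines are known, they can be fixed by hand, or skipped at load time with pd.read_csv(..., on_bad_lines="skip") in pandas 1.3 or later.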
