Editing CSV files with Pandas - python-3.x

My CSV file contains "" which ruins the file: when I import it using Pandas, it treats all the columns as one value.
What I want is to change the following value in the column
4.7,3.2,1.3,.2,"Setosa"
to
4.7,3.2,1.3,.2,'Setosa'

Can't you use something like below?
string.replace('"', "'")
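If you want to apply that replacement before Pandas parses the file, one option is to read the raw text, swap the quotes, and feed the result to read_csv. A minimal sketch, assuming the file is called iris.csv (a hypothetical name) and is small enough to hold in memory:
import io
import pandas as pd

# Read the raw file, replace double quotes with single quotes, then parse it.
# The file name and header=None are assumptions for illustration.
with open("iris.csv", encoding="utf-8") as f:
    text = f.read().replace('"', "'")

df = pd.read_csv(io.StringIO(text), header=None)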

Related

How to read text data and convert it to a pandas dataframe

I am reading from the URL
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_csv(url, delimiter= '\t')
But when I export the DataFrame, I see that all the columns are combined into a single column.
I tried different separators, but that didn't work. I want to get a proper data frame. How can I achieve it?
Please help.
I just opened this file in a text editor. This is not a tab-delimited file, or any kind of delimited file. It is a fixed-width-fields file from line 9 to 48. You should use pd.read_fwf instead, skipping some lines.
This would work:
import pandas as pd
url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
Data = pd.read_fwf(url, widths=(31,8,8,8,8,8,8,8,8,8,8,8,8,8), skiprows=8, skipfooter=16, header=(0,1))
Data
The file you're linking to is not in CSV/TSV format. You would need to transform the data into a delimited format before loading it that way.
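If you then want the data as a regular delimited file, you could write the read_fwf result back out; a minimal sketch (the output file name is an assumption):
# Write the fixed-width data out as an ordinary CSV file.
Data.to_csv("mfh.csv")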

Read excel with a particular word in title

I have a folder which may or may not contain multiple Excel files.
The names of the Excel files can change over time, but one specific keyword will always be in the name of each file.
For test purposes, let the keyword be Fruits.
For an Excel file with a fixed name like Fruits_Pineapple.xlsx the code works:
import pandas as pd
pd.read_excel(r'c:\mypath\Fruits_Pineapple.xlsx')
But I can have Excel files like Fruits_Pineapple, Fruits_Apple, Vegetables etc. I want to know how I can read the Excel files with a "contains" kind of functionality.
I have searched SO but surprisingly couldn't find any solution!
Since you have no idea how many (if any) Excel files in your folder match, you can do the following using glob:
import glob
import pandas as pd

# Match any .xlsx file whose name contains the keyword; use whatever extension
# you have, and you can give a relative or complete path.
excel_list = glob.glob("*Fruits*.xlsx")

for excel in excel_list:
    df = pd.read_excel(excel)
    # Whatever else you need to do with df below
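If the matching files all share the same columns, a common follow-up is to stack them into one DataFrame; a minimal sketch along the same lines (the concat step is an assumption about what you want to do with the files):
import glob
import pandas as pd

# One DataFrame per matching file.
frames = [pd.read_excel(path) for path in glob.glob("*Fruits*.xlsx")]

# Combine everything into a single DataFrame
# (assumes at least one match and that the files share columns).
combined = pd.concat(frames, ignore_index=True)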

Pandas: Not separating rows into columns

First Pandas Project
I am starting to learn pandas and wanted to test it with a dataset of my weightlifting, which I exported in CSV format. The purpose was to analyze my progression, but I have unfortunately run into an issue where my data rows are all stored in the same column instead of being split into the different columns, which look correct based on the imported header.
I've tried to set the separator while importing the CSV, but looking at the data it should be "," that separates the values (I guess CSV always takes comma as the default).
I am using the following code:
import pandas as pd
data = pd.read_csv("strong.csv")
Data from CSV looks like this:
Date,Workout Name,Exercise Name,Set Order,Weight,Reps,Distance,Seconds,Notes,Workout Notes
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",1,50,12,0,0,"",""
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",2,50,10,0,0,"",""
2018-05-08 19:27:54,"1: Back, Biceps & Abs","Deadlift (Barbell)",3,110,1,0,0,"",""
See this image for the data.head() result: https://i.imgur.com/qQtw66S.png
EDIT: See the link below for a CSV file with the first columns.
https://github.com/Trools/StrongProject
It seems there was an error in the CSV export.
I just tried creating a new file with the same data and suddenly got an error about 11 fields instead of 10 in row 134. When going through the file, I found that one of the last data entries (a weight) was stored as 72,5 instead of 72.5, which resulted in an extra separated value.
I am, however, a little confused why Pandas didn't give this error when I tried to load the data in a Jupyter notebook.
How would one go about an issue where a CSV export is not correctly formatted?
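One way to surface badly formatted rows without the load failing outright is the on_bad_lines option of read_csv (available in pandas 1.3+); a minimal sketch, assuming the file name from the question:
import pandas as pd

# Warn about rows with the wrong number of fields (e.g. the 72,5 entry)
# instead of raising, so they can be located and fixed in the export.
data = pd.read_csv("strong.csv", on_bad_lines="warn")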

Find and Replace on import

I'm looking for a way to automatically do a find and replace after I have imported data from a CSV file. The date data in my CSV file has a time stamp that I do not want to use. Yes, I can do it manually but would like to automate it if possible.
The data in the left column is what I want to change to match what is in the right column.
Sample Data
Thanks.
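Assuming the data is being loaded with pandas, as in the other questions here, one way is to parse the column as dates on import and then drop the time part; a minimal sketch with hypothetical file and column names:
import pandas as pd

# "data.csv" and the "Date" column name are assumptions for illustration.
df = pd.read_csv("data.csv", parse_dates=["Date"])
df["Date"] = df["Date"].dt.normalize()   # keep datetime dtype, time set to 00:00
# or, to keep plain date objects instead:
# df["Date"] = df["Date"].dt.date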

Import Excel spreadsheet into phpMyAdmin

I have been trying to import an Excel (xlsx) file into phpMyAdmin.
I have tried it as both an Excel and a CSV file. I have tried CSV and CSV using LOAD DATA.
I have changed the default field termination value from ; to ,.
Most times I was getting a variety of error messages, so I deleted my field names column and was then only able to import a single row of data.
The data was off by a column, and I guess that has something to do with the structure of my table, which has a field for ID# as a primary auto-incrementing field that is not in my CSV file.
I tried adding a column for that before importing, with no success. I would have thought that I could import straight from the xlsx file, as that is one of the choices in phpMyAdmin, but everything I read or watch online converts to CSV first.
I could use some help here.
I had a similar problem that I solved by changing the 'fields enclosed by' option from " (double quote) to ' (single quote) and doing the same to the first line of the file, which contains the field names. Worked like a charm. Hope this helps.
This is hopelessly late, but I'm replying in the hope that this might help a future viewer.
The reason that the CSV data is off by one is the very fact that you don't have the ID# field in it! The way to get around this is to import the file into a temporary table, then run
INSERT INTO `table`
SELECT NULL, <field1>, <field2>...
FROM `temp table`;
Adding NULL to the list of fields means that MySQL will autogenerate the ID# field (assuming you've set it to AUTO_INCREMENT).
