Read excel into pandas dataframe without modifying the values of excel? - python-3.x

I am reading an xlsx file with pandas, using pd.read_excel("myfile.xlsx", sheet_name="my_sheet", header=2), and writing the DataFrame to a CSV file with df.to_csv.
The Excel file contains several columns of percentage values (e.g. 27.44 %). In the DataFrame these values are converted to 0.2744. I don't want the data modified in any way. How can I achieve this?
I already tried:
Using a lambda function to convert the value back from 0.2744 to 27.44 %. I don't want this because the column names/indexes are not fixed: any column may contain % values.
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet", header=5, dtype={'column_name': str}) - didn't work
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet", header=5, dtype={'column_name': object}) - didn't work
The xlrd module, but that also converted the % values to floats.
My current code:
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet")
df.to_csv("mycsv.csv", sep=",", index=False)

Save the sheet from Excel directly in CSV format, so that the values are written as text the way Excel displays them.
Then import the CSV file with pandas as follows:
import pandas as pd
df = pd.read_csv('my_sheet.csv')  # assumes the file is in the current working directory
See the pandas.read_csv documentation for more information.
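If saving from Excel by hand is not an option, one possible alternative is to read each cell's stored value together with its number format and rebuild the displayed text. The following is only a minimal sketch: the helper names render_percent and sheet_to_rows are hypothetical, it assumes the third-party openpyxl package is installed, and it only handles simple formats such as "0%" or "0.00%" (it writes "27.44%" without a space).

```python
def render_percent(value, number_format):
    """Return roughly what Excel displays for a percent-formatted number.

    Hypothetical helper: only simple formats like "0%" or "0.00%" are handled.
    Anything else is returned unchanged.
    """
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and number_format.endswith("%"):
        # Count the decimal places requested by a format such as "0.00%".
        decimals = number_format.rstrip("%").partition(".")[2].count("0")
        return f"{value * 100:.{decimals}f}%"
    return value


def sheet_to_rows(path, sheet_name):
    """Read a worksheet and keep percent cells as text (assumes openpyxl)."""
    from openpyxl import load_workbook  # third-party; imported lazily

    worksheet = load_workbook(path, data_only=True)[sheet_name]
    return [
        [render_percent(cell.value, cell.number_format) for cell in row]
        for row in worksheet.iter_rows()
    ]


print(render_percent(0.2744, "0.00%"))
```

The rows returned by sheet_to_rows could then be passed to pd.DataFrame, with the header row chosen to match the sheet layout.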

Related

How do I convert my response with byte characters to readable CSV - PYTHON

I am building an API to save CSVs from the SharePoint REST API using Python 3. I am using a public dataset as an example. The original CSV has 3 columns, Group, Team and FIFA Ranking, with the corresponding data in the rows.
After using data=response.content, the value of data is:
b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\nA,Netherlands,8\r\nB,England,5\r\nB,Iran,20\r\nB,United States,16\r\nB,Wales,19\r\nC,Argentina,3\r\nC,Saudi Arabia,51\r\nC,Mexico,13\r\nC,Poland,26\r\nD,France,4\r\nD,Australia,38\r\nD,Denmark,10\r\nD,Tunisia,30\r\nE,Spain,7\r\nE,Costa Rica,31\r\nE,Germany,11\r\nE,Japan,24\r\nF,Belgium,2\r\nF,Canada,41\r\nF,Morocco,22\r\nF,Croatia,12\r\nG,Brazil,1\r\nG,Serbia,21\r\nG,Switzerland,15\r\nG,Cameroon,43\r\nH,Portugal,9\r\nH,Ghana,61\r\nH,Uruguay,14\r\nH,South Korea,28\r\n'
How do I convert the above to a CSV that pandas can manipulate, with the columns being Group, Team and FIFA Ranking followed by the corresponding data, in a dynamic way so that the method works for any CSV?
I tried:
data=response.content.decode('utf-8', 'ignore').split(',')
However, when I convert the data variable to a DataFrame and then export it to CSV, the CSV contains all the values in one column.
I tried:
data=response.content.decode('utf-8') or data=response.content.decode('utf-8', 'ignore') without the split
However, pandas does not accept this as valid input for a DataFrame and reports an invalid use of the DataFrame constructor.
I tried:
data=json.loads(response.content)
However, the content is not valid JSON, so this fails with the error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Given:
data = b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\n' #...
If you just want a CSV version of your data, you can simply do:
with open("foo.csv", "wt", encoding="utf-8", newline="") as file_out:
    # newline="" writes the \r\n line endings from the response unchanged
    file_out.write(data.decode())
If your objective is to load this data into a pandas DataFrame and the CSV file itself is not important, you can do:
import io
import pandas
foo = pandas.read_csv(io.StringIO(data.decode()))
print(foo)
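The attempts in the question fail because pd.DataFrame expects structured data, while read_csv expects a path or a file-like object. As a variation on the answer above, read_csv also accepts a binary file-like object, so the bytes can be wrapped without decoding them first. This is a minimal sketch using a shortened copy of the response body; the encoding is assumed to be UTF-8.

```python
import io

import pandas as pd

# A shortened copy of the response body from the question.
data = b"Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\n"

# read_csv accepts binary file-like objects; encoding tells it how to decode them.
df = pd.read_csv(io.BytesIO(data), encoding="utf-8")

print(df.columns.tolist())
print(df)

# Writing it back out gives a regular CSV file without the index column:
# df.to_csv("foo.csv", index=False)
```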

How to read text file data in scientific format using pandas DataFrame

I have a text file (input.txt) with 2 columns of data which is in scientific format as shown.
input.txt file contents:
4.6277245181485196e-02 -3.478992280123e-02
5.147225314664928553e-02 -3.626645537995224627e-02
5.719622597261836416e-02 -3.778369677696073736e-02
6.351385032440140521e-02 -3.9348512512335400e-02
7.049988917103996999e-02 -4.096034949794334634e-02
7.822948857937785105e-02 -4.261684461302106541e-02
8.67649433797989394455e-02 -4.77e-02
9.614380281036348508e-02 -4.604114963738591831e-02
1.063651118650106309e-01 -4.777947266421164740e-02
1.173824105396738815e-01 -4.950717696170207904e-02
1.291006932795119577e-01 -5.119743181445588626e-02
I used the code below to read the data as a DataFrame.
import pandas as pd
from tabulate import tabulate
df = pd.read_csv('input.txt',delim_whitespace=True,engine='python',header=None,skip_blank_lines=True)
f=open('output.txt','w')
f.write(tabulate(df.values,tablefmt="plain"))
f.close()
But the data is not read in scientific format. I'm writing the same data to another output file using tabulate (so that it looks evenly spaced, like a table). The output is not in scientific format, and the digits are also truncated, as shown.
output.txt file contents:
0.0462772 -0.0347899
0.0514723 -0.0362665
0.0571962 -0.0377837
0.0635139 -0.0393485
0.0704999 -0.0409603
0.0782295 -0.0426168
0.0867649 -0.0477
0.0961438 -0.0460411
0.106365 -0.0477795
0.117382 -0.0495072
0.129101 -0.0511974
I need the data to be read as-is, i.e. in scientific format in this case, and written to another file using tabulate. What needs to be modified in the above code?
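When reading the file, specify dtype=str so the values stay as text, and pass disable_numparse=True so that tabulate does not convert them back to numbers: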
df = pd.read_csv("input.txt", sep=r"\s+", engine="python", dtype=str, header=None)
print(tabulate(df.values, tablefmt="plain", disable_numparse=True))
Prints:
4.6277245181485196e-02 -3.478992280123e-02
5.147225314664928553e-02 -3.626645537995224627e-02
5.719622597261836416e-02 -3.778369677696073736e-02
6.351385032440140521e-02 -3.9348512512335400e-02
7.049988917103996999e-02 -4.096034949794334634e-02
7.822948857937785105e-02 -4.261684461302106541e-02
8.67649433797989394455e-02 -4.77e-02
9.614380281036348508e-02 -4.604114963738591831e-02
1.063651118650106309e-01 -4.777947266421164740e-02
1.173824105396738815e-01 -4.950717696170207904e-02
1.291006932795119577e-01 -5.119743181445588626e-02
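The reason dtype=str helps can be seen by reading the same text twice. This is a small sketch using two rows copied from input.txt and an in-memory buffer instead of the real file.

```python
import io

import pandas as pd

# Two rows copied from input.txt in the question.
text = (
    "4.6277245181485196e-02 -3.478992280123e-02\n"
    "8.67649433797989394455e-02 -4.77e-02\n"
)

# Default parsing converts each field to a float, so the original spelling is lost.
as_float = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# dtype=str keeps every field exactly as it was typed in the file.
as_text = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, dtype=str)

print(as_float.dtypes.tolist())
print(as_text.iat[1, 0])
```

The trade-off is that the string columns cannot be used for arithmetic until they are converted, for example with astype(float).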

Issue when exporting dataframe to csv

I'm working on a mechanical engineering project. For the following code, the user enters the number of cylinders that their compressor has. A dataframe is then created with the correct number of columns and is exported to Excel as a CSV file.
The outputted dataframe looks exactly like I want it to as shown in the first link, but when opened in Excel it looks like the image in the second link:
1.my dataframe
2.Excel Table
Why is my dataframe not exporting properly to Excel and what can I do to get the same dataframe in Excel?
import pandas as pd
CylinderNo=int(input('Enter CylinderNo: '))
new_number=CylinderNo*3
list1=[]
for i in range(1,CylinderNo+1):
    for j in range(0,3):
        Cylinder_name=str('CylinderNo ')+str(i)
        list1.append(Cylinder_name)
df = pd.DataFrame(list1,columns =['Kurbel/Zylinder'])
list2=['Triebwerk', 'Packung','Ventile']*CylinderNo
Bauteil = {'Bauteil': list2}
df2 = pd.DataFrame (Bauteil, columns = ['Bauteil'])
new=pd.concat([df, df2], axis=1)
list3=['Nan','Nan','Nan']*CylinderNo
Bewertung={'Bewertung': list3}
df3 = pd.DataFrame (Bewertung, columns = ['Bewertung'])
new2=pd.concat([new, df3], axis=1)
Empfehlung={'Empfehlung': list3}
df4 = pd.DataFrame (Empfehlung, columns = ['Empfehlung'])
new3=pd.concat([new2, df4], axis=1)
new3.set_index('Kurbel/Zylinder')
new3 = new3.set_index('Kurbel/Zylinder', append=True).swaplevel(0,1)
#export dataframe to csv
new3.to_csv('new3.csv')
To be clear, a comma-separated values (CSV) file is not an Excel format or table. It is a delimited text file that Excel, like other applications, can open.
What you are comparing is simply presentation. Both data frames are exactly the same. For MultiIndex data frames, the pandas print output does not repeat index values, for readability on the console or in an IDE like Jupyter. But such values are not removed from the underlying data frame, only from its presentation. If you reorder the index levels, you will see the presentation change. The complete data frame is what is exported to CSV. And ideally, for data integrity, you want the full data set exported with to_csv to be importable back into pandas with read_csv (which can set indexes) or into other languages and applications.
Essentially, CSV is an industry format for storing and transferring data. Consider using Excel spreadsheets, HTML, Markdown, or other reporting formats for your presentation needs. For that purpose, to_csv may not be the best method. You could build the text file manually with Python's I/O write methods (with open('new.csv', 'w') as f), but that would be an extensive workaround. See also @Jeff's answer here, but note that the latter part of that solution does remove data.
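The difference between the printed frame and the exported data can be checked directly. This is a minimal sketch with made-up values shaped like the frame in the question; it uses two named index levels rather than the exact index from the question.

```python
import io

import pandas as pd

# Illustrative data shaped like the question's frame (values are made up).
df = pd.DataFrame(
    {
        "Kurbel/Zylinder": ["CylinderNo 1"] * 3 + ["CylinderNo 2"] * 3,
        "Bauteil": ["Triebwerk", "Packung", "Ventile"] * 2,
        "Bewertung": ["offen"] * 6,
    }
).set_index(["Kurbel/Zylinder", "Bauteil"])

print(df)               # the console shows each 'CylinderNo n' label only once

csv_text = df.to_csv()  # with no path, to_csv returns the CSV text
print(csv_text)         # the CSV repeats the label on every row

# Reading back with both index columns restores the same MultiIndex.
restored = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
```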

Pandas read_csv adding some very small values to the dataframe

When I use pandas read_csv, pandas adds a tiny amount to some values in the DataFrame: a value went from -0.079257 to -0.07925700000000001. Why is this happening and how can I fix it? It only happens to some specific values, while others seem fine.
I've tried using float_precision, but it doesn't seem to do anything. I'm new to pandas.
df = pd.read_csv('filepath')
print(df.iat[0,0])
I changed the dataset file type from txt to csv manually using notepad.
This appears to be because your original data has np.float32 precision.
import numpy as np
import pandas as pd
df = pd.read_csv('./avila/avila-ts.txt')
print(df.iat[0,0]) # 0.13029200000000002
# stored as np.float32
df.to_csv('./my.csv',float_format=np.float32, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.13029200000000002
# stored as np.float16
df.to_csv('./my.csv',float_format=np.float16, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.1302
I don't know how your data is structured. Could you open the data and check, or better still, post a screenshot?
import pandas
data = pandas.read_csv('filepath')
data.head()
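Independently of the answers above, it may be worth checking how the text is converted to a float. Decimal values such as -0.079257 have no exact binary representation, and depending on the pandas version the C parser's float converter can differ from Python's float() in the last digit. The float_precision="round_trip" option asks read_csv to use the round-trip converter. This is a sketch using an in-memory buffer with two values quoted in this thread, not the original dataset.

```python
import io

import pandas as pd

# Two values quoted in this thread, one per line.
text = "-0.079257\n0.130292\n"

# 'round_trip' selects the round-trip converter of the C parsing engine.
df = pd.read_csv(io.StringIO(text), header=None, float_precision="round_trip")

# Each parsed value now equals what Python's float() gives for the same text.
print(df.iat[0, 0])
print(df.iat[0, 0] == float("-0.079257"))
```

If the extra digits only matter for display, rounding when printing or passing a format string such as float_format="%.6f" to to_csv is another option.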

Saving pandas data frame to .mat file in python3

I have a pandas DataFrame 'df'; the original data has many rows.
I would like to save it as a .mat file named 'meta.mat'. I tried:
import os
import scipy.io as sio
sio.savemat(os.path.join(destination_folder_path,'meta.mat'), df)
This creates the meta.mat file, but it only writes the field names, as I can see when I open it in MATLAB.
How can I fix this? Thanks.
I don't think you can pass a pd.DataFrame directly when scipy.io.savemat is expecting a dict of numpy arrays. Try replacing df with the following in your call to savemat:
{name: col.values for name, col in df.items()}
This is another solution. The resulting .mat file will be in the form of a structure in MATLAB.
# data dictionary
OutData = {}
# convert DF to dictionary before loading to your dictionary
OutData['Obj'] = df.to_dict('list')
sio.savemat('path\\testmat.mat',OutData)
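The first suggestion can be checked end to end by writing to an in-memory buffer and loading the result back. This is a minimal sketch: the DataFrame is a hypothetical stand-in for the one in the question, and the buffer takes the place of a real path such as 'meta.mat'.

```python
import io

import pandas as pd
from scipy.io import loadmat, savemat

# Hypothetical stand-in for the question's df.
df = pd.DataFrame({"age": [31.0, 45.0, 27.0], "score": [0.5, 0.7, 0.9]})

# One NumPy array per column, keyed by the column name.
mat_dict = {name: col.to_numpy() for name, col in df.items()}

buffer = io.BytesIO()  # stands in for a file path such as 'meta.mat'
savemat(buffer, mat_dict)

# Load the data back to confirm that the values, not just the names, were written.
buffer.seek(0)
loaded = loadmat(buffer)
print(loaded["age"])
```

Column names become MATLAB variable names, so they should be valid MATLAB identifiers.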
