Saving pandas data frame to .mat file in python3 - python-3.x

I have a pandas data frame 'df', it looks like below but original data has many rows.
I would like to save this as .mat file with a name 'meta.mat'. I tried;
import scipy.io as sio
sio.savemat(os.path.join(destination_folder_path,'meta.mat'), df)
This creates the meta.mat file but it only writes the field names, when I open it in matlab it looks like this;
How can I fix this, thanks.

I don't think you can pass a pd.DataFrame directly when scipy.io.savemat is expecting a dict of numpy arrays. Try replacing df with the following in your call to savemat:
{name: col.values for name, col in df.items()}

This is another solution. The resulting mat file will in the form of a structure in matlab
# data dictionary
OutData = {}
# convert DF to dictionary before loading to your dictionary
OutData['Obj'] = df.to_dict('list')
sio.savemat('path\\testmat.mat',OutData)

Related

How do I convert my response with byte characters to readable CSV - PYTHON

I am building an API to save CSVs from Sharepoint Rest API using python 3. I am using a public dataset as an example. The original csv has 3 columns Group,Team,FIFA Ranking with corresponding data in the rows.For reference. the original csv on sharepoint ui looks like this:
after using data=response.content the output of data is:
b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\nA,Netherlands,8\r\nB,England,5\r\nB,Iran,20\r\nB,United States,16\r\nB,Wales,19\r\nC,Argentina,3\r\nC,Saudi Arabia,51\r\nC,Mexico,13\r\nC,Poland,26\r\nD,France,4\r\nD,Australia,38\r\nD,Denmark,10\r\nD,Tunisia,30\r\nE,Spain,7\r\nE,Costa Rica,31\r\nE,Germany,11\r\nE,Japan,24\r\nF,Belgium,2\r\nF,Canada,41\r\nF,Morocco,22\r\nF,Croatia,12\r\nG,Brazil,1\r\nG,Serbia,21\r\nG,Switzerland,15\r\nG,Cameroon,43\r\nH,Portugal,9\r\nH,Ghana,61\r\nH,Uruguay,14\r\nH,South Korea,28\r\n'
how do I convert the above to csv that pandas can manipulate with the columns being Group,Team,FIFA and then the corresponding data dynamically so this method works for any csv.
I tried:
data=response.content.decode('utf-8', 'ignore').split(',')
however, when I convert the data variable to a dataframe then export the csv the csv just returns all the values in one column.
I tried:
data=response.content.decode('utf-8') or data=response.content.decode('utf-8', 'ignore') without the split
however, pandas does not take this in as a valid df and returns invalid use of dataframe constructor
I tried:
data=json.loads(response.content)
however, the format itself is invalid json format as you will get the error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Given:
data = b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\n' #...
If you just want a CSV version of your data you can simply do:
with open("foo.csv", "wt", encoding="utf-8", newline="") as file_out:
file_out.writelines(data.decode())
If your objective is to load this data into a pandas dataframe and the CSV is not actually important, you can:
import io
import pandas
foo = pandas.read_csv(io.StringIO(data.decode()))
print(foo)

Read excel into pandas dataframe without modifying the values of excel?

I am reading an xlsx file using Python's Pandas pd.read_excel(myfile.xlsx,sheet_name="my_sheet",header=2) and writing the df to a csv file using df.to_csv.
The excel file contains several columns with percentage values in it (e.g. 27.44 %). In the dataframe the values are getting converted to 0.2744, i don't want any modification in data. How can i achieve this?
I already tried:
using lambda function to convert back value from 0.2744 to 27.44 % but this i don't want this because the column names/index are not fixed. It can be any col contain the % values
df = pd.read_excel(myexcel.xlsx,sheet_name="my_sheet",header=5,dtype={'column_name':str}) - Didn't work
df = pd.read_excel(myexcel.xlsx,sheet_name="my_sheet",header=5,dtype={'column_name':object}) - Didn't work
Tried xlrd module also, but that too converted % values to float.
df = pd.read_excel(myexcel.xlsx,sheet_name="my_sheet")
df.to_csv(mycsv.csv,sep=",",index=False)
from your xlsx save the file directly in csv format
To import your csv file use pandas library as follow:
import pandas as pd
df=pd.read_csv('my_sheet.csv') #in case your file located in the same directory
more information on pandas.read_csv

Pandas read_csv to adding some very small values to the dataframe

When i use pandas read_csv, pandas add some little value to the dataframe, it went from -0.079257 to -0.07925700000000001, why is this happening and how can I fix this? It also only happen to some specific values, while others seems fine.
I've tried using float_precision but seems doesn't do anything, I'm new to pandas
df = pd.read_csv('filepath')
print(df.iat[0,0])
Dataset Link
I changed the dataset file type from txt to csv manually using notepad.
Dataset Image
This is because your original data have a np.float32 precision.
import pandas as pd
df = pd.read_csv('./avila/avila-ts.txt')
print(df.iat[0,0]) # 0.13029200000000002
# stored as np.float32
df.to_csv('./my.csv',float_format=np.float32, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.13029200000000002
# stored as np.float16
df.to_csv('./my.csv',float_format=np.float16, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.1302
I don't know what your data is structured. could you open the data and check, better still screenshot.
data = pandas.read_csv('filepath')
data.head()

Python Pandas dataframe, how to integrate new columns into a new csv

guys, I need a bit help on Pandas and would appreciate greatly your inputs.
My original file looks like this:
I would like to convert it by mergering some pairs of columns (generating their averages) and returns a new file looking like this:
Also, if possible, I would also like to split the column 'RateDateTime' into two columns, one contains the date, the other contains only the time. How should I do it? I tried coding as belows but it doesn't work:
import pandas as pd
dateparse = lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
df = pd.read_csv('data.csv', parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse)
a=pd.to_numeric(df['RateAsk_open'])
b=pd.to_numeric(df['RateAsk_high'])
c=pd.to_numeric(df['RateAsk_low'])
d=pd.to_numeric(df['RateAsk_close'])
e=pd.to_numeric(df['RateBid_open'])
f=pd.to_numeric(df['RateBid_high'])
g=pd.to_numeric(df['RateBid_low'])
h=pd.to_numeric(df['RateBid_close'])
df['Open'] = (a+e) /2
df['High'] = (b+f) /2
df['Low'] = (c+g) /2
df['Close'] = (d+h) /2
grouped = df.groupby('CurrencyPair')
Open=grouped['Open']
High=grouped['High']
Low=grouped['Low']
Close=grouped['Close']
w=pd.concat([Open, High,Low,Close], axis=1, keys=['Open', 'High','Low','Close'])
w.to_csv('w.csv')
Python returns:
TypeError: cannot concatenate object of type "<class 'pandas.core.groupby.groupby.SeriesGroupBy'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Can someone help me please? Many thanks!!!
IIUYC, you don't need grouping here. You can simply update existing dataframe with new columns and specify, what columns you need to save to csv file in to_csv method. Here is example:
df['Open'] = df[['RateAsk_open', 'RateBid_open']].mean(axis=1)
df['RateDate'] = df['RateDateTime'].dt.date
df['RateTime'] = df['RateDateTime'].dt.time
df.to_csv('w.csv', columns=['CurrencyPair', 'Open', 'RateDate', 'RateTime'])

Create string from all row elements pandas

I have a csv file and I would like to create a string with all the elements of each row. Lets say that I have the following csv...
trump,clinton
google,microsoft,linkedin
linux,windows,osx
data science,operating systems
I would like to create a string like so; trump&clinton | google&microsoft&linkedin and so forth. I did import the file and create a df with pandas. The solution doesn't have to be with pandas, if can be done with import csv that is acceptable as well.
I need one string per row... each row will become its own string.
Try
df.apply('&'.join, axis=1)

Resources