Is there a way to export multiple pandas Dataframes in different sheet names using "to_csv" [duplicate] - python-3.x

I need to Export or save pandas Multiple Dataframe in an excel in different tabs?
Let's suppose my df's is:
df1:
Id Name Rank
1 Scott 4
2 Jennie 8
3 Murphy 1
df2:
Id Name Rank
1 John 14
2 Brown 18
3 Claire 11
df3:
Id Name Rank
1 Shenzen 84
2 Dass 58
3 Ghouse 31
df4:
Id Name Rank
1 Zen 104
2 Ben 458
3 Susuie 198
These are my four Dataframes and I need to Export as an Excel with 4 tabs i.e, df1,df2,df3,df4.

A simple method would be to hold your items in a collection and use the pd.ExcelWriter Class
Lets use a dictionary.
#1 Create a dictionary with your tab name and dataframe.
dfs = {'df1' : df1, 'df2' : df2...}
#2 create an excel writer object.
writer = pd.ExcelWriter('excel_file_name.xlsx')
#3 Loop over your dictionary write and save your excel file.
for name,dataframe in dfs.items():
dataframe.to_excel(writer,name,index=False)
writer.save()
adding a path.
from pathlib import Path
trg_path = Path('your_target_path')
writer = pd.ExcelWriter(trg_path.joinpath('excel_file.xlsx'))

Using xlsxwriter, you could do something like the following:
import xlsxwriter
import pandas as pd
### Create df's here ###
writer = pd.ExcelWriter('C:/yourFilePath/example.xslx', engine='xlsxwriter')
workbook = writer.book
### First df tab
worksheet1 = workbook.add_worksheet({}.format('df1') # The value in the parentheses is the tab name, so you can make that dynamic or hard code it
row = 0
col = 0
for Name, Rank in (df1):
worksheet.write(row, col, Name)
worksheet.write(row, col + 1, Rank)
row += 1
### Second df tab
worksheet2 = workbook.add_worksheet({}.format('df2')
row = 0
col = 0
for Name, Rank in (df2):
worksheet.write(row, col, Name)
worksheet.write(row, col + 1, Rank)
row += 1
### as so on for as many tabs as you want to create
workbook.close()
xlsxwriter allows you to do a lot of formatting as well. If you want to do that check out the docs

Related

Iteratively read excel sheet names, split and save them as new columns for each sheet in Python

Let's say we have many excel files with the multiple sheets as the following file data1.xlsx:
Sheet 1: 2021_q1_bj
a b c d
0 1 2 23 2
1 2 3 45 5
Sheet 2: 2021_q2_bj
a b c d
0 1 2 23 6
1 2 3 45 7
Sheet 3: 2019_q1_sh
a b c
0 1 2 23
1 2 3 45
Sheet 4: 2019_q2_sh
a b c
0 1 2 23
1 2 3 40
I need to obtain sheet name for each sheet, then split them by _, store the first part as year, the second part as quarter, and the last part as city.
Finaly I will save them back to excel file with multiple sheets.
ie., for the first sheet:
a b c d year quarter city
0 1 2 23 2 2021 q1 bj
1 2 3 45 5 2021 q1 bj
2 1 2 23 6 2021 q1 bj
3 2 3 45 7 2021 q1 bj
How could I achive this in Python? Thanks.
To loop all the excel files:
base_dir = './'
file_list = os.listdir(base_dir)
for file in file_list:
if '.xlsx' in file:
file_path = os.path.join(file_path, )
dfs = pd.read_excel()
You can use use f = pd.ExcelFile('data1.xlsx') to read the excel file in as an object, then loop through the list of sheet names by iterating through f.sheet_names, splitting each sheet name such as the "2019_q1_sh" string into the appropriate year, quarter, city and setting these as values of new columns in the DataFrame you are reading in from each sheet.
Then create a dictionary with sheet names as keys, and the corresponding modified DataFrame as the values. You can create a custom save_xls function that takes in such a dictionary and saves it, as described in this helpful answer.
Update: since you want to loop through all excel files in your current directory, you can use the glob library to get all of the files with extension .xlsx and loop through each of these files, read them in, and save a new file with the string new_ in front of the file name
import pandas as pd
from pandas import ExcelWriter
import glob
"""
Save a dictionary of dataframes to an excel file, with each dataframe as a separate page
Reference: https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet
"""
def save_xls(dict_df, path):
writer = ExcelWriter(path)
for key in dict_df:
dict_df[key].to_excel(writer, key)
writer.save()
## loop through all excel files
for filename in glob.glob("*.xlsx"):
f = pd.ExcelFile(filename)
dict_dfs = {}
for sheet_name in f.sheet_names:
df_new = f.parse(sheet_name = sheet_name)
## get the year and quarter from the sheet name
year, quarter, city = sheet_name.split("_")
df_new["year"] = year
df_new["quarter"] = quarter
df_new["city"] = city
## populate dictionary
dict_dfs[sheet_name] = df_new
save_xls(dict_df = dict_dfs, path = "new_" + filename)

How to write content of a list into an Excel sheet using openpyxl

I have the following list:
d_list = ["No., Start Name, Destination, Distance (miles)",
"1,ALBANY,NY CRAFT,28",
"2,GRACO,PIONEER,39",
"3,FONDA,ROME,41",
"4,NICCE,MARRINERS,132",
"5,TOUCAN,SUBVERSIVE,100",
"6,POLL,CONVERGENCE,28",
"7,STONE HOUSE,HUDSON VALLEY,9",
"8,GLOUCESTER GRAIN,BLACK MUDD POND,75",
"9,ARMY LEAGUE,MUMURA,190",
"10,MURRAY,FARMINGDALE,123"]
So, basically, the list consists of thousands of elements (just showed here a sample of 10), each is a string of comma separated elements. I'd like to write this into a new worksheet in a workbook.
Note: the workbook already exists and contains other sheets, I'm just adding a new sheet with this data.
My code:
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.create_sheet(title='distance')
for i in range(len(d_list)):
sheet.append(list(d_list[i]))
I'm expecting (in this example) 11 rows of data, each with 4 columns. However, I'm getting 11 rows alright but with each character of each string written in each cell! I think am almost there ... what am I missing? (Note: I've read through all the available posts related to this topic, but couldn't find any that answers this specific type of of question, hence I'm asking).
Many thanks!
You can use pandas to solve this:
1.) Convert your list into a dataframe:
In [231]: l
Out[231]:
['No., Start Name, Destination, Distance (miles)',
'1,ALBANY,NY CRAFT,28',
'2,GRACO,PIONEER,39',
'3,FONDA,ROME,41',
'4,NICCE,MARRINERS,132',
'5,TOUCAN,SUBVERSIVE,100',
'6,POLL,CONVERGENCE,28',
'7,STONE HOUSE,HUDSON VALLEY,9',
'8,GLOUCESTER GRAIN,BLACK MUDD POND,75',
'9,ARMY LEAGUE,MUMURA,190',
'10,MURRAY,FARMINGDALE,123']
In [228]: df = pd.DataFrame([i.split(",") for i in l])
In [229]: df
Out[229]:
0 1 2 3
0 No. Start Name Destination Distance (miles)
1 1 ALBANY NY CRAFT 28
2 2 GRACO PIONEER 39
3 3 FONDA ROME 41
4 4 NICCE MARRINERS 132
5 5 TOUCAN SUBVERSIVE 100
6 6 POLL CONVERGENCE 28
7 7 STONE HOUSE HUDSON VALLEY 9
8 8 GLOUCESTER GRAIN BLACK MUDD POND 75
9 9 ARMY LEAGUE MUMURA 190
10 10 MURRAY FARMINGDALE 123
2.) Write the above Dataframe to excel in a new-sheet in 4 columns:
import numpy as np
from openpyxl import load_workbook
path = "data.xlsx"
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
df.to_excel(writer, sheet_name = 'distance')
writer.save()
writer.close()

Pandas - How to skip the first row of a csv file to be made the header with combining multiple csv files [duplicate]

I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, however it keeps getting converted to column names.
I tried header=False but this just deleted it entirely.
(Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the csv from that file object.)
You want header=None the False gets type promoted to int into 0 see the docs emphasis mine:
header : int or list of ints, default ‘infer’ Row number(s) to use as
the column names, and the start of the data. Default behavior is as if
set to 0 if no names passed, otherwise None. Explicitly pass header=0
to be able to replace existing names. The header can be a list of
integers that specify row locations for a multi-index on the columns
e.g. [0,1,3]. Intervening rows that are not specified will be skipped
(e.g. 2 in this example is skipped). Note that this parameter ignores
commented lines and empty lines if skip_blank_lines=True, so header=0
denotes the first line of data rather than the first line of the file.
You can see the difference in behaviour, first with header=0:
In [95]:
import io
import pandas as pd
t="""a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)
Out[95]:
a b c
0 0 1 2
1 3 4 5
Now with None:
In [96]:
pd.read_csv(io.StringIO(t), header=None)
Out[96]:
0 1 2
0 a b c
1 0 1 2
2 3 4 5
Note that in latest version 0.19.1, this will now raise a TypeError:
In [98]:
pd.read_csv(io.StringIO(t), header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no
header or header=int or list-like of ints to specify the row(s) making
up the column names
I think you need parameter header=None to read_csv:
Sample:
import pandas as pd
from pandas.compat import StringIO
temp=u"""a,b
2,1
1,1"""
df = pd.read_csv(StringIO(temp),header=None)
print (df)
0 1
0 a b
1 2 1
2 1 1
If you're using pd.ExcelFile to read all the excel file sheets then:
df = pd.ExcelFile("path_to_file.xlsx")
df.sheet_names # Provide the sheet names in the excel file
df = df.parse(2, header=None) # Parsing the 2nd sheet in the file with header = None
df
Output:
0 1
0 a b
1 1 1
2 0 1
3 5 2
You can set custom column name in order to prevent this:
Let say if you have two columns in your dataset then:
df = pd.read_csv(your_file_path, names = ['first column', 'second column'])
You can also generate programmatically column names if you have more than and can pass a list in front of names attribute.

Appending values to a column in a loop

I have various files containing data. I want to extract one specific column from each file and create a new dataframe with one column containing all the extracted data.
So for example I have 3 files:
A B C
1 2 3
4 5 6
A B C
7 8 9
8 7 6
A B C
5 4 3
2 1 0
The new dataframe should only contain the values from column C:
C
3
6
9
6
3
0
So the column of the first file should be copied to the new dataframe, the column from the second file should be appendend to the new dataframe.
My code looks like this so far:
import pandas as pd
import glob
for filename in glob.glob('*.dat'):
df= pd.read_csv(filename, delimiter="\t", header=6)
df1= df["Bias"]
print(df)
Now df1 is overwritten in each loop step. Would it be a good idea to create a temporary dataframe in each loop step and then copy the data to the new dataframe?
Any input is appreciated!
Use list comprehension or for loop with append for list of DataFrames and if need only some columns add parameter usecols, last concat all together for big DataFrame:
dfs = [pd.read_csv(f, delimiter="\t", header=6, usecols=['C']) for f in glob.glob('*.dat')]
Or:
dfs = []
for filename in glob.glob('*.dat'):
df = pd.read_csv(filename, delimiter="\t", header=6, usecols=['C'])
#if need all columns
#df = pd.read_csv(filename, delimiter="\t", header=6)
dfs.append(df)
df = pd.concat(dfs, ignore_index=True)

How can I input values from a list or dataframe into each cell in existing excel file?

So basically, I want to update a worksheet with new data, overwriting existing cells in excel. Both files have the same column names (I do not want to create a new workbook nor add a new column).
Here I am retreiving the data that I want:
import pandas as pd
df1 = pd.read_csv
print(df1)
Ouput (I just copy and pasted the first 5 rows, there are about 500 rows total):
Index Type Stage CDID Period Index Value
0 812008000 6 2 JTV9 201706 121.570
1 812008000 6 2 JTV9 201707 121.913
2 812008000 6 2 JTV9 201708 121.686
3 812008000 6 2 JTV9 201709 119.809
4 812008000 6 2 JTV9 201710 119.841
5 812128000 6 1 K2VA 201706 122.030
The existing excel file has the same columns (and row total) as df1, but I just want to have the 'Index' column repopulated with the new values. Let's just say it looks like this (i.e. so I want the previous values for Index to go into the corresponding column):
Index Type Stage CDID Period Index Value
0 512901100 6 2 JTV9 201706 121.570
1 412602034 6 2 JTV9 201707 121.913
2 612307802 6 2 JTV9 201708 121.686
3 112808360 6 2 JTV9 201709 119.809
4 912233066 6 2 JTV9 201710 119.841
5 312128003 6 1 K2VA 201706 122.030
Here I am retrieving the excel file, and attempting to overwrite it:
from win32com.client import Dispatch
import os
xl = Dispatch("Excel.Application")
xl.Visible = True
wbs_path = ('folder path')
for wbname in os.listdir(wbs_path):
if not wbname.endswith("file name.xlsx"):
continue
wb = xl.Workbooks.Open(wbs_path + '\\' + wbname)
sh = wb.Worksheets("sheet name")
sh.Range("A1:A456").Value = df1[["Index"]]
wb.Save()
wb.Close()
xl.Quit()
But this doesn't do anything.
If I type in strings, such as:
h.Range("A1:A456").Value = 'o', 'x', 'c'
This repeats o in cells through A1 through to A456 (it updates the spreadsheet), but ignores x and c. I have tried converting df1 into a list and numpy array, but this doesn't work.
Does anyone know a solution or alternative workaround?
If the index of the dataframe is the same you can update columns by using update(). It could work like this:
df1.update(df2['Index'].to_frame())
Note: the to frame() is probably not needed
EDIT:
Since you try to update a excel-file and not a dataframe, my answer is probably not enough.
For this part I would suggest to load the file into a dataframe, update the data and save it.
df1 = pd.read_excel('file.xlsx', sheet_name='sheet_name')
# do the update
writer = pd.ExcelWriter('file.xlsx')
df1.to_excel(writer,sheet_name='sheet_name', engine='xlsxwriter')
writer.save()

Resources