How to write content of a list into an Excel sheet using openpyxl - python-3.x

I have the following list:
d_list = ["No., Start Name, Destination, Distance (miles)",
"1,ALBANY,NY CRAFT,28",
"2,GRACO,PIONEER,39",
"3,FONDA,ROME,41",
"4,NICCE,MARRINERS,132",
"5,TOUCAN,SUBVERSIVE,100",
"6,POLL,CONVERGENCE,28",
"7,STONE HOUSE,HUDSON VALLEY,9",
"8,GLOUCESTER GRAIN,BLACK MUDD POND,75",
"9,ARMY LEAGUE,MUMURA,190",
"10,MURRAY,FARMINGDALE,123"]
The real list has thousands of elements (only the header and a sample of 10 rows are shown here); each element is a string of comma-separated values. I'd like to write this into a new worksheet in a workbook.
Note: the workbook already exists and contains other sheets, I'm just adding a new sheet with this data.
My code:
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.create_sheet(title='distance')
for i in range(len(d_list)):
    sheet.append(list(d_list[i]))
I'm expecting (in this example) 11 rows of data, each with 4 columns. I do get 11 rows, but every character of each string is written to its own cell! I think I'm almost there ... what am I missing? (Note: I've read through the available posts related to this topic but couldn't find any that answers this specific type of question, hence I'm asking.)
Many thanks!

You can use pandas to solve this:
1.) Convert your list into a dataframe:
In [231]: l
Out[231]:
['No., Start Name, Destination, Distance (miles)',
'1,ALBANY,NY CRAFT,28',
'2,GRACO,PIONEER,39',
'3,FONDA,ROME,41',
'4,NICCE,MARRINERS,132',
'5,TOUCAN,SUBVERSIVE,100',
'6,POLL,CONVERGENCE,28',
'7,STONE HOUSE,HUDSON VALLEY,9',
'8,GLOUCESTER GRAIN,BLACK MUDD POND,75',
'9,ARMY LEAGUE,MUMURA,190',
'10,MURRAY,FARMINGDALE,123']
In [228]: df = pd.DataFrame([i.split(",") for i in l])
In [229]: df
Out[229]:
0 1 2 3
0 No. Start Name Destination Distance (miles)
1 1 ALBANY NY CRAFT 28
2 2 GRACO PIONEER 39
3 3 FONDA ROME 41
4 4 NICCE MARRINERS 132
5 5 TOUCAN SUBVERSIVE 100
6 6 POLL CONVERGENCE 28
7 7 STONE HOUSE HUDSON VALLEY 9
8 8 GLOUCESTER GRAIN BLACK MUDD POND 75
9 9 ARMY LEAGUE MUMURA 190
10 10 MURRAY FARMINGDALE 123
2.) Write the above Dataframe to excel in a new-sheet in 4 columns:
import pandas as pd
path = "data.xlsx"
# mode="a" appends the new sheet to the existing workbook instead of overwriting the file
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    # the first row of df already holds the column titles, so skip pandas' own header and index
    df.to_excel(writer, sheet_name="distance", header=False, index=False)
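For reference, the original openpyxl loop can also be repaired without pandas. `list(some_string)` breaks a string into single characters, which is why every character landed in its own cell; `str.split(",")` yields one item per field instead. A minimal sketch, assuming a fresh in-memory workbook in place of `data.xlsx` and a shortened `d_list`:

```python
import openpyxl

d_list = [
    "No., Start Name, Destination, Distance (miles)",
    "1,ALBANY,NY CRAFT,28",
    "2,GRACO,PIONEER,39",
]

# Assumption: a new in-memory workbook stands in for load_workbook('data.xlsx').
wb = openpyxl.Workbook()
sheet = wb.create_sheet(title="distance")

for line in d_list:
    # split(",") gives one item per field; strip() drops the space after each comma
    sheet.append([field.strip() for field in line.split(",")])

print(sheet.max_row, sheet.max_column)
```

With the real file, swap the `Workbook()` line for `openpyxl.load_workbook('data.xlsx')` and finish with `wb.save('data.xlsx')`. Every value is still written as text; convert numeric fields with `int()` first if Excel should treat them as numbers.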

Related

Is there a way to export multiple pandas Dataframes in different sheet names using "to_csv" [duplicate]

I need to export (save) multiple pandas DataFrames to one Excel file, each in a different tab.
Let's suppose my dataframes are:
df1:
Id Name Rank
1 Scott 4
2 Jennie 8
3 Murphy 1
df2:
Id Name Rank
1 John 14
2 Brown 18
3 Claire 11
df3:
Id Name Rank
1 Shenzen 84
2 Dass 58
3 Ghouse 31
df4:
Id Name Rank
1 Zen 104
2 Ben 458
3 Susuie 198
These are my four DataFrames, and I need to export them as one Excel file with 4 tabs, i.e. df1, df2, df3, df4.
A simple method is to hold your items in a collection and use the pd.ExcelWriter class.
Let's use a dictionary.
# 1. Create a dictionary mapping each tab name to its dataframe.
dfs = {'df1': df1, 'df2': df2}  # ... and so on for the remaining dataframes
# 2. Create an Excel writer object; the with-block saves the file on exit.
with pd.ExcelWriter('excel_file_name.xlsx') as writer:
    # 3. Loop over the dictionary and write each dataframe to its own sheet.
    for name, dataframe in dfs.items():
        dataframe.to_excel(writer, sheet_name=name, index=False)
To write to a specific folder, build the path with pathlib:
from pathlib import Path
trg_path = Path('your_target_path')
writer = pd.ExcelWriter(trg_path.joinpath('excel_file.xlsx'))
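To check that each dataframe really ended up on its own tab, the file can be read back with `sheet_name=None`, which returns a dictionary keyed by sheet name. A small sketch, assuming two of the sample dataframes, a temporary directory, a hypothetical file name `ranks.xlsx`, and that the openpyxl engine is installed:

```python
import tempfile
from pathlib import Path

import pandas as pd

df1 = pd.DataFrame({"Id": [1, 2, 3], "Name": ["Scott", "Jennie", "Murphy"], "Rank": [4, 8, 1]})
df2 = pd.DataFrame({"Id": [1, 2, 3], "Name": ["John", "Brown", "Claire"], "Rank": [14, 18, 11]})
dfs = {"df1": df1, "df2": df2}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "ranks.xlsx"  # hypothetical file name
    # The with-block saves and closes the workbook when it exits.
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        for name, frame in dfs.items():
            frame.to_excel(writer, sheet_name=name, index=False)
    # sheet_name=None reads every sheet into a {sheet name: DataFrame} dict.
    sheets = pd.read_excel(target, sheet_name=None)

print(list(sheets))
```

The dictionary keys come back in workbook order, so they can be compared directly against the tab names that were written.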
Using xlsxwriter, you could do something like the following:
import xlsxwriter
import pandas as pd
### Create df's here ###
writer = pd.ExcelWriter('C:/yourFilePath/example.xlsx', engine='xlsxwriter')
workbook = writer.book
### First df tab
worksheet1 = workbook.add_worksheet('df1')  # The argument is the tab name, so you can make it dynamic or hard-code it
row = 0
col = 0
# iterate over the values of the two columns, one row at a time
for name, rank in zip(df1['Name'], df1['Rank']):
    worksheet1.write(row, col, name)
    worksheet1.write(row, col + 1, rank)
    row += 1
### Second df tab
worksheet2 = workbook.add_worksheet('df2')
row = 0
col = 0
for name, rank in zip(df2['Name'], df2['Rank']):
    worksheet2.write(row, col, name)
    worksheet2.write(row, col + 1, rank)
    row += 1
### and so on for as many tabs as you want to create
writer.close()  # saves the workbook
xlsxwriter allows you to do a lot of formatting as well. If you want to do that, check out the docs.

how to extract different tables in excel sheet using python

In one Excel file, on sheet 1, there are 4 tables at different locations in the sheet. How can I read those 4 tables? For example, I have added a screenshot from Google for reference. Is there any way to extract the tables without using indexes?
I assume your tables are formatted as "Excel Tables".
You can create an Excel Table by selecting a range and then clicking Insert > Table (or pressing Ctrl+T).
Samuel Oranyeli has written a good guide on importing Excel Tables with Python. I have used his code and show it here with examples.
I have used the following data in excel, where each color represents a table.
Remarks about code:
The following part can be used to check which tables exist in the worksheet that we are working with:
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
In our example this code will give:
{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}
Here you set the dataframe names. Be careful: if the number of names does not match the number of tables, you will get an error.
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df
print(Table2.head(3)) gives:
Index first_name last_name address
0 Aleshia Tomkiewicz 14 Taylor St
1 Evan Zigomalas 5 Binney St
2 France Andrade 8 Moor Place
Full code:
#import libraries
from openpyxl import load_workbook
import pandas as pd
# read file
wb = load_workbook("G:/Till/Tables.xlsx") # Set the filepath + filename
# select the sheet where tables are located
ws = wb["Tables"]
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
mapping = {}
# loop through all the tables and add to a dictionary
for entry, data_boundary in ws.tables.items():
    # parse the data within the ref boundary
    data = ws[data_boundary]
    ### extract the data ###
    # the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent]
               for ent in data]
    header = content[0]
    # the contents ... excluding the header
    rest = content[1:]
    # create dataframe with the column names
    # and pair table name with dataframe
    df = pd.DataFrame(rest, columns=header)
    mapping[entry] = df
# print(mapping)
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)
Example data, example file:
first_name | last_name | address | city | county | postal
Aleshia | Tomkiewicz | 14 Taylor St | St. Stephens Ward | Kent | CT2 7PP
Evan | Zigomalas | 5 Binney St | Abbey Ward | Buckinghamshire | HP11 2AX
France | Andrade | 8 Moor Place | East Southbourne and Tuckton W | Bournemouth | BH6 3BE
Ulysses | Mcwalters | 505 Exeter Rd | Hawerby cum Beesby | Lincolnshire | DN36 5RP
Tyisha | Veness | 5396 Forth Street | Greets Green and Lyng Ward | West Midlands | B70 9DT
Eric | Rampy | 9472 Lind St | Desborough | Northamptonshire | NN14 2GH
Marg | Grasmick | 7457 Cowl St #70 | Bargate Ward | Southampton | SO14 3TY
Laquita | Hisaw | 20 Gloucester Pl #96 | Chirton Ward | Tyne & Wear | NE29 7AD
Lura | Manzella | 929 Augustine St | Staple Hill Ward | South Gloucestershire | BS16 4LL
Yuette | Klapec | 45 Bradfield St #166 | Parwich | Derbyshire | DE6 1QN
Fernanda | Writer | 620 Northampton St | Wilmington | Kent | DA2 7PP
Charlesetta | Erm | 5 Hygeia St | Loundsley Green Ward | Derbyshire | S40 4LY
Corrinne | Jaret | 2150 Morley St | Dee Ward | Dumfries and Galloway | DG8 7DE
Niesha | Bruch | 24 Bolton St | Broxburn, Uphall and Winchburg | West Lothian | EH52 5TL
Rueben | Gastellum | 4 Forrest St | Weston-Super-Mare | North Somerset | BS23 3HG
Michell | Throssell | 89 Noon St | Carbrooke | Norfolk | IP25 6JQ
Edgar | Kanne | 99 Guthrie St | New Milton | Hampshire | BH25 5DF
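The table-extraction idea can be tried without an existing file. The sketch below builds a tiny in-memory sheet containing two Excel Tables and then pulls each one into a dataframe. The table names `People` and `Places` and the sample ranges are assumptions made for illustration. It iterates over `ws.tables.values()` and reads `displayName` and `ref` from each table object, and it looks the results up by name, which avoids the positional unpacking error mentioned in the remarks.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.worksheet.table import Table

# Assumption: a small in-memory sheet with two side-by-side Excel Tables.
wb = Workbook()
ws = wb.active
rows = [
    ["first_name", "last_name", "city", "county"],
    ["Aleshia", "Tomkiewicz", "St. Stephens Ward", "Kent"],
    ["Evan", "Zigomalas", "Abbey Ward", "Buckinghamshire"],
]
for row in rows:
    ws.append(row)
ws.add_table(Table(displayName="People", ref="A1:B3"))
ws.add_table(Table(displayName="Places", ref="C1:D3"))

# Build {table name: DataFrame}; the first row of each range is the header.
mapping = {}
for table in ws.tables.values():
    content = [[cell.value for cell in row] for row in ws[table.ref]]
    mapping[table.displayName] = pd.DataFrame(content[1:], columns=content[0])

# Look tables up by name instead of unpacking mapping.values() positionally.
people = mapping["People"]
places = mapping["Places"]
print(people)
```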
You can convert your Excel sheet to a CSV file and then read the rows back from it.
import pandas as pd
read_file = pd.read_excel("Test.xlsx")
read_file.to_csv("Test.csv", index=None, header=True)
df = pd.read_csv("Test.csv")
print(df)
For a better approach, please provide a sample Excel file.
You need two things:
Access OpenXML data via python: https://github.com/python-openxml/python-xlsx
Find the tables in the file, via what is called a DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1

Iteratively read excel sheet names, split and save them as new columns for each sheet in Python

Let's say we have many excel files with the multiple sheets as the following file data1.xlsx:
Sheet 1: 2021_q1_bj
a b c d
0 1 2 23 2
1 2 3 45 5
Sheet 2: 2021_q2_bj
a b c d
0 1 2 23 6
1 2 3 45 7
Sheet 3: 2019_q1_sh
a b c
0 1 2 23
1 2 3 45
Sheet 4: 2019_q2_sh
a b c
0 1 2 23
1 2 3 40
I need to obtain the sheet name of each sheet, split it by _, and store the first part as year, the second part as quarter, and the last part as city.
Finally, I will save them back to an Excel file with multiple sheets.
ie., for the first sheet:
a b c d year quarter city
0 1 2 23 2 2021 q1 bj
1 2 3 45 5 2021 q1 bj
2 1 2 23 6 2021 q1 bj
3 2 3 45 7 2021 q1 bj
How could I achieve this in Python? Thanks.
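To loop over all the Excel files:
import os
import pandas as pd
base_dir = './'
file_list = os.listdir(base_dir)
for file in file_list:
    if file.endswith('.xlsx'):
        file_path = os.path.join(base_dir, file)
        # sheet_name=None reads every sheet into a dict of DataFrames
        dfs = pd.read_excel(file_path, sheet_name=None)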
You can use f = pd.ExcelFile('data1.xlsx') to read the Excel file in as an object, then loop through the sheet names by iterating over f.sheet_names. Split each sheet name, such as the "2019_q1_sh" string, into the appropriate year, quarter, and city, and set these as the values of new columns in the DataFrame you read in from each sheet.
Then create a dictionary with sheet names as keys and the corresponding modified DataFrames as values. You can create a custom save_xls function that takes in such a dictionary and saves it, as described in this helpful answer.
Update: since you want to loop through all Excel files in your current directory, you can use the glob library to get all of the files with the extension .xlsx, loop through each of them, read it in, and save a new file with the string new_ in front of the file name.
import pandas as pd
import glob
def save_xls(dict_df, path):
    """
    Save a dictionary of dataframes to an excel file, with each dataframe as a separate page
    Reference: https://stackoverflow.com/questions/14225676/save-list-of-dataframes-to-multisheet-excel-spreadsheet
    """
    # the with-block saves and closes the file on exit
    with pd.ExcelWriter(path) as writer:
        for key in dict_df:
            dict_df[key].to_excel(writer, sheet_name=key)
## loop through all excel files
for filename in glob.glob("*.xlsx"):
    f = pd.ExcelFile(filename)
    dict_dfs = {}
    for sheet_name in f.sheet_names:
        df_new = f.parse(sheet_name=sheet_name)
        ## get the year, quarter and city from the sheet name
        year, quarter, city = sheet_name.split("_")
        df_new["year"] = year
        df_new["quarter"] = quarter
        df_new["city"] = city
        ## populate dictionary
        dict_dfs[sheet_name] = df_new
    save_xls(dict_df=dict_dfs, path="new_" + filename)
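The split-and-tag step can be tried in isolation on an in-memory dataframe before running it over real files. A minimal sketch, assuming a hypothetical helper name `tag_with_sheet_name` and sheet names that always have exactly three underscore-separated parts:

```python
import pandas as pd


def tag_with_sheet_name(df, sheet_name):
    """Return a copy of df with year, quarter and city columns taken from sheet_name."""
    # Raises ValueError if the name does not have exactly three parts.
    year, quarter, city = sheet_name.split("_")
    return df.assign(year=int(year), quarter=quarter, city=city)


sheet = pd.DataFrame({"a": [1, 2], "b": [2, 3], "c": [23, 45], "d": [2, 5]})
tagged = tag_with_sheet_name(sheet, "2021_q1_bj")
print(tagged)
```

Unlike the loop above, this version converts the year to an integer and leaves the input dataframe untouched; both are design choices, not requirements.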

how to apply functions on multiple excel sheets in a loop in Python?

I have an Excel file with data like this on 57 sheets:
Cate asso_num
1 "a" 33
2 "a" 67
3 "b" 97
4 "b" 60
I want to group by Cate and get the mean of each category:
def grouping(excel_file_location):
    # should read all the excel sheets, i.e. 57 sheets currently, in a loop (I don't know how to do it)
    fil = pd.read_excel(...)
    fil = fil.groupby("Cate").agg({"asso_num":"mean"})
    # and should write in that same excel sheet
I want to do this using only a function.
You can do the following:
def grouping(excel_file_location):
    # sheet_name=None returns a dict of {sheet name: DataFrame}
    sheets_to_df = pd.read_excel(excel_file_location, sheet_name=None)
    df = pd.concat(sheets_to_df, ignore_index=True)
    df = df.groupby("Cate").agg({"asso_num": "mean"})
    return df
In my example, I created an Excel file with the data you provided, made three sheets that are exact copies of it, and set:
path = r"C:\....\SDEGOSSONDEVARENNE\Sheets.xlsx"
Calling grouping(path)
returned:
asso_num
Cate
a 50.0
b 78.5
You can also reset the index
grouping(path).reset_index()
which gives
Cate asso_num
0 a 50.0
1 b 78.5
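The function above pools all sheets into one result. If each sheet should instead keep its own group means, the dictionary returned by `sheet_name=None` can be processed one entry at a time. A sketch, assuming a hand-built dictionary in place of `pd.read_excel(path, sheet_name=None)` and a hypothetical helper name `group_each_sheet`:

```python
import pandas as pd


def group_each_sheet(sheets):
    """Map each sheet name to the mean asso_num per Cate for that sheet."""
    return {
        name: df.groupby("Cate", as_index=False)["asso_num"].mean()
        for name, df in sheets.items()
    }


data = pd.DataFrame({"Cate": ["a", "a", "b", "b"], "asso_num": [33, 67, 97, 60]})
# Assumption: two made-up sheets; the second simply doubles the values.
sheets = {"Sheet1": data, "Sheet2": data.assign(asso_num=data["asso_num"] * 2)}
result = group_each_sheet(sheets)
print(result["Sheet1"])
```

Each entry of `result` can then be written to its own sheet by calling `to_excel(writer, sheet_name=name)` inside a single `pd.ExcelWriter` block.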

How can I input values from a list or dataframe into each cell in existing excel file?

So basically, I want to update a worksheet with new data, overwriting existing cells in Excel. Both files have the same column names (I do not want to create a new workbook or add a new column).
Here I am retrieving the data that I want:
import pandas as pd
df1 = pd.read_csv(...)
print(df1)
Output (I just copied and pasted the first few rows; there are about 500 rows in total):
Index Type Stage CDID Period Index Value
0 812008000 6 2 JTV9 201706 121.570
1 812008000 6 2 JTV9 201707 121.913
2 812008000 6 2 JTV9 201708 121.686
3 812008000 6 2 JTV9 201709 119.809
4 812008000 6 2 JTV9 201710 119.841
5 812128000 6 1 K2VA 201706 122.030
The existing excel file has the same columns (and row total) as df1, but I just want to have the 'Index' column repopulated with the new values. Let's just say it looks like this (i.e. so I want the previous values for Index to go into the corresponding column):
Index Type Stage CDID Period Index Value
0 512901100 6 2 JTV9 201706 121.570
1 412602034 6 2 JTV9 201707 121.913
2 612307802 6 2 JTV9 201708 121.686
3 112808360 6 2 JTV9 201709 119.809
4 912233066 6 2 JTV9 201710 119.841
5 312128003 6 1 K2VA 201706 122.030
Here I am retrieving the excel file, and attempting to overwrite it:
from win32com.client import Dispatch
import os
xl = Dispatch("Excel.Application")
xl.Visible = True
wbs_path = ('folder path')
for wbname in os.listdir(wbs_path):
    if not wbname.endswith("file name.xlsx"):
        continue
    wb = xl.Workbooks.Open(wbs_path + '\\' + wbname)
    sh = wb.Worksheets("sheet name")
    sh.Range("A1:A456").Value = df1[["Index"]]
    wb.Save()
    wb.Close()
xl.Quit()
But this doesn't do anything.
If I type in strings, such as:
sh.Range("A1:A456").Value = 'o', 'x', 'c'
This repeats o in cells A1 through A456 (it updates the spreadsheet) but ignores x and c. I have tried converting df1 into a list and a numpy array, but this doesn't work either.
Does anyone know a solution or alternative workaround?
If the index of both dataframes is the same, you can update columns by using update(). It could work like this:
df1.update(df2['Index'].to_frame())
Note: the to_frame() call is probably not needed.
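A small illustration of how `update()` behaves, assuming two made-up dataframes that share the same row index; here `df1` plays the frame being overwritten and `df2` the source of the new `Index` values, and a named Series is passed directly:

```python
import pandas as pd

# Assumption: both frames have the same row index (0, 1) and column names.
df1 = pd.DataFrame({"Index": [512901100, 412602034], "CDID": ["JTV9", "JTV9"]})
df2 = pd.DataFrame({"Index": [812008000, 812008000], "CDID": ["JTV9", "JTV9"]})

# update() modifies df1 in place, aligning on the row index and the Series name.
df1.update(df2["Index"])
print(df1)
```

Only the `Index` column changes; columns that are not part of the object passed to `update()` are left alone.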
EDIT:
Since you are trying to update an Excel file and not a dataframe, my answer is probably not enough.
For this part, I would suggest loading the file into a dataframe, updating the data, and saving it.
df1 = pd.read_excel('file.xlsx', sheet_name='sheet_name')
# do the update
# the engine is chosen on the writer; the with-block saves the file on exit
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='sheet_name')
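Rewriting the sheet through pandas replaces the whole workbook file, so other sheets and cell formatting are not carried over. If only one column should change, openpyxl can overwrite those cells directly. A sketch, assuming an in-memory workbook in place of `file.xlsx`, the header in row 1, and the `Index` values in column A:

```python
from openpyxl import Workbook

# Assumption: an in-memory workbook stands in for load_workbook('file.xlsx').
wb = Workbook()
ws = wb.active
ws.append(["Index", "Type", "Stage"])
ws.append([512901100, 6, 2])
ws.append([412602034, 6, 2])

new_index = [812008000, 812008000]  # e.g. df1["Index"].tolist()

# Data starts in row 2 because row 1 holds the header.
for row_number, value in enumerate(new_index, start=2):
    ws.cell(row=row_number, column=1, value=value)

print([cell.value for cell in ws["A"]])
```

With a real file, load it with `openpyxl.load_workbook`, pick the sheet by name, run the same loop, and call `wb.save(...)` afterwards.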

Resources