Iterating over a pandas DataFrame and saving into separate sheets in an xlsx file - excel

I'm iterating over a pandas DataFrame and, for each row, looking up the Excel sheet number so the row can be saved to the appropriate sheet, like this:
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
for i, data in df.iterrows():
    sheet = data['SheetNo']
    # Create excel writer
    writer = pd.ExcelWriter('output.xlsx')
    # write dataframe to excel sheet
    data.to_excel(writer, sheet)
    # save the excel file
    writer.save()
Dataframe:
ID SheetNo setting
2304 2 IGV5
2305 3 IGV2
2306 1 IGV6
2307 2 IGV2
2308 1 IGV1
What I wanted was for each row to go into the sheet named by its 'SheetNo' in the Excel file. Instead, the previous sheet is overwritten by the following one, and only the last sheet number is visible.
What can I do to make this code work? Any other approach apart from mine above will be welcome.

The Python code is below. Note that:
1. It assumes there is already an output.xlsx file in the same directory the code runs in.
2. It searches for worksheets named after the values in the SheetNo column. If a sheet is not in the file, it creates a new worksheet/tab with that name and adds the header row.
3. The program then appends each row to its sheet.
4. Once all the data in the DataFrame has been added, it saves the file back under the same name.
You can run this as many times as you want; it will keep adding new sheets or appending to existing sheets.
import pandas as pd
from openpyxl import load_workbook
data = {'ID': [2304, 2305, 2306, 2307, 2308],
        'SheetNo': [2, 3, 1, 2, 1],
        'setting': ['IGV5', 'IGV2', 'IGV6', 'IGV2', 'IGV1']}
df = pd.DataFrame(data)
headers = ['ID', 'SheetNo', 'setting']
FilePath = 'output.xlsx'  # ASSUMPTION - File already exists
wb = load_workbook(FilePath)
for sheet in df.SheetNo.unique():
    if str(sheet) in wb.sheetnames:  # If sheet in excel file
        ws = wb[str(sheet)]
    else:
        ws = wb.create_sheet(title=str(sheet))  # Create new sheet if not present
        ws.append(headers)  # New sheet = need header, else not required
    i = 0
    j = 0
    for row in ws.iter_rows(min_row=ws.max_row+1, max_row=ws.max_row+len(df[df.SheetNo == sheet]), min_col=1, max_col=3):
        for cell in row:
            cell.value = df[df.SheetNo == sheet].iloc[i, j]
            j = j + 1
        j = 0
        i = i + 1
wb.save('output.xlsx')
Output excel after one run of the code
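As an alternative to the openpyxl loop above, here is a minimal pandas-only sketch. The writer is created once, outside the loop, so each group of rows lands in its own sheet instead of the file being recreated on every iteration. The function name write_groups_to_sheets and the temporary output path are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import pandas as pd


def write_groups_to_sheets(df, path, key="SheetNo"):
    """Write each group of rows sharing a `key` value to its own sheet."""
    # Open the writer ONCE; reopening it inside the loop recreates the file
    # and discards the sheets written so far.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_no, group in df.groupby(key):
            group.to_excel(writer, sheet_name=str(sheet_no), index=False)


df = pd.DataFrame({
    "ID": [2304, 2305, 2306, 2307, 2308],
    "SheetNo": [2, 3, 1, 2, 1],
    "setting": ["IGV5", "IGV2", "IGV6", "IGV2", "IGV1"],
})

# Temporary location chosen only for this demonstration
out_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
write_groups_to_sheets(df, out_path)

# Read every sheet back as a dict of {sheet name: DataFrame}
sheets = pd.read_excel(out_path, sheet_name=None)
```

Unlike the openpyxl version, this sketch writes a fresh file each time it runs rather than appending to an existing one.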

Related

Read last sheet of spreadsheet using pandas dataframe

I am trying to use pandas to read the last sheet of a spreadsheet, since I don't need the rest. How do I tell Python to take just the last one? I cannot find a flag in the documentation for this. I can specify a sheet with the sheet_name argument, but that does not work for me since I don't know how many sheets I have.
raw_excel = pd.read_excel(path, sheet_name=0)
You can use the ExcelFile class.
xl = pd.ExcelFile(path)
# See all sheet names
sheet_names = xl.sheet_names
# Last sheet name
last_sheet = sheet_names[-1]
# Read the last sheet into a DataFrame
xl.parse(last_sheet)
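The steps above can be wrapped in a small helper. This is only a sketch: the name read_last_sheet and the three-sheet demo workbook are made up for illustration, and the workbook is written to a temporary directory so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def read_last_sheet(path):
    """Return the last worksheet of an Excel file as a DataFrame."""
    # The context manager closes the file handle when done.
    with pd.ExcelFile(path) as xl:
        return xl.parse(xl.sheet_names[-1])


# Build a small three-sheet workbook to try it on (illustrative data only).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(demo_path, engine="openpyxl") as writer:
    for name in ["first", "second", "last"]:
        pd.DataFrame({"sheet": [name]}).to_excel(writer, sheet_name=name, index=False)

last_df = read_last_sheet(demo_path)
```

Because the sheet is looked up by name from sheet_names, the number of sheets does not need to be known in advance.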

Python: Issue reading and writing with formulas from one excel sheet to another. Using openpyxl

I am new to programming and have a question with python, openpyxl, and formulas. At the moment, I am trying to use a formula to sum multiple cells on one excel sheet (xlsx), read it, and write it to another excel sheet. This is my current code:
import openpyxl
# Give the location of the Departments Report
path4 = "Departments - June 29, 2021 to June 29, 2021.xlsx"
# To open the workbook
# workbook object is created
wb4 = openpyxl.load_workbook(path4)
# Get workbook active sheet object
# from the active attribute
ws5 = wb4.active
# Write a SUM formula into cell I100
# and keep a reference to that cell.
ws5['I100'] = "=SUM(I8:I99)"
Dc1 = ws5['I100']
# Give the location of the write file
path2 = "Daily Financials.xlsx"
# To open the workbook
# workbook object is created
wb2 = openpyxl.load_workbook(path2)
# Get desired workbook sheet
ws6 = wb2["COS"]
# Cell is specified with value
ws6["CK7"].value = Dc1.value
# Finally, save and close the Excel file
# via the save() and close() method.
wb2.save('Daily Financials.xlsx')
wb2.close()
There are no errors when running the code. However, the formula is written to the new Excel sheet as the text "=SUM(I8:I99)" and not as the computed value from the first Excel sheet. I have looked around and tried adding ".value", but I am struggling to figure this out.
Current excel output
Thanks for any help.
Since you are already using python, I suggest a loop that will put the actual data (not formula) into the cell in ws5:
# Sum the values of I8:I99 in Python and store
# the resulting number (not a formula) in I100.
counter = 0
for c in ws5['I8:I99']:
    counter += c[0].value
ws5['I100'] = counter
# or with list comprehension: ws5['I100'] = sum([c[0].value for c in ws5['I8:I99']])
Dc1 = ws5['I100']
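The formula appears as text because openpyxl stores formulas without evaluating them. Here is a self-contained sketch of the loop idea above that also skips empty or non-numeric cells, which would otherwise make the addition fail. It builds two workbooks in memory instead of loading the files from the question; the helper name sum_column_range and the sample numbers are assumptions for illustration. If the source file was last saved by Excel itself, loading it with load_workbook(path, data_only=True) returns the cached results of existing formulas instead of the formula text.

```python
from openpyxl import Workbook


def sum_column_range(ws, cell_range):
    """Sum the numeric cells in a single-column range such as 'I8:I99'."""
    total = 0
    for (cell,) in ws[cell_range]:
        # Skip empty cells and text so the addition cannot fail
        if isinstance(cell.value, (int, float)):
            total += cell.value
    return total


# Source workbook with a few values in column I (stand-in for the report)
src_wb = Workbook()
src_ws = src_wb.active
for row, value in zip(range(8, 12), [10, 20.5, None, 30]):
    src_ws.cell(row=row, column=9, value=value)

# Store the computed number, not a formula string
src_ws["I100"] = sum_column_range(src_ws, "I8:I99")

# Destination workbook (stand-in for "Daily Financials.xlsx")
dst_wb = Workbook()
dst_ws = dst_wb.create_sheet("COS")
dst_ws["CK7"].value = src_ws["I100"].value
```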

Python: How to read multiple spreadsheets into a new format in a CSV?

I (a newcomer) am trying to read several tables from an Excel document and write them in a new format to a single CSV.
In the CSV, I need the following fields: year (from a global variable), month (from a global variable), outlet (name of the worksheet), rowvalue [a] (string describing the row), columnvalue [1] (string describing the column), cellvalue (float).
The corresponding values must then be entered into these fields.
From each table, only rows 6 to 89 need to be read.
# BWA-Reader
# read the excel spreadsheet with all sheets
# Python 3.6
# Imports
import openpyxl
import xlrd
from PIL import Image as PILImage
import csv
# year value of the business analysis
year = "2018"
# month value of the business analysis
month = "11"
# .xlsx path
wb = openpyxl.load_workbook("BWA Zusammenfassung 18-11.xlsx")
print("Found your Spreadsheet")
# List of sheets
sheets = wb.sheetnames
# remove unnecessary sheets
list_to_remove = ("P", 'APn', 'AP')
sheets_clean = list(set(sheets).difference(set(list_to_remove)))
print("sheets to load: " + str(sheets_clean))
# for loop for every sheet based on sheets_clean
for sheet in sheets_clean:
    # for loop to build list for row and cell value
    all_rows = []
    for row in wb[sheet].rows:
        current_row = []
        for cell in row:
            current_row.append(cell.value)
        all_rows.append(current_row)
    print(all_rows)
# I'm stuck here
I expect an output like:
2018;11;Oldenburg;total_sales;monthly;145840.00
all sheets in one csv
Thank you so much for every idea how to solve my project!
The complete answer to this question is very dependent on the actual dataset.
I would recommend looking into pandas' read_excel() function. This will make it so much easier to extract the needed rows/columns/cells, all without looping through all of the sheets.
You might need some tutorials on pandas in order to get there, but judging by what you are trying to do, pandas might be a useful skill to have in the future!
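As a rough sketch of what the pandas route could look like: read_excel with sheet_name=None returns a dict of {sheet name: DataFrame}, and melt turns each table into one row per cell. Everything about the layout here is an assumption, since the real workbook is not shown: the first column of each sheet is taken to hold the row labels, the header row the column labels, and the "Bremen" sheet is invented sample data. The function name sheets_to_long is hypothetical. The skiprows and nrows parameters of read_excel may help restrict the read to the rows of interest.

```python
import pandas as pd


def sheets_to_long(sheets, year, month):
    """Flatten {sheet name: DataFrame} into one long table.

    Assumes (hypothetically) that the first column of every sheet holds the
    row labels and the remaining column headers are the column labels.
    """
    frames = []
    for outlet, df in sheets.items():
        label_col = df.columns[0]
        # One output row per (row label, column label) pair
        long_df = df.melt(id_vars=label_col, var_name="column", value_name="value")
        long_df = long_df.rename(columns={label_col: "row"})
        long_df.insert(0, "outlet", outlet)
        frames.append(long_df)
    result = pd.concat(frames, ignore_index=True)
    result.insert(0, "month", month)
    result.insert(0, "year", year)
    # Drop cells that were empty in the spreadsheet
    return result.dropna(subset=["value"])


# Stand-in for: pd.read_excel("BWA Zusammenfassung 18-11.xlsx", sheet_name=None)
sheets = {
    "Oldenburg": pd.DataFrame({"position": ["total_sales"], "monthly": [145840.0]}),
    "Bremen": pd.DataFrame({"position": ["total_sales"], "monthly": [98000.0]}),
}

long_table = sheets_to_long(sheets, year="2018", month="11")
csv_text = long_table.to_csv(sep=";", index=False, header=False, float_format="%.2f")
```

Passing a file path as the first argument of to_csv would write the result to disk instead of returning it as a string.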

Problems with Program using openpyxl [duplicate]

import openpyxl
wb=openpyxl.load_workbook('Book_1.xlsx')
ws=wb['Sheet_1']
I am trying to analyze an excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help in getting a code to loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book 1 and the sheet name is Sheet 1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.
IIUC, use pandas module to achieve this:
import pandas as pd
df = pd.read_excel('yourfile.xlsx')
maxdf = df.groupby('ID').max()
maxdf will have the result you are looking for.
Let's say you have file test.xlsx with worksheet ws1. Try:
from openpyxl import load_workbook
wb = load_workbook(filename='test.xlsx')
ws = wb['ws1']
for col in ws.columns:
    col_max = 0
    for cell in col:
        # Skip empty cells and text such as headers
        if isinstance(cell.value, (int, float)) and cell.value > col_max:
            col_max = cell.value
    print('next max:', col_max)
I'm looping over all the columns because I'm not sure what you expected.
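For the per-group maximum the question asks about, an openpyxl-only sketch could keep a dict keyed by the value in column A. The layout is assumed, since the spreadsheet is not shown: row 1 is a header, column A holds the group number and column D the value to compare. The function name max_by_group and the sample numbers are made up.

```python
from openpyxl import Workbook


def max_by_group(ws, key_col=1, value_col=4, first_row=2):
    """Return {group key: largest value} for two columns of a worksheet."""
    maxima = {}
    for row in ws.iter_rows(min_row=first_row, values_only=True):
        key, value = row[key_col - 1], row[value_col - 1]
        if key is None or not isinstance(value, (int, float)):
            continue  # skip blank or non-numeric rows
        if key not in maxima or value > maxima[key]:
            maxima[key] = value
    return maxima


# Small in-memory sheet standing in for 'Sheet_1' (made-up numbers)
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "C", "D"])
for record in [(1, "x", "y", 5.0), (1, "x", "y", 9.5), (2, "x", "y", 3.0), (2, "x", "y", 1.0)]:
    ws.append(record)

result = max_by_group(ws)
```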

Python Copying specific rows from a .csv to an .xlsx given that a particular value is in the third column

I am new to Python; I have done a very small amount at uni.
I am working on a personal program for my family that takes .csv files from an FTP site, merges them all together, and then puts the information into a template .xlsx file. Each row of the CSV holds a set of data for the position of a node.
I am stuck on the last part: moving the information from the CSV file to the xlsx file.
The following image shows part of the CSV file:
http://i.stack.imgur.com/6mST3.png
The third column (labelled Valve Pos) holds values from 1 to 15, and they repeat, so there is more than one row with a 1 in that column, and so on.
I need to copy all the rows with a 1 in this column into a pre-existing sheet in the xlsx template. Other columns may also contain a 1 without there being a 1 in the third column, so the selection needs to be based on that column alone. I have tried a couple of ways but keep hitting errors or ending up with corrupted information in the xlsx (e.g. all the rows merged into one single row, the columns in different positions, and so on).
Below are the two different versions of the code I have tried without success.
The first code is:
wb = openpyxl.load_workbook('MasterTemplate4.xlsx')
ws = wb.get_sheet_by_name('All')
with open('file location' + fl1 + '.csv') as f:
    reader = csv.DictReader(f)
    rows = [row for row in reader if row['Valve Pos'] != '1']
    for row in rows:
        ws.write(row)
wb.save('save location' + fl1 + '.xlsx')
The second code is:
f = open(r'file location' + fl1 + '.csv')
csv.register_dialect('colons', delimiter=':')
reader = csv.reader(f, dialect='colons')
wb = openpyxl.load_workbook('MasterTemplate4.xlsx')
ws = wb.get_sheet_by_name('All')
for row_index, row in enumerate(reader):
    for column_index, cell in enumerate(row):
        column_letter = get_column_letter((column_index + 1))
        ws.cell('%s%s'%(column_letter, (row_index + 1))).value = cell
wb.save('save location' + fl1 + '.xlsx')
The imports and defs are done globally at the top, as they are used in other sections of the code.
Any help would be appreciated, as I have been stuck on this for some weeks now and have been unable to find any code online that deals with this issue.
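One possible way to approach the requirement described above, offered as a sketch rather than a tested fix for the original program: filter on the Valve Pos column only (with == rather than !=), and use the worksheet's append method, which adds one list of values as one new row. The function name append_matching_rows, the in-memory sample CSV and its Node and Flow column names are invented for illustration, and a new Workbook stands in for the MasterTemplate4.xlsx template.

```python
import csv
import io

from openpyxl import Workbook


def append_matching_rows(csv_file, ws, column="Valve Pos", wanted="1"):
    """Append CSV rows whose `column` equals `wanted` to worksheet `ws`."""
    reader = csv.DictReader(csv_file)
    count = 0
    for row in reader:
        # Compare only the named column; a 1 elsewhere is ignored
        if row[column].strip() == wanted:
            # Keep the CSV column order so values stay in their columns
            ws.append([row[name] for name in reader.fieldnames])
            count += 1
    return count


# Made-up sample data; a real file would be opened with open(..., newline="")
sample = io.StringIO(
    "Node,Flow,Valve Pos\n"
    "1,10,1\n"
    "2,11,2\n"
    "3,1,1\n"
    "4,1,15\n"
)

wb = Workbook()  # stand-in for load_workbook('MasterTemplate4.xlsx')
ws = wb.active   # stand-in for wb['All']
added = append_matching_rows(sample, ws)
rows = [tuple(c.value for c in r) for r in ws.iter_rows()]
```

The csv module yields strings, so the cells are stored as text here; converting them with int() or float() before appending would store numbers instead.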
