Problems with Program using openpyxl [duplicate]

import openpyxl
wb=openpyxl.load_workbook('Book_1.xlsx')
ws=wb['Sheet_1']
I am trying to analyze an Excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help writing a loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book_1.xlsx and the sheet name is Sheet_1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.

IIUC, use the pandas module to achieve this:
import pandas as pd
df = pd.read_excel('yourfile.xlsx')
maxdf = df.groupby('ID').max()
maxdf will have the result you are looking for.
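A minimal sketch of the same idea, restricted to column D. It assumes, hypothetically, that the header row labels the columns A, B, C and D, and it uses a small in-memory DataFrame in place of `pd.read_excel('Book_1.xlsx', sheet_name='Sheet_1')` (the parameter is `sheet_name` in pandas 0.21 and later, `sheetname` before that). Selecting `['D']` after `groupby` returns one maximum per group instead of the maximum of every column.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel('Book_1.xlsx', sheet_name='Sheet_1'):
# group numbers in column A, the values of interest in column D.
df = pd.DataFrame({'A': [1, 1, 2, 2, 2, 3],
                   'B': ['x'] * 6,
                   'C': ['y'] * 6,
                   'D': [5, 9, 3, 7, 1, 4]})

# Max of column D for each group of values in column A
max_d = df.groupby('A')['D'].max()
# expected: 9 for group 1, 7 for group 2, 4 for group 3
print(max_d)
```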

Let's say you have a file test.xlsx with a worksheet ws1. Try:
from openpyxl import load_workbook
wb = load_workbook(filename='test.xlsx')
ws = wb['ws1']
for col in ws.columns:
    col_max = 0
    for cell in col:
        # skip empty cells and text such as header labels
        if isinstance(cell.value, (int, float)) and cell.value > col_max:
            col_max = cell.value
    print('next max:', col_max)
I'm looping over all the columns and printing each column's maximum, because I'm not sure exactly what you expected.
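If you want to stay with openpyxl, a minimal sketch of the grouped maximum could keep a dictionary keyed by the column A value. It assumes, hypothetically, a header in row 1, group numbers in column A and values in column D; `group_max` is an illustrative helper name, and the in-memory workbook stands in for `load_workbook('Book_1.xlsx')['Sheet_1']`.

```python
from openpyxl import Workbook

def group_max(ws, key_col=1, value_col=4, first_row=2):
    """Return {value in key_col: largest number in value_col} for the rows of ws."""
    result = {}
    for row in ws.iter_rows(min_row=first_row, min_col=1,
                            max_col=max(key_col, value_col)):
        key = row[key_col - 1].value
        value = row[value_col - 1].value
        # skip blank group cells and non-numeric values (text or empty cells)
        if key is None or not isinstance(value, (int, float)):
            continue
        if key not in result or value > result[key]:
            result[key] = value
    return result

# Hypothetical stand-in for load_workbook('Book_1.xlsx')['Sheet_1']
wb = Workbook()
ws = wb.active
ws.append(['Group', 'B', 'C', 'Value'])
for r in [[1, 'x', 'y', 5], [1, 'x', 'y', 9], [2, 'x', 'y', 3],
          [2, 'x', 'y', 7], [3, 'x', 'y', 4]]:
    ws.append(r)
print(group_max(ws))  # {1: 9, 2: 7, 3: 4}
```

A single pass with a dictionary avoids sorting the sheet first and works whether or not the groups are contiguous.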

Related

Iterating over pandas dataframe and saving into separate sheets in xlsx file

I'm iterating over a pandas DataFrame and carrying out an operation to obtain the information (the Excel sheet number) used to save each row to the appropriate Excel sheet, like this:
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
for i, data in df.iterrows():
    sheet = data['SheetNo']
    #Create excel writer
    writer = pd.ExcelWriter('output.xlsx')
    # write dataframe to excelsheet
    data.to_excel(writer, sheet_name=str(sheet))
    #save the excel file (close() saves; writer.save() was removed in pandas 2.0)
    writer.close()
Dataframe:
ID SheetNo setting
2304 2 IGV5
2305 3 IGV2
2306 1 IGV6
2307 2 IGV2
2308 1 IGV1
What I want is for each row to go into the sheet of the Excel file named by its 'SheetNo' value; instead, each sheet overwrites the previous one, and only the last sheet number is visible.
What can I do to make this code work? Other approaches besides mine are also welcome.
The Python code is below. Note that:
1. It assumes there is already an output.xlsx file in the same directory the code runs in.
2. It searches for worksheet names matching the SheetNo values. If a sheet is not in the file, it creates a new worksheet/tab with that name and adds the header row.
3. It then appends each row to its sheet.
4. Once all data in the DataFrame is added, it saves the file back under the same name.
5. You can run this as many times as you want; it will keep adding new sheets or appending to existing sheets.
import pandas as pd
from openpyxl import load_workbook
data = {'ID': [2304,2305,2306,2307,2308],
        'SheetNo': [2,3,1,2,1],
        'setting': ['IGV5', 'IGV2', 'IGV6', 'IGV2', 'IGV1']}
df = pd.DataFrame(data)
headers = ['ID','SheetNo','setting']
FilePath = 'output.xlsx' #ASSUMPTION - File already exists
wb = load_workbook(FilePath)
for sheet in df.SheetNo.unique():
    if str(sheet) in wb.sheetnames: #If sheet in excel file
        ws = wb[str(sheet)]
    else:
        ws = wb.create_sheet(title=str(sheet)) #Create new sheet if not present
        ws.append(headers) #New sheet = Need header, else not required
    i = 0
    j = 0
    for row in ws.iter_rows(min_row=ws.max_row+1, max_row=ws.max_row+len(df[df.SheetNo == sheet]), min_col=1, max_col=3):
        for cell in row:
            cell.value = df[df.SheetNo == sheet].iloc[i,j]
            j = j + 1
        j = 0
        i = i + 1
wb.save(FilePath)
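An alternative sketch with pandas alone: the question's code loses sheets because a new `ExcelWriter` is created and saved on every iteration. Creating one writer outside the loop and writing each `groupby('SheetNo')` group to its own sheet avoids that. Unlike the answer above, this rewrites the whole file on each run rather than appending to an existing one; `write_groups_to_sheets` and `output_grouped.xlsx` are hypothetical names, and openpyxl must be installed for the `openpyxl` engine.

```python
import pandas as pd

def write_groups_to_sheets(df, path, key='SheetNo'):
    """Write each group of rows sharing a `key` value to its own sheet."""
    # One writer for the whole loop: the file is written once, on close,
    # so earlier sheets are not overwritten by later ones.
    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        for sheet_no, group in df.groupby(key):
            group.to_excel(writer, sheet_name=str(sheet_no), index=False)

df = pd.DataFrame({'ID': [2304, 2305, 2306, 2307, 2308],
                   'SheetNo': [2, 3, 1, 2, 1],
                   'setting': ['IGV5', 'IGV2', 'IGV6', 'IGV2', 'IGV1']})
write_groups_to_sheets(df, 'output_grouped.xlsx')

# Read every sheet back as {sheet name: DataFrame} to check the result
sheets = pd.read_excel('output_grouped.xlsx', sheet_name=None)
print(sorted(sheets))              # ['1', '2', '3']
print(sheets['2']['ID'].tolist())  # [2304, 2307]
```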
Output excel after one run of the code

Formulas are nan values in Pandas/Openpyxl

When I insert formulas into an Excel file using openpyxl, like this:
#multiple rows
#execution hit on both sides?
from openpyxl import load_workbook
wb = load_workbook(filename = 'flat_user_data.xlsx')
ws = wb.active
for i in ws.iter_rows(min_row=10, max_row= 10799):
    i = i[0].row
    ws[f"N{i}"] = f'=IF(COUNTIFS(A{i-1}:A{i+1},A{i},D{i-1}:D{i+1},D{i},M{i-1}:M{i+1},True)>1,IF(COUNTIFS(A{i-1}:A{i+1},A{i},D{i-1}:D{i+1},D{i},M{i-1}:M{i+1},True)=2,True,"INVESTIGATE"),False)'
wb.save('flat_user_data.xlsx')
When I then load the file into a DataFrame, I get only NaN values for these rows.
However, when I open the Excel file and save it, these values become the
Is there a way to open the Excel file and read the contents from code, so that this step does not need to happen manually?
Would love to hear your thoughts! Please help
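The NaN values most likely come from the fact that openpyxl writes formulas but never calculates them, so the saved file holds no cached results until Excel opens and re-saves it, and pandas reads those cached values rather than the formulas. A minimal sketch that shows this with openpyxl's `data_only=True` flag (the file name `formula_demo.xlsx` is hypothetical):

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws['A1'] = 2
ws['A2'] = 3
ws['A3'] = '=SUM(A1:A2)'  # openpyxl stores only the formula text
wb.save('formula_demo.xlsx')

# Default load: the formula string comes back
formula = load_workbook('formula_demo.xlsx').active['A3'].value
# data_only=True returns the cached result; nothing has calculated one yet
cached = load_workbook('formula_demo.xlsx', data_only=True).active['A3'].value
print(formula)  # =SUM(A1:A2)
print(cached)   # None
```

To get values without opening the file by hand, something has to recalculate it: a spreadsheet application such as Excel or LibreOffice re-saving the file, or the same logic computed directly in pandas instead of written as a formula.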

Having trouble finding openpyxl equivalent of xlrd objects

I wrote my code using the xlrd package to extract specific information from an Excel file with multiple sheets. I partially match a string and get the value in the next column, and sometimes the value in the next row of the same column, depending on the requirement.
The code below is part of my xlrd code, which works fine for picking the value in the next column:
import xlrd
workbook = xlrd.open_workbook('sample_data.xlsx')
sheet_name = []    # sheets where the label was found
customer_num = []  # values in the column right after the label
for sheet in workbook.sheets():
    for rowidx in range(sheet.nrows):
        row = sheet.row(rowidx)
        row_val = sheet.row_values(rowidx)
        for colidx, cell in enumerate(row):
            if cell.value == "Student number":
                sheet_name.append(sheet.name)
                print("Sheet Name =", sheet.name)
                customer_num.append(sheet.cell(rowidx,colidx+1).value)
                print(cell.value + "=" , sheet.cell(rowidx,colidx+1).value)
But I now need to use openpyxl instead of xlrd to achieve this. It's a technical requirement. And I'm unable to find proper counterparts from the openpyxl package. I'm pretty new to Python too.
It would be very helpful and time saving if someone who has good knowledge of both xlrd and openpyxl can help me on how to replicate my above code using openpyxl. Thanks a lot.
from openpyxl import load_workbook
wb = load_workbook('sample_data.xlsx')
for ws in wb:
    for row in ws:
        for cell in row:
            if cell.value == "Student number":
                print(ws.title)
                print("{0} = {1}".format(cell.value, cell.offset(column=1).value))
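A fuller sketch of the translation, covering both cases from the question. Roughly, `workbook.sheets()` becomes iterating over the workbook, `sheet.row(rowidx)` becomes a row from `ws.iter_rows()`, `sheet.name` becomes `ws.title`, and `sheet.cell(rowidx, colidx+1)` becomes `cell.offset(column=1)` (or `cell.offset(row=1)` for the next row). `find_labels` is a hypothetical helper, the partial match uses Python's `in` operator on strings, and the in-memory workbook stands in for `load_workbook('sample_data.xlsx')`.

```python
from openpyxl import Workbook

def find_labels(wb, label, below=False):
    """Return (sheet title, neighbour value) for every cell whose text contains label.

    The neighbour is the next column by default, or the next row if below=True.
    """
    found = []
    for ws in wb:                   # like workbook.sheets() in xlrd
        for row in ws.iter_rows():  # like sheet.row(rowidx)
            for cell in row:
                # partial match on text cells only
                if isinstance(cell.value, str) and label in cell.value:
                    neighbour = cell.offset(row=1) if below else cell.offset(column=1)
                    found.append((ws.title, neighbour.value))
    return found

# Hypothetical stand-in for load_workbook('sample_data.xlsx')
wb = Workbook()
ws = wb.active
ws.title = 'Class A'
ws.append(['Student number', 1001])
ws.append(['Name', 'Ann'])
ws2 = wb.create_sheet('Class B')
ws2.append(['Student number (new)'])
ws2.append([2002])

next_col = find_labels(wb, 'Student number')
next_row = find_labels(wb, 'Student number', below=True)
print(next_col)  # [('Class A', 1001), ('Class B', None)]
print(next_row)  # [('Class A', 'Name'), ('Class B', 2002)]
```

On a normal (not read-only) worksheet, `offset` creates an empty cell when the neighbour does not exist yet, which is why the next-column value for 'Class B' is `None`.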

Python: How to read multiple spreadsheets into a new format in a CSV?

I'm a newcomer, and I'm trying to read several tables from an Excel document and write them in a new format to a single CSV.
In the CSV, I need the following fields: year (from a global variable), month (from a global variable), outlet (name of the worksheet), rowvalue [a] (string describing the row), columnvalue [1] (string describing the column), cellvalue (float).
The corresponding values must then be filled into these fields.
From each table, only rows 6 to 89 need to be read.
#BWA-Reader
#read the excel spreadsheet with all sheets
#Python 3.6
# Imports
import openpyxl
import xlrd
from PIL import Image as PILImage
import csv
# year value of the Business analysis
year = "2018"
# month value of the Business analysis
month = "11"
# .xlsx path
wb = openpyxl.load_workbook("BWA Zusammenfassung 18-11.xlsx")
print("Found your Spreadsheet")
# List of sheets (wb.sheetnames replaces the deprecated wb.get_sheet_names())
sheets = wb.sheetnames
# remove unnecessary sheets, keeping the workbook order
list_to_remove = ("P",'APn','AP')
sheets_clean = [name for name in sheets if name not in list_to_remove]
print("sheets to load: " + str(sheets_clean))
# for loop for every sheet based on sheets_clean
for sheet in sheets_clean:
    # for loop to build list for row and cell value
    all_rows = []
    for row in wb[sheet].rows:
        current_row = []
        for cell in row:
            current_row.append(cell.value)
        all_rows.append(current_row)
    print(all_rows)
# I'm stuck here
I expect an output like:
2018;11;Oldenburg;total_sales;monthly;145840.00
with all sheets in one CSV.
Thank you so much for every idea how to solve my project!
The complete answer to this question is very dependent on the actual dataset.
I would recommend looking into pandas' read_excel() function. This will make it so much easier to extract the needed rows/columns/cells, all without looping through all of the sheets.
You might need some tutorials on pandas in order to get there, but judging by what you are trying to do, pandas might be a useful skill to have in the future!
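Following that suggestion, a minimal sketch of the reshaping step with pandas. `pd.read_excel(path, sheet_name=None)` returns a dictionary of DataFrames, one per sheet, and its `skiprows`/`nrows` parameters can limit the rows read; the right values depend on where the header sits, so they are left open here. The sketch assumes, hypothetically, that the first column of each table holds the row labels and the header row holds the column labels; `sheets_to_long_csv` and the sample numbers are illustrative only. `melt` turns each table into one row per cell, which matches the requested `year;month;outlet;row;column;value` layout.

```python
import io
import pandas as pd

def sheets_to_long_csv(sheets, year, month, out, skip=("P", "APn", "AP")):
    """Write {sheet name: table} as year;month;outlet;rowvalue;columnvalue;cellvalue lines."""
    frames = []
    for outlet, table in sheets.items():
        if outlet in skip:  # same sheets the question removes
            continue
        row_col = table.columns[0]  # assumption: first column holds the row labels
        long_df = (table.melt(id_vars=row_col, var_name='columnvalue', value_name='cellvalue')
                        .rename(columns={row_col: 'rowvalue'})
                        .dropna(subset=['cellvalue']))  # skip empty cells
        long_df = long_df.assign(year=year, month=month, outlet=outlet)
        frames.append(long_df[['year', 'month', 'outlet', 'rowvalue', 'columnvalue', 'cellvalue']])
    result = pd.concat(frames, ignore_index=True)
    result.to_csv(out, sep=';', index=False, header=False, float_format='%.2f')
    return result

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None, skiprows=..., nrows=...)
sheets = {
    'Oldenburg': pd.DataFrame({'label': ['total_sales', 'costs'],
                               'monthly': [145840.0, 90000.5],
                               'cumulative': [1500000.0, 950000.25]}),
    'AP': pd.DataFrame({'label': ['ignored'], 'monthly': [1.0]}),
}
buf = io.StringIO()
sheets_to_long_csv(sheets, "2018", "11", buf)
lines = buf.getvalue().splitlines()
print(lines[0])  # 2018;11;Oldenburg;total_sales;monthly;145840.00
```

For a real file, `buf` would be replaced by an output path such as a `.csv` file name.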

Python 3 - OpenPyxl - How to write edited excel workbook to new excel file?

Below is the Python code to load an Excel workbook and write some data to a sheet.
import openpyxl as op
from openpyxl import Workbook
new_excel = op.load_workbook('SpreadSheet.xlsx', read_only=False, keep_vba=True)
spreadsheet = new_excel['Input Quote']  # replaces the deprecated get_sheet_by_name()
spreadsheet['B30'] = 'VAU'
spreadsheet['D30'] = 1000
spreadsheet['F30'] = 5000
1) How can I save this workbook to a separate Excel (.xlsx) file?
2) If the Excel file has formulas, how can they be automatically recalculated through the openpyxl API?
1.
import openpyxl
workbook = openpyxl.load_workbook('SpreadSheet.xlsx')
workbook.save('NewWorkbook.xlsx')
2.
I am not sure if there is a better way to handle this, but when I need to check whether a cell contains a formula, I get the value of the cell (openpyxl returns formulas as plain strings) and check what the formula is. Then I replicate its functionality in Python and output the appropriate data.
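A minimal sketch of that check: openpyxl keeps a formula as a string starting with `=` (such cells also get `cell.data_type == 'f'`), so a scan over the sheet can list every formula, which you then reimplement by hand. `find_formulas` is a hypothetical helper, the `H30` formula is made up for illustration, and the in-memory workbook stands in for the question's 'Input Quote' sheet.

```python
from openpyxl import Workbook

def find_formulas(ws):
    """Map cell coordinate -> formula string for every formula cell in ws."""
    formulas = {}
    for row in ws.iter_rows(min_row=1):
        for cell in row:
            # openpyxl stores formulas as plain strings starting with '='
            if isinstance(cell.value, str) and cell.value.startswith('='):
                formulas[cell.coordinate] = cell.value
    return formulas

# Hypothetical stand-in for the 'Input Quote' sheet
wb = Workbook()
ws = wb.active
ws['D30'] = 1000
ws['F30'] = 5000
ws['H30'] = '=D30*F30'
print(find_formulas(ws))  # {'H30': '=D30*F30'}

# The formula's result, replicated by hand in Python
product = ws['D30'].value * ws['F30'].value
print(product)  # 5000000
```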
