When I insert formulas into an Excel file using openpyxl like this:
#multiple rows
#execution hit on both sides?
from openpyxl import load_workbook
wb = load_workbook(filename = 'flat_user_data.xlsx')
ws = wb.active
for row in ws.iter_rows(min_row=10, max_row=10799):
    i = row[0].row
    ws[f"N{i}"] = f'=IF(COUNTIFS(A{i-1}:A{i+1},A{i},D{i-1}:D{i+1},D{i},M{i-1}:M{i+1},True)>1,IF(COUNTIFS(A{i-1}:A{i+1},A{i},D{i-1}:D{i+1},D{i},M{i-1}:M{i+1},True)=2,True,"INVESTIGATE"),False)'
wb.save('flat_user_data.xlsx')
When I then load the saved file into a DataFrame, I get only NaN values for these rows.
However when I open the excel file and save it these values become the
Is there a way to open the Excel file and read its contents via code, so that this does not need to happen manually?
Would love to hear your thoughts! Please help
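For context: openpyxl writes the formula text but never calculates it, so the file holds no cached results until a spreadsheet application opens and saves it. pandas reads those cached results, which is where the NaN values come from. A minimal sketch that shows the effect (the file name formula_demo.xlsx and the cell contents are made up for illustration):

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"  # stored as formula text, not evaluated
wb.save(path)

# Default load: the formula string comes back exactly as written.
formula_text = load_workbook(path).active["A3"].value

# data_only=True asks for the cached result, which does not exist yet
# because no spreadsheet application has recalculated the file.
cached_value = load_workbook(path, data_only=True).active["A3"].value

print(formula_text)
print(cached_value)
```

Getting real values without opening Excel by hand therefore needs something that performs the calculation, for example computing the same logic in pandas before writing the column.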
Related
I'm iterating over a pandas DataFrame and, for each row, reading the Excel sheet number so that I can save the row to the appropriate sheet, like this:
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
for i, data in df.iterrows():
    sheet = data['SheetNo']
    # Create excel writer
    writer = pd.ExcelWriter('output.xlsx')
    # write dataframe to excel sheet
    data.to_excel(writer, sheet)
    # save the excel file
    writer.save()
DataFrame:
ID    SheetNo  setting
2304  2        IGV5
2305  3        IGV2
2306  1        IGV6
2307  2        IGV2
2308  1        IGV1
What I wanted was for the data to go into the sheet named by each row's 'SheetNo' in the Excel file. Instead, each sheet is overwritten by the following one, and only the last sheet number is visible.
What can I do to make this code work? Any other approach apart from mine above will be welcome.
The Python code is below. Note that:
1. It assumes there is already an output.xlsx file in the same directory the code runs in.
2. It searches the worksheet names for each value in the SheetNo column. If a matching worksheet does not exist, it creates a new worksheet/tab with that name and adds the header row.
3. The program then appends each row to the matching sheet.
4. Once all the data in the DataFrame has been added, it saves the file back under the same name.
You can run this as many times as you want; it will keep adding new sheets or appending to existing sheets.
import pandas as pd
from openpyxl import load_workbook
data = {'ID': [2304, 2305, 2306, 2307, 2308],
        'SheetNo': [2, 3, 1, 2, 1],
        'setting': ['IGV5', 'IGV2', 'IGV6', 'IGV2', 'IGV1']}
df = pd.DataFrame(data)
headers = ['ID', 'SheetNo', 'setting']
FilePath = 'output.xlsx'  # ASSUMPTION - file already exists
wb = load_workbook(FilePath)
for sheet in df.SheetNo.unique():
    if str(sheet) in wb.sheetnames:  # Sheet already in the Excel file
        ws = wb[str(sheet)]
    else:
        ws = wb.create_sheet(title=str(sheet))  # Create new sheet if not present
        ws.append(headers)  # New sheet needs a header row, existing ones do not
    i = 0
    j = 0
    # Fill the rows directly below the current last row of the sheet
    for row in ws.iter_rows(min_row=ws.max_row+1, max_row=ws.max_row+len(df[df.SheetNo == sheet]), min_col=1, max_col=3):
        for cell in row:
            cell.value = df[df.SheetNo == sheet].iloc[i, j]
            j = j + 1
        j = 0
        i = i + 1
wb.save('output.xlsx')
Output excel after one run of the code
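As an alternative, here is a minimal pandas-only sketch that opens the writer once, outside the loop, and writes one sheet per SheetNo group. It assumes a fresh output file is acceptable (unlike the code above, it does not append to an existing workbook), and the temporary output path is made up for illustration:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "ID": [2304, 2305, 2306, 2307, 2308],
    "SheetNo": [2, 3, 1, 2, 1],
    "setting": ["IGV5", "IGV2", "IGV6", "IGV2", "IGV1"],
})

# Hypothetical output location; any writable .xlsx path would do.
out_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

# Open the writer once, outside the loop, so sheets are not overwritten.
with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    for sheet_no, group in df.groupby("SheetNo"):
        group.to_excel(writer, sheet_name=str(sheet_no), index=False)

# Read every sheet back as {sheet_name: DataFrame} to check the result.
sheets = pd.read_excel(out_path, sheet_name=None)
print(list(sheets))
```

The original loop lost sheets because a new writer, and with it a new file, was created on every iteration.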
I am trying to use pandas to read only the last sheet of a spreadsheet, since I don't need the rest. How do I tell Python to take just the last one? I cannot find an option in the documentation for this. I can specify the sheet with the sheet_name parameter, but that does not work for me because I don't know how many sheets there are.
raw_excel = pd.read_excel(path, sheet_name=0)
You can use the pd.ExcelFile class:
xl = pd.ExcelFile(path)
# See all sheet names
sheet_names = xl.sheet_names
# Last sheet name
last_sheet = sheet_names[-1]
# Read the last sheet into a DataFrame
df = xl.parse(last_sheet)
I am a newcomer trying to read several tables from an Excel document and write them, in a new format, to a single CSV file.
In the CSV, I need the following fields: year (from a global variable), month (from a global variable), outlet (name of the worksheet), rowvalue [a] (string that describes the row), columnvalue [1] (string that describes the column), cellvalue (float)
The corresponding values must then be entered in these.
From the respective tables, only RowNum 6 to 89 need to be read
# BWA-Reader
# read the excel spreadsheet with all sheets
# Python 3.6
# Imports
import openpyxl
import xlrd
from PIL import Image as PILImage
import csv
# year value of the business analysis
year = "2018"
# month value of the business analysis
month = "11"
# .xlsx path
wb = openpyxl.load_workbook("BWA Zusammenfassung 18-11.xlsx")
print("Found your Spreadsheet")
# List of sheets
sheets = wb.sheetnames
# remove unnecessary sheets
list_to_remove = ("P", 'APn', 'AP')
sheets_clean = list(set(sheets).difference(set(list_to_remove)))
print("sheets to load: " + str(sheets_clean))
# for loop for every sheet based on sheets_clean
for sheet in sheets_clean:
    # for loop to build list for row and cell value
    all_rows = []
    for row in wb[sheet].rows:
        current_row = []
        for cell in row:
            current_row.append(cell.value)
        all_rows.append(current_row)
    print(all_rows)
# I'm stuck here
I expect an output like:
2018;11;Oldenburg;total_sales;monthly;145840.00
all sheets in one csv
Thank you so much for every idea how to solve my project!
The complete answer to this question is very dependent on the actual dataset.
I would recommend looking into pandas' read_excel() function. This will make it so much easier to extract the needed rows/columns/cells, all without looping through all of the sheets.
You might need some tutorials on pandas in order to get there, but judging by what you are trying to do, pandas might be a useful skill to have in the future!
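To make that suggestion concrete, here is a rough sketch of the reshaping step. It assumes that the first column of each sheet holds the row labels and that the remaining column names are the column labels; the helper name sheets_to_long, the column name "label", and every number except the 145840.00 from the question are made up. The dictionary stands in for what pd.read_excel(path, sheet_name=None) returns, which is one DataFrame per sheet name.

```python
import pandas as pd


def sheets_to_long(sheets, year, month):
    """Flatten {sheet_name: DataFrame} into one long table.

    Assumption: the first column of every frame holds the row labels and
    the remaining column names are the column labels.
    """
    frames = []
    for outlet, frame in sheets.items():
        label_col = frame.columns[0]
        # One output row per (row label, column label) pair.
        long = frame.melt(id_vars=[label_col], var_name="columnvalue", value_name="cellvalue")
        long = long.rename(columns={label_col: "rowvalue"})
        long.insert(0, "outlet", outlet)
        frames.append(long)
    result = pd.concat(frames, ignore_index=True)
    result.insert(0, "month", month)
    result.insert(0, "year", year)
    return result


# Made-up stand-in for the dictionary of sheets read from the workbook.
sheets = {
    "Oldenburg": pd.DataFrame({
        "label": ["total_sales", "costs"],
        "monthly": [145840.0, 98000.0],
        "yearly": [1500000.0, 1100000.0],
    }),
}

long_df = sheets_to_long(sheets, year="2018", month="11")
csv_text = long_df.to_csv(sep=";", header=False, index=False, float_format="%.2f")
print(csv_text)
```

Restricting the input to rows 6 to 89 and dropping the unwanted sheets would happen while reading, before this step; how exactly depends on the layout of the real workbook.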
Below is the Python code to load an Excel workbook and write some data to the sheet.
import openpyxl as op
from openpyxl import Workbook
new_excel = op.load_workbook('SpreadSheet.xlsx', read_only=False, keep_vba=True)
spreadsheet = new_excel['Input Quote']
spreadsheet['B30'] = 'VAU'
spreadsheet['D30'] = 1000
spreadsheet['F30'] = 5000
1) How can I save this workbook to a separate Excel (.xlsx) file?
2) If the Excel file has formulas, how can they be automatically triggered through the openpyxl API?
1.
import openpyxl
workbook = openpyxl.load_workbook('SpreadSheet.xlsx')
workbook.save('NewWorkbook.xlsx')
2.
I am not sure if there is a better way to handle this, but when I need to check to see if there is a formula, I get the value of the cell (formulas are just strings) and do some data validation to check what the formula is. Then I replicate the functionality via Python, and output the appropriate data.
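A minimal sketch of that approach, using an in-memory workbook with made-up contents. The helper is_formula is hypothetical, and only the single SUM formula shown here is replicated; real formulas would each need their own handling.

```python
from openpyxl import Workbook

# In-memory workbook with made-up contents, so no file is needed.
wb = Workbook()
ws = wb.active
ws["A1"] = 1000
ws["A2"] = 5000
ws["A3"] = "=SUM(A1:A2)"


def is_formula(cell):
    """openpyxl stores a formula as a string that starts with '='."""
    return isinstance(cell.value, str) and cell.value.startswith("=")


# Coordinates of every cell that holds a formula.
formula_cells = [
    cell.coordinate
    for row in ws.iter_rows()
    for cell in row
    if is_formula(cell)
]

# Replicate the one formula we recognise in Python instead of relying on Excel.
total = None
if ws["A3"].value == "=SUM(A1:A2)":
    total = ws["A1"].value + ws["A2"].value

print(formula_cells, total)
```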
import openpyxl
wb=openpyxl.load_workbook('Book_1.xlsx')
ws=wb['Sheet_1']
I am trying to analyze an excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help in getting a code to loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book 1 and the sheet name is Sheet 1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.
IIUC, use pandas module to achieve this:
import pandas as pd
df = pd.read_excel('yourfile.xlsx')
maxdf = df.groupby('ID').max()
maxdf will have the result you are looking for.
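A small worked example of the same idea, restricted to the one column of interest. The data is made up; "ID" stands in for column A and "D" for column D of the spreadsheet.

```python
import pandas as pd

# Made-up data; "ID" stands in for column A and "D" for column D.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "D": [10, 25, 7, 3, 12, 4],
})

# Maximum of column D within each ID group.
max_per_id = df.groupby("ID")["D"].max()
print(max_per_id.to_dict())
```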
Let's say you have file test.xlsx with worksheet ws1. Try:
from openpyxl import load_workbook
wb = load_workbook(filename='test.xlsx')
ws = wb['ws1']
for col in ws.columns:
    # keep only numeric cells so that headers and blanks are skipped
    values = [cell.value for cell in col if isinstance(cell.value, (int, float))]
    if values:
        print('next max:', max(values))
I'm looping over all the columns because I'm not sure which one you need.
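If the goal is the maximum of column D for each group in column A, a dictionary keyed by the group value is one way to do it with openpyxl alone. This is a sketch on an in-memory sheet with made-up data; it assumes a header in row 1, the group in column A and the value in column D.

```python
from openpyxl import Workbook

# In-memory sheet with made-up data: group id in column A, value in column D.
wb = Workbook()
ws = wb.active
ws.append(["ID", "B", "C", "Value"])
for group_id, value in [(1, 10), (1, 25), (2, 7), (2, 12), (3, 4)]:
    ws.append([group_id, None, None, value])

group_max = {}
# min_row=2 skips the header; values_only=True yields plain tuples.
for group_id, _, _, value in ws.iter_rows(min_row=2, max_col=4, values_only=True):
    if value is None:
        continue  # skip blank cells
    if group_id not in group_max or value > group_max[group_id]:
        group_max[group_id] = value

print(group_max)
```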