I am trying to average the values in a range of cells within one column. I listed the range as a tuple and then looped over it to get each cell value. I then created a variable for the average, but I get the error: TypeError: 'float' object is not iterable.
range1 = ws["A2":"A6"]
for cell in range1:
    for x in cell:
        average = sum(x.value)/len(x.value)
        print(average)
Python and the openpyxl API make this kind of thing very easy.
rows = ws.iter_rows(min_row=2, max_row=6, max_col=1, values_only=True)
values = [row[0] for row in rows]
avg = sum(values) / len(values)
But you should probably check that the cells contain numbers, otherwise you'll see an exception.
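One way to do that check, as a minimal sketch: filter out anything that is not a number before averaging. The helper name mean_of_numbers is made up for illustration; it accepts any list of cell values, such as the values list built above.

```python
from numbers import Number

def mean_of_numbers(values):
    """Average only the numeric entries; skip None, text and booleans."""
    numbers = [v for v in values
               if isinstance(v, Number) and not isinstance(v, bool)]
    if not numbers:
        raise ValueError("no numeric cells in the range")
    return sum(numbers) / len(numbers)

# Example with the kind of mixed values a column can contain
print(mean_of_numbers([1.5, None, "n/a", 2.5, 5]))  # 3.0
```

Raising an error for an all-empty range is a design choice here; returning None would work just as well if that suits the calling code better.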
Something like this will get you the mean of the cells.
import openpyxl as op
def main():
    wb = op.load_workbook(filename='C:\\Users\\####\\Desktop\\SO nonsense\\Book1.xlsm')
    range1 = wb['Sheet1']['A2:A6']
    cellsum = 0
    for i, cell in enumerate(range1, 1):
        print(i)
        cellsum += cell[0].value
    print(cellsum / i)
main()
Is it possible to extract data that I've written to a xlsxwriter.worksheet?
import xlsxwriter
output = "test.xlsx"
workbook = xlsxwriter.Workbook(output)
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, 'top left')
if conditional:
    worksheet.write(1, 1, 'bottom right')
for row in range(2):
    for col in range(2):
        # Now how can I check if a value was written at this coordinate?
        # something like worksheet.get_value_at_row_col(row, col)
workbook.close()
Is it possible to extract data that I've written to a xlsxwriter.worksheet?
Yes. Even though XlsxWriter is write only, it stores the table values in an internal structure and only writes them to file when workbook.close() is executed.
Every Worksheet has a table attribute. It is a dictionary, containing entries for all populated rows (row numbers starting at 0 are the keys). These entries are again dictionaries, containing entries for all populated cells within the row (column numbers starting at 0 are the keys).
Therefore, table[row][col] will give you the entry at the desired position, but only if there is an entry there; otherwise the lookup fails with a KeyError.
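To avoid the failure for empty positions, chained dict.get() calls can be used instead of indexing. This is a minimal sketch: get_entry is a hypothetical helper, not part of XlsxWriter, and table stands for worksheet.table, which is an undocumented internal attribute. A plain dict is used here so the example runs on its own.

```python
def get_entry(table, row, col):
    """Return the entry stored at (row, col), or None if nothing was written."""
    return table.get(row, {}).get(col)

# Stand-in for worksheet.table: {row: {col: entry}}
table = {0: {0: "entry for A1"}, 1: {1: "entry for B2"}}

print(get_entry(table, 0, 0))  # entry for A1
print(get_entry(table, 0, 1))  # None
print(get_entry(table, 5, 5))  # None
```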
Note that these entries are still not the text, number or formula you are looking for, but named tuples, which also contain the cell format. You can type check the entries and extract the contents depending on their nature. Here are the possible outcomes of type(entry) and the fields of the named tuples that are accessible:
xlsxwriter.worksheet.cell_string_tuple: string, format
xlsxwriter.worksheet.cell_number_tuple: number, format
xlsxwriter.worksheet.cell_blank_tuple: format
xlsxwriter.worksheet.cell_boolean_tuple: boolean, format
xlsxwriter.worksheet.cell_formula_tuple: formula, format, value
xlsxwriter.worksheet.cell_arformula_tuple: formula, format, value, range
For numbers, booleans, and formulae, the contents can be accessed by reading the respective field of the named tuple.
For array formulae, the contents are only present in the upper left cell of the output range, while the rest of the cells are represented by number entries with 0 value.
For strings, the situation is more complicated, since Excel's storage concept has a shared string table, while the individual cell entries only point to an index of this table. The shared string table can be accessed as the str_table.string_table attribute of the worksheet. It is a dictionary, where the keys are strings and the values are the associated indices. In order to access the strings by index, you can generate a sorted list from the dictionary as follows:
shared_strings = sorted(worksheet.str_table.string_table, key=worksheet.str_table.string_table.get)
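As a sketch of how the index lookup and the per-type extraction fit together: the named tuples below are stand-ins that only mimic the field names listed above, the string table is invented sample data, and entry_content is a hypothetical helper that inspects field names instead of comparing types.

```python
from collections import namedtuple

# Stand-ins mimicking the field names of XlsxWriter's internal entries
StringEntry = namedtuple("StringEntry", "string format")
NumberEntry = namedtuple("NumberEntry", "number format")
FormulaEntry = namedtuple("FormulaEntry", "formula format value")

# Stand-in for worksheet.str_table.string_table: {string: index}
string_table = {"top left": 0, "more text": 1, "even more text": 2}
shared_strings = sorted(string_table, key=string_table.get)

def entry_content(entry, shared_strings):
    """Return the user-visible content of a table entry."""
    fields = getattr(entry, "_fields", ())
    if "string" in fields:       # index into the shared string table
        return shared_strings[entry.string]
    if "range" in fields:        # array formula entry
        return "{=" + entry.formula + "}"
    if "formula" in fields:      # formula text is prefixed with '=' for display
        return "=" + entry.formula
    if "number" in fields:
        return entry.number
    if "boolean" in fields:
        return bool(entry.boolean)
    return None                  # blank entry: only a format is stored

print(entry_content(StringEntry(2, None), shared_strings))   # even more text
print(entry_content(NumberEntry(42, None), shared_strings))  # 42
print(entry_content(FormulaEntry("SUM(X5:Y7)", None, 0), shared_strings))  # =SUM(X5:Y7)
```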
I expanded your example from above to include all the explained features. It now looks like this:
import xlsxwriter
output = "test.xlsx"
workbook = xlsxwriter.Workbook(output)
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, 'top left')
worksheet.write(0, 1, 42)
worksheet.write(0, 2, None)
worksheet.write(2, 1, True)
worksheet.write(2, 2, '=SUM(X5:Y7)')
worksheet.write_array_formula(2,3,3,4, '{=TREND(X5:X7,Y5:Y7)}')
worksheet.write(4,0, 'more text')
worksheet.write(4,1, 'even more text')
worksheet.write(4,2, 'more text')
worksheet.write(4,3, 'more text')
for row in range(5):
    row_dict = worksheet.table.get(row, None)
    for col in range(5):
        if row_dict is not None:
            col_entry = row_dict.get(col, None)
        else:
            col_entry = None
        print(row,col,col_entry)
shared_strings = sorted(worksheet.str_table.string_table, key=worksheet.str_table.string_table.get)
print()
if type(worksheet.table[0][0]) == xlsxwriter.worksheet.cell_string_tuple:
    print(shared_strings[worksheet.table[0][0].string])
# type checking omitted for the rest...
print(worksheet.table[0][1].number)
print(bool(worksheet.table[2][1].boolean))
print('='+worksheet.table[2][2].formula)
print('{='+worksheet.table[2][3].formula+'}')
workbook.close()
Is it possible to extract data that I've written to a xlsxwriter.worksheet?
No. XlsxWriter is write only. If you need to keep track of your data you will need to do it in your own code, outside of XlsxWriter.
I (a newcomer) am trying to read several tables from an Excel document and write them in a new format to a single CSV file.
In the CSV, I need the following fields: year (from a global variable), month (from a global variable), outlet (name of the worksheet), rowvalue [a] (string describing the row), columnvalue [1] (string describing the column), cellvalue (float)
The corresponding values must then be entered in these fields.
From each table, only rows 6 to 89 need to be read.
#BWA-Reader
#read the excel spreadsheet with all sheets
#Python 3.6
# Imports
import openpyxl
import xlrd
from PIL import Image as PILImage
import csv
# year value of the Business analysis
year = "2018"
# month value of the Business analysis
month = "11"
# .xlsx path
wb = openpyxl.load_workbook("BWA Zusammenfassung 18-11.xlsx")
print("Found your Spreadsheet")
# List of sheets
sheets = wb.sheetnames  # get_sheet_names() is deprecated
# remove unnecessary sheets
list_to_remove = ("P",'APn','AP')
sheets_clean = list(set(sheets).difference(set(list_to_remove)))
print("sheets to load: " + str(sheets_clean))
# for loop for every sheet based on sheets_clean
for sheet in sheets_clean:
    # for loop to build list for row and cell value
    all_rows = []
    for row in wb[sheet].rows:
        current_row = []
        for cell in row:
            current_row.append(cell.value)
        all_rows.append(current_row)
    print(all_rows)
# I'm stuck here
I expect an output like:
2018;11;Oldenburg;total_sales;monthly;145840.00
with all sheets in one CSV file.
Thank you so much for every idea how to solve my project!
The complete answer to this question is very dependent on the actual dataset.
I would recommend looking into pandas' read_excel() function. This will make it so much easier to extract the needed rows/columns/cells, all without looping through all of the sheets.
You might need some tutorials on pandas in order to get there, but judging by what you are trying to do, pandas might be a useful skill to have in the future!
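If pandas is not an option, the reshaping can also be sketched with the standard csv module. This assumes a layout the question does not confirm: the first row of each sheet holds the column labels and the first cell of each row holds the row label. sheet_records is a hypothetical helper, rows stands for a list such as all_rows from the question, and the sample figures are invented apart from the one line quoted as expected output.

```python
import csv
import io

def sheet_records(year, month, outlet, rows):
    """Yield one record per numeric cell:
    year, month, outlet, row label, column label, value."""
    header = rows[0]
    for row in rows[1:]:
        row_label = row[0]
        for col_label, value in zip(header[1:], row[1:]):
            # skip empty cells and text; booleans are not treated as numbers
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                yield [year, month, outlet, row_label, col_label, f"{value:.2f}"]

# Stand-in data for one sheet (assumed layout)
rows = [
    [None, "monthly", "cumulative"],
    ["total_sales", 145840.0, 1500000.0],
    ["costs", None, 20000.5],
]

buffer = io.StringIO()  # for a real file: open("out.csv", "w", newline="")
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows(sheet_records("2018", "11", "Oldenburg", rows))
print(buffer.getvalue())
```

Calling sheet_records once per sheet and passing the results to the same writer would put all sheets into one file.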
I am trying to analyze an Excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help writing a loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book 1 and the sheet name is Sheet 1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.
import openpyxl
wb=openpyxl.load_workbook('Book_1.xlsx')
ws=wb['Sheet_1']
If I understand correctly, you can use the pandas module to achieve this:
import pandas as pd
df = pd.read_excel('yourfile.xlsx')
maxdf = df.groupby('ID').max()
maxdf will have the result you are looking for.
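For comparison, the same grouping can be sketched in plain Python. max_per_group is a hypothetical helper; the pairs stand for (column A, column D) values read from the sheet, and the sample numbers are invented.

```python
def max_per_group(pairs):
    """Map each group key to the largest value seen for it."""
    result = {}
    for group, value in pairs:
        if group not in result or value > result[group]:
            result[group] = value
    return result

pairs = [(1, 10.5), (1, 12.0), (2, 7.0), (2, 3.5), (3, -1.0)]
print(max_per_group(pairs))  # {1: 12.0, 2: 7.0, 3: -1.0}
```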
Let's say you have file test.xlsx with worksheet ws1. Try:
from openpyxl import load_workbook
wb = load_workbook(filename='test.xlsx')
ws = wb['ws1']
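for col in ws.columns:
    col_max = 0
    for cell in col:
        # skip empty cells and text (e.g. headers), which cannot be compared with a number
        if isinstance(cell.value, (int, float)) and cell.value > col_max:
            col_max = cell.value
    print('next max:', col_max)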
I'm looping over every column because I'm not sure what output you expected.
I am working with a big set of data, which has 9 columns (B3:J3 in row 3) and stretches down to B1325:J1325. Using Python and the openpyxl library, I need to get the biggest and second biggest value of each row and write those to new cells in the same row. I already assigned values to single cells manually (headings), but I cannot even get the max value in my range written to a new cell automatically. My code looks like the following:
for row in ws.rows['B3':'J3']:
    sumup = 0.0
    for cell in row:
        if cell.value != None:
            .........
It throws the error:
for row in ws.rows['B3':'J3']:
TypeError: 'generator' object has no attribute '__getitem__'
How could I get to my goal here?
You can use iter_rows to do what you want.
Try this:
# rows 3 to 1325, columns B (2) to J (10)
for row in ws.iter_rows(min_row=3, max_row=1325, min_col=2, max_col=10):
    sumup = 0.0
    for cell in row:
        if cell.value is not None:
            ........
Check out this answer for more info:
How we can use iter_rows() in Python openpyxl package?
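For the largest and second-largest value of each row, heapq.nlargest from the standard library is one option. This is a minimal sketch: top_two is a hypothetical helper and the sample rows are invented.

```python
import heapq

def top_two(values):
    """Return up to two of the largest numeric values, ignoring empty cells."""
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return heapq.nlargest(2, numbers)

print(top_two([3.0, None, 9.5, 1, 9.5, 7]))  # [9.5, 9.5]
print(top_two([None, 4]))                    # [4]
```

Inside the iter_rows loop, the values could be collected with [cell.value for cell in row] and the results written to free columns of the same row, for example with ws.cell(row=..., column=11, value=...).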