I wrote my code using the xlrd package to extract specific information from an Excel file with multiple sheets. I partially match a string and get the value in the next column; sometimes, depending on the requirement, I get the value in the next row of the same column instead.
The code below is the part of my xlrd code that picks the value in the next column, and it works fine:
import xlrd

workbook = xlrd.open_workbook('sample_data.xlsx')
sheet_name = []    # sheets in which the label was found
customer_num = []  # values found to the right of the label
for sheet in workbook.sheets():
    for rowidx in range(sheet.nrows):
        row = sheet.row(rowidx)
        row_val = sheet.row_values(rowidx)
        for colidx, cell in enumerate(row):
            if cell.value == "Student number":
                sheet_name.append(sheet.name)
                print("Sheet Name =", sheet.name)
                customer_num.append(sheet.cell(rowidx, colidx+1).value)
                print(cell.value + "=", sheet.cell(rowidx, colidx+1).value)
But I now need to use openpyxl instead of xlrd to achieve this; it's a technical requirement. I'm unable to find the proper counterparts in the openpyxl package, and I'm pretty new to Python too.
It would be very helpful and time-saving if someone who knows both xlrd and openpyxl well could show me how to replicate the code above using openpyxl. Thanks a lot.
from openpyxl import load_workbook

wb = load_workbook('sample_data.xlsx')
for ws in wb:
    for row in ws:
        for cell in row:
            if cell.value == "Student number":
                print(ws.title)
                print("{0} = {1}".format(cell.value, cell.offset(column=1).value))
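For the partial-match and next-row cases mentioned in the question, a minimal sketch could look like the one below. It assumes the label cells contain text; `find_neighbours` is a hypothetical helper name (not part of openpyxl), and the workbook is built in memory only so that the example runs without a file on disk.

```python
from openpyxl import Workbook


def find_neighbours(wb, needle, row_offset=0, col_offset=1):
    """Collect (sheet title, label, neighbour value) for every text cell containing `needle`.

    Hypothetical helper: row_offset/col_offset choose which neighbour to read,
    e.g. (0, 1) is the next column and (1, 0) is the next row in the same column.
    """
    hits = []
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                # Partial match: only text cells can contain the search string.
                if isinstance(cell.value, str) and needle in cell.value:
                    neighbour = cell.offset(row=row_offset, column=col_offset)
                    hits.append((ws.title, cell.value, neighbour.value))
    return hits


# Small in-memory workbook standing in for sample_data.xlsx (assumed layout).
wb = Workbook()
ws = wb.active
ws.title = "Class A"
ws["A1"] = "Student number:"
ws["B1"] = 1234
ws["A2"] = "Student name"
ws["A3"] = "Alice"

right = find_neighbours(wb, "Student number")                       # value in the next column
below = find_neighbours(wb, "Student name", row_offset=1, col_offset=0)  # value in the next row
print(right)
print(below)
```

With a real file you would pass `load_workbook('sample_data.xlsx')` instead of the in-memory workbook.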
I am trying to use pandas DataFrames to read only the last sheet of a spreadsheet, since I don't need the rest. How do I tell Python to take just the last one? I cannot find an option in the documentation for this. I can specify the sheet with the sheet_name argument, but that does not work for me because I don't know how many sheets there are.
raw_excel = pd.read_excel(path, sheet_name=0)
You can use the pandas ExcelFile class.
import pandas as pd
xl = pd.ExcelFile(path)
# See all sheet names
sheet_names = xl.sheet_names
# Last sheet name
last_sheet = sheet_names[-1]
# Read the last sheet into a DataFrame
df = xl.parse(last_sheet)
I'm brand new to coding and to this forum, so please accept my apologies in advance for being a newbie and probably not understanding what I'm supposed to say!
I was asked a question earlier that I didn't know how to approach. The user was trying to collect cell values from multiple rows in Excel (split out by a delimiter) and then create one complete column with a single value per row. Example in picture1 below. The source file is how the data is received, and the output is what the user is trying to do with it:
I hope I have explained that correctly. I'm looking for some Python code that will automate it. There could be thousands of values that need putting into rows.
Thanks in advance!
Andy
Have a look at the openpyxl package:
https://openpyxl.readthedocs.io/en/stable/index.html
This allows you to directly access cells in your excel sheet within python.
As some of your cells seem to contain multiple values separated by semicolons, you could read the cells as strings and use
splitstring = somelongstring.split(';')
to separate the values. This results in a list containing the separated values.
Basic manipulations using this package are described in this tutorial:
https://openpyxl.readthedocs.io/en/stable/tutorial.html
Edit:
An example iterating over all columns in a worksheet would be:
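from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
ws = wb.active  # iter_cols() is a worksheet method, not a workbook method
for col in ws.iter_cols(values_only=True):
    for value in col:
        do_something(value)  # placeholder for your own processing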
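Putting the two ideas together (read the cells, split on the delimiter, write one value per row), a rough sketch might look like this. It assumes the delimited values sit in column A and are separated by semicolons; `explode_cells` is a made-up helper name, and both workbooks live in memory so the example runs without any files.

```python
from openpyxl import Workbook


def explode_cells(values, sep=";"):
    """Flatten delimited cell values into one list with a single value per entry."""
    flat = []
    for value in values:
        if value is None:  # skip empty cells
            continue
        for part in str(value).split(sep):
            part = part.strip()
            if part:  # ignore empty pieces such as a trailing separator
                flat.append(part)
    return flat


# In-memory stand-in for the source file (assumed layout: everything in column A).
src = Workbook()
ws_in = src.active
for text in ["A1;A2;A3", "B1", None, "C1; C2"]:
    ws_in.append([text])

# Read column A as plain values, then split and flatten.
column_a = [row[0] for row in ws_in.iter_rows(min_col=1, max_col=1, values_only=True)]
flat = explode_cells(column_a)

# Write one value per row into a new workbook.
out = Workbook()
ws_out = out.active
for value in flat:
    ws_out.append([value])

print(flat)
```

To use real files, replace `Workbook()` for the source with `load_workbook(...)` and finish with `out.save(...)` under a file name of your choice.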
I was able to find some code online and butcher it to get what I needed. Here is the code I ended up with:
import pandas as pd
from itertools import chain

iris = pd.read_csv('iris.csv')

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits (only needed to repeat other columns alongside)
lens = iris['Order No'].str.split(',').map(len)

# create new dataframe with one value per row
res = pd.DataFrame({'Order No': chainer(iris['Order No'])})
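For what it's worth, newer pandas versions (0.25 and later) have `Series.explode`, which can do the same thing without itertools. The sketch below uses a small made-up DataFrame in place of the CSV; only the column name 'Order No' is taken from the code above.

```python
import pandas as pd

# Made-up sample data standing in for the CSV contents.
df = pd.DataFrame({"Order No": ["1001,1002", "1003", "1004, 1005,1006"]})

res = (
    df["Order No"]
    .str.split(",")          # each cell becomes a list of strings
    .explode()               # one list element per row
    .str.strip()             # drop stray spaces around the values
    .reset_index(drop=True)  # renumber rows 0..n-1
    .to_frame()              # back to a one-column DataFrame
)
print(res)
```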
I (a newcomer) am trying to read several tables from an Excel document and write them, in a new format, to a single CSV.
In the CSV I need the following fields: year (from a global variable), month (from a global variable), outlet (name of the worksheet), rowvalue [a] (string describing the row), columnvalue [1] (string describing the column), cellvalue (float).
The corresponding values must then be entered in these fields.
From each table, only rows 6 to 89 need to be read.
# BWA-Reader
# read the excel spreadsheet with all sheets
# Python 3.6

# Imports
import openpyxl
import xlrd
from PIL import Image as PILImage
import csv

# year value of the Business analysis
year = "2018"
# month value of the Business analysis
month = "11"

# .xlsx path
wb = openpyxl.load_workbook("BWA Zusammenfassung 18-11.xlsx")
print("Found your Spreadsheet")

# List of sheets
sheets = wb.sheetnames

# remove unnecessary sheets
list_to_remove = ("P", 'APn', 'AP')
sheets_clean = list(set(sheets).difference(set(list_to_remove)))
print("sheets to load: " + str(sheets_clean))

# for loop for every sheet based on sheets_clean
for sheet in sheets_clean:
    # for loop to build list for row and cell value
    all_rows = []
    for row in wb[sheet].rows:
        current_row = []
        for cell in row:
            current_row.append(cell.value)
        all_rows.append(current_row)
    print(all_rows)
    # I'm stuck here
I expect an output like:
2018;11;Oldenburg;total_sales;monthly;145840.00
with all sheets in one CSV.
Thank you so much for any ideas on how to solve this!
The complete answer to this question is very dependent on the actual dataset.
I would recommend looking into pandas' read_excel() function. This will make it so much easier to extract the needed rows/columns/cells, all without looping through all of the sheets.
You might need some tutorials on pandas in order to get there, but judging by what you are trying to do, pandas might be a useful skill to have in the future!
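Since the exact layout is not known, here is only a rough sketch of the reshaping step using the standard library. It assumes each sheet has one header row with the column labels and that the first cell of every data row is the row label; `sheet_to_records` is a hypothetical helper, and the sample values are invented apart from the expected output line quoted in the question.

```python
import csv
import io


def sheet_to_records(year, month, outlet, header, rows):
    """Turn one table into long-format records: year, month, outlet, row label, column label, value."""
    records = []
    for row in rows:
        row_label, *values = row
        if row_label is None:  # skip spacer rows without a label
            continue
        for col_label, value in zip(header[1:], values):
            # keep numeric cells only; empty cells and text are ignored
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                records.append([year, month, outlet, row_label, col_label, f"{value:.2f}"])
    return records


# Invented sample data; with openpyxl the rows could come from
# ws.iter_rows(min_row=6, max_row=89, values_only=True).
header = (None, "monthly", "cumulative")
rows = [
    ("total_sales", 145840.0, 1520300.5),
    (None, None, None),
    ("rent", 12000, None),
]

records = sheet_to_records("2018", "11", "Oldenburg", header, rows)

# Write with ';' as the separator, as in the expected output.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Calling the helper once per sheet and writing all records through the same `csv.writer` would put every sheet into one CSV file.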
Below is the Python code to load an Excel workbook and write some data to the sheet.
import openpyxl as op
from openpyxl import Workbook
new_excel = op.load_workbook('SpreadSheet.xlsx', read_only=False, keep_vba=True)
spreadsheet = new_excel['Input Quote']
spreadsheet['B30'] = 'VAU'
spreadsheet['D30'] = 1000
spreadsheet['F30'] = 5000
1) How can I save this workbook in a separate Excel (.xlsx) file?
2) If the Excel file has formulas, how can they be automatically triggered by the openpyxl API?
1.
import openpyxl
workbook = openpyxl.load_workbook('SpreadSheet.xlsx')
workbook.save('NewWorkbook.xlsx')
2.
I am not sure if there is a better way to handle this, but when I need to check to see if there is a formula, I get the value of the cell (formulas are just strings) and do some data validation to check what the formula is. Then I replicate the functionality via Python, and output the appropriate data.
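To illustrate the point about formulas being stored as strings: openpyxl does not evaluate formulas itself, it only reads and writes them as text, and Excel normally recalculates them when the saved file is opened. The sketch below uses a temporary file and made-up cell contents; `data_only=True` is a real `load_workbook` option that returns the value cached by Excel, which is missing when the file was last saved by openpyxl.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a tiny workbook with two numbers and a formula (made-up contents).
wb = Workbook()
ws = wb.active
ws["A1"] = 1000
ws["A2"] = 5000
ws["A3"] = "=SUM(A1:A2)"
kind = ws["A3"].data_type  # 'f' marks a formula cell

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "NewWorkbook.xlsx")
    wb.save(path)  # saving under a new name leaves any original file untouched

    # Default load: the formula comes back as its text.
    formula_text = load_workbook(path).active["A3"].value

    # data_only=True asks for the cached result instead; openpyxl wrote none.
    cached_value = load_workbook(path, data_only=True).active["A3"].value

print(kind, formula_text, cached_value)
```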
I am trying to analyze an excel spreadsheet using openpyxl. My goal is to get the max number from column D for each group of numbers in column A. I would like help in getting a code to loop for the analysis. Here is an example of the spreadsheet that I am trying to analyze. The file name is Book 1 and the sheet name is Sheet 1. I am running Python 3.6.1, pandas 0.20.1, and openpyxl 2.4.7. I am providing the code I have so far.
import openpyxl
wb=openpyxl.load_workbook('Book_1.xlsx')
ws=wb['Sheet_1']
IIUC, use pandas module to achieve this:
import pandas as pd
df = pd.read_excel('yourfile.xlsx')
maxdf = df.groupby('ID').max()
maxdf will then hold the per-group maximum of every other column, assuming the grouping column is named ID.
Let's say you have file test.xlsx with worksheet ws1. Try:
from openpyxl import load_workbook
wb = load_workbook(filename='test.xlsx')
ws = wb['ws1']
for col in ws.columns:
    col_max = None
    for cell in col:
        # skip empty cells and text such as headers
        if isinstance(cell.value, (int, float)) and (col_max is None or cell.value > col_max):
            col_max = cell.value
    print('next max:', col_max)
I'm looping over all the columns because I'm not sure what you expected.
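If the goal is the maximum of column D per group in column A, a plain-Python sketch without pandas could look like this. It assumes A is the first and D the fourth column of each row; `max_by_group` is a hypothetical helper and the sample rows are invented.

```python
def max_by_group(rows, key_col=0, value_col=3):
    """Return {group key: largest numeric value} for rows given as tuples of cell values."""
    best = {}
    for row in rows:
        key, value = row[key_col], row[value_col]
        # skip rows without a group key or without a numeric value (headers, blanks)
        if key is None or isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if key not in best or value > best[key]:
            best[key] = value
    return best


# Invented sample rows; with openpyxl they could come from
# ws.iter_rows(min_row=2, values_only=True).
rows = [
    (1, "x", "y", 10.5),
    (1, "x", "y", 12.0),
    (2, "x", "y", 3),
    (2, "x", "y", None),
    (3, "x", "y", -1),
]

result = max_by_group(rows)
print(result)
```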