Change number format using headers - openpyxl - excel

I have an Excel file in which I want to convert the number formatting from 'General' to 'Date'. I know how to do so for one column when referring to the column letter:
import openpyxl

workbook = openpyxl.load_workbook(r'path\filename.xlsx')
worksheet = workbook['Sheet1']
for row in range(2, worksheet.max_row + 1):
    worksheet["D{}".format(row)].number_format = 'yyyy-mm-dd;#'
As you can see, I now use the column letter "D" to point out the column that I want to be formatted differently. Now, I would like to use the header in row 1 called "Start_Date" to refer to this column. I tried a method from the following post to achieve this: select a column by its name - openpyxl. However, that resulted in a KeyError: "Start_Date":
# Create a dictionary of column names
ColNames = {}
Current = 0
for COL in worksheet.iter_cols(1, worksheet.max_column):
    ColNames[COL[0].value] = Current
    Current += 1
for row in range(2, worksheet.max_row + 1):
    worksheet["{}{}".format(ColNames['Start_Date'], row)].number_format = 'yyyy-mm-dd;#'
EDIT
This method results in the following error:
AttributeError: 'tuple' object has no attribute 'number_format'
Additionally, I have more columns from which the number formatting needs to be changed. I have a list with the names of those columns:
DateColumns = ['Start_Date', 'End_Date', 'Birthday']
Is there a way that I can use the list DateColumns so that I can save some lines of code?
Thanks in advance.
Please note that I posted a similar question earlier. The following post was suggested as an answer: Python: Simulating CSV.DictReader with OpenPyXL. However, I don't see how the answers in that post can be adapted to my needs.

You already have the names of the columns whose number format you want to change in a list, so why not just use that list?
Read the headers in row 1 and check whether each header is in the DateColumns list. If it is, apply the date format you want to every cell in that column, from row 2 to the last row.
...
DateColumns = ['Start_Date', 'End_Date', 'Birthday']
for COL in worksheet.iter_cols(min_row=1, max_row=1):
    header = COL[0]
    if header.value in DateColumns:
        for row in range(2, worksheet.max_row + 1):
            worksheet.cell(row, header.column).number_format = 'yyyy-mm-dd;#'
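To try the idea end to end without an existing file, here is a minimal sketch on an in-memory workbook. It assumes openpyxl 2.6 or later (where a cell's `column` attribute is an integer). The helper name `format_date_columns`, the `Name` header and the sample dates are invented for illustration; only the format string and the `DateColumns` list come from the question.

```python
import datetime

from openpyxl import Workbook

DATE_FORMAT = 'yyyy-mm-dd;#'  # format string taken from the question


def format_date_columns(worksheet, date_columns, number_format=DATE_FORMAT):
    """Apply number_format to every data cell under a matching header.

    Hypothetical helper: returns the headers that were actually found.
    """
    wanted = set(date_columns)
    found = []
    # Row 1 holds the headers; iter_cols yields one 1-cell tuple per column.
    for (header,) in worksheet.iter_cols(min_row=1, max_row=1):
        if header.value in wanted:
            found.append(header.value)
            for row in range(2, worksheet.max_row + 1):
                cell = worksheet.cell(row=row, column=header.column)
                cell.number_format = number_format
    return found


# Build a small illustrative sheet (sample data, not from the question).
workbook = Workbook()
worksheet = workbook.active
worksheet.append(['Name', 'Start_Date', 'End_Date'])
worksheet.append(['a', datetime.date(2021, 1, 1), datetime.date(2021, 2, 1)])
worksheet.append(['b', datetime.date(2021, 3, 1), datetime.date(2021, 4, 1)])

DateColumns = ['Start_Date', 'End_Date', 'Birthday']
formatted = format_date_columns(worksheet, DateColumns)
```

Names in the list that have no matching header (here `Birthday`) are simply skipped, so the same list can be reused across sheets with different columns.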

Related

Reading an Excel file with merged cells in Python

I have an Excel table of the following type (the problem described below is caused by the merged cells).
I am using read_excel from pandas to read it.
What I want: I would like to use the values in the first column as an index, and to have the values in the third column combined in one cell, e.g. like here.
What I get from directly applying read_excel can be seen here.
If needed, the code used to read the file is below (I am reading it from Google Drive in Google Colab):
import pandas as pd

path = '/content/drive/MyDrive/ExampleFile.xlsx'
pd.read_excel(path, header=0, index_col=0)
Could you please help?
Please let me know if anything in the question is unclear.
Here is one way to accomplish it. I created an Excel file similar to yours, in which the first column has the heading sno.
# fill the null values with values from previous rows
df = df.ffill()
# combine the rows where class is the same and create a new column
df = df.assign(comb=df.groupby(['class'])['type'].transform(lambda x: ','.join(x)))
# drop the duplicated rows
df2 = df.drop_duplicates(subset=['class', 'comb'])[['class', 'comb']]
   class             comb
0  fruit     apple,orange
2   toys  car,truck,train
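A variation on the same idea, as a rough sketch: forward-fill the gaps, then aggregate with groupby so the first column ends up as the index. The frame below is invented to mimic what read_excel typically returns for merged cells (a value in the first row of the merged range, NaN below it). The column names sno, class and type follow the answer above.

```python
import pandas as pd

# Illustrative data standing in for the result of pd.read_excel(path).
df = pd.DataFrame({
    'sno': [1, None, 2, None, None],
    'class': ['fruit', None, 'toys', None, None],
    'type': ['apple', 'orange', 'car', 'truck', 'train'],
})

# Forward-fill the gaps left by the merged cells.
df = df.ffill()

# One row per (sno, class), with the 'type' values joined into a single cell.
combined = (
    df.groupby(['sno', 'class'], sort=False)['type']
      .agg(','.join)
      .reset_index()
      .set_index('sno')
)
```

Compared with transform plus drop_duplicates, agg produces one row per group directly. Which reads better is a matter of taste.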

Get the value from another cell if a pattern matches a string in another cell of the same row

Hi,
I have an Excel workbook with the above data, and I use pandas to read the sheet in my Python script.
My requirement: if I find YES in the Primary Key column, I need to get the Data Element value of that row.
So from the above sample, I need to collect the 3 Data Element values into an array variable.
I tried the code below but couldn't achieve it.
PK_COLUMNS = {}
workbook_sheet = pd.read_excel(excel_file_name, sheet_name=i, keep_default_na=False)
workbook_sheet = workbook_sheet.fillna("NULL", inplace=False)
df = pd.DataFrame(workbook_sheet, columns=[0])
total_rows = len(df.axes[0])
h = 0
while h < total_rows:
    CURRENT_COLUMN = workbook_sheet["Primary Key"].fillna("NULL", inplace=False)
    #print(CURRENT_COLUMN)
    for i in CURRENT_COLUMN:
        if str(i).upper() == 'YES':
            PK_COLUMNS[h] = workbook_sheet.iat[h, 0].strip()
            print(PK_COLUMNS)
            h = h + 1
        else:
            print("NULL")
print(PK_COLUMNS)
Any help on this is highly appreciated. Python 3.7 with Pandas.
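One possible approach, sketched under assumptions: build a boolean mask on the Primary Key column and use it to select the Data Element values, instead of looping with a manual counter. The frame below is an invented stand-in for the sheet. It assumes the headers are literally `Data Element` and `Primary Key`, as the description suggests.

```python
import pandas as pd

# Illustrative stand-in for pd.read_excel(...); the rows are invented.
sheet = pd.DataFrame({
    'Data Element': ['CUST_ID ', 'CUST_NAME', 'ORDER_ID', 'ORDER_DATE'],
    'Primary Key': ['YES', '', 'yes', 'NO'],
})

# Boolean mask: True where the Primary Key cell reads YES (case-insensitive).
is_pk = sheet['Primary Key'].astype(str).str.strip().str.upper() == 'YES'

# Select the matching Data Element values and strip stray whitespace.
pk_columns = sheet.loc[is_pk, 'Data Element'].str.strip().tolist()
```

The mask compares whole columns at once, so there is no row counter to keep in sync with the loop.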

Python - Error populating values to spreadsheet (using xlsxwriter)

I have pulled some data from an XML file that looks as follows:
Parent
Child
Action
new
ID
54467
Type
None
Group
Name
None
ID
ab
COMMENTS
HTRER
REMARKS
LKO
CUSTOMER
HELLO
In the above sample, the first row represents the header and the row below it holds the corresponding value. I want to write these to a spreadsheet so that each header is written in the first row with its value in the row below. I am trying to do that with the code below, but some of the values do not get written.
Here is the code I am using to write the data to a spreadsheet with xlsxwriter:
row = 0
col = 0
row1 = 1
col1 = 0
for elem in tree.iter():
    worksheet.write(row, col, elem.tag)
    for subelem in elem:
        worksheet.write(row1, col1, subelem.text)
    col += 1
Could anyone advise why the values are not written correctly, or whether the code above needs some changes? Thanks.
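One thing visible in the code as posted is that col1 is never incremented, so every value is written to the same cell (row 1, column 0) and overwrites the previous one. A possible fix, sketched with the standard library only, is to pair each tag with its own text and give both the same column index. The XML below is invented and only loosely follows the sample. The final xlsxwriter call is left as a comment because the worksheet object is not part of this sketch.

```python
import xml.etree.ElementTree as ET

# Invented XML for illustration; tag names loosely follow the sample above.
xml_text = """
<Parent>
  <Child>
    <Action>new</Action>
    <ID>54467</ID>
    <COMMENTS>HTRER</COMMENTS>
  </Child>
</Parent>
"""
tree = ET.ElementTree(ET.fromstring(xml_text))

headers = []
values = []
for elem in tree.iter():
    headers.append(elem.tag)
    # Container elements hold only whitespace, so they get an empty value.
    values.append((elem.text or '').strip())

# Header and value share one column index, so the two rows stay aligned.
cells = []
for col, (tag, text) in enumerate(zip(headers, values)):
    cells.append((0, col, tag))   # row 0: header
    cells.append((1, col, text))  # row 1: value
# With xlsxwriter, each tuple would become worksheet.write(row, col, value).
```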

Comparing items in an Excel file with Openpyxl in Python

I am working with a big data set that spans 9 columns (B3:J3 in row 3) and extends down to B1325:J1325. Using Python and the openpyxl library, I need to get the largest and second-largest value of each row and write them to new cells in the same row. I have already assigned values to single cells manually (the headings), but I cannot even get the max value in my range written to a new cell automatically. My code looks like this:
for row in ws.rows['B3':'J3']:
    sumup = 0.0
    for cell in row:
        if cell.value != None:
            .........
It throws the error:
for row in ws.rows['B3':'J3']:
TypeError: 'generator' object has no attribute '__getitem__'
How could I get to my goal here?
You can use iter_rows to do what you want.
Try this:
for row in ws.iter_rows(min_row=3, max_row=1325, min_col=2, max_col=10):  # B3:J1325
    sumup = 0.0
    for cell in row:
        if cell.value is not None:
            ........
Check out this answer for more info:
How we can use iter_rows() in Python openpyxl package?
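For the actual goal (largest and second-largest value per row), a small helper built on heapq.nlargest may be enough. This is a sketch on plain tuples. The function name `two_largest` and the sample rows are invented. In openpyxl, the tuples could come from `ws.iter_rows(..., values_only=True)`, assuming a version that supports that parameter.

```python
import heapq


def two_largest(values):
    """Return (largest, second largest), ignoring empty (None) cells."""
    numbers = [v for v in values if v is not None]
    top = heapq.nlargest(2, numbers)
    # Pad with None when a row has fewer than two numeric values.
    top += [None] * (2 - len(top))
    return top[0], top[1]


# Invented rows standing in for B3:J3, B4:J4, ...
rows = [
    (4.0, None, 9.5, 1.0, 7.2, 3.3, 9.5, 0.0, 2.0),
    (None, None, 5.0, None, None, None, None, None, None),
]
results = [two_largest(row) for row in rows]
```

Note that duplicates count separately here: if the maximum appears twice, it is returned as both the largest and the second largest. Deduplicate with set(numbers) first if distinct values are wanted.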

Openpyxl to check for keywords, then modify adjacent cells to contain those keywords and the total found

I'm using python 3.x and openpyxl to parse an excel .xlsx file.
For each row, I check a column (C) to see if any of those keywords match.
If so, I add them to a separate list variable and also determine how many keywords were matched.
I then want to add the actual keywords into the next cell, and the total of keywords into the cell after. This is where I am having trouble, actually writing the results.
contents of the keywords.txt and results.xlsx file
here
import openpyxl
# Here I read a keywords.txt file and input them into a keywords variable
# I throwaway the first line to prevent a mismatch due to the unicode BOM
with open("keywords.txt") as f:
    f.readline()
    keywords = [line.rstrip("\n") for line in f]
# Load the workbook
wb = openpyxl.load_workbook("results.xlsx")
ws = wb.get_sheet_by_name("Sheet")
# Iterate through every row, only looking in column C for the keyword match.
for row in ws.iter_rows("C{}:E{}".format(ws.min_row, ws.max_row)):
    # if there's a match, add to the keywords_found list
    keywords_found = [key for key in keywords if key in row[0].value]
    # if any keywords found, enter the keywords in column D
    # and how many keywords into column E
    if len(keywords_found):
        row[1].value = keywords_found
        row[2].value = len(keywords_found)
Now, I understand where I'm going wrong: ws.iter_rows(..) returns a tuple, which can't be modified. I figure I could use two for loops, one over the rows and another over the columns in each row, but this test is a small example of a real-world scenario in which the number of rows is in the tens of thousands.
I'm not quite sure what the best way to go about this is. Thank you in advance for any help that you can provide.
Use ws['C'] and then the offset() method of the relevant cell.
Thanks Charlie for the offset() tip. I modified the code slightly and now it works a treat.
for row in ws.iter_rows("C{}:C{}"...):
    for cell in row:
        ....
        if len(keywords_found):
            cell.offset(0,1).value = str(keywords_found)
            cell.offset(0,2).value = str(len(keywords_found))
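The matching step can be isolated into a small function that returns values a cell can hold directly: a joined string and an integer count, rather than a list. This is a sketch only. The name `match_keywords` and the sample keywords are invented, and the guard for non-string cells is an added assumption (an empty cell has the value None, on which the `in` test would fail).

```python
def match_keywords(text, keywords):
    """Return (comma-joined matches, match count) for one cell's text."""
    if not isinstance(text, str):
        # Empty or numeric cells cannot contain a keyword.
        return '', 0
    found = [key for key in keywords if key in text]
    return ', '.join(found), len(found)


keywords = ['error', 'timeout', 'retry']  # invented sample keywords
joined, total = match_keywords('timeout after retry', keywords)
empty_joined, empty_total = match_keywords(None, keywords)
```

In the loop above, the two results would then go to cell.offset(0, 1).value and cell.offset(0, 2).value. Keeping the count as an int rather than str(...) lets Excel treat it as a number.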
