I have a pandas dataframe that looks as shown in the screenshot. I want to apply conditional formatting using xlsxwriter to make the values of column "C" bold if the value in column "B" is "Total".
The code below does not seem to work:
bold = workbook.add_format({'bold': True})
l = ['C3:C500']
for columns in l:
    worksheet.conditional_format(columns, {'type': 'text',
                                           'criteria': 'containing',
                                           'value': 'Total',
                                           'font_color': "gray"})
Here is my updated code:
l = ['C3:C500']
for columns in l:
    worksheet.conditional_format(columns, {'type': 'formula',
                                           'criteria': '=$B3="Total"',
                                           'format': bold})
    worksheet.conditional_format(columns, {'type': 'formula',
                                           'criteria': '=$B3!="Total"',
                                           'font_color': "gray"})
The key to using conditional formats in XlsxWriter is to figure out what you want to do in Excel first.
In this case, if you want to format a cell based on the value in another cell, you need to use the "formula" conditional format type. You also need to make sure that you get the range and the absolute references (the ones with $ signs) correct.
Here is a working example based on your code:
import xlsxwriter
workbook = xlsxwriter.Workbook('conditional_format.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('B3', 'Total')
worksheet.write('B4', 'Foo')
worksheet.write('B5', 'Bar')
worksheet.write('B6', 'Total')
worksheet.write('C3', 'Yes')
worksheet.write('C4', 'Yes')
worksheet.write('C5', 'Yes')
worksheet.write('C6', 'Yes')
bold = workbook.add_format({'bold': True})
l = ['C3:C500']
for columns in l:
    worksheet.conditional_format(columns, {'type': 'formula',
                                           'criteria': '=$B3="Total"',
                                           'format': bold})
workbook.close()
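The updated code in the question has two further problems that the same approach fixes: Excel's not-equal operator is <> (Excel formulas have no !=), and XlsxWriter conditional formats take a 'format' object rather than a bare 'font_color' key. A minimal sketch with both rules, assuming the same B3:C500 layout; the temp-file path is only for the demo:

```python
import os
import tempfile

import xlsxwriter

path = os.path.join(tempfile.gettempdir(), "conditional_format_demo.xlsx")
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet()

# Sample data: labels in column B, values in column C, starting at Excel row 3
# (row index 2, because write() is zero-indexed).
labels = ["Total", "Foo", "Bar", "Total"]
for offset, label in enumerate(labels):
    worksheet.write(2 + offset, 1, label)  # B3, B4, ...
    worksheet.write(2 + offset, 2, "Yes")  # C3, C4, ...

bold = workbook.add_format({"bold": True})
gray = workbook.add_format({"font_color": "gray"})

# $B3 is column-absolute and row-relative, so Excel re-evaluates it as
# $B4, $B5, ... for each cell in C3:C500. Not-equal is written <> in Excel.
rules = [
    {"type": "formula", "criteria": '=$B3="Total"', "format": bold},
    {"type": "formula", "criteria": '=$B3<>"Total"', "format": gray},
]
for rule in rules:
    worksheet.conditional_format("C3:C500", rule)

workbook.close()
```

Each rule gets its own format object, so bold and gray stay independent.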
Output:
I have a CSV file. After some processing, it has to be saved as an Excel file.
I am opening it as a pandas dataframe and, after some cleaning (renaming and rearranging columns, dropping a few columns), I have to replace null values, or cells whose value is "N/A", with "DN". Currently I am using two replace calls and a fillna for this.
df.replace('', np.nan, inplace = True)
df.replace('N/A', np.nan, inplace = True)
df = df.fillna("DN")
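The two replace calls can be collapsed into one: DataFrame.replace accepts a list of values to map to NaN, and fillna then fills those together with any NaN/None that was already present. A small sketch with made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with empty strings, "N/A" and a real missing value.
df = pd.DataFrame({"a": ["x", "", "N/A"], "b": [None, "y", "N/A"]})

# One replace call for both markers, then fill every missing value with "DN".
df = df.replace(["", "N/A"], np.nan).fillna("DN")
print(df)
```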
Then I have to highlight the cells that contain the value "DN" in yellow.
I am trying the code from the post "How Do I Highlight Rows Of Data? Python Pandas issue", but nothing gets highlighted in the output Excel file. Below is the code I am currently working with:
df.replace('', np.nan, inplace = True)
df.replace('N/A', np.nan, inplace = True)
df = df.fillna("NA")
df.index = np.arange(1, len(df) + 1)
def high_color(val):
    color = 'yellow' if val == 'NA' else ''
    return 'color: {}'.format(color)
result = df.style.applymap(high_color)
writer_orig = pd.ExcelWriter(out_name, engine='xlsxwriter')
df.to_excel(writer_orig, sheet_name='report', index=True, index_label="S_No", freeze_panes=(1,1))
workbook = writer_orig.book
worksheet = writer_orig.sheets['report']
# Add a header format.
header_format = workbook.add_format({
'bold': True,
'fg_color': '#ffcccc',
'border': 1})
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
writer_orig.close()
Any kind of suggestions will be greatly helpful.
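Writing df through pandas.ExcelWriter will not carry over any Styler highlighting: df.to_excel writes the plain DataFrame, and the Styler object (result) is never saved. ExcelWriter is documented as: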
class pandas.ExcelWriter(path, engine=None, date_format=None,
datetime_format=None, mode='w', storage_options=None,
if_sheet_exists=None, engine_kwargs=None, **kwargs)
Class for writing DataFrame objects into excel sheets.
You need to use worksheet.conditional_format from xlsxwriter to highlight the cells that contain "DN". Also, you can pass na_values as a keyword argument to pandas.read_csv so that a list of values is treated as NaN automatically.
import pandas as pd
from xlsxwriter.utility import xl_col_to_name

df = pd.read_csv('/tmp/inputfile.csv', na_values=['', 'N/A']).fillna('DN')
# Excel letters of the data columns; +1 because the index occupies column A.
l = df.columns.get_indexer(df.columns).tolist()
xshape = list(map(xl_col_to_name, [e+1 for e in l]))
max_row, max_col = df.shape
# add_format() below is an XlsxWriter method, so request that engine explicitly.
with pd.ExcelWriter("/tmp/outputfile.xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='report', index=True,
                index_label='S_No', freeze_panes=(1,1))
    wb = writer.book
    ws = writer.sheets['report']
    format_header = wb.add_format({'bold': True, 'fg_color': '#ffcccc', 'border': 1})
    for idx, col in enumerate(['S_No'] + list(df.columns)):
        ws.write(0, idx, col, format_header)
    format_dn = wb.add_format({'bg_color':'yellow', 'font_color': 'black'})
    # String comparisons need the value wrapped in double quotes.
    ws.conditional_format(f'{xshape[0]}2:{xshape[-1]}{str(max_row+1)}',
                          {'type': 'cell', 'criteria': '==',
                           'value': '"DN"', 'format': format_dn})
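As an alternative to building letter ranges by hand, XlsxWriter's worksheet methods also accept zero-indexed (row, col) bounds, and xlsxwriter.utility.xl_range shows which A1 range those numbers mean. A sketch with hypothetical sample data standing in for the cleaned CSV:

```python
import os
import tempfile

import pandas as pd
from xlsxwriter.utility import xl_range

# Hypothetical data in place of the CSV contents.
df = pd.DataFrame({"a": ["x", "DN", "y"], "b": ["DN", "z", "DN"]})
max_row, max_col = df.shape

path = os.path.join(tempfile.gettempdir(), "dn_highlight_demo.xlsx")
with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="report", index=True, index_label="S_No")
    wb = writer.book
    ws = writer.sheets["report"]
    format_dn = wb.add_format({"bg_color": "yellow", "font_color": "black"})

    # Row 0 is the header and column 0 holds the index, so the data block
    # spans (1, 1) to (max_row, max_col) in zero-indexed coordinates.
    ws.conditional_format(1, 1, max_row, max_col,
                          {"type": "cell", "criteria": "==",
                           "value": '"DN"', "format": format_dn})

# The same bounds expressed in A1 notation.
data_range = xl_range(1, 1, max_row, max_col)
print(data_range)
```

For this 3x2 frame the range is B2:C4, the same one the f-string above produces.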
Output :
You have to export the result Styler to Excel, not df:
# Demo
def high_color(val):
    return 'background-color: yellow' if val == 'NA' else None
result = df.style.map(high_color)  # use df.style.applymap on pandas < 2.1
result.to_excel('styler1.xlsx')
df.to_excel('styler2.xlsx')
Export from result
Export from df
Name = [list(['Amy', 'A', 'Angu']),
list(['Jon', 'Johnson']),
list(['Bob', 'Barker'])]
Other = [list(['Amy', 'Any', 'Anguish']),
list(['Jon', 'Jan']),
list(['Baker', 'barker'])]
import pandas as pd
df = pd.DataFrame({'Other' : Other,
'ID': ['E123','E456','E789'],
'Other_ID': ['A123','A456','A789'],
'Name' : Name,
})
ID Name Other Other_ID
0 E123 [Amy, A, Angu] [Amy, Any, Anguish] A123
1 E456 [Jon, Johnson] [Jon, Jan] A456
2 E789 [Bob, Barker] [Baker, barker] A789
I have the df as seen above. I want to make columns ID, Name and Other into a dictionary with the key being ID. I tried this, following "python pandas dataframe columns convert to dict key and value":
todict = dict(zip(df.ID, df.Name))
which is close to what I want:
{'E123': ['Amy', 'A', 'Angu'],
'E456': ['Jon', 'Johnson'],
'E789': ['Bob', 'Barker']}
But I would like to get this output, which includes the values from the Other column:
{'E123': ['Amy', 'A', 'Angu','Amy', 'Any','Anguish'],
'E456': ['Jon', 'Johnson','Jon','Jan'],
'E789': ['Bob', 'Barker','Baker','barker']
}
And if I add the third column, Other, it gives me errors:
todict = dict(zip(df.ID, df.Name, df.Other))
How do I get the output I want?
Why not just combine the Name and Other columns before creating the dict from the Name column?
df['Name'] = df['Name'] + df['Other']
dict(zip(df.ID, df.Name))
Gives
{'E123': ['Amy', 'A', 'Angu', 'Amy', 'Any', 'Anguish'],
'E456': ['Jon', 'Johnson', 'Jon', 'Jan'],
'E789': ['Bob', 'Barker', 'Baker', 'barker']}
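If you would rather not overwrite df['Name'], the concatenation can happen inline. The three-argument dict(zip(...)) fails because zip then yields 3-tuples while dict() expects key/value pairs; a comprehension can unpack all three. A sketch rebuilding the question's frame:

```python
import pandas as pd

Name = [['Amy', 'A', 'Angu'], ['Jon', 'Johnson'], ['Bob', 'Barker']]
Other = [['Amy', 'Any', 'Anguish'], ['Jon', 'Jan'], ['Baker', 'barker']]
df = pd.DataFrame({'ID': ['E123', 'E456', 'E789'],
                   'Name': Name, 'Other': Other})

# Adding two object columns of lists concatenates them row by row,
# leaving df['Name'] unchanged.
todict = dict(zip(df.ID, df.Name + df.Other))

# Equivalent: unpack the 3-tuples from zip and build the pairs yourself.
todict2 = {i: n + o for i, n, o in zip(df.ID, df.Name, df.Other)}
print(todict)
```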
I have a dataframe as below. I want to apply conditional formatting to column "Data2" using the column name. I know how to define a format for a specific column letter, but I am not sure how to define it based on the column name, as shown below.
So basically I want to apply the same formatting by column name (because the order of the columns might change).
df1 = pd.DataFrame({'Data1': [10, 20, 30],
                    'Data2': ["a", "b", "c"]})
writer = pd.ExcelWriter('pandas_filter.xlsx', engine='xlsxwriter')
workbook = writer.book
df1.to_excel(writer, sheet_name='Sheet1', index=False)
worksheet = writer.sheets['Sheet1']
blue = workbook.add_format({'bg_color':'#000080', 'font_color': 'white'})
red = workbook.add_format({'bg_color':'#E52935', 'font_color': 'white'})
l = ['B2:B500']
for columns in l:
    worksheet.conditional_format(columns, {'type': 'text',
                                           'criteria': 'containing',
                                           'value': 'a',
                                           'format': blue})
    worksheet.conditional_format(columns, {'type': 'text',
                                           'criteria': 'containing',
                                           'value': 'b',
                                           'format': red})
# ExcelWriter.save() was removed in pandas 2.0; close() saves the file.
writer.close()
Using xlsxwriter's xl_col_to_name, we can get the column letter from the column index.
from xlsxwriter.utility import xl_col_to_name
target_col = xl_col_to_name(df1.columns.get_loc("Data2"))
l = [f'{target_col}2:{target_col}500']
for columns in l:
    ...
Using openpyxl's get_column_letter, we can get the column letter from the column index.
from openpyxl.utils import get_column_letter
target_col = get_column_letter(df1.columns.get_loc("Data2") + 1)  # add 1 because get_column_letter is 1-based
l = [f'{target_col}2:{target_col}500']
for columns in l:
    ...
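Putting the xlsxwriter variant together, a runnable sketch that looks the column up by name and sizes the range from the data instead of a fixed row 500. It assumes index=False (as in the question); with index=True the Excel column would be one further to the right. The temp-file path is only for the demo:

```python
import os
import tempfile

import pandas as pd
from xlsxwriter.utility import xl_col_to_name

df1 = pd.DataFrame({'Data1': [10, 20, 30],
                    'Data2': ["a", "b", "c"]})

path = os.path.join(tempfile.gettempdir(), "pandas_filter_demo.xlsx")
with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    blue = workbook.add_format({'bg_color': '#000080', 'font_color': 'white'})
    red = workbook.add_format({'bg_color': '#E52935', 'font_color': 'white'})

    # With index=False, the DataFrame position equals the zero-based Excel column.
    target_col = xl_col_to_name(df1.columns.get_loc("Data2"))
    # Header is on row 1, so data occupies rows 2 .. len(df1) + 1.
    cell_range = f'{target_col}2:{target_col}{len(df1) + 1}'

    for value, fmt in [('a', blue), ('b', red)]:
        worksheet.conditional_format(cell_range, {'type': 'text',
                                                  'criteria': 'containing',
                                                  'value': value,
                                                  'format': fmt})
```

Reordering the columns of df1 changes target_col automatically, which is the point of looking it up by name.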
I am using python-docx to extract two tables from a document.
I have iterated over the tables and created a list of lists. Each inner list represents a table, and within it I have one dictionary per row. The keys of each dictionary are the column headings from the table, and the values are that row's cell contents for those columns.
I am having difficulty creating a data frame for each table and writing each table to a separate Excel sheet.
from docx.api import Document
import pandas as pd
import csv
import json
import unicodedata
document = Document('Sampletable1.docx')
tables = document.tables
print (len(tables))
big_data = []
for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if i == 0:
            # The first row holds the column headings.
            keys = tuple(text)
            continue
        dic = dict(zip(keys, text))
        data.append(dic)
    big_data.append(data)
print(big_data)
The output of the above code is:
2
[[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '-0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%'}], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]
How do I access the above lists??
Further I tried to create a pandas data frame
#write the data into a data frame
for thing in big_data:
    #print(thing)
    df = pd.DataFrame(thing)
    print(df)
    writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.save()
I got the first table into the Excel file but was unable to work with the second table.
I am expecting both tables to be in the same Excel workbook (dftable3.xlsx) but in different worksheets (Sheet1, Sheet2).
I have attached the images of the tables.
Thanks in advance
How do I access the above lists??
You already did, by iterating over them, or printing them.
Consider using the pretty-print library:
import pprint
pprint.pprint(big_data)
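To pick out individual pieces, index first by table, then by row, then by column heading. A sketch on a trimmed copy of the structure printed in the question (only some keys kept):

```python
# Trimmed copy of big_data: one list per table, one dict per data row,
# keyed by that table's header cells.
big_data = [
    [{'Asset': 'Growth investments', 'Variance': '-0.20%'},
     {'Asset': 'Defensive investments', 'Variance': '0.20%'}],
    [{'Owner': 'REST Super', 'Product': 'Superannuation'},
     {'Owner': 'TWUSUPER', 'Product': 'Superannuation'}],
]

second_table = big_data[1]             # list of row dicts for table 2
first_owner = big_data[1][0]['Owner']  # 'REST Super'

# Walk every row of every table, numbering tables from 1.
for table_no, table in enumerate(big_data, start=1):
    for row in table:
        print(table_no, row)
```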
I am expecting ... different worksheets(Sheet1,Sheet2)
Well, that's unlikely, given the constant 'Sheet1' argument you supplied.
Here is one way to accomplish that:
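writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
# start=1 gives Sheet1, Sheet2, ... as requested
for i, thing in enumerate(big_data, start=1):
    df = pd.DataFrame(thing)
    df.to_excel(writer, sheet_name=f'Sheet{i}')
# ExcelWriter.save() was removed in pandas 2.0; close() saves the file.
writer.close()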
Note the scope of writer: it must live longer than each of the constituent dfs.
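The same idea with a context manager, which closes (and saves) the workbook exactly once after all sheets are written. Rows missing a heading, like the third row of the second table in the question, simply become empty cells. A sketch on trimmed sample data; the temp-file path is only for the demo:

```python
import os
import tempfile

import pandas as pd

# Trimmed data in the shape of big_data; the last row has no 'Customer 2' key.
big_data = [
    [{'Asset': 'Growth investments', 'Variance': '-0.20%'},
     {'Asset': 'Defensive investments', 'Variance': '0.20%'}],
    [{'Owner': 'REST Super', 'Customer 2': 'Customer 1'},
     {'Owner': 'TWUSUPER'}],
]

path = os.path.join(tempfile.gettempdir(), "dftable3_demo.xlsx")
frames = {}
with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
    for i, table in enumerate(big_data, start=1):
        df = pd.DataFrame(table)  # missing keys become NaN
        sheet = f'Sheet{i}'
        df.to_excel(writer, sheet_name=sheet, index=False)
        frames[sheet] = df
```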
I am trying to wrap text in pandas DataFrame columns, but this code wraps only the values in the columns, not the column headers.
I am using the code below (taken from Stack Overflow). Kindly suggest how to wrap the header of the DataFrame.
import pandas as pd
from xlsxwriter.utility import xl_col_to_name

long_text = 'aa aa ss df fff ggh ttr tre ww rr tt ww errr t ttyyy eewww rr55t e'
data = {'a':[long_text, long_text, 'a'],'c': [long_text,long_text,long_text],
        'b':[1,2,3]}
df = pd.DataFrame(data)
#choose columns of df for wrapping
cols_for_wrap = ['a','c']
writer = pd.ExcelWriter('aaa.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
#modifying output by style - wrap
workbook = writer.book
worksheet = writer.sheets['Sheet1']
wrap_format = workbook.add_format({'text_wrap': True})
#get positions of columns
for col in df.columns.get_indexer(cols_for_wrap):
    #convert the position to a column range like "A:A"
    excel_header = xl_col_to_name(col) + ':' + xl_col_to_name(col)
    #alternative: None means the width is left unchanged
    #worksheet.set_column(excel_header, None, wrap_format)
    #width = 10
    worksheet.set_column(excel_header, 10, wrap_format)
writer.close()
In the header_format snippet that jmcnamara linked, you can add or remove whichever formats you do or do not want.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})
# Write the column headers with the defined format.
# col_num + 1 skips the index column; use col_num if you wrote with index=False.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
My code looks like this:
h_format = workbook.add_format({'text_wrap': True})
...
...
...
for col_num, value in enumerate(df_new.columns.values):
    worksheet.write(0, col_num, value, h_format)
writer.close()
This is covered almost exactly in the Formatting of the Dataframe headers section of the XlsxWriter docs.
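Applied to the question's data, a sketch of that approach: pandas writes its header cells with their own cell format, which overrides any column format, so the header row is skipped (header=False, startrow=1) and rewritten with a wrapping format. The long header names are hypothetical, chosen so the wrap is visible, and index=False means no +1 column offset:

```python
import os
import tempfile

import pandas as pd

long_text = 'aa aa ss df fff ggh ttr tre ww rr tt ww errr t ttyyy eewww rr55t e'
# Hypothetical long column names so the header wrapping is visible.
df = pd.DataFrame({'a column header long enough to wrap': [long_text, long_text, 'a'],
                   'another long column header': [long_text, long_text, long_text],
                   'b': [1, 2, 3]})

path = os.path.join(tempfile.gettempdir(), "wrapped_header_demo.xlsx")
with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
    # Leave row 0 free for our own header; data starts on the second row.
    df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False, index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    header_format = workbook.add_format({'bold': True, 'text_wrap': True,
                                         'valign': 'top', 'border': 1})
    wrap_format = workbook.add_format({'text_wrap': True})

    # index=False, so DataFrame column i is Excel column i.
    for col_num, value in enumerate(df.columns.values):
        worksheet.write(0, col_num, value, header_format)

    # Narrow columns with wrapping for the data cells.
    worksheet.set_column(0, len(df.columns) - 1, 10, wrap_format)
```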