Openpyxl 2.6.0 save issue - python-3.x

I have an issue when I try to save an Excel workbook that contains comments.
Without any comments in the Excel file, there is no issue with my scripts. I simply use:
wb_archive = load_workbook(archive_file)
However, if the file I want to save has comments, it fails with the message:
AttributeError: 'NoneType' object has no attribute 'read'
So, I open the file with:
wb_archive = load_workbook(archive_file, keep_vba=True)
The first run is OK; however, the second one always fails with the error:
KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive"
Am I wrong somewhere in my code?
#!/usr/bin/env python3
# coding: utf8
"""
Program to extract Excel data to an archive
Python version 3.5
"""
# Standard library
from pathlib import Path
from datetime import date
# External libraries
from openpyxl import load_workbook

filein = "file1.xlsx"
fileout = "file2.xlsx"
def xlarchive(source_file, source_sheet, archive_file, archive_sheet,
              source_start_line=0, archiving_method="NEW", option_date=False):
    """
    Save data from an Excel workbook (source) into another one (archive).
    Variables shall be checked before calling the function.
    :param source_file: file the data are copied from (source)
    :type source_file: Path
    :param source_sheet: name of the sheet where data are located in the source file
    :type source_sheet: string
    :param archive_file: file the data are copied to (destination)
    :type archive_file: Path
    :param archive_sheet: name of the sheet the data are copied to in the destination file
    :type archive_sheet: string
    :param source_start_line: first row of the source sheet to copy
    :type source_start_line: int
    :param archiving_method: defines whether the destination sheet has to be recreated
    :type archiving_method: string
    :param option_date: defines whether the extraction date shall be recorded
    :type option_date: bool
    :return: None
    """
    wb_source = load_workbook(source_file)
    ws_source = wb_source[source_sheet]
    # keep_vba=True was tried here to work around the issue with comments
    wb_archive = load_workbook(archive_file)
    ws_archive = wb_archive[archive_sheet]
    if archiving_method == "NEW":
        # index of [archive_sheet] sheet
        idx = wb_archive.sheetnames.index(archive_sheet)
        # remove [ws_archive]
        wb_archive.remove(ws_archive)
        # create an empty sheet [ws_archive] using the old index
        wb_archive.create_sheet(archive_sheet, idx)
        ws_archive = wb_archive[archive_sheet]
    # If an extraction has already been performed the same day, the previous data are replaced.
    # Dates are stored in Excel as YYYY-MM-DD HH:MM:SS, while date.today() gives YYYY-MM-DD,
    # so the comparison is done on the first 10 characters of the string, and
    # delete_rows() then removes the matching rows at the bottom of the sheet.
    if option_date:
        j = 0
        for i in range(ws_archive.max_row, 1, -1):
            if str(ws_archive.cell(row=i, column=1).value)[0:10] == str(date.today()):
                j += 1
        ws_archive.delete_rows(ws_archive.max_row - j + 1, j)
    for row in ws_source.iter_rows(min_row=source_start_line):
        complete_row = [item.value for item in row]
        if option_date:
            complete_row.insert(0, str(date.today()))
        ws_archive.append(complete_row)
    wb_archive.save(archive_file)

xlarchive(filein, "Sheet1", fileout, "Sheet1", option_date=True,
          archiving_method="False", source_start_line=2)
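Since an .xlsx file is just a ZIP archive, one way to narrow down the KeyError is to check, after each save, whether the xl/sharedStrings.xml part is still present in the file. This is only a diagnostic sketch; the helper name is mine, not part of openpyxl:

```python
import zipfile

def has_shared_strings(xlsx_path):
    """Return True if the workbook archive still contains its shared-strings part."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return "xl/sharedStrings.xml" in archive.namelist()
```

If has_shared_strings returns False right after the first wb_archive.save(), the save performed with keep_vba=True is what broke the file, not the second load. keep_vba is intended for macro-enabled .xlsm files, so saving a plain .xlsx that was loaded that way can produce an incomplete archive.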

Extract some data from a text file

I am not so experienced in Python.
I have a “CompilerWarningsAllProtocol.txt” file that contains something like this:
" adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused
adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable
adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID"
How can I extract these three paths (C:..., E:..., C:...) from the txt file and fill an Excel column named “Affected Item”?
Can I do it with the re.findall or re.search methods?
For now the script checks whether the input txt file exists in my location and confirms it. After that it creates the blank Excel file with headers, but I don't know how to populate the Excel file with these paths in the "Affected Item" column, let's say.
Thanks for the help. I will copy-paste the code:
import os
import os.path
import re
import xlsxwriter
import openpyxl
from jira import JIRA
import pandas as pd
import numpy as np

# Print an error message if no "CompilerWarningsAllProtocol.txt" file exists in the folder
inputpath = r'D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt'
if os.path.isfile(inputpath) and os.access(inputpath, os.R_OK):
    print("'CompilerWarningsAllProtocol.txt' exists and is readable")
else:
    print("Either the file is missing or not readable")

# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('CompilerWarningsFresh.xlsx')
worksheet = workbook.add_worksheet('Results')
# Widen the columns correspondingly.
worksheet.set_column('A:A', 20)
worksheet.set_column('B:AZ', 45)
# Create the headers
headers = ('Module', 'Affected Item', 'Issue', 'Class of Issue', 'Issue Root Cause', 'Type of Issue',
           'Source of Issue', 'Test sequence', 'Current Issue appearances in module')
# Make the headers bold and set the font size
format1 = workbook.add_format({'bold': True, 'font_color': 'black'})
format1.set_font_size(14)
format1.set_border()
row = col = 0
for item in headers:
    worksheet.write(row, col, item, format1)
    col += 1
workbook.close()
I agree with @dantechguy that CSV is probably easier (and more lightweight) than writing a real xlsx file, but if you want to stick to the Excel format, the code below will work. Also, based on the code you've provided, you don't need to import openpyxl, jira, pandas or numpy.
The regex here matches full paths with any drive letter A-Z, followed by "type:warning". If you don't need to check for the warning and simply want to get every path in the file, you can delete everything in the regex after \S+. And if you know you'll only ever want drives C and E, just change A-Z to CE.
warningPathRegex = r"[A-Z]:\\\S+(?=\s*type:warning)"
compilerWarningFile = r"D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt"
warningPaths = []
with open(compilerWarningFile, 'r') as f:
    fullWarningFile = f.read()
    warningPaths = re.findall(warningPathRegex, fullWarningFile)
# ... open Excel file, then before workbook.close():
pathColumn = 1  # Affected Item
for num, warningPath in enumerate(warningPaths):
    worksheet.write(num + 1, pathColumn, warningPath)  # num + 1 to skip the header row
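To see the regex in isolation, here is a self-contained run of it against the three sample lines copied from the question:

```python
import re

# Sample lines from the question; each path is followed by "type:warning".
sample = r"""adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused
adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable
adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID"""

# Match a drive letter and backslash, then non-whitespace characters,
# but only when followed (after optional whitespace) by "type:warning".
warning_path_regex = r"[A-Z]:\\\S+(?=\s*type:warning)"
paths = re.findall(warning_path_regex, sample)
print(paths)
# ['C:\\Work\\CompilerWarnings\\adm_1.h',
#  'E:\\Work\\CompilerWarnings\\adm_basic.h',
#  'C:\\Work\\CompilerWarnings\\adm_X.h']
```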

Updating excel sheet with Pandas without overwriting the file

I am trying to update an Excel sheet with Python. I read a specific cell and update it accordingly, but Pandas overwrites the entire sheet, so I lose the other sheets as well as the formatting. Can anyone tell me how I can avoid that?
Record = pd.read_excel("Myfile.xlsx", sheet_name='Sheet1', index_col=False)
Record.loc[1, 'WORDS'] = int(self.New_Word_box.get())
Record.loc[1, 'STATUS'] = self.Stat.get()
Record.to_excel("Myfile.xlsx", sheet_name='Student_Data', index=False)
My code is above; as you can see, I only want to update a few cells, but it overwrites the entire Excel file. I tried to search for an answer but couldn't find a specific one.
Appreciate your help.
Update: added more clarifications
Steps:
1) Read the sheet that needs changes into a dataframe and make the changes in that dataframe.
2) Now the changes are reflected in the dataframe but not in the sheet. Use the following function with the dataframe from step 1 and the name of the sheet to be modified. Use the truncate_sheet param to completely replace the sheet of concern.
The function call would be like so:
append_df_to_excel(filename, df, sheet_name, startrow=0, truncate_sheet=True)
from openpyxl import load_workbook
import pandas as pd

def append_df_to_excel(filename, df, sheet_name="Sheet1", startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : file path or existing ExcelWriter
                 (example: "/path/to/file.xlsx")
      df : DataFrame to save to workbook
      sheet_name : name of the sheet which will contain the DataFrame
                   (default: "Sheet1")
      startrow : upper-left cell row to dump the data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing the DataFrame to the Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be a dictionary]
    Returns: None
    """
    # ignore [engine] parameter if it was passed
    if "engine" in to_excel_kwargs:
        to_excel_kwargs.pop("engine")
    writer = pd.ExcelWriter(filename, engine="openpyxl")
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    if "index" not in to_excel_kwargs:
        to_excel_kwargs["index"] = False
    try:
        # try to open an existing workbook
        if "header" not in to_excel_kwargs:
            to_excel_kwargs["header"] = True
        # NOTE: assigning to writer.book works on older pandas (< 1.5);
        # newer pandas makes this attribute read-only
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
            to_excel_kwargs["header"] = False
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        to_excel_kwargs["header"] = True
        if startrow is None:
            startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
We can't swap in the xlsxwriter engine here to write the Excel file, as asked in the comments, because xlsxwriter cannot open existing workbooks. Refer to reference 2.
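As an aside, on recent pandas (1.3+) the same replace-one-sheet behaviour is available without patching writer.book, via the mode and if_sheet_exists arguments of pd.ExcelWriter. A sketch (the file name and data here are made up; note that the replaced sheet's cell formatting is still not preserved, only the other sheets are left intact):

```python
import os
import tempfile
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "Myfile.xlsx")

# Build a workbook with two sheets to stand in for the real file.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"WORDS": [10], "STATUS": ["old"]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"OTHER": [1]}).to_excel(
        writer, sheet_name="Sheet2", index=False)

# mode="a" appends to the existing workbook; if_sheet_exists="replace"
# swaps out just Sheet1 and leaves Sheet2 (and its data) alone.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    pd.DataFrame({"WORDS": [99], "STATUS": ["new"]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
```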
References:
1) https://stackoverflow.com/a/38075046/6741053
2) xlsxwriter: is there a way to open an existing worksheet in my workbook?

Import multiple .xlsx files, perform arithmetic operations, and write the results to a single new xlsx file with multiple sheets using openpyxl?

I am using openpyxl to open one xlsx file, perform a few arithmetic operations, and save the result in a new xlsx file. Now I want to import many files, apply the same operations, and store all the results in a single xlsx file with multiple sheets.
from openpyxl import Workbook
import openpyxl

# Read the source file
wb = openpyxl.load_workbook(filename=r"C:\Users\server\Desktop\Python\Data.xlsx", read_only=True)
ws = wb['Sheet1']  # move into Sheet1
# Loop through the row data of the sheet with a comprehension
row_data = [[cell.value for cell in row] for row in ws.rows]
header_data = row_data[0]  # keep the header row aside
row_data = row_data[1:]    # store the xlsx file data in a 2D list
# Multiply column 1 by column 2 in each row and append the result as a new column
[dp.append(dp[1] * dp[2]) for dp in row_data]
wb.close()  # close the workbook

wb = openpyxl.Workbook()  # open a new workbook
ws = wb.active            # Sheet1 is active
ws.append(header_data)    # write the header row
for row in row_data:      # write the 2D list data into Sheet1
    ws.append(row)
wb.save(r"C:\Users\server\Desktop\Python\Result.xlsx")
I am able to store multiple xlsx file names in a list. Now I want to access each file's data, perform a few arithmetic operations, and finally store the results in a single xlsx file with multiple sheets in it.
from openpyxl import Workbook
import openpyxl
import os

location = r"C:\Users\server\Desktop\Python"  # folder where the many xlsx files are present
counter = 0      # keep a count of all files found
xlsx_files = []  # list to store all xlsx files found at the location
for file in os.listdir(location):
    if file.endswith(".xlsx"):
        print("xlsx file found:\t", file)
        xlsx_files.append(file)
        counter += 1
if counter == 0:
    print("No files found here!")
print("Total files found:\t", counter)
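Building on that list, one possible sketch of the multi-file step: loop over the folder, apply the same column-1 * column-2 operation as the single-file code above, and give each input file its own sheet in one result workbook. The helper name is mine, and it assumes every input file has the same Sheet1 layout as in the question:

```python
import os
import openpyxl

def archive_folder(folder, result_path):
    """Process every .xlsx file in `folder` and write each file's result
    to its own sheet of a single output workbook."""
    out = openpyxl.Workbook()
    out.remove(out.active)  # drop the default empty sheet
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".xlsx"):
            continue
        wb = openpyxl.load_workbook(os.path.join(folder, name), read_only=True)
        ws = wb["Sheet1"]
        rows = [[cell.value for cell in row] for row in ws.rows]
        wb.close()
        header, data = rows[0], rows[1:]
        for dp in data:
            dp.append(dp[1] * dp[2])  # same arithmetic as the single-file version
        # Excel limits sheet names to 31 characters
        sheet = out.create_sheet(title=os.path.splitext(name)[0][:31])
        sheet.append(header)
        for dp in data:
            sheet.append(dp)
    out.save(result_path)
```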

Convert multiple .txt files into single .csv file (python)

I need to convert a folder with around 4,000 .txt files into a single .csv with two columns:
(1) Column 1: 'File Name' (as specified in the original folder);
(2) Column 2: 'Content' (which should contain all text present in the corresponding .txt file).
Here you can see some of the files I am working with.
The most similar question to mine here is this one (Combine a folder of text files into a CSV with each content in a cell) but I could not implement any of the solutions presented there.
The last one I tried was the Python code proposed in the aforementioned question by Nathaniel Verhaaren but I got the exact same error as the question's author (even after implementing some suggestions):
import os
import csv

dirpath = 'path_of_directory'
output = 'output_file.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['FileName', 'Content'])
    files = os.listdir(dirpath)
    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            csvout.writerow([filename, afile.read()])
            afile.close()
outfile.close()
Other questions which seemed similar to mine (for example, Python: Parsing Multiple .txt Files into a Single .csv File?, Merging multiple .txt files into a csv, and Converting 1000 text files into a single csv file) do not solve this exact problem I presented (and I could not adapt the solutions presented to my case).
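For comparison, here is a minimal working version of that exact approach using only the standard library. Opening the output with newline='' avoids the blank-line artifacts csv.writer otherwise produces on Windows, and the with blocks make the explicit close() calls unnecessary (the function name is mine):

```python
import csv
from pathlib import Path

def txt_folder_to_csv(dirpath, output):
    """Write one CSV row (file name, file content) per .txt file in dirpath.
    Returns the number of files written."""
    files = sorted(Path(dirpath).glob("*.txt"))
    # newline="" stops csv.writer from inserting extra blank lines on Windows
    with open(output, "w", newline="", encoding="utf-8") as outfile:
        csvout = csv.writer(outfile)
        csvout.writerow(["FileName", "Content"])
        for path in files:
            csvout.writerow([path.name, path.read_text(encoding="utf-8")])
    return len(files)
```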
I had a similar requirement, so I wrote the following class:
import os
import pathlib
import glob
import csv
from collections import defaultdict

class FileCsvExport:
    """Generate a CSV file containing the name and contents of all files found"""

    def __init__(self, directory: str, output: str, header = None, file_mask = None, walk_sub_dirs = True, remove_file_extension = True):
        self.directory = directory
        self.output = output
        self.header = header
        self.pattern = '**/*' if walk_sub_dirs else '*'
        if isinstance(file_mask, str):
            self.pattern = self.pattern + file_mask
        self.remove_file_extension = remove_file_extension
        self.rows = 0

    def export(self) -> bool:
        """Return True if the CSV was created"""
        return self.__make(self.__generate_dict())

    def __generate_dict(self) -> defaultdict:
        """Finds all files recursively based on the specified parameters and returns a defaultdict"""
        csv_data = defaultdict(list)
        for file_path in glob.glob(os.path.join(self.directory, self.pattern), recursive = True):
            path = pathlib.Path(file_path)
            if not path.is_file():
                continue
            content = self.__get_content(path)
            name = path.stem if self.remove_file_extension else path.name
            csv_data[name].append(content)
        return csv_data

    @staticmethod
    def __get_content(file_path: str) -> str:
        with open(file_path) as file_object:
            return file_object.read()

    def __make(self, csv_data: defaultdict) -> bool:
        """
        Takes a defaultdict of {k, [v]} where k is the file name and v is a list of file contents.
        Writes out these values to a CSV and returns True when complete.
        """
        with open(self.output, 'w', newline = '') as csv_file:
            writer = csv.writer(csv_file, quoting = csv.QUOTE_ALL)
            if isinstance(self.header, list):
                writer.writerow(self.header)
            for key, values in csv_data.items():
                for duplicate in values:
                    writer.writerow([key, duplicate])
                    self.rows = self.rows + 1
        return True
Which can be used like so:
...
myFiles = r'path/to/files/'
outputFile = r'path/to/output.csv'
exporter = FileCsvExport(directory = myFiles, output = outputFile, header = ['File Name', 'Content'], file_mask = '.txt')
if exporter.export():
    print(f"Export complete. Total rows: {exporter.rows}.")
In my example directory, this returns:
Export complete. Total rows: 6.
Note: rows does not count the header if present
This generated the following CSV file:
"File Name","Content"
"Test1","This is from Test1"
"Test2","This is from Test2"
"Test3","This is from Test3"
"Test4","This is from Test4"
"Test5","This is from Test5"
"Test5","This is in a sub-directory"
Optional parameters:
header: Takes a list of strings that will be written as the first line in the CSV. Default None.
file_mask: Takes a string that can be used to specify the file type; for example, .txt will cause it to only match .txt files. Default None.
walk_sub_dirs: If set to False, it will not search in sub-directories. Default True.
remove_file_extension: If set to False, it will cause the file name to be written with the file extension included; for example, File.txt instead of just File. Default True.

Python script that checks the same sheet in 300 xlsx files, compares it to a master sheet and updates it accordingly

I have a list of around 300 Excel files, all named following the pattern [aA-zZ]{1}[0-9]{5}.xlsx, and a master file. I'm trying to put together a Python script that reads the same sheet/column in each file, compares it to the master file's sheet, and updates it accordingly.
I've been trying openpyxl but I'm hopelessly stuck, any help is very much appreciated.
#!Python3
import openpyxl
import pandas as pd
import os

# Move to the correct location
path = "/usr/tmp/files"
os.chdir(path)
# First we open the master file
wb = load_workbook('master.xlsx')
# grab master worksheet in master.xlsx
ws = wb.active('Sheet1')

# Second we open the rest of the files that include changes and compare with the data in master.xlsx
def main():
    for f in files:
        wb2 = load_workbook(f)
        ws2 = wb2['Sheet1']
        # read first workbook to get data
        wb2 = load_workbook(filename='.xlsx')
        ws2 = wb2.get_sheet_by_name(name='Sheet1')
        # Iterate through worksheet and compare with master sheet for changes
        for row in ws.iter_rows():
            for cell in row:
                cellContent = str(cell.value)
                if cellContent == 'yes':
                    wb = load_workbook('master.xlsx', optimized_write=True)
                    # Update cell contents
                    ws[cell] = cellContent
    # Save workbook
    wb.save('master.xlsx')

if __name__ == '__main__':
    main()
Thanks!!
!!!!!EDITED CODE!!!!!
#!Python3
from openpyxl import *
import pandas as pd
import os
import re

# Move to the correct location
path = "/usr/tmp/files"
os.chdir(path)
# First we open the master file
wb = load_workbook('master.xlsx')
# grab master worksheet in master.xlsx
ws = wb.get_sheet_by_name('Sheet1')

# Open the rest of the files that include changes and compare with the data in master.xlsx
def main():
    files = [f for f in os.listdir('.') if re.match(r'[A-Za-z][0-9]{5}\.xlsx', f)]
    # read each workbook to get data
    for f in files:
        wb2 = load_workbook(f)
        ws2 = wb2.get_sheet_by_name('Sheet1')
        # Iterate through worksheet and compare with master sheet for changes
        for row in ws2.iter_rows():
            for cell in row:
                cellContent = str(cell.value)
                if cellContent == "yes":
                    wb = Workbook(write_only=True)
                    # Update cell content
                    ws[cell.coordinate] = str(cellContent)
                else:
                    continue
    # Save workbook
    wb.save('master.xlsx')

if __name__ == '__main__':
    main()
Depending on the size of the files I'd be tempted to do this in two steps: first read in the files in read-only mode and identify whether they need correcting or not.
Then go through the list of the files that need correcting and update only these. This is probably the fastest way to do this because it stops you loading all the worksheets of all the workbooks into memory.
NB. the syntax you're using is for older versions of openpyxl and no longer supported. I strongly advise you update to >= 2.4 and refer to the official documentation first and foremost.
To get the matching files, first:
import re
then in the first line of main() define files:
files = [f for f in os.listdir('.') if re.match(r'[A-Za-z][0-9]{5}\.xlsx', f)]
Note: I changed your regex pattern on the assumption that the filenames of interest are one letter (upper or lower case) followed by 5 digits followed by '.xlsx'.
Hope that helps
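A quick sanity check of that filename pattern (the file names below are invented). Note that re.fullmatch anchors the pattern at both ends, whereas re.match only anchors the start, so a name like a12345.xlsx.bak would still slip through re.match:

```python
import re

pattern = r'[A-Za-z][0-9]{5}\.xlsx'
candidates = ['a12345.xlsx', 'Z99999.xlsx', 'master.xlsx',
              'ab1234.xlsx', 'a12345.xlsx.bak']

# fullmatch requires the whole name to match the pattern
matching = [name for name in candidates if re.fullmatch(pattern, name)]
print(matching)  # ['a12345.xlsx', 'Z99999.xlsx']
```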
