How can I create an excel file with multiple sheets that stores content of a text file using python - excel

I need to create an Excel file in which each sheet holds the contents of one text file from my directory. For example, if I have two text files, I should get two sheets, each containing the content of one text file.
I've managed to create the Excel file, but I could only fill it with the contents of the last text file in my directory. However, I need to read all my text files and save them into Excel.
This is my code so far:
import os
import glob
import xlsxwriter
file_name='WriteExcel.xlsx'
path = 'C:/Users/khouloud.ayari/Desktop/khouloud/python/Readfiles'
txtCounter = len(glob.glob1(path,"*.txt"))
for filename in glob.glob(os.path.join(path, '*.txt')):
    f = open(filename, 'r')
    content = f.read()
    print (len(content))
    workbook = xlsxwriter.Workbook(file_name)
    ws = workbook.add_worksheet("sheet" + str(i))
    ws.set_column(0, 1, 30)
    ws.set_column(1, 2, 25)
    parametres = (
        ['file', content],
    )
    # Start from the first cell. Rows and
    # columns are zero indexed.
    row = 0
    col = 0
    # Iterate over the data and write it out row by row.
    for name, parametres in (parametres):
        ws.write(row, col, name)
        ws.write(row, col + 1, parametres)
        row += 1
    workbook.close()
Example:
If I have two text files, where the first contains 'hello' and the second contains 'world', I need two worksheets: the first should store 'hello' and the second should store 'world'.
But both of my worksheets contain 'world'.

I recommend using pandas. It can use xlsxwriter under the hood to write data (whole tables) to Excel files, and it makes the task much easier: it takes literally a couple of lines of code.
import pandas as pd
df_1 = pd.DataFrame({'data': ['Hello']})
sn_1 = 'hello'
df_2 = pd.DataFrame({'data': ['World']})
sn_2 = 'world'
filename_excel = '1.xlsx'
with pd.ExcelWriter(filename_excel) as writer:
    for df, sheet_name in zip([df_1, df_2], [sn_1, sn_2]):
        df.to_excel(writer, index=False, header=False, sheet_name=sheet_name)
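For the xlsxwriter code in the question, a minimal sketch of a fix could look like the following. It assumes the cause of the problem is that the workbook is re-created on every pass through the loop, so it opens the workbook once and adds one worksheet per text file. The helper names `read_text_files` and `write_sheets` and the `sheetN` naming are illustrative choices, not part of the original code. xlsxwriter is imported inside the writer function so that the reader function can be used without it.

```python
import glob
import os


def read_text_files(folder):
    """Return a list of (sheet_name, content) pairs, one per .txt file."""
    pairs = []
    # sorted() gives a stable file order, so sheet numbering is repeatable.
    paths = sorted(glob.glob(os.path.join(folder, "*.txt")))
    for i, path in enumerate(paths, start=1):
        with open(path, "r") as f:
            pairs.append((f"sheet{i}", f.read()))
    return pairs


def write_sheets(pairs, out_path):
    """Write each (sheet_name, content) pair to its own worksheet."""
    import xlsxwriter  # third-party; imported here so read_text_files does not need it

    # Create the workbook once, outside the loop, and close it once at the end.
    workbook = xlsxwriter.Workbook(out_path)
    for sheet_name, content in pairs:
        ws = workbook.add_worksheet(sheet_name)
        ws.set_column(0, 1, 30)
        ws.write(0, 0, "file")
        ws.write(0, 1, content)
    workbook.close()
```

With the question's variables this would be called as `write_sheets(read_text_files(path), file_name)`.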

Related

How do I convert multiple multiline txt files to excel - ensuring each file is its own line, then each line of text is its own row? Python3

Using openpyxl and Path I aim to:
Create multiple multiline .txt files,
then insert .txt content into a .xlsx file ensuring file 1 is in column 1 and each line has its own row.
I thought I would create a nested list and then loop through it to insert the text. I cannot figure out how to ensure that every string in the nested list is displayed. This is what I have so far, which nearly does what I want; however, it just repeats the first line of text.
from pathlib import Path
import openpyxl
listOfText = []
wb = openpyxl.Workbook() # Create a new workbook to insert the text files
sheet = wb.active
for txtFile in range(5): # create 5 text files
    createTextFile = Path('textFile' + str(txtFile) + '.txt')
    createTextFile.write_text(f'''Hello, this is a multiple line text file.
My Name is x.
This is text file {txtFile}.''')
    readTxtFile = open(createTextFile)
    listOfText.append(readTxtFile.readlines()) # nest the list from each text file into a parent list
    textFileList = len(listOfText[txtFile]) # get the number of lines of text from the file. They are all 3 as made above
# Each column displays text from each text file
for row in range(1, txtFile + 1):
    for col in range(1, textFileList + 1):
        sheet.cell(row=row, column=col).value = listOfText[txtFile][0]
wb.save('importedTextFiles.xlsx')
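The output is 4 columns/4 rows, all of which say the same thing: 'Hello, this is a multiple line text file.'
I'd appreciate any help with this!
The problem is in the for loop that writes the cells. Change the line sheet.cell(row=row, column=col).value = listOfText[txtFile][0] to sheet.cell(row=col, column=row).value = listOfText[row-1][col-1] and it will work: the outer loop variable then selects the file (and its column), and the inner one selects the line (and its row). Note that range(1, txtFile + 1) stops one short of the last file, so use range(1, len(listOfText) + 1) if all five files should appear.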
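The index mapping can also be written without the manual range arithmetic. The following is only a sketch: `cell_assignments` and `write_columns` are hypothetical helper names, and stripping the trailing newline from each line is an added choice that the original code does not make.

```python
def cell_assignments(list_of_text):
    """Yield (row, column, value) so file i fills column i + 1, one line per row."""
    for file_index, lines in enumerate(list_of_text):
        for line_index, line in enumerate(lines):
            # openpyxl rows and columns are 1-based, hence the + 1.
            yield line_index + 1, file_index + 1, line.rstrip("\n")


def write_columns(list_of_text, out_path):
    """Write the nested list to a new workbook, one text file per column."""
    import openpyxl  # third-party; imported here so cell_assignments does not need it

    wb = openpyxl.Workbook()
    sheet = wb.active
    for row, column, value in cell_assignments(list_of_text):
        sheet.cell(row=row, column=column).value = value
    wb.save(out_path)
```

With the question's data, `write_columns(listOfText, 'importedTextFiles.xlsx')` would cover every file in the list, including the last one.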

Extract some data from a text file

I am not so experienced in Python.
I have a “CompilerWarningsAllProtocol.txt” file that contains something like this:
" adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused
adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable
adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID"
How can I extract these three paths (C:..., E:..., C:...) from the txt file and use them to fill an Excel column named “Affected Item”?
Can I do it with the re.findall or re.search methods?
For now, the script checks whether the input txt file exists in my location and confirms it. After that, it creates the blank Excel file with headers, but I don't know how to populate the Excel file with these paths, written in the column “Affected Item”, let's say.
Thanks for the help. I will copy-paste the code:
import os
import os.path
import re
import xlsxwriter
import openpyxl
from jira import JIRA
import pandas as pd
import numpy as np
# Print error message if no "CompilerWarningsAllProtocol.txt" file exists in the folder
inputpath = r'D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt'
if os.path.isfile(inputpath) and os.access(inputpath, os.R_OK):
    print(" 'CompilerWarningsAllProtocol.txt' exists and is readable")
else:
    print("Either the file is missing or not readable")
# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('CompilerWarningsFresh.xlsx')
worksheet = workbook.add_worksheet('Results')
# Widen correspondingly the columns.
worksheet.set_column('A:A', 20)
worksheet.set_column('B:AZ', 45)
# Create the headers
headers=('Module','Affected Item', 'Issue', 'Class of Issue', 'Issue Root Cause', 'Type of Issue',
'Source of Issue', 'Test sequence', 'Current Issue appearances in module')
# Create the bold headers and font size
format1 = workbook.add_format({'bold': True, 'font_color': 'black',})
format1.set_font_size(14)
format1.set_border()
row=col=0
for item in (headers):
    worksheet.write(row, col, item, format1)
    col += 1
workbook.close()
I agree with @dantechguy that csv is probably easier (and more lightweight) than writing a real xlsx file, but if you want to stick to the Excel format, the code below will work. Also, based on the code you've provided, you don't need to import openpyxl, jira, pandas or numpy.
The regex here matches full paths with any drive letter A-Z, followed by "type:warning". If you don't need to check for the warning and simply want to get every path in the file, you can delete everything in the regex after \S+. And if you know you'll only ever want drives C and E, just change A-Z to CE.
warningPathRegex = r"[A-Z]:\\\S+(?=\s*type:warning)"
compilerWarningFile = r"D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt"
warningPaths = []
with open(compilerWarningFile, 'r') as f:
    fullWarningFile = f.read()
warningPaths = re.findall(warningPathRegex, fullWarningFile)
# ... open Excel file, then before workbook.close():
pathColumn = 1 # Affected item
for num, warningPath in enumerate(warningPaths):
    worksheet.write(num + 1, pathColumn, warningPath) # num + 1 to skip header row
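To see what the pattern returns, here is a small self-contained check that runs it against the three sample lines from the question. The variable names are illustrative, and the two shorter patterns are the variations described above (no "type:warning" check, and drives C and E only).

```python
import re

# Same pattern as above: drive letter, colon, backslash, a run of non-space
# characters, followed (via lookahead) by "type:warning".
warning_path_regex = r"[A-Z]:\\\S+(?=\s*type:warning)"

# Sample lines copied from the question.
sample = "\n".join([
    r"adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused",
    r"adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable",
    r"adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID",
])

paths = re.findall(warning_path_regex, sample)

# Variations: every path regardless of warning type, and drives C and E only.
all_paths = re.findall(r"[A-Z]:\\\S+", sample)
c_and_e_paths = re.findall(r"[CE]:\\\S+", sample)

for path in paths:
    print(path)
```

For this sample all three patterns find the same three paths, because every line carries "type:warning" and only drives C and E occur.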

Updating excel sheet with Pandas without overwriting the file

I am trying to update an Excel sheet with Python code. I read a specific cell and update it accordingly, but pandas overwrites the entire Excel file, so I lose the other sheets as well as the formatting. Can anyone tell me how to avoid this?
Record = pd.read_excel("Myfile.xlsx", sheet_name='Sheet1', index_col=False)
Record.loc[1, 'WORDS'] = int(self.New_Word_box.get())
Record.loc[1, 'STATUS'] = self.Stat.get()
Record.to_excel("Myfile.xlsx", sheet_name='Student_Data', index=False)
My code is above. As you can see, I only want to update a few cells, but it overwrites the entire Excel file. I searched for an answer but couldn't find anything specific.
I'd appreciate your help.
Update: Added more clarifications
Steps:
1) Read the sheet which needs changes in a dataframe and make changes in that dataframe.
2) Now the changes are reflected in the dataframe but not in the sheet. Use the following function with the dataframe in step 1 and name of the sheet to be modified. You will use the truncate_sheet param to completely replace the sheet of concern.
The function call would be like so:
append_df_to_excel(filename, df, sheet_name, startrow=0, truncate_sheet=True)
from openpyxl import load_workbook
import pandas as pd
def append_df_to_excel(filename, df, sheet_name="Sheet1", startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: "/path/to/file.xlsx")
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: "Sheet1")
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    """
    # ignore [engine] parameter if it was passed
    if "engine" in to_excel_kwargs:
        to_excel_kwargs.pop("engine")
    # Note: assigning writer.book / writer.sheets and calling writer.save()
    # work only on older pandas; from pandas 2.0 they are no longer available.
    writer = pd.ExcelWriter(filename, engine="openpyxl")
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    if "index" not in to_excel_kwargs:
        to_excel_kwargs["index"] = False
    try:
        # try to open an existing workbook
        if "header" not in to_excel_kwargs:
            to_excel_kwargs["header"] = True
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
            to_excel_kwargs["header"] = False
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        to_excel_kwargs["header"] = True
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name=sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
The openpyxl engine can't be swapped for xlsxwriter here, as was asked in a comment, because xlsxwriter cannot open existing workbooks. See reference 2.
References:
1) https://stackoverflow.com/a/38075046/6741053
2) xlsxwriter: is there a way to open an existing worksheet in my workbook?
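On recent pandas versions (1.3 or later, which is an assumption about your environment), a shorter route to the same result may be ExcelWriter's append mode. This is a minimal sketch rather than a drop-in replacement for the function above: `replace_sheet` is a hypothetical helper name, and it only covers the case of replacing one whole sheet while the other sheets stay in the file.

```python
import pandas as pd


def replace_sheet(filename, df, sheet_name):
    """Overwrite one sheet of an existing workbook, leaving other sheets in place."""
    # mode="a" opens the existing file instead of creating a new one;
    # if_sheet_exists="replace" swaps out only the named sheet.
    with pd.ExcelWriter(filename, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

The replaced sheet is written from scratch, so any formatting on that particular sheet is not carried over. If only a few cells need to change and their formatting matters, editing the cells directly with openpyxl's load_workbook and save may be the better fit.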

Import multiple .xlsx files, perform arithmetic operations, and write the results to a single new xlsx file with multiple sheets using openpyxl?

I am using openpyxl to open one xlsx file, perform a few arithmetic operations, and then save the result in a new xlsx file. Now I want to import many files, apply the same operations, and store the results for all files in a single xlsx file with multiple sheets.
from openpyxl import Workbook
import openpyxl
wb= openpyxl.load_workbook(filename=r"C:\Users\server\Desktop\Python\Data.xlsx", read_only=True)
# reading the source file
ws = wb['Sheet1'] # moving into sheet1
# Comprehension
row_data = [ [cell.value for cell in row] for row in ws.rows] # looping through row data in sheet
header_data = row_data[0] # keeping the header row separately
row_data = row_data[1:] # storing the remaining xlsx data in a 2D list
for dp in row_data: # performing the multiplication row by row, e.g. column 1 * column 2, and appending the result as the next column
    dp.append(dp[1]*dp[2])
wb.close() # closing the workbook
wb = openpyxl.Workbook() # creating a new workbook
ws = wb.active # sheet 1 is active
ws.append(header_data) # header data written
for row in row_data: # 2D list data is written to sheet 1
    ws.append(row)
wb.save(r"C:\Users\server\Desktop\Python\Result.xlsx")
I am able to store multiple xlsx file names in a list. Now I want to access each file's data, perform a few arithmetic operations, and finally store the results in a single xlsx file with multiple sheets.
from openpyxl import Workbook
import openpyxl
import os
location=r"C:\Users\server\Desktop\Python\Data.xlsx" # will get folder location here where many xlsx files are present
counter = 0 #keep a count of all files found
xlsx_files = [] #list to store all xlsx files found at location
for file in os.listdir(location):
    try:
        if file.endswith(".xlsx"):
            print ("xlsx file found:\t", file)
            xlsx_files.append(str(file))
            counter = counter+1
    except Exception as e:
        raise e
        print ("No files found here!")
print ("Total files found:\t", counter)
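One possible way to put the two pieces together is sketched below. It is only an outline under several assumptions: every workbook has a sheet named "Sheet1" with a header row, the operation is the same column 1 * column 2 product as above, and each result sheet is named after its source file. The function names and the "Product" header label are hypothetical. openpyxl is imported inside the last function so the two helpers can be used without it.

```python
import os


def find_xlsx_files(folder):
    """Return the sorted names of .xlsx files directly inside folder."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".xlsx"))


def add_product_column(rows, header_label="Product"):
    """Split off the header row and append column 1 * column 2 to each data row."""
    header = list(rows[0]) + [header_label]
    data = [list(row) + [row[1] * row[2]] for row in rows[1:]]
    return header, data


def combine_workbooks(folder, out_path, source_sheet="Sheet1"):
    """Process every workbook in folder and write each result to its own sheet."""
    import openpyxl  # third-party; imported here so the helpers above do not need it

    names = find_xlsx_files(folder)
    if not names:
        raise FileNotFoundError(f"no .xlsx files found in {folder}")
    result = openpyxl.Workbook()
    result.remove(result.active)  # drop the default empty sheet
    for name in names:
        source = openpyxl.load_workbook(os.path.join(folder, name), read_only=True)
        rows = list(source[source_sheet].iter_rows(values_only=True))
        source.close()
        header, data = add_product_column(rows)
        # Excel limits sheet titles to 31 characters.
        sheet = result.create_sheet(title=os.path.splitext(name)[0][:31])
        sheet.append(header)
        for row in data:
            sheet.append(row)
    result.save(out_path)
```

Saving the result outside the input folder avoids picking it up as an input on the next run.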

For Loop - Reading in all excel tabs into Panda Df's

I have an .xlsx workbook and I would like to write a function or loop that creates a pandas DataFrame for each tab in Excel. For example, let's say that I have an Excel workbook called book.xlsx with tabs called sheet1 - sheet6. How can I read in the Excel file and create 6 pandas DataFrames (sheet1 - sheet6) from a function or loop?
To load the file:
path = '../files_to_load/my_file.xlsx'
print(path)
excel_file = pd.ExcelFile(path)
print('File uploaded ✔')
To get a specific sheet:
# Get a specific sheet
raw_data = excel_file.parse('sheet1')
Here is an example of the loop:
All of your sheets will be stored in a list, each as a DataFrame.
import pandas as pd
path = 'my_path/my_file.xlsx'
excel_file = pd.ExcelFile(path)
sheets = []
for sheet in excel_file.sheet_names:
    data = excel_file.parse(sheet)
    sheets.append(data)
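You need to set the sheet_name argument to None. read_excel then returns a dictionary that maps each sheet name to a DataFrame (an OrderedDict in older pandas versions, as in the output here; a plain dict in newer ones).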
dataframes = pd.read_excel(file_name, sheet_name=None)
>>> type(dataframes)
<class 'collections.OrderedDict'>
>>> type(dataframes['first']) # `first` is the name of a sheet
<class 'pandas.core.frame.DataFrame'>
