I am working with RapidMiner at the moment and am trying to copy my RapidMiner results, which are in xlsx files, to txt files in order to do some further processing with Python. I have plain text in column A (A1-A1500) and the corresponding filename in column C (C1-C1500).
Now my question:
Is there any way (I am thinking of the xlrd module) to read the content of every cell in column A and write it to a newly created txt file whose filename is given in the corresponding cell of column C?
As I have never worked with the xlrd module before I am a bit lost at the moment...
I can recommend openpyxl for any task involving .xlsx handling.
For your requirements:
from openpyxl import load_workbook
import os
p = 'path/to/the/folder/with/your/.xlsx'
# Collect every .xlsx file in the folder
files = [name for name in os.listdir(p) if name.endswith('.xlsx')]
for f in files:
    wb = load_workbook(os.path.join(p, f))
    ws = wb['name_of_sheet']
    for row in ws.rows:
        # Column C (index 2) holds the filename, column A (index 0) the text
        with open(row[2].value + '.txt', 'w') as outfile:
            outfile.write(row[0].value)
Good day! So, I'm not sure I understand your question correctly, but have you tried a combination of Read Excel operator with the Loop Examples operator? Your loop subprocess could then use Write CSV operator or similar.
Thanks to @corinna, the final code is:
from openpyxl import load_workbook
import os
p = r'F:\Results'
files = [name for name in os.listdir(p) if name.endswith('.xlsx')]
os.chdir(p)  # the txt files are written into the same folder
for f in files:
    file_location = load_workbook(os.path.join(p, f))
    sheet = file_location['Normal']
    for row in sheet.rows:
        # Column C (index 2) holds the filename, column A (index 0) the text
        with open(row[2].value + '.txt', "w") as outfile:
            outfile.write(row[0].value)
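A slightly more defensive variant of the loop above, as a sketch only. It assumes the rows arrive as plain tuples of cell values (for example from openpyxl's ws.iter_rows(values_only=True)), skips rows where the text or the filename is empty, and writes UTF-8. The helper name write_rows_to_txt and the sample rows are made up for illustration.

```python
import os
import tempfile


def write_rows_to_txt(rows, out_dir):
    """Write column A of each row to <out_dir>/<column C>.txt; return the paths written."""
    written = []
    for row in rows:
        text, name = row[0], row[2]
        if text is None or not name:
            continue  # empty cell: no text to write or no filename to use
        path = os.path.join(out_dir, str(name) + '.txt')
        with open(path, 'w', encoding='utf-8') as outfile:
            outfile.write(str(text))
        written.append(path)
    return written


# Hypothetical sample rows standing in for the (A, B, C) cell values of a sheet.
sample_rows = [
    ('first text', None, 'doc1'),
    (None, None, 'doc2'),  # no text in column A: skipped
    ('third text', None, 'doc3'),
]
out_dir = tempfile.mkdtemp()
paths = write_rows_to_txt(sample_rows, out_dir)
print([os.path.basename(p) for p in paths])
```

Converting the values with str() also avoids a TypeError when a cell holds a number instead of text.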
This drives me crazy.
I have the following csv file:
Short name;Calculation;29221
peter;foster;1,755345
karin;paris;0,2343543
john;dee;0
lisa;long;1,434534
lauren;lovely;0,123124
linda;loss;0,0234
I read this file in pandas, print it and everything looks fine in pandas.
Then I write it to an existing excel workbook and the values are partly corrupted.
This is my code:
import pandas as pd
import xlwings as xw
# open csv
QTH = pd.read_csv(r"C:/Users/A692517/PhytonStuff/testCSVtoExcel.csv", sep=';')  # ,
#                  engine = 'python')
for idx, row in QTH.iterrows():
    # c=QoSFTTH[row[2]].at[idx]
    myString = str(row[2])
    row[2] = myString
# target workbook
fn = "C:/Users/A692517/PhytonStuff/myClist.xlsx"
wb = xw.Book(fn)
ws = wb.sheets["Tabelle1"]
# write the QoSFTTH dataframe to the target workbook
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = QTH
wb.save(fn)
wb.close()
When I export the Excel result to a new csv (;), you can see what I mean:
Short name;Calculation;29221,00
peter;foster;1755345,00
karin;paris;0,2343543
john;dee;0,00
lisa;long;1434534,00
lauren;lovely;0,123124
linda;loss;0,0234
You may have stumbled on a pd.read_csv bug found via this stack question. Change the engine to engine='c' and try thousands=','.
pd.read_csv('path', sep=';', thousands=',', engine='c')
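Judging from the sample data, the third column appears to use a comma as the decimal mark (1,755345 rather than 1755345). If that is the case, pandas' read_csv also has a decimal parameter (decimal=','), which may fit better than thousands. The stdlib-only sketch below shows the same conversion by hand; parse_decimal_comma is a hypothetical helper name, and the data is a shortened copy of the question's csv.

```python
import csv
import io


def parse_decimal_comma(text):
    """Convert a string like '1,755345' to a float, treating ',' as the decimal mark."""
    return float(text.replace(',', '.'))


# Shortened sample data copied from the question.
raw = """Short name;Calculation;29221
peter;foster;1,755345
karin;paris;0,2343543
john;dee;0
"""

reader = csv.reader(io.StringIO(raw), delimiter=';')
header = next(reader)
rows = [(name, calc, parse_decimal_comma(value)) for name, calc, value in reader]
print(rows)
```

Once the column holds real floats instead of strings, the target application no longer has to guess how to interpret the comma.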
I am not so experienced in Python.
I have a “CompilerWarningsAllProtocol.txt” file that contains something like this:
" adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused
adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable
adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID"
How can I extract these three paths (C:..., E:..., C:...) from the txt file and use them to fill an Excel column named “Affected Item”?
Can I do it with the re.findall or re.search methods?
For now, the script checks whether the input txt file exists at my location and confirms it. After that it creates the blank Excel file with headers, but I don't know how to populate the Excel file with these paths in the "Affected Item" column.
Thanks for the help. I will copy-paste the code:
import os
import os.path
import re
import xlsxwriter
import openpyxl
from jira import JIRA
import pandas as pd
import numpy as np
# Print error message if no "CompilerWarningsAllProtocol.txt" file exists in the folder
inputpath = r'D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt'
if os.path.isfile(inputpath) and os.access(inputpath, os.R_OK):
    print(" 'CompilerWarningsAllProtocol.txt' exists and is readable")
else:
    print("Either the file is missing or not readable")
# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('CompilerWarningsFresh.xlsx')
worksheet = workbook.add_worksheet('Results')
# Widen correspondingly the columns.
worksheet.set_column('A:A', 20)
worksheet.set_column('B:AZ', 45)
# Create the headers
headers = ('Module', 'Affected Item', 'Issue', 'Class of Issue', 'Issue Root Cause', 'Type of Issue',
           'Source of Issue', 'Test sequence', 'Current Issue appearances in module')
# Create the bold headers and font size
format1 = workbook.add_format({'bold': True, 'font_color': 'black'})
format1.set_font_size(14)
format1.set_border()
row = col = 0
for item in headers:
    worksheet.write(row, col, item, format1)
    col += 1
workbook.close()
I agree with @dantechguy that csv is probably easier (and more lightweight) than writing a real xlsx file, but if you want to stick to the Excel format, the code below will work. Also, based on the code you've provided, you don't need to import openpyxl, jira, pandas or numpy.
The regex here matches full paths with any drive letter A-Z, followed by "type:warning". If you don't need to check for the warning and simply want to get every path in the file, you can delete everything in the regex after \S+. And if you know you'll only ever want drives C and E, just change A-Z to CE.
warningPathRegex = r"[A-Z]:\\\S+(?=\s*type:warning)"
compilerWarningFile = r"D:\Work\Python\CompilerWarnings\Python_CompilerWarnings\CompilerWarningsAllProtocol.txt"
warningPaths = []
with open(compilerWarningFile, 'r') as f:
    fullWarningFile = f.read()
    warningPaths = re.findall(warningPathRegex, fullWarningFile)
# ... open Excel file, then before workbook.close():
pathColumn = 1  # Affected item
for num, warningPath in enumerate(warningPaths):
    worksheet.write(num + 1, pathColumn, warningPath)  # num + 1 to skip header row
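To see what the regex picks up without touching any files, here is a small self-contained check. It runs the same pattern against the three sample lines from the question; the variable names sample_lines and paths are only for this illustration.

```python
import re

warning_path_regex = r"[A-Z]:\\\S+(?=\s*type:warning)"

# The three sample lines from the question, as raw strings so the
# backslashes are kept literally.
sample_lines = [
    r"adm_1 C:\Work\CompilerWarnings\adm_1.h type:warning Reason:wunused",
    r"adm_2 E:\Work\CompilerWarnings\adm_basic.h type:warning Reason:undeclared variable",
    r"adm_X C:\Work\CompilerWarnings\adm_X.h type:warning Reason: Unknown ID",
]
sample_text = "\n".join(sample_lines)

# The lookahead (?=...) requires "type:warning" after the path
# without making it part of the match.
paths = re.findall(warning_path_regex, sample_text)
for path in paths:
    print(path)
```

Note that \S+ stops at the first whitespace, so this pattern would cut off paths that contain spaces.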
I need to compare two excel files and a csv file, then write some data from one excel file to another.
It looks like this:
CSV file with names which I will compare. For example (spam, eggs)
First Excel file with name and value of it. For example (spam, 100)
Second Excel file with name. For example (eggs)
Now, when I feed the second file into the program, I need to check against the csv file that eggs corresponds to spam, and then save the value of 100 for eggs.
For operating on Excel files I'm using openpyxl, and for csv I'm using csv.
Can I count on your help? Maybe there are better libraries for this, because my attempts so far have failed completely.
Got it by myself. It is a somewhat complicated way, but it works the way I wanted. I would be glad for any tips on it.
import openpyxl
import numpy as np
# Read the name pairs from the csv file into a dictionary
lines = np.genfromtxt("csvtest.csv", delimiter=";", dtype=None)
compdict = dict()
for i in range(len(lines)):
    compdict[lines[i][0]] = lines[i][1]
wb1 = openpyxl.load_workbook('inputtest.xlsx')
wb2 = openpyxl.load_workbook(filename='spistest.xlsx')
ws = wb1['Sheet1']
spis = wb2['Sheet1']
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1):
    for cell in row:
        if cell.value in compdict:
            # Look for the paired name in the second workbook
            for wiersz in spis.iter_rows(min_row=1, max_row=spis.max_row, min_col=1):
                for komorka in wiersz:
                    if komorka.value == compdict[cell.value]:
                        # Copy the value from column 2 next to the matching name
                        cena = spis.cell(row=komorka.row, column=2)
                        ws.cell(row=cell.row, column=2, value=cena.value)
wb1.save('inputtest.xlsx')
wb2.close()
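As a tip on the code above: the nested loops rescan the second sheet for every matching cell. One possible simplification, sketched here with plain dictionaries and lists in place of worksheets, is to build a name-to-value dictionary from the value sheet once and then look each name up directly. All sample data and the names alias, prices and input_names are assumptions for illustration.

```python
# Name pairs as they would come from the csv file (hypothetical sample).
alias = {'eggs': 'spam'}

# Name and value pairs as they would come from the sheet that holds the values.
prices = {'spam': 100}

# Names found in the sheet that should receive the values.
input_names = ['eggs', 'bacon']

# One dictionary lookup per name instead of a full scan of the other sheet.
result = {
    name: prices[alias[name]]
    for name in input_names
    if alias.get(name) in prices
}
print(result)
```

Names without a pair in the csv file, or without a value in the other sheet, are simply left out of result.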
I have an xlsx file with 1 sheet.
I am trying to open it using python 3 (xlrd lib), but I get an empty file!
I use this code:
file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)
and I have no errors, but when I type:
workbook_errors.nsheets
I get "0", even though the file has some sheets... when I type:
workbook_errors
I get:
xlrd.book.Book object at 0x2..
any help? thanks
You can use Pandas pandas.read_excel just like pandas.read_csv:
import pandas as pd
file_errors_location = 'C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx'
df = pd.read_excel(file_errors_location)
print(df)
There are two modules for reading Excel files: openpyxl and xlrd.
This script lets you transform Excel data into a list of dictionaries using xlrd:
import xlrd
workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the header row, which holds the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
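The same header-plus-rows transformation can be written more compactly with zip. The sketch below works on plain lists so it runs without an Excel file; with xlrd, each row could be obtained from worksheet.row_values(i). The function name rows_to_dicts and the sample rows are hypothetical.

```python
def rows_to_dicts(rows):
    """Turn [header, row1, row2, ...] into a list of {column name: value} dicts."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample standing in for the rows of a worksheet.
sample_rows = [
    ['id', 'error'],
    [1, 'timeout'],
    [2, 'disk full'],
]
data = rows_to_dicts(sample_rows)
print(data)
```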
Unfortunately, 'xlrd', the Python engine that pandas relied on to read Excel files, has explicitly removed support for anything other than .xls files (since xlrd 2.0).
So here's how you can do it now -
Install openpyxl:
https://openpyxl.readthedocs.io/en/stable/
Change your pandas code to:
pandas.read_excel('cat.xlsx', engine='openpyxl')
Note: This worked for me with the latest version of pandas at the time (1.1.5). Previously, I was using version 0.24.0, where it didn't work, so I had to update.
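If a script has to handle both old .xls and newer .xlsx files, the engine could be chosen from the file extension. This is only a sketch: 'xlrd' and 'openpyxl' are engine names accepted by pandas.read_excel, but the helper pick_excel_engine and the mapping are assumptions made for this example.

```python
import os

# Assumed mapping from file extension to the pandas engine name.
ENGINE_BY_EXTENSION = {
    '.xls': 'xlrd',
    '.xlsx': 'openpyxl',
    '.xlsm': 'openpyxl',
}


def pick_excel_engine(path):
    """Return the engine name for the given file, based on its extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in ENGINE_BY_EXTENSION:
        raise ValueError('No engine known for extension %r' % extension)
    return ENGINE_BY_EXTENSION[extension]


print(pick_excel_engine('cat.xlsx'))
print(pick_excel_engine('legacy.XLS'))
```

The result can then be passed on, for example as pandas.read_excel(path, engine=pick_excel_engine(path)).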
Another way to do it:
import openpyxl
# file_errors_location is the path defined in the question
workbook_errors = openpyxl.load_workbook(file_errors_location)
import string
import xlrd
import xlsxwriter
workbook = xlsxwriter.Workbook('C:\T\file.xlsx')
worksheet = workbook.add_worksheet()
book = open_workbook(r'C:\T\test.xls','r')
sheet = book.sheet_by_index(0)
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        print sheet.cell(row_index,0).value
        x = sheet.cell(row_index,0).value
        worksheet.write_string(row_index,col_index,x)
workbook.close()
I'm new to Python. Here I'm trying to read data from an xls file with xlrd and copy it to another xlsx file with the xlsxwriter module, but the data doesn't get written to the created xlsx sheet. Please guide me through this. Above is my exact code. Please correct me if anything is wrong.
Many thanks in advance.
Your example program almost works. Mainly, the open_workbook() call needs to be prefixed with its module name (xlrd.open_workbook()), and it is better to use XlsxWriter's write() instead of write_string() unless you are sure that all the data you are reading is of a string type. Also, the program was only reading values from column 0.
Here is the same example with those changes in place. I've also prefixed the variable names with in_ and out_ to make it clearer which module is calling which method:
import xlrd
import xlsxwriter
out_workbook = xlsxwriter.Workbook('file.xlsx')
out_worksheet = out_workbook.add_worksheet()
in_workbook = xlrd.open_workbook(r'test.xls')
in_worksheet = in_workbook.sheet_by_index(0)
for row_index in range(in_worksheet.nrows):
    for col_index in range(in_worksheet.ncols):
        # Read each cell from the xls file and write it to the same position
        cell_value = in_worksheet.cell(row_index, col_index).value
        out_worksheet.write(row_index, col_index, cell_value)
        print(cell_value)
out_workbook.close()
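A side note on the output path used in the question, 'C:\T\file.xlsx': in a normal string literal, \f is the escape sequence for a form feed character, so the path does not contain the backslash and the letter f that it appears to. The short sketch below shows the difference to a raw string; the variable names are only for illustration.

```python
# In a normal string literal, '\f' is the form-feed escape, so a Windows
# path such as C:\T\file.xlsx needs a raw string (or doubled backslashes).
escaped = 'C:\\T\file.xlsx'  # second backslash left single on purpose
raw = r'C:\T\file.xlsx'

print(repr(escaped))
print(repr(raw))

# The form feed (character code 0x0c) is only present in the non-raw version.
print('\x0c' in escaped, '\x0c' in raw)
```

This is why the answer's version, which uses a plain relative filename, sidesteps the problem entirely.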