How to open an xlsx file with Python 3 - python-3.x

I have an xlsx file with 1 sheet.
I am trying to open it with Python 3 (the xlrd library), but I get an empty workbook!
I use this code:
file_errors_location = "C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx"
workbook_errors = xlrd.open_workbook(file_errors_location)
I get no errors, but when I type:
workbook_errors.nsheets
I get "0", even though the file does have sheets. When I type:
workbook_errors
I get:
xlrd.book.Book object at 0x2..
Any help? Thanks.

You can use pandas.read_excel just like pandas.read_csv:
import pandas as pd
file_errors_location = 'C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx'
df = pd.read_excel(file_errors_location)
print(df)

There are two common modules for reading Excel files: openpyxl (.xlsx) and xlrd (.xls; versions before 2.0 also read .xlsx).
This script transforms Excel data into a list of dictionaries using xlrd (on an .xlsx file like this one, it needs an xlrd version older than 2.0):
import xlrd
# on_demand=True loads each sheet only when it is requested
workbook = xlrd.open_workbook('C:\\Users\\atheelm\\Documents\\python excel mission\\errors1.xlsx', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # header row: stores the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)
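The same header-row-to-dictionaries transformation can be written without index arithmetic. This is a library-agnostic sketch that assumes the rows are already available as lists (with xlrd, for example, [worksheet.row_values(i) for i in range(worksheet.nrows)]); the helper name rows_to_dicts is hypothetical.

```python
def rows_to_dicts(rows):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by header.

    Hypothetical helper: `rows` is a list of lists whose first element
    holds the column names.
    """
    if not rows:
        return []
    header, *body = rows
    # zip pairs each column name with the value in the same position
    return [dict(zip(header, row)) for row in body]


rows = [
    ["name", "qty"],
    ["apple", 3.0],
    ["pear", 5.0],
]
print(rows_to_dicts(rows))
# [{'name': 'apple', 'qty': 3.0}, {'name': 'pear', 'qty': 5.0}]
```

Note that zip stops at the shorter of the two sequences, so a data row shorter than the header silently drops the missing keys instead of raising an error.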

Unfortunately, xlrd, the engine pandas used to read Excel files, has explicitly removed support for anything other than .xls files (as of xlrd 2.0).
So here's how you can do it now:
Install openpyxl:
https://openpyxl.readthedocs.io/en/stable/
Change your pandas code to:
pandas.read_excel('cat.xlsx', engine='openpyxl')
Note: This worked for me with the latest version of pandas at the time (1.1.5). Previously, I was using version 0.24.0 and it didn't work, so I had to update to the latest version.
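If a script has to handle several Excel formats, the engine can be chosen from the file extension. This is a minimal sketch: the engine names are real pandas read_excel options (each needs its own package installed), but the mapping and the engine_for helper are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a pandas read_excel engine.
EXCEL_ENGINES = {
    ".xls": "xlrd",       # xlrd >= 2.0 reads only legacy .xls files
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def engine_for(path):
    """Return the read_excel engine to use for `path` (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return EXCEL_ENGINES[suffix]
    except KeyError:
        raise ValueError(f"no Excel engine known for {suffix!r}") from None


print(engine_for("cat.xlsx"))  # openpyxl
```

Usage would then be pandas.read_excel(path, engine=engine_for(path)). pandas 1.2 and later pick openpyxl for .xlsx on their own, so spelling out the engine mainly helps on older versions.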

Another way to do it:
import openpyxl
workbook_errors = openpyxl.load_workbook(file_errors_location)
print(workbook_errors.sheetnames)  # names of all sheets in the workbook

Related

csv (;) to excel and back to csv(;), comma disappears

This drives me crazy.
I have the following csv file:
Short name;Calculation;29221
peter;foster;1,755345
karin;paris;0,2343543
john;dee;0
lisa;long;1,434534
lauren;lovely;0,123124
linda;loss;0,0234
I read this file in pandas, print it and everything looks fine in pandas.
Then I write it to an existing excel workbook and the values are partly corrupted.
This is my code:
import pandas as pd
import xlwings as xw
# open the csv
QTH = pd.read_csv(r"C:/Users/A692517/PhytonStuff/testCSVtoExcel.csv", sep=';')  #,
#    engine='python')
for idx, row in QTH.iterrows():
    # c = QoSFTTH[row[2]].at[idx]
    myString = str(row[2])
    row[2] = myString
# target workbook
fn = "C:/Users/A692517/PhytonStuff/myClist.xlsx"
wb = xw.Book(fn)
ws = wb.sheets["Tabelle1"]
# write the QoSFTTH dataframe into the target workbook
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = QTH
wb.save(fn)
wb.close()
When I export the Excel result to a new semicolon-separated CSV, you can see what I mean:
Short name;Calculation;29221,00
peter;foster;1755345,00
karin;paris;0,2343543
john;dee;0,00
lisa;long;1434534,00
lauren;lovely;0,123124
linda;loss;0,0234
You may have stumbled on the pd.read_csv issue described in this Stack Overflow question. Your file uses a comma as the decimal separator, so tell pandas that with decimal=',' and use the C engine (engine='c'):
pd.read_csv('path', sep=';', decimal=',', engine='c')
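Which role the comma plays makes all the difference: read as a decimal separator, 1,755345 is 1.755345; read as a thousands separator (what thousands=',' does), it becomes 1755345, the value that shows up in the exported file. This stdlib-only sketch uses the csv module instead of pandas to show both readings on the question's sample; the helper names are hypothetical.

```python
import csv
import io

# First rows of the sample from the question; ';' separates fields.
SAMPLE = """Short name;Calculation;29221
peter;foster;1,755345
karin;paris;0,2343543
john;dee;0
"""


def as_decimal_comma(text):
    """Hypothetical helper: the comma is the decimal separator."""
    return float(text.replace(",", "."))


def as_thousands_comma(text):
    """Hypothetical helper: the comma is a thousands separator (wrong here)."""
    return float(text.replace(",", ""))


rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=";"))
for name, calc, value in rows[1:]:
    print(name, as_decimal_comma(value), as_thousands_comma(value))
```

In pandas, the first reading corresponds to the decimal=',' argument of read_csv.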

How to insert a new column in an excel workbook using xlwings library in Python?

I need to add a new empty column between two columns of an existing Excel file using xlwings. How do I do that?
I need to use the xlwings library itself, as the project requirements call for it.
Please help me with the code.
I am using this code:
import xlwings as xw
from xlwings.constants import DeleteShiftDirection
wb = xw.Book('input_file.xlsm')
wb.sheets['Sheet 1'].delete()
wb.sheets['Sheet 3'].delete()
sheet = wb.sheets['Sheet 2']
sheet.range('1:1').api.Delete(DeleteShiftDirection.xlShiftUp)
sheet.pictures[0].delete()
wb.sheets['Sheet 2'].range('I:I').insert()
wb.save('input_file.xlsm')
As @moken already commented:
# import the library
import xlwings as xw
# create a workbook
wb = xw.Book()
# in the first sheet (index 0), insert an empty column at column A; the existing columns move right
wb.sheets[0].range('A:A').insert()
If you already have an xml file, you may open it with pandas:
import pandas as pd
# new dataframe
df = pd.read_xml("path.xml")
Then it is up to you how you manipulate the data.

How to read sheet names of excel sheet from S3 in AWS Wrangler?

I have an Excel workbook stored in S3 and I want to read its sheet names.
I have read the Excel file with AWS Wrangler using awswrangler.s3.read_excel(path).
How can I read sheetnames using AWS Wrangler using Python?
According to the awswrangler docs of the read_excel() function:
This function accepts any Pandas’s read_excel() argument.
And in pandas:
sheet_name : str, int, list, or None, default 0
so you could try something like this:
import awswrangler as wr
wr.s3.read_excel(file_uri,sheet_name=your_sheet)
I am currently facing a similar problem in AWS Glue, but did not manage to get it working yet.
I'm not sure you can in Wrangler, or at least I haven't been able to figure it out. You can use Wrangler to download the workbook to a temporary file, then use pyxlsb or openpyxl (both, to cover the .xlsb and .xlsx formats):
from openpyxl import load_workbook
from pyxlsb import open_workbook
import awswrangler as wr
import os
import pandas as pd
s3_src = 's3://bucket/folder/workbook.xlsb'
filename = os.path.basename(s3_src)
wr.s3.download(path=s3_src, local_file=filename)
if filename.endswith('.xlsb'):
    workbook = open_workbook(filename)
    sheets = workbook.sheets
else:
    workbook = load_workbook(filename)
    sheets = workbook.sheetnames
# Load all sheets into a list of DataFrames
dfs = [pd.read_excel(filename, sheet_name=s) for s in sheets]
# Or, now that you have the sheet names, load them using Wrangler
dfs = [wr.s3.read_excel(s3_src, sheet_name=s) for s in sheets]
You could extract the names of the sheets & pass them as inputs to another process that does the extraction.
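For .xlsx files specifically, the sheet names can also be read with the standard library alone, because an .xlsx file is a ZIP archive whose workbook part lists the sheets. This is a minimal sketch that assumes the workbook part sits at the conventional path xl/workbook.xml and uses the transitional SpreadsheetML namespace that Excel writes by default; it does not handle binary .xlsb files. The xlsx_sheet_names helper is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Transitional SpreadsheetML namespace used in xl/workbook.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def xlsx_sheet_names(data):
    """Return sheet names, in workbook order, from the raw bytes of an .xlsx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # <workbook><sheets><sheet name="..."/>...</sheets></workbook>
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]
```

With the file downloaded by wr.s3.download as above, the call would be xlsx_sheet_names(open(filename, "rb").read()); the bytes could equally come straight from S3 without a temporary file.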

How to handle excel software using python?

I want to handle opening, creating new files, Save As, deleting, etc. using Python.
Use pyexcel to handle the files.
You don't control the software (excel.exe) itself; instead, you work with the files, such as .csv or .xlsx.
$ pip install pyexcel
Here is the documentation:
pyexcel on Read the Docs
Sample:
>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example.csv")
>>> print(sheet[1, 1])
5
Openpyxl is a great module that I have been using to work with Excel files (docs):
pip3 install openpyxl
Creating and saving a workbook:
from openpyxl import Workbook
wb = Workbook()
wb.save('balances.xlsx')
Selecting a specific worksheet:
ws = wb["Sheet"]  # a new Workbook starts with one sheet named "Sheet"
Reading and writing data in a worksheet:
c = ws['A4']  # the cell object; its content is c.value
ws['A4'] = 4
or:
d = ws.cell(row=4, column=2)  # cell B4
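ws['A4'] and ws.cell(row=..., column=...) address the same cells in two notations: the letters are the column counted from 1, and the number is the row. This small stdlib sketch shows the mapping; the helper name a1_to_row_col is hypothetical, and openpyxl ships its own helpers for this in openpyxl.utils.cell (for example coordinate_to_tuple).

```python
import re


def a1_to_row_col(ref):
    """Hypothetical helper: convert an A1-style reference like 'B4' to (4, 2)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # column letters count in base 26 with A=1 .. Z=26 (there is no zero)
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column


print(a1_to_row_col("A4"))    # (4, 1)
print(a1_to_row_col("B4"))    # (4, 2)
print(a1_to_row_col("AA10"))  # (10, 27)
```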

Iteration error writing file to excel with python

import string
import xlrd
import xlsxwriter
workbook = xlsxwriter.Workbook('C:\T\file.xlsx')
worksheet = workbook.add_worksheet()
book = open_workbook(r'C:\T\test.xls','r')
sheet = book.sheet_by_index(0)
for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        print(sheet.cell(row_index, 0).value)
        x = sheet.cell(row_index, 0).value
        worksheet.write_string(row_index, col_index, x)
workbook.close()
I'm new to Python. Here I'm trying to read data from an xls file with xlrd and copy it to another xlsx file with the xlsxwriter module, but the data doesn't get written to the created xlsx sheet. Please guide me through this. Above is my exact code. Please correct me if anything is wrong.
Thanks a lot in advance.
Your example program almost works. Mainly, open_workbook() needs to be prefixed with the module name (xlrd.open_workbook()), and it is better to use XlsxWriter's write() instead of write_string() unless you are sure that all the data you are reading is of string type. Also, the program was only reading values from column 0.
Here is the same example with those changes in place. I've also prefixed the variable names with in_ and out_ to make it clearer which module is calling which method:
import xlrd
import xlsxwriter
out_workbook = xlsxwriter.Workbook('file.xlsx')
out_worksheet = out_workbook.add_worksheet()
# the second positional argument of xlrd.open_workbook() is a log file, not a file mode
in_workbook = xlrd.open_workbook(r'test.xls')
in_worksheet = in_workbook.sheet_by_index(0)
for row_index in range(in_worksheet.nrows):
    for col_index in range(in_worksheet.ncols):
        cell_value = in_worksheet.cell(row_index, col_index).value
        # write() picks the matching write_*() method from the value's type
        out_worksheet.write(row_index, col_index, cell_value)
        print(cell_value)
out_workbook.close()
