Passing XLSX sheet to csv.DictReader - excel

Is there a way to pass an Excel sheet to csv.DictReader without first creating a new CSV file from the sheet?
I want to access the data in the Excel sheet the same way I would access it with csv.DictReader.

Per @Clade's suggestion I used pandas.read_excel:
import csv
import easygui as eg
import pandas as pd

sheets = ['UC Apps', 'IOS Devices']
# Ask for the workbook once, rather than once per sheet
path = eg.fileopenbox()
for sheet in sheets:
    xlsx_data = pd.read_excel(path, sheet_name=sheet, index_col=None)
    # Convert the sheet to CSV text in memory; no file is written
    csv_as_string = xlsx_data.to_csv(index=False)
    reader = csv.DictReader(csv_as_string.splitlines())
    for row in reader:
        print(row)
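csv.DictReader accepts any iterable of lines, so an in-memory text buffer works as well as a list of lines. The sketch below uses only the standard library; the CSV text is a made-up stand-in for what to_csv returns. It wraps the string in io.StringIO instead of calling splitlines(), because splitting on lines discards any line break inside a quoted cell before the parser sees it.

```python
import csv
import io

# Stand-in for the string returned by DataFrame.to_csv(index=False);
# the "note" cell of the data row contains a line break (made-up data).
csv_as_string = 'name,note\nalpha,"line1\nline2"\n'

# newline="" leaves line endings untouched so the csv module handles them.
reader = csv.DictReader(io.StringIO(csv_as_string, newline=""))
rows = list(reader)

for row in rows:
    print(row["name"], repr(row["note"]))
```

If the sheet is known to contain no multi-line cells, splitlines() behaves the same and either form can be used.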

Related

How to insert a new column in an excel workbook using xlwings library in Python?

I need to insert a new empty column between two columns of an existing Excel workbook using xlwings. How do I do that?
The project requirements call for xlwings specifically, so I have to use that library.
This is the code I am using:
import xlwings as xw
from xlwings.constants import DeleteShiftDirection
wb = xw.Book('input_file.xlsm')
wb.sheets['Sheet 1'].delete()
wb.sheets['Sheet 3'].delete()
sheet = wb.sheets['Sheet 2']
sheet.range('1:1').api.Delete(DeleteShiftDirection.xlShiftUp)
sheet.pictures[0].delete()
wb.sheets['Sheet 2'].range('I:I').insert()
wb.save('input_file.xlsm')
As @moken already commented:
# import the lib
import xlwings as xw
# create a workbook
wb = xw.Book()
# for the first sheet (index 0) in range from A to A insert a column
wb.sheets[0].range('A:A').insert()
If you already have an xml file, you may open it with pandas:
import pandas as pd
# new dataframe
df = pd.read_xml("path.xml")
How you manipulate the data from there is up to you.

How to read sheet names of excel sheet from S3 in AWS Wrangler?

I have an Excel workbook stored in S3 and I want to read its sheet names.
I have read the workbook with AWS Wrangler using awswrangler.s3.read_excel(path).
How can I read the sheet names using AWS Wrangler in Python?
According to the awswrangler docs of the read_excel() function:
This function accepts any Pandas’s read_excel() argument.
And in pandas:
sheet_name : str, int, list, or None, default 0
so you could try something like this:
import awswrangler as wr
wr.s3.read_excel(file_uri,sheet_name=your_sheet)
I am currently facing a similar problem in AWS Glue, but did not manage to get it working yet.
I'm not sure you can in Wrangler, or at least I haven't been able to figure it out. You can use Wrangler to download the sheet to a temporary file, then use pyxlsb/openpyxl (using both to cover all formats):
from openpyxl import load_workbook
from pyxlsb import open_workbook
import awswrangler as wr
import os
import pandas as pd
s3_src = 's3://bucket/folder/workbook.xlsb'
filename = os.path.basename(s3_src)
wr.s3.download(path=s3_src, local_file=filename)
if filename.endswith('.xlsb'):
    workbook = open_workbook(filename)
    sheets = workbook.sheets
else:
    workbook = load_workbook(filename)
    sheets = workbook.sheetnames
# Load all sheets into an array of dataframes
dfs = [pd.read_excel(filename, sheet_name=s) for s in sheets]
# Or now that you have the sheet names, load using Wrangler
dfs = [wr.s3.read_excel(s3_src, sheet_name=s) for s in sheets]
You could extract the names of the sheets & pass them as inputs to another process that does the extraction.
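As a rough alternative for .xlsx and .xlsm files only (not .xls or .xlsb), the sheet names can be read without any Excel library, because such a workbook is a zip archive whose xl/workbook.xml part lists the sheets. The sketch below uses only the standard library. The helper name xlsx_sheet_names is hypothetical, and the in-memory archive is a minimal stand-in rather than a complete workbook; real bytes would come from S3. It also assumes the common transitional OOXML namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by transitional OOXML workbooks (assumed; the common case).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def xlsx_sheet_names(source):
    """Return sheet names from an .xlsx path or a binary file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]


# Stand-in archive holding only xl/workbook.xml, for illustration.
workbook_xml = (
    f'<workbook xmlns="{MAIN_NS}"><sheets>'
    '<sheet name="UC Apps" sheetId="1"/>'
    '<sheet name="IOS Devices" sheetId="2"/>'
    "</sheets></workbook>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", workbook_xml)
buffer.seek(0)

names = xlsx_sheet_names(buffer)
print(names)
```

Because the helper accepts a file-like object, the workbook bytes can stay in memory instead of being written to a temporary file.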

For loop - reading all Excel tabs into pandas DataFrames

I have an .xlsx workbook and I would like to write a function or loop that creates a pandas DataFrame for each tab. For example, say I have a workbook called book.xlsx with tabs named sheet1 through sheet6. How can I read in the file and create six DataFrames (sheet1 through sheet6) from a function or loop?
To load the file:
import pandas as pd
path = '../files_to_load/my_file.xlsx'
print(path)
excel_file = pd.ExcelFile(path)
print('File uploaded ✔')
To get a specific sheet:
# Get a specific sheet
raw_data = excel_file.parse('sheet1')
Here is an example of the loop. All of your sheets end up in a list, each one as a DataFrame:
import pandas as pd
path = 'my_path/my_file.xlsx'
excel_file = pd.ExcelFile(path)
sheets = []
for sheet in excel_file.sheet_names:
    data = excel_file.parse(sheet)
    sheets.append(data)
Set the sheet_name argument to None; read_excel then returns a dictionary that maps each sheet name to its DataFrame (an OrderedDict in older pandas versions, a plain dict in recent ones).
dataframes = pd.read_excel(file_name, sheet_name=None)
>>> type(dataframes)
<class 'collections.OrderedDict'>
>>> type(dataframes['first']) # `first` is the name of a sheet
<class 'pandas.core.frame.DataFrame'>

How to save a Dataframe into an excel sheet without deleting other sheets?

I am trying to pull data from a stock market API and save it in separate Excel files. Every stock has several timeframes, such as 1m, 3m, 5m, 15m and so on.
I want to create one Excel file per stock, with a separate sheet for each timeframe.
My code creates an Excel file for a stock (symbol), adds the sheets to it (1m, 3m, 5m, ...) and saves the file. It then pulls the data from the API and saves it to the matching sheet. For ETH/BTC, for example, it creates the file and sheets, pulls the "1m" data and saves it to the "1m" sheet.
The code that creates the file and sheets works; I tested it.
The problem is that once the DataFrame is written to the Excel file, all other sheets are deleted. I tried pulling all the data for each symbol, but when I opened the Excel file only the last timeframe (1w) had been written and every other sheet was gone. Please help.
I looked at other questions but did not find the same problem. In the last part I am not trying to add a new sheet; I am trying to save the DataFrame to an existing sheet.
# get_bars pulls the data
def get_bars(symbol, interval):
    ...
    return df
...
timeseries=['1m','3m','5m','15m','30m','1h','2h','4h','6h','12h','1d','1w']
import pandas as pd
import xlsxwriter
from pandas import ExcelWriter
from openpyxl import load_workbook
for symbol in symbols:
    file = ('C:/Users/mi/Desktop/Kripto/' + symbol + '.xlsx')
    # Create an empty workbook for this symbol
    workbook = xlsxwriter.Workbook(file)
    workbook.close()
    # Reopen it with openpyxl and add one sheet per timeframe
    wb = load_workbook(file)
    for x in range(len(timeseries)):
        ws = wb.create_sheet(timeseries[x])
    print(wb.sheetnames)
    wb.save(file)
    workbook.close()
    # Write the data; this is the step that wipes the other sheets
    xrpusdt = get_bars(symbol,interval='1m')
    writer = pd.ExcelWriter(file, engine='xlsxwriter')
    xrpusdt.to_excel(writer, sheet_name='1m')
    writer.save()
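Instead of assigning the ExcelWriter to a variable, use it in a with statement and open it in append mode, since the workbook already exists. The xlsxwriter engine always creates a new file, which is why the other sheets disappear; append mode needs engine='openpyxl'. Because the sheets were created beforehand, pandas 1.3 or later also needs if_sheet_exists='replace', otherwise writing to an existing sheet name raises a ValueError:
for interval in timeseries:
    xrpusdt = get_bars(symbol, interval=interval)
    # mode='a' appends to the existing workbook instead of overwriting it
    with pd.ExcelWriter(file, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
        xrpusdt.to_excel(writer, sheet_name=interval)
Also note that your code passes a fixed interval of "1m" to get_bars; the loop above passes each timeframe in turn.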
Resources:
Pandas ExcelWriter: here you can see the use-case of append mode https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html#pandas.ExcelWriter
Pandas df.to_excel: here you can see how to write to more than one sheet
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

Python Merge Multiple Excel sheets to form a summary sheet

I need to merge data from multiple sheets of an Excel workbook into a new summary sheet using Python. I am using pandas to read the sheets and create the summary sheet, but after concatenation the table formatting (header and borders) is lost.
Is there a way to read the source sheets with their formatting and write it to the final sheet?
If that is not possible, how can I format the data after concatenation?
Python Code to concatenate:
import pandas as pd
df = []
xlsFile = "some path excel"
sheetNames = ['Sheet1', 'Sheet2','Sheet3']
for nms in sheetNames:
    data = pd.read_excel(xlsFile, sheet_name = nms, header=None, skiprows=1)
    df.append(data)
final = "some other path excel "
df = pd.concat(df)
df.to_excel(final, index=False, header=None)
Sheet 1 Input Data
Sheet 2 Input Data
Sheet 3 Input Data
Summary Sheet output
You can try the following code:
df = pd.concat(pd.read_excel('some path excel.xlsx', sheet_name=None), ignore_index=True)
If you set sheet_name=None you can read all the sheets in the workbook at one time.
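With sheet_name=None, read_excel returns a mapping of sheet name to DataFrame, and concatenating it directly loses track of which sheet each row came from. The sketch below is one possible way to keep that information. The dictionary is a made-up stand-in for the read_excel result, and the column name sheet is an arbitrary choice. Note that pandas carries cell values only, so borders and other formatting are not part of either version.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None), which returns
# a mapping of sheet name -> DataFrame; the data here is made up.
sheets = {
    "Sheet1": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "Sheet2": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

# Tag every row with its source sheet, then stack the frames.
summary = pd.concat(
    [frame.assign(sheet=name) for name, frame in sheets.items()],
    ignore_index=True,
)
print(summary)
```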
I suggest the xlrd library
(https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966
and https://github.com/python-excel/xlrd)
It can read cell formatting. Note that formatting_info=True is only supported for .xls files, not .xlsx.
from xlrd import open_workbook
path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()
See also:
Using XLRD module and Python to determine cell font style (italics or not)
How to read excel cell and retain or detect its format in Python (the code above is taken from that question)

Resources