Way to compare two excel files and CSV file - excel

I need to compare two excel files and a csv file, then write some data from one excel file to another.
It looks like this:
CSV file with names which I will compare. For example (spam, eggs)
First Excel file with name and value of it. For example (spam, 100)
Second Excel file with name. For example (eggs)
Now, when I input file (second) into program I need to ensure that eggs == spam with csv file and then save value of 100 to the eggs.
For operating on excel files I'm using openpyxl and for csv I'm using csv.
Can I count on your help? Maybe there are better libraries to do that, because my trials proved to be a total failure.

Got it by myself. Some complex way, but it works like I wanted to. Will be glad for some tips to it.
import openpyxl
import numpy as np
lines = np.genfromtxt("csvtest.csv", delimiter=";", dtype=None)
compdict = dict()
for i in range(len(lines)):
compdict[lines[i][0]] = lines[i][1]
wb1 = openpyxl.load_workbook('inputtest.xlsx')
wb2 = openpyxl.load_workbook(filename='spistest.xlsx')
ws = wb1.get_sheet_by_name('Sheet1')
spis = wb2.get_sheet_by_name('Sheet1')
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1):
for cell in row:
if cell.value in compdict:
for wiersz in spis.iter_rows(min_row=1, max_row=spis.max_row, min_col=1):
for komorka in wiersz:
if komorka.value == compdict[cell.value]:
cena = spis.cell(row=komorka.row, column=2)
ws.cell(row=cell.row, column=2, value=cena.value)
wb1.save('inputtest.xlsx')
wb2.close()

Related

Appending data from multiple excel files into a single excel file without overwriting using python pandas

Here is my current code below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) excel files. I am trying to pull this information out of all these files to compile into a single new file appending to that file each time. I'm going to manually clean up the destination file for the time being as I will improve this script going forward.
What I currently have works fine for a single sheet but I overwrite my destination every time I add a new file to the read in list.
I've tried adding the mode = 'a' and a couple different ways to concat at the end of my function.
import pandas as pd
def excel_loader(fname, sheet_name, new_file):
xls = pd.ExcelFile(fname)
df1 = pd.read_excel(xls, sheet_name, nrows = 20)
print(df1[1:15])
writer = pd.ExcelWriter(new_file)
df1.insert(51, 'Original File', fname)
df1.to_excel(new_file)
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
excel_loader(name, 'specific_sheet_name', destination)
Thanks for any help in advance can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If sheet name is changing, look into zip() function to send filename/sheetname tuple.
This should get you started:
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
#read all files first
df_hold_list = []
for name in names:
xls = pd.ExcelFile(name)
df = pd.read_excel(xls, sheet_name, nrows = 20)
df_hold_list.append(df)
#concatenate dfs
df1 = pd.concat(df_hold_list, axis=1) # axis is 1 or 0 depending on how you want to cancatenate (horizontal vs vertical)
#write new file - may have to correct this piece - not sure what functions these are
writer = pd.ExcelWriter(destination)
df1.to_excel(destination)

How to save a Dataframe into an excel sheet without deleting other sheets?

I am triyng to pull some data from a stock market and saving them in different excel files. Every stock trade process has different timeframes like 1m, 3m, 5m, 15m and so on..
I want to create an excel file for each stock and different sheets for each time frames.
My code creates excel file for a stock (symbol) and adds sheets into it (1m,3m,5m...) and saves the file and then pulls the data from stock market api and saves into correct sheet. Such as ETH/BTC, create the file and sheets and pull "1m" data and save it into "1m" sheet.
Code creates file and sheets, I tested it.
The problem is after dataframe is written into excel file it deletes all other sheets. I tried to pull all data for each symbol. But when I opened the excel file only last time frame (1w) has been written and all other sheets are deleted. So please help.
I checked other problems but didn't find the same problem. At last part I am not trying to add a new sheet I am trying to save df to existed sheet.
#get_bars function pulls the data
def get_bars(symbol, interval):
.
.
.
return df
...
timeseries=['1m','3m','5m','15m','30m','1h','2h','4h','6h','12h','1d','1w']
from pandas import ExcelWriter
from openpyxl import load_workbook
for symbol in symbols:
file = ('C:/Users/mi/Desktop/Kripto/' + symbol + '.xlsx')
workbook = xlsxwriter.Workbook(file)
workbook.close()
wb = load_workbook(file)
for x in range(len(timeseries)):
ws = wb.create_sheet(timeseries[x])
print(wb.sheetnames)
wb.save(file)
workbook.close()
xrpusdt = get_bars(symbol,interval='1m')
writer = pd.ExcelWriter(file, engine='xlsxwriter')
xrpusdt.to_excel(writer, sheet_name='1m')
writer.save()
I think instead of defining the ExcelWriter as a variable, you need to use it in a With statement and use the append mode since you have already created an excel file using xlsxwriter like below
for x in range(len(timeseries)):
xrpusdt = get_bars(symbol,interval=timeseries[x])
with pd.ExcelWriter(file,engine='openpyxl', mode='a') as writer:
xrpusdt.to_excel(writer, sheet_name=timeseries[x])
And in your code above, you're using a static interval as "1m" in the xrpusdt variable which is changed into variable in this code.
Resources:
Pandas ExcelWriter: here you can see the use-case of append mode https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelWriter.html#pandas.ExcelWriter
Pandas df.to_excel: here you can see how to write to more than one sheet
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

Python Merge Multiple Excel sheets to form a summary sheet

I need to merge data from multiple sheets of an Excel to form a new summary sheet using Python. I am using pandas to read the excel sheets and create new summary sheet. After concatenation the table format is getting lost i.e. Header and borders.
Is there a way to read from source sheet with the format and write to final sheet.
if first is not possible how to format the data after concatenation
Python Code to concatenate:
import pandas as pd
df = []
xlsFile = "some path excel"
sheetNames = ['Sheet1', 'Sheet2','Sheet3']
for nms in sheetNames:
data = pd.read_excel(xlsFile, sheet_name = nms, header=None, skiprows=1)
df.append(data)
final = "some other path excel "
df = pd.concat(df)
df.to_excel(final, index=False, header=None)
Sheet 1 Input Data
Sheet 2 Input Data
Sheet 3 Input Data
Summary Sheet output
You can try the following code:
df = pd.concat(pd.read_excel('some path excel.xlsx', sheet_name=None), ignore_index=True)
If you set sheet_name=None you can read all the sheets in the workbook at one time.
I suggest you the library xlrd
(https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966
and https://github.com/python-excel/xlrd)
It is a good library to do that.
from xlrd import open_workbook
path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()
see also:
Using XLRD module and Python to determine cell font style (italics or not)
and How to read excel cell and retain or detect its format in Python (I brought the above code from this address)

How to loop through Excel documents in Python?

I have downloaded a module called openpyxl which I intend to use to extract data (parts numbers) out from our many Excel files and then write them into a single file. I have not used Python much and am wondering how I could alter the the following code so that the script would open a spreadsheet, run some code on that spreadsheet, and then move onto the next one. If it was a list or a string of some sort I could write for loops for it but for actual spreadsheets I don't know how this would be done.
Can anyone offer any advice on how to loop through documents like this?
from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print wb2.get_sheet_names()
['Sheet2', 'New Title', 'Sheet1']
You could get a list of the spreadsheets with os.listdir() and then extract the data using a for loop, like so:
import os
from openpyxl import load_workbook
path = "path/to/folder" # The folder containing the spreadsheets
sheets = os.listdir(path)
for sheet in sheets:
wb2 = load_workbook(os.path.join(path, sheet))
print(wb2.get_sheet_names())
wb2._archive.close()

Read Excel Cells and Copy content to txt file

I am working with RapidMiner at the moment and am trying to copy my RapidMiner results which are in xlsx files to txt files in order to do some further processing with python. I do have plain text in column A (A1-A1500) as well as the according filename in column C (C1-C1500).
Now my question:
Is there any possibility (I am thinking of the xlrd module) to read the content of every cell in column A and print this to a new created txt file with the filename being given in corresponding column C?
As I have never worked with the xlrd module before I am a bit lost at the moment...
I can recommend openpyxl for every tasks concerning .xlsx handling.
For your requirements:
from openpyxl import *
import os
p = 'path/to/the/folder/with/your/.xlsx'
files = [_ for _ in os.listdir(p) if _.endswith('.xlsx')]
for f in files:
wb = load_workbook(os.path.join(p, f))
ws = wb['name_of_sheet']
for row in ws.rows:
with open(row[2].value+'.txt', 'w') as outfile:
outfile.write(row[0].value)
Good day! So, I'm not sure I understand your question correctly, but have you tried a combination of Read Excel operator with the Loop Examples operator? Your loop subprocess could then use Write CSV operator or similar.
Thanks to #corinna the final code is:
from openpyxl import *
import os
p = r'F:\Results'
files = [_ for _ in os.listdir(p) if _ .endswith('.xlsx')]
os.chdir(r"F:\Results")
for f in files:
file_location = load_workbook(os.path.join(p, f))
sheet = file_location['Normal']
for row in sheet.rows:
with open(row[2].value + '.txt', "w") as outfile:
outfile.write(row[0].value)

Resources