Sometimes we open a multi-sheet Excel file, do some operations on one sheet, and then save it back into the same file (or into a new file). Given that the operations are done in a pandas DataFrame, how can I copy the result back to the target sheet?
import openpyxl as op
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
wbk = op.load_workbook("fileName.xlsx")
wsht = wbk["verbList"]

# Create a DataFrame from the sheet data and operate on it
df = pd.read_excel("fileName.xlsx", sheet_name="verbList")
df.insert(0, "newCol2", "")  # sample operation

rows = dataframe_to_rows(df, index=False, header=True)  # returns a generator of rows
# A for loop over these rows would copy them back into the worksheet,
# but I am trying to avoid loops here.

wsht["B1"].value = "verbs"
wbk.save(basePath + "fileName-update.xlsx")  # basePath defined elsewhere
Any ideas, anyone?
If any other Python Excel library does the job, please let me know.
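One loop-free approach is to let pandas do the writing itself. A minimal sketch, assuming pandas >= 1.3 (for `if_sheet_exists`) with openpyxl installed; the file and sheet names match the question, and the snippet builds its own demo workbook first so it is self-contained:

```python
import pandas as pd

# Build a small demo workbook so the example runs standalone
# ("fileName.xlsx" / "verbList" stand in for the real file and sheet).
pd.DataFrame({"verb": ["run", "jump"]}).to_excel(
    "fileName.xlsx", sheet_name="verbList", index=False)

df = pd.read_excel("fileName.xlsx", sheet_name="verbList")
df.insert(0, "newCol2", "")  # sample operation

# mode="a" with if_sheet_exists="replace" rewrites only this sheet,
# leaving any other sheets in the workbook untouched.
with pd.ExcelWriter("fileName.xlsx", engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="verbList", index=False)
```

This sidesteps `dataframe_to_rows` entirely; pandas handles the row transfer internally.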
I am exporting my DataFrame to Excel and conditionally formatting it with colors (so no PyExcelerate for me). What takes the most time by far is the `toPandas` conversion, and I was wondering if there is a way to do it with the Spark DataFrame directly. The code is this:
excel_writer = pd.ExcelWriter("excel_output.xlsx", engine='xlsxwriter')

# Create a Pandas dataframe from the Spark dataframe.
print_seconds_since_start("To pandas")
pd_df_a_escribir = df_a_escribir.toPandas()
print_seconds_since_start("Fin to pandas")

# Convert the dataframe to an XlsxWriter Excel object.
pd_df_a_escribir.to_excel(excel_writer, sheet_name=name_hoja)

# Get the xlsxwriter workbook and worksheet objects.
workbook = excel_writer.book
worksheet = excel_writer.sheets[name_hoja]
It would have to be a quicker solution, as that is the bottleneck right now.
Thanks a lot in advance!
I am reading an xlsx file using pandas, `pd.read_excel("myfile.xlsx", sheet_name="my_sheet", header=2)`, and writing the DataFrame to a CSV file using `df.to_csv`.
The Excel file contains several columns with percentage values in them (e.g. 27.44 %). In the DataFrame the values get converted to 0.2744, and I don't want any modification of the data. How can I achieve this?
I already tried:
Using a lambda function to convert 0.2744 back to 27.44 %, but I don't want this because the column names/index are not fixed; any column can contain the % values.
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet", header=5, dtype={'column_name': str}) - didn't work
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet", header=5, dtype={'column_name': object}) - didn't work
The xlrd module, but that too converted the % values to floats.
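One hedged workaround, since pandas only ever sees the stored number (Excel keeps 27.44 % as the value 0.2744 plus a percent number format): use openpyxl to check each cell's number format and rebuild the display strings yourself, with no need to know the column names up front. A self-contained sketch; the tiny demo workbook below stands in for myexcel.xlsx:

```python
import openpyxl as op

# Build a demo workbook with one percent-formatted cell so the
# sketch runs standalone (your real file/sheet names will differ).
wb = op.Workbook()
ws = wb.active
ws.title = "my_sheet"
ws["A1"], ws["B1"] = "name", "share"
ws["A2"], ws["B2"] = "foo", 0.2744
ws["B2"].number_format = "0.00%"

# Walk the data rows and re-render any percent-formatted cell as the
# display string Excel shows, instead of the raw 0.2744 float.
rows = []
for row in ws.iter_rows(min_row=2):
    rows.append([
        f"{c.value:.2%}" if c.number_format and "%" in c.number_format
        else c.value
        for c in row
    ])
print(rows)  # [['foo', '27.44%']]
```

These rows can then be written to CSV (e.g. with the csv module), preserving the percent strings exactly.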
df = pd.read_excel("myexcel.xlsx", sheet_name="my_sheet")
df.to_csv("mycsv.csv", sep=",", index=False)
From your xlsx, save the file directly in CSV format.
To import your CSV file, use the pandas library as follows:
import pandas as pd
df=pd.read_csv('my_sheet.csv') #in case your file located in the same directory
More information: see the pandas.read_csv documentation.
So I have been trying to create a DataFrame from a MySQL database using pandas and Python, but I have encountered an issue I need help with.
The issue is that when writing the DataFrame to Excel, only the last row is written, i.e. it overwrites all the previous entries. Please see the code below:
import csv

import pandas as pd
import pymysql

with open('C:path_to_file\\extract_job_details.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        jobid = str(row[1])
        statement = """select jt.job_id, jt.vendor_data_type, jt.id as TaskId,
                       jt.create_time as CreatedTime, jt.job_start_time as StartedTime,
                       jt.job_completion_time, jt.worker_path, j.id as JobId
                       from dspe.job_task jt
                       JOIN dspe.job j on jt.job_id = j.id
                       where jt.job_id = %(jobid)s"""
        df_mysql = pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid})
        try:
            with pd.ExcelWriter(timestr + 'testResult.xlsx', engine='xlsxwriter') as writer:
                df_mysql.to_excel(writer, sheet_name='Sheet1')
        except pymysql.err.OperationalError as error:
            code, message = error.args

mysql_cn.close()
Please can anyone help me identify where I am going wrong?
PS: I am new to pandas and Python.
Thanks, Carlos
I'm not really sure what you're trying to do, reading from disk and a database at the same time...
First, you don't need csv when you're already using Pandas:
df = pd.read_csv("path/to/input/csv")
Next you can simply provide a file path as an argument to to_excel instead of an ExcelWriter instance:
df.to_excel("path/to/desired/excel/file")
If it doesn't actually need to be an excel file you can use:
df.to_csv("path/to/desired/csv/file")
I have a large Excel file, made up of 92 sheets, which I have imported into pandas.
I want to use a loop or some other tool to generate a DataFrame from the data in each sheet (one DataFrame per sheet), and to automatically name each DataFrame.
I have only just started using pandas and jupyter so I am not very experienced at all.
This is the code I have so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
%matplotlib inline
concdata = pd.ExcelFile('Documents/Research Project/Data-Ana/11July-27Dec.xlsx')
I also have a list of all the spreadsheet names:
#concdata.sheet_names
Thanks!
Instead of making each DataFrame its own variable you can assign each sheet a name in a Python dictionary like so:
dfs = {}
for sheet in concdata.sheet_names:
    dfs[sheet] = concdata.parse(sheet)
And then access each DataFrame with the sheet name:
dfs['sheet_name_here']
Doing it this way allows you to have amortised O(1) lookup of sheets.
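Alternatively, pandas can build that same dictionary in one call: passing `sheet_name=None` to `read_excel` returns a `{sheet name: DataFrame}` mapping directly. A minimal sketch using a throwaway two-sheet file (demo.xlsx stands in for the real workbook):

```python
import pandas as pd

# Create a small two-sheet workbook so the sketch runs standalone.
with pd.ExcelWriter("demo.xlsx") as writer:
    pd.DataFrame({"a": [1]}).to_excel(writer, sheet_name="jul", index=False)
    pd.DataFrame({"a": [2]}).to_excel(writer, sheet_name="aug", index=False)

# sheet_name=None loads every sheet at once into a dict of DataFrames.
dfs = pd.read_excel("demo.xlsx", sheet_name=None)
print(sorted(dfs))  # ['aug', 'jul']
```

This saves the explicit parse loop while giving the same sheet-name lookup.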
I am reading multiple CSVs (via URL) into multiple Pandas DataFrames and want to store the results of each CSV into separate excel worksheets (tabs). When I keep writer.save() inside the for loop, I only get the last result in a single worksheet. And when I move writer.save() outside the for loop, I only get the first result in a single worksheet. Both are wrong.
import requests
import pandas as pd
from pandas import ExcelWriter
work_statements = {
    'sheet1': 'URL1',
    'sheet2': 'URL2',
    'sheet3': 'URL3'
}
for sheet, statement in work_statements.items():
    writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
    r = requests.get(statement)   # go to URL
    df = pd.read_csv(statement)   # read from URL
    df.to_excel(writer, sheet_name=sheet)
    writer.save()
How can I get all three results in three separate worksheets?
You are re-initializing the writer object with each loop iteration. Simply initialize it once before the for loop and save the document once after the loop. Also, in the read_csv() line, you can parse the downloaded content by wrapping it in a text buffer, rather than passing the raw bytes (alternatively, read_csv can fetch the URL itself, which would make the requests call unnecessary):
import io

writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
for sheet, statement in work_statements.items():
    r = requests.get(statement)             # go to URL
    df = pd.read_csv(io.StringIO(r.text))   # parse the downloaded text
    df.to_excel(writer, sheet_name=sheet)
writer.save()
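The same one-writer pattern can be sketched offline, with local DataFrames standing in for the downloaded CSVs so it runs anywhere (B.xlsx and the sheet names mirror the question; using the writer as a context manager saves the file automatically on exit):

```python
import pandas as pd

# Stand-ins for the three DataFrames that would come from the URLs.
frames = {
    "sheet1": pd.DataFrame({"x": [1]}),
    "sheet2": pd.DataFrame({"x": [2]}),
    "sheet3": pd.DataFrame({"x": [3]}),
}

# One writer, many sheets, one save: the workbook is written on exit.
with pd.ExcelWriter("B.xlsx") as writer:
    for sheet, df in frames.items():
        df.to_excel(writer, sheet_name=sheet, index=False)
```

Each dictionary entry lands on its own worksheet, which is exactly the three-tab result the question asks for.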