I'm trying to write all the pandas DataFrames currently available in the workspace to Excel files, following the example from this SO thread, but I'm unable to make it work.
This is my non-working code:
alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)}
for df in alldfs.values():
    print(df.name)
    fname = df + ".xlsx"
    writer = pd.ExcelWriter(fname)
    df.to_excel(writer)
    writer.save()
Any help on how to correct this would be appreciated: I want to get each DataFrame's name into a variable so that the Excel filename being written matches the DataFrame. I'm using Spyder 4 and Python 3.8.
Just a small fix will do the job:
alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)}
for df_name, df in alldfs.items():   # iterate over names and frames together
    print(df_name)
    fname = df_name + ".xlsx"        # the dict key is the variable name
    writer = pd.ExcelWriter(fname)
    df.to_excel(writer)
    writer.save()
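Note that ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0. On a recent pandas, a minimal sketch of the same loop uses the context manager, which saves and closes the file on exit:
for df_name, df in alldfs.items():
    # the `with` block saves and closes the workbook automatically
    with pd.ExcelWriter(df_name + ".xlsx") as writer:
        df.to_excel(writer)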
I am exporting my dataframe to Excel and conditionally formatting it with colors (so no PyExcelerate for me), and what takes the most time by far is the toPandas conversion. I was wondering if there is a way to do it straight from the Spark dataframe. The code is this:
excel_writer = pd.ExcelWriter("excel_output.xlsx", engine='xlsxwriter')
# Create a Pandas dataframe from the Spark dataframe.
print_seconds_since_start("To pandas")
pd_df_a_escribir = df_a_escribir.toPandas()
print_seconds_since_start("Fin to pandas")
# Convert the dataframe to an XlsxWriter Excel object.
pd_df_a_escribir.to_excel(excel_writer, sheet_name=name_hoja)
# Get the xlsxwriter workbook and worksheet objects.
workbook = excel_writer.book
worksheet = excel_writer.sheets[name_hoja]
It would have to be a quicker solution, as speed is the problem right now. Thanks a lot in advance!
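One thing worth trying (a sketch, not a full answer; it assumes Spark 3.x with the pyarrow package installed) is enabling Arrow-based conversion, which often speeds up toPandas() considerably:
# Arrow-based conversion; Spark falls back to the slow path if pyarrow is unavailable
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pd_df_a_escribir = df_a_escribir.toPandas()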
So I have been trying to create a dataframe from a MySQL database using pandas and Python, but I have encountered an issue which I need help on.
The issue is when writing the dataframe to Excel: it only writes the last row, i.e. it overwrites all the previous entries and only the last row is written. Please see the code below.
import csv
import pandas as pd
import pymysql

with open('C:path_to_file\\extract_job_details.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        jobid = str(row[1])
        statement = """select jt.job_id, jt.vendor_data_type, jt.id as TaskId, jt.create_time as CreatedTime, jt.job_start_time as StartedTime, jt.job_completion_time, jt.worker_path, j.id as JobId from dspe.job_task jt JOIN dspe.job j on jt.job_id = j.id where jt.job_id = %(jobid)s"""
        df_mysql = pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid})
        try:
            with pd.ExcelWriter(timestr + 'testResult.xlsx', engine='xlsxwriter') as writer:
                df_mysql.to_excel(writer, sheet_name='Sheet1')
        except pymysql.err.OperationalError as error:
            code, message = error.args

mysql_cn.close()
Please can anyone help me identify where I am going wrong?
PS: I am new to pandas and Python.
Thanks, Carlos
I'm not really sure what you're trying to do reading from disk and a database at the same time...
First, you don't need csv when you're already using Pandas:
df = pd.read_csv("path/to/input/csv")
Next you can simply provide a file path as an argument to to_excel instead of an ExcelWriter instance:
df.to_excel("path/to/desired/excel/file")
If it doesn't actually need to be an excel file you can use:
df.to_csv("path/to/desired/csv/file")
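That said, the reason only the last row survives in your original code is that a fresh ExcelWriter opens (and overwrites) the same file on every loop iteration. A minimal sketch of one fix, assuming mysql_cn, timestr, reader, and statement are set up as in the question: collect the per-job frames and write the file once, after the loop:
frames = []
for row in reader:
    jobid = str(row[1])
    # one query per job id; params fills the %(jobid)s placeholder
    frames.append(pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid}))

# concatenate everything and write the Excel file a single time
pd.concat(frames).to_excel(timestr + 'testResult.xlsx', sheet_name='Sheet1')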
I'm trying to use pandas read_excel to work with a file. The file has two rows of headers, so I'm trying to use the MultiIndex feature via the header keyword argument.
import os
import pandas as pd

"""data in 2015 MOR Folder"""
filename = 'MOR-JANUARY 2015.xlsx'
print(os.path.isfile(filename))
df1 = pd.read_excel(filename, header=[0, 1], sheetname='MOR')
print(df1)
The error I get is ValueError: Length of new names must be 1, got 2. The file is in this Google Drive folder: https://drive.google.com/drive/folders/0B0ynKIVAlSgidFFySWJoeFByMDQ?usp=sharing
I'm trying to follow the solution posted here
Read excel sheet with multiple header using Pandas
I could be mistaken, but I don't think pandas handles parsing Excel rows where there are merged cells. So in that first row, the merged cells get parsed as mostly empty cells, whereas you'd need the values nicely repeated for the MultiIndex to come out right; that is what motivates the ffill below. If you could control the Excel workbook ahead of time, you might be able to use the code you have.
My solution
It's not pretty, but it'll get it done.
filename = 'MOR-JANUARY 2015.xlsx'
# Read everything raw, with no header parsing at all.
df1 = pd.read_excel(filename, sheetname='MOR', header=None)
# Forward-fill along the columns so merged cells get their values repeated,
# then build the two-level column index from the first two rows.
mux = pd.MultiIndex.from_arrays(df1.ffill(axis=1).values[:2, 1:], names=[None, 'DATE'])
# Body: everything from row 2 on; the first column becomes the row index.
df1 = pd.DataFrame(df1.values[2:, 1:], df1.values[2:, 0], mux)
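For what it's worth, from pandas 0.21 on the keyword is sheet_name rather than sheetname, so on a current install the read line becomes pd.read_excel(filename, sheet_name='MOR', header=None); the rest is unchanged.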
I am reading multiple CSVs (via URL) into multiple Pandas DataFrames and want to store the results of each CSV into separate excel worksheets (tabs). When I keep writer.save() inside the for loop, I only get the last result in a single worksheet. And when I move writer.save() outside the for loop, I only get the first result in a single worksheet. Both are wrong.
import requests
import pandas as pd
from pandas import ExcelWriter
work_statements = {
'sheet1': 'URL1',
'sheet2': 'URL2',
'sheet3': 'URL3'
}
for sheet, statement in work_statements.items():
    writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
    r = requests.get(statement)  # go to URL
    df = pd.read_csv(statement)  # read from URL
    df.to_excel(writer, sheet_name=sheet)
    writer.save()
How can I get all three results in three separate worksheets?
You are re-initializing the writer object on each loop iteration. Initialize it once before the for loop and save the document once after the loop. Also, in the read_csv() line you should be reading the request content, not the URL (i.e., statement) saved in the dictionary:
import io

writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')

for sheet, statement in work_statements.items():
    r = requests.get(statement)            # go to URL
    df = pd.read_csv(io.StringIO(r.text))  # parse the downloaded text
    df.to_excel(writer, sheet_name=sheet)

writer.save()
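As an aside, read_csv can also be handed the URL directly, which makes the requests call unnecessary; a minimal equivalent sketch:
writer = pd.ExcelWriter('B.xlsx', engine='xlsxwriter')
for sheet, statement in work_statements.items():
    # read_csv fetches the URL itself
    pd.read_csv(statement).to_excel(writer, sheet_name=sheet)
writer.save()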
Fairly new to coding. I have looked at a couple of other similar questions about appending DataFrames in Python, but could not solve the problem.
I have the data below (CSV) in an Excel file:
Venue Name,Cost ,Restriction,Capacity
Cinema,5,over 13,50
Bar,10,over 18,50
Restaurant,15,no restriction,25
Hotel,7,no restriction,100
I am using the code below to try to filter out rows which have "no restriction" under the Restriction column. The code seems to work right through to the last line, i.e. both print statements give me what I would expect.
import pandas as pd
import numpy as np

my_file = pd.ExcelFile("venue data.xlsx")
mydata = my_file.parse(0, index_col=None, na_values=["NA"])
my_new_file = pd.DataFrame()
for index in mydata.index:
    if "no restriction" in mydata.Restriction[index]:
        print(mydata.Restriction[index])
        print(mydata.loc[index:index])
        my_new_file.append(mydata.loc[index:index], ignore_index=True)
Don't loop through dataframes. It's almost never necessary.
Use:
df2 = df[df['Restriction'] != 'no restriction']
Or
df2 = df.query("Restriction != 'no restriction'")
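For completeness: the reason the original loop appeared to do nothing is that DataFrame.append returned a new DataFrame rather than modifying my_new_file in place (and it was removed entirely in pandas 2.0). If you ever do need to build a frame row by row, a minimal sketch of what the loop was attempting, using pd.concat instead:
# collect the matching rows, then concatenate once at the end
kept = [mydata.loc[[i]] for i in mydata.index if "no restriction" in mydata.Restriction[i]]
my_new_file = pd.concat(kept, ignore_index=True)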