pandas creating a dataframe from mysql database - python-3.x

I have been trying to create a DataFrame from a MySQL database using pandas and Python, but I have run into an issue I need help with.
The issue is that when writing the DataFrame to Excel, only the last row is written, i.e. all the previous entries are overwritten. Please see the code below:
import csv
import pandas as pd
import pymysql
# mysql_cn (an open pymysql connection) and timestr are assumed to be defined elsewhere (not shown)
with open('C:path_to_file\\extract_job_details.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        jobid = str(row[1])
        statement = """select jt.job_id, jt.vendor_data_type, jt.id as TaskId, jt.create_time as CreatedTime, jt.job_start_time as StartedTime, jt.job_completion_time, jt.worker_path, j.id as JobId from dspe.job_task jt JOIN dspe.job j on jt.job_id = j.id where jt.job_id = %(jobid)s"""
        # bind jobid to the %(jobid)s placeholder
        df_mysql = pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid})
        try:
            with pd.ExcelWriter(timestr + 'testResult.xlsx', engine='xlsxwriter') as writer:
                df_mysql.to_excel(writer, sheet_name='Sheet1')
        except pymysql.err.OperationalError as error:
            code, message = error.args
mysql_cn.close()
Please can anyone help me identify where I am going wrong?
PS: I am new to pandas and Python.
Thanks Carlos

I'm not really sure what you're trying to do by reading from disk and from a database at the same time...
First, you don't need the csv module when you're already using pandas:
df = pd.read_csv("path/to/input/csv")
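For the job-ID file in the question, a minimal sketch of the pandas-only approach, assuming (as the original `row[1]` suggests) that the file has no header row and the job IDs sit in the second column. `io.StringIO` and the sample rows are stand-ins for the real file:

```python
import io
import pandas as pd

# Stand-in for extract_job_details.csv: no header row, job id in column 1 (hypothetical data)
csv_text = "x,101,foo\ny,102,bar\nz,103,baz\n"

# header=None gives integer column labels 0, 1, 2, ...
jobs = pd.read_csv(io.StringIO(csv_text), header=None)
# astype(str) mirrors the original str(row[1]) conversion
job_ids = jobs[1].astype(str).tolist()
print(job_ids)  # ['101', '102', '103']
```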
Next, you can simply pass a file path to to_excel instead of an ExcelWriter instance:
df.to_excel("path/to/desired/excel/file")
If it doesn't actually need to be an excel file you can use:
df.to_csv("path/to/desired/csv/file")
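The reason only the last result survives is that the `with pd.ExcelWriter(...)` block runs once per pass through the `for` loop, so every iteration recreates `testResult.xlsx` and writes only that iteration's rows. A minimal sketch of the usual fix is to collect one DataFrame per job and write once after the loop. Here an in-memory SQLite table is an assumed stand-in for the MySQL `dspe.job_task` table, and `collect_jobs` is a hypothetical helper name:

```python
import sqlite3
import pandas as pd

def collect_jobs(job_ids, con):
    """Run one query per job id and stack all results into a single DataFrame."""
    # sqlite3 uses ? placeholders; with pymysql the query keeps %(jobid)s and a dict
    query = "SELECT job_id, task_id FROM job_task WHERE job_id = ? ORDER BY task_id"
    frames = [pd.read_sql(query, con, params=(job_id,)) for job_id in job_ids]
    # Concatenate once, after the loop, instead of writing a file per iteration
    return pd.concat(frames, ignore_index=True)

# In-memory SQLite table standing in for dspe.job_task (hypothetical data)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE job_task (job_id TEXT, task_id INTEGER)")
con.executemany(
    "INSERT INTO job_task VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3), ("c", 4)],
)

combined = collect_jobs(["a", "b"], con)
print(combined)
# A single write after the loop keeps every row, e.g.:
# combined.to_excel("testResult.xlsx", sheet_name="Sheet1")  # needs xlsxwriter or openpyxl
```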

Related

Any optimized way to iterate over an Excel file and pass each query into pd.read_sql() as a string, one by one?

# here I need a loop that supplies the queries from Excel for the respective reports:
df1 = pd.read_sql(SQLqueryB2, con=con1)
df2 = pd.read_sql(ORCqueryC2, con=con2)
if df1.equals(df2):
    print(Report2 + " : is Pass")
Can we achieve the above by doing something like this (iterating over the DataFrame)?
df = pd.read_excel(path)
for col, item in df.items():  # iteritems() was removed in pandas 2.0
    ...
Or is the only option left to read the Excel file with the openpyxl library, iterate over the rows and columns, and then pass in the values? I hope the question is clear; if anything is unclear, please comment.
You are trying to loop through an Excel file, run the two queries, check whether they match, and output the result, correct?
import pandas as pd
from sqlalchemy import create_engine
# fill in USER, PWD, HOST and DB
con = create_engine(f"mysql+pymysql://{USER}:{PWD}@{HOST}/{DB}")
file = pd.read_excel('excel_file.xlsx')
file['Result'] = ''  # placeholder
for i, row in file.iterrows():
    df1 = pd.read_sql(row['SQLQuery'], con)
    # both queries use the same engine here; a separate Oracle database needs its own engine
    df2 = pd.read_sql(row['Oracle Queries'], con)
    file.loc[i, 'Result'] = 'Pass' if df1.equals(df2) else 'Fail'
file.to_excel('results.xlsx', index=False)
This will save a file named results.xlsx that mirrors the original data but adds a column named Result that will be Pass or Fail.
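To see the loop work without Excel or a database server, here is a small sketch under assumed conditions: an in-memory DataFrame replaces `excel_file.xlsx` (same `SQLQuery` and `Oracle Queries` column names as the answer), and one SQLite connection stands in for both databases. The table, reports and queries are invented for illustration; with real MySQL and Oracle sources you would presumably pass each query its own connection.

```python
import sqlite3
import pandas as pd

# One in-memory SQLite database stands in for both the MySQL and Oracle sources
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [("N", 10), ("S", 20)])

# Stand-in for pd.read_excel('excel_file.xlsx'): one row per report
file = pd.DataFrame({
    "Report": ["Report1", "Report2"],
    "SQLQuery": ["SELECT SUM(amount) AS total FROM sales",
                 "SELECT COUNT(*) AS n FROM sales"],
    "Oracle Queries": ["SELECT SUM(amount) AS total FROM sales",
                       "SELECT COUNT(*) AS n FROM sales WHERE amount > 15"],
})

file["Result"] = ""  # placeholder column, filled in row by row
for i, row in file.iterrows():
    df1 = pd.read_sql(row["SQLQuery"], con)
    df2 = pd.read_sql(row["Oracle Queries"], con)
    # DataFrame.equals compares shape, labels, values and dtypes
    file.loc[i, "Result"] = "Pass" if df1.equals(df2) else "Fail"

print(file[["Report", "Result"]])
```

Report1 runs the same query twice and passes; Report2's second query filters rows, so the counts differ and it fails.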
Example results.xlsx:

Write all pandas DataFrames in the workspace to Excel

I'm trying to write all the pandas DataFrames currently available in the workspace to Excel files. I followed the example from this SO thread, but I'm unable to make it work.
This is my code, which doesn't work:
alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)}
for df in alldfs.values():
    print(df.name)
    fmane = df+".xlsx"
    writer = pd.ExcelWriter(fmane)
    df.to_excel(writer)
    writer.save()
How can I correct this so that each DataFrame's name is passed to a variable and the Excel filename matches the DataFrame's name? I'm using Spyder 4 and Python 3.8.
Just a small fix will do the job:
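alldfs = {var: eval(var) for var in dir() if isinstance(eval(var), pd.DataFrame)}
for df_name, df in alldfs.items():  # items() yields (variable name, DataFrame) pairs
    print(df_name)
    fmane = df_name + ".xlsx"
    # the context manager saves and closes the file; writer.save() was removed in pandas 2.0
    with pd.ExcelWriter(fmane) as writer:
        df.to_excel(writer)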
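A variant that avoids `eval` is to read the namespace dictionary directly. The sketch below assumes it runs at module level (a script, or the Spyder/IPython console), so `globals()` holds the workspace variables. `dataframes_in` is a hypothetical helper name, and names starting with `_` are skipped because IPython stores previous outputs in variables such as `_` and `_1`.

```python
import pandas as pd

def dataframes_in(namespace):
    """Return {name: DataFrame} for every DataFrame bound in `namespace`."""
    # list() takes a snapshot so the dict cannot change while we iterate
    return {
        name: obj
        for name, obj in list(namespace.items())
        if isinstance(obj, pd.DataFrame) and not name.startswith("_")
    }

# Example workspace variables (hypothetical)
sales = pd.DataFrame({"amount": [10, 20]})
costs = pd.DataFrame({"amount": [5, 7]})
threshold = 15  # not a DataFrame, so it is ignored

for name, df in dataframes_in(globals()).items():
    # each file is named after its variable, e.g. sales.xlsx
    print(f"{name}.xlsx", df.shape)
    # df.to_excel(f"{name}.xlsx")  # requires openpyxl (or another Excel engine)
```

Uncomment the `to_excel` line once an Excel engine such as openpyxl is installed.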

Get second column of a data frame using pandas

I am new to pandas in Python and am having some difficulty returning the second column of a DataFrame that has no column names, only numbers as indexes.
import pandas as pd
import os
directory = 'A://'
sample = 'test.txt'
# Test with Air Sample
fileAir = os.path.join(directory,sample)
dataAir = pd.read_csv(fileAir,skiprows=3)
print(dataAir.iloc[:,1])
The data I am working with would be similar to:
data = [[1,2,3],[1,2,3],[1,2,3]]
Then, using pandas, I want to get only
[[2,2,2]].
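You can use dataframe_name[column_index].values, for example df[1].values, or dataframe_name['column_name'].values, for example df['col1'].values. Note that df[1] looks the column up by its label, so it only works when the columns are actually labelled with integers.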
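A small sketch contrasting label-based and position-based selection on the question's sample data. Reading with `header=None` labels the columns 0, 1, 2 (without it, `read_csv` would treat the first row as the column names), and `iloc[:, 1]` picks the second column by position whatever the labels are. `io.StringIO` stands in for the real file:

```python
import io
import pandas as pd

# Same values as the question's data = [[1,2,3],[1,2,3],[1,2,3]], no header row
text = "1,2,3\n1,2,3\n1,2,3\n"

df = pd.read_csv(io.StringIO(text), header=None)  # columns become 0, 1, 2
by_label = df[1].to_numpy()              # label-based selection
by_position = df.iloc[:, 1].to_numpy()   # position-based selection
print(by_label.tolist(), by_position.tolist())  # [2, 2, 2] [2, 2, 2]
```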

Pandas read_csv adding some very small values to the DataFrame

When I use pandas read_csv, pandas adds a tiny amount to some values in the DataFrame: -0.079257 becomes -0.07925700000000001. Why is this happening, and how can I fix it? It only happens to some specific values, while others seem fine.
I've tried using float_precision, but it doesn't seem to do anything. I'm new to pandas.
df = pd.read_csv('filepath')
print(df.iat[0,0])
I manually changed the dataset file type from .txt to .csv using Notepad.
This is because your original data has np.float32 precision.
import numpy as np
import pandas as pd
df = pd.read_csv('./avila/avila-ts.txt')
print(df.iat[0,0]) # 0.13029200000000002
# stored as np.float32
df.to_csv('./my.csv',float_format=np.float32, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.13029200000000002
# stored as np.float16
df.to_csv('./my.csv',float_format=np.float16, index_label=False)
df_1 = pd.read_csv('./my.csv')
print(df_1.iat[0,0]) # 0.1302
I don't know how your data is structured. Could you open the data and check it, or better still, post a screenshot?
import pandas as pd
data = pd.read_csv('filepath')
data.head()
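Since the question mentions `float_precision`: a commonly suggested alternative to the answers above is `float_precision='round_trip'`, which selects the C parser's round-trip float converter, so values such as -0.079257 should come back exactly as written. A minimal sketch, with an inline CSV standing in for the dataset:

```python
import io
import pandas as pd

csv_text = "value\n-0.079257\n0.130292\n"

# 'round_trip' asks the C parser for a round-trip-safe float conversion
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")
print(df.iat[0, 0], df.iat[1, 0])
```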

Python Pandas dataframe, how to integrate new columns into a new csv

Hi guys, I need a bit of help with pandas and would greatly appreciate your input.
My original file looks like this:
I would like to convert it by merging some pairs of columns (taking their averages) and return a new file that looks like this:
Also, if possible, I would like to split the column 'RateDateTime' into two columns, one containing the date and the other only the time. How should I do it? I tried the code below, but it doesn't work:
import pandas as pd
from datetime import datetime  # pd.datetime was removed in pandas 2.0
dateparse = lambda x: datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
df = pd.read_csv('data.csv', parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse)
a=pd.to_numeric(df['RateAsk_open'])
b=pd.to_numeric(df['RateAsk_high'])
c=pd.to_numeric(df['RateAsk_low'])
d=pd.to_numeric(df['RateAsk_close'])
e=pd.to_numeric(df['RateBid_open'])
f=pd.to_numeric(df['RateBid_high'])
g=pd.to_numeric(df['RateBid_low'])
h=pd.to_numeric(df['RateBid_close'])
df['Open'] = (a+e) /2
df['High'] = (b+f) /2
df['Low'] = (c+g) /2
df['Close'] = (d+h) /2
grouped = df.groupby('CurrencyPair')
Open=grouped['Open']
High=grouped['High']
Low=grouped['Low']
Close=grouped['Close']
w=pd.concat([Open, High,Low,Close], axis=1, keys=['Open', 'High','Low','Close'])
w.to_csv('w.csv')
Python returns:
TypeError: cannot concatenate object of type "<class 'pandas.core.groupby.groupby.SeriesGroupBy'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Can someone help me please? Many thanks!!!
IIUYC (if I understand you correctly), you don't need grouping here. You can simply add the new columns to the existing DataFrame and use the columns argument of to_csv to choose which columns are saved to the CSV file. Here is an example:
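df = pd.read_csv('data.csv', parse_dates=['RateDateTime'])  # keep RateDateTime as a column, not the index
df['Open'] = df[['RateAsk_open', 'RateBid_open']].mean(axis=1)
df['High'] = df[['RateAsk_high', 'RateBid_high']].mean(axis=1)
df['Low'] = df[['RateAsk_low', 'RateBid_low']].mean(axis=1)
df['Close'] = df[['RateAsk_close', 'RateBid_close']].mean(axis=1)
df['RateDate'] = df['RateDateTime'].dt.date
df['RateTime'] = df['RateDateTime'].dt.time
df.to_csv('w.csv', columns=['CurrencyPair', 'RateDate', 'RateTime', 'Open', 'High', 'Low', 'Close'])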
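A self-contained sketch of the same approach on two invented rows (the column names follow the question's code; the values are made up for illustration). It keeps `RateDateTime` as a regular column so that `.dt` is available, averages all four Ask/Bid pairs in a loop, and writes to an in-memory buffer instead of `w.csv`:

```python
import io
import pandas as pd

# Invented sample in the question's column layout
csv_text = (
    "CurrencyPair,RateDateTime,RateBid_open,RateBid_high,RateBid_low,RateBid_close,"
    "RateAsk_open,RateAsk_high,RateAsk_low,RateAsk_close\n"
    "EUR/USD,2019/01/02 09:00:00,1.10,1.12,1.09,1.11,1.12,1.14,1.11,1.13\n"
    "EUR/USD,2019/01/02 09:01:00,1.11,1.13,1.10,1.12,1.13,1.15,1.12,1.14\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df["RateDateTime"] = pd.to_datetime(df["RateDateTime"], format="%Y/%m/%d %H:%M:%S")

# Average each Ask/Bid pair into a single column: Open, High, Low, Close
for field in ["open", "high", "low", "close"]:
    df[field.capitalize()] = df[[f"RateAsk_{field}", f"RateBid_{field}"]].mean(axis=1)

# Split the timestamp into separate date and time columns
df["RateDate"] = df["RateDateTime"].dt.date
df["RateTime"] = df["RateDateTime"].dt.time

out = io.StringIO()  # use 'w.csv' instead to write a real file
df.to_csv(out, index=False,
          columns=["CurrencyPair", "RateDate", "RateTime", "Open", "High", "Low", "Close"])
print(out.getvalue())
```

Looping over the field names avoids repeating the same averaging line four times, and since no aggregation across rows is needed, no `groupby` is involved.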
