cannot get shutil.move to move files - python-3.x

So I'm trying to move my csv files from the source folder to the dest folder after performing an action on each file, using nested for loops.
Below are the nested for loops.
What's happening now is that the first file gets copied into the table in the database, but it doesn't get moved to the destination folder after its contents are inserted into the sql table, and then the loop breaks after the first run and prints the error from the try block.
If I remove the shutil statement, all rows from each csv file are successfully copied into the database.
Essentially I want that to happen, but I also want to move each file to the dest folder after I've copied all of its data into the table.
This script will be triggered by a Power Automate action that runs once a file is added to the folder, so I don't want to add/duplicate rows in my database from the same file.
I'm also adding the variables below this code so you can get an idea of what the function is doing as well.
Thanks for any help you can provide on this, and let me know if more clarification is needed.
My attempt:
for file in dir_list:
    source = r"C:\Users\username\source\{}".format(file)
    df = pd.read_csv(source)
    df = df.dropna()
    rows = df_to_row_tuples(df)
    for row in rows:
        cursor.execute(sql, row)
        conn.commit()
    shutil.move(source, destination)
Variables:
def df_to_row_tuples(df):
    df = df.fillna('')
    rows = [tuple(cell) for cell in df.values]
    return rows

conn = sqlite3.connect(r'C:\Users\some.db')
cursor = conn.cursor()
sql = "INSERT INTO tblrandomtble VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
path = r'C:\Users\username\source'
dir_list = os.listdir(path)
source = ""
destination = r"C:\Users\username\destination"
df = pd.DataFrame()
rows = tuple()

If the file already exists, the move function will overwrite it, provided you pass the whole path...including the file name
So add the file name to the destination arg of the shutil.move function...
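For example, a minimal sketch of that fix, joining the file name onto the destination folder with os.path.join (the paths, table and helper function are the placeholders from the question):

import os
import shutil

for file in dir_list:
    source = os.path.join(path, file)
    df = pd.read_csv(source).dropna()
    for row in df_to_row_tuples(df):
        cursor.execute(sql, row)
    conn.commit()
    # Passing the full destination path (folder + file name) lets shutil.move
    # replace an existing copy instead of raising a "Destination path already exists" error.
    shutil.move(source, os.path.join(destination, file))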

Related

Unable to add worksheets to a xlsx file in Python

I am trying to export data by running dynamically generated SQLs and storing the data into dataframes, which I eventually export into an excel sheet. However, though I am able to generate the different results by successfully running the dynamic sqls, I am not able to export them into different worksheets within the same excel file. It eventually overwrites the previous result with the last resultant data.
for func_name in df_data['FUNCTION_NAME']:
    sheet_name = func_name
    sql = f"""select * from table({ev_dwh_name}.OVERDRAFT.""" + sheet_name + """())"""
    print(sql)
    dft_tf_data = pd.read_sql(sql, sf_conn)
    print('dft_tf_data')
    print(dft_tf_data)
    # dft.to_excel(writer, sheet_name=sheet_name, index=False)
    with tempfile.NamedTemporaryFile('w+b', suffix='.xlsx', delete=False) as fp:
        # dft_tf_data.to_excel(writer, sheet_name=sheet_name, index=False)
        print('Inside Temp File creation')
        temp_file = path + f'/fp.xlsx'
        writer = pd.ExcelWriter(temp_file, engine='xlsxwriter')
        dft_tf_data.to_excel(writer, sheet_name=sheet_name, index=False)
        writer.save()
        print(temp_file)
I am trying to achieve the scenario below.
Based on the FUNCTION_NAME, it should add a new sheet to the existing excel file and then write the data from the query into that worksheet.
The final file should have all the worksheets.
Is there a way to do it? Please suggest.
I'd only expect a file-not-found error to happen once (on the first run), if fp.xlsx doesn't exist yet. fp.xlsx is the file referenced on the writer = line, and because that line opens it in append mode the file must already exist or the file-not-found error will occur. Once it exists, there should be no problems.
I'm not sure of the reasoning for creating a temp xlsx file. I don't see why it would be needed, and you don't appear to use it.
The following works fine for me, where fp.xlsx is initially saved as a blank workbook before running the code.
sheet_name = 'Sheet1'
with tempfile.NamedTemporaryFile('w+b', suffix='.xlsx', delete=False) as fp:
    print('Inside Temp File creation')
    temp_file = path + f'/fp.xlsx'
    writer = pd.ExcelWriter(temp_file,
                            mode='a',
                            if_sheet_exists='overlay',
                            engine='openpyxl')
    dft_tf_data.to_excel(writer,
                         sheet_name=sheet_name,
                         startrow=writer.sheets[sheet_name].max_row + 2,
                         index=False)
    writer.save()
    print(temp_file)
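As a side note, if the end goal is simply one workbook with a sheet per FUNCTION_NAME, opening a single ExcelWriter once, outside the loop, also avoids the overwriting. A rough sketch under the question's assumptions (df_data, sf_conn, ev_dwh_name and path come from the question; the output file name is made up):

import pandas as pd

# One writer for the whole run, so every sheet lands in the same workbook.
with pd.ExcelWriter(path + '/all_functions.xlsx', engine='xlsxwriter') as writer:
    for func_name in df_data['FUNCTION_NAME']:
        sql = f"select * from table({ev_dwh_name}.OVERDRAFT.{func_name}())"
        dft_tf_data = pd.read_sql(sql, sf_conn)
        # Each function's result goes to its own worksheet (Excel caps sheet names at 31 characters).
        dft_tf_data.to_excel(writer, sheet_name=func_name[:31], index=False)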

Python 'for loop' will not export all records

I have a Python program that executes an Oracle stored procedure. The SP creates a temp table and then the Python program queries that table and writes the data to an XML file with formatting.
Forgive the noob question, but for some reason the for loop that I'm using to export to the XML file does not export all records. If I limit the query that creates the XML to 15 rows, it works and creates the file. For any value above 15, the program completes, but the file isn't created.
However, this isn't always consistent. If I do multiple runs for 15, 16, or 17 rows, I get a file. But if I try 20, no file is created. No errors, just no file.
This was the initial code. The 'sql' runs against an Oracle private temp table and formats the XML:
cursor.execute(sql)
rows = cursor.fetchall()
with open(filename, 'a') as f:
    f.write('<ROWSET>')
    for row in rows:
        f.write(" ".join(row))
    f.write('</ROWSET>')
cursor.close()
Then I changed it to this, but again, no file is created:
cursor.execute(sql)
with open(filename, 'a') as f:
    f.write('<ROWSET>')
    while True:
        rows = cursor.fetchmany(15)
        if not rows:
            break
        for row in rows:
            f.write(" ".join(row))
    f.write('</ROWSET>')
cursor.close()
I've run the 'free' command and reviewed it with my DBA, and it doesn't appear to be a memory issue. The typical size of the output table is about 600 rows. The table itself has 36 columns.
The indentation may not look right the way I've pasted it here, but the program does work. I just need a way to export all rows. Any insight would be greatly appreciated.
I'm on a Linux box using Python 3.8.5.
Here is the query (minus proprietary information) that is executed against the temp table in the cursor.execute(sql):
SELECT XMLELEMENT("ROW",
XMLFOREST(
carrier_cd,
prscrbr_full_name,
prscrbr_first_name,
prscrbr_last_name,
d_phys_mstr_id,
prscrbr_id,
prscrbr_addr_line_1,
prscrbr_addr_line_2,
prscrbr_city,
prscrbr_state_cd,
prscrbr_zip,
specialty_cd_1,
specialty,
unique_patient_reviewed,
patient_count_db_oral,
patient_count_cv_aa,
patient_count_cv_lipo,
PDC_DIABETES,
PDC_HTN,
PDC_STATINS,
Rating_Diabetes,
Rating_HTN,
Rating_Statins,
PDC_DIABETES,
PDC_HTN,
PDC_STATINS,
M_PC_DB_ORAL,
M_PC_CV_AA,
M_PC_CV_LIPO,
M_PDC_DIABETES,
M_PDC_HTN,
M_PDC_STATINS
),
XMLAGG
(
XMLFOREST(
case when carrier_hq_cd is not null
then XMLConcat(
XMLELEMENT("PATIENT_ID", patient_id),
XMLELEMENT("PATIENT_NAME", patient_name),
XMLELEMENT("DOB", dob),
XMLELEMENT("PHONE_NO", phone_no),
XMLELEMENT("MEMBER_PDC_DIABETES", MEMBER_PDC_DIABETES),
XMLELEMENT("MEMBER_PDC_HTN", MEMBER_PDC_HTN),
XMLELEMENT("MEMBER_PDC_STATINS", MEMBER_PDC_STATINS)
)
end "PATIENT_INFO"
)
ORDER BY patient_id
)
)XMLOUT
FROM ORA$PTT_QCARD_TEMP
GROUP BY
carrier_cd,
prscrbr_full_name,
prscrbr_first_name,
prscrbr_last_name,
d_phys_mstr_id,
prscrbr_id,
prscrbr_addr_line_1,
prscrbr_addr_line_2,
prscrbr_city,
prscrbr_state_cd,
prscrbr_zip,
specialty_cd_1,
specialty,
unique_patient_reviewed,
patient_count_db_oral,
patient_count_cv_aa,
patient_count_cv_lipo,
PDC_Diabetes,
PDC_HTN,
PDC_Statins,
Rating_Diabetes,
Rating_HTN,
Rating_Statins,
M_PC_DB_ORAL,
M_PC_CV_AA,
M_PC_CV_LIPO,
M_PDC_DIABETES,
M_PDC_HTN,
M_PDC_STATINS
If I could, I'd give @Axe319 credit, as his idea that it was a database problem was correct. For some reason, Python didn't like that long XML query, so I incorporated it into the stored procedure. The Python then looked like this:
# SQL query for XML data.
sql_out = """select * from DATA_OUT"""
cursor.execute(sql_out)
columns = [i[0] for i in cursor.description]
allRows = cursor.fetchall()

# Open the file for writing and write the first row.
xmlFile = open(filename, 'w')
xmlFile.write('<ROWSET>')

# Loop through the allRows data set and write it to the file.
for rows in allRows:
    columnNumber = 0
    for column in columns:
        data = rows[columnNumber]
        if data is None:
            data = ''
        xmlFile.write('%s' % (data))
        columnNumber += 1

# Write the final row and close the file.
xmlFile.write('</ROWSET>')
xmlFile.close()
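For larger result sets, the same write can be streamed in batches with fetchmany instead of fetchall, which keeps memory usage flat. A sketch under the same DATA_OUT assumption:

cursor.execute("select * from DATA_OUT")
with open(filename, 'w') as xmlFile:
    xmlFile.write('<ROWSET>')
    while True:
        batch = cursor.fetchmany(100)
        if not batch:
            break
        for row in batch:
            for data in row:
                # Treat NULLs as empty strings, as in the loop above.
                xmlFile.write('%s' % ('' if data is None else data))
    xmlFile.write('</ROWSET>')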

Appending data from multiple excel files into a single excel file without overwriting using python pandas

Here is my current code below.
I have a specific range of cells (from a specific sheet) that I am pulling out of multiple (~30) excel files. I am trying to pull this information out of all these files to compile into a single new file appending to that file each time. I'm going to manually clean up the destination file for the time being as I will improve this script going forward.
What I currently have works fine for a single sheet, but I overwrite my destination every time I add a new file to the read-in list.
I've tried adding mode='a' and a couple of different ways to concat at the end of my function.
import pandas as pd

def excel_loader(fname, sheet_name, new_file):
    xls = pd.ExcelFile(fname)
    df1 = pd.read_excel(xls, sheet_name, nrows=20)
    print(df1[1:15])
    writer = pd.ExcelWriter(new_file)
    df1.insert(51, 'Original File', fname)
    df1.to_excel(new_file)

names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'
for name in names:
    excel_loader(name, 'specific_sheet_name', destination)
Thanks for any help in advance; I can't seem to find an answer to this exact situation on here. Cheers.
Ideally you want to loop through the files and read the data into a list, then concatenate the individual dataframes, then write the new dataframe. This assumes the data being pulled is the same size/shape and the sheet name is the same. If the sheet name changes, look into the zip() function to send filename/sheetname tuples.
This should get you started:
names = ['sheet1.xlsx', 'sheet2.xlsx']
destination = 'destination.xlsx'

# Read all files first
df_hold_list = []
for name in names:
    xls = pd.ExcelFile(name)
    df = pd.read_excel(xls, sheet_name, nrows=20)
    df_hold_list.append(df)

# Concatenate dfs; axis is 1 or 0 depending on how you want to concatenate (horizontal vs vertical)
df1 = pd.concat(df_hold_list, axis=1)

# Write new file - may have to correct this piece - not sure what functions these are
writer = pd.ExcelWriter(destination)
df1.to_excel(destination)
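As a usage note on that last step: since the goal is to append each file's rows underneath the previous ones, vertical concatenation (axis=0) would be the usual choice, and to_excel can take the destination path directly, so the separate writer line isn't needed. A minimal sketch under the same assumptions:

# Stack one file's rows after another, renumber the index, write once.
df1 = pd.concat(df_hold_list, axis=0, ignore_index=True)
df1.to_excel(destination, index=False)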

Pandas Copy Values from Rows to other files without disturbing the existing data

I have 20 csv files pertaining to different individuals.
And I have a Main csv file, which is based on the final row values in specific columns. Below are samples of both kinds of files.
All Individual Files look like this:
alex.csv
name,day,calls,closed,commision($)
alex,25-05-2019,68,6,15
alex,27-05-2019,71,8,20
alex,28-05-2019,65,7,17.5
alex,29-05-2019,68,8,20
stacy.csv
name,day,calls,closed,commision($)
stacy,25-05-2019,82,16,56.00
stacy,27-05-2019,76,13,45.50
stacy,28-05-2019,80,19,66.50
stacy,29-05-2019,79,18,63.00
But the Main File(single day report), which is the output file, looks like this:
name,day,designation,calls,weekly_avg_calls,closed,commision($)
alex,29-05-2019,rep,68,67,8,20
stacy,29-05-2019,sme,79,81,18,63
madhu,29-05-2019,rep,74,77,16,56
gabrielle,29-05-2019,rep,59,61,6,15
I need to copy the values from the columns (calls, closed, commision($)) of the last line, for the end-of-day report, and then populate them into the Main File (a template that already has some columns filled in, like {name, day, designation, ...}).
So how can I write a for or while loop over all the csv files in the "Employee_performance_DB" list?
Employee_performance_DB = ['alex.csv', 'stacy.csv', 'poduzav.csv', 'ankit.csv' .... .... .... 'gabrielle.csv']
for employee_db in Employee_performance_DB:
    read_object = pd.read_csv(employee_db)
    read_object2 = read_object.tail(1)
    read_object2.to_csv("Main_Report.csv", header=False, index=False, columns=["calls", "closed", "commision($)"], mode='a')
How do I copy the values of {calls, closed, commision($)} from the 'Employee_performance_DB' list of files into the exact columns in 'Main_Report.csv' for those exact employees?
Well, as I had no answers for this, it took a while for me to find a solution.
The code below fixed my issue...
# Created a list of all the files in "employees_list"
employees_list = ['alex.csv', ......, 'stacy.csv']
for employees in employees_list:
    read_object = pd.read_csv(employees)
    read_object2 = read_object.tail(1)
    read_object2.to_csv("Employee_performance_DB.csv", index=False, mode='a', header=False)
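If the goal is to line those last-row values up with the matching employee in the main report, rather than just appending rows, one way to do it is a merge on the name column. A sketch, assuming Main_Report.csv is the template shown above and the file list is the one from the question:

import pandas as pd

employees_list = ['alex.csv', 'stacy.csv']  # shortened here for illustration

# Last row of every individual file, stacked into one frame.
latest = pd.concat([pd.read_csv(f).tail(1) for f in employees_list], ignore_index=True)

# Bring calls/closed/commision($) onto the matching row of the main report by name.
main = pd.read_csv("Main_Report.csv")
main = main.merge(latest[["name", "calls", "closed", "commision($)"]],
                  on="name", how="left", suffixes=("", "_today"))
main.to_csv("Main_Report.csv", index=False)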

Deleting a particular column/row from a CSV file using python

I want to delete a particular row where a given user input matches a column.
Let's say I get an employee ID and delete all of its corresponding values in that row.
I'm not sure how to approach this problem; other sources suggest using a temporary csv file to copy all the values and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of this approach is that it's very light on memory and about as fast as it can be runtime-wise.
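A minimal sketch of the column-delete variant with the csv module (the column index and file names here are just for illustration):

import csv

col_to_drop = 1  # hypothetical: index of the column to discard
with open("test.csv", newline="") as src, open("temp_file.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Write every field except the one in the unwanted column.
        writer.writerow(row[:col_to_drop] + row[col_to_drop + 1:])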
That's pretty much the way to do it.
Something like:
import shutil

file_path = "test.csv"

# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file

input("Look at the test.csv file if you like, close it, then press enter.")

employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            # Skip the matching employee's row; write every other line back out.
            if employee_ID in line:
                continue
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
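For that stricter check, parsing each line with the csv module and comparing only the ID field avoids false matches. A sketch, assuming the employee ID sits in the first column as in the test data above:

import csv

with open(file_path, newline="") as read_file, open("temp_file.csv", "w", newline="") as temp_file:
    writer = csv.writer(temp_file)
    for row in csv.reader(read_file):
        # Only drop rows whose "Employee ID" column matches exactly.
        if row and row[0] == employee_ID:
            continue
        writer.writerow(row)
shutil.move("temp_file.csv", file_path)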
