read_csv won't read from a file list and specified directory - python-3.x

I have an issue with using Python read_csv in a function. If I use the following code, there is no issue: it reads the two files, which I can then combine into a single dataframe output:
import pandas as pd

directory = r"\\*my directory*"
files1 = ['001 Data.txt', '002 Data.txt']

def read_data(directory, files):
    list_ = []
    for file in files:
        df = pd.read_csv(directory + '\\' + file, sep='\t', header=0)
        # ...do stuff...
        list_.append(df)
    return list_

df_mct20 = read_data(directory, files1)  # This will generate my list which I can then concatenate
df_final = pd.concat(df_mct20)
The above code works fine. However, if I call this exact "read_data()" function in a for loop, I get the "No such file or directory" error:
files2 = ['003 Data.txt', '004 Data.txt']

for file in files2:
    df2 = read_data(directory, file)  # The error shows up here: "No such file or directory"
    # ...want to do stuff...
I've tried a number of things and can't seem to get it to work. Any help will be greatly appreciated!

The second argument of your function has to be a list of filenames, so you cannot feed it one string at a time from a loop. When you pass a single string, the function's own `for file in files` loop iterates over the characters of that string, and read_csv is asked to open one-character paths that don't exist.
You either need to rewrite your function to take in one filename at a time, or pass files2 in as a list instead of looping over its filenames yourself.
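For example, a minimal sketch against the function above (assuming directory, files2, and read_data are defined as shown, and pandas is imported as pd; df_final2 is just an illustrative name):

# Either pass the whole list once; read_data already loops internally
df_final2 = pd.concat(read_data(directory, files2))

# Or keep the outer loop, but wrap each filename in a one-element list
for file in files2:
    df2 = pd.concat(read_data(directory, [file]))
    # ...do stuff with df2...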

Related

File comparison in two directories

I am comparing all files in two directories. If the comparison ratio is greater than 90%, I continue the outer loop, and I want to remove the matched file from the second directory so that the next file in the first directory is not compared against a file that has already been matched.
Here's what i've tried:
for i for i in sorted_files:
    for j in sorted_github_files:
        #pdb.set_trace()
        with open(f'./files/{i}') as f1:
            try:
                text1 = f1.read()
            except:
                pass
        with open(f'./github_files/{j}') as f2:
            try:
                text2 = f2.read()
            except:
                pass
        m = SequenceMatcher(None, text1, text2)
        print("file1:", i, "file2:", j)
        if m.ratio() > 0.90:
            os.remove(f'./github_files/{j}')
            break
I know I cannot change the list while the iteration is in progress, which is why it's returning the file-not-found error, and I don't want to use try/except blocks. Any ideas appreciated.
A couple of things to point out:
Always provide a minimal reproducible example.
Your first for loop is not valid Python since you wrote `for i for i ...` instead of `for i in ...`.
If you want to iterate over the files in the first list (sorted_files), read each file outside of the second loop so it is only read once per outer iteration.
I would add the files that match with a ratio over 0.90 to a new list and remove them afterward, so your items do not change during the iteration, as in the code below.
You can find the test data I created and used here.
import os
from difflib import SequenceMatcher

# define your two folders, full paths
first_path = os.path.abspath(r"C:\Users\XYZ\Desktop\testfolder\a")
second_path = os.path.abspath(r"C:\Users\XYZ\Desktop\testfolder\b")

# get files from folder
first_path_files = os.listdir(first_path)
second_path_files = os.listdir(second_path)

# join path and filenames
first_folder = [os.path.join(first_path, f) for f in first_path_files]
second_folder = [os.path.join(second_path, f) for f in second_path_files]

# empty list for matching results
matched_files = []

# iterate over the files in the first folder
for file_one in first_folder:
    # read file content
    with open(file_one, "r") as f:
        file_one_text = f.read()
    # iterate over the files in the second folder
    for file_two in second_folder:
        # read file content
        with open(file_two, "r") as f:
            file_two_text = f.read()
        # match the two file contents
        match = SequenceMatcher(None, file_one_text, file_two_text)
        if match.ratio() > 0.90:
            print(f"Match found ({match.ratio()}): '{file_one}' | '{file_two}'")
            # TODO: here you have to decide whether you want to remove
            # files from the first or the second folder
            matched_files.append(file_two)  # i delete files from the second folder

# remove duplicates from the resulting list
matched_files = list(set(matched_files))

# remove the files
for f in matched_files:
    print(f"Removing file: {f}")
    os.remove(f)

Read multiple text files, search few strings , replace and write in python

I have tens of text files in my local directory, named something like test1, test2, test3, and so on. I would like to read all these files, search for a few strings, replace them with other strings, and finally save them back into my directory so that they are named newtest1, newtest2, newtest3, and so on.
For instance, if there was a single file, I would have done following:
# Read the file
with open('H:\\Yugeen\\TestFiles\\test1.txt', 'r') as file:
    filedata = file.read()

# Replace the target string
filedata = filedata.replace('32-83 Days', '32-60 Days')

# Write the file out again
with open('H:\\Yugeen\\TestFiles\\newtest1.txt', 'w') as file:
    file.write(filedata)
Is there any way that I can achieve this in python?
If you use Python 3 you can use scandir from the os library.
Python 3 docs: os.scandir
With that you can get the directory entries.
with os.scandir('H:\\Yugeen\\TestFiles') as it:
Then loop over these entries and your code could look something like this.
Notice I changed the path in your code to the entry object path.
import os

# Get the directory entries
with os.scandir('H:\\Yugeen\\TestFiles') as it:
    # Iterate over directory entries
    for entry in it:
        # If not a file, continue to the next iteration
        # (not needed if you are 100% sure the directory contains only files)
        if not entry.is_file():
            continue
        # Read the file
        with open(entry.path, 'r') as file:
            filedata = file.read()
        # Replace the target string
        filedata = filedata.replace('32-83 Days', '32-60 Days')
        # Write the file out again (note: this overwrites the original file in place)
        with open(entry.path, 'w') as file:
            file.write(filedata)
If you use Python 2 you can use listdir (also applicable to Python 3).
Python 2 docs: os.listdir
The code structure is the same, but you also need to build the full path to each file yourself, since listdir only returns filenames, as in the sketch below.
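For instance, a minimal listdir sketch, assuming the same folder and target strings as above, and writing each result to a 'new'-prefixed file as the question asked:

import os

directory = 'H:\\Yugeen\\TestFiles'

for filename in os.listdir(directory):
    # listdir returns bare names, so build the full path yourself
    filepath = os.path.join(directory, filename)
    if not os.path.isfile(filepath):
        continue
    with open(filepath, 'r') as file:
        filedata = file.read()
    filedata = filedata.replace('32-83 Days', '32-60 Days')
    # write to newtest1.txt, newtest2.txt, ... instead of overwriting the original
    with open(os.path.join(directory, 'new' + filename), 'w') as file:
        file.write(filedata)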

Rename multiple files in Python from another list

I am trying to rename multiple files using suffixes from another list, e.g. rename test.wav to test_1.wav using the list ['_1', '_2'].
import os

list_2 = ['_1', '_2']
path = '/Users/file_process/new_test/'
file_name = os.listdir(path)

for name in file_name:
    for ele in list_2:
        new_name = name.replace('.wav', ele + '.wav')
        os.renames(os.path.join(path, name), os.path.join(path, new_name))
But it turns out the error shows "FileNotFoundError: [Errno 2] No such file or directory: /Users/file_process/new_test/test.wav -> /Users/file_process/new_test/test_2.wav".
However, the first file in the folder was changed to test_1.wav, but not the rest.
You are looping over the entire suffix list for each single file, so after the first rename the original filename no longer exists. You have to feed the filename and its suffix into a single for loop together.
This can be done using the zip(file_name, list_2) function.
This will rename each file by appending the corresponding item from the list. We just have to make sure the list and the number of files are always equal.
Code:
import os

list_2 = ['_1', '_2']
path = '/Users/file_process/new_test/'
file_name = os.listdir(path)

for name, ele in zip(file_name, list_2):
    new_name = name[:-4] + ele + '.wav'  # strip '.wav' and append the suffix
    print(new_name)
    os.renames(os.path.join(path, name), os.path.join(path, new_name))
You've got an error in your algorithm.
Your algorithm enters the outer loop (for name in file_name) and then, in the inner loop, renames the file test.wav to test_1.wav. At this step there is no longer any file named test.wav (it has already been renamed to test_1.wav); however, your algorithm still tries to rename test.wav to test_2.wav, and of course cannot find it.
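To see why, assume a single file test.wav in the directory; the inner loop then effectively executes:

import os

os.renames('test.wav', 'test_1.wav')  # first inner iteration: succeeds
os.renames('test.wav', 'test_2.wav')  # second iteration: test.wav no longer exists
# FileNotFoundError: [Errno 2] No such file or directory: 'test.wav' -> 'test_2.wav'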

Applying function to a list of file-paths and writing csv output to the respective paths

How do I apply a function to a list of file paths I have built, and write an output csv in the same path?
read file in a subfolder -> perform a function -> write file in the subfolder -> go to next subfolder
# opened xml by filename
with open(r'XML_opsReport 100001.xml', encoding="utf8") as fd:
    Odict_parsedFromFilePath = xmltodict.parse(fd.read())

# func called in func below
def activity_to_df_one_day(list_activity_this_day):
    ib_list = [pd.DataFrame(list_activity_this_day[i], columns=list_activity_this_day[i].keys()).drop("#uom")
               for i in range(len(list_activity_this_day))]
    return pd.concat(ib_list)

# Processes parsed xml and writes csv
def activity_to_df_all_days(Odict_parsedFromFilePath, subdir):  # writes csv from parsed xml after some processing
    nodes_reports = Odict_parsedFromFilePath['opsReports']['opsReport']
    list_activity = []
    for i in range(len(nodes_reports)):
        try:
            df = activity_to_df_one_day(nodes_reports[i]['activity'])
            list_activity.append(df)
        except KeyError:
            continue
    opsReport = pd.concat(list_activity)
    opsReport['dTimStart'] = pd.to_datetime(opsReport['dTimStart'], infer_datetime_format=True)
    opsReport.sort_values('dTimStart', axis=0, ascending=True, inplace=True, kind='quicksort', na_position='last')
    opsReport.to_csv("subdir\opsReport.csv")  # write to the subdir

def scanfolder():  # fetches list of file-paths with desired starting name
    list_files = []
    for path, dirs, files in os.walk(r'C:\..\xml_objects'):  # directory containing several subfolders
        for f in files:
            if f.startswith('XML_opsReport'):
                list_files.append(os.path.join(path, f))
    return list_files

filepaths = scanfolder()  # list of file-paths
Every function works well and the xml processing is good, so I am not sharing the xml structure. There are 100+ paths in filepaths, each in a different subdirectory. I want to be able to apply the above flow in the future as well, where I can get filepaths and perform the desired actions. It's important to write each csv file to its subdirectory.
To get the directory that a file is in, you can use:
import os

for root, dirs, files in os.walk(some_dir):
    for f in files:
        print(root)
        output_file = os.path.join(root, "output_file.csv")
        print(output_file)
Is that what you're looking for?
Output:
somedir
somedir\output_file.csv
See also Python 3 - travel directory tree with limited recursion depth and Find current directory and file's directory.
I was able to solve it with os.path.join.
exceptions_path_list = []

for i in filepaths:
    try:
        with open(i, encoding="utf8") as fd:
            doc = xmltodict.parse(fd.read())
        activity_to_df_all_days(doc, i)
    except ValueError:
        exceptions_path_list.append(os.path.dirname(i))
        continue

def activity_to_df_all_days(Odict_parsedFromFilePath, filepath):
    ...  # body unchanged from above, except the final line:
    opsReport.to_csv(os.path.join(os.path.dirname(filepath), "opsReport.csv"))

Extracting file names from a list using single line for loop - python

I am trying to see if I can extract the file names from an os.listdir() output by omitting the '.csv' part, in a single-line for loop.
For example, my list of file names looks like this:
files = ['OPS020.csv', 'OPS340.csv', 'OPS230.csv', 'OPS349.csv']
Then all I could do was this:
file_names = [f.split('.') for f in files]
file_names = [f[0] for f in file_names]
Is there a more elegant and shorter way to do this?
The output I'm expecting is:
file_names: ['OPS020', 'OPS340', 'OPS230', 'OPS349']
I guess something like this would work:
from os import path

files = ['OPS020.csv', 'OPS340.csv', 'OPS230.csv', 'OPS349.csv']
filenames = [path.splitext(x)[0] for x in files]
Docs: os.path.splitext
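As a side note, if you are on Python 3.9+ and the extension is always '.csv', str.removesuffix gives another one-liner:

files = ['OPS020.csv', 'OPS340.csv', 'OPS230.csv', 'OPS349.csv']
file_names = [f.removesuffix('.csv') for f in files]
# ['OPS020', 'OPS340', 'OPS230', 'OPS349']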
