List is empty when appending when using recursion - python-3.x

I have two functions. The first one builds a list of paths to text files, and the second one iterates over that list and checks whether each file contains the word "password". Because of the try/except statements in the second function, I had to use recursion to keep it running (if there's a better way, please suggest it below). My problem is that the list returned by the second function is empty. Why, and how do I fix it?
import os

def search_txt():
    """Function to search the C:\\ for .txt files -> then add them (including full path to file) to a list."""
    list_of_txt = []
    for dir_path, sub_dir, files in os.walk("C:\\"):
        """Method 1 -> checks the end of the file name (could be used for specific extensions)"""
        for file in files:
            if file.endswith(".txt"):
                list_of_txt.append(os.path.join(dir_path, file))
    return list_of_txt
def search_pass_file(list_of_files: list):
    """Function to iterate over each text file, searching if the word "password" is included -> Returns the text
    file's path """
    list_of_pass = []
    if len(list_of_files) != 0:
        for i in range(len(list_of_files)):
            file = list_of_files.pop()
            try:
                with open(file, encoding="utf8") as f:
                    for line in f.readlines():
                        if "password" in line:
                            list_of_pass.append(file)
            except UnicodeDecodeError:
                return search_pass_file(list_of_files)
            except PermissionError:
                return search_pass_file(list_of_files)
    else:
        return list_of_pass
if __name__ == '__main__':
    myList = search_txt()
    print(search_pass_file(myList))

You're returning list_of_pass only if len(list_of_files) == 0 (it's in the else block). Your return statement should come after the loop (which, by the way, should be a while loop).
You can catch several errors in one line by putting them in parentheses: except (UnicodeDecodeError, PermissionError), or catch all exceptions (for instance, you're not handling FileNotFoundError).
I'd reduce your function to:
def search_pass_file(list_of_files: list):
    """Function to iterate over each text file, searching if the word "password" is included -> Returns the text
    file's path """
    list_of_pass = []
    while list_of_files:
        file = list_of_files.pop()
        try:
            with open(file, encoding="utf8") as f:
                for line in f.readlines():
                    if "password" in line:
                        list_of_pass.append(file)
                        break
        except Exception:
            list_of_pass += search_pass_file(list_of_files)
    return list_of_pass
Edit: also in your except block, you should append the returned value of the recursive function to list_of_pass otherwise you'll lose the files found after the error occurs.
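Putting the two points together, a minimal iterative sketch (no recursion at all, with the specific errors caught in one tuple; names follow the question's code, and this is an outline rather than the answer's exact code):
def search_pass_file(list_of_files: list):
    """Return the paths of the text files that contain the word "password"."""
    list_of_pass = []
    for file in list_of_files:
        try:
            with open(file, encoding="utf8") as f:
                for line in f:
                    if "password" in line:
                        list_of_pass.append(file)
                        break  # one hit per file is enough
        except (UnicodeDecodeError, PermissionError, FileNotFoundError):
            continue  # skip unreadable files instead of recursing
    return list_of_pass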

Related

Using Map in Open Binary

I'm trying to use map() in my script, which reads files and converts them into binary form.
I can't get the code below to work; any help?
import binascii
import os

def binary_file_reader(file_data):
    with open(file_data, 'rb') as binary_file_data:
        binary_file_data = binary_file_data.read()
    print(binary_file_data)
    binary_data = binascii.hexlify(binary_file_data)
    binary_data = binary_data.decode("utf-8")
    return binary_data
Then the main block, which calls the above:
if __name__ == "__main__":
    device_directory = os.getcwd()
    for r, d, f in os.walk(device_directory):
        for file in f:
            file_data = os.path.join(r, file)
            all_file_names.append(file_data)
            try:
                binary_data = map(binary_file_reader, all_file_names)
                print(binary_data)
            except IOError:
                print("cannot read")
Because map applies binary_file_reader to every element inside file_data, it doesn't do what you think it does.
In your case, file_data is your actual file path as a str, e.g. /tmp/a.txt. If you use map on a str, the function is applied to every character, so your call expands to
binary_file_reader('/')
binary_file_reader('t')
binary_file_reader('m')
binary_file_reader('p')
binary_file_reader('/')
binary_file_reader('a')
binary_file_reader('.')
binary_file_reader('t')
binary_file_reader('x')
binary_file_reader('t')
binary_file_reader(file_data) should produce the desired result.
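In other words, either call binary_file_reader directly on each path inside the loop, or apply map to the collected list of paths once the walk is finished. A rough sketch of the latter, assuming all_file_names starts as an empty list and leaving out the per-file IOError handling for brevity:
if __name__ == "__main__":
    all_file_names = []
    device_directory = os.getcwd()
    for r, d, f in os.walk(device_directory):
        for file in f:
            all_file_names.append(os.path.join(r, file))
    # map is lazy: iterating over it is what actually calls the reader
    for binary_data in map(binary_file_reader, all_file_names):
        print(binary_data)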

Bad word filter also deletes embeds

So I have a problem with this code: it doesn't like embeds. It will automatically delete any embeds from other bots. Is there any way to stop this from happening?
@client.event
async def on_message(msg):
    if msg.author == client.user:
        return
    with open('BadWords.txt', 'r') as f:
        BadWords = f.readlines()
    for line in BadWords:
        if msg.content in line:
            await msg.delete()
    await client.process_commands(msg)
Thanks for the help in advance
This most likely happens because you have an empty line in your .txt file, which means Python matches the empty string against the empty message content. You have a few options; for clarity I include the surrounding code.
You can check if the specific line is empty
for line in BadWords:
    if line == '':
        continue
    if msg.content in line:
Or you can remove it before you start looping
BadWords = f.readlines()
try:
    BadWords.remove('')
except ValueError:
    pass
Lastly, you could also ignore the message if it has no content, which can happen if someone sends a file or attachment.
if msg.author == client.user:
    return
if msg.content == '':
    return
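Putting those checks together, a rough, untested sketch of the handler (names follow the question's code; a break is added after the delete so the same message isn't deleted twice, and the blank-line check uses strip() because readlines() keeps the trailing newline):
@client.event
async def on_message(msg):
    if msg.author == client.user:
        return
    if msg.content == '':          # e.g. an embed or attachment with no text
        return
    with open('BadWords.txt', 'r') as f:
        BadWords = f.readlines()
    for line in BadWords:
        if line.strip() == '':     # skip blank lines in the word list
            continue
        if msg.content in line:
            await msg.delete()
            break                  # stop after the first match
    await client.process_commands(msg)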

Continues variable not triggering while loop

Apologies if this is a repost. I am trying to write a while loop with a continue variable and an if/else statement. My issue is that my continue variable is being ignored, and I cannot find the problem so far. I have already tried moving the while continues == 'y' condition into the else block, and now I am a bit flummoxed as to why this variable is being overlooked.
code:
def add_to_existing_file(data):
    # data[0]-api response
    # data[1]-city
    # infile-file object returned from openFile()
    # file_name- file name string. check filetype & report version.
    continues = 'y'  # set up continue variable
    while continues == 'y':
        file_name = input("Enter File Path to file to be appended: ")  # get file from user
        if file_name is False:
            print("Now Creating Excel File..")  # create condition for no user response.
            return  # if empty response exit function
        else:
            infile = appends.openFile(file_name)  # open file to work with. Returns file object.
            added_data = appends.journal_report_1_to_df(infile, file_name, data[0], data[1])  # append selected file to existing df
            continues = input("Do you want to append another file? Y or N").lower()  # check if new file
            return added_data  # return new df w/appended data
The problem happens on the last line. You're returning at the end of the first iteration, which exits the loop. This can be fixed by moving the return to the outer scope.
def add_to_existing_file(data):
    # data[0]-api response
    # data[1]-city
    # infile-file object returned from openFile()
    # file_name- file name string. check filetype & report version.
    continues = 'y'  # set up continue variable
    while continues == 'y':
        file_name = input("Enter File Path to file to be appended: ")  # get file from user
        if file_name is False:
            print("Now Creating Excel File..")  # create condition for no user response.
            return  # if empty response exit function
        else:
            infile = appends.openFile(file_name)  # open file to work with. Returns file object.
            added_data = appends.journal_report_1_to_df(infile, file_name, data[0], data[1])  # append selected file to existing df
            continues = input("Do you want to append another file? Y or N").lower()  # check if new file
    return added_data  # return new df w/appended data
It should work if you get the second return line (return added_data # return new df w/appended data) to have the same indentation as your while line. As a basic outline for a continue loop:
def function
    continues = 'y'
    while
        if :
        elif :
        else :
            print
            continue ?
    return
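As a concrete version of that outline (a hypothetical collect_files function, just to show where the return belongs):
def collect_files():
    collected = []            # hypothetical accumulator
    continues = 'y'
    while continues == 'y':
        name = input("Enter a file path (blank to stop): ")
        if not name:
            break             # empty answer ends the loop
        collected.append(name)
        continues = input("Add another file? Y or N ").lower()
    return collected          # return sits outside the while loop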

ELIF statements skipped in favor of ELSE

I am writing a few statements using regular expressions to match output to a given extension. My for loop seems to work fine: I get an answer back for each of the files, and if I take one out or add one, I still get a result.
What appears to happen, though, is that my first file is picked up, matched successfully, and the correct output is given. The loop then grabs the next file, checks it against the first statement, then skips the two elifs and gives an output based on my else. Can anyone point out why, or, if I have it wrong, what is actually going on?
def extmatch():
    global dircontents
    for file in dircontents:
        dircontents = re.search(".+\sbreakout.*\.ttx", file)
        if dircontents:
            print('File is for TIA')
        elif dircontents:
            dircontents = re.search('\w+\.csv+$', file)
            if dircontents:
                print('File is for NMFTA')
        elif dircontents:
            dircontents = re.search('\w+.\.txt+$', file)
            if dircontents:
                print('File is for NMFTA')
        else:
            print('File type not recognized.')
['061419license breakout_ibc_v3_0116.ttx', '2019-06-17_7-49-21.jpg', 'SampleCSV.csv', 'script_test.txt'] <--- these are the files in the dir indicated
File is for TIA
File type not recognized. <---Seems to match to ELSE for each file past the first
File type not recognized.
File type not recognized.
You probably want something like this:
def extmatch(dircontents):
    for filename in dircontents:
        if filename.lower().endswith(".ttx"):
            print('File is for TIA')
        elif filename.lower().endswith(".csv"):
            print('File is for NMFTA')
        elif filename.lower().endswith(".txt"):
            print('File is for NMFTA')
        else:
            print('File type not recognized.')
or even like this:
EXT_ASSIGNMENTS = {
    'ttx': 'TIA',
    'csv': 'NMFTA',
    'txt': 'NMFTA',
}

def extmatch(dircontents):
    for filename in dircontents:
        ext = filename.lower().split('.')[-1]
        if ext in EXT_ASSIGNMENTS:
            print('File is for ' + EXT_ASSIGNMENTS[ext])
        else:
            print('File type not recognized.')
Avoid global variables. If you need to pass information to a function, use an argument.
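For example, the caller could build the directory listing itself and hand it in (a small usage sketch; the directory path is a placeholder):
import os

dircontents = os.listdir('.')   # the directory holding the .ttx/.csv/.txt files
extmatch(dircontents)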

Store scrape results and search in results with Python and Pandas?

As part of my Ph.D. research, I am scraping numerous webpages and searching for keywords within the scrape results.
This is how I do it thus far:
import pandas as pd
import requests

# load data as a pandas data frame with column df.url
df = pd.read_excel('sample.xls', header=0)

# define keyword search function
def contains_keywords(link, keywords):
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
    except:
        return "Wrong/Missing URL"

# define the relevant keywords
mykeywords = ('for', 'bar')

# store search results in new column 'results'
df['results'] = df.url.apply(lambda l: contains_keywords(l, mykeywords))
This works just fine. I only have one problem: the list of relevant keywords mykeywords changes frequently, whilst the webpages stay the same. Running the code takes a long time, since I request every page over and over.
I have two questions:
(1) Is there a way to store the results of requests.get(link).text?
(2) And if so, how do I search within the saved file(s), producing the same result as with the current script?
As always, thank you for your time and help! /R
You can download the content of the urls and save them in separate files in a directory (eg: 'links')
import os
import requests

def get_link(url):
    file_name = os.path.join('/path/to/links', url.replace('/', '_').replace(':', '_'))
    try:
        r = requests.get(url)
    except Exception as e:
        print("Failed to get " + url)
    else:
        with open(file_name, 'w') as f:
            f.write(r.text)
Then modify the contains_keywords function to read local files, so you won't have to use requests every time you run the script.
def contains_keywords(link, keywords):
    file_name = os.path.join('/path/to/links', link.replace('/', '_').replace(':', '_'))
    try:
        with open(file_name) as f:
            output = f.read()
        return int(any(x in output for x in keywords))
    except Exception as e:
        print("Can't access file: {}\n{}".format(file_name, e))
        return "Wrong/Missing URL"
Edit: I just added a try/except block in get_link and used an absolute path for file_name.
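One way to wire these into the existing pandas workflow might be to download each page once, then re-run only the keyword search whenever mykeywords changes (a sketch, assuming the 'links' directory already exists and df is the data frame from the question):
# one-off: cache every page on disk
df.url.apply(get_link)

# cheap to re-run whenever the keyword list changes
mykeywords = ('for', 'bar')
df['results'] = df.url.apply(lambda l: contains_keywords(l, mykeywords))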
