Can't check the content of an email - python-3.x

I am trying to read the content of an mbox file and compare it with a list of words read from a separate file. I believe the problem is that I am reading them incorrectly, since the output does not match what I expect given the content of the files.
I have tried opening both files in rb and r mode with no luck. I then tried to put the txt file into a list, but the mbox file cannot be put into a list. As a further test, I tried to read the content of the email with the get_payload() function, but it returns bytes that are not useful to me.
import mailbox
# Open the file that contains the blacklisted words and print it
with open("blacklist.txt", 'r') as afile:
    buf = afile.read()
    print(buf)
# Open the mbox file
mbox = mailbox.mbox('Andishe.mbox')
# Read the content of each message in the mbox file
for message in mbox:
    if message.is_multipart():
        print("from :", message['from'])
        print("to :", message['to'])
        content = message.as_string()
        # print(content)
    else:
        print("from :", message['from'])
        print("to :", message['to'])
        content = message.as_string()
        # print(content)
    # Check whether the blacklisted words are inside the content of the email
    for file in content:
        if file in buf:
            print("file contains blacklisted words" + file)
        else:
            print("file does not contain blacklisted words")
I would expect the results to be like this:
some black listed word
file contains blacklisted words + the black listed word
But I am stuck in a loop that keeps printing itself, the following is a part of what gets printed:
file contains blacklisted wordsr
file contains blacklisted wordso
file contains blacklisted wordsm
file contains blacklisted words
I have no idea what those r, o, m stand for or where they are coming from?

I have figured out where I was going wrong:
1- I was reading the content of the txt file wrong. I should have used this:
blacklist = []
with open("blacklist.txt", 'r') as afile:
    for line in afile:
        blacklist.append(line.strip('\n'))
This way I get rid of the end-of-line character and keep one word per list entry.
2- I was also doing my for loop wrong, since I did not join the content of the mbox file into a single string before searching it. This fixed the issue:
content_string = ''.join(content)
content_string = content_string.lower()
for word in blacklist:
    if word.lower() in content_string:
        print("This black listed word exists in content : ", word)
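The stray r, o, m in the question's output come from iterating over a string: "for file in content" walks through content one character at a time, and each character is then tested against buf. The sketch below shows that effect and one way the corrected check might be packaged. The function names load_blacklist, find_blacklisted and scan_mbox are made up for illustration and do not appear in the original post; the check is a plain substring match, so a blacklisted word also matches inside longer words.

```python
import mailbox


def load_blacklist(lines):
    """Return the non-empty lines, stripped and lower-cased."""
    return [line.strip().lower() for line in lines if line.strip()]


def find_blacklisted(content, blacklist):
    """Return the blacklisted words that occur as substrings of content."""
    content = content.lower()
    return [word for word in blacklist if word in content]


def scan_mbox(path, blacklist):
    """Yield (sender, hits) for every message in the mbox file at path."""
    for message in mailbox.mbox(path):
        yield message['from'], find_blacklisted(message.as_string(), blacklist)


# Iterating over a string yields single characters, which is where
# the "r", "o", "m" in the question's output came from.
chars = [ch for ch in "From"]

blacklist = load_blacklist(["spam\n", "Lottery\n", "\n"])
hits = find_blacklisted("You won the LOTTERY today", blacklist)
print(chars)
print(hits)
```

With a real mailbox, scan_mbox('Andishe.mbox', blacklist) could be looped over to print the sender and the matching words for each message.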

Related

How to search a text file using input method

I have a .txt file that I want to search for specific words, or phrases. I want to be able to use an input to do this. Then I would like the file parsed for the input and printed. Basically something like this:
term = input("Search For:")  # I want to enter my search term here
print(term)  # I want to print what I searched for above
I am able to do this another way by creating a variable, and then just changing the variable name as needed, but this is not ideal for me. Any ideas on how to create an input to search my .txt?
# variable to store the search term
word = 'Scrubbing'
with open(r'/Users/kev/PycharmProjects/find_text/common.txt', 'r') as fp:
    # read all lines into a list
    lines = fp.readlines()
    for line in lines:
        # check if the string is present on the current line
        if line.find(word) != -1:
            print(word, 'string exists in file')
            print('Line Number:', lines.index(line))
            print('Line:', line)
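One possible way to replace the hard-coded variable is to pass the result of input() to a small search function. This is only a sketch: search_lines and search_file are hypothetical names, and enumerate is used for the line numbers because lines.index(line) returns the position of the first identical line, which is misleading when a line occurs more than once.

```python
def search_lines(lines, term):
    """Return (line_number, line) pairs for lines containing term, numbered from 1."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if term in line]


def search_file(path, term):
    """Open the file at path and search it line by line."""
    with open(path, "r") as fp:
        return search_lines(fp, term)


# Interactive use (left as a comment because it waits for keyboard input):
#     term = input("Search For: ")
#     for number, line in search_file("common.txt", term):
#         print(number, line)

sample = ["Scrubbing bubbles\n", "nothing here\n", "More Scrubbing\n"]
matches = search_lines(sample, "Scrubbing")
print(matches)
```

The match is case-sensitive; lower-casing both term and line before comparing would make it case-insensitive.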

How to read many files have a specific format in python

I am a little confused about how to read all the lines in many files whose names run from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
    i += 1
    f = open(line + str(i), 'r')
    for row in f:
        print(row)
Here, you will find an example of one line in one of those files:
I really need your help.
I suggest using a loop to open the files, since their names differ only in the number at the end.
To better understand this project, I would recommend researching the following topics:
for loops,
String manipulation,
Opening a file and reading its content,
List manipulation,
String parsing.
This is one of my favourite beginner guides.
To set the range of the integers at the end of the file name, I would look into Python for loops.
I think this is what you are trying to do:
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop from 98 to 120 (range() excludes the upper bound)
for i in range(98, 121):
    # make the filename variable with the prefix and
    # the integer i which you need to convert to a string type
    filename = filename_prefix + str(i)
    # open the file and read all the lines into a variable
    with open(filename) as f:
        content = f.readlines()
    # append the file content to the files_content list
    files_content.append(content)
To strip the surrounding whitespace (including the trailing newline) from each line, add the missing line before the append:
    content = [x.strip() for x in content]
    files_content.append(content)
Here's an example of printing out files_content:
for file in files_content:
    print(file)
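The approach above could be wrapped in a function along these lines. In the question's code, "for line in file" iterates over the characters of the string "datalog.txt.", which is why it never opened the intended files; here the loop runs over the numbers instead. The name read_numbered_files is hypothetical, and skipping missing files is an assumption about the desired behaviour, not something the question asked for. The demo writes two throwaway files to a temporary directory so that it runs anywhere.

```python
import os
import tempfile


def read_numbered_files(prefix, first, last):
    """Return {filename: [stripped lines]} for prefix+first through prefix+last."""
    contents = {}
    for i in range(first, last + 1):
        filename = prefix + str(i)
        if not os.path.exists(filename):
            continue  # assumption: skip gaps in the numbering instead of failing
        with open(filename) as f:
            contents[filename] = [line.strip() for line in f]
    return contents


# Self-contained demo with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "datalog.txt.")
    for i in (98, 99):
        with open(prefix + str(i), "w") as f:
            f.write("row from file %d\n" % i)
    demo = read_numbered_files(prefix, 98, 120)
    demo_rows = [rows for _, rows in sorted(demo.items())]
print(demo_rows)
```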

Why the output of "open" function doesn't allow me to attribute index?

I started to learn programming in Python 3 and I am doing a project that reads the content of a text file and tells you how many words are in the file. Since I always want to challenge myself, I tried to add the name of the file to the output message, so that in the future I can build a GUI for it and so on.
The error that I get is: AttributeError: '_io.TextIOWrapper' object has no attribute 'index'
Here is my code:
# Open text file
document = open("text2.txt", "r+")
# Reads the text file and splits it into arrays
text_split = document.read().split()
# Count the words
words = len(text_split)
# Display the counted words
document_name = document[document.index("name=")]
output = "In the file {} there are {} words.".format(document_name, words)
print (output)
I decided to take @Jean-François Fabre's advice and abandoned the idea of also outputting the name of the file (for now).
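For what it is worth, the object returned by open() is a file object, not a string, so it cannot be indexed or searched with index(). It does carry the path it was opened with in its name attribute. The sketch below shows one way the file name might be included in the message; it is not necessarily what the comment referred to above suggested, and count_words is a made-up name. The demo creates its own temporary text2.txt.

```python
import os
import tempfile


def count_words(path):
    """Return a message with the file's base name and its word count."""
    with open(path, "r") as document:
        words = len(document.read().split())
        # document.name holds the path that was passed to open()
        name = os.path.basename(document.name)
    return "In the file {} there are {} words.".format(name, words)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text2.txt")
    with open(path, "w") as f:
        f.write("one two three\nfour five\n")
    output = count_words(path)
print(output)
```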

issue in saving string list in to text file

I am trying to save and read the strings which are saved in a text file.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'D:\\Trails\\test.txt'
# writing list to txt file
thefile = open(file,'w')
for item in a:
    thefile.write("%s\n" % item)
thefile.close()
# reading list from txt file
readfile = open(file,'r')
data = readfile.readlines()
print(a[0][0])
print(data[0][1]) # display data read
the output:
str1
'
Both a[0][0] and data[0][0] should have the same value, but what I read back is not the string I saved. What is the mistake in saving the file?
Update:
The 'a' array contains strings of different lengths. What changes can I make when saving the file so that the output will be the same?
Update:
I have made changes by saving the file as csv instead of text using this link. In the case of text, how should I save the data?
You can write the list directly to the file and use the eval function to turn the saved text back into a list. It isn't recommended, but the following code works.
a = [['str1','str2','str3'],['str4','str5','str6'],['str7','str8','str9']]
file = 'test.txt'
# writing list to txt file
thefile = open(file,'w')
thefile.write("%s" % a)
thefile.close()
#reading list from txt file
readfile = open(file,'r')
data = eval(readfile.readline())
print(data)
print(a[0][0])
print(data[0][1]) # display data read
print(a)
print(data)
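A note on why eval is discouraged: it executes whatever expression the file happens to contain. If the file only ever holds literals such as lists and strings, ast.literal_eval from the standard library may be a safer drop-in, since it rejects anything that is not a literal. A minimal sketch, using a line in the same format the question's code writes:

```python
import ast

# One line as written by thefile.write("%s\n" % item) in the question
line = "['str1', 'str2', 'str3']\n"
row = ast.literal_eval(line.strip())
print(row[0])

# Anything that is not a plain literal is refused instead of being executed
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```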
a and data will not have the same value, because a is a list of three lists whereas data is a list of three strings.
readfile.readlines() or list(readfile) reads all lines into a list.
So when you perform data = readfile.readlines(), Python treats ['str1','str2','str3']\n as a single string and not as a list.
So, to get your desired output, you can use the following print statement:
print(data[0][2:6])
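Slicing only works while every string has the same length, which the question's update says is not the case. One alternative that keeps a text file but preserves the nested structure is the json module; this swaps in a different technique from the answers above and is offered only as a sketch. The temporary directory stands in for the question's D:\\Trails\\test.txt path.

```python
import json
import os
import tempfile

a = [['str1', 'str2', 'str3'], ['str4', 'str5', 'str6'], ['str7', 'str8', 'str9']]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    # write the nested list as JSON text
    with open(path, "w") as thefile:
        json.dump(a, thefile)
    # read it back as a list of lists
    with open(path, "r") as readfile:
        data = json.load(readfile)

first = data[0][0]
print(first)
```

Because the structure survives the round trip, data[0][0] and a[0][0] compare equal regardless of how long the individual strings are.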

How do I replace the 4th item in a list that is in a file that starts with a particular string?

I need to search for a name in a file and, in the line starting with that name, replace the fourth item in the list, which is separated by commas. I have begun trying to program this with the following code, but I have not got it to work.
with open("SampleFile.txt", "r") as f:
    newline=[]
    for word in f.line():
        newline.append(word.replace(str(String1), str(String2)))
with open("SampleFile.txt", "w") as f:
    for line in newline :
        f.writelines(line)
#this piece of code replaced every occurrence of String1 with String2
f = open("SampleFile.txt", "r")
for line in f:
    if line.startswith(Name):
        if line.contains(String1):
            newline = line.replace(str(String1), str(String2))
#this came up with a syntax error
It would help people answer your question if you provided some dummy data. I suggest you back up your data: you can write the edited data to a new file, or copy the old file to a backup folder before working on it (consider "from shutil import copyfile" and then "copyfile(src, dst)"). Otherwise a mistake could easily ruin your data with no easy way to restore it.
You can't do the replacement with "newline = line.replace(str(String1), str(String2))"! Take "strong" as your search term and a line like "Armstrong,Paul,strong,44": if you replace "strong" with "weak" you would get "Armweak,Paul,weak,44".
I hope the following code helps you:
filename = "SampleFile.txt"
filename_new = filename.replace(".", "_new.")
search_term = "Smith"
with open(filename) as src, open(filename_new, 'w') as dst:
    for line in src:
        if line.startswith(search_term):
            items = line.split(",")
            items[4-1] = items[4-1].replace("old", "new")
            line = ",".join(items)
        dst.write(line)
If you work with a csv-file you should have a look at the csv module.
PS: My files contain the following data (the filenames are not in the files!):
SampleFile.txt           SampleFile_new.txt
Adams,George,m,old,34    Adams,George,m,old,34
Adams,Tracy,f,old,32     Adams,Tracy,f,old,32
Smith,John,m,old,53      Smith,John,m,new,53
Man,Emily,w,old,44       Man,Emily,w,old,44
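As a rough illustration of the csv module suggestion, the same edit could look like the sketch below. It differs from the code above in one respect: it compares the whole first field with the name rather than using startswith, so a hypothetical row such as "Smithson,..." is left alone. The function name replace_fourth_item and the Smithson row are made up for the example, and io.StringIO stands in for the real files so that the sketch runs without touching disk.

```python
import csv
import io


def replace_fourth_item(src, dst, name, old, new):
    """Copy CSV rows from src to dst, editing the 4th field of rows whose 1st field is name."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) >= 4 and row[0] == name:
            row[3] = row[3].replace(old, new)
        writer.writerow(row)


src = io.StringIO(
    "Adams,George,m,old,34\n"
    "Smith,John,m,old,53\n"
    "Smithson,Ann,f,old,40\n"
)
dst = io.StringIO()
replace_fourth_item(src, dst, "Smith", "old", "new")
result = dst.getvalue()
print(result)
```

With real files, src and dst would be the objects returned by open(filename, newline='') and open(filename_new, 'w', newline=''), which is the way the csv documentation recommends opening files.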
