Anonymize files with an array of specified words, Python - python-3.x

So I'm writing an anonymizer and I'm having trouble figuring out how to replace a name in a text file. I have an array of names that should get anonymized, referred to here as text. Here's my code; it should go through another file and check whether the words match, and if so, replace them. As programming is still a foreign language to me, I would love to read a comprehensive answer.
for words in fin_message:
    if words == text :
        new_list = words.replace(text, "xxx")
        print(new_list)
    else:
        print(words)

Since text is a list, you can't directly compare it to "word", but you can test whether the word is in text:
...
    if words in text:
        print("xxx")
...
This will, however, print the words in the text file one by one. If instead, you want to print the text file as-is, except for the replacements, you could iterate over the lines of the file, and inside the lines over the banned names. Something like this:
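banned_words = ["Peter", "Paul", "Mary"]
with open("my_file.txt") as f:
    for line in f:
        for forbidden in banned_words:
            # str.replace returns a new string, so assign the result back
            line = line.replace(forbidden, "xxx")
        # each line already ends with a newline, so don't add another
        print(line, end="")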
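To match whole words only, so that a banned name inside a longer word is left alone, the same idea can be sketched with a regular expression. This is a minimal sketch and not the approach from the answer above: the helper name anonymize and the sample sentence are assumptions made for illustration.

```python
import re

def anonymize(text, banned_words, mask="xxx"):
    """Replace every whole-word occurrence of a banned word with mask.

    `anonymize` is a hypothetical helper, not part of the original answer.
    """
    if not banned_words:
        return text
    # re.escape guards against names containing regex metacharacters;
    # \b restricts matches to whole words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in banned_words) + r")\b"
    return re.sub(pattern, mask, text)

print(anonymize("Peter met Mary and Peterson.", ["Peter", "Paul", "Mary"]))
# prints: xxx met xxx and Peterson.
```

With plain str.replace, "Peterson" would become "xxxson"; whether that is desirable depends on how strict the anonymization needs to be.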

Related

How can I find all the strings that contain "/1" and remove them from a file using Python?

I have this file that contains strings like "1405079/1"; the only thing they have in common is the "/1" at the end. I want to be able to find those strings and remove them. Below is my sample code, but it's not doing anything.
with open("jobstat.txt","r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace = ' */1'
            line = line.replace(string_to_replace, " ")
with open("jobstat.txt","r") as jobstat:
    with open("runjob_output.txt", "w") as runjob_output:
        for line in jobstat:
            string_to_replace ='/1'
            line =line.rstrip(string_to_replace)
            print(line)
Anytime you have a "pattern" you want to match against, use a regular expression. The pattern here, given the information you've provided, is a string with an arbitrary number of digits followed by /1.
You can use re.sub to match against that pattern, and replace instances of it with another string.
import re
original_string= "some random text with 123456/1, and midd42142/1le of words"
pattern = r"\d*\/1"
replacement = ""
re.sub(pattern, replacement, original_string)
Output:
'some random text with , and middle of words'
Replacing instances of the pattern with something else:
>>> re.sub(pattern, "foo", original_string)
'some random text with foo, and middfoole of words'
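Applied to the files from the question, this could look as follows. It is only a sketch: the helper names remove_job_ids and clean_file are hypothetical, the pattern uses \d+ (at least one digit) rather than \d*, which is an assumption about the data, and the file names are the ones from the question.

```python
import re

# At least one digit followed by "/1"; assumes every ID looks like "1405079/1".
JOB_ID = re.compile(r"\d+/1")

def remove_job_ids(line):
    """Hypothetical helper: delete every '<digits>/1' token from a line."""
    return JOB_ID.sub("", line)

def clean_file(src="jobstat.txt", dst="runjob_output.txt"):
    # Read the input line by line and write the cleaned lines to the output.
    with open(src) as jobstat, open(dst, "w") as runjob_output:
        for line in jobstat:
            runjob_output.write(remove_job_ids(line))

print(remove_job_ids("job 1405079/1 done"))  # prints: job  done
```

Unlike the code in the question, this writes the modified line to the output file; str.replace and re.sub both return a new string and leave the original untouched.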

Problem reading text into a list of words and sorting it properly

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.
That is the assignment. My problem is that I cannot write code that gathers the right data: my code always gives me four separate lists, one for each row!
This is my code:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line=line.rstrip()
    line =line.split()
    if line in lst:
        print(true)
    else:
        lst.append(line)
print(lst)
The text is here; please copy and paste it into a text editor:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
You are not checking the presence of individual words in the list, but rather the presence of the entire list of words in that line.
With some modifications, you can achieve what you are trying to do this way:
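fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    # check each word individually, not the whole list of words
    for word in words:
        if word not in lst:
            lst.append(word)
lst.sort()  # the assignment asks for alphabetical order
print(lst)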
However, a few things I would like to point out about your code:
Why are you using rstrip() instead of strip()? (split() with no arguments already discards leading and trailing whitespace, so neither is strictly needed.)
It is better to use lst = [] rather than lst = list(). It is shorter, faster, and more Pythonic. Avoid naming the variable list, though, since that would shadow the built-in type.
You will probably want to remove punctuation marks attached to words (e.g. , . :), which split() does not remove.
If you want a branch to do nothing, use pass. Why are you printing true? Also, in Python, it's True and not true.
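The points above can be combined into a short sketch. It assumes that stripping punctuation from the ends of each word is enough for this text; unique_sorted_words is a hypothetical helper name, and a set is used in place of the list membership check that the assignment describes.

```python
import string

def unique_sorted_words(lines):
    """Hypothetical helper: collect each distinct word once, sorted alphabetically."""
    seen = set()
    for line in lines:
        for word in line.split():
            # Remove punctuation attached to either end of the word.
            word = word.strip(string.punctuation)
            if word:
                seen.add(word)
    return sorted(seen)

sample = ["But soft what light", "It is the east, and Juliet is the sun."]
print(unique_sorted_words(sample))
```

Note that sorted() places capitalized words before lowercase ones; pass key=str.lower if a case-insensitive order is wanted.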

Displaying and formatting list from external file in Python

I have an external file that I'm reading a list from, and then printing out the list. So far I have a for loop that is able to read through the list and print out each item in the list, in the same format as it is stored in the external file. My list in the file is:
['1', '10']
['Hello', 'World']
My program so far is:
file = open('Original_List.txt', 'r')
file_contents = file.read()
for i in file_contents.split():
    print(i)
file.close()
The output I'm trying to get:
1 10
Hello World
And my current output is:
['1',
'10']
['Hello',
'World']
I'm part way there, I've managed to separate the items in the list into separate lines, but I still need to remove the square brackets, quotation marks, and commas. I've tried using a loop to loop through each item in the line, and only display it if it doesn't contain any square brackets, quotation marks, and commas, but when I do that, it separates the list item into individual characters, rather than leave it as one entire item. I also need to be able to display the first item, then tab it over, and print the second item, etc, so that the output looks identical to the external file, except with the square brackets, quotation marks, and commas removed. Any suggestions for how to do this? I'm new to Python, so any help would be greatly appreciated!
Formatting is your friend.
import ast

file = open('Original_List.txt', 'r')
file_contents = file.readlines()  # readlines() already splits the file into lines
for line in file_contents:
    # ast.literal_eval (used here in place of eval, which would run arbitrary code)
    # turns the text on each line into an 'actual' list
    for item in ast.literal_eval(line):
        print("{:<10}".format(item), end="")  # left-align each item in a 10-character column
    print()  # end the row once all of its items have been printed
file.close()
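Since the question asks for the items to be separated by tabs, a variant built on str.join may be closer to what is wanted. This is a minimal sketch, assuming every non-blank line of the file is a Python list literal; format_rows is a hypothetical helper name, and the two sample lines are the ones from the question.

```python
import ast

def format_rows(lines):
    """Hypothetical helper: turn list-literal lines into tab-separated rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        items = ast.literal_eval(line)  # parses literals only, unlike eval
        rows.append("\t".join(str(item) for item in items))
    return rows

for row in format_rows(["['1', '10']\n", "['Hello', 'World']\n"]):
    print(row)
```

To use it on the real file, the list of sample lines could be replaced with the open file object, since iterating over a file yields its lines.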

Different behaviour shown when running the same code for a file and for a list

I have observed unusual behaviour when I slice the words in a file and the words in a list. The results are quite different.
For example, I have a file 'words.txt' with the following content:
POPE
POPS
ROPE
POKE
COPE
PAPE
NOPE
POLE
When I write the below piece of code, I expect to get a list of words with last letter omitted.
with open("words.txt", "r") as fo:
    for l in fo:
        print(l[:-1])
But instead I get the result below. No string slicing seems to take place and the words are the same as before.
POPE
POPS
ROPE
POKE
COPE
PAPE
NOPE
POLE
But if I write the below code, I get what I want
lis = ["POPE", "POPS", "ROPE", "POKE", "COPE", "PAPE", "NOPE", "POLE"]
for i in lis:
    print(i[:-1])
I am able to delete the last letter of each of the words as expected.
POP
POP
ROP
POK
COP
PAP
NOP
POL
So why do I see two different results for the same operation [:-1]?
Each line read from a file ends with \n, whereas the strings in your list have no line endings.
Your actual file contents are as follows
POPE\n
POPS\n
ROPE\n
POKE\n
COPE\n
PAPE\n
NOPE\n
POLE\n
Hence print(l[:-1]) is actually trimming the line ending, i.e. \n, rather than the last letter.
To verify this, declare an empty list before the loop, add each line to that list, and print it. You will find that the lines contain the \n at the end.
stuff = []
with open("words.txt", "r") as fo:
    for line in fo:
        stuff.append(line)
print(stuff)
This will print something like ['POPE\n', 'POPS\n', 'ROPE\n', 'POKE\n', ...]
If I am not wrong, you want to carry out the slicing operation on the file contents. I think you should look into the strip() method.
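For example, removing the newline first and then slicing gives the same result as the list version. A minimal sketch: drop_last_letter is a hypothetical helper name, and the in-memory lines stand in for the contents of words.txt.

```python
def drop_last_letter(lines):
    """Hypothetical helper: remove the line ending, then the last letter."""
    return [line.rstrip("\n")[:-1] for line in lines]

# Lines as they would be read from a file, each ending with "\n".
print(drop_last_letter(["POPE\n", "POPS\n", "ROPE\n"]))
# prints: ['POP', 'POP', 'ROP']
```

Using rstrip("\n") rather than a second slice also handles a final line that has no trailing newline.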

How to decode a text file by extracting alphabet characters and listing them into a message?

So we were given an assignment to write code that would sort through a long message filled with special characters (e.g. [, {, %, $, *), with only a few alphabet characters scattered throughout, to reveal a special message.
I've been searching on this site for a while and haven't found anything specific enough that would work.
I put the text file into a pastebin if you want to see it
https://pastebin.com/48BTWB3B
Anywho, this is what I've come up with for code so far
code = open('code.txt', 'r')
lettersList = code.readlines()
lettersList.sort()
for letters in lettersList:
    print(letters)
It prints the code.txt out but into short lists, essentially cutting it into smaller pieces. I want it to find and sort out the alphabet characters into a list and print the decoded message.
This is something you can do pretty easily with regex.
import re
with open('code.txt', 'r') as filehandle:
    contents = filehandle.read()
letters = re.findall("[a-zA-Z]+", contents)
If you want to condense the list into a single string, you can use a join:
single_str = ''.join(letters)
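Put together as a self-contained sketch (decode_message is a hypothetical helper name, and the sample string is made up rather than taken from the pastebin file):

```python
import re

def decode_message(text):
    """Hypothetical helper: keep only ASCII letters, in their original order."""
    return "".join(re.findall(r"[a-zA-Z]+", text))

print(decode_message("[{h%e$l*l]o"))  # prints: hello
```

The letters are kept in the order they appear, so there is no need to sort them; sorting, as in the question's code, would scramble the message.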

Resources