Create a list using another list - python-3.x

I have created a list from a file using the .split() method. Now I want to loop over this list and copy certain strings from it into another, empty list. When I use the .append() method it copies the entire list, brackets included, into the new list, which is not what I want. How do I solve this?
This is the code I use:
file_name = input("Enter file name: ")
name = open(file_name)
for word in name:
    word = fh.read()
    line = word.rstrip()
    word = line.split()
    for words in word:
        lst = list()
        if word not in lst:
            lst.append(word)
print(lst)

Hope this works.
def fuse(k, n):
    k = (5, 3, n)
    n = []
    for i in k:
        n.append(i)
    return n

print(fuse([5, 3], [3, 6]))
If you entered ([5,3],[3,6]) as the arguments to fuse, you would get [5,3,[3,6]].

You should pay more attention to your indentation and variable naming.
I think the following is what you tried.
file_name = input("Enter file name: ")
filtered_words = list()
# Open the file with the help of a context manager, which closes the file
# again in the event of an error.
with open(file_name) as fd:
    # Iterate over all lines of the file.
    for line in fd:
        # Cut off the whitespace at the end of the line and split the line
        # into tokens, using whitespace as the separator.
        words = line.rstrip().split()
        # Iterate over all tokens.
        for word in words:
            if word not in filtered_words:
                # Append the token to the new list if it isn't already there.
                filtered_words.append(word)
print(filtered_words)
If you want to remove all duplicate values, I recommend using a set().
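For example, a minimal set-based variant of the loop above (same file handling, but no membership test) might look like this; note that a set does not preserve the order in which words were seen:

```python
def unique_words(filename):
    # A set discards duplicates automatically, so no
    # "if word not in ..." check is needed.
    words = set()
    with open(filename) as fd:
        for line in fd:
            words.update(line.rstrip().split())
    return words
```

Call sorted(unique_words(file_name)) if you need a stable ordering afterwards.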

fname = input("Enter file name: ")
fh = open(fname)
lst = []
# looping over the lines in the file
for line in fh:
    words = line.split()
    # looping over the words in the line
    for word in words:
        if word not in lst:
            # append the word to the new list
            lst.append(word)
lst.sort()
print(lst)

Related

Python how to grep two keywords from a string in a text file and append into two strings

DELIVERED,machine01,2022-01-20T12:57:06,033,Email [Test1] is delivered by [192.168.0.2]
Above is the content of the text file. I have used the split(",") method, but I have no idea how to make it work as shown below. Can anyone help with this?
'DELIVERED', 'machine01', '2022-01-20T12:57:06', '033', 'Test1', '192.168.0.2'
with open('log_file.log', 'r') as f:
    for line in f.readlines():
        sep = line.split(",")
        print(sep)
text = "DELIVERED,machine01,2022-01-20T12:57:06,033,Email [Test1] is delivered by [192.168.0.2]"
result = []
for part in text.split(','):  # loop over the parts of text separated by ","
    result.append(part)       # append each part to a list
print(result)  # prints this list:
['DELIVERED', 'machine01', '2022-01-20T12:57:06', '033', 'Email [Test1] is delivered by [192.168.0.2]']
# or you can do all the same work in just one line of code!
result = [part for part in text.split(',')]
print(result)
['DELIVERED', 'machine01', '2022-01-20T12:57:06', '033', 'Email [Test1] is delivered by [192.168.0.2]']
Once you have split on the commas, you then need to use a regular expression to find the contents of the [] in the final string. Since you are doing this over multiple lines, we collect each list in a variable (fields), then print this list of lists at the end:
import re

fields = []
with open('log_file.log', 'r') as f:
    for line in f.readlines():
        sep = line.split(",")
        # Get the last item in the list
        last = sep.pop()
        # Find the values in [] in last
        extras = re.findall(r'\[(.*?)\]', last)
        # Add these values back onto sep
        sep.extend(extras)
        fields.append(sep)
print(fields)
log_file.log:
DELIVERED,machine01,2022-01-20T12:57:06,033,Email [Test1] is delivered by [192.168.0.2]
DELIVERED,machine02,2022-01-20T12:58:06,034,Email [Test2] is delivered by [192.168.0.3]
Result:
[['DELIVERED', 'machine01', '2022-01-20T12:57:06', '033', 'Test1', '192.168.0.2'], ['DELIVERED', 'machine02', '2022-01-20T12:58:06', '034', 'Test2', '192.168.0.3']]
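If every line follows exactly the layout shown in the question (an assumption), the split-then-findall steps above can also be collapsed into one regex; this is just a sketch, not part of the answers above:

```python
import re

# One pattern capturing all six fields at once; the literal
# "Email [...] is delivered by [...]" layout is assumed.
pattern = re.compile(
    r'^([A-Z]+),([^,]+),([^,]+),([^,]+),'
    r'Email \[(.*?)\] is delivered by \[(.*?)\]$'
)

line = ("DELIVERED,machine01,2022-01-20T12:57:06,033,"
        "Email [Test1] is delivered by [192.168.0.2]")
match = pattern.match(line)
if match:
    print(list(match.groups()))
```

This prints the six desired fields for the sample line; a line in any other layout simply won't match.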

Problem with reading text, putting the text into a list and sorting it properly

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.
This is the assignment. My problem is that I cannot write proper code to gather the right data; my code always gives me four different lists, one for each line!
This is my code:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    line = line.split()
    if line in last:
        print(true)
    else:
        lst.append(line)
print(lst)
The text is here; please copy and paste it into a text editor:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
You are not checking the presence of individual words in the list, but rather the presence of the entire list of words in that line.
With some modifications, you can achieve what you are trying to do this way:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
print(lst)
However, a few things I would like to point out looking at your code:
Why are you using rstrip() instead of strip()?
It is shorter, faster and more Pythonic to write lst = [] instead of lst = list(), and it avoids any confusion with the built-in list.
You will probably want to remove punctuation marks attached to words, e.g. ,.: which do not get removed by split().
If you want a loop body to do nothing, use pass. Why are you printing true? Also, in Python it's True, not true.
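Putting those points together, here is a sketch of the whole exercise; the punctuation handling via string.punctuation is my addition, not part of the original answer:

```python
import string

def unique_sorted_words(lines):
    # Collect unique words from an iterable of lines, stripping
    # surrounding punctuation, and return them sorted.
    words = set()
    for line in lines:
        for word in line.split():
            words.add(word.strip(string.punctuation))
    return sorted(words)

# Works on an open file handle or any list of strings:
sample = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
]
print(unique_sorted_words(sample))
```

Passing an open file handle instead of the sample list gives the sorted unique words of romeo.txt.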

How to split strings from .txt file into a list, sorted from A-Z without duplicates?

For instance, the .txt file includes 2 lines, separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    filename = path
    with open(filename) as f:
        f = f.read()
    f_list = f.split('\n')
    # iterate over a copy, since removing items while iterating
    # over the same list would skip elements
    for i in f_list[:]:
        if i == '':
            f_list.remove(i)
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    # again iterate over a copy while removing duplicates
    for i in res3[:]:
        if res3.count(i) != 1:
            res3.remove(i)
    res3.sort()
    return res3

print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
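A sketch of that regex-splitting idea, assuming the separators are commas with optional surrounding whitespace:

```python
import re

# Split on a comma with optional surrounding whitespace, so the
# individual tokens don't need to be stripped afterwards.
sep = re.compile(r'\s*,\s*')
print(sep.split('John, George , Tom'))
print(sep.split('Mark, James, Tom, '))
```

A trailing separator still produces an empty trailing token, which has to be filtered out separately.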
In the following example code, I've:
ignored task since you've said that that's fine;
separated the actual parsing of file content from parsing of in-memory content, to demonstrate the function without a file;
used a set comprehension to store the unique results of all split lines; and
passed a generator expression to sorted() that drops empty strings.
from io import StringIO
from typing import TextIO, List

def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )

def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)

def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]
    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]

if __name__ == '__main__':
    test()
I came up with a very simple solution, if anyone needs it:
with open("file.txt") as x:
    lines = x.read().split()
lines.sort()
new_list = []
for word in lines:
    if word not in new_list:
        new_list.append(word)
print(new_list)
with open("text.txt", "r") as fl:
    names = set()
    for line in fl.readlines():
        line = line.strip("\n")
        for name in line.split(","):
            if name.strip() != '':
                names.add(name.strip())
print(sorted(names))
I think that you missed a comma after Jim in the first line.
You can avoid the use of a loop by using the split method:
content = file.read()
my_list = content.split(",")
To delete the duplicates in your list you can transform it into a set:
my_list = list(set(my_list))
Then you can sort it using sorted. So the final code:
with open("file.txt", "r") as file:
    content = file.read()
my_list = content.replace("\n", "").replace(" ", "").split(",")
result = sorted(list(set(my_list)))
You can add a key to your sort function.

How to remove lines from a file starting with a specific word python3

I am doing this as an assignment. So, I need to read a file and remove lines that start with a specific word.
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
        print(text)
        for word in text:
            words = word.split(' ')
            first_word = words[0]
            for first in word:
                if first[0] == rec:
                    text = text.pop(rec)
                    return text
        print(text)
        return text

delete_lines(fajl, rec)
At the last for loop, I completely lost control of what I am doing. Firstly, I can't use pop. So, once I locate the word, I need to somehow delete the lines that start with that word. Additionally, there is one minor problem with my approach: first_word gives me the first word, but with the comma attached if one is present.
Example text from a file(file.txt):
This is some text on one line.
The text is irrelevant.
This would be some specific stuff.
However, it is not.
This is just nonsense.
rec = input("Word:")  # the word entered is This
Output:
The text is irrelevant.
However, it is not.
You cannot modify a list while you are iterating over it, but you can iterate over a copy to modify the original one:
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
        print(text)
    # let's iterate over a copy to modify
    # the original one without restrictions
    for word in text[:]:
        # compare with lowercase to erase "This" and "this" alike
        if word.lower().startswith(rec.lower()):
            # remove the line
            text.remove(word)
    newtext = "".join(text)  # join all the text
    print(newtext)  # to see the results in the console
    # we should now save the file to see the results there
    with open(fajl, "w") as file:
        file.write(newtext)
    return newtext

print(delete_lines(fajl, rec))
Tested with your sample text. If you want to erase "this", the startswith method will match "this" or "this," alike. This only deletes the matching lines and leaves any blank lines alone; if you don't want those, you can also compare against "\n" and remove them.
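An alternative sketch (my variant, not the answer above): build the kept lines with a list comprehension instead of removing items from the list:

```python
def delete_lines(fajl, rec):
    # Keep only the lines that do not start with the given
    # word, comparing case-insensitively.
    with open(fajl) as fh:
        kept = [line for line in fh
                if not line.lower().startswith(rec.lower())]
    # Write the surviving lines back to the same file.
    with open(fajl, "w") as fh:
        fh.writelines(kept)
    return "".join(kept)
```

Because the comprehension builds a new list, there is no modify-while-iterating problem at all.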

How to find the number of common words in a text file and delete them in python?

The question is to:
Firstly, find the number of all words in a text file.
Secondly, delete the common words like a, an, and, to, in, at, but, ... (it is allowed to write a list of these words).
Thirdly, find the number of the remaining (unique) words.
Make a list of them.
The file name should be used as the parameter of the function.
I have done the first part of the question:
import re

file = open('text.txt', 'r', encoding='latin-1')
word_list = file.read().split()
for x in word_list:
    print(x)
res = len(word_list)
print('The number of words in the text: ' + str(res))

def uncommonWords(file):
    uncommonwords = list(file)
    for i in uncommonwords:
        i += 1
        print(i)
The code runs up to printing the number of words, and nothing appears after that.
You can do it like this:
# list of common words you want to remove
stop_words = set(["is", "the", "to", "in"])

# set to collect unique words
words_in_file = set()

with open("words.txt") as text_file:
    for line in text_file:
        for word in line.split():
            words_in_file.add(word)

# remove common words from word list
unique_words = words_in_file - stop_words
print(list(unique_words))
First, you may want to get rid of punctuation: as shown in this answer, you can do
nonPunct = re.compile('.*[A-Za-z0-9].*')
filtered = [w for w in word_list if nonPunct.match(w)]
Then, you could do
from collections import Counter
counts = Counter(filtered)
You can then access the list of unique words with list(counts.keys()), and you can choose to ignore the words you don't want with
[word for word in list(counts.keys()) if word not in common_words]
Hope this answers your question.
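For completeness, here is a sketch that ties all the requested steps into one function taking the file name as its parameter; the stop-word list is an arbitrary example, as the assignment allows writing your own:

```python
def word_report(fname):
    # Words considered "common"; extend as the assignment requires.
    stop_words = {"a", "an", "and", "to", "in", "at", "but", "the", "is"}
    with open(fname) as fh:
        all_words = fh.read().split()
    # Unique words that are not in the stop-word list, sorted.
    unique = sorted(set(all_words) - stop_words)
    return len(all_words), len(unique), unique
```

word_report("text.txt") then returns the total word count, the count of remaining unique words, and the list of those words.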
