I am trying to write a script to print specific words after a particular string.
Here is the input file
They are "playing in the ground", with friends
They are "going to Paris", with family
They are "motivating to learn new things", by themselves
In the output I want to use "are" as the keyword: I want the text inside the quotation marks that follows "are", prefixed with the word that comes before "are".
The output should be:
They playing in the ground
They going to Paris
They motivating to learn new things
I can print the rest of the line with the below code but not certain words. So far I have
import re

with open('input.txt', 'r') as f:
    for lines in f:
        a = re.search(r'\bare', lines)
        if a:
            print(lines)
Any help would be appreciated
Use a regular expression to extract the parts of the line you want.
import re

with open('input.txt', 'r') as f:
    for lines in f:
        # Group 1 is the text before " are", group 2 is the quoted text
        m = re.match(r'(.*?) are "(.*?)"', lines)
        if m:
            print(m.group(1) + " " + m.group(2))
The groups in m return the parts of the line that match the patterns between the parentheses.
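To try the group extraction without a file, here is a minimal sketch that runs the same pattern over the sample lines from the question. The helper name extract_phrase is hypothetical, and it assumes every line contains the keyword as a separate word (" are ").

```python
import re

# Same pattern as in the answer: group 1 is the text before " are",
# group 2 is the text inside the double quotes.
PATTERN = re.compile(r'(.*?) are "(.*?)"')

def extract_phrase(line):
    """Return 'prefix quoted-text' for a matching line, or None otherwise."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(1) + " " + m.group(2)

sample_lines = [
    'They are "playing in the ground", with friends',
    'They are "going to Paris", with family',
    'They are "motivating to learn new things", by themselves',
]

for line in sample_lines:
    print(extract_phrase(line))
```

Lines that do not match the pattern yield None, so the caller can decide whether to skip or report them.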
If your lines always look like the examples you provided you can use string manipulations:
s = 'They are "playing in the ground", with friends'
are_split = s.split('are')
# are_split = ['They ', ' "playing in the ground", with friends']
quote_split = are_split[1].split('"')
# quote_split = [' ', 'playing in the ground', ', with friends']
print(are_split[0] + quote_split[1])
# 'They playing in the ground'
with open("lineup.txt", "r", encoding="utf-8") as file:
    ts = list()
    fb = list()
    for line in file:
        line = line[:-1]
        data = line.split(",")
        if (data[1] == "Fenerbahçe"):
            fb.append(line + "\n")
        elif (data[1] == "Trabzonspor"):
            ts.append(line + "\n")

with open("ts.txt", "w", encoding="utf-8") as file1:
    for i in ts:
        file1.write(i)

with open("fb.txt", "w", encoding="utf-8") as file2:
    for i in fb:
        file2.write(i)

print(fb)
print(ts)
And here is some sample data from the lineup.txt file:
U. Çakır, Trabozonspor
Marc Bartra, Trabzonspor
İ. Kahveci, Fenerbahçe
S. Aziz, Fenerbahçe
Trezeguet, Trabzonspor
A. Bayındır, Fenerbahçe
Gustavo Henrique, Fenerbahçe
Both lists come out empty, so nothing is written to the text files. I can't figure out why.
The problem is most likely your split. If you split on a comma, the second element of the list starts with a space (" Trabzonspor"), so the equality test never succeeds. And if anything else follows the team name on the line, the equality won't match either.
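The leading space is easy to see on a single line from the question's data. The sketch below also shows that stripping the field is enough to make the comparison work; team_of is a hypothetical helper, not part of the original code.

```python
line = "İ. Kahveci, Fenerbahçe\n"

# Splitting on "," keeps the space that follows the comma.
data = line.rstrip("\n").split(",")
print(data)                      # ['İ. Kahveci', ' Fenerbahçe']
print(data[1] == "Fenerbahçe")   # False, because of the leading space

def team_of(line):
    """Return the second comma-separated field with whitespace stripped."""
    return line.rstrip("\n").split(",")[1].strip()

print(team_of(line) == "Fenerbahçe")  # True
```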
One way to solve this is to use the "in" operator and avoid the split completely.
for line in file:
    if ("Fenerbahçe" in line):
        fb.append(line[:-1] + "\n")
    elif ("Trabzonspor" in line):
        ts.append(line[:-1] + "\n")
If the matching is more complex than that, consider using a regex with word boundaries before and after the team names you are matching.
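A minimal sketch of the word-boundary idea, assuming the two team names from the question are the only ones of interest. The names TEAMS, TEAM_PATTERNS and group_by_team are hypothetical.

```python
import re

TEAMS = ("Fenerbahçe", "Trabzonspor")

# \b on both sides means the name must appear as a whole word,
# so a longer word that merely contains the name is not matched.
TEAM_PATTERNS = {
    team: re.compile(r"\b" + re.escape(team) + r"\b") for team in TEAMS
}

def group_by_team(lines):
    """Return a dict mapping each team name to the lines that mention it."""
    groups = {team: [] for team in TEAMS}
    for line in lines:
        for team, pattern in TEAM_PATTERNS.items():
            if pattern.search(line):
                groups[team].append(line.rstrip("\n"))
                break
    return groups

sample = [
    "U. Çakır, Trabozonspor\n",
    "Marc Bartra, Trabzonspor\n",
    "İ. Kahveci, Fenerbahçe\n",
    "S. Aziz, Fenerbahçe\n",
]
print(group_by_team(sample))
```

Note that the first sample line is spelled "Trabozonspor" in the data, so it matches neither team under any of these approaches.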
For instance, the .txt file includes 2 lines, separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    with open(path) as f:
        # Drop empty lines up front instead of removing items while iterating
        f_list = [line for line in f.read().split('\n') if line != '']
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    # A set removes duplicates safely; sorted() returns a sorted list
    return sorted(set(res3))

print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed sorted a generator that drops empty strings.
from io import StringIO
from typing import TextIO, List


def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )


def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)


def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]

    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]


if __name__ == '__main__':
    test()
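The regular-expression splitting mentioned in the review could look roughly like the sketch below. It assumes separators are commas with optional whitespace on either side; split_names is a hypothetical helper.

```python
import re

def split_names(line):
    """Split one line on commas, ignoring whitespace around each comma."""
    # strip() removes the newline and outer spaces; the pattern then
    # consumes each comma together with any whitespace around it.
    parts = re.split(r"\s*,\s*", line.strip())
    # A trailing comma leaves an empty last piece, so drop empty strings.
    return [part for part in parts if part]

print(split_names("John, George , Tom\n"))
print(split_names("Mark, James, Tom, "))
```

Names that contain internal spaces, such as "Han Solo", are left intact because only the whitespace adjacent to a comma is consumed.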
I came up with a very simple solution if anyone will need:
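def unique_sorted_words(x):
    # x is an open file object; split on whitespace and drop stray commas
    lines = [word.strip(',') for word in x.read().split()]
    lines.sort()
    new_list = []
    for word in lines:
        if word not in new_list:
            new_list.append(word)
    return new_list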
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl:
        # Split on commas and strip the whitespace around each name
        for name in line.strip("\n").split(","):
            name = name.strip()
            if name != '':
                list_.add(name)
print(sorted(list_))
I think that you missed a comma after Jim in the first line.
You can avoid the loop by using the split method:
content = file.read()
my_list = content.split(",")
To remove duplicates from the list, you can convert it to a set:
my_list = list(set(my_list))
Then you can sort it using sorted.
So the final code is:
with open("file.txt", "r") as file:
    content = file.read()
# Turn newlines into separators so names on adjacent lines do not merge
my_list = content.replace("\n", ",").replace(" ", "").split(",")
# Drop the empty strings left by trailing commas, then deduplicate and sort
result = sorted(set(name for name in my_list if name))
You can also pass a key to sorted if you need a custom ordering.
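As an illustration of the key argument, here is a small sketch that sorts names case-insensitively. The sample names are made up for the example.

```python
names = ["tom", "George", "james", "Mark"]

# Default ordering compares code points, so uppercase letters sort first.
print(sorted(names))                 # ['George', 'Mark', 'james', 'tom']

# key=str.lower compares the lowercased form of each name instead.
print(sorted(names, key=str.lower))  # ['George', 'james', 'Mark', 'tom']
```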
I am doing this as an assignment. So, I need to read a file and remove lines that start with a specific word.
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
        print(text)
        for word in text:
            words = word.split(' ')
            first_word = words[0]
            for first in word:
                if first[0] == rec:
                    text = text.pop(rec)
                    return text
        print(text)
        return text

delete_lines(fajl, rec)
At the last for loop, I completely lost control of what I am doing. Firstly, I can't use pop. So, once I locate the word, I need to somehow delete the lines that start with that word. There is also one minor problem with my approach: first_word gets me the first word, but it also includes the trailing comma if one is present.
Example text from a file(file.txt):
This is some text on one line.
The text is irrelevant.
This would be some specific stuff.
However, it is not.
This is just nonsense.
rec = input("Word:") --- This
Output:
The text is irrelevant.
However, it is not.
You should not modify a list while you are iterating over it, because elements get skipped. But you can iterate over a copy and modify the original one:
fajl = input("File name:")
rec = input("Word:")

def delete_lines(fajl, rec):
    with open(fajl) as file:
        text = file.readlines()
    print(text)
    # let's iterate over a copy to modify
    # the original one without restrictions
    for word in text[:]:
        # compare with lowercase to erase This and this
        if word.lower().startswith(rec.lower()):
            # Remove the line
            text.remove(word)
    newtext = "".join(text)  # join all the text
    print(newtext)  # to see the results in console
    # we should now save the file to see the results there
    with open(fajl, "w") as file:
        file.write(newtext)

delete_lines(fajl, rec)
Tested with your sample text. If you want to erase "this", the startswith method will match "this" and "this," alike; since it only checks the prefix, it will also match longer words such as "thistle". Only the matching lines are deleted, and any blank lines are left alone. If you don't want them, you can also compare each line with "\n" and remove it.
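If only whole first words should count, one option is to extract the first word with a regular expression and compare it exactly. This is a sketch under that assumption; drop_lines_starting_with is a hypothetical helper that builds a new list instead of removing items in place.

```python
import re

def drop_lines_starting_with(lines, word):
    """Return the lines whose first word is not `word` (case-insensitive)."""
    kept = []
    for line in lines:
        # First run of word characters, ignoring leading whitespace;
        # punctuation such as a trailing comma is not part of the match.
        m = re.match(r"\s*(\w+)", line)
        first_word = m.group(1).lower() if m else ""
        if first_word != word.lower():
            kept.append(line)
    return kept

sample = [
    "This is some text on one line.\n",
    "The text is irrelevant.\n",
    "This would be some specific stuff.\n",
    "However, it is not.\n",
    "This is just nonsense.\n",
]
print("".join(drop_lines_starting_with(sample, "This")))
```

Building a new list avoids the copy-and-remove pattern entirely, and blank lines are kept because they have no first word to compare.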
I'm kind of on a time crunch, but this was one of my problems in my homework assignment. I am stuck, and I don't know what to do or how to proceed.
Our assignment was to open various text files and within each of the text files, we are supposed to add each word into a dictionary in which the key is the document number it came from, and the value is the word.
For example, one text file would be:
1
Hello, how are you?
I am fine and you?
Each of the text files begins with a number corresponding to its title (for example, "document1.txt" begins with "1", "document2.txt" begins with "2", etc.).
My teacher gave us this code to help with stripping the punctuation and the line breaks, but I am having a hard time figuring out where to use it.
data = re.split("[ .,:;!?\s\b]+|[\r\n]+", line)
data = filter(None, data)
I don't really understand where the filter(None, data) part comes into play, because when I print it, all I get is the object's representation in memory rather than the words.
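For context, in Python 3 filter returns a lazy iterator, which is why printing it shows an object representation instead of the words. Wrapping it in list() produces the actual values. The sketch below uses a simplified raw-string version of the teacher's pattern, and split_words is a hypothetical helper.

```python
import re

def split_words(line):
    """Split a line on spaces and punctuation and drop empty strings."""
    data = re.split(r"[ .,:;!?\s]+", line)
    # filter(None, data) keeps only truthy items, i.e. non-empty strings.
    return list(filter(None, data))

line = "Hello, how are you?\n"
print(re.split(r"[ .,:;!?\s]+", line))  # ['Hello', 'how', 'are', 'you', '']
print(split_words(line))                # ['Hello', 'how', 'are', 'you']
```

The empty string at the end of the raw split result comes from the trailing "?\n", which is exactly what filter(None, ...) removes.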
Here's my code so far:
def invertFile(list_of_file_names):
    import re
    diction = {}
    emplist = []
    fordiction = []
    for x in list_of_file_names:
        afile = open(x, 'r')
        with afile as f:
            for line in f:
                savedSort = filterText(f)

def filterText(line):
    import re
    word_delimiters = [' ', ',', ';', ':', '.','?','!']
    data = re.split("[ .,:;!?\s\b]+|[\r\n]+", f)
    key, value = data[0], data[1:]
    diction[key] = value
How do I make it so each word is added to a dictionary, where the key is the document it comes from and the value is the list of words in that document? Thank you.
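One possible shape for this, as a sketch only: treat the first line of each document as the key and collect the words from the remaining lines as the value. To keep it runnable without files, the function below takes document texts rather than file names; invert_documents and filter_text are hypothetical names, and each text is assumed to be non-empty.

```python
import re

def filter_text(line):
    """Split a line into words, dropping punctuation and empty strings."""
    return list(filter(None, re.split(r"[ .,:;!?\s]+", line)))

def invert_documents(documents):
    """Map each document number (its first line) to the list of its words."""
    diction = {}
    for text in documents:
        lines = text.splitlines()
        key = lines[0].strip()       # e.g. "1" for document1.txt
        words = []
        for line in lines[1:]:
            words.extend(filter_text(line))
        diction[key] = words
    return diction

sample = "1\nHello, how are you?\nI am fine and you?\n"
print(invert_documents([sample]))
```

To work from file names instead, each file's content could be read with open(name).read() and passed in as one of the documents.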
I have a long text (Winter's Tale). I want to search for the word 'Luzifer' and then print the complete line that contains it. By complete line I mean everything between two full stops.
My script prints 'Luzifer' and all the following words up to the full stop at the end of the line. But I want the full line.
For example, the text line is:
'Today Luzifer has a bad day. And he is ill'
My script prints: 'Luzifer has a bad day.'
But I need the complete line, including 'Today'.
Is there a function or a way to read backwards?
Here is my script:
#!/usr/bin/python3.6
# coding: utf-8
import re

def suchen(regAusdruck, textdatei):
    f = open(textdatei, 'r', encoding='utf-8')
    rfctext = f.read()
    f.close()
    return re.findall(regAusdruck, rfctext)

pattern1 = r'\bLuzifer\b[^.;:!?]{2,}'
print(suchen(pattern1, "tale.txt"))
One of the most straightforward ways of handling this is to read in your entire text (hopefully it is not too big), split on '.', and then return the strings that contain your search word. For good measure, I think it will be useful to replace the newline characters with a space so that you don't have any strings broken into multiple lines.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    return [sentence for sentence in sentences if regAusdruck in sentence]
    # Alternatively...
    # return list(filter(lambda x: regAusdruck in x, sentences))

print(suchen('Luzifer', "tale.txt"))
If you really need to use a regular expression (which may be the case for more complicated searches), only the return statement needs to change.
import re

def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    # We assume you passed in a compiled regular expression object.
    return [sentence for sentence in sentences if regAusdruck.search(sentence)]
    # Alternatively...
    # return list(filter(regAusdruck.search, sentences))

print(suchen(re.compile(r'\bluzifer\b', flags=re.IGNORECASE), "tale.txt"))
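Another option, closer to the original pattern, is to let the regular expression itself reach back to the start of the sentence by allowing non-terminator characters before the word as well as after it. This is a sketch that works on a string rather than a file; find_sentences is a hypothetical helper, and the terminator set ".;:!?" is taken from the question's pattern.

```python
import re

def find_sentences(word, text):
    """Return every sentence in `text` that contains `word` as a whole word."""
    text = text.replace("\n", " ")  # keep sentences on one logical line
    pattern = re.compile(
        # anything except a terminator, the word, anything except a
        # terminator again, then the closing terminator if there is one
        r"[^.;:!?]*\b" + re.escape(word) + r"\b[^.;:!?]*[.;:!?]?"
    )
    return [match.strip() for match in pattern.findall(text)]

print(find_sentences("Luzifer", "Today Luzifer has a bad day. And he is ill"))
# ['Today Luzifer has a bad day.']
```

Unlike splitting on '.', this keeps the closing punctuation and also treats ';', ':', '!' and '?' as sentence ends.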