with open("lineup.txt", "r", encoding="utf-8") as file:
    ts = list()
    fb = list()
    for line in file:
        line = line[:-1]
        data = line.split(",")
        if (data[1] == "Fenerbahçe"):
            fb.append(line + "\n")
        elif (data[1] == "Trabzonspor"):
            ts.append(line + "\n")
    with open("ts.txt", "w", encoding="utf-8") as file1:
        for i in ts:
            file1.write(i)
    with open("fb.txt", "w", encoding="utf-8") as file2:
        for i in fb:
            file2.write(i)
print(fb)
print(ts)
And here is some data from the lineup.txt file:
U. Çakır, Trabozonspor
Marc Bartra, Trabzonspor
İ. Kahveci, Fenerbahçe
S. Aziz, Fenerbahçe
Trezeguet, Trabzonspor
A. Bayındır, Fenerbahçe
Gustavo Henrique, Fenerbahçe
I am getting empty lists (∅) in both cases, so nothing gets written to the txt files. I can't figure out why.
There may be an issue with your split. If you split on a comma, the second element of your list will start with a space (' Fenerbahçe'), so the equality check never matches. And if the team name is not the only thing after the comma and space, you won't be able to match with equality anyway.
One way to solve this is to use the "in" operator, avoiding the split completely.
for line in file:
    if ("Fenerbahçe" in line):
        fb.append(line[:-1] + "\n")
    elif ("Trabzonspor" in line):
        ts.append(line[:-1] + "\n")
If the matching is more complex than that, you may consider using a regex and exploiting word boundaries before and after the team names you match.
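A minimal sketch of that word-boundary idea, using two lines modelled on the lineup.txt data from the question (the sample list here is an assumption, not the real file):

```python
import re

# \b anchors keep "Trabzonspor" from matching inside a longer token,
# and re.search sidesteps the leading-space problem of split(",").
lines = [
    "Marc Bartra, Trabzonspor",
    "İ. Kahveci, Fenerbahçe",
]

ts, fb = [], []
for line in lines:
    if re.search(r"\bFenerbahçe\b", line):
        fb.append(line + "\n")
    elif re.search(r"\bTrabzonspor\b", line):
        ts.append(line + "\n")

print(fb)  # ['İ. Kahveci, Fenerbahçe\n']
print(ts)  # ['Marc Bartra, Trabzonspor\n']
```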
I am currently comparing the text of one file to that of another file.
The method: for each row in the source text file, check each row in the compare text file.
If the word is present in the compare file then write the word and write 'present' next to it.
If the word is not present then write the word and write not_present next to it.
So far I can do this fine by printing to the console output, as shown below:
import sys

filein = 'source.txt'
compare = 'compare.txt'
source = 'source.txt'

# change to lower case
with open(filein, 'r+') as fopen:
    string = ""
    for line in fopen.readlines():
        string = string + line.lower()

with open(filein, 'w') as fopen:
    fopen.write(string)

# search and list
with open(compare) as f:
    searcher = f.read()

if not searcher:
    sys.exit("Could not read data :-(")

# search and output the results
with open(source) as f:
    for item in (line.strip() for line in f):
        if item in searcher:
            print(item, ',present')
        else:
            print(item, ',not_present')
the output looks like this:
dog ,present
cat ,present
mouse ,present
horse ,not_present
elephant ,present
pig ,present
What I would like is to put this into a pandas DataFrame, preferably with 2 columns: one for the word and one for its state. I can't seem to get my head around doing this.
I am making several assumptions here to include:
Compare.txt is a text file consisting of a list of single words, one word per line.
Source.txt is a free-flowing text file, which includes multiple words per line, each word separated by a space.
When comparing to determine whether a compare word is in source, it is found if, and only if, no punctuation marks (i.e. " ' , . ? etc.) are appended to the word in source.
The output dataframe will only contain the words found in compare.txt.
The final output is a printed version of the pandas dataframe.
With these assumptions:
import pandas as pd
from collections import defaultdict

compare = 'compare.txt'
source = 'source.txt'

rslt = defaultdict(list)

def getCompareTxt(fid: str) -> list:
    clist = []
    with open(fid, 'r') as cmpFile:
        for line in cmpFile.readlines():
            clist.append(line.lower().strip('\n'))
    return clist

cmpList = getCompareTxt(compare)

if cmpList:
    with open(source, 'r') as fsrc:
        items = []
        for item in (line.strip().split(' ') for line in fsrc):
            items.extend(item)
    print(items)
    for cmpItm in cmpList:
        rslt['Name'].append(cmpItm)
        if cmpItm in items:
            rslt['State'].append('Present')
        else:
            rslt['State'].append('Not Present')
    df = pd.DataFrame(rslt, index=range(len(cmpList)))
    print(df)
else:
    print('No compare data present')
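The defaultdict above already has the column-oriented, dict-of-equal-length-lists shape that the pandas DataFrame constructor accepts directly. A minimal in-memory sketch of the same comparison, with made-up stand-in lists in place of the compare.txt and source.txt contents:

```python
from collections import defaultdict

# Stand-in data: in the answer these come from compare.txt and source.txt.
cmp_list = ['dog', 'cat', 'horse']
items = ['dog', 'cat', 'mouse', 'pig']

rslt = defaultdict(list)
for word in cmp_list:
    rslt['Name'].append(word)
    rslt['State'].append('Present' if word in items else 'Not Present')

# pd.DataFrame(rslt) would turn this dict of equal-length lists
# straight into the two-column frame.
print(dict(rslt))
# {'Name': ['dog', 'cat', 'horse'], 'State': ['Present', 'Present', 'Not Present']}
```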
For instance, the .txt file includes 2 lines, with items separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    filename = path
    with open(filename) as f:
        f = f.read()
    f_list = f.split('\n')
    for i in f_list:
        if i == '':
            f_list.remove(i)
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    for i in res3:
        if res3.count(i) != 1:
            res3.remove(i)
    res3.sort()
    return res3

print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
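A regex split along those lines might look like this (the sample line is modelled on the question's second input line):

```python
import re

# One pattern handles commas with or without surrounding spaces;
# filtering out '' drops the empty piece left by a trailing comma.
line = 'Mark, James,  Tom, '
parts = [p for p in re.split(r'\s*,\s*', line.strip()) if p]
print(parts)  # ['Mark', 'James', 'Tom']
```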
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed a generator to sorted that drops empty strings.
from io import StringIO
from typing import TextIO, List


def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )


def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)


def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]

    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]


if __name__ == '__main__':
    test()
I came up with a very simple solution, in case anyone needs it:
def unique_words(x):  # x is an already-opened file object
    lines = x.read().split()
    lines.sort()
    new_list = []
    [new_list.append(word) for word in lines if word not in new_list]
    return new_list
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl.readlines():
        line = line.strip("\n")
        line = line.split(",")
        [list_.add(_) for _ in line if _ != '']
print(list_)
I think that you missed a comma after Jim in the first line.
You can avoid the use of a loop by using the split method:
content=file.read()
my_list=content.split(",")
To remove the duplicates in your list, you can transform it to a set:
my_list=list(set(my_list))
then you can sort it using sorted
So the final code:
with open("file.txt", "r") as file:
    content = file.read()
    my_list = content.replace("\n", "").replace(" ", "").split(",")
    result = sorted(list(set(my_list)))
You can also add a key to your sort function if you need a custom order.
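A small hypothetical illustration of that, sorting a deduplicated set of made-up names case-insensitively by passing a key to sorted():

```python
# Without key=str.lower, uppercase names would sort before lowercase ones.
names = {'tom', 'George', 'mark', 'James'}
result = sorted(names, key=str.lower)
print(result)  # ['George', 'James', 'mark', 'tom']
```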
I am trying to write a script to print specific words after a particular string.
Here is the input file
They are "playing in the ground", with friends
They are "going to Paris", with family
They are "motivating to learn new things", by themselves
In the output I am selecting "are" as the keyword: after "are" I want the text that is inside the quotes, and I want to put the word that comes before "are" in front of it.
output should be
They playing in the ground
They going to Paris
They motivating to learn new things
I can print the whole line with the code below, but not the specific words. So far I have:
import re

with open('input.txt', 'r') as f:
    for lines in f:
        a = re.search(r'\bare', lines)
        if a:
            print(lines)
Any help would be appreciated
Use a regular expression to extract the parts of the line you want.
import re

with open('input.txt', 'r') as f:
    for lines in f:
        m = re.match(r'(.*?) are "(.*?)"', lines)
        if m:
            print(m.group(1) + " " + m.group(2))
The groups in m return the parts of the line that match the patterns between the parentheses.
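For illustration, here is that pattern applied to one of the sample lines: group(1) captures the text before " are " and group(2) captures the text inside the quotes.

```python
import re

line = 'They are "playing in the ground", with friends'
m = re.match(r'(.*?) are "(.*?)"', line)
print(m.group(1) + " " + m.group(2))  # They playing in the ground
```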
If your lines always look like the examples you provided you can use string manipulations:
s = 'They are "playing in the ground", with friends'
are_split = s.split('are')
# are_split = ['They ', ' "playing in the ground", with friends']
quote_split = are_split[1].split('"')
# quote_split = [' ', 'playing in the ground', ', with friends']
print(are_split[0] + quote_split[1])
# 'They playing in the ground'
Input
02000|42163,54|
03100|4|6070,00
03110|||6070,00|00|00|
00000|31751150201912001|01072000600074639|
02000|288465,76|
03100|11|9060,00
03110|||1299,00|00|
03110||||7761,00|00|
03100|29|14031,21
03110|||14031,21|00|
00000|31757328201912001|01072000601021393|
Code
prev = ''
with open('out.txt') as f:
    for line in f:
        if prev.startswith('03110') and line.startswith('03110'):
            print(prev.strip() + '|03100|XX|PARCELA|' + line)
        prev = line
Hi, I have this code that searches for two consecutive lines that start with 03110 and prints those lines, but I want to transform the code so that it also prints, or writes to the .txt, the rest of the lines.
Output should be like this
02000|42163,54|
03100|4|6070,00
03110|||6070,00|00|00|
00000|31751150201912001|01072000600074639|
02000|288465,76|
03100|11|9060,00
03110|||1299,00|00|3100|XX|PARCELA|03110||||7761,00|00|
03100|29|14031,21
03110|||14031,21|00|
00000|31757328201912001|01072000601021393|
I know that I'm getting only those two lines merged, because that is what the print() command produces:
03110|||1299,00|00|3100|XX|PARCELA|03110||||7761,00|00|
But I don't know how to produce the desired output. Can anyone help me with my code?
# I assume the input is in a text file:
with open('myFile.txt', 'r') as my_file:
    splited_line = [line.rstrip().split('|') for line in my_file]  # this will split every line as a separate list

new_list = []
for i in range(len(splited_line)):
    try:
        if splited_line[i][0] == '03110' and splited_line[i-1][0] == '03110':  # if the current line and the previous line start with 03110
            first = '|'.join(splited_line[i-1])
            second = '|'.join(splited_line[i])
            newLine = first + "|03100|XX|PARCELA|" + second
            new_list.append(newLine)
        elif splited_line[i][0] == '03110' and splited_line[i+1][0] == '03110':  # to escape duplicating in the list
            pass
        else:
            line = '|'.join(splited_line[i])
            new_list.append(line)
    except IndexError:
        pass

# To write the new_list to a text file
with open('new_file', 'a') as f:
    for item in new_list:
        print(item)
        f.write(item + '\n')
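An alternative sketch that avoids the index arithmetic and the try/except entirely: buffer the previous line and, when two 03110 lines meet, merge the second into the output line already emitted. The separator string is an assumption taken from the question's code and is easy to adjust.

```python
def merge_pairs(lines, prefix='03110', sep='03100|XX|PARCELA|'):
    # Merge each pair of consecutive `prefix` lines into one output line;
    # all other lines pass through unchanged.
    out = []
    prev = None
    for line in lines:
        line = line.rstrip('\n')
        if prev is not None and prev.startswith(prefix) and line.startswith(prefix):
            out[-1] += sep + line  # append to the line already emitted
            prev = None            # consume the pair
        else:
            out.append(line)
            prev = line
    return out

sample = [
    '03100|11|9060,00',
    '03110|||1299,00|00|',
    '03110||||7761,00|00|',
    '03100|29|14031,21',
]
print(merge_pairs(sample))
```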
I have a long text (Winter's Tale). Now I want to search for the word 'Luzifer', and the complete sentence which includes the word 'Luzifer' should be printed. By complete sentence I mean everything between two dots.
My script is printing 'Luzifer' and all following words up to the end-of-sentence dot. But I want the full sentence.
For example, the text line is:
'Today Luzifer has a bad day. And he is ill'
My script is printing: 'Luzifer has a bad day.'
But I need the complete sentence, including 'Today'.
Is there a function or a way to read back?
Here is my script:
#!/usr/bin/python3.6
# coding: utf-8
import re

def suchen(regAusdruck, textdatei):
    f = open(textdatei, 'r', encoding='utf-8')
    rfctext = f.read()
    f.close()
    return re.findall(regAusdruck, rfctext)

pattern1 = r'\bLuzifer\b[^.;:!?]{2,}'
print(suchen(pattern1, "tale.txt"))
One of the most straightforward ways of handling this is to read in your entire text (hopefully it is not too big), split on '.', and then return the strings that contain your search word. For good measure, I think it will be useful to replace the newline characters with a space so that you don't have any strings broken into multiple lines.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    return [sentence for sentence in sentences if regAusdruck in sentence]
    # Alternatively...
    # return list(filter(lambda x: regAusdruck in x, sentences))

print(suchen('Luzifer', "tale.txt"))
If you really need to use a regular expression (which may be the case for more complicated searches) a modification is only needed in the return statement.
import re

def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    # We assume you passed in a compiled regular expression object.
    return [sentence for sentence in sentences if regAusdruck.search(sentence)]
    # Alternatively...
    # return list(filter(regAusdruck.search, sentences))

print(suchen(re.compile(r'\bluzifer\b', flags=re.IGNORECASE), "tale.txt"))