Question about editing a very long sentence with Python - python-3.x

I want to examine the hex content of a file.
with open("C:/python_tria/HEX/sample/test.zip", "rb+") as f:
    stri = str(f.read())
sta = stri.find('this is where to start')
end = stri.find('this is where to end')
My plan is to extract the part between 'sta' and 'end'.
How can I do that?

You could try using re.findall on the file text to find what you are looking for:
import re
with open("C:/python_tria/HEX/sample/test.zip", "rb+") as f:
    stri = str(f.read())  # str() of bytes gives the "b'...'" repr, which is what gets searched
matches = re.findall(r'this is where to start.*?this is where to end', stri, flags=re.DOTALL)
print(matches[0])  # print the first match
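Since the file is opened in binary mode, it may be simpler to stay with bytes instead of converting to str. Below is a minimal sketch of that idea; the function name extract_between, the sample data and the markers are made up for illustration, and it returns only the part strictly between the two markers.

```python
from typing import Optional


def extract_between(data: bytes, start: bytes, end: bytes) -> Optional[bytes]:
    """Return the bytes strictly between the first `start` and the next `end`."""
    sta = data.find(start)
    if sta == -1:
        return None  # start marker not present
    sta += len(start)  # skip past the start marker itself
    stop = data.find(end, sta)  # look for the end marker only after the start
    if stop == -1:
        return None  # end marker not present
    return data[sta:stop]


# Hypothetical sample data standing in for the file content.
sample = b"\x00junk<BEGIN>payload\xff\x01<END>more"
print(extract_between(sample, b"<BEGIN>", b"<END>"))
```

With a real file you would pass f.read() as data and the two markers as bytes literals, for example b'this is where to start'.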

Related

How to split strings from .txt file into a list, sorted from A-Z without duplicates?

For instance, the .txt file includes 2 lines, separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    filename = path
    with open(filename) as f:
        f = f.read()
    f_list = f.split('\n')
    # build a new list instead of removing items while iterating
    f_list = [i for i in f_list if i != '']
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    # a set drops the duplicates; sorted() returns them in A-Z order
    return sorted(set(res3))
print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed sorted a generator that drops empty strings.
from io import StringIO
from typing import TextIO, List
def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )
def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)
def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]
    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]
if __name__ == '__main__':
    test()
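The regular-expression splitting mentioned earlier in this answer could look roughly like the sketch below. It is only one possible reading of the suggestion: the names SEPARATOR and parse_with_regex are made up here, and the pattern assumes that a comma with optional whitespace around it is the only separator.

```python
import re
from typing import Iterable, List

# A comma with optional surrounding whitespace counts as one separator.
SEPARATOR = re.compile(r'\s*,\s*')


def parse_with_regex(lines: Iterable[str]) -> List[str]:
    words = set()
    for line in lines:
        # strip() removes the newline and outer spaces before splitting
        words.update(SEPARATOR.split(line.strip()))
    words.discard('')  # a trailing comma or blank line leaves an empty string
    return sorted(words)


print(parse_with_regex(['John, George , Tom\n', 'Mark, James, Tom, \n']))
```

Because the pattern swallows the whitespace around each comma, no per-word strip() is needed after splitting.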
I came up with a very simple solution, in case anyone needs it:
def unique_sorted(x):  # hypothetical wrapper name; x is an open text file
    lines = x.read().split()
    lines.sort()
    new_list = []
    for word in lines:
        if word not in new_list:
            new_list.append(word)
    return new_list
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl.readlines():
        line = line.strip("\n")
        line = line.split(",")
        # strip the spaces around each name and skip empty entries
        list_.update(word.strip() for word in line if word.strip())
print(sorted(list_))
I think that you missed a comma after Jim in the first line.
You can avoid a loop by using the split method:
content=file.read()
my_list=content.split(",")
To remove duplicates from your list, you can convert it to a set:
my_list=list(set(my_list))
Then you can sort it using sorted.
So the final code is:
with open("file.txt", "r") as file:
    content = file.read()
# treat line breaks as separators too, so the last name on one line
# is not glued to the first name on the next line
my_list = content.replace("\n", ",").replace(" ", "").split(",")
result = sorted(set(my_list) - {""})  # drop empty strings left by trailing commas
You can also pass a key to sorted to control the ordering.
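As a small illustration of the key argument mentioned above, here is a sketch that sorts names case-insensitively. The list of names is made up for the example; str.casefold is a standard string method.

```python
# Hypothetical input with mixed capitalisation and one duplicate.
names = ['tom', 'George', 'james', 'John', 'Mark', 'tom']

# Default ordering compares code points, so capitals sort before lowercase.
default_order = sorted(set(names))
print(default_order)

# key=str.casefold compares the names case-insensitively instead.
casefold_order = sorted(set(names), key=str.casefold)
print(casefold_order)
```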

How to print specific characters in a line using python?

I am trying to write a script to print specific words after a particular string.
Here is the input file
Theyare "playing in the ground", with friends
Theyare "going to Paris", with family
Theyare "motivating to learn new things", by themselves
In the output, I want to use "are" as the keyword: print the text before "are", followed by the text inside the double quotes that comes after it.
The output should be:
They playing in the ground
They going to Paris
They motivating to learn new things
I can print the rest of the line with the code below, but not certain words. So far I have:
import re
with open('input.txt', 'r') as f:
    for lines in f:
        a = re.search(r'\bare', lines)
        if a:
            print(lines)
Any help would be appreciated.
Use a regular expression to extract the parts of the line you want.
import re
with open('input.txt', 'r') as f:
    for lines in f:
        m = re.match(r'(.*?) are "(.*?)"', lines)  # pass the current line to match
        if m:
            print(m.group(1) + " " + m.group(2))
Each group in m returns the part of the line matched by the corresponding pattern in parentheses.
If your lines always look like the examples you provided you can use string manipulations:
s = 'They are "playing in the ground", with friends'
are_split = s.split('are')
# are_split = ['They ', ' "playing in the ground", with friends']
quote_split = are_split[1].split('"')
# quote_split = [' ', 'playing in the ground', ', with friends']
print(are_split[0] + quote_split[1])
# 'They playing in the ground'
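Note that the sample input in the question has no space before "are" ("Theyare"), while the answers assume "They are". A pattern that tolerates both spellings might look like the sketch below; the names PATTERN and rewrite are made up for illustration, and the pattern assumes the quoted text follows "are" directly.

```python
import re
from typing import Optional

# Lazily capture the text before "are", allow optional whitespace before it,
# then capture the text inside the double quotes.
PATTERN = re.compile(r'(.*?)\s*are\s+"(.*?)"')


def rewrite(line: str) -> Optional[str]:
    m = PATTERN.match(line)
    if m is None:
        return None  # line does not have the expected shape
    return m.group(1) + " " + m.group(2)


sample = [
    'Theyare "playing in the ground", with friends',
    'They are "going to Paris", with family',
]
for line in sample:
    print(rewrite(line))
```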

How to replace all occurrences of a string in a file except the first occurrence using Python?

Here is the content of my file
I want to replace all occurrences of pyt_batch_id with a number, except the first occurrence.
I tried the method below and it works as expected, but I don't think this is the best approach.
s = open("BATCH_ROLLBACK.txt").read()
s = s.replace('pyt_batch_id', '123456')
f = open("BATCH_ROLLBACK.txt", 'w')
f.write(s)
f.close()
f2 = open('BATCH_ROLLBACK.txt', 'r')
contents = f2.read().replace('123456', 'pyt_batch_id',1)
f2.close()
f2 = open('BATCH_ROLLBACK.txt', 'w')
f2.write(contents)
f2.close()
output :-
Could anyone please suggest alternative methods?
I found a similar question, but it deals with a single line, not a file.
How to replace all occurences except the first one?
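One possible alternative that avoids writing the file twice is to locate the first occurrence and only replace in the text after it. This is just a sketch: the function name replace_all_but_first is made up, and the sample text is invented because the actual file content is not shown in the question.

```python
def replace_all_but_first(text: str, old: str, new: str) -> str:
    first = text.find(old)
    if first == -1:
        return text  # nothing to replace
    split_at = first + len(old)  # end of the first occurrence
    # keep everything up to and including the first occurrence untouched
    return text[:split_at] + text[split_at:].replace(old, new)


# Hypothetical content standing in for BATCH_ROLLBACK.txt.
sample = "pyt_batch_id = 1\nwhere id = pyt_batch_id\nand x = pyt_batch_id\n"
print(replace_all_but_first(sample, "pyt_batch_id", "123456"))
```

For a file, you would read it once, call the function on the content, and write the result back once.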

Can anyone help to fetch only the dictionary-format data from a file (example data shown below)?

Here is how the file is set up:
Some lines written here.
Line one written here.
line two written here.
key1:value1
key2:value2
key3:value3
key4:value4
All above keys and values are mentioned. Then:
Line three written here.
key5:value5 key6:value6
key7:value7
I tried it this way but did not get the desired result.
import re
with open(r'/home/rajat/PycharmProjects/MyProject/testfile.txt') as f:
    lines = f.readlines()
regex = re.compile(r'''
    [\S]+:
    (?:
    \s
    (?!\S+:)\S+
    )+
    ''', re.VERBOSE)
matches = regex.findall(str(lines))
for match in matches:
    print(match)
I finally got what I wanted by myself.
The exact code for my solution is below.
import re
with open(r'/home/coding_learner/PycharmProjects/MyProject/testfile.txt', 'r') as f:
    lines = f.readlines()
lines = map(lambda s: s.strip(), lines)
r = re.compile(r".[\S]+: [\S]+")  # raw string so \S is not treated as a string escape
newlist = list(filter(r.match, lines))
print(newlist)
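If the goal is an actual dict rather than a list of matching lines, something like the following sketch may help. It assumes keys and values contain neither whitespace nor colons and are written as key:value with no space, as in the sample shown in the question; the name PAIR and the shortened sample text are made up for illustration.

```python
import re

# Assumption: keys and values contain no whitespace and no colons.
PAIR = re.compile(r'([^\s:]+):([^\s:]+)')

# Shortened sample based on the file layout shown in the question.
sample = """Some lines written here.
key1:value1
key2:value2
All above keys and values are mentioned. Then:
key5:value5 key6:value6
"""

# findall returns (key, value) tuples, which dict() accepts directly.
data = dict(PAIR.findall(sample))
print(data)
```

The word "Then:" is not picked up because nothing but a line break follows its colon.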

Print the complete line that includes a search word, where the end of line is a dot, not a line feed

I have a long text (Winter's Tale). I want to search for the word 'Luzifer' and then print the complete line that contains it. By complete line I mean everything between two dots.
My script prints 'Luzifer' and all the following words up to the dot at the end of the line, but I want the full line.
For example, the text line is:
'Today Luzifer has a bad day. And he is ill'
My script prints: 'Luzifer has a bad day.'
But I need the complete line, including 'Today'.
Is there a function or a way to read backwards?
Here is my script:
#!/usr/bin/python3.6
# coding: utf-8
import re
def suchen(regAusdruck, textdatei):
    f = open(textdatei, 'r', encoding='utf-8')
    rfctext = f.read()
    f.close()
    return re.findall(regAusdruck, rfctext)
pattern1 = r'\bLuzifer\b[^.;:!?]{2,}'
print(suchen(pattern1, "tale.txt"))
One of the most straightforward ways of handling this is to read in your entire text (hopefully it is not too big), split on '.', and then return the strings that contain your search word. For good measure, I think it will be useful to replace the newline characters with a space so that you don't have any strings broken into multiple lines.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    return [sentence for sentence in sentences if regAusdruck in sentence]
    # Alternatively...
    # return list(filter(lambda x: regAusdruck in x, sentences))
print(suchen('Luzifer', "tale.txt"))
If you really need to use a regular expression (which may be the case for more complicated searches) a modification is only needed in the return statement.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ')  # replace newlines with space
    sentences = entire_text.split('.')
    # We assume you passed in a compiled regular expression object.
    return [sentence for sentence in sentences if regAusdruck.search(sentence)]
    # Alternatively...
    # return list(filter(regAusdruck.search, sentences))
import re
print(suchen(re.compile(r'\bluzifer\b', flags=re.IGNORECASE), "tale.txt"))
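Splitting on '.' drops the dot itself and ignores sentences that end in '!' or '?'. If that matters, a variant using re.split with a lookbehind could be used, as in this sketch. The function name sentences_containing and the sample text are made up, and the pattern assumes a sentence ends with '.', '!' or '?' followed by whitespace.

```python
import re
from typing import List


def sentences_containing(word: str, text: str) -> List[str]:
    text = text.replace('\n', ' ')  # join lines broken in the middle of a sentence
    # Split at whitespace that follows '.', '!' or '?'; the lookbehind keeps
    # the punctuation attached to its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    wanted = re.compile(r'\b' + re.escape(word) + r'\b')
    return [s for s in sentences if wanted.search(s)]


# Hypothetical sample text.
sample = 'Today Luzifer has a bad day. And he is ill. Is Luzifer\nsad? No.'
print(sentences_containing('Luzifer', sample))
```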
