Edit two parts of a text document in Python 3

This is similar to a question I asked recently:
I have a text file containing some data, and I use this piece of code to search it:
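def Add_score(name):
    # Return the 1-based number of the first line that contains name (None if absent)
    line_found = None
    with open("users.txt") as myFile:
        for num, line in enumerate(myFile, 1):
            if name in line:
                line_found = num
                break
    return line_found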
It finds the line that has a specific name. The line would look like this.
Name: whatever Username: whatever password: whatever score: 25 goes: 3
I need to be able to add a number to both score and goes, for example change 25 to 26 and 3 to 4.

Here you are:
line = 'Name: Username: password: whatever score: 25 goes: 3'
print(line)
lineSplitted = line.split()
print(lineSplitted)
updatedLine = " ".join(lineSplitted[0:5] + [str(int(lineSplitted[5])+1)] + [lineSplitted[6]] + [str(int(lineSplitted[7])+1)])
print(updatedLine)
prints:
Name: Username: password: whatever score: 25 goes: 3
['Name:', 'Username:', 'password:', 'whatever', 'score:', '25', 'goes:', '3']
Name: Username: password: whatever score: 26 goes: 4
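The split-index answer depends on the exact number of words in the line: with the full line from the question ("Name: whatever Username: whatever ...") the indices shift. A minimal sketch, assuming the "label: value" layout shown above and a users.txt file, that finds the numbers by their labels and writes the updated file back. The names `bump_fields`, `add_score` and the `amount` parameter are hypothetical.

```python
import re


def bump_fields(line: str, amount: int = 1) -> str:
    """Add `amount` to the numbers that follow 'score:' and 'goes:' in one line."""
    def repl(match) -> str:
        label, number = match.group(1), int(match.group(2))
        return f"{label} {number + amount}"  # keeps a single space after the label

    return re.sub(r"\b(score:|goes:)\s*(\d+)", repl, line)


def add_score(name: str, path: str = "users.txt", amount: int = 1) -> bool:
    """Update the first line containing `name`; return True if a line was changed."""
    with open(path) as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if name in line:
            lines[i] = bump_fields(line, amount)
            break
    else:
        return False  # name not found, file left untouched
    with open(path, "w") as f:
        f.writelines(lines)
    return True


print(bump_fields("Name: whatever Username: whatever password: whatever score: 25 goes: 3"))
```

Looking the values up by label keeps the update correct even if names or usernames contain extra words.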

Related

How to insert list of words inside particular regex

import re
text = """STAR PLUS LIMITED Unit B & C, 15/F, Casey Aberdeen House, 38 Heung Yip Road, Wong Chuk Hang, Hong Kong. Tel: (852)2511 0112 Fax: 2507 4300 Email: info#starplushk.com Ref No: LSM25781 SALES Sales Quote No: SP21-SQ10452 Buyer's Ref: LSM-021042-5 Messers JSC "Tander" Russian Federation 350002 Krasnodar"""
ref_no = re.findall(r"(?:(?<=Buyer's Ref: )|(?<=Ref No: ))[\w\d-]+",text)
print(ref_no)
Required output: ['LSM25781', 'LSM-021042-5']
The script above produces this, but I have many keywords, so I want to generate the regex dynamically. How can I do that?
Tried:
ref_keywords = ["Buyer's Ref:","Ref No:","Reference number:"]
b = r"(?:(?<=" + '|'.join(ref_keywords)+ r" ))[\w\d-]+"
ref_no = re.findall(b, text)
print(ref_no)
This results in the following error
Traceback (most recent call last):
File "/home/v/.config/JetBrains/PyCharm2021.3/scratches/scratch_2.py", line 7, in <module>
ref_no = re.findall(regex, text)
File "/home/v/.pyenv/versions/3.9.5/lib/python3.9/re.py", line 241, in findall
return _compile(pattern, flags).findall(string)
File "/home/v/.pyenv/versions/3.9.5/lib/python3.9/re.py", line 304, in _compile
p = sre_compile.compile(pattern, flags)
File "/home/v/.pyenv/versions/3.9.5/lib/python3.9/sre_compile.py", line 768, in compile
code = _code(p, flags)
File "/home/v/.pyenv/versions/3.9.5/lib/python3.9/sre_compile.py", line 607, in _code
_compile(code, p.data, flags)
File "/home/v/.pyenv/versions/3.9.5/lib/python3.9/sre_compile.py", line 182, in _compile
raise error("look-behind requires fixed-width pattern")
re.error: look-behind requires fixed-width pattern
Process finished with exit code 1
Is there a way to build the regex from a list of keywords? I cannot hard-code the "|" alternatives because I have many keyword lists.
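import re

key_words = ['key1', 'key2', 'key3']
# Join the escaped keywords with "|"; a leading "|" would add an empty
# alternative that matches everywhere
combined_word = "|".join(map(re.escape, key_words))
sentence = "I wana delete key1 and key2 but also key3."
print(re.split(combined_word, sentence))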
You can do it the following way (you almost had it; you just need to wrap each keyword in its own look-behind, since each single look-behind is fixed-width even though the keywords have different lengths):
import re
text = """STAR PLUS LIMITED Unit B & C, 15/F, Casey Aberdeen House, 38 Heung Yip Road, Wong Chuk Hang, Hong Kong. Tel: (852)2511 0112 Fax: 2507 4300 Email: info#starplushk.com Ref No: LSM25781 SALES Sales Quote No: SP21-SQ10452 Buyer's Ref: LSM-021042-5 Messers JSC "Tander" Russian Federation 350002 Krasnodar"""
ref_keywords = ["Buyer's Ref:", "Ref No:", "Reference number:"]
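def keyword_to_regex(keyword: str) -> str:
    # you missed creating these for each keyword: one fixed-width look-behind per keyword
    # (re.escape protects against regex metacharacters in a keyword)
    return f"(?<={re.escape(keyword)} )"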
regex_for_all_keywords = r"(?:" + "|".join(map(keyword_to_regex, ref_keywords)) + r")[\w\d-]+"
ref_no = re.findall(regex_for_all_keywords, text)
print(ref_no) # ['LSM25781', 'LSM-021042-5']
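An alternative that avoids look-behind entirely, so the fixed-width restriction never applies: match the keyword in a non-capturing group and capture only the reference. When a pattern has exactly one capturing group, `re.findall` returns just that group. The sample text below is a shortened version of the question's text.

```python
import re

text = ("Ref No: LSM25781 SALES Sales Quote No: SP21-SQ10452 "
        "Buyer's Ref: LSM-021042-5 Messers JSC")
ref_keywords = ["Buyer's Ref:", "Ref No:", "Reference number:"]

# Longest keywords first, so at a given position a longer keyword wins
# over a shorter one that is its prefix.
alternatives = "|".join(map(re.escape, sorted(ref_keywords, key=len, reverse=True)))
# (?:...) groups the keywords without capturing; ([\w-]+) is the only capture group.
pattern = re.compile(r"(?:" + alternatives + r")\s*([\w-]+)")

ref_no = pattern.findall(text)
print(ref_no)  # ['LSM25781', 'LSM-021042-5']
```

Unlike the look-behind version, `\s*` also tolerates zero or several spaces after the keyword.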

After a separator there is a key in the loop. How do I keep it?

---------------------------
CompanyID: 000000000000
Pizza: 2 3.15 6.30
spaghetti: 1 7 7
ribye: 2 40 80
---------------------------
CompanyID: 000000000001
burger: 1 3.15 6.30
spaghetti: 1 7 7
ribye: 2 40 80
--------------------------
I'm looping over a list of lines, where every line is an item of the list. I need to remember the CompanyID while searching for a product entered by the user.
The code below sets x=True when the product is found, but I can't get the company ID to print it.
a='-'
for line in lines:
    if a in line:
        companyID= next(line)
    if product in line:
        x=True
TypeError: 'str' object is not an iterator
You can use your line separator to identify when new data starts. Once you see a line with "----", start collecting info in a new dictionary. For each line, take its key and value by splitting on ":" and create the entry in the dictionary.
When you see the next "----" line, you know that's the end of the data for this company, so check whether it has the product and, if so, print the company ID from the dictionary.
line_separator_char = '-'
company_data = {}
product = 'burger'
with open('data.dat') as lines:
    for line in lines:
        line = line.rstrip()
        if line.startswith(line_separator_char):
            # End of a company block: check it, then start a fresh one
            if product in company_data:
                print(f'{company_data["CompanyID"]} contains the product {product}')
            company_data = {}
        elif line:  # skip blank lines, which have no ':' to split on
            key, value = line.split(':', 1)
            company_data[key] = value.strip()
OUTPUT
000000000001 contains the product burger
No, it doesn't run. Could you explain what the [1] in split()[1] means?
Another attempt, which doesn't run either:
y=[]
y=lines[1].split(' ')
for line in lines:
    y=line.split(' ')
    if len(y[1])==10:
        companyID=y[1]
    if product in line:
        x=True
Thanks for the answers. Something that finally worked in my case was this:
companyID = None
for line in lines:
    if line.startswith("CompanyID:"):
        # split() without arguments also drops the trailing newline
        companyID = line.split()[1]
    if product in line:
        x = True
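The TypeError in the first attempt comes from calling `next()` on a string: `next()` needs an iterator, and a `str` is only iterable. If the intent was "read the line right after the separator", one option is to loop over an explicit iterator created with `iter(lines)` and call `next()` on that. A sketch, assuming the CompanyID line always directly follows a separator as in the sample; `find_company` is a hypothetical name, and it compares the product against the name before the ":" instead of using a substring test.

```python
lines = [
    "---------------------------",
    "CompanyID: 000000000000",
    "Pizza: 2 3.15 6.30",
    "spaghetti: 1 7 7",
    "ribye: 2 40 80",
    "---------------------------",
    "CompanyID: 000000000001",
    "burger: 1 3.15 6.30",
    "spaghetti: 1 7 7",
    "ribye: 2 40 80",
    "--------------------------",
]


def find_company(lines, product):
    """Return the CompanyID of the first block that lists `product`, or None."""
    company_id = None
    it = iter(lines)  # an iterator, so next(it) is allowed (next(line) on a str is not)
    for line in it:
        if line.startswith("-"):
            # Pull the line after the separator from the same iterator;
            # the default "" avoids StopIteration after the final separator.
            following = next(it, "")
            if following.startswith("CompanyID:"):
                company_id = following.split(":", 1)[1].strip()
        elif line.split(":", 1)[0] == product:
            return company_id
    return None


print(find_company(lines, "burger"))  # 000000000001
```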

Text file to CSV conversion

I have a text file with content like this:
Name: Aar saa
Last Name: sh
DOB: 1997-03-22
Phone: 1212222
Graduation: B.Tech
Specialization: CSE
Graduation Pass Out: 2019
Graduation Percentage: 60
Higher Secondary Percentage: 65
Higher Secondary School Name: Guru Nanak Dev University,amritsar
City: hyd
Venue Details: CMR College of Engineering & Technology (CMRCET) Medchal Road, TS – 501401

Name: bfdg df
Last Name: df
DOB: 2005-12-16
Phone: 2222222
Graduation: B.Tech
Specialization: EEE
Graduation Pass Out: 2018
Graduation Percentage: 45
Higher Secondary Percentage: 45
Higher Secondary School Name: asddasd
City: vjd
Venue Details: Prasad V. Potluri Siddhartha Institute Of Technology, Kanuru, AP - 520007

Name: cc dd ee
Last Name: ee
DOB: 1995-07-28
Phone: 444444444
Graduation: B.Tech
Specialization: ECE
Graduation Pass Out: 2019
Graduation Percentage: 75
Higher Secondary Percentage: 93
Higher Secondary School Name: Sasi institute of technology and engineering
City: hyd
Venue Details: CMR College of Engineering & Technology (CMRCET) Medchal Road, TS – 501401
I want to convert it to a CSV file with these headers:
['Name', 'Last Name','DOB', 'Phone', 'Graduation','Specialization','Graduation Pass Out','Higher Secondary School Name','City','Venue Details']
and, as the values, everything after the ':' on each line.
I have done something like this:
import csv

def parselines(lines, writer):
    data = []
    for line in lines.split('\n'):
        if ": " in line:  # skip empty lines, e.g. from a trailing newline
            data.append(line.split(": ", 1)[1])
    if data:
        writer.writerow(data)

# The function must be defined before it is called; newline='' is what the csv module expects
with open('result.csv', 'a', newline='') as out, open('Name2.txt') as f:
    writer = csv.writer(out)
    writer.writerow(['Name', 'Last Name','DOB', 'Phone', 'Graduation','Specialization','Graduation Pass Out','Graduation Percentage','Higher Secondary Percentage','Higher Secondary School Name','City','Venue Details'])
    for text1 in f.read().split("\n\n"):
        parselines(text1, writer)
It works, but a more efficient approach would be much appreciated.
This algorithm works (it is a kind of state machine):
If the line is blank, start a new row.
Otherwise, add the field to the current row and remember every header seen so far.
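import csv
import sys

def parselines(lines, out=sys.stdout):
    header = []
    csvrows = [{}]
    for line in lines:
        line = line.strip()
        if not line:
            if csvrows[-1]:  # start a new row only if the current one has data
                csvrows.append({})
        else:
            field, data = line.split(":", 1)
            field, data = field.strip(), data.strip()
            csvrows[-1][field] = data
            if field not in header:
                header.append(field)
    if not csvrows[-1]:
        csvrows.pop()  # drop the empty row left by a trailing blank line
    # format CSV; csv quotes values that contain commas (e.g. the venue names)
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(csvrows)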
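If only some columns are wanted, in a fixed order (the header list in the question leaves out the two percentage fields), `csv.DictWriter` can do the selection: `extrasaction="ignore"` drops keys that are not in `fieldnames`, and `restval` fills missing ones. A sketch assuming records are separated by blank lines, as in the question's `split("\n\n")`; `records_from_text`, `write_csv` and `WANTED` are hypothetical names.

```python
import csv
import io

WANTED = ['Name', 'Last Name', 'DOB', 'Phone', 'Graduation', 'Specialization',
          'Graduation Pass Out', 'Higher Secondary School Name', 'City', 'Venue Details']


def records_from_text(text):
    """Yield one dict per blank-line-separated block of 'Field: value' lines."""
    for block in text.split("\n\n"):
        record = {}
        for line in block.splitlines():
            if ":" in line:
                field, value = line.split(":", 1)
                record[field.strip()] = value.strip()
        if record:
            yield record


def write_csv(text, out, fieldnames=WANTED):
    # Columns not in fieldnames are ignored; missing ones are written as ""
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records_from_text(text))


sample = ("Name: Aar saa\nLast Name: sh\nGraduation Percentage: 60\n"
          "Higher Secondary School Name: Guru Nanak Dev University,amritsar\n\n"
          "Name: bfdg df\nLast Name: df\nCity: vjd\n")
buf = io.StringIO()
write_csv(sample, buf)
print(buf.getvalue())
```

To write a real file instead of the in-memory buffer, open it with `open('result.csv', 'w', newline='')` and pass that file object as `out`.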

How to capture words spread over multiple lines separated by any whitespace (newline, space, tab)

import re
c = """
class_monitor std4:
Name: xyz
Roll number: 123
Age: 9
Badge: Blue
class_monitor std5:
Name: abc
Roll number: 456
Age: 10
Badge: Red
"""
I want to print Name, Roll number and age for std4 and Name, roll number and badge for std5.
pat = r"(class_monitor)(.*4:)(\n|\s|\t)*(Name:)(.*)(\s|\n|\t)*(Roll number:)(.*)(\s|\n|\t)*(Age:)(.*)(\s|\n|\t)*(Badge:)(.*)"
In pythex it matches the respective std when I change the second group from (.*4:) to (.*5:).
However, when I run it as a script, it does not work. Am I missing something here?
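One likely cause, assuming the script calls `re.match`: `re.match` only tries to match at the very start of the string, and `c` begins with a newline, so a pattern starting with `class_monitor` returns `None`. `re.search` (or `re.finditer` for every block) scans the whole string. Writing the pattern as a raw string is also advisable so backslash sequences reach the regex engine unchanged. A sketch that captures each block's fields with named groups; the group names are illustrative.

```python
import re

c = """
class_monitor std4:
Name: xyz
Roll number: 123
Age: 9
Badge: Blue
class_monitor std5:
Name: abc
Roll number: 456
Age: 10
Badge: Red
"""

# \s* matches any run of spaces, tabs and newlines; . does not cross newlines,
# so each (.*) stops at the end of its own line.
pattern = re.compile(
    r"class_monitor\s+(?P<std>\S+):\s*"
    r"Name:\s*(?P<name>.*)\s*"
    r"Roll number:\s*(?P<roll>.*)\s*"
    r"Age:\s*(?P<age>.*)\s*"
    r"Badge:\s*(?P<badge>.*)"
)

print(re.match(pattern, c))  # None: c starts with "\n" and match() is anchored at position 0

for m in pattern.finditer(c):
    if m["std"] == "std4":
        print(m["name"], m["roll"], m["age"])
    elif m["std"] == "std5":
        print(m["name"], m["roll"], m["badge"])
```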

Python: iterate over rows and write data into another file

A Python beginner here.
I came up with a small task so I can keep learning and improving my Python. I have an example file1:
Name: Group: UID:
user1 group1 12
user2 group2 23
user3 group3 34
user9 group9 99
I am trying to get this result in file2:
User_Name: User_ID:
user1 12
user9 99
I have gotten this far, and near the end is where I am stuck:
#!/usr/bin/python3.3
with open("file1") as rfile:
    with open("file2", "w") as wfile:
        for line in rfile:
            l = line.strip()
            if l:
                x = l.split()
                # From here, I am not sure what to do next.
                if "user1" in x:
                    pass  # do something...
                if "user9" in x:
                    pass  # do something...
Thanks for the help.
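You can replace each "do something" with this line, which writes the user name (x[0]) and the UID (x[2]) separated by a tab:
wfile.write(x[0] + "\t" + x[2] + "\n")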
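Putting it together: a sketch that skips the header row of file1, writes the new header, and keeps only the wanted users. It assumes whitespace-separated columns in the order Name, Group, UID, as in the example; `extract_users` and its `wanted` parameter are hypothetical.

```python
def extract_users(src="file1", dst="file2", wanted=("user1", "user9")):
    """Copy the Name and UID columns of the wanted users from src to dst."""
    wanted = set(wanted)
    with open(src) as rfile, open(dst, "w") as wfile:
        wfile.write("User_Name:\tUser_ID:\n")
        next(rfile, None)  # skip the "Name: Group: UID:" header line
        for line in rfile:
            x = line.split()  # blank lines give [], which the length check skips
            if len(x) >= 3 and x[0] in wanted:
                # x[0] is the user name, x[2] is the UID
                wfile.write(x[0] + "\t" + x[2] + "\n")
```

Calling `extract_users()` with no arguments reads file1 and writes file2 in the current directory; using a set for `wanted` keeps the membership test fast when the list of users grows.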
