Python adding text to a line

import re

a = open('testlines.csv', 'r')
b = a.readlines()
a.close()
for c in range(0,1):
    d = '<' + b[c] + '>'
    d = b[c].replace(',', '><')
    e = re.findall(r'<(.*?)>', d, re.DOTALL)
    print(d)
    print(e[0],e[1],e[2],e[3],e[4],e[5],e[6],e[7],e[8])
d does not print correctly: the < at the beginning of the line and the > at the end don't show up. If I swap the two lines that create and modify d, the commas are not replaced. What am I doing wrong? I want the replacement, and I need the < and > at the beginning and end so that findall can split everything apart and I can build a multidimensional array from the results.

The problem is that after you assign d = '<' + b[c] + '>', you never use that value: the next line builds a new string from b[c] and rebinds d to it, so the step that adds <...> is lost.
You can fix it by calling replace on d instead of b[c]:
for c in range(0,1):
    d = '<' + b[c] + '>'
    d = d.replace(',', '><') # use d instead of b[c]
    e = re.findall(r'<(.*?)>', d, re.DOTALL)
    print(d)
    print(e[0],e[1],e[2],e[3],e[4],e[5],e[6],e[7],e[8])
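One more detail worth checking: each string returned by readlines() still ends with '\n', so the closing > lands after the newline and the last field captured by findall carries that newline with it. Below is a minimal sketch of the corrected steps with the newline stripped first. The helper name split_fields and the inline sample line are assumptions for illustration; they stand in for the loop body and for a line of testlines.csv.

```python
import re

def split_fields(line):
    """Wrap a comma-separated line in <...> markers and extract the fields."""
    # Drop the trailing newline so '>' ends up directly after the last field.
    wrapped = '<' + line.rstrip('\n') + '>'
    # Work on the wrapped string, not on the original line.
    wrapped = wrapped.replace(',', '><')
    return re.findall(r'<(.*?)>', wrapped, re.DOTALL)

# Hypothetical sample line standing in for one row of the CSV file.
fields = split_fields('a,b,c\n')
print(fields)
```

If the markers are only needed for the findall step, line.rstrip('\n').split(',') would likely give the same list of fields directly.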

Related

Follow up question to: find characters present in two different lines if they satisfy a positional relationship

This is a follow up to this question recapitulated below.
I have the following three strings (ignoring the lines starting with >)
>chain A
---------MGPRLSVWLLLLPAALLLHEEHSRAAA--KGGCAGSGC-GKCDCHGVKGQKGERGLPGLQGVIGFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPPGIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGIPGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPPGVPGQA-------QVQEKG
>chain B
---------MGPRLSVWLLLLPAALLLHEEHSRAAA--KGGCAGSGC-GKCDCHGVKGQKGERGLPGLQGVIGFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPPGIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGIPGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPPGVPGQA-------QVQEKG
>chain C
MGRDQRAVAGPALRRWLLLGTVTVGFLAQSVLAGVKKFDVPCGGRDCSGGCQCYPEKGGRGQPGPVGPQGYNGPPGLQGFPGLQGRKGDKGERGAPGVTGPKGDVGARGVSGFPGADGIPGHPGQGGPRGRPGYDGCNGTQGDSGPQGPPGSEGFTGPPGPQGPKGQKGEP-YALPKEERDRYRGEPGEPGLVGFQGPPGRPGHVGQMGPVGAPGRPGPPGPPGPKGQQGNRGLGFYGVKGEKGDVGQPGPNGIPSDTLHPIIAPTGVTFH
I want to find out the character position of all R and D/E in the three chains that satisfy the following relationship
R[i] (chain A) - D[i+2] (chain B)
R[i] (chain B) - D[i+2] (chain C)
R[i] (chain C) - D[i+5] (chain A)
Explanation: Iterate over every ith R in chain A and check if the i+2 position of chain B contains D or E. If yes, output the character positions of every such R and D/E pair. Do the same with chains B+C and chains C+A.
Catch: While deciding the relationship, it should count the dashes. But when printing the positions it should disregard the dashes.
Using the script posted in the original question, I get the following output:
B-C 187 R E
The output should be:
B-C 175-188 R E
I modified the code posted in the original question to include a correction:
awk '
{ chain_id[++c]=$2  # save chain id, eg, "A", "B", "C"
  getline           # read next line from input file
  chains[c]=$0      # save associated chain
}
END { i_char="R"                                   # character to search for in 1st chain
      for (i=1;i<=c;i++) {                         # loop through list of chains
          j= (i==c ? 1 : i+1)                      # determine index of 2nd chain
          offset= (i==c ? 5 : 2)                   # +2 for A-B, B-C; +5 for C-A
          chain_i=chains[i]                        # copy chains as we are going to cut them up as we process them
          chain_j=chains[j]
          chain_pair= chain_id[i] "-" chain_id[j]  # build output label, eg, "A-B"
          pos=0                                    # reset position
          while (length(chain_i)>0) {
              n=index(chain_i,i_char)              # look for i_char ("R")
              if (n==0) break                      # if not found we are done with this chain pair so break out of loop else ...
              pos=pos+n                            # update our position in the chain; pos is the field position
              j_char=substr(chain_j,n+offset,1)    # find character from 2nd chain at location n+offset
              if (j_char ~ /D|E/) {                # if 2nd chain character is one of "D" or "E" then ...
                  corr_i=substr(chain_i,1,n)
                  corr=gsub (/-/,"",corr_i)        # count dashes in the current slice up to the match
                  corr_pos=pos-corr
                  print chain_pair,corr_pos,i_char,j_char  # print our finding
              }
              chain_i=substr(chain_i,n+1)          # strip off 1st n characters
              chain_j=substr(chain_j,n+1)
          }
      }
}
' file
But this doesn't help; the output is still incorrect:
B-C 187 R E

Adding some logic to keep count of dashes:
awk '
{ chain_id[++c]=$2; getline; chains[c]=$0 }
END { i_char="R"
      for (i=1;i<=c;i++) {
          j= (i==c ? 1 : i+1)
          offset= (i==c ? 5 : 2)
          chain_i=chains[i]
          chain_j=chains[j]
          chain_pair= chain_id[i] "-" chain_id[j]
          pos=dash_cnt_i=dash_cnt_j=0
          while (length(chain_i)>0) {
              n=index(chain_i,i_char)
              if (n==0) break
              pos=pos+n
              head_i = substr(chain_i,1,n)       # copy everything up to matching character
              head_j = substr(chain_j,1,n)       # copy everything up to matching character
              # gsub() returns number of substitutions which in this case is also the number of dashes
              dash_cnt_i += gsub(/-/,"",head_i)  # add count of dashes in head_i
              dash_cnt_j += gsub(/-/,"",head_j)  # add count of dashes in head_j
              j_char=substr(chain_j,n+offset,1)
              if (j_char ~ /E|D/)
                  print chain_pair,(pos-dash_cnt_i) "-" (pos+offset-dash_cnt_j) ,i_char,j_char
              chain_i=substr(chain_i,n+1)
              chain_j=substr(chain_j,n+1)
          }
      }
}
' file.txt
This generates:
A-B 355-357 R E
A-B 390-392 R E
A-B 597-599 R D
A-B 781-783 R E
A-B 917-919 R D
A-B 968-970 R D
A-B 1063-1065 R E
A-B 1516-1518 R D
A-B 1638-1640 R E
B-C 175-188 R E # OP's expected result
B-C 346-364 R D
B-C 355-373 R E
B-C 396-414 R D
B-C 500-519 R D
B-C 585-602 R D
B-C 917-963 R E
B-C 1063-1108 R E
B-C 1173-1218 R D
B-C 1516-1562 R D
C-A 334-321 R E
C-A 400-389 R E
C-A 471-459 R E
C-A 740-706 R D
C-A 1228-1190 R E
C-A 1589-1552 R E
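
For comparison, the dash bookkeeping can also be sketched in Python, where slicing and str.count make the two coordinate systems explicit: the match is decided on raw positions (dashes included), and the reported positions subtract the dashes seen so far. The function name find_pairs and its parameters are hypothetical. Note one assumption that differs from the awk code above: this sketch counts dashes in the second chain up to the partner position itself, whereas the awk code counts them only up to position n.

```python
def find_pairs(chain_i, chain_j, offset, target="R", partners="DE"):
    """Return (pos_i, pos_j, char_i, char_j) tuples with 1-based, dash-free positions."""
    hits = []
    for k, ch in enumerate(chain_i):
        if ch != target:
            continue
        m = k + offset  # raw index in the second chain, dashes included
        if m < len(chain_j) and chain_j[m] in partners:
            # Convert raw 0-based indices to 1-based positions without dashes.
            pos_i = k + 1 - chain_i[:k + 1].count("-")
            pos_j = m + 1 - chain_j[:m + 1].count("-")
            hits.append((pos_i, pos_j, ch, chain_j[m]))
    return hits

# Tiny made-up chains, not the sequences from the question.
print(find_pairs("--RAA", "-AAAE", offset=2))  # [(1, 4, 'R', 'E')]
```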

How do I convert all the variables into an int straightaway

a,b,c = input(a,b,c).split()
I need a way to convert a, b and c to int immediately, without having to do something like
a = int(a)
b = int(b)
c = int(c)
please help :)

l = input().split(',')
l = [int(i) for i in l]
a, b, c = l
Run the code and enter 3 integers separated by ','; a, b and c will then contain the 3 integers.
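
The conversion and the unpacking can also be done in a single statement. The sketch below uses fixed strings in place of input() so it runs without a prompt; parse_ints is a hypothetical helper name, and the sample values are made up.

```python
def parse_ints(text, sep=None):
    """Split text on sep (any whitespace when sep is None) and convert each piece to int."""
    return [int(token) for token in text.split(sep)]

# Whitespace-separated values, as in the question.
a, b, c = parse_ints("1 2 3")

# Equivalent one-liner with map(), here for comma-separated values.
x, y, z = map(int, "4,5,6".split(","))

print(a, b, c, x, y, z)
```

With real input this would become a, b, c = map(int, input().split()). Unpacking raises ValueError if the user does not enter exactly three values.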

Count number of occurrences of each string using regex

Given a pattern like this:
pattern = re.compile(r'\b(A|B|C)\b')
Given a huge_string, I would like to replace every substring matching the pattern with a string D and find the number of occurrences of each of the strings A, B and C. What is the most feasible approach?
One way is to split the pattern into 3 patterns, one per string, and then use subn:
pattern_a = re.compile(r'\bA\b')
pattern_b = re.compile(r'\bB\b')
pattern_c = re.compile(r'\bC\b')
huge_string, no_a = re.subn(pattern_a, D, huge_string)
huge_string, no_b = re.subn(pattern_b, D, huge_string)
huge_string, no_c = re.subn(pattern_c, D, huge_string)
But it requires 3 passes through the huge_string. Is there a better way?

You may pass a callable as the replacement argument to re.sub and collect the necessary counting details during a single replacement pass:
import re
counter = {}
def repl(m):
    if m.group() in counter:
        counter[m.group()] += 1
    else:
        counter[m.group()] = 1
    return 'd'
text = "a;b o a;c a l l e d;a;c a b"
rx = re.compile(r'\b(a|b|c)\b')
result = rx.sub(repl, text)
print(counter, result, sep="\n")
Output:
{'a': 5, 'b': 2, 'c': 2}
d;d o d;d d l l e d;d;d d d

You could do it in 2 passes: the first just counts, and the second does the substitution. This means that if your search space grows (a|b|c|d|e, etc.) you still only do 2 passes; the number of passes does not depend on the number of possible matches.
import re
from collections import Counter
string = " a j h s j a b c "
pattern = re.compile(r'\b(a|b|c)\b')
counts = Counter(pattern.findall(string))
string_update = pattern.sub('d', string)
print(counts, string, string_update, sep="\n")
OUTPUT
Counter({'a': 2, 'b': 1, 'c': 1})
a j h s j a b c
d j h s j d d d
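
The two answers can be combined: a Counter updated inside the replacement callable keeps the single pass while avoiding the manual key check. This is a minimal sketch; the function name replace_and_count is hypothetical, and the sample text is the one from the first answer.

```python
import re
from collections import Counter

def replace_and_count(pattern, replacement, text):
    """Replace every match in one pass and count how often each matched string occurred."""
    counts = Counter()

    def repl(match):
        counts[match.group()] += 1  # Counter supplies 0 for unseen keys
        return replacement

    return pattern.sub(repl, text), counts

rx = re.compile(r'\b(a|b|c)\b')
result, counts = replace_and_count(rx, 'd', "a;b o a;c a l l e d;a;c a b")
print(counts, result, sep="\n")
```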

Convert ALL UPPERCASE to Title

#!/usr/bin/python3.4
import urllib.request
import os
import re
os.chdir('/home/whatever/')
a = open('Shopstxt.csv','r')
b = a.readlines()
a.close()
c = len(b)
d = list(zip(*(e.split(';') for e in b)))
shopname = []
shopaddress = []
shopcity = []
shopphone = []
shopwebsite = []
f = d[0]
g = d[1]
h = d[2]
i = d[3]
j = d[4]
e = -1
for n in range(0, 5):
    e = e + 1
    sn = f[n]
    sn.title()
    print(sn)
    shopname.append(sn)
    sa = g[n]
    sa.title()
    shopaddress.append(sa)
    sc = h[n]
    sc.title()
    shopcity.append(sc)
Shopstxt.csv is all upper-case letters and I want to convert them to title case. I thought this would do it, but it doesn't: the values are still all upper case. What am I doing wrong?
I also want to save the file back, and I'd like to check one more thing quickly.
When I combine the file back together, before writing it to disk, do I have to add a '\n' at the end of each line, or is the '\n' included automatically when I write each line to the file?

Strings are immutable, so title() returns a new string and you need to assign the result:
sn = sn.title()
sa = sa.title()
sc = sc.title()
Also, if you do this:
with open("bla.txt", "wt") as outfile:
    outfile.write("stuff")
    outfile.write("more stuff")
then this will not automatically add line endings.
A quick way to add line endings would be this:
textblobb = "\n".join(list_of_text_lines)
with open("bla.txt", "wt") as outfile:
    outfile.write(textblobb)
As long as textblobb fits comfortably into memory, that should do the trick nicely.
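
Note that lines obtained from readlines() already end with '\n', so they can be written back unchanged; the join is only needed once the line endings have been stripped. A minimal sketch of both steps follows. The helper names and the sample rows are assumptions for illustration, not the contents of Shopstxt.csv.

```python
def title_case_lines(lines):
    """Strip the trailing newline from each line and return title-cased copies."""
    return [line.rstrip("\n").title() for line in lines]

def join_for_writing(lines):
    """Join stripped lines with newlines and terminate the last line as well."""
    return "\n".join(lines) + "\n"

# Made-up rows in the same 'name;address;city' layout as the question.
rows = ["ACME TOOLS;12 MAIN ST;SPRINGFIELD\n", "BEST BIKES;7 HIGH RD;SHELBYVILLE\n"]
converted = title_case_lines(rows)
text_blob = join_for_writing(converted)
print(text_blob, end="")
```

The resulting text_blob can be passed to outfile.write() in one call. Keep in mind that title() capitalizes the letter after any non-letter character, so names containing apostrophes may need separate handling.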

Use the .title() method when defining your variables like I did in the code below. As others have mentioned, strings are immutable so save yourself a step and create the string you need in one line.
for n in range(0, 5):
    e = e + 1
    sn = f[n].title() ### Grab and modify the list index before assigning to your variable
    print(sn)
    shopname.append(sn)
    sa = g[n].title() ###
    shopaddress.append(sa)
    sc = h[n].title() ###
    shopcity.append(sc)

How to make the limit of the readlines to the first number in the file?

This program predicts the first winning move in the game of Nim. I need a little help with one problem in the code. The input file looks something like this:
3
13 4 5
29 5 1
34 4 50
The first number is the number of lines after the first line that the program has to read. So if the file were
2
13 4 5
29 5 1
34 4 50
it would only read the two lines following the first (13 4 5 and 29 5 1).
This is my code so far:
def main ():
    nim_file = open('nim.txt', 'r')
    first_line = nim_file.readline()
    counter = 1
    n = int (first_line)
    for line in nim_file:
        for j in range(1, n):
            a, b, c = [int(i) for i in line.split()]
            nim_sum = a ^ b ^ c
            if nim_sum == 0:
                print ("Heaps:", a, b, c, ": " "You Lose!")
            else:
                p = a ^ nim_sum
                q = b ^ nim_sum
                r = c ^ nim_sum
                if p < a:
                    stack1 = a - p
                    print ("Heaps:", a, b, c, ": " "remove", stack1, "from Heap 1")
                elif q < b:
                    stack2 = b - q
                    print ("Heaps:", a, b, c, ": " "remove", stack2, "from Heap 2")
                elif r < c:
                    stack3 = c - r
                    print ("Heaps:", a, b, c, ": " "remove", stack3, "from Heap 3")
                else:
                    print ("Error")
    nim_file.close()
main()
I converted the first line to an int and first tried a while loop with a counter, so that the counter wouldn't go above the value of n, but that didn't work. Any thoughts?

If the file is small, just load the whole thing:
lines = open('nim.txt').readlines()
interesting_lines = lines[1:int(lines[0])+1]
and continue from there.

You have two nested for statements, and the inner one doesn't make much sense. You need to keep just one, like this:
for _ in range(n):
    a, b, c = [int(i) for i in nim_file.readline().split()]
and remove for line in nim_file. Also consider using the with statement to handle opening and closing the file.
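
Putting the pieces together, here is a minimal sketch that reads only the first n data lines and computes the move for each. It uses io.StringIO with the sample data from the question so it runs without nim.txt; with a real file, the same read_heaps call would sit inside a with open('nim.txt') as nim_file: block. The function names read_heaps and winning_move are hypothetical.

```python
import io
from itertools import islice

def read_heaps(stream):
    """Read the count on the first line, then parse only that many heap lines."""
    n = int(stream.readline())
    return [tuple(int(tok) for tok in line.split()) for line in islice(stream, n)]

def winning_move(heaps):
    """Return (heap_number, amount_to_remove), or None if the position is lost."""
    nim_sum = 0
    for heap in heaps:
        nim_sum ^= heap
    if nim_sum == 0:
        return None
    for index, heap in enumerate(heaps):
        target = heap ^ nim_sum
        if target < heap:  # this heap can be reduced to make the nim-sum zero
            return index + 1, heap - target

sample = io.StringIO("2\n13 4 5\n29 5 1\n34 4 50\n")
for heaps in read_heaps(sample):
    print("Heaps:", *heaps, ":", winning_move(heaps))
```

islice stops after n lines, so the third data line in the sample is never parsed.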
