Python adding text to a line - python-3.x

import re
a = open('testlines.csv', 'r')
b = a.readlines()
a.close()
for c in range(0,1):
    d = '<' + b[c] + '>'
    d = b[c].replace(',', '><')
    e = re.findall(r'<(.*?)>', d, re.DOTALL)
    print(d)
    print(e[0],e[1],e[2],e[3],e[4],e[5],e[6],e[7],e[8])
d does not print correctly: the < at the beginning of the line and the > at the end do not show up. If I swap the two lines that create and modify d, the commas are not replaced instead. What am I doing wrong? I want the replacement, and I need the < and > at the beginning and end so that findall can split everything apart and I can build a multidimensional array in the end.

The problem is that after assigning d = '<' + b[c] + '>', you never use d; the next line rebuilds d from b[c] and overwrites it. As a result, the step that adds the surrounding < and > is lost.
You can fix it by calling replace on d instead of b[c]:
for c in range(0,1):
    d = '<' + b[c] + '>'
    d = d.replace(',', '><') # use d instead of b[c]
    e = re.findall(r'<(.*?)>', d, re.DOTALL)
    print(d)
    print(e[0],e[1],e[2],e[3],e[4],e[5],e[6],e[7],e[8])
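The fix can also be wrapped in a small helper, shown here as a minimal sketch rather than the answerer's code. The name wrap_fields is hypothetical, and stripping the trailing newline is an added assumption: readlines() keeps the "\n" at the end of each line, so without stripping it the closing > would print on the next line.

```python
import re

def wrap_fields(line):
    """Wrap every comma-separated field of one CSV line in <...>.

    Hypothetical helper: returns the wrapped string and the list of fields.
    """
    # Assumption: drop the newline that readlines() leaves on each line.
    wrapped = "<" + line.rstrip("\n") + ">"
    # Operate on the wrapped string, not on the original line.
    wrapped = wrapped.replace(",", "><")
    fields = re.findall(r"<(.*?)>", wrapped, re.DOTALL)
    return wrapped, fields

wrapped, fields = wrap_fields("a,b,c\n")
print(wrapped)  # <a><b><c>
print(fields)   # ['a', 'b', 'c']
```

Applying the helper to every line, for example [wrap_fields(line)[1] for line in b], would give a list of lists with one row per line.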

Related

Follow up question to: find characters present in two different lines if they satisfy a positional relationship

This is a follow up to this question recapitulated below.
I have the following three strings (ignoring the lines starting with >)
>chain A
---------MGPRLSVWLLLLPAALLLHEEHSRAAA--KGGCAGSGC-GKCDCHGVKGQKGERGLPGLQGVIGFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPPGIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGIPGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPPGVPGQA-------QVQEKG
>chain B
---------MGPRLSVWLLLLPAALLLHEEHSRAAA--KGGCAGSGC-GKCDCHGVKGQKGERGLPGLQGVIGFPGMQGPEGPQGPPGQKGDTGEPGLPGTKGTRGPPGASGYPGNPGLPGIPGQDGPPGPPGIPGCNGTKGERGPLGPPGLPGFAGNPGPPGLPGMKGDPGEILGHVPGMLLKGERGFPGIPGTPGPPGLPGLQGPVGPPGFTGPPGPPGPPGPPGEKGQMGLSFQGPKGDKGDQGVSGPPGVPGQA-------QVQEKG
>chain C
MGRDQRAVAGPALRRWLLLGTVTVGFLAQSVLAGVKKFDVPCGGRDCSGGCQCYPEKGGRGQPGPVGPQGYNGPPGLQGFPGLQGRKGDKGERGAPGVTGPKGDVGARGVSGFPGADGIPGHPGQGGPRGRPGYDGCNGTQGDSGPQGPPGSEGFTGPPGPQGPKGQKGEP-YALPKEERDRYRGEPGEPGLVGFQGPPGRPGHVGQMGPVGAPGRPGPPGPPGPKGQQGNRGLGFYGVKGEKGDVGQPGPNGIPSDTLHPIIAPTGVTFH
I want to find out the character position of all R and D/E in the three chains that satisfy the following relationship
Ri (chain A) - Di+2 (chain B)
Ri (chain B) - Di+2 (chain C)
Ri (chain C) - Di+5 (chain A)
Explanation: Iterate over every ith R in chain A and check if the i+2 position of chain B contains D or E. If yes, output the character positions of every such R and D/E pair. Do the same with chains B+C and chains C+A.
Catch: While deciding the relationship, it should count the dashes. But when printing the positions it should disregard the dashes.
Using the script posted in the original question, I get the following output
B-C 187 R E
What the output should be
B-C 175-188 R E
I modified the code posted in the original question to include a correction
awk '
    { chain_id[++c]=$2                               # save chain id, eg, "A", "B", "C"
      getline                                        # read next line from input file
      chains[c]=$0                                   # save associated chain
    }
END { i_char="R"                                     # character to search for in 1st chain
      for (i=1;i<=c;i++) {                           # loop through list of chains
          j= (i==c ? 1 : i+1)                        # determine index of 2nd chain
          offset= (i==c ? 5 : 2)                     # +2 for A-B, B-C; +5 for C-A
          chain_i=chains[i]                          # copy chains as we are going to cut them up as we process them
          chain_j=chains[j]
          chain_pair= chain_id[i] "-" chain_id[j]    # build output label, eg, "A-B"
          pos=0                                      # reset position
          while (length(chain_i)>0) {
              n=index(chain_i,i_char)                # look for "R"
              if (n==0) break                        # if not found we are done with this chain pair so break out of loop else ...
              pos=pos+n                              # update our position in the chain and ...pos is the field position
              j_char=substr(chain_j,n+offset,1)      # find character from 2nd chain at location n+offset
              if (j_char ~ /D|E/) {                  # if 2nd chain character is one of "D" or "E" then ...
                  corr_i=substr(chain_i,1,n)         # copy everything up to the matching character
                  corr=gsub (/-/,"",corr_i)          # count the dashes in that copy
                  corr_pos=pos-corr
                  print chain_pair,corr_pos,i_char,j_char # print our finding
              }
              chain_i=substr(chain_i,n+1)            # strip off 1st n characters
              chain_j=substr(chain_j,n+1)
          }
      }
    }
' file
but this doesn't help and the output is incorrect.
B-C 187 R E
Adding some logic to keep count of dashes:
awk '
    { chain_id[++c]=$2; getline; chains[c]=$0 }
END { i_char="R"
      for (i=1;i<=c;i++) {
          j= (i==c ? 1 : i+1)
          offset= (i==c ? 5 : 2)
          chain_i=chains[i]
          chain_j=chains[j]
          chain_pair= chain_id[i] "-" chain_id[j]
          pos=dash_cnt_i=dash_cnt_j=0
          while (length(chain_i)>0) {
              n=index(chain_i,i_char)
              if (n==0) break
              pos=pos+n
              head_i = substr(chain_i,1,n)          # copy everything up to matching character
              head_j = substr(chain_j,1,n)          # copy everything up to matching character
              dash_cnt_i += gsub(/-/,"",head_i)     # add count of dashes in head_i; gsub() returns number of substitutions which in this case is also the number of dashes in head_i
              dash_cnt_j += gsub(/-/,"",head_j)     # add count of dashes in head_j
              j_char=substr(chain_j,n+offset,1)
              if (j_char ~ /E|D/)
                  print chain_pair,(pos-dash_cnt_i) "-" (pos+offset-dash_cnt_j) ,i_char,j_char
              chain_i=substr(chain_i,n+1)
              chain_j=substr(chain_j,n+1)
          }
      }
    }
' file.txt
This generates:
A-B 355-357 R E
A-B 390-392 R E
A-B 597-599 R D
A-B 781-783 R E
A-B 917-919 R D
A-B 968-970 R D
A-B 1063-1065 R E
A-B 1516-1518 R D
A-B 1638-1640 R E
B-C 175-188 R E # OP's expected result
B-C 346-364 R D
B-C 355-373 R E
B-C 396-414 R D
B-C 500-519 R D
B-C 585-602 R D
B-C 917-963 R E
B-C 1063-1108 R E
B-C 1173-1218 R D
B-C 1516-1562 R D
C-A 334-321 R E
C-A 400-389 R E
C-A 471-459 R E
C-A 740-706 R D
C-A 1228-1190 R E
C-A 1589-1552 R E
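For readers who prefer Python, the dash-counting logic of the awk answer can be sketched as below. This is an illustrative port, not code from the thread: the function name find_pairs and its parameters are hypothetical. Like the awk script, it keeps a running dash count for both chains up to the position of the matched R, and reports 1-based positions with those dashes subtracted.

```python
def find_pairs(chain_i, chain_j, offset, i_char="R", j_chars="DE"):
    """Return (pos_i, pos_j, i_char, j_char) tuples for one chain pair.

    Hypothetical helper mirroring the awk logic: the match is decided on
    aligned positions (dashes count), and the reported positions have the
    dashes seen so far subtracted.
    """
    pairs = []
    dashes_i = dashes_j = 0
    for idx, ch in enumerate(chain_i):
        # Running dash counts over the aligned prefix of both chains.
        if ch == "-":
            dashes_i += 1
        if idx < len(chain_j) and chain_j[idx] == "-":
            dashes_j += 1
        if ch != i_char:
            continue
        k = idx + offset  # aligned index of the partner character
        if k < len(chain_j) and chain_j[k] in j_chars:
            pos = idx + 1  # 1-based aligned position of the R
            pairs.append((pos - dashes_i, pos + offset - dashes_j, ch, chain_j[k]))
    return pairs

# Tiny made-up chains, only to show the call shape.
print(find_pairs("--RAA", "A-ABE", 2))  # [(1, 4, 'R', 'E')]
```

For the three chains in the question, the function would be called once per pair, with offset 2 for A-B and B-C and offset 5 for C-A.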

How do I convert all the variables into int straight away

a,b,c = input().split()
I need a way to convert a, b and c into int immediately, without having to do something like
a = int(a)
b = int(b)
c = int(c)
please help :)
l = input().split(',')
l = [ int(i) for i in l ]
a, b, c = l
Run the code and enter 3 integers separated by ','; a, b and c will then contain the 3 integers.
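The conversion and the unpacking can also be done in one statement. The sketch below uses a hypothetical helper, parse_ints, so that it can run without waiting for keyboard input; with real input, the equivalent one-liner would be a, b, c = map(int, input().split()).

```python
def parse_ints(text, sep=None):
    """Split text on sep (any whitespace if sep is None) and convert to int.

    Hypothetical helper; int() ignores surrounding spaces, so "1, 2" works too.
    """
    return [int(token) for token in text.split(sep)]

# Whitespace-separated, as in the question.
a, b, c = parse_ints("3 14 15")
print(a + b + c)  # 32

# Comma-separated, as in the answer.
x, y, z = parse_ints("1, 2, 3", ",")
print(x, y, z)  # 1 2 3
```

Unpacking raises ValueError if the number of values does not match the number of names on the left, which is worth keeping in mind for user-typed input.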

Count number of occurrences of each string using regex

Given a pattern like this
pattern = re.compile(r'\b(A|B|C)\b')
And a huge_string, I would like to replace every substring matching the pattern with a string D and find the number of occurrences of each string A, B and C. What is the most feasible approach?
One way is to split the pattern to 3 patterns for each string and then use subn
pattern_a = re.compile(r'\bA\b')
pattern_b = re.compile(r'\bB\b')
pattern_c = re.compile(r'\bC\b')
huge_string, no_a = re.subn(pattern_a, D, huge_string)
huge_string, no_b = re.subn(pattern_b, D, huge_string)
huge_string, no_c = re.subn(pattern_c, D, huge_string)
But it requires 3 passes through the huge_string. Is there a better way?
You may pass a callable as the replacement argument to re.sub and collect the necessary counting details during a single replacement pass:
import re
counter = {}
def repl(m):
    if m.group() in counter:
        counter[m.group()] += 1
    else:
        counter[m.group()] = 1
    return 'd'
text = "a;b o a;c a l l e d;a;c a b"
rx = re.compile(r'\b(a|b|c)\b')
result = rx.sub(repl, text)
print(counter, result, sep="\n")
Output:
{'a': 5, 'b': 2, 'c': 2}
d;d o d;d d l l e d;d;d d d
You could do it in two passes: the first only counts, and the second does the substitution. If the set of alternatives grows (a|b|c|d|e and so on), you still make only two passes; the number of passes does not depend on the number of possible matches.
import re
from collections import Counter
string = " a j h s j a b c "
pattern = re.compile(r'\b(a|b|c)\b')
counts = Counter(pattern.findall(string))
string_update = pattern.sub('d', string)
print(counts, string, string_update, sep="\n")
OUTPUT
Counter({'a': 2, 'b': 1, 'c': 1})
a j h s j a b c
d j h s j d d d
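The two answers can be combined: a callable replacement that updates a collections.Counter gives the counts and the substituted text in a single pass, without a module-level dictionary. This is a minimal sketch; replace_and_count is a hypothetical name, not something from the thread.

```python
import re
from collections import Counter

def replace_and_count(pattern, replacement, text):
    """Substitute every match and count each matched string in one pass.

    Hypothetical helper: returns (new_text, counts).
    """
    counts = Counter()

    def repl(match):
        # Record which alternative matched, then hand back the replacement.
        counts[match.group()] += 1
        return replacement

    return pattern.sub(repl, text), counts

rx = re.compile(r"\b(a|b|c)\b")
new_text, counts = replace_and_count(rx, "d", "a;b o a;c a l l e d;a;c a b")
print(new_text)
print(counts)
```

Keeping the counter local to the function means repeated calls do not accumulate counts from earlier texts.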

Convert ALL UPPERCASE to Title

#!/usr/bin/python3.4
import urllib.request
import os
import re
os.chdir('/home/whatever/')
a = open('Shopstxt.csv','r')
b = a.readlines()
a.close()
c = len(b)
d = list(zip(*(e.split(';') for e in b)))
shopname = []
shopaddress = []
shopcity = []
shopphone = []
shopwebsite = []
f = d[0]
g = d[1]
h = d[2]
i = d[3]
j = d[4]
e = -1
for n in range(0, 5):
    e = e + 1
    sn = f[n]
    sn.title()
    print(sn)
    shopname.append(sn)
    sa = g[n]
    sa.title()
    shopaddress.append(sa)
    sc = h[n]
    sc.title()
    shopcity.append(sc)
Shopstxt.csv is all upper case letters and I want to convert them to title case. I thought this would do it, but it doesn't: the values are still all upper case. What am I doing wrong?
I also want to save the file back, so I have one more quick question.
When I combine the file back together, before writing it back to the drive, do I have to add a '\n' at the end of each line, or is the '\n' included automatically when I write each line to the file?
Strings are immutable, so title() returns a new string and you need to assign the result:
sn = sn.title()
sa = sa.title()
sc = sc.title()
Also, if you do this:
with open("bla.txt", "wt") as outfile:
    outfile.write("stuff")
    outfile.write("more stuff")
then this will not automatically add line endings.
A quick way to add line endings would be this:
textblobb = "\n".join(list_of_text_lines)
with open("bla.txt", "wt") as outfile:
    outfile.write(textblobb)
As long as textblobb isn't inefficiently large and fits into memory, that should do the trick nicely.
Use the .title() method when defining your variables like I did in the code below. As others have mentioned, strings are immutable so save yourself a step and create the string you need in one line.
for n in range(0, 5):
    e = e + 1
    sn = f[n].title() ### Grab and modify the list index before assigning to your variable
    print(sn)
    shopname.append(sn)
    sa = g[n].title() ###
    shopaddress.append(sa)
    sc = h[n].title() ###
    shopcity.append(sc)
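Putting both points together, the whole round trip might look like the sketch below. The helper names title_case_rows and to_file_text are hypothetical, and the sample row is made up for illustration; the ';' separator is taken from the question's split(';').

```python
def title_case_rows(lines, sep=";"):
    """Return each line with every field converted by str.title().

    Hypothetical helper; trailing newlines are removed and not re-added here.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # title() returns a new string, so the result must be kept.
        rows.append(sep.join(field.title() for field in fields))
    return rows

def to_file_text(rows):
    """Join rows into one string with a newline after every row."""
    return "".join(row + "\n" for row in rows)

rows = title_case_rows(["ACME SHOP;12 MAIN ST;SPRINGFIELD\n"])
print(rows)  # ['Acme Shop;12 Main St;Springfield']
# write() does not add line endings, so they are added explicitly.
text = to_file_text(rows)
```

The resulting text can be passed to outfile.write(text) inside a with open(...) block. One caveat: str.title() capitalizes the letter after an apostrophe, so a name such as "JOE'S" becomes "Joe'S".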

How to make the limit of the readlines to the first number in the file?

This program predicts the first winning move of the famous Game of Nim. I just need a little help with one problem in the code. The input file looks something like this:
3
13 4 5
29 5 1
34 4 50
The first number would represent the number of lines following the first line that the program has to read. So if the case was
2
13 4 5
29 5 1
34 4 50
it would only read the next two lines following it.
So far this has been the progress of my code
def main ():
    nim_file = open('nim.txt', 'r')
    first_line = nim_file.readline()
    counter = 1
    n = int (first_line)
    for line in nim_file:
        for j in range(1, n):
            a, b, c = [int(i) for i in line.split()]
            nim_sum = a ^ b ^ c
            if nim_sum == 0:
                print ("Heaps:", a, b, c, ": " "You Lose!")
            else:
                p = a ^ nim_sum
                q = b ^ nim_sum
                r = c ^ nim_sum
                if p < a:
                    stack1 = a - p
                    print ("Heaps:", a, b, c, ": " "remove", stack1, "from Heap 1")
                elif q < b:
                    stack2 = b - q
                    print ("Heaps:", a, b, c, ": " "remove", stack2, "from Heap 2")
                elif r < c:
                    stack3 = c - r
                    print ("Heaps:", a, b, c, ": " "remove", stack3, "from Heap 3")
                else:
                    print ("Error")
    nim_file.close()
main()
I converted the first line number to an int and tried to set a while loop at first with a counter to see that the counter wouldn't go above the value of n but that didn't work. So any thoughts?
If the file is small, just load the whole thing:
lines = open('nim.txt').readlines()
interesting_lines = lines[1:int(lines[0])+1]
and continue from there.
You have two nested for statements, and the second one doesn't make much sense. Keep just one loop, like this:
for _ in range(n):
    a, b, c = [int(i) for i in nim_file.readline().split()]
and remove for line in nim_file. Also consider using the with statement to handle opening and closing the file.
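As a sketch of the reading step only, itertools.islice can cap the iteration at the count given on the first line. The function name read_heaps is hypothetical, and io.StringIO stands in for the real file so the example is self-contained; with the actual file it would be called inside with open('nim.txt') as nim_file.

```python
import io
from itertools import islice

def read_heaps(stream):
    """Read the count on the first line, then that many heap lines.

    Hypothetical helper: returns a list of lists of ints and ignores
    any lines after the first n.
    """
    n = int(stream.readline())
    # islice stops after n lines even if the file contains more.
    return [[int(tok) for tok in line.split()] for line in islice(stream, n)]

sample = io.StringIO("2\n13 4 5\n29 5 1\n34 4 50\n")
heaps = read_heaps(sample)
print(heaps)  # [[13, 4, 5], [29, 5, 1]]
```

Each inner list can then be unpacked as a, b, c and passed through the nim-sum logic from the question.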
