How to print a string up to a specific character?

I have a file like this:
NA|polymerase|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
NA|VP24|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
NA|VP30|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
I am trying to print these characters from each line:
polymerase|KC545393
VP24|KC545393
VP30|KC545393
How can I do this?
I tried this code:
for character in line:
    if character=="|":
        print line[1:i.index(j)]

Use str.split() to split each line by the '|' character; you can limit the splitting because you only need the first 3 columns:
elems = line.split('|', 3)
print('|'.join(elems[1:3]))
The print line then takes the elements at index 1 and 2 and joins them together again using the '|' character to produce your desired output.
Demo:
>>> lines = '''\
... NA|polymerase|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
... NA|VP24|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
... NA|VP30|KC545393|Bundibugyo_ebolavirus|EboBund_112_2012|NA|2012|Human|Democratic_Republic_of_the_Congo
... '''.splitlines(True)
>>> for line in lines:
...     elems = line.split('|', 3)
...     print('|'.join(elems[1:3]))
...
polymerase|KC545393
VP24|KC545393
VP30|KC545393
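Since the question mentions a file, note that iterating over an open file object yields one line at a time, so the same split/join works there unchanged. A minimal Python 3 sketch; extract_fields and data.txt are hypothetical names, and skipping lines with fewer than three columns is an assumption:

```python
def extract_fields(lines):
    """Yield the 2nd and 3rd '|'-separated columns of each line, rejoined by '|'."""
    for line in lines:
        # maxsplit=3: only the first three separators matter here
        elems = line.rstrip("\n").split("|", 3)
        if len(elems) >= 3:  # assumption: silently skip blank or short lines
            yield "|".join(elems[1:3])

# Usage (hypothetical filename):
# with open("data.txt") as f:
#     for field in extract_fields(f):
#         print(field)
```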

Assuming you know that each line has at least two separators, you can use:
>>> s = 'this|is|a|string'
>>> s
'this|is|a|string'
>>> s[:s.find('|',s.find('|')+1)]
'this|is'
This finds the first | starting at the character position just past the first | (i.e., it finds the second |), then gives you the substring up to, but not including, that point.
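To see the nested find calls one step at a time, here is the same expression split into named intermediate values (first, second and prefix are names introduced only for illustration):

```python
s = 'this|is|a|string'

first = s.find('|')               # index of the first '|': 4
second = s.find('|', first + 1)   # search resumes just past it: 7
prefix = s[:second]               # slice stops before index 7

print(first, second, prefix)      # 4 7 this|is
```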
If it may not have two separators, you just have to be more careful:
s = 'blah blah'
result = s
if s.find('|') >= 0:
    if s.find('|',s.find('|')+1) >= 0:
        result = s[:s.find('|',s.find('|')+1)]
If that's the case, you'll probably want it in a more general-purpose function, something like:
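def substringUpToNthChar(s, n, ch):
    # Return s up to (not including) the n-th occurrence of ch,
    # or all of s if ch occurs fewer than n times.
    if n < 1: return ""
    pos = -1
    while n > 0:
        pos = s.find(ch, pos+1)   # next occurrence after the previous one
        if pos < 0: return s      # ran out of separators
        n -= 1
    return s[:pos]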
This will correctly handle the case where there are fewer separators than desired, and will also handle (relatively elegantly) getting more than the first two fields.
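If you prefer to avoid the manual find loop, str.split with a maxsplit argument gives the same results for the cases discussed here. This is a swapped-in alternative, not the answer's original method, and substring_up_to_nth is a hypothetical name:

```python
def substring_up_to_nth(s, n, ch):
    """Return s up to (not including) the n-th occurrence of ch.

    If ch occurs fewer than n times, the whole string is returned,
    matching the behaviour of substringUpToNthChar.
    """
    if n < 1:
        return ""
    # split at most n times, then glue the first n pieces back together
    return ch.join(s.split(ch, n)[:n])

print(substring_up_to_nth('this|is|a|string', 2, '|'))  # this|is
print(substring_up_to_nth('blah blah', 2, '|'))         # blah blah
print(substring_up_to_nth('this|is|a|string', 3, '|'))  # this|is|a
```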

Related

pythonic way to convert character position to terminal offset as printed

I am wondering if there is a "pythonic" way to go from a character position in a string to the terminal offset at which that character will be printed (i.e. taking tabs into account).
For example take the following three strings:
$> python3
>>> print("+\tabc")
+       abc
>>> print("\tabc")
        abc
>>> print("        abc")
        abc
They are three different strings, with three different character counts preceding the "abc", but the position of 'a' is the same each time.
The only solution I have that works "well enough" is:
TABLEN = 8  # tab width; assumed here (8 is the common terminal default)

def get_offset(s, c):
    pos = s.find(c)
    if pos == -1:
        return -1
    tablen = 0
    ntabs = 0
    for i in range(0, pos):
        if s[i] == '\t':
            tablen += (TABLEN - (i % TABLEN))
            ntabs += 1
    offset = tablen + (pos - ntabs)
    return offset
I am wondering: is there a more pythonic way to do this?
Not really.
Because Python doesn't know how its output will be rendered, it cannot tell you the offset resulting from tab characters. From Python's point of view, a tab is a character like any other.
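Python still cannot know how a given terminal renders tabs, but if you are willing to assume fixed tab stops (every 8 columns is a common default, and an assumption here), the standard library's str.expandtabs already does the tab arithmetic: expand everything before the character and measure its length. Unlike i % TABLEN in the question's loop, which uses the character index rather than the current column, expandtabs tracks the running column, so it also handles a tab that follows an earlier expanded tab (e.g. "ab\tc\td"). get_column is a hypothetical name, and it assumes every non-tab character is one column wide (wide characters such as CJK text would break that):

```python
def get_column(s, c, tabsize=8):
    """Column at which the first occurrence of c would be printed.

    Assumes tab stops every `tabsize` columns and one column per
    non-tab character; returns -1 if c is not in s.
    """
    pos = s.find(c)
    if pos == -1:
        return -1
    # expand the tabs before c and count the resulting columns
    return len(s[:pos].expandtabs(tabsize))

for text in ("+\tabc", "\tabc", "        abc"):
    print(get_column(text, "a"))  # 8 each time
```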

Recursive function how to manage output

I'm working on a project for creating a word list. I have a word and some rules: the character % stands for a digit, while ^ stands for a special character. For example, January%%^ should create things like:
January00!
January01!
January02!
January03!
January04!
January05!
January06!
etc.
For now I'm trying to do it with digits only, using a recursive function, because people can add as many digits and special characters as they want:
January^%%%^% (for example)
This is the first function I have created:
month = "January"
nbDigit = "%%%"
def addNumber(month : list, position: int):
    for i in range(position, len(month)):
        for j in range(0,10):
            month[position] = j
            if(position == len(month)-1):
                print (''.join(str(v) for v in month))
            if position < len(month):
                if month[position+1] == "%":
                    addNumber(month, position+1)
The problem is that for each % I get another full round of output (with three %, the sequence January000 to January999 is printed three times).
When I tried to add the new special-character function (AddSpecialChar, which is also recursive) it got even worse, because I can't control the output: a word doesn't necessarily end with a special character or a digit.
I believe what you are looking for is the following:
month = 'January'
nbDigit = "%%"
def addNumbers(root: str, mask: str)-> list:
    # create a list of words using root followed by digits
    rslt = []
    mxNmb = 0
    for i in range(len(mask)):
        mxNmb += 9 * 10**i
    mxNmb += 1  # mxNmb is now 10 ** len(mask), e.g. 100 for "%%"
    for i in range(mxNmb):
        # zero-pad i to as many digits as there are % placeholders
        word = f"{root}{((str(i).rjust(len(mask), '0')))}"
        rslt.append(word)
    return rslt
Calling addNumbers(month, nbDigit) will produce:
['January00',
'January01',
'January02',
'January03',
'January04',
'January05',
'January06',
'January07',
'January08',
'January09',
'January10',
'January11',
'January12',
'January13',
'January14',
'January15',
'January16',
'January17',
'January18',
'January19',
'January20',
'January21',
'January22',
'January23',
'January24',
'January25',
'January26',
'January27',
'January28',
'January29',
'January30',
'January31',
'January32',
'January33',
'January34',
'January35',
'January36',
'January37',
'January38',
'January39',
'January40',
'January41',
'January42',
'January43',
'January44',
'January45',
'January46',
'January47',
'January48',
'January49',
'January50',
'January51',
'January52',
'January53',
'January54',
'January55',
'January56',
'January57',
'January58',
'January59',
'January60',
'January61',
'January62',
'January63',
'January64',
'January65',
'January66',
'January67',
'January68',
'January69',
'January70',
'January71',
'January72',
'January73',
'January74',
'January75',
'January76',
'January77',
'January78',
'January79',
'January80',
'January81',
'January82',
'January83',
'January84',
'January85',
'January86',
'January87',
'January88',
'January89',
'January90',
'January91',
'January92',
'January93',
'January94',
'January95',
'January96',
'January97',
'January98',
'January99']
Adding another % to the nbDigit variable (i.e. "%%%") will produce the numeric sequence from 000 to 999.
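The answer above covers digits only, while the question also asks for ^ placeholders in any order. A minimal recursive sketch: each call expands only the first placeholder and recurses on the rest, with no extra outer loop, so every word is produced exactly once. Assumptions: the set of "special" characters is not given in the question, so SPECIALS is a hypothetical placeholder; expand, make_words and CHOICES are also hypothetical names; any mask character that is not a placeholder is copied literally.

```python
import string

# Assumption: which characters count as "special" is not specified;
# this set is a hypothetical placeholder, adjust as needed.
SPECIALS = "!@#$"

# What each placeholder in the mask can expand to.
CHOICES = {"%": string.digits, "^": SPECIALS}


def expand(mask):
    """Recursively yield every suffix the mask can produce."""
    if not mask:                        # base case: nothing left to expand
        yield ""
        return
    head, tail = mask[0], mask[1:]
    for c in CHOICES.get(head, head):   # candidates for the first position
        for rest in expand(tail):       # every expansion of the remainder
            yield c + rest


def make_words(root, mask):
    return [root + suffix for suffix in expand(mask)]


words = make_words("January", "%%^")
print(words[:3])   # ['January00!', 'January00@', 'January00#']
print(len(words))  # 400
```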

How to extract only the digits and print as a string

Here I want to extract 011700 (these are 6-digit codes) without the semicolon; later I will use it to look up a value in a dict.
How do I extract only 011700 (or the 6-digit number from that line)?
And how do I print it as a 6-digit number instead of printing it like ['011700']?
Thanks.
import re
line = "N 011700; 3;20:34:00:02:ac:07:e9:d5;2f:f7:00:02:ac:07:e9:d5; 3333"
line_list = line.split()
print(line_list)
result = (re.findall('\\d+', line))
print(result)
Here's how I would go about modifying your current code.
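First, I would split the string on both semicolons and whitespace. Splitting on ";" alone is not enough here: the first piece would be "N 011700" (the code is preceded by "N "), so checking for six-character pieces would never find it. Replacing the semicolons with spaces and then calling split() handles both separators:
line_list = line.replace(";", " ").split()
Because split() with no arguments splits on any run of whitespace and drops empty strings, there is no leftover whitespace to strip.
Then I would simply loop through the list like so, keeping the first six-character piece made only of digits:
for l in line_list:
    if len(l) == 6 and l.isdigit():
        result = l
        break
And if you want the result to be the actual number and not just a string of the number, change the line to:
result = int(l)
Note that int("011700") is 11700, so the leading zero is lost; to print it as six digits again, use print(f"{result:06d}").
Altogether that would look like this:
line = "N 011700; 3;20:34:00:02:ac:07:e9:d5;2f:f7:00:02:ac:07:e9:d5; 3333"
line_list = line.replace(";", " ").split()
result = None  # stays None if no six-digit code is found
for l in line_list:
    if len(l) == 6 and l.isdigit():
        result = l
        break
print(line_list)
print(result)
result now contains the first six-digit code found in your original string, as a string, so this prints 011700 rather than ['011700'].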
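If you would rather let a regex find the code directly, building on the re.findall attempt in the question, a pattern that matches exactly six digits between word boundaries avoids both the splitting and the list output. A minimal sketch; the values dict is a hypothetical stand-in for your own lookup table:

```python
import re

line = "N 011700; 3;20:34:00:02:ac:07:e9:d5;2f:f7:00:02:ac:07:e9:d5; 3333"

# \b...\b: exactly six digits, not part of a longer run of word characters
match = re.search(r"\b\d{6}\b", line)
code = match.group() if match else None
print(code)  # 011700 (a plain string, so the leading zero is kept)

# Hypothetical lookup table; replace with your real dict
values = {"011700": "some value"}
print(values.get(code))
```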

How to split string by odd length

Let's say I have the string "AABBAAAAABBBBAAABBBBAA".
I want to split the string after each run of A's with an odd length (i.e. when there are 5 or 3 A's in a row).
What I want returned is 1) AABBAAAAA 2) BBBBAAA 3) BBBBAA.
How can I do that?
I tried using the regex [A]+[B]+ for a slightly different case.
One option might be to regex iterate using re.finditer with the following pattern:
.*?(?:AAA(?:AA)?|$)
This pattern will non-greedily consume characters until it reaches either 3 A's, 5 A's, or the end of the string. Then we can print out each complete match as we iterate.
import re
text = 'AABBAAAAABBBBAAABBBBAA'
pattern = '.*?(?:AAA(?:AA)?|$)'
for match in re.finditer(pattern, text):
    if match.group():  # Python 3.7+ also yields an empty match at the very end
        print(match.group())
This prints:
AABBAAAAA
BBBBAAA
BBBBAA
You can use itertools.groupby:
s = 'BBAAAAABBBBAAABBBBAA'
from itertools import groupby
out = ['']
for v, g in groupby(s):
    l = [*g]
    out[-1] += ''.join(l)
    if v == 'A' and len(l) in (3, 5):
        out.append('')
print(out)
Prints:
['BBAAAAA', 'BBBBAAA', 'BBBBAA']
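The groupby approach can be wrapped in a reusable function. A sketch, assuming you want to parameterize the character and the qualifying run lengths (split_after_runs, char and lengths are hypothetical names); lengths=(3, 5) mirrors the answer, and testing len(run) % 2 == 1 instead would split after any odd-length run. It also drops the empty piece left behind when the string ends exactly on a qualifying run:

```python
from itertools import groupby


def split_after_runs(s, char="A", lengths=(3, 5)):
    """Split s right after every run of `char` whose length is in `lengths`."""
    out = [""]
    for ch, group in groupby(s):
        run = "".join(group)          # one maximal run of identical characters
        out[-1] += run
        if ch == char and len(run) in lengths:
            out.append("")            # start a new piece after this run
    if out[-1] == "":                 # string ended on a qualifying run
        out.pop()
    return out


print(split_after_runs("AABBAAAAABBBBAAABBBBAA"))
# ['AABBAAAAA', 'BBBBAAA', 'BBBBAA']
```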

Python Join String to Produce Combinations For All Words in String

If my string is this: 'this is a string', how can I produce all possible combinations by joining each word with its neighboring word?
What the output would look like:
this is a string
thisis a string
thisisa string
thisisastring
thisis astring
this isa string
this isastring
this is astring
What I have tried:
s = 'this is a string'.split()
for i, l in enumerate(s):
    ''.join(s[0:i])+' '.join(s[i:])
This produces:
'this is a string'
'thisis a string'
'thisisa string'
'thisisastring'
I realize I need to change the s[0:i] part because it's statically anchored at 0, but I don't know how to move on to the next word (is) while still including this in the output.
A simpler way (and 3x faster than the accepted answer) to do it with itertools.product:
import itertools
s = 'this is a string'
s2 = s.replace('%', '%%').replace(' ', '%s')
for i in itertools.product((' ', ''), repeat=s.count(' ')):
    print(s2 % i)
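The trick relies on %-formatting: literal % signs are doubled first so they survive formatting, and each space becomes a %s slot filled with either ' ' or ''. A sketch wrapping it in a function that returns a list (all_joins is a hypothetical name), shown on a string containing a literal % to illustrate why the escaping matters. Like the original, it treats every single space as one slot, so a run of several spaces would become several slots:

```python
import itertools


def all_joins(s):
    # Escape literal '%' first, then turn every space into a '%s' slot.
    template = s.replace('%', '%%').replace(' ', '%s')
    # Fill each slot with either a space or nothing.
    return [template % seps
            for seps in itertools.product((' ', ''), repeat=s.count(' '))]


print(all_joins('100% a b'))
# ['100% a b', '100% ab', '100%a b', '100%ab']
```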
You can also use itertools.product():
import itertools
s = 'this is a string'
words = s.split()
for t in itertools.product((0, 1), repeat=len(words)-1):
    # t[i] is 1 if a space follows words[i], else 0
    print(''.join([words[i]+t[i]*' ' for i in range(len(t))])+words[-1])
Well, it took me a little longer than I expected... this is actually trickier than I thought :)
The main idea:
The number of spaces when you split the string is the length of the split array minus 1. In our example there are 3 spaces:
'this is a string'
     ^  ^ ^
We'll take a binary representation of all the options to have/not have each one of the spaces, so in our case it'll be:
000
001
010
011
100
101
...
and for each option we'll generate the corresponding sentence, where 111 represents all 3 spaces: 'this is a string', and 000 represents no space at all: 'thisisastring'
def binaries(n):
    # all n-bit binary strings, from '00...0' to '11...1'
    res = []
    for x in range(2 ** n):
        tmp = bin(x)
        res.append(tmp.replace('0b', '').zfill(n))
    return res

def generate(arr, bins):
    res = []
    for mask in bins:
        tmp = arr[0]
        i = 1
        for digit in mask:
            if digit == '1':
                tmp = tmp + " " + arr[i]   # keep the space
            else:
                tmp = tmp + arr[i]         # join without a space
            i += 1
        res.append(tmp)
    return res

def combinations(string):
    s = string.split(' ')
    bins = binaries(len(s) - 1)
    res = generate(s, bins)
    return res

print(combinations('this is a string'))
# ['thisisastring', 'thisisa string', 'thisis astring', 'thisis a string', 'this isastring', 'this isa string', 'this is astring', 'this is a string']
UPDATE:
I now see that Amadan thought of the same idea - kudos for being quicker than me to think of it! Great minds think alike ;)
The easiest is to do it recursively.
Terminating condition: the Schrödinger join of a single-element list is just that word.
Recurring condition: say that L is the Schrödinger join of all the words but the first. Then the Schrödinger join of the list consists of all elements from L with the first word directly prepended, and all elements from L with the first word prepended with an intervening space.
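The recursion described above translates almost line by line into Python. A minimal sketch (schrodinger_join is a hypothetical name taken from the answer's wording); it assumes the list contains at least one word:

```python
def schrodinger_join(words):
    """All ways of joining neighbouring words with either '' or ' '."""
    if len(words) == 1:                     # terminating condition
        return [words[0]]
    rest = schrodinger_join(words[1:])      # L: joins of all words but the first
    first = words[0]
    return ([first + r for r in rest] +         # first word directly prepended
            [first + ' ' + r for r in rest])    # ... or with a space in between


print(schrodinger_join('this is a string'.split()))
```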
(Assuming you are missing thisis astring by accident. If it is deliberate, I am sure I have no idea what the question is :P )
Another, non-recursive way you can do it is to enumerate all numbers from 0 to 2^(number of words - 1) - 1, then use the binary representation of each number as a selector whether or not a space needs to be present. So, for example, the abovementioned thisis astring corresponds to 0b010, for "nospace, space, nospace".
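The non-recursive variant can be sketched with bit operations on the counter instead of building binary strings; joins_by_bitmask is a hypothetical name. Bits are read from the most significant end so that 0b010 maps to "nospace, space, nospace", i.e. thisis astring, as in the example above:

```python
def joins_by_bitmask(words):
    """Enumerate 0 .. 2**(n-1) - 1; each bit decides whether a space is kept."""
    gaps = len(words) - 1
    results = []
    for mask in range(2 ** gaps):
        parts = [words[0]]
        for i in range(gaps):
            # most significant bit first, matching left-to-right reading
            bit = (mask >> (gaps - 1 - i)) & 1
            parts.append(' ' if bit else '')
            parts.append(words[i + 1])
        results.append(''.join(parts))
    return results


print(joins_by_bitmask('this is a string'.split()))
```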
