Use split function to split recursively in python - python-3.x

I have written a piece of code that takes a keyword and splits a name on that keyword. Since split returns a list, I now want to check each element of the returned list and split it again on a different keyword (if it exists). This time I don't want another sublist to be returned; the new elements should extend the same list instead.
Code below:
def get_comb_drugs(keyword, name):
    if keyword in name:
        name = name.split(keyword)
    return name
print(get_comb_drugs(", polymer with", "Acetaldehyde, polymer with ammonia and formaldehyde"))
The output I get is:
['Acetaldehyde', ' ammonia and formaldehyde']
However, I want to split ' ammonia and formaldehyde' again using the " and " keyword, and the exact output I want is:
['Acetaldehyde', ' ammonia', 'formaldehyde']
Guide me in achieving the desired result.

You can use re.split instead with an alternation pattern:
import re
separators = [', polymer with', ' and ']
re.split('|'.join(separators), 'Acetaldehyde, polymer with ammonia and formaldehyde')
This returns:
['Acetaldehyde', ' ammonia', 'formaldehyde']
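A hedged variation on the answer above: if a separator could ever contain regex metacharacters, passing each one through re.escape keeps the alternation literal, and stripping each piece removes the leading space seen in ' ammonia'. The helper name split_on_any is hypothetical and not part of the original answer.

```python
import re

def split_on_any(separators, name):
    """Split name on any of the literal separators and trim whitespace."""
    # re.escape keeps characters such as '.' or '(' from acting as regex syntax.
    pattern = '|'.join(re.escape(sep) for sep in separators)
    return [part.strip() for part in re.split(pattern, name)]

parts = split_on_any([', polymer with', ' and '],
                     'Acetaldehyde, polymer with ammonia and formaldehyde')
print(parts)  # ['Acetaldehyde', 'ammonia', 'formaldehyde']
```

Drop the strip() call if the surrounding whitespace should be preserved exactly as in the original answer.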

Related

Difference between defining a variable as a=[a] and a= ' ' in python

I have two questions:
1) I am trying to build a word cloud. When I define a variable as a=[ ] it throws an error, but when I define it as a='' it works well. What is the difference between the two?
2) I am using the two snippets below. I expected them to produce the same output, but they give different output. What is the difference between them?
a)
allwords=[]
for i in data['Url']:
    allwords+= ' '.join(i)
b)
all_words = ' '.join([text for text in data['Url']])
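About item A
There is an error in your code. allwords is a list, and += on a list extends it with every element of the right-hand side, so adding a string adds its characters one by one. To add a whole string as a single element, use the append method:
allwords=[]
for i in data['Url']:
    allwords.append(' '.join(i))
Use += to concatenate strings, and the append method to add an item to a list.
About item B
This works properly because it is an assignment: the joined string is built once and bound to all_words.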
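The difference between the two containers can be seen in a small sketch. The list urls is a hypothetical stand-in for data['Url'], which is not shown in the question.

```python
urls = ['abc', 'de']  # hypothetical stand-in for data['Url']

# list += str extends the list with each character of the string.
chars = []
for u in urls:
    chars += u

# append adds the whole string as one element.
items = []
for u in urls:
    items.append(u)

# str.join builds a single string from the elements.
joined = ' '.join(urls)

print(chars)   # ['a', 'b', 'c', 'd', 'e']
print(items)   # ['abc', 'de']
print(joined)  # abc de
```

Starting from a='' instead of a=[] makes += perform string concatenation, which is presumably why that version worked for the word cloud.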

Stringify list back to list in Python 3

I have a list-like string that I want to convert to a list, but so far I have had no luck. The string is as follows:
my_string="[749385,435,'20/07/11 05:32','34035',1298,tmp_host_name,'312642',6577,tmp_guest_name,'-0.5,-1.0','2.5,3.0','9.5 ',tmp_league_name,'2' ,'0','0','0','4',' 2','0','1','0.0,-0.5','4.5','1.0',1]"
My problems are:
I can't use eval because some of the items in the would-be list are bare names rather than quoted strings, so it gives me
eval(my_string)
>NameError: name 'tmp_host_name' is not defined
I can't use ast.literal_eval because again, it gives an error
ast.literal_eval(my_string)
>ValueError: malformed node or string: <_ast.Name object at 0x0000017E7DA9E488>
and I can't do it with strip and split because some of the items are like '2.5,3.0', and these get split as well, which I don't want
my_string.strip('][').split(',')
['749385','435',"'20/07/11 05:32'", "'34035'",'1298','tmp_host_name',"'312642'",'6577','tmp_guest_name',"'-0.5","-1.0'","'2.5","3.0'","'9.5 '",'tmp_league_name', "'2' ","'0'","'0'","'0'","'4'","' 2'","'0'","'1'","'0.0","-0.5'","'4.5'","'1.0'",'1']
One possible route is to use my last approach and verify that every element has 2 ' characters, and if not, merge it with the following element, but I'm looking for something a little more pythonic.
newlist=list()
for el in k:
    if el.startswith("'") and el.endswith("'"):newlist.append(el)
    elif el.startswith("'"):
        compound=el
    elif el.endswith("'"):
        compound+=el
        newlist.append(compound)
    else:newlist.append(el)
Problem is, if I do this, the resulting list loses its order and becomes useless
Thanks!
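One possible alternative to the merge-after-split idea described in the question, offered only as a sketch: tokenize the string with a regular expression that treats a single-quoted chunk as one item, so commas inside quotes are never split. It assumes quoted items never contain an embedded quote character. The variable sample is a shortened, hypothetical stand-in for my_string, and every item comes back as a str.

```python
import re

# Shortened stand-in for my_string from the question.
sample = "[749385,'20/07/11 05:32',tmp_host_name,'-0.5,-1.0','2' ,1]"

# Either a single-quoted chunk (commas allowed inside) or a run of
# characters that are not commas, quotes, or whitespace.
token = re.compile(r"'[^']*'|[^,'\s]+")

tokens = token.findall(sample.strip('[]'))

# Drop the surrounding quotes from quoted items; bare names stay as text.
items = [t[1:-1] if t.startswith("'") else t for t in tokens]
print(items)
# ['749385', '20/07/11 05:32', 'tmp_host_name', '-0.5,-1.0', '2', '1']
```

Because findall scans left to right, the items keep their original order.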

Is there a way to only list a certain format of text from a list?

I am quite new to Python, and I want to keep only the items with a certain format from a bigger list. For example, this is what's in the list:
/ABC/EF213
/ABC/EF
/ABC/12AC4
/ABC/212
However, the only ones I want listed are those with the format /###/#####, while the rest get discarded.
You could use a generator expression or a for loop to check each element of the list to see if it matches a pattern. One way of doing this would be to check if the item matches a regex pattern.
As an example:
import re
original_list = ["Item I don't want", "/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212", "123/456", "another useless item", "/ABC/EF"]
filtered_list = [item for item in original_list if re.fullmatch(r"/\w+/\w+", item) is not None]
print(filtered_list)
outputs
['/ABC/EF213', '/ABC/EF', '/ABC/12AC4', '/ABC/212', '/ABC/EF']
If you need help making regex patterns, there are many great websites such as regexr which can help you
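The pattern in the answer accepts any number of word characters in each part, which is why /ABC/EF and /ABC/212 also appear in the output. If /###/##### is read as exactly three and then exactly five characters (an assumption about what the question means by #), counted quantifiers give a stricter filter. The names strict and strict_list are hypothetical.

```python
import re

original_list = ["/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212"]

# Exactly three word characters, a slash, then exactly five word characters.
strict = re.compile(r"/\w{3}/\w{5}")

strict_list = [item for item in original_list if strict.fullmatch(item)]
print(strict_list)  # ['/ABC/EF213', '/ABC/12AC4']
```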
Every string can be indexed like a list without any conversion. If the only format you want to check is /###/##### then you can simply write if statements like these:
for text in your_list:
    if len(text) == 10 and text[0] == "/" and text[4] == "/":  # and so on
        print(text)
Of course, this would require a lot of conditions and would take a pretty long time, so I would recommend a faster and simpler check. One option is to split the texts, which would look something like this:
for text in your_list:
    checkstring = text.split("/")
Now the text is split into parts, and you can simply check the length of each part with the len() function.
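The split-and-measure idea can be sketched as follows. The helper name has_wanted_shape is hypothetical, and the lengths 3 and 5 are read off the /###/##### format in the question.

```python
def has_wanted_shape(text):
    """True for strings shaped like /###/##### (3 then 5 characters)."""
    parts = text.split("/")
    # A leading slash produces an empty first part: ['', 'ABC', 'EF213'].
    return (len(parts) == 3 and parts[0] == ""
            and len(parts[1]) == 3 and len(parts[2]) == 5)

your_list = ["/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212"]
kept = [text for text in your_list if has_wanted_shape(text)]
print(kept)  # ['/ABC/EF213', '/ABC/12AC4']
```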

Python: Print entire line of string match and not cut off after the period

See bottom for the solution I came up with.
Hopefully this is an easy question for you guys. I am trying to match a string against a list and print just the matched string. I was successful using re, but it cuts off the rest of the string after the period. The span reported by re is 0,10, whereas the output without re is 0,14, so the match drops the part after the period. I would like to learn how to tell it to print the entire span, or learn a new way to match a string variable against a list and print that exact string. My original attempts printed anything with TESTPR in it, three results in total. Of the ones I do not want printed, one has a 1 in front and the last has an additional R at the end. Here is my current match code:
#OLD See below
for element in catalog:
    z = re.match(r"((TESTPRR )\w+)", element)
    if z:
        print((z.group()))
Output: TESTPR 105
It should show:
Wanted output: TESTPT 105.465
It will go up to 3 decimal places after the period and no more. I am currently taking a Python class to learn Python and love it so far, but this one has me stumped as I am just now learning about re and matching by reading as we have not gotten to that yet in class.
I am open to learning a different way to search for and match a string and print just that string. For my first attempt that prints 3 results was this:
catalog = [ long list pulled from API then code here to make it a nice column]
prod = 'TESTPR'
print ([s for s in catalog if prod in s])
When I add a space at the end of prod I can get rid of the match with the extra character at the end, but I cannot add a space to do the same thing for the match that has an extra character at the front. This applies to the code above, not to the re match code. Thanks!
Answer below!
Since you are interested in learning about ways to match strings and solve your problem: try fuzzywuzzy.
In your case you could try:
from fuzzywuzzy import process
catalog = [long list pulled from API then code here to make it a nice column]
prod = "TESTPR"
hit = process.extractOne(prod, catalog, score_cutoff = 75) #you can adjust this to suit how close the match should be
print(hit[0]) #hit will be sth like ("TESTPT 105.465", 75)
Output: TESTPT 105.465
For information on different ways of using fuzzywuzzy, check out this link.
You can use different ways of matching, such as:
fuzz.partial_ratio
fuzz.ratio
fuzz.token_sort_ratio
fuzz.token_set_ratio
For these, use from fuzzywuzzy import fuzz.
I kept at it with re.match and got the correct regex, so the entire match now prints and the numbers after the period are no longer cut off.
My original match, as you can see above, was re.match("((TESTPRR )\w+)", element). Some of the parentheses were unneeded, and \w does not match a period, so I had to add a few more expressions. Now it prints the correct match. See above for the old code and below for the new code that works.
# New code: replaced \w+ with \w*\d*[.,]?\d*$
for element in catalog:
    z = re.match(r"STRING\w*\d*[.,]?\d*$", element)
    if z:
        print(z.group())
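A runnable sketch of the same idea, under stated assumptions: the catalog entries below are hypothetical and only modelled on the three matches described in the question (one with a leading 1, one exact, one with an extra R). re.fullmatch is used so that extra characters at either end reject the entry, and the decimal part is limited to three digits as the question states.

```python
import re

# Hypothetical catalog entries; the real list comes from an API.
catalog = ['1TESTPR 105.465', 'TESTPR 105.465', 'TESTPRR 105.465']
prod = 'TESTPR'

# Product code, a space, digits, then an optional period with 1 to 3 digits.
pattern = re.compile(re.escape(prod) + r' \d+(?:\.\d{1,3})?')

matches = [entry for entry in catalog if pattern.fullmatch(entry)]
print(matches)  # ['TESTPR 105.465']
```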

How to use struct.pack for a list of strings

I want to write a list of strings to a binary file. Suppose I have a list of strings, mylist. Assume each item of the list has a '\t' at the end, except the last one, which has a '\n' at the end (to help me recover the data later). Example: ['test\t', 'test1\t', 'test2\t', 'testl\n']
For a numpy ndarray, I found the following script that worked (got it from here numpy to r converter):
binfile = open('myfile.bin','wb')
for i in range(mynpdata.shape[1]):
    binfile.write(struct.pack('%id' % mynpdata.shape[0], *mynpdata[:,i]))
binfile.close()
Does binfile.write automatically parse all the data if the variable has * in front of it (as in the *mynpdata[:,i] example above)? Would this work the same way with a list of integers (e.g. *myIntList)?
How can I do the same with a list of strings?
I tried it on a single string using (which I found somewhere on the net):
oneString = 'test'
oneStringByte = bytes(oneString,'utf-8')
struct.pack('I%ds' % (len(oneString),), len(oneString), oneString)
but I couldn't understand why the % in 'I%ds' above is filled with (len(oneString),) instead of len(oneString) as in the ndarray example, and also why both len(oneString) and oneString are passed.
Can someone help me write a list of strings (if necessary, assuming it is written to the same binary file where I wrote out the ndarray)?
There's no need for struct. Simply join the strings and encode them using either a specified or an assumed text encoding in order to turn them into bytes.
''.join(L).encode('utf-8')
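A sketch that ties the answer back to the questions asked, under stated assumptions. The * is ordinary Python argument unpacking performed before struct.pack is called, so it works for any sequence, including a list of integers with a matching format code. The % operator accepts either a single value or a tuple, so (len(oneString),) and len(oneString) behave the same. Both a length and the data are passed because the format 'I%ds' has two fields: an unsigned int holding the length and an s field holding the bytes, which must be a bytes object in Python 3. The helpers pack_string and unpack_string are hypothetical, and the '<' prefix (little-endian, no padding) is a choice made for this sketch.

```python
import struct

mylist = ['test\t', 'test1\t', 'test2\t', 'testl\n']

# Approach from the answer: one UTF-8 byte string for the whole list.
payload = ''.join(mylist).encode('utf-8')
# Reading it back: decode, drop the final newline, split on the tabs.
recovered = payload.decode('utf-8').rstrip('\n').split('\t')

# '*' is argument unpacking: both calls pass three separate floats.
values = [1.0, 2.0, 3.0]
assert struct.pack('%dd' % len(values), *values) == struct.pack('3d', 1.0, 2.0, 3.0)

def pack_string(text):
    """Length-prefixed record: a 4-byte little-endian count, then the bytes."""
    data = text.encode('utf-8')
    return struct.pack('<I%ds' % len(data), len(data), data)

def unpack_string(buffer, offset=0):
    """Return (text, next_offset) for a record written by pack_string."""
    (size,) = struct.unpack_from('<I', buffer, offset)
    start = offset + 4
    return buffer[start:start + size].decode('utf-8'), start + size

record = pack_string('test')
text, end = unpack_string(record)
print(recovered)  # ['test', 'test1', 'test2', 'testl']
print(text, end)  # test 8
```

The length-prefixed form is only needed when the strings share a file with other binary data and the reader must know where each string ends. Otherwise the join-and-encode approach is simpler.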
