How to find text between two markers - python-3.x

I have the string below and want to find, then join together, everything between '^[#n' and '$,s)'. These two markers may occur more than once in the string. Thanks in advance for your help.
string="O8d:^[#nI just $,s):<#Rh9f^[#n don't know $,s)>jwU*/#'^[#nhow to write this code $,s){<3f9(f3#"
For example, here the output is:
I just don't know how to write this code

You can use re.findall with a lazy quantifier:
import re
print("".join(re.findall(r'\^\[#n(.*?)\$,s\)', string)))
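As a minimal sketch of the same idea, the markers can be passed in as plain strings and escaped with re.escape so that characters such as ^, [, $ and ) need no hand-written backslashes. The helper name join_between is hypothetical, and the re.DOTALL flag is an assumption added in case a chunk spans several lines.

```python
import re

def join_between(text, start, end):
    """Join every chunk of `text` found between the `start` and `end` markers."""
    # re.escape turns the markers into literal patterns.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # The lazy (.*?) stops at the first end marker; DOTALL lets a chunk span newlines.
    return "".join(re.findall(pattern, text, flags=re.DOTALL))

string = "O8d:^[#nI just $,s):<#Rh9f^[#n don't know $,s)>jwU*/#'^[#nhow to write this code $,s){<3f9(f3#"
joined = join_between(string, "^[#n", "$,s)")

# The chunks keep their own leading and trailing spaces, so collapse runs of whitespace.
normalized = " ".join(joined.split())
print(normalized)
```

The whitespace normalization is optional; without it the joined text keeps the spaces exactly as they appear between the markers.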

Related

How to get a substring with Regex in Python

I am trying to formulate a regex to get the IDs from the two example strings below:
/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details
/drugs/2/drug-19906/magnesium-moxide-tablet/details
In the first case, I should get 19904-5106 and in the second case 19906.
So far I have tried several patterns; the closest I could get is [drugs/2/drug]-.*\d, but it returns g-19904-5106 and g-19906.
Could anyone help me get rid of the "g-"?
Thank you in advance.
When writing a regular expression, look at the patterns in your data so that the expression lines up with them. For example, if you know that your desired IDs always appear in something resembling ABCD-1234-5678, where 1234-5678 is the ID you want, then you can use that. If you also know that your IDs are always digits, then you can refine the search even more.
For your example, using a regex string like
.+?-(\d+(?:-\d+)*)
should do the trick. In a python script that would look something like the following:
import re
match = re.search(r'.+?-(\d+(?:-\d+)*)', my_string)
if match:
    my_id = match.group(1)
The pattern may vary depending on the depth and complexity of your examples, but that works for both of the ones you provided.
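A short sketch that runs the pattern against both example paths from the question. The second pattern, anchored on the literal "/drug-" prefix, is an assumption about how the URLs are always shaped and is shown only as a stricter alternative.

```python
import re

# Pattern from the answer: lazily skip to the first hyphen that is followed by digits,
# then capture one or more hyphen-separated digit groups.
ID_PATTERN = re.compile(r".+?-(\d+(?:-\d+)*)")

# Stricter variant (assumption: the ID always follows the literal "/drug-").
STRICT_PATTERN = re.compile(r"/drug-(\d+(?:-\d+)*)")

paths = [
    "/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details",
    "/drugs/2/drug-19906/magnesium-moxide-tablet/details",
]

def extract_id(pattern, path):
    """Return the captured ID, or None when the path does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None

ids = [extract_id(ID_PATTERN, p) for p in paths]
strict_ids = [extract_id(STRICT_PATTERN, p) for p in paths]
print(ids)
print(strict_ids)
```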
This is the closest I could find: \d+|.\d+-.\d+

How do I split individual characters within a list that is within a list?

I am working on a project in Python, and I stumbled across this hindrance.
I have something like this:
[['abcde'],['bcdef']]
How do I make it give me this?
[['a','b','c','d','e'],['b','c','d','e','f']]
Thank you for helping
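It would help to read the docs on lists and strings. Each inner list holds a single string, so convert that string (not the inner list itself) into a list of characters:
[list(sub[0]) for sub in given_list]
or otherwise
list(map(lambda sub: list(sub[0]), given_list))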
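A small runnable sketch of the conversion, assuming the input is shaped exactly like the example in the question. It also shows why calling list() on the inner list alone changes nothing, and a variant (an assumption beyond the question) that copes with inner lists holding more than one string.

```python
given_list = [['abcde'], ['bcdef']]

# list() applied to an inner list just copies it, so the string stays whole.
unchanged = [list(sub) for sub in given_list]

# Take the single string out of each inner list and split it into characters.
split_chars = [list(sub[0]) for sub in given_list]

# Variant: flatten every string in each inner list, however many there are.
split_any = [[ch for s in sub for ch in s] for sub in given_list]

print(unchanged)
print(split_chars)
print(split_any)
```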

With data.table, return between certain characters into a new column

I have a feeling this might be a simple question, but although I've searched through SO for a bit now and found many interesting related Q/As, I'm still stumped.
Here's what I need to learn (in honesty, I'm playing with the kaggle Titanic dataset, but I want to use data.table)...
Let's say you have the following data.table:
dt <- data.table(name=c("Johnston, Mr. Bob", "Stone, Mrs. Mary", "Hasberg, Mr. Jason"))
I want my output to be JUST the titles "Mr.", "Mrs.", and "Mr." -- heck we can leave out the period as well.
I've been playing around (all night) and discovered that using regular expressions might hold the answer, but I've only been able to get that to work on a single string, not with the whole data.table.
For example,
substr(dt$name[1], gregexpr(",.", dt$name[1]), gregexpr("[.]", dt$name[1]))
Returns:
[1] ", Mr."
Which is cool, and I can do some further processing to get rid of the ", " and ".", but the optimist (/optimizer) in me feels that that's ugly, gross, and inefficient.
Besides, even if I wanted to settle on that, (it pains me to admit) I don't know how to apply that into the J of data.table....
So, how do I add a column to dt called "Title", that contains:
[1] "Mr"
[2] "Mrs"
[3] "Mr"
I firmly believe that if I'm able to use regular expressions to select and extract data within a data.table that I will probably use this 100x a day. So thank you in advance for helping me figure out this pivotal technique.
PS. I'm an excel refugee, in excel I would just do this:
=mid(data, find(", ", data), find(".", data))
Umm.. I may have figured it out:
dt[, Title:=sub(".*?, (.*?)[.].*", "\\1", name)]
But I'm going to leave this here in case anyone else needs help, or perhaps there's an even better way of doing this!
You can use the stringr package
library(stringr)
str_extract(dt$name, "M.+\\.")
[1] "Mr." "Mrs." "Mr."
Different variations on the regular expression will let you extract other titles, like Dr., Master, or Reverend which may also be of interest to you.
To get all characters between "," and "." (inclusive) you can use
str_extract(dt$name, ",.+\\.")
and then remove the first and last characters of the result with str_sub (also from stringr package).
But as I think about it more, I might use grepl to create indicator variables for all the different titles that are in the Titanic dataset. For example
dr_ind <- grepl("Dr|Doctor", dt$name)
titled_ind <- grepl("Count|Countess|Baron", dt$name)
etc.

Excel Len function

I have a problem in Excel that I haven't figured out how to solve. Here is an example to explain it:
Name Surname Code
Martin Kara Maar4 (=> the first two letters of the name, then the 2nd and 3rd letters of the surname, then 4, which is the length of the surname)
The problem is that I want to see Maar04 rather than Maar4.
I have already checked the cell formatting options but still haven't found a way.
This is the formula I wrote:
=UPPER(LEFT(A2;2)& MID(B2;2;2)&LEN(B2))
Thank you.
Edit: the problem is solved. You guys are amazing, thank you.
I'm not sure why you're using UPPER() if you expect Maar04. If you do want to get Maar04, you would use:
=LEFT(A2;2)&MID(B2;2;2)&TEXT(LEN(B2);"00")
If you want to get MAAR04, then you'd use the uppercase:
=UPPER(LEFT(A2;2)&MID(B2;2;2)&TEXT(LEN(B2);"00"))
Try using TEXT function to force 2 digits
=UPPER(LEFT(A2;2)& MID(B2;2;2))&TEXT(LEN(B2);"00")
If you're quite sure the length will never go above 99 you can try this hack:
=UPPER(LEFT(A2;2)& MID(B2;2;2)&RIGHT(LEN(B2)+100;2))
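You could also use REPT() to build the zero padding.
See: http://office.microsoft.com/en-us/excel-help/rept-HP005209236.aspx
Your formula might look something like this (it pads to two digits, so it assumes the length stays below 100):
=UPPER(LEFT(A2;2)&MID(B2;2;2))&REPT("0";2-LEN(LEN(B2)))&LEN(B2)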

parsing a string that ends

I have a huge string and need to extract a substring from it. The substring starts with either "TECHNICAL", "JUSTIFY" or "ALIGN" and runs up to a number (any number from 1 to 10) followed by a period and then a space. For example, I have
string x = "This is a test, again I am testing TECHNICAL: I need to extract this substring starting with testing. 8. This is test again and again and again and again.";
So I need this:
TECHNICAL: I need to extract this substring starting with testing.
I was wondering if someone has an elegant solution for that.
I tried using a regular expression, but I could not figure out the right expression.
Any help will be appreciated.
Thanks in advance.
Try this: @"((?:TECHNICAL|JUSTIFY|ALIGN).*?)(?:[1-9]|10)\. "
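To check how the pattern behaves, here is a sketch in Python rather than the language of the question; the regex syntax used is the same in both. The variable names are illustrative, and stripping the trailing space from the captured group is an assumption about the desired result.

```python
import re

# Group 1 lazily captures from the keyword up to the first "<number>. " terminator.
PATTERN = re.compile(r"((?:TECHNICAL|JUSTIFY|ALIGN).*?)(?:[1-9]|10)\. ")

x = ("This is a test, again I am testing TECHNICAL: I need to extract this "
     "substring starting with testing. 8. This is test again and again and again and again.")

match = PATTERN.search(x)
# The capture ends with the space that precedes the number, so strip it.
extracted = match.group(1).rstrip() if match else None
print(extracted)
```

If the huge string contains line breaks, a single-line (dot matches newline) option would be needed so that .*? can cross them.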
