Remove characters from string (DataFrame) - python-3.x

How do I remove the extra characters from the string below with a regex?
From this: Fulham\n3.20\nDraw\n3.25\nSouthampton\n2.25\n
To the desired outcome: 3.20\n\n3.25\n\n2.25
Note: I've tried the regex ([^\d.\n]), but it leaves an unwanted 'n' from the team name where applicable.
([^\d\.\\n])
Fulham\n3.20\nDraw\n3.25\nSouthampton\n2.25\n

Try this:
import re
s = "Fulham\n3.20\nDraw\n3.25\nSouthampton\n2.25\n"
"\n\n".join(i for i in s.split() if re.search(r"\d", i))
Output:
'3.20\n\n3.25\n\n2.25'

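You can also use str.replace, which removes the letters and leaves the newlines in place. Pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default.
df['column_name'].str.replace(r'[a-zA-Z]', '', regex=True)
If you don't need the leading and trailing \n, you can then use .str.strip('\n').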
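As a quick check of the letter-stripping approach outside pandas, here is a minimal sketch that applies the same character class to the sample string with the standard re module. The variable names are illustrative only.

```python
import re

s = "Fulham\n3.20\nDraw\n3.25\nSouthampton\n2.25\n"

# Remove every ASCII letter, leaving digits, dots and newlines in place.
no_letters = re.sub(r"[a-zA-Z]", "", s)

# Trim the leading and trailing newlines left behind by the removed names.
result = no_letters.strip("\n")

print(repr(result))  # '3.20\n\n3.25\n\n2.25'
```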

Related

Python String cleanup from spaces and Unknown empty element

I got an attribute from a Selenium element and it contains empty characters or spaces:
When I double click the result :
In VS code:
What I tried so far :
string.replace(" ","")  # didn't work
So I came up with this workaround (I know it's bad):
edit1 = ticketID[:1]
ticketF = ticketID.replace(edit1,"")
edit2 = ticketF[:1]
ticketE = ticketF.replace(edit2,"")
edit3 = ticketE[:1]
ticketD = ticketE.replace(edit3,"")
What I'm looking for: what are those blanks? Tabs? Newlines?
How can I make this better?
Edit:
ticketID.replace("\n","")
ticketID.replace(" ","")
ticketID.strip()
Those are basically whitespace characters. Use .strip() to remove leading and trailing whitespace.
In Python, the strip methods remove leading and trailing whitespace, or specific characters if you pass them as an argument. Whitespace includes blanks, tabs (\t), carriage returns (\r), newlines (\n), and other lesser-known whitespace characters.
If ele is a web element, you can use .text to get its text and then call .strip() on the result.
In your code, probably:
ticketID.text.strip()
They look like newlines rather than spaces to me. Note that replace returns a new string, so assign the result:
ticketID = ticketID.replace("\n","")
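To find out which characters the blanks actually are, printing repr() of the string helps, since it shows escapes such as \n and \t explicitly. A minimal sketch, assuming a hypothetical value for the ticket ID (the real one is not shown in the question):

```python
# Hypothetical value; the actual attribute text is not shown in the question.
ticket_id = "\n\t  TICKET 12345 \r\n"

# repr() makes invisible characters visible as escape sequences.
print(repr(ticket_id))

# strip() with no arguments removes leading and trailing whitespace of any kind.
trimmed = ticket_id.strip()

# split() with no arguments splits on any run of whitespace, so joining the
# pieces also removes whitespace inside the string.
compact = "".join(ticket_id.split())

print(repr(trimmed))
print(repr(compact))
```

Use the trimmed form when inner spaces are meaningful, and the compact form only when the ID should contain no whitespace at all.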

Strip characters to the left of a specific character in a pandas column

I have the following data:
key German
0 0:- Profile 1
1 1:- Archetype Realist*in
2 2:- RIASEC Code: R- Realistic
3 3:- Subline Deine Stärke? Du bleibst dir selber treu.
4 4:- Copy Dein Erfolg basiert auf deiner praktischen Ver...
In the "key" column I would like to remove the number and the colon-dash that follows it. This order is always the same (from the left). So for the first row I would like to remove "0:- ", and just leave "Profile 1". I am struggling to find the correct regular expression to do what I want. Originally I tried the following:
df_json['key'] = df_json['key'].map(lambda x: x.strip(':- ')[1])
However, this approach is too restrictive since there can be multiple words in the field.
I would like to use pd.Series.str.replace(), but I can't figure out the correct regular expression to achieve the desired result. Any help would be greatly appreciated.
With your shown samples, please try the following. It applies the pandas replace function to the German column of the dataframe and uses the regex ^[0-9]+:-\s+ to replace each match with an empty string.
df['German'].replace(r'(^[0-9]+:-\s+)', '', regex=True)
Explanation:
^[0-9]+ : match one or more digits at the start of the value.
:-\s+ : match a colon, followed by -, followed by one or more whitespace characters.
What about just using pandas.Series.str.partition instead of regular expressions:
df['German'] = df['German'].str.partition()[2]
This splits each value on the first space only and keeps the trailing part. As an alternative to partition, you could also just split (n must be passed by keyword in pandas 2.0 and later):
df['German'] = df['German'].str.split(' ', n=1).str[1]
If regex is a must for you, maybe use a lazy quantifier to match up to the first space character:
df['German'] = df['German'].replace(r'^.*? +', '', regex=True)
Where:
^ - Start line anchor.
.*? - Any 0+ (lazy) characters other than newline, up to;
(space)+ - 1+ literal space characters.
You need
df_json['key'] = df_json['key'].str.replace(r'^\d+:-\s*', '', regex=True)
Details:
^ - start of string
\d+ - one or more digits
: - a colon
- - a hyphen
\s* - zero or more whitespaces
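The pattern described above can be checked on plain strings with the standard re module before applying it to a column. This is a minimal sketch under the assumption that the values look like the question's samples; the list name and the two-digit entry are illustrative additions.

```python
import re

# Sample values modelled on the question; "10:- Extra" is a hypothetical
# entry added to show that multi-digit prefixes are handled too.
values = ["0:- Profile", "1:- Archetype", "2:- RIASEC Code:", "10:- Extra"]

# ^\d+   one or more digits at the start of the string
# :-     a literal colon and hyphen
# \s*    zero or more whitespace characters
prefix = re.compile(r"^\d+:-\s*")

cleaned = [prefix.sub("", v) for v in values]

print(cleaned)  # ['Profile', 'Archetype', 'RIASEC Code:', 'Extra']
```

Because the pattern is anchored with ^, a colon-dash appearing later in a value is left untouched.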
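Extract the non-whitespace (\S) and non-digit (\D) characters immediately to the right of the unwanted prefix. The lookbehind is fixed-width, so this only handles single-digit prefixes, and the trailing \D+ cannot match digits, so a value such as "Profile 1" is cut to "Profile ".
df['GermanFiltered']=df['German'].str.extract(r"((?<=^\d\:\-\s)\S+\D+)")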

How to replace part of a string with an added condition

The problem:
The objective is to convert: "tan(x)*arctan(x)"
Into: "np.tan(x)*np.arctan(x)"
What I've tried:
s = "tan(x)*arctan(x)"
s = s.replace('tan','np.tan')
Out: np.tan(x)*arcnp.tan(x)
However, using Python's replace method resulted in arcnp.tan.
Taking one additional step:
s = s.replace('arcnp.', 'np.arc')
Out: np.tan(x)*np.arctan(x)
Achieves the desired result... but this solution is sloppy and inefficient.
Is there a more efficient solution to this problem?
Any help is appreciated. Thanks in advance.
Here is a way to do the job:
var string = 'tan(x)*arctan(x)';
var res = string.replace(/\b(?:arc)?tan\b/g,'np.$&');
console.log(res);
Explanation:
/ : regex delimiter
\b : word boundary, make sure we don't have any word character before
(?:arc)? : non capture group, literally 'arc', optional
tan : literally 'tan'
\b : word boundary, make sure we don't have any word character after
/g : regex delimiter, global flag
Replace:
$& : means the whole match, ie. tan or arctan
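Since the question uses Python's replace, the same regex can be sketched in Python with re.sub. In Python the whole match is referenced as \g<0> rather than $&; the variable names below are illustrative.

```python
import re

s = "tan(x)*arctan(x)"

# \b(?:arc)?tan\b matches "tan" or "arctan" as whole words only.
# \g<0> in the replacement stands for the entire match.
res = re.sub(r"\b(?:arc)?tan\b", r"np.\g<0>", s)

print(res)  # np.tan(x)*np.arctan(x)

# The word boundary keeps other names ending in "tan" untouched.
other = re.sub(r"\b(?:arc)?tan\b", r"np.\g<0>", "xxxtan(x)")
print(other)  # xxxtan(x)
```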
You can use a regular expression to solve your issue. The following code is in JavaScript, since you didn't mention the language you are using.
var string = 'tan(x)*arctan(x)*xxxtan(x)';
console.log(string.replace(/([a-z]+)?(tan)/g,'np.$1$2'));

rstrip() has no effect on string

Trying to use rstrip() at its most basic level, but it does not seem to have any effect at all.
For example:
string1='text&moretext'
string2=string1.rstrip('&')
print(string2)
Desired Result:
text
Actual Result:
text&moretext
Using Python 3, PyScripter
What am I missing?
someString.rstrip(c) removes all trailing characters that appear in c, and only at the end of the string. Thus, for example
'text&&&&'.rstrip('&') = 'text'
Perhaps you want
'&'.join(string1.split('&')[:-1])
This splits the string on the delimiter "&" into a list of strings, removes the last one, and joins them again, using the delimiter "&". Thus, for example
'&'.join('Hello&World'.split('&')[:-1]) = 'Hello'
'&'.join('Hello&Python&World'.split('&')[:-1]) = 'Hello&Python'
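A short sketch contrasting rstrip with cutting at the last delimiter. Here str.rsplit with maxsplit=1 is swapped in as an alternative to the split/join approach above; it is not part of the original answer.

```python
string1 = "text&moretext"

# rstrip only removes characters from the right end, so nothing changes here
# because the string ends in 't', not '&'.
unchanged = string1.rstrip("&")

# Trailing '&' characters are removed when they really are at the end.
stripped = "text&&&&".rstrip("&")

# rsplit with maxsplit=1 cuts at the last '&' and keeps everything before it.
before_last = string1.rsplit("&", 1)[0]
multi = "Hello&Python&World".rsplit("&", 1)[0]

print(unchanged, stripped, before_last, multi)
```

One difference to be aware of: when the string contains no '&' at all, rsplit returns the whole string, whereas the split/join version returns an empty string.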

Is there a way to add quotes to a multi paragraph string

I wrote the following line:
string QuoteTest2 = "Benjamin Netnayahu,\"BB\", said that: \"Israel will not fall\"";
This example went well, but what can I do if I want to write a multi-paragraph string that includes quotes?
The following example shows that putting '@' before the string doesn't cut it:
string QuoteTest2 = @"Benjamin Netnayahu,\"BB\", said that: \"Israel will not fall\"";
The string ends at the second quote and the rest just gives me errors. What should I do?
In a verbatim (@) string, escape a double quote by doubling it ("").
e.g.
string QuoteTest2 = @"Benjamin Netnayahu,""BB"", said that: ""Israel will not fall""";
