I want to delete numbers contained in a text with Python, but I don't want to delete digits that are part of a longer alphanumeric token.
For example,
strings = ["There are 55 cars in 45trt avenue"]
In this case 55 should be deleted, but 45trt should remain the same.
Thanks in advance,
You could try searching for numbers which are surrounded by word boundaries:
import re

inp = "There are 55 cars in 45trt avenue"
output = re.sub(r'\s*\b\d+\b\s*', ' ', inp).strip()
print(output)
This prints:
There are cars in 45trt avenue
The logic here is to actually replace with a single space, to ensure that the resulting string is still properly spaced. This opens an edge case for numbers which might happen to appear at the very beginning or end, leaving behind an extra space. To handle that, we trim using strip().
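Since the question starts from a list of strings, here is a minimal sketch that applies the same substitution to every element and exercises the start and end edge cases. The helper name `drop_standalone_numbers` and the two extra sample strings are assumptions added for illustration.

```python
import re

# A run of digits delimited by word boundaries, plus any surrounding whitespace.
STANDALONE_NUMBER = re.compile(r'\s*\b\d+\b\s*')

def drop_standalone_numbers(text):
    """Remove whole-number tokens, keeping digits embedded in tokens like '45trt'."""
    return STANDALONE_NUMBER.sub(' ', text).strip()

strings = [
    "There are 55 cars in 45trt avenue",
    "55 cars are here",      # number at the very beginning (illustrative)
    "The cars number 55",    # number at the very end (illustrative)
]
cleaned = [drop_standalone_numbers(s) for s in strings]
print(cleaned)
```

One caveat of this pattern: `\b` also sees a boundary at punctuation, so a decimal such as 3.5 is treated as two separate numbers around the dot.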
Related
First, I wish to extract the first word and the last word of the Description column (this column contains at least 3 words) into newly created columns firstword and lastword. However, the word() function is not applied to all the rows. As a result, many rows have an empty lastword, even though these rows actually have a last word (as you can see from the Description column). This is shown in the first two lines of code.
Second, I am also trying to get the third line of code to replace the lastword with firstword, if lastword is empty. However it isn't working.
Is there a way to rectify this?
c1$lastword = word(c1$Description,start=-1) #extract last word
c1$firstword = word(c1$Description,start=1) #extract first word
c1$lastword=ifelse(c1$lastword == " ", c1$firstword, c1$lastword)
I realise that there is white space at the beginning of some of the rows of the Description variable, which isn't shown when viewed in R.
Removing the whitespace using stri_trim() solved the issue.
c1$Description = stri_trim(c1$Description, "left") #remove whitespace
Note: this is SSIS, not SQL Server.
I am pulling data from a file and some columns have names like this:
1;&count chocula
13;&roger ramjet
123;&mary smith
45678;&john adams
How do I remove the ampersand and everything to the left of it?
I am using the fx transformation for the character.
I thought about finding the character position of the ampersand and then deleting everything from the start to that position, but SSIS does not seem to have a function for that. The ampersand can be at any position; I cannot assume it is at a fixed one.
Thanks
The RIGHT() function retrieves the last X characters of a string.
RIGHT("13;&roger ramjet",12) = roger ramjet
Above, X equals 12. Of course, twelve won't work for every string. Instead we can calculate X by subtracting the position of the ampersand from the string length.
LEN([MyColumn]) = 16
FINDSTRING([MyColumn],"&",1) = 4
Or put another way...
RIGHT([MyColumn], LEN([MyColumn]) - FINDSTRING([MyColumn],"&",1)) = roger ramjet
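The arithmetic behind that expression can be checked outside SSIS. Below is a minimal Python sketch of the same calculation, run on the sample rows from the question. The function name `strip_through_ampersand` is hypothetical, and the code only mirrors the logic; it is not SSIS expression syntax.

```python
def strip_through_ampersand(value):
    """Keep only the text to the right of the first '&'."""
    # 1-based position, mirroring FINDSTRING; 0 when no '&' is present.
    position = value.find("&") + 1
    # Number of characters to keep, counted from the right: LEN - FINDSTRING.
    keep = len(value) - position
    # Equivalent of RIGHT(value, keep).
    return value[len(value) - keep:]

rows = ["1;&count chocula", "13;&roger ramjet", "123;&mary smith", "45678;&john adams"]
names = [strip_through_ampersand(r) for r in rows]
print(names)
```

When there is no ampersand the position is 0, so the whole string is kept unchanged.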
I have a string variable with short text strings. I want to replace all the text strings with numbers based on key words contained inside the individual cells.
Example: Some cells state "I like cats", while others state "I dont like the smell of wet dog".
I want to assign the value 1 to all cells containing the word cat, and the number 2 to all cells containing the word dog.
How do I do this?
This will put 1 in NewVar when "cat" appears in OldVar, 2 for "dog", 3 for "mouse":
do repeat wrd="cat" "dog" "mouse"/val= 1 2 3.
if index(OldVar, wrd)>0 NewVar=val.
end repeat.
This is only good if there will never be a cat AND a dog in the same sentence. If you do have such cases you should go this way:
do repeat wrd="cat" "dog" "mouse"/NewVar=cat dog mouse.
compute NewVar=char.index(OldVar, wrd)>0.
end repeat.
This will create a new variable for each of the possible words, putting 1 in cases where the word appears in OldVar, 0 when it doesn't.
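To make the difference between the two approaches concrete, here is a rough Python sketch of both. It is not SPSS syntax; the names `KEYWORD_CODES`, `single_code` and `indicator_flags` are made up for illustration, and the sketch assumes that in the first approach a later keyword overwrites an earlier one when both appear.

```python
KEYWORD_CODES = {"cat": 1, "dog": 2, "mouse": 3}

def single_code(text):
    """One code per text; a later keyword overwrites an earlier match."""
    code = None  # stays None (missing) when no keyword is found
    for word, value in KEYWORD_CODES.items():
        if word in text:
            code = value
    return code

def indicator_flags(text):
    """One 0/1 flag per keyword, so several keywords can be recorded at once."""
    return {word: int(word in text) for word in KEYWORD_CODES}

print(single_code("I like cats"))
print(single_code("the cat chased the dog"))
print(indicator_flags("the cat chased the dog"))
```

With a single code, the second sentence loses the information that a cat was mentioned; the flag version keeps both.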
Apparently you have to open a syntax window and enter this command:
COMPUTE newvar=CHAR.INDEX(UPCASE(VAR1),"ABCD")>0.
newvar is the name of the new variable.
VAR1 is the name of the variable to be searched.
ABCD is the text to be searched for. NOTE: This must be in CAPITAL letters, because the search runs against the upper-cased value of VAR1.
newvar will receive a value of 1 if the text is found, and 0 otherwise.
I am interested in removing leading alphabetical (alpha) characters from cells which appear in a column. I only wish to remove the leading alpha characters (including UPPER and LOWER case): if alpha characters appear after a number they should be kept. Some cells in the column might not have leading alpha characters.
Here is an example of what I have:
36173
PIL51014
4UNV22001
ZEB54010
BICMPAFG11BK
BICMPF11
Notice how there are not always the same number of leading alpha characters. I cannot simply use a Left or Right function in Excel, because the number of characters I wish to keep and remove varies.
A correct output for what I am looking for would look like:
36173
51014
4UNV22001
54010
11BK
11
Notice how the second to last row preserved the characters "BK", and the 3rd row preserved "UNV". I cannot simply remove all alpha characters.
I am a beginner with visual basic and was not able to figure out how to use excel functions to address my issue. How would I do this?
Here is an Excel formula that will "strip off the leading alpha characters". Actually, it looks for the first numeric character and returns everything from that character onward:
=MID(A1,MIN(FIND({0;1;2;3;4;5;6;7;8;9},A1&"0123456789")),99)
The 99 at the end needs to be some value longer than the longest string you might be processing. 99 usually works.
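The reason for appending "0123456789" is that FIND then succeeds for every digit, so MIN never sees an error. The same idea can be sketched in Python and run on the sample column from the question. The function name `strip_leading_non_digits` is hypothetical.

```python
DIGITS = "0123456789"

def strip_leading_non_digits(cell):
    """Return the cell from its first digit onward ('' if it has no digit)."""
    # Padding with all ten digits guarantees that index() finds each one.
    padded = cell + DIGITS
    # The smallest position is where the first digit of the original cell sits.
    first_digit = min(padded.index(d) for d in DIGITS)
    return cell[first_digit:]

cells = ["36173", "PIL51014", "4UNV22001", "ZEB54010", "BICMPAFG11BK", "BICMPF11"]
stripped = [strip_leading_non_digits(c) for c in cells]
print(stripped)
```

Because slicing runs to the end of the string, there is no counterpart to the 99 length limit here.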
Here's a formula based solution complete with test results:
=MID(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"),255),100)
Change the 100 at the end if any string may be longer than 100 characters. Also the 255 is not needed, but it won't hurt.
This short UDF should strip off leading alphabetic characters.
Function noLeadAlpha(ByVal str As String) As String
    If Not IsNumeric(str) Then
        ' Drop leading characters until the first one is a digit (ASCII 48-57)
        Do While Len(str) > 0
            If Asc(str) >= 48 And Asc(str) <= 57 Then Exit Do
            str = Mid(str, 2)
        Loop
    End If
    noLeadAlpha = str
End Function
Kudos Jeeped, you beat me to it.
But here is an alternative anyway:
Function RemoveAlpha(aString As String) As String
    Dim i As Long
    ' Scan for the first digit and return everything from there onward
    For i = 1 To Len(aString)
        Select Case Mid(aString, i, 1)
            Case "0" To "9"
                RemoveAlpha = Right(aString, Len(aString) - i + 1): Exit For
        End Select
    Next i
End Function
Is there a way of returning the part of a string between certain characters in Excel? For example, my string looks like this:
`switchrefid` = {switchrefid: }
I need to cut out the part of the string between the ` characters (backticks) so it just returns switchrefid
I'm sure there must be a formula for this, I just can't think of the one to use.
Thanks in advance.
As long as the ` characters occur exactly twice in your data, you can do:
=LEFT(RIGHT(A1, LEN(A1)-FIND("`", A1)), FIND("`",RIGHT(A1, LEN(A1)-FIND("`", A1)))-1)
Although it is pretty horrible!
(Edit: this assumes your data is in A1, of course.)
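The nested formula is easier to follow when its two steps are named: find the first tick, then find the next tick after it. A small Python sketch of those steps follows; the function name `between_backticks` is hypothetical and the input is the example string from the question.

```python
def between_backticks(text):
    """Return the text between the first two backticks, or None if there are fewer than two."""
    start = text.find("`")              # position of the opening tick
    end = text.find("`", start + 1)     # position of the next tick after it
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]

value = between_backticks("`switchrefid` = {switchrefid: }")
print(value)
```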
Just two more options. If the word always starts at the second character and ends just before the last one, you could simply use:
=MID(A1,2,LEN(A1)-2) ' Minus 2 for the 2 ticks
And the second option would be to substitute the tick with nothing like so:
=SUBSTITUTE(A1,"`","")
SUBSTITUTE also takes an optional fourth argument, the instance number, which restricts the replacement to that one occurrence. So if you had `switchrefid`` for some reason and only wanted to get rid of the second of the three ticks you could use:
=SUBSTITUTE(A1,"`","",2)
and this would return `switchrefid`
Note that the instance number does not mean "the first 2 instances". To strip the first two ticks and get switchrefid`, nest two calls, each removing the current first tick:
=SUBSTITUTE(SUBSTITUTE(A1,"`","",1),"`","",1)
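For reference, Excel's documentation describes the optional fourth argument of SUBSTITUTE as an instance number that selects a single occurrence to replace. A minimal Python sketch of that behavior, using a hypothetical helper named `remove_nth`, makes the effect on the three-tick example easy to check.

```python
def remove_nth(text, target, n):
    """Remove only the n-th (1-based) occurrence of target; other occurrences stay."""
    index = -1
    for _ in range(n):
        index = text.find(target, index + 1)
        if index == -1:
            return text  # fewer than n occurrences: leave the text unchanged
    return text[:index] + text[index + len(target):]

sample = "`switchrefid``"
second_removed = remove_nth(sample, "`", 2)
first_two_removed = remove_nth(remove_nth(sample, "`", 1), "`", 1)
print(second_removed)
print(first_two_removed)
```

Removing the first two ticks therefore takes two calls, each targeting whatever is currently the first occurrence.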