Is there a function to remove alphanumeric words given a sentence using python - python-3.x

Given a sentence "hi I stay at 4th cross street and my ssn number is 56tyuh". I want to remove words such as alphanumeric ( 4th and 56tyuh ). Does isalpha() is used only to check if there are alphanumerics in sentences? If not how do I use it to remove alphanumerics

You'll need to use regex for this. Regex can be confusing but in this case, it's quite straight forward.
import re
s = 'hi I stay at 4th cross street and my ssn number is 56tyuh'
r = r'\S*\d+\S*'
cut_string = re.sub(r, '', s)
Let's break this down:
r is a regex variable, which detects character sequences of 0-n leading non-whitespace characters, followed by 1-n numeric charcters and again 0-n trailing non-whitespace characters.
re.sub replaces the matches of our regex with the second parameter, in our case an empty string. Thus it removes all matches of our regex from the string.
Edit:
This will also remove numbers. If you only want to remove alphanumeric words, make the follwing change:
r = r'([a-zA-Z]*\d+[a-zA-Z]+|[a-zA-Z]+\d+[a-zA-Z]*)'
Note the | in the center of the variable. This means either match the first part within the parentheses or the second. The first would match 4th but not ep95, the opposite is true for the second.

Related

Postgresql, regex pattern match in "where"

One table column is a string that contains multiple substrings separated by delimiter character the pipe char (|), like this "aa-a|aa-a-a|a-a|aa", the delimiter character cannot be the leading and ending char of the column string. And the match check is in "where", when match then the row is selected. Actually it's a search, pass in substring such as "aa-a" and search for all rows that has the "aa-a" as a full substring, the "aa-a-a" should not be a match. Another case also need to be considered when there is only one substring and no delimiter. Something like this:
Select * from tb where REGEX_FUNC(tb.col_1, "pattern")>0
in which the "pattern" might be like "^aa-a$" (1) what the REGEX_FUNC should be, need to create my own? (2) what the "pattern" should be?
No need for a regex match.
You can convert the delimited value into an array, then check the array if it matches your comparison value:
where 'aa-a' = any(string_to_array(col1, '|'))

Format in Python

I have a list of values as follows:
no column
1. 111-222-11
2. 112-333-12
3. 113-444-13
I want to format the value from 111-222-11 to 111-222-011 and format the other values similarly. Here is my code snippet in Python 3, which I am trying to use for that:
‘{:03}-{:06}-{:03}.format(column)
I hope that you can help.
Assuming that column is a variable that can be assigned string values 111-222-11, 112-333-12, 113-444-13 and so on, which you want to change to 111-222-011, 112-333-012, 113-444-013 and so on, it appears that you tried to use a combination of slice notation and format method to achieve this.
Slice notation
Slice notation, when applied to a string, treats it as a list-like object consisting of characters. The positional index of a character from the beginning of the string starts from zero. The positional index of a character from the end of the string starts with -1. The first colon : separates the beginning and the end of a slice. The end of the slice is not included into it, unlike its beginning. You indicate slices as you would indicate indexes of items in a list by using square brackets:
'111-222-11'[0:8]
would return
'111-222-'
Usually, the indexes of the first and the last characters of the string are skipped and implied by the colon.
Knowing the exact position where you need to add a leading zero before the last two digits of a string assigned to column, you could do it just with slice notation:
column[:8] + '0' + column[-2:]
format method
The format method is a string formatting method. So, you want to use single quotes or double quotes around your strings to indicate them when applying that method to them:
'your output string here'.format('your input string here')
The numbers in the curly brackets are not slices. They are placeholders, where the strings, which are passed to the format method, are inserted. So, combining slices and format method, you could add a leading zero before the last two digits of a column string like this:
'{0}0{1}'.format(column[:8], column[-2:])
Making more slices is not necessary because there is only one place where you want to insert a character.
split method
An alternative to slicing would be using split method to split the string by a delimiter. The split method returns a list of strings. You need to prefix it with * operator to unpack the arguments from the list before passing them to the format method. Otherwise, the whole list will be passed to the first placeholder.
'{0}-{1}-0{2}'.format(*column.split('-'))
It splits the string into a list treating - as the separator and puts each item into a new string, which adds 0 character before the last one.

Remove all text and characters except some

I have here some text strings
"16cg-301 -request","16cg-3368 - for review","16cg-3684 - for process"
what i would like to do is to remove all the text and characters except the number and the letters "cg" and - which is within the reference code.
If the string you want to extract is always before the first space in the full string then you can use SEARCH and LEFT to extract your reference code:
=LEFT(A1,SEARCH(" ",A1)-1)
This formula would take 16cg-3368 from 16cg-3368 - for review.
I suggest using something like suggested here
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
With a replace regex similar to this
[^\dcg]*
or a match regex like this
^([0-9cg- ]+).*
else you could also work with a strange formule similar to this
=CONCATENATE(IF(NOT(ISERROR(SEARCH(MID(A2;1;1);"01234567890cg-")>0));MID(A2;1;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;2;1);"01234567890cg-")>0));MID(A2;2;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;3;1);"01234567890cg-")>0));MID(A2;3;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;4;1);"01234567890cg-")>0));MID(A2;4;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;5;1);"01234567890cg-")>0));MID(A2;5;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;6;1);"01234567890cg-")>0));MID(A2;6;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;7;1);"01234567890cg-")>0));MID(A2;7;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;8;1);"01234567890cg-")>0));MID(A2;8;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;9;1);"01234567890cg-")>0));MID(A2;9;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;10;1);"01234567890cg-")>0));MID(A2;10;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;11;1);"01234567890cg-")>0));MID(A2;11;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;12;1);"01234567890cg-")>0));MID(A2;12;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;13;1);"01234567890cg-")>0));MID(A2;13;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;14;1);"01234567890cg-")>0));MID(A2;14;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;15;1);"01234567890cg-")>0));MID(A2;15;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;16;1);"01234567890cg-")>0));MID(A2;16;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;17;1);"01234567890cg-")>0));MID(A2;17;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;18;1);"01234567890cg-")>0));MID(A2;18;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;19;1);"01234567890cg-")>0));MID(A2;19;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;20;1);"01234567890cg-")>0));MID(A2;20;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;21;1);"01234567890cg-")>0));MID(A2;21;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;22;1);"01234567890cg-")>0));MID(A2;22;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;23;1);"01234567890cg-")>0));MID(A2;23;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;24;1);"01234567890cg-")>0));MID(A2;24;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;25;1);"01234567890cg-")>0));MID(A2;25;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;26;1);"01234567890cg-")>0));MID(A2;26;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;27;1);"01234567890cg-")>0));MID(A2;27;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;28;1);"01234567890cg-")>0));MID(A2;28;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;29;1);"01234567890cg-")>0));MID(A2;29;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;30;1);"01234567890cg-")>0));MID(A2;30;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;31;1);"01234567890cg-")>0));MID(A2;31;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;32;1);"01234567890cg-")>0));MID(A2;32;1);""))
only works by now for less than 33 signs.
problem here will be that you will get unexpected behavior like this:
123cg-123 - Process => 123cg-123-c
after rereading , I think you should try an other approach than described in the question ;-)
If you want to return everything up to and including the last digit, then try:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,seq,1)),seq))
seq is a named formula: Formula ► Define Name
Name: seq
Refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq returns an array of sequential numbers from 1 to 255.
mid(a1,seq,1)
returns an array consisting of the individual characters in the string in A1. The leading minus sign converts the digits from strings to numbers.
The lookup function will then return the position of the last digit

Return the characters after Nth character in a string

I need help! Can someone please let me know how to return the characters after the nth character?
For example, the strings I have is "001 baseball" and "002 golf", I want my code to return baseball and golf, not the number part. Since the word after the number is not always the same length, I cannot use = Right(String, n)
Any help will be greatly appreciated
If your numbers are always 4 digits long:
=RIGHT(A1,LEN(A1)-5) //'0001 Baseball' returns Baseball
If the numbers are variable (i.e. could be more or less than 4 digits) then:
=RIGHT(A1,LEN(A1)-FIND(" ",A1,1)) //'123456 Baseball’ returns Baseball
Mid(strYourString, 4) (i.e. without the optional length argument) will return the substring starting from the 4th character and going to the end of the string.
Alternately, you could do a Text to Columns with space as the delimiter.
Since there is the [vba] tag, split is also easy:
str1 = "001 baseball"
str2 = Split(str1)
Then use str2(1).
Another formula option is to use REPLACE function to replace the first n characters with nothing, e.g. if n = 4
=REPLACE(A1,1,4,"")

concatenating unknown-length strings in COBOL

How do I concatenate together two strings, of unknown length, in COBOL? So for example:
WORKING-STORAGE.
FIRST-NAME PIC X(15) VALUE SPACES.
LAST-NAME PIC X(15) VALUE SPACES.
FULL-NAME PIC X(31) VALUE SPACES.
If FIRST-NAME = 'JOHN ' and LAST-NAME = 'DOE ', how can I get:
FULL-NAME = 'JOHN DOE '
as opposed to:
FULL-NAME = 'JOHN DOE '
I believe the following will give you what you desire.
STRING
FIRST-NAME DELIMITED BY " ",
" ",
LAST-NAME DELIMITED BY SIZE
INTO FULL-NAME.
At first glance, the solution is to use reference modification to STRING together the two strings, including the space. The problem is that you must know how many trailing spaces are present in FIRST-NAME, otherwise you'll produce something like 'JOHNbbbbbbbbbbbbDOE', where b is a space.
There's no intrinsic COBOL function to determine the number of trailing spaces in a string, but there is one to determine the number of leading spaces in a string. Therefore, the fastest way, as far as I can tell, is to reverse the first name, find the number of leading spaces, and use reference modification to string together the first and last names.
You'll have to add these fields to working storage:
WORK-FIELD PIC X(15) VALUE SPACES.
TRAILING-SPACES PIC 9(3) VALUE ZERO.
FIELD-LENGTH PIC 9(3) VALUE ZERO.
Reverse the FIRST-NAME
MOVE FUNCTION REVERSE (FIRST-NAME) TO WORK-FIELD.
WORK-FIELD now contains leading spaces, instead of trailing spaces.
Find the number of trailing spaces in FIRST-NAME
INSPECT WORK-FIELD TALLYING TRAILING-SPACES FOR LEADING SPACES.
TRAILING-SPACE now contains the number of trailing spaces in FIRST-NAME.
Find the length of the FIRST-NAME field
COMPUTE FIELD-LENGTH = FUNCTION LENGTH (FIRST-NAME).
Concatenate the two strings together.
STRING FIRST-NAME (1:FIELD-LENGTH – TRAILING-SPACES) “ “ LAST-NAME DELIMITED BY SIZE, INTO FULL-NAME.
You could try making a loop for to get the real length.

Resources