PostgreSQL, regex pattern match in "where" - string

One table column holds a string made up of multiple substrings separated by the pipe character (|), like this: "aa-a|aa-a-a|a-a|aa". The delimiter is never the first or last character of the column string. The match check goes in the "where" clause: when a row matches, it is selected. Effectively it's a search: pass in a substring such as "aa-a" and find all rows that contain "aa-a" as a complete substring; "aa-a-a" should not be a match. The case where the column holds only one substring and no delimiter also needs to be handled. Something like this:
Select * from tb where REGEX_FUNC(tb.col_1, "pattern")>0
in which the "pattern" might be something like "^aa-a$". (1) What should REGEX_FUNC be? Do I need to create my own? (2) What should the "pattern" be?

No need for a regex match.
You can convert the delimited value into an array, then check whether any element of the array equals your comparison value:
where 'aa-a' = any(string_to_array(col_1, '|'))
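The same whole-token logic can be sketched outside the database. This is a minimal Python illustration, not PostgreSQL code: `has_token` mirrors the split-then-compare idea of the answer, and `token_pattern` shows what an anchored regex for question (2) could look like. Both function names are hypothetical. The pattern shape `(^|\|)aa-a(\||$)` should carry over to PostgreSQL's `~` operator, but that is an assumption that has not been tested here.

```python
import re

def has_token(column_value, token, delimiter="|"):
    """Split on the delimiter and compare whole tokens (the array approach)."""
    return token in column_value.split(delimiter)

def token_pattern(token, delimiter="|"):
    """Build a regex that matches the token only between delimiters or string ends."""
    d = re.escape(delimiter)
    return re.compile(r"(?:^|" + d + r")" + re.escape(token) + r"(?:" + d + r"|$)")

rows = ["aa-a|aa-a-a|a-a|aa", "aa-a-a|a-a", "aa-a"]
by_split = [r for r in rows if has_token(r, "aa-a")]
by_regex = [r for r in rows if token_pattern("aa-a").search(r)]
print(by_split)
print(by_regex)
```

Both approaches skip the row that only contains "aa-a-a" and accept the row with a single substring and no delimiter.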


Excel Formula To Replicate Text To Column Functionality

I would like a formula in excel that does what Text To Columns does.
For example the following string in A1
" text with a comma, stays in one column",," keep starting blank text",1,2,3,"123"
Would be split into multiple cells like this...
The following LET Function allows you to split the text into columns based on the splitter character (in this instance a comma).
It ignores commas that are between quotes (the Delim argument - which has double quotes in it).
It does this by ensuring there is an even number of quotes before the splitter character.
=LET(
NOTES,"Splits a string but also checks to see if the splitter is inside a delimiter. So will ignore a comma inside quotes.",
RawString,$A1,
Splitter,",",Note2,"This is the character to split the string by",
Delim,"""",Note4,"This is the text delimiter it looks odd but it's just a double quote - change to "" if you don't want text delimitation",
IgnoreBlanks,FALSE,
CleanTextDelims,TRUE,
TrimBlanks,FALSE,
SplitString,Splitter&RawString&Splitter,Note3,"Add the splitter to the start and the end to help create the array of split positions",
StringLength,LEN(SplitString),
Seq,SEQUENCE(1,StringLength),Note5,"Get a sequence from 1 to the length of the split string",
Note6,"The below does the bulk of the work. It works out if we are at an odd or even point in terms of count of text delimiters up to the point in the sequence we are processing.",
Note7,"if we are at an even point and we have a delimiter then make a note of the sequence otherwise put a blank.",
PosArray,IF(Seq=StringLength,Seq,IF(MOD(LEN(LEFT(SplitString,Seq))-LEN(SUBSTITUTE(LEFT(SplitString,Seq),Delim,"")),2)=0,IF(MID(SplitString,Seq,1)=Splitter,Seq,""),"")),
PosArrayClean,FILTER(PosArray,PosArray<>""),Note8,"Clean blanks",
StartArray,FILTER(PosArrayClean,PosArrayClean<>StringLength),
EndArray,FILTER(PosArrayClean,PosArrayClean<>1),
StringArray,MID(SplitString,StartArray+1,EndArray-StartArray-1),
StringArrayB,IF(IgnoreBlanks,FILTER(StringArray,StringArray<>""),StringArray),
StringArrayC,IF(CleanTextDelims,IF(LEFT(StringArrayB,1)=Delim,MID(StringArrayB,2,IF(RIGHT(StringArrayB,1)=Delim,LEN(StringArrayB)-2,LEN(StringArrayB))),StringArrayB),StringArrayB),
IFERROR(IF(TrimBlanks,TRIM(StringArrayC),StringArrayC),"")
)
Breaking down each step in the LET formula:
Supply the raw string (from cell A1 in this case)
Set the splitter character - in this case a comma
Set the text delimiter - in this case a double quote (it looks odd because a double quote inside a string literal has to be doubled, giving Delim,"""" )
IgnoreBlanks is an option to exclude blank cells in the output
CleanTextDelims will clean the TextDelimiter (Double quotes) from the start and end of the resultant string
Create a SplitString variable with the split character at the front and back.
Get the length of the string for ease of use
Get a sequence from 1 to the length of the string.
Build PosArray (the splitter position array): the positions of every splitter character that has an even number of text delimiters to its left, i.e. every splitter that is not inside quotes.
Clean the blanks to get the posArrayClean
Create a start and end array (start array ignores the last and end array ignores the first item in the PosArrayClean)
Get the array of strings/cells to output.
If the IgnoreBlanks option is set then ignore blank cells
If the CleanTextDelims option is set then strip off the Text Delim (double quotes) from the start and end of the resultant string.
If the TrimBlanks option is set then trim blank spaces off the start and end of the resulting strings.
Hopefully the notes explain clearly how this works and make it easy to modify.
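For readers who find the steps easier to follow in a scripting language, here is a rough Python sketch of the same idea: wrap the string in splitters, record every splitter position that has an even number of text delimiters before it, then cut between consecutive positions. The function name `split_outside_quotes` and its parameters are made up for this illustration; it is not a translation of every option in the LET formula (IgnoreBlanks and TrimBlanks are left out).

```python
def split_outside_quotes(raw, splitter=",", delim='"', clean_delims=True):
    """Split raw on splitter, ignoring splitters that sit inside delim pairs."""
    padded = splitter + raw + splitter  # same trick as SplitString in the formula
    positions = []
    quotes_seen = 0
    for i, ch in enumerate(padded):
        if ch == delim:
            quotes_seen += 1
        elif ch == splitter and quotes_seen % 2 == 0:
            positions.append(i)  # a splitter outside quotes
    last = len(padded) - 1
    if positions[-1] != last:
        positions.append(last)  # the formula always keeps the final position
    fields = [padded[s + 1:e] for s, e in zip(positions, positions[1:])]
    if clean_delims:
        cleaned = []
        for f in fields:
            if f.startswith(delim):
                # drop the leading delimiter, and the trailing one if present
                f = f[1:-1] if f.endswith(delim) else f[1:]
            cleaned.append(f)
        fields = cleaned
    return fields

sample = '" text with a comma, stays in one column",," keep starting blank text",1,2,3,"123"'
print(split_outside_quotes(sample))
```

The comma inside the first quoted field is skipped because one quote (an odd count) precedes it.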
If you want to create a named Lambda, you can paste the following code into the formula of a named range called SplitStringDelim (you can name it whatever you like, of course). NB: you can't have line separators in this, and I stripped most of the notes out of it.
=LAMBDA(StringRaw,SplitChar,DelimChar,IgnoreBlank,CleanTextDelim,TrimBlank, LET( RawString,StringRaw, Splitter,SplitChar, Delim,DelimChar, IgnoreBlanks,IgnoreBlank, CleanTextDelims,CleanTextDelim, TrimBlanks,TrimBlank, SplitString,Splitter&RawString&Splitter, StringLength,LEN(SplitString), Seq,SEQUENCE(1,StringLength), PosArray,IF(Seq=StringLength,Seq,IF(MOD(LEN(LEFT(SplitString,Seq))-LEN(SUBSTITUTE(LEFT(SplitString,Seq),Delim,"")),2)=0,IF(MID(SplitString,Seq,1)=Splitter,Seq,""),"")), PosArrayClean,FILTER(PosArray,PosArray<>""),Note8,"Clean blanks", StartArray,FILTER(PosArrayClean,PosArrayClean<>StringLength), EndArray,FILTER(PosArrayClean,PosArrayClean<>1), StringArray,MID(SplitString,StartArray+1,EndArray-StartArray-1), StringArrayB,IF(IgnoreBlanks,FILTER(StringArray,StringArray<>""),StringArray), StringArrayC,IF(CleanTextDelims,IF(LEFT(StringArrayB,1)=Delim,MID(StringArrayB,2,IF(RIGHT(StringArrayB,1)=Delim,LEN(StringArrayB)-2,LEN(StringArrayB))),StringArrayB),StringArrayB), IFERROR(IF(TrimBlanks,TRIM(StringArrayC),StringArrayC),"")))

Is there a function to remove alphanumeric words given a sentence using python

Given a sentence "hi I stay at 4th cross street and my ssn number is 56tyuh", I want to remove alphanumeric words (such as 4th and 56tyuh). Is isalpha() only used to check whether there are alphanumerics in a sentence? If not, how do I use it to remove alphanumerics?
You'll need to use regex for this. Regex can be confusing, but in this case it's quite straightforward.
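import re
s = 'hi I stay at 4th cross street and my ssn number is 56tyuh'
# Match any run of non-whitespace characters that contains at least one digit
r = r'\S*\d+\S*'
# Replace every match with an empty string (the surrounding spaces are left in place)
cut_string = re.sub(r, '', s)
print(cut_string)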
Let's break this down:
r is the regex pattern. It matches character sequences of 0-n leading non-whitespace characters, followed by 1-n numeric characters and then 0-n trailing non-whitespace characters.
re.sub replaces the matches of our regex with the second parameter, in our case an empty string. Thus it removes all matches of our regex from the string.
Edit:
This will also remove plain numbers. If you only want to remove alphanumeric words, make the following change:
r = r'([a-zA-Z]*\d+[a-zA-Z]+|[a-zA-Z]+\d+[a-zA-Z]*)'
Note the | in the center of the variable. This means either match the first part within the parentheses or the second. The first would match 4th but not ep95, the opposite is true for the second.
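Since the question asked about isalpha(), here is a regex-free sketch that may also be worth considering. It splits the sentence on whitespace and drops every word that contains both a letter and a digit, using str.isalpha() and str.isdigit() per character. The function name `drop_mixed_words` is made up for this example, and it assumes that "alphanumeric word" means a word mixing letters and digits.

```python
def drop_mixed_words(sentence):
    """Remove words that contain both letters and digits; keep everything else."""
    kept = []
    for word in sentence.split():
        has_digit = any(ch.isdigit() for ch in word)
        has_alpha = any(ch.isalpha() for ch in word)
        if not (has_digit and has_alpha):
            kept.append(word)
    # Joining with single spaces also avoids leftover double spaces
    return " ".join(kept)

s = 'hi I stay at 4th cross street and my ssn number is 56tyuh'
print(drop_mixed_words(s))
```

Pure numbers such as 1234 are kept, because they contain no letters.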

Format in Python

I have a list of values as follows:
no column
1. 111-222-11
2. 112-333-12
3. 113-444-13
I want to format the value from 111-222-11 to 111-222-011 and format the other values similarly. Here is my code snippet in Python 3, which I am trying to use for that:
‘{:03}-{:06}-{:03}.format(column)
I hope that you can help.
Assuming that column is a variable that can be assigned string values 111-222-11, 112-333-12, 113-444-13 and so on, which you want to change to 111-222-011, 112-333-012, 113-444-013 and so on, it appears that you tried to use a combination of slice notation and format method to achieve this.
Slice notation
Slice notation, when applied to a string, treats it as a list-like object consisting of characters. The positional index of a character from the beginning of the string starts from zero. The positional index of a character from the end of the string starts with -1. The first colon : separates the beginning and the end of a slice. The end of the slice is not included into it, unlike its beginning. You indicate slices as you would indicate indexes of items in a list by using square brackets:
'111-222-11'[0:8]
would return
'111-222-'
When a slice starts at the beginning of the string or runs to its end, the corresponding index is usually omitted and implied by the colon.
Knowing the exact position where you need to add a leading zero before the last two digits of a string assigned to column, you could do it just with slice notation:
column[:8] + '0' + column[-2:]
format method
The format method is a string formatting method. It is called on a string, so the template has to be enclosed in matching single or double quotes:
'your output string here'.format('your input string here')
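The numbers in the curly brackets are not slices. The curly brackets are placeholders where the arguments passed to the format method are inserted; a number before a colon selects an argument by position, and anything after a colon is a format specification (in your attempt, :03 would zero-pad a number to a width of three). So, combining slices and the format method, you could add a leading zero before the last two digits of a column string like this: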
'{0}0{1}'.format(column[:8], column[-2:])
Making more slices is not necessary because there is only one place where you want to insert a character.
split method
An alternative to slicing would be using split method to split the string by a delimiter. The split method returns a list of strings. You need to prefix it with * operator to unpack the arguments from the list before passing them to the format method. Otherwise, the whole list will be passed to the first placeholder.
'{0}-{1}-0{2}'.format(*column.split('-'))
It splits the string into a list, treating - as the separator, and inserts each item into a new string that has a 0 character in front of the last item.
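The zero-padding format specification from the question can also be made to work once the last part is converted to a number. The sketch below puts the three approaches side by side; the function names are hypothetical, and it assumes every value has exactly three parts separated by hyphens with a two-digit last part.

```python
def pad_with_slices(column):
    # Assumes the last part is exactly two characters long
    return column[:8] + '0' + column[-2:]

def pad_with_split(column):
    # Unpack the three parts and put a literal 0 before the last one
    return '{0}-{1}-0{2}'.format(*column.split('-'))

def pad_with_format_spec(column):
    # Convert the last part to int so that :03d can zero-pad it to width 3
    first, middle, last = column.split('-')
    return '{}-{}-{:03d}'.format(first, middle, int(last))

values = ['111-222-11', '112-333-12', '113-444-13']
print([pad_with_format_spec(v) for v in values])
```

The format-specification variant pads to a fixed width, so it would also handle a one-digit last part, whereas the other two always add exactly one zero.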

To extract a string based on some specific characters

I have values in rows like below:
Https://abc/uvw/xyz
Https://def/klm/qew/asdas
Https://ghi/sdk/asda/as/aa/
Https://jkl/asd/vcx/asdsss/ssss/
Now I want the result to look like this:
Https://abc/uvw/xyz
Https://def/klm/qew
Https://ghi/sdk/asda
Https://jkl/asd/vcx
How can I get this result by keeping everything up to a certain number of / characters, or is there another way to do this in Excel? Is there a way to cut the string off once 4 '/' characters have been found?
You could use SUBSTITUTE to replace the nth / (in this case the 5th) with a unique character, then use FIND to locate that character and LEFT to keep everything before it. If the string has fewer than five slashes, IFERROR falls back to the full length. I'll take CHAR(1) as the unique character:
=LEFT(A1,IFERROR(FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),5))-1,LEN(A1)))
Another option would be to split on / using Text to Columns under the Data tab and join back only the columns you need.
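The "split on / and join back only the parts you need" idea can be sketched in a few lines of Python, which may help when checking the expected output. The function name `keep_until_nth_slash` is made up for this example; n=5 corresponds to the 5th slash used in the formula above.

```python
def keep_until_nth_slash(url, n=5):
    """Return everything before the nth '/', or the whole string if there are fewer."""
    parts = url.split('/')
    # n slashes separate n + 1 parts, so the first n parts end just before slash n
    return '/'.join(parts[:n])

rows = [
    'Https://abc/uvw/xyz',
    'Https://def/klm/qew/asdas',
    'Https://ghi/sdk/asda/as/aa/',
    'Https://jkl/asd/vcx/asdsss/ssss/',
]
for row in rows:
    print(keep_until_nth_slash(row))
```

Note that the two slashes after "Https:" count as well, which is why the cut happens at the 5th slash rather than the 3rd.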

Remove all text and characters except some

I have some text strings here:
"16cg-301 -request","16cg-3368 - for review","16cg-3684 - for process"
What I would like to do is remove all the text and characters except the numbers, the letters "cg" and the -, which together make up the reference code.
If the string you want to extract is always before the first space in the full string then you can use SEARCH and LEFT to extract your reference code:
=LEFT(A1,SEARCH(" ",A1)-1)
This formula would take 16cg-3368 from 16cg-3368 - for review.
I suggest using a regex, as described in the question "How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops", with a replace regex similar to this (the hyphen goes last in the character class so that it is kept as well):
[^\dcg-]+
or a match regex like this
^([0-9cg -]+).*
Otherwise you could also work with an unwieldy formula similar to this:
=CONCATENATE(IF(NOT(ISERROR(SEARCH(MID(A2;1;1);"01234567890cg-")>0));MID(A2;1;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;2;1);"01234567890cg-")>0));MID(A2;2;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;3;1);"01234567890cg-")>0));MID(A2;3;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;4;1);"01234567890cg-")>0));MID(A2;4;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;5;1);"01234567890cg-")>0));MID(A2;5;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;6;1);"01234567890cg-")>0));MID(A2;6;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;7;1);"01234567890cg-")>0));MID(A2;7;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;8;1);"01234567890cg-")>0));MID(A2;8;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;9;1);"01234567890cg-")>0));MID(A2;9;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;10;1);"01234567890cg-")>0));MID(A2;10;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;11;1);"01234567890cg-")>0));MID(A2;11;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;12;1);"01234567890cg-")>0));MID(A2;12;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;13;1);"01234567890cg-")>0));MID(A2;13;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;14;1);"01234567890cg-")>0));MID(A2;14;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;15;1);"01234567890cg-")>0));MID(A2;15;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;16;1);"01234567890cg-")>0));MID(A2;16;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;17;1);"01234567890cg-")>0));MID(A2;17;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;18;1);"01234567890cg-")>0));MID(A2;18;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;19;1);"01234567890cg-")>0));MID(A2;19;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;20;1);"01234567890cg-")>0));MID(A2;20;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;21;1);"01234567890cg-")>0));MID(A2;21;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;22;1);"01234567890cg-")>0));MID(A2;22;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;23;1);"01234567890cg-")>0));MID(A2;23;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;24;1);"01234567890cg-")>0));MID(A2;24;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;25;1);"01234567890cg-")>0));MID(A2;25;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;26;1);"01234567890cg-")>0));MID(A2;26;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;27;1);"01234567890cg-")>0));MID(A2;27;1);"");IF(NOT(
ISERROR(SEARCH(MID(A2;28;1);"01234567890cg-")>0));MID(A2;28;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;29;1);"01234567890cg-")>0));MID(A2;29;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;30;1);"01234567890cg-")>0));MID(A2;30;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;31;1);"01234567890cg-")>0));MID(A2;31;1);"");IF(NOT(ISERROR(SEARCH(MID(A2;32;1);"01234567890cg-")>0));MID(A2;32;1);""))
As written, this only works for strings shorter than 33 characters.
The problem is that you will get unexpected behavior like this:
123cg-123 - Process => 123cg-123-c
After rereading, I think you should try a different approach than the one described in the question ;-)
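The stray "-c" in that example is easy to reproduce with a character filter, which shows why whitelisting characters alone is not enough here. The following Python sketch uses re.sub with a negated character class; the function name `keep_allowed_chars` is hypothetical and the pattern is only an illustration of the replace-regex idea, not Excel code.

```python
import re

def keep_allowed_chars(text):
    """Delete every character that is not a digit, 'c', 'g' or '-'."""
    return re.sub(r'[^\dcg-]+', '', text)

print(keep_allowed_chars('123cg-123 - Process'))
print(keep_allowed_chars('16cg-3368 - for review'))
```

The separator hyphen and the "c" in "Process" are allowed characters too, so they survive the filter; cutting at the first space or at the last digit avoids this.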
If you want to return everything up to and including the last digit, then try:
=LEFT(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,seq,1)),seq))
seq is a named formula: Formulas ► Define Name
Name: seq
Refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq returns an array of sequential numbers from 1 to 255.
MID(A1,seq,1)
returns an array consisting of the individual characters of the string in A1. The leading minus sign converts the digit characters to numbers and turns every other character into an error.
ISNUMBER then flags the digits, 1/ISNUMBER(...) yields 1 for digits and an error everywhere else, and the LOOKUP function returns the position of the last digit, which LEFT uses as the number of characters to keep.
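The "everything up to and including the last digit" rule can be checked quickly with a Python sketch. It relies on a greedy regex: `.*` grabs as much as possible and then backtracks until a digit follows. The function name `up_to_last_digit` is made up for this illustration, and returning an empty string when there is no digit is an assumption of the sketch (the Excel formula would return an error in that case).

```python
import re

def up_to_last_digit(text):
    """Return the prefix of text that ends at its last digit, or '' if none."""
    match = re.match(r'.*\d', text, flags=re.DOTALL)
    return match.group(0) if match else ''

samples = ['16cg-301 -request', '16cg-3368 - for review', '16cg-3684 - for process']
print([up_to_last_digit(s) for s in samples])
```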
