How to check if text contains only one uppercase character in DB2 - string

Is there a built-in function in DB2 to check whether a text (VARCHAR) value contains ONLY one uppercase character, or do I have to iterate over the whole string and check every character?

You can use REGEXP_COUNT, something like:
SELECT REGEXP_COUNT( 'Steven Jones and Stephen Smith are the best players', '[A-Z]') FROM sysibm.sysdummy1
Result: 4
A value contains exactly one uppercase character when this count equals 1.
Take care if the text can contain accented characters: [A-Z] does not match them, so you may need to change the regexp pattern to cover them (I don't remember the syntax, but you can find it easily on the web).
Also, check the documentation for how to specify the length unit if you are using a Unicode string.
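To make the counting logic concrete outside the database, here is a minimal Python sketch of the same idea. It is not DB2 code; the function names are hypothetical, and the second function relies on Python's str.isupper as one possible way to cover accented letters.

```python
import re


def count_ascii_uppercase(text: str) -> int:
    """Count characters in A-Z, mirroring REGEXP_COUNT(text, '[A-Z]')."""
    return len(re.findall(r"[A-Z]", text))


def has_exactly_one_uppercase(text: str) -> bool:
    """Unicode-aware variant: str.isupper() is also true for letters like 'É'."""
    return sum(1 for ch in text if ch.isupper()) == 1


print(count_ascii_uppercase("Steven Jones and Stephen Smith are the best players"))
print(has_exactly_one_uppercase("hello World"))
```

The ASCII-only version would report 0 for a string such as "Émile", which is the accented-character pitfall mentioned above.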

Related

Regex remove both apostrophes if they exist in Python

I’m quite new to regex but have almost finished my text mining script. Only one thing fails: I’m trying to remove the apostrophes around a word if both of them exist. I’m using re.sub for this.
For instance:
‘Apple’ needs to be Apple
‘apple’ needs to be apple
‘[apple]’ needs to be [apple]
‘(apple)’ needs to be (apple)
However: Apple’s needs to stay Apple’s because there is only one apostrophe.
How do I select both apostrophes when there is a word in between so I can delete them with re.sub? In every try I remove the entire string! Hopefully someone can help.
My code is as follows:
str_o='\'Apple\''
str_o_a = re.sub(r"\'(.*?)\'","", str_o)
I have a simpler idea: split by whitespace, trim leading and trailing apostrophes, join with whitespace. Avoids having to write a regular expression and handles sentences such as "She's 'her' mother's daughter".
text = "She's 'her' mother's daughter"
text = ' '.join([word.strip("'") for word in text.split()])
print(text)
# She's her mother's daughter
The purpose of the parentheses in your regular expression was probably to capture the string you want to keep. The idiom looks like
str_o_a = re.sub(r"'([^']*)'", r"\1", str_o)
You want a raw string around the replacement, too, in order to preserve the backslash in the argument (otherwise you would be replacing with the literal string "\x01").
Notice also the preference for using a negated character class over a non-greedy "match anything" wildcard.
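A small runnable sketch of the capture-and-backreference idiom follows. The lookarounds around the quotes are an addition of mine, not part of the answers above: they assume a quote pair should only be stripped when the opening quote does not follow a word character and the closing quote does not precede one, so that apostrophes inside words survive. The name strip_paired_quotes is hypothetical.

```python
import re

# Opening quote must not follow a word character and the closing quote must
# not precede one, so the apostrophe in "Apple's" is never treated as a quote.
PAIRED_QUOTES = re.compile(r"(?<!\w)'([^']*)'(?!\w)")


def strip_paired_quotes(text: str) -> str:
    """Remove a surrounding pair of apostrophes, keeping the text between them."""
    return PAIRED_QUOTES.sub(r"\1", text)


for sample in ["'Apple'", "'[apple]'", "'(apple)'", "Apple's",
               "She's 'her' mother's daughter"]:
    print(strip_paired_quotes(sample))
```

Without the lookarounds, the plain pattern '([^']*)' would pair the apostrophe in "She's" with the opening quote of 'her' and mangle the sentence, which is one reason the split-and-strip approach can be the safer choice.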

Dialogflow RE2 Regex

I am new here. I wanted to ask a question on using REGEX for an entity in DialogFlow
I wanted the entity to accept all text and spaces except for the symbol *
I have tried to use [A-Za-z0-9 ][^*], but it is not working. Any advice. thanks!
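In your regex, [^*] is a negated character class: it matches any single character except the asterisk. Inside the brackets the asterisk is already literal; outside a character class you would need \* to refer to a literal asterisk rather than the "zero or more" quantifier.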
If you want to match a line of letters or numbers as in the [A-Za-z0-9] example you give, but only if that string does not include an asterisk, then this expression should work for you:
^[a-zA-Z0-9]+$
This means "match a whole line of text if it only contains one or more of the characters a-z, A-Z, or 0-9".
If you want to match any character or group of characters in a line except for the asterisk, then you could use something like this:
(?!\*)([a-zA-Z0-9]+)(?<!\*)
The first part is called a "negative lookahead," and it looks forward to ensure we're not matching the asterisk. The last part is called a "negative lookbehind," and it looks backwards to make sure we're not matching the asterisk. The middle part is your "capture group," and confirms that you're matching any letters or numbers in a given string, but excluding the * character.
If this regex gets input like *abc, it will capture abc. If it encounters abc*, it will still capture abc. If it encounters abc*def, it will produce two separate matches, abc and def, because it will break around the asterisk. Be aware that RE2, the engine Dialogflow uses, does not support lookaheads or lookbehinds, so this pattern only works in engines such as PCRE.
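The three inputs described above can be checked with a short Python sketch. Python's re module supports lookarounds, so it is used here purely for illustration; the helper name runs_between_asterisks is hypothetical.

```python
import re

# Same pattern as in the answer: lookahead, capture group, lookbehind.
LOOKAROUND = re.compile(r"(?!\*)([a-zA-Z0-9]+)(?<!\*)")


def runs_between_asterisks(text: str) -> list:
    """Return every run of letters and digits, one list entry per match."""
    return LOOKAROUND.findall(text)


for sample in ["*abc", "abc*", "abc*def"]:
    print(sample, runs_between_asterisks(sample))
```

Since the character class already excludes the asterisk, the plain pattern [a-zA-Z0-9]+ finds the same runs without any lookarounds.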
This link explains the concept of lookarounds in Regex. You can also use this Regex tester to get started practicing your Regular Expressions with explanations of what each block of characters does.
EDITED TO ADD If you're just interested in matching single characters rather than groups of characters, you can use [A-Za-z0-9] and match any upper or lowercase letter and any single digit. You don't need to exclude the * character, because the character group is already exclusive.
This is a slight duplicate of the question below, so responses here may also help you. Hope this helps!
How can I exclude asterisk in a regex expression
[A-Za-z0-9 ][^*]
What your regex will do is match 2 consecutive characters. First, it will look for any one character in A-Za-z0-9 or a space. Then, it will look at the negated set that contains *, and will match ANY character except *.
You can type your regex into https://regexr.com/ to see a breakdown of how it matches and test some strings.
For example, your regex would match these:
Aa
AA
a&
A1
0_
But would not match these:
A*
a*
1*
And it WOULD NOT match more than 2 characters at a time. If you really want to match any string with any characters except *, this should work:
[^\*]+
What that will do is match any number of consecutive characters that are not *. (The + means match 1 or more characters in the set.) It is also a good idea to escape * because it is a reserved character in regex. Even though most regex parsers are smart enough to know that inside a character class you probably mean the literal char *, it is still a good practice to escape it. (By the same token, you could use \s instead of the blank space in your original regex, keeping in mind that \s also matches tabs and newlines.)
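The difference between the two patterns can be tried out with a quick Python sketch. Python's re is not RE2, but character classes and the + quantifier behave the same way in both; the helper names are hypothetical, and fullmatch is used on the assumption that the whole input should be accepted or rejected.

```python
import re

TWO_CHARS = re.compile(r"[A-Za-z0-9 ][^*]")   # the pattern from the question
NO_ASTERISK = re.compile(r"[^*]+")            # one or more non-asterisk characters


def is_two_char_match(text: str) -> bool:
    """True only if the entire input is one allowed char plus one non-asterisk."""
    return TWO_CHARS.fullmatch(text) is not None


def has_no_asterisk(text: str) -> bool:
    """True if the entire input is non-empty and contains no asterisk."""
    return NO_ASTERISK.fullmatch(text) is not None


print(is_two_char_match("a&"), is_two_char_match("A*"), is_two_char_match("abc"))
print(has_no_asterisk("hello world 123"), has_no_asterisk("abc*def"))
```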

Moving Columns/Text in VIM

I was wondering how I might go about moving columns/text around in Vim using a substitution string. I have a short list of names I have to reorder, which need to be changed from Last, First Middle to First Middle Last.
So here would be an example list:
Plant, Robert A.
Page, Jimmy
Bonhham, John H.
Jones, John Paul
I was thinking that the string should look something like this:
:s/\([A-z]\{2}\)\(\[A-z]\{2}\)/2\1/
Thanks
First, I recommend using the \v "very magic" flag to avoid escaping most of the metacharacters. I also use [A-Za-z] rather than [A-z], because [A-z] additionally matches the punctuation characters that sit between Z and a in ASCII. This will work with a replacement like:
:s/\v([A-Za-z]+),\s+([A-Za-z]+)(\s+[A-Za-z.]+)?/\2\3 \1
Breaking it down:
([A-Za-z]+) Capture the last name into \1
,\s+ A literal comma and one or more spaces
([A-Za-z]+) Capture the first name into \2
(\s+[A-Za-z.]+)? Capture the middle name with its leading spaces into \3. Also permit the ., and end with a ? to make the whole construct optional, since the middle name may not exist
\2\3 \1 Replace with the second group (first name) followed immediately by the middle name \3 with no space in between because the space was captured along with the middle name. Then append \1 the last name.
As written, :s only changes the current line; use :%s to apply it to every line in the buffer.
If the names could contain more than [A-Za-z]+, you may alternatively use \S+ to capture all non-whitespace characters.
How about this:
:%s/\([[:alpha:]]\+\), \([[:alpha:]]\+\)\( [[:alpha:]]\+\.\?\)\?/\2\3 \1/g
This captures last, first, and middle (optional), and reorders them in the replacement. You'll probably need to include additional characters alongside [[:alpha:]] in the collection, but this works for your example.
For more information, learn how the excellent and comprehensive :help is structured; all the information is in there (you just need to know how to find it)! There are also many very similar regular expression questions here on Stack Overflow.
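For readers who want to test the capture-and-reorder logic outside Vim, here is a rough Python equivalent of the substitution. It is a sketch, not Vim syntax: the function name is hypothetical, and it assumes Python 3.5 or later, where an unmatched optional group is replaced with an empty string.

```python
import re

# Group 1: last name, group 2: first name, group 3: optional middle part
# captured together with its leading whitespace.
NAME = re.compile(r"([A-Za-z]+),\s+([A-Za-z]+)(\s+[A-Za-z.]+)?")


def first_middle_last(line: str) -> str:
    """Reorder 'Last, First Middle' into 'First Middle Last'."""
    return NAME.sub(r"\2\3 \1", line)


for name in ["Plant, Robert A.", "Page, Jimmy", "Jones, John Paul"]:
    print(first_middle_last(name))
```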

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some logic behind it, because the theoretical regex aren't like the ones used in UNIX (or vi, are they different?) and I'm always struggling to figure them out.
This has to be done in the vi editor, as it is my main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(\w*\) means "match any number of word characters, and remember what you matched." You can then access it in the replacement with \1 (or \2 for the second, etc.)
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
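The "whole match plus suffix" idea can also be tried in Python, where \g<0> plays the role that & plays in Vim's replacement string. This is only an illustrative sketch with a hypothetical function name, not a substitute for the Vim command.

```python
import re


def add_struct_suffix(source: str) -> str:
    """Append _struct to every identifier that starts with get_."""
    # \g<0> stands for the entire matched text, like & in Vim.
    return re.sub(r"get_\w+", r"\g<0>_struct", source)


print(add_struct_suffix("x = get_num(a) + get_str(b)"))
print(add_struct_suffix("callback = &get_distance;"))
```

Like the Vim command, this is not idempotent: running it twice would produce names ending in _struct_struct.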

Most reliable split character

Update
If you were forced to use a single char on a split method, which char would be the most reliable?
Definition of reliable: a split character that is not part of the individual sub strings being split.
We currently use
public const char Separator = ((char)007);
I think this is the beep (BEL) character, if I am not mistaken.
Aside from 0x0, which may not be available (because of null-terminated strings, for example), the ASCII control characters between 0x1 and 0x1f are good candidates. The ASCII characters 0x1c-0x1f are even designed for such a thing and have the names File Separator, Group Separator, Record Separator, Unit Separator. However, they are forbidden in transport formats such as XML.
In that case, the characters from the unicode private use code points may be used.
One last option would be to use an escaping strategy, so that the separation character can be entered somehow anyway. However, this complicates the task quite a lot and you cannot use String.Split anymore.
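As a rough illustration of the control-character approach, the Python sketch below joins and splits fields on the ASCII Unit Separator (0x1f). The function names are hypothetical, and the explicit check is my own assumption about how to guard against data that already contains the separator.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII Unit Separator


def join_fields(fields):
    """Join fields with the unit separator, refusing fields that contain it."""
    for field in fields:
        if UNIT_SEPARATOR in field:
            raise ValueError("field contains the separator character")
    return UNIT_SEPARATOR.join(fields)


def split_fields(record):
    """Split a record produced by join_fields back into its fields."""
    return record.split(UNIT_SEPARATOR)


record = join_fields(["a,b", "c|d", "e f"])
print(split_fields(record))
```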
You can safely use whatever character you like as delimiter, if you escape the string so that you know that it doesn't contain that character.
Let's for example choose the character 'a' as delimiter. (I intentionally picked a usual character to show that any character can be used.)
Use the character 'b' as escape code. We replace any occurrence of 'a' with 'b1' and any occurrence of 'b' with 'b2':
private static string Escape(string s) {
return s.Replace("b", "b2").Replace("a", "b1");
}
Now, the string doesn't contain any 'a' characters, so you can put several of those strings together:
string msg = Escape("banana") + "a" + Escape("aardvark") + "a" + Escape("bark");
The string now looks like this:
b2b1nb1nb1ab1b1rdvb1rkab2b1rk
Now you can split the string on 'a' and get the individual parts:
b2b1nb1nb1
b1b1rdvb1rk
b2b1rk
To decode the parts you do the replacement backwards:
private static string Unescape(string s) {
return s.Replace("b1", "a").Replace("b2", "b");
}
So splitting the string and unencoding the parts is done like this:
string[] parts = msg.Split('a');
for (int i = 0; i < parts.Length; i++) {
parts[i] = Unescape(parts[i]);
}
Or using LINQ:
string[] parts = msg.Split('a').Select<string,string>(Unescape).ToArray();
If you choose a less common character as delimiter, there are of course fewer occurrences that will be escaped. The point is that the method makes sure that the character is safe to use as delimiter without making any assumptions about what characters exists in the data that you want to put in the string.
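The same escaping scheme ports directly to Python, which makes the round trip easy to try. This is a sketch of the scheme described above with hypothetical helper names (pack, unpack), using the same 'a' delimiter and 'b' escape code.

```python
DELIMITER = "a"


def escape(s: str) -> str:
    """Replace 'b' first, then 'a', so the result contains no 'a'."""
    return s.replace("b", "b2").replace("a", "b1")


def unescape(s: str) -> str:
    """Undo escape(): restore 'a' from 'b1', then 'b' from 'b2'."""
    return s.replace("b1", "a").replace("b2", "b")


def pack(parts):
    """Escape each part and join the results with the delimiter."""
    return DELIMITER.join(escape(p) for p in parts)


def unpack(msg: str):
    """Split on the delimiter and unescape each part."""
    return [unescape(p) for p in msg.split(DELIMITER)]


msg = pack(["banana", "aardvark", "bark"])
print(msg)
print(unpack(msg))
```

The order of the replacements matters: escaping 'a' before 'b' would turn the freshly inserted 'b1' into 'b21' and break the round trip.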
I usually prefer the '|' symbol as the split character. If you are not sure what the user will enter in the text, you can restrict the user from entering certain special characters and then choose the split character from among them.
It depends what you're splitting.
In most cases it's best to use split chars that are fairly commonly used, for instance
value, value, value
value|value|value
key=value;key=value;
key:value;key:value;
You can use quoted identifiers nicely with commas:
"value", "value", "value with , inside", "value"
I tend to use , first, then |, then if I can't use either of them I use the section-break char §
Note that on Windows you can type such characters with ALT+number (on the numeric keypad only), so § is ALT+21
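Quoted values with embedded commas are easiest to handle with a CSV parser rather than a plain split. The sketch below uses Python's standard csv module; the function name is hypothetical, and skipinitialspace=True is needed here because the example line has a space after each comma.

```python
import csv
import io


def parse_quoted_line(line: str):
    """Parse one comma-separated line whose values may be double-quoted."""
    reader = csv.reader(io.StringIO(line), skipinitialspace=True)
    return next(reader)


print(parse_quoted_line('"value", "value", "value with , inside", "value"'))
```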
\0 is a good split character. It's pretty hard (impossible?) to enter from keyboard and it makes logical sense.
\n is another good candidate in some contexts.
And of course, .Net strings are unicode, no need to limit yourself with the first 255. You can always use a rare Mongolian letter or some reserved or unused Unicode symbol.
There are overloads of String.Split that take string separators...
I'd personally say that it depends entirely on the situation; if you're writing a simple TCP/IP chat system, you obviously shouldn't use '\n' as the split character. But '\0' is a good character to use because users can't normally enter it!
First of all, in C# (or .NET), you can use more than one split characters in one split operation.
String.Split Method (Char[]) Reference here
An array of Unicode characters that delimit the substrings in this instance, an empty array that contains no delimiters, or null reference (Nothing in Visual Basic).
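For comparison, splitting on several delimiter characters at once looks like this in Python, where a regular-expression character class stands in for the character array that String.Split accepts. The helper name is hypothetical, and it assumes a non-empty string of delimiter characters.

```python
import re


def split_on_any(text: str, delimiters: str):
    """Split text on any single character found in delimiters (non-empty)."""
    # re.escape keeps characters such as ^, ] or - literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, text)


print(split_on_any("a,b|c;d", ",|;"))
```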
In my opinion, there's no MOST reliable split character, however some are more suitable than others.
Popular split characters like tab, comma, and pipe are good for viewing the unsplit string/line.
If it's only for storing/processing, the safer characters are probably those that are seldom used or those not easily entered from the keyboard.
It also depends on the usage context. E.g. if you are expecting the data to contain email addresses, "#" is a no-no.
Say we were to pick one from the ASCII set. There are quite a number to choose from, e.g. " ` ", " ^ " and some of the non-printable characters. Do beware of some characters though; not all are suitable. E.g. 0x00 might have adverse effects on some systems.
It depends very much on the context in which it's used. If you're talking about a very general delimiting character then I don't think there is a one-size-fits-all answer.
I find that the ASCII null character '\0' is often a good candidate, or you can go with nitzmahone's idea and use more than one character, then it can be as crazy as you want.
Alternatively, you can parse the input and escape any instances of your delimiting character.
The "|" pipe sign is mostly used when you are passing several arguments to a method that accepts just a single string parameter.
This is widely used in SQL Server stored procedures as well, where you need to pass an array as the parameter. Mostly it depends on the situation where you need it.

Resources