Special case in the Unicode String.StartsWith function


I have implemented a UTF-8 string class in C++ with case-insensitive comparison based on full case folding. By full case folding I mean a case-insensitive comparison in which Maße and MASSE are equal. The StartsWith(...) member function, which returns true if the string starts with the string passed as its argument, uses the same comparison function. There is one special case where I'm not sure what the correct behavior is:
String myString = "Maß";
bool result = myString.StartsWith("MAS", CaseSensitivity::CaseInsensitive);
With full case folding, Maß becomes mass and MAS becomes mas, so StartsWith should return true because mass starts with mas. The problem is that the match then consumes only half of the ß: the prefix ends in the middle of the two-character folding of a single code point. Is returning true correct here, or should I return true only when the match ends on a whole code point?
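To make the two candidate behaviours concrete, here is a minimal Python sketch (not the C++ class from the question). It relies on Python's built-in str.casefold, which applies full case folding. The function names starts_with_folded and starts_with_whole_code_points are hypothetical. The sketch does not settle which behaviour is correct; it only shows where they differ.

```python
def starts_with_folded(text, prefix):
    """Compare the fully folded strings; a match may end inside a folded code point."""
    return text.casefold().startswith(prefix.casefold())


def starts_with_whole_code_points(text, prefix):
    """Accept only matches that end exactly on a code point boundary of text."""
    target = prefix.casefold()
    if not target:
        return True
    folded = ""
    for ch in text:
        # Fold one code point at a time so we always know where boundaries fall.
        folded += ch.casefold()
        if len(folded) >= len(target):
            # Equal only if the prefix ends where this code point's folding ends.
            return folded == target
    return False


print(starts_with_folded("Maß", "MAS"))             # True: "mass" starts with "mas"
print(starts_with_whole_code_points("Maß", "MAS"))  # False: the match would split ß
print(starts_with_whole_code_points("Maße", "MASS"))  # True: ß is consumed whole
```

Neither function normalizes its input, so combining-character sequences are outside the scope of this sketch.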

Related

How to check if a string is alphanumeric?

I can use the occursin function, but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it. Is there a neat way of doing this in Julia?

I'm not sure your assumption about occursin is correct:
julia> occursin(r"[a-zA-Z]", "ABC123")
true
julia> occursin(r"[a-zA-Z]", "123")
false

but its haystack argument cannot be a regular expression, which means I have to pass the entire alphanumeric string to it.
If you mean its needle argument, it can be a Regex, for example:
julia> occursin(r"^[[:alnum:]]*$", "adf24asg24y")
true
julia> occursin(r"^[[:alnum:]]*$", "adf24asg2_4y")
false
This checks that the given haystack string is alphanumeric using the Unicode-aware character class [[:alnum:]], which you can think of as equivalent to [a-zA-Z\d], extended to non-English characters too. (As always with Unicode, a "perfect" solution involves more work and complication, but this takes you most of the way.)
If you do mean you want the haystack argument to be a Regex, it's not clear why you'd want that here, and also why "I have to pass the entire alphanumeric string to it" is a bad thing.

As has been noted, you can indeed use regexes with occursin, and it works well. But you can also roll your own version, quite simply:
isalphanumeric(c::AbstractChar) = isletter(c) || ('0' <= c <= '9')
isalphanumeric(str::AbstractString) = all(isalphanumeric, str)
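For comparison, the same roll-your-own idea can be sketched in Python, assuming the goal is "Unicode letters plus the ASCII digits 0 to 9". The name is_alphanumeric is hypothetical; str.isalpha is the built-in letter test. Like the Julia version above, it returns true for an empty string because all() over an empty sequence is true.

```python
def is_alphanumeric(text):
    """True if every character is a Unicode letter or an ASCII digit."""
    return all(ch.isalpha() or "0" <= ch <= "9" for ch in text)


print(is_alphanumeric("adf24asg24y"))   # True
print(is_alphanumeric("adf24asg2_4y"))  # False: "_" is neither a letter nor a digit
```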

Comparing the QML strings "10" and "9"

I tried to check in QML whether the string "10" is greater than "9", but console.log printed false:
console.log("10" > "9");
Console output:
qml: false
Why doesn't this work?

This is really a JavaScript question, because JavaScript is the language QML uses for expressions. You are comparing strings, not numbers.
I will quote the relevant part of a good explanation of string comparison in JavaScript and then present my own example:
To see whether a string is greater than another, JavaScript uses the so-called “dictionary” or “lexicographical” order. In other words, strings are compared letter-by-letter.
The algorithm to compare two strings is simple:
1. Compare the first character of both strings.
2. If the first character from the first string is greater (or less) than the other string’s, then the first string is greater (or less) than the second. We’re done.
3. Otherwise, if both strings’ first characters are the same, compare the second characters the same way.
4. Repeat until the end of either string.
5. If both strings end at the same length, then they are equal. Otherwise, the longer string is greater.
console.log("10" > "9"); // false, first character is smaller
console.log("9" > "8"); // true, first character is larger
console.log("9" > "08"); // true, first char. is larger
console.log("10" > "09"); // true, first char. is larger
console.log("100" > "11"); // false, second char. is smaller
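The quoted algorithm can be sketched in a few lines of Python. This is an illustration only: the function name string_greater is hypothetical, and it compares Unicode code points, whereas JavaScript compares UTF-16 code units. For the ASCII digits used here the two orders agree.

```python
def string_greater(left, right):
    """Return True if left sorts after right, comparing character by character."""
    for a, b in zip(left, right):
        if a != b:
            # The first differing character decides the result.
            return ord(a) > ord(b)
    # One string is a prefix of the other (or they are equal):
    # the longer string is the greater one.
    return len(left) > len(right)


print(string_greater("10", "9"))    # False: "1" is smaller than "9"
print(string_greater("10", "09"))   # True: "1" is larger than "0"
print(string_greater("100", "11"))  # False: second character "0" is smaller than "1"
```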

If you know the strings will always hold integers (for example, because they come from a variable), you can parse them first and the comparison will work correctly:
console.log(parseInt("10") > parseInt("9"));

You can use this workaround:
console.log("10"*1 > "9"*1); // true
If you use parseInt, take care with the radix parameter. The previous example passes no radix, so the interpreter falls back to its default behaviour: the string is treated as decimal unless it starts with 0x (hexadecimal). Engines older than ECMAScript 5 could also treat a leading zero as octal, so passing the radix explicitly, as in parseInt("10", 10), is the safe choice.
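The same "parse first, then compare, with an explicit base" idea looks like this in Python, shown only as an analogy to the JavaScript answers above. int(text, base) is a Python built-in; the helper name greater_as_integers is hypothetical.

```python
def greater_as_integers(left, right, base=10):
    """Parse both strings in the given base, then compare the numbers."""
    return int(left, base) > int(right, base)


print("10" > "9")                       # False: string comparison
print(greater_as_integers("10", "9"))   # True: numeric comparison
print(greater_as_integers("010", "9"))  # True: base 10 is explicit, so "010" is ten
```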

Is there a built-in in Python 3 that checks whether a character is a "basic" algebraic symbol?

I know the string methods str.isdigit, str.isdecimal and str.isnumeric.
I'm looking for a built-in method that checks if a character is algebraic, meaning that it can be found in a declaration of a decimal number.
The above mentioned methods return False for '-1' and '1.0'.
I can use isdigit to retrieve a positive integer from a string:
string = 'number=123'
number = ''.join([d for d in string if d.isdigit()]) # returns '123'
But that doesn't work for negative integers or floats.
Imagine a method called isnumber that works like this:
def isnumber(s):
    for c in s:
        if c not in '.+-0123456789':
            return False
    return True
string1 = 'number=-1'
string2 = 'number=0.1'
number1 = ''.join([d for d in string1 if isnumber(d)]) # returns '-1'
number2 = ''.join([d for d in string2 if isnumber(d)]) # returns '0.1'
The idea is to test against a set of "basic" algebraic characters. The string does not have to contain a valid Python number. It could also be an IP address like 255.255.0.1.
Does a handy built-in that works approximately like that exist?
If not, why not? It would be much more efficient than a Python function and very useful. I've seen a lot of examples on Stack Overflow that use str.isdigit() to retrieve a positive integer from a string. Is there a reason why there isn't a built-in like that, although there are three different methods that do almost the same thing?

No such function exists. There are a bunch of odd characters that can be part of number literals in Python, such as o, x and b in the prefix of integers of non-decimal bases, and e to introduce the exponential part of a float. I think those plus the hex digits (0-9 and A-F) and sign characters and the decimal point are all you need.
You can put together a string with the right characters yourself and test against it:
from string import hexdigits
num_literal_chars = hexdigits + "oxOX.+-"  # hexdigits already contains b, B, e and E
You will get a bunch of garbage, though, if you use it on text that mixes words and numbers:
string1 = "foo. bar. 0xDEADBEEF 10.0.0.1"
print("".join(c for c in string1 if c in num_literal_chars))
# prints "foo.ba.0xDEADBEEF10.0.0.1"
The fact that it gives you a bunch of junk is probably why no builtin function exists to do this. If you want to match a certain kind of number out of a string, write an appropriate regular expression to match that specific kind of number. Don't try to do it character-by-character, or try to match all the different kinds of Python numbers.
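As a sketch of the regular-expression approach recommended above, the two patterns below target one specific kind of number each: an optionally signed decimal, and a dotted quad that looks like an IPv4 address. The pattern and function names are hypothetical, and the IPv4 pattern is deliberately loose (it does not check that each part is at most 255).

```python
import re

# Optional sign, digits, optional fractional part.
DECIMAL = re.compile(r"[-+]?\d+(?:\.\d+)?")
# Four groups of one to three digits separated by dots.
IPV4_LIKE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_decimals(text):
    """Return every signed decimal number found in text."""
    return DECIMAL.findall(text)


def find_ipv4_like(text):
    """Return every dotted quad found in text."""
    return IPV4_LIKE.findall(text)


print(find_decimals("number=-1"))   # ['-1']
print(find_decimals("number=0.1"))  # ['0.1']
print(find_ipv4_like("foo. bar. 0xDEADBEEF 10.0.0.1"))  # ['10.0.0.1']
```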

Smalltalk: how to check whether a string contains only numbers?

I have some input fields where only numbers should be accepted; otherwise the user is alerted that the input was incorrect.
The input is a String when I read it in a callback.
Now I want to check whether the string (which SHOULD contain only numbers) actually DOES contain only numbers, but I didn't find an existing solution.
I tried:
theString isInteger
- is never true for the string
theString asNumber
- ignores letters, but I want a clear indication of whether the string contains letters
theString isNumber
- always false

In Squeak and Pharo, you have the message #isAllDigits that does exactly what you want:
'1233248539487523' isAllDigits "--> true"

You can use a regular expression to check that the string contains only numbers:
theString matchesRegex: '\d+'
or a more complex regular expression to also allow an optional sign and decimal point (Smalltalk string literals have no backslash escapes, so the backslashes are not doubled):
theString matchesRegex: '-?\d+(\.\d+)?'

Unfortunately, I could not find the messages isAllDigits or matchesRegex: in Cincom Smalltalk.
However, you could extract a word from the string and convert it to a number using asNumber.
If the returned value is 0 (zero), either the number really is 0 (which can be checked with an additional condition) or the string did not contain a number.

This should work in many Smalltalk dialects (detect:ifNone: is used because plain detect: signals an error when no element matches):
(aString detect: [:c | c isDigit not] ifNone: [nil]) isNil ifTrue: [ "it's a number" ].
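For readers who want to try the two checks discussed in these answers outside Smalltalk, here is a minimal Python sketch: one digits-only test and one regular-expression test that also allows an optional sign and decimal point. Both function names are hypothetical, and treating the empty string as "not a number" is a choice made for this sketch.

```python
import re


def is_all_digits(text):
    """True if text is non-empty and consists only of the ASCII digits 0-9."""
    return text != "" and all("0" <= ch <= "9" for ch in text)


# Optional minus sign, digits, optional fractional part.
SIGNED_DECIMAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def is_signed_decimal(text):
    """True if the whole string is a decimal number such as 12, -12 or 12.5."""
    return SIGNED_DECIMAL.fullmatch(text) is not None


print(is_all_digits("1233248539487523"))  # True
print(is_all_digits("12a"))               # False
print(is_signed_decimal("-12.5"))         # True
```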

Checking if all letters in a string (from any major spoken language) are upper-case

I simply want to check if all the letters that occur in a string are upper-case (if they have lower- and upper-case variants). Tcl's built-in procs don't behave quite as desired, e.g.,
string is upper "123A"
returns false, but I would want it to return true. I would also want it to return true if the A were replaced with, say, an upper-case Cyrillic letter, or with a letter from another widely used script that has no case distinction. I could simply filter out all non-letters from the string, but I don't think that is so simple when handling letters from languages other than English.

In this case, you don't want string is upper as that checks if the string is just upper case letters. (Numbers aren't letters.)
Instead, you want to do:
set str "123A"
if {$str eq [string toupper $str]} {
    # It's upper-case by your definition...
}
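The same "compare the string with its upper-cased form" technique can be sketched in Python, whose str.upper is Unicode-aware. The function name all_letters_upper is hypothetical. Characters without a case distinction (digits, punctuation, caseless scripts) are unchanged by upper(), so they never make the test fail.

```python
def all_letters_upper(text):
    """True if upper-casing the text changes nothing."""
    return text == text.upper()


print(all_letters_upper("123A"))    # True
print(all_letters_upper("123a"))    # False
print(all_letters_upper("ПРИВЕТ"))  # True: upper-case Cyrillic
print(all_letters_upper("日本語"))   # True: no case distinction
```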
