String compare: Comparing 'zürich' and 'zurich' results in -1

I'm trying to do a string compare for 'zürich' and 'zurich',
something like this:
int compareResult = String.Compare("zürich", "zurich");
What happens is that it returns -1, which causes a problem because I use compareResult in an if-else later.
Can someone point me in the right direction on why this happens? Do I need to clean up "zürich" before comparing it, or is it something else?

You are using the method correctly, but the strings really are different: 'ü' and 'u' are different characters, so String.Compare does not treat them as equal.
So, to make the comparison work the way you want, you need to:
decide whether you want every comparison that involves ü and other "special" Latin characters to treat them as if they were the plain characters,
i.e. every time it sees ü, it treats it as a "u".
If so, you need to pre-process both strings and replace all the special characters with plain ones.
There is another thread about it here:
How can I remove accents on a string?
Hope it helped.
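The pre-processing step described above ("replace all special chars with regular ones") can be sketched as follows. This is written in Python for illustration, not C#, and `strip_accents` and `compare_ignoring_accents` are hypothetical helper names. It assumes that removing combining marks after Unicode NFD decomposition is an acceptable definition of "plain character". Letters that do not decompose (for example ß or ø) are left unchanged, and the final comparison is plain code-point order, not culture-aware.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. "zürich" -> "zurich"."""
    # NFD decomposes "ü" into "u" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def compare_ignoring_accents(a: str, b: str) -> int:
    """Three-way compare (-1, 0 or 1) after stripping accents from both strings."""
    x, y = strip_accents(a), strip_accents(b)
    return (x > y) - (x < y)


print(compare_ignoring_accents("zürich", "zurich"))  # 0
```

Because both strings are normalized the same way before comparing, the result can safely drive an if-else that expects 0 for "same city name".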

Related

Split only when quotes contain 3 characters or more

I'm trying to split a given string wherever two quotes enclose at least 3 characters (I also have to split whenever . or , appears). So, something like hello'example'hello,cat should return [hello;example;hello;cat].
I came up with:
re.split(r"'(...+)'|\.|,", "hello'example'hello,cat")
This works fine for the quotes, but whenever it splits on . or , this happens:
['hello', 'example', 'hello', None, 'cat']
I found out that the capture group is what causes it (the None in the middle of the list), but it is the only way I know to keep the quoted content.
Please keep in mind that I have to do as few computations as possible because the program has to work with huge files. Also, I'm not very experienced with Python, so sorry if I did something obviously wrong.
Try just:
re.split(r"'|\.|,", "hello'example'hello,cat")
(Note that this splits on every single quote, so it no longer enforces the 3-character minimum.)
It's tricky because the opening quote and the closing quote are the exact same character. I think you'd have to use a negative lookbehind to exclude any single quote that is preceded by another single quote plus 0, 1 or 2 characters. In addition, you'd have to use a positive lookahead. This pattern works in JavaScript:
re.split(r"(?<!'(.?|..))'(?=[^']{3,})|\.|,", "hello'example'hello,cat")
But Python's built-in re module doesn't support variable-length lookbehinds, so it rejects this pattern. Also, this won't work if there is a lone single quote (apostrophe).
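Another way to deal with the None entries from the question is to keep the capture group and simply drop the falsy items afterwards. The sketch below does that. It is an assumption-laden variant rather than either answer's method: it swaps the question's `(...+)` for `[^']{3,}` (3 or more non-quote characters between the quotes), and the filter also removes empty strings produced by adjacent separators such as ",,". `SPLITTER` and `split_fields` are hypothetical names.

```python
import re

# Assumption: a quoted field is ', then 3+ non-quote characters, then ';
# "." and "," are plain separators. Compiled once so it can be reused per line.
SPLITTER = re.compile(r"'([^']{3,})'|[.,]")


def split_fields(text: str) -> list:
    # re.split includes the capture group for every separator it matches:
    # the quoted text for a quoted field, None when [.,] matched.
    # "if part" drops those None values (and empty strings from ",,").
    return [part for part in SPLITTER.split(text) if part]


print(split_fields("hello'example'hello,cat"))  # ['hello', 'example', 'hello', 'cat']
```

For huge files, calling `split_fields` on one line at a time avoids holding the whole file in memory, and the pattern is compiled only once.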

Nodejs equivalent of c sscanf

I need a function that behaves like C's sscanf.
For example, suppose we have a format string that looks like this (the function I'm looking for doesn't have to work exactly like this, just something similar):
"This is normal text that has to exactly match, but here is a ${var}"
and have it return (or modify) a variable that looks like
{'var': <whatever was there>}
After researching this for a while, the only thing I could find was scanf, but that takes input from stdin, not from a string.
I am aware that there is a regex solution for this, but I'm looking for a function that does it without regex (regex is slow). However, if there is no other solution, I will accept a regex solution.
In most languages that have regular expressions built in, the normal solution for this is to use regular expressions.
If you're not used to regular expressions or don't like them, I'm sorry. Most of the programming world assumes that knowledge of regular expressions is mandatory.
In any case, the usual solution is String.prototype.match:
let text = get_string_to_scan();
let match = text.match(/This is normal text that has to exactly match, but here is a (.+)/);
if (match) { // match is null if no match is found
// The result you want is in match[1]
console.log('value of var is:', match[1]);
}
The pattern you put in your capture group (the (..) part) depends on what you want. The code above captures anything at all, including spaces and special characters.
If you just want to capture a "word", that is, letters, digits and underscores with no spaces or punctuation, you can use (\w+):
text.match(/This is normal text that has to exactly match, but here is a (\w+)/)
If you want to capture a word made only of letters, with no digits, you can use ([a-zA-Z]+):
text.match(/This is normal text that has to exactly match, but here is a ([a-zA-Z]+)/)
The flexibility of regular expressions is why other methods of string scanning are usually not provided in languages that have had regular expressions built in since the beginning. But of course, flexibility comes with complexity.
Do you mean for ${var} to act as a placeholder? If so, you could do it by replacing the double quotes with backticks:
console.log(`This is normal text that has to exactly match, but here is a ${"whatever was there"}`)
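Since the asker wanted to avoid regex, here is a minimal sketch of a template scanner that uses only plain string searching. It is written in Python for illustration (a JavaScript port would use `startsWith`, `endsWith` and `indexOf`). `parse_template` and `scan` are hypothetical helpers, not an existing library function, and they support only `${name}` placeholders. Design assumption: each placeholder in the middle takes the shortest text up to the next literal part, while the last placeholder takes everything up to the final literal part. This is closer to a non-greedy regex than to the greedy `(.+)` used above.

```python
from typing import Dict, List, Optional, Tuple


def parse_template(template: str) -> Tuple[List[str], List[str]]:
    """Split a template into literal parts and ${name} placeholder names.

    The result always has len(literals) == len(names) + 1.
    """
    literals: List[str] = []
    names: List[str] = []
    pos = 0
    while True:
        start = template.find("${", pos)
        if start == -1:
            literals.append(template[pos:])
            return literals, names
        end = template.find("}", start + 2)
        if end == -1:
            raise ValueError("unterminated ${ in template")
        literals.append(template[pos:start])
        names.append(template[start + 2:end])
        pos = end + 1


def scan(template: str, text: str) -> Optional[Dict[str, str]]:
    """Match text against template without regex; return None if it does not match."""
    literals, names = parse_template(template)
    if not names:
        return {} if text == template else None
    head, tail = literals[0], literals[-1]
    # The text must start with the first literal and end with the last one.
    if not text.startswith(head) or not text.endswith(tail):
        return None
    stop = len(text) - len(tail)  # where the last placeholder's value ends
    if stop < len(head):
        return None  # head and tail would overlap
    result: Dict[str, str] = {}
    i = len(head)
    for k, name in enumerate(names):
        if k == len(names) - 1:
            # Last placeholder: everything up to the final literal.
            result[name] = text[i:stop]
        else:
            sep = literals[k + 1]
            if not sep:
                raise ValueError("adjacent placeholders are ambiguous")
            # Middle placeholder: shortest run up to the next literal.
            j = text.find(sep, i, stop)
            if j == -1:
                return None
            result[name] = text[i:j]
            i = j + len(sep)
    return result


template = "This is normal text that has to exactly match, but here is a ${var}"
print(scan(template, "This is normal text that has to exactly match, but here is a 42"))
# {'var': '42'}
```

Whether this is actually faster than a regex in a given runtime is not guaranteed. Measure before relying on it.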

Function that sorts words with diacritics in Excel

I have a task to create a reverse-alphabetized list in Excel. I thought it would be easy: I created a function that writes each word backwards and sorted the list by that. It would work... if my language were English. But my language is Slovak, which uses a bunch of characters with diacritics, like á, ä, ô, š etc., and syllables containing these letters should be grouped. For example, the words strany, hrany, planý, plány, vraný, vrany should be sorted in the order hrany, strany, vrany, plány, planý, vraný. Instead, they are sorted in the order plány, planý, hrany, strany, vrany, vraný.
I thought that switching the language would be enough, but it seems all collations sort this way. I have tried switching from ISO 8859-2 to Unicode and several other encodings, but that made no difference either.
So my question is: is there any encoding + locale setting in Windows 10 that will do it? And if not, is it possible to do it through a VBA function?
Thanks for any ideas.
I have solved this problem myself with a pretty simple solution:
1. Get the hex codes of the characters.
2. Translate them into a unique code containing only ASCII characters (a = aa, á = ab, ...).
3. Sort by this translated string.
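The three steps above can be sketched in Python (the answerer worked in Excel/VBA; this is only an illustration of the same idea). The letter order in `SLOVAK_ORDER` is an assumption: every accented letter is treated as a separate letter placed directly after its base letter, which reproduces the order the asker wanted. The Slovak digraphs ch, dz and dž are not handled. `CODE` and `reverse_sort_key` are hypothetical names.

```python
import unicodedata

# Assumption: each accented letter is a separate letter directly after its
# base letter. Digraphs (ch, dz, dž) are NOT handled in this sketch.
SLOVAK_ORDER = "aáäbcčdďeéfghiíjklĺľmnňoóôpqrŕsštťuúvwxyýzž"

# Give every letter a two-letter ASCII code: a -> "aa", á -> "ab", ä -> "ac", ...
CODE = {
    ch: chr(ord("a") + i // 26) + chr(ord("a") + i % 26)
    for i, ch in enumerate(SLOVAK_ORDER)
}


def reverse_sort_key(word: str) -> str:
    """Sort key by word ending: reverse the word, then encode each letter."""
    # NFC keeps "á" as a single code point so it can be looked up in CODE.
    letters = unicodedata.normalize("NFC", word.lower())
    # Characters outside SLOVAK_ORDER get "zz" and sort after every letter.
    return "".join(CODE.get(ch, "zz") for ch in reversed(letters))


words = ["strany", "hrany", "planý", "plány", "vraný", "vrany"]
print(sorted(words, key=reverse_sort_key))
# ['hrany', 'strany', 'vrany', 'plány', 'planý', 'vraný']
```

Fixed-width two-letter codes make a plain string comparison of the keys behave like a letter-by-letter comparison in the assumed alphabet, so any ordinary ASCII sort (including Excel's) gives the intended order.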

How to capture a string between parentheses?

str = "fa, (captured)[asd] asf, 31"
for word in str:gmatch("\(%a+\)") do
print(word)
end
Hi! I want to capture a word between parentheses.
My code should print the string "captured", but I get this syntax error:
lua: /home/casey/Desktop/test.lua:3: invalid escape sequence near '\('
Of course, I could just find the positions of the parentheses and use the string.sub function,
but I prefer simple code.
Square brackets gave me a similar error, too.
The escape character in Lua patterns is %, not \. So use this:
word=str:match("%((%a+)%)")
If you only need one match, there is no need for a gmatch loop.
To capture the string in square brackets, use a similar pattern:
word=str:match("%[(%a+)%]")
If the captured string is not entirely composed of letters, use .- instead of %a+.
lhf's answer likely gives you what you need, but I'd like to mention one more option that I feel is underused and may work for you as well. One issue with using %((%a+)%) is that it doesn't work for nested parentheses: if you apply it to something like "(text(more)text)", you'll get "more" even though you may expect "text(more)text". Note that you can't fix it by matching up to the first closing parenthesis with %(([^%)]+)%), as that will give you "text(more".
However, you can use the %bxy pattern item, which matches a balanced run of x and y characters and will return "(text(more)text)" in this case (you'd need something like (%b()) to capture it). Again, this may be overkill for your case, but it's useful to keep in mind and may help someone else who comes across this problem.
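Python's `re` module has no counterpart to Lua's %bxy, so the following sketch shows the balancing idea as an explicit depth counter. `find_balanced` is a hypothetical helper. It starts at an opening character, adds one for each further opening character, subtracts one for each closing character, and returns the substring as soon as the count is back to zero. Like `str:match`, it returns the first such run in the string.

```python
from typing import Optional


def find_balanced(text: str, open_ch: str = "(", close_ch: str = ")") -> Optional[str]:
    """Return the first balanced open_ch...close_ch substring, similar to Lua's %b()."""
    for start, ch in enumerate(text):
        if ch != open_ch:
            continue
        depth = 1
        for end in range(start + 1, len(text)):
            # Check close_ch before open_ch so identical delimiters
            # (e.g. quotes) still close the run.
            if text[end] == close_ch:
                depth -= 1
                if depth == 0:
                    return text[start:end + 1]
            elif text[end] == open_ch:
                depth += 1
    return None  # no balanced run anywhere in the text


print(find_balanced("(text(more)text)"))  # (text(more)text)
print(find_balanced("fa, (captured)[asd] asf, 31", "[", "]"))  # [asd]
```

If no balanced run begins at an opening character, the search moves on to the next one, so "((a)" yields "(a)".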

Add parentheses to a Vim string

I'm having a bit of trouble using parentheses in a Vim substitution. I just need to add a set of parentheses around 3 digits, but I can't work out where to place them correctly. For example, I need to place them around the area code of a phone number such as 2015551212.
Right now I have a substitution that separates the numbers and puts a hyphen between them, giving 201 555-1212. So I just need the parentheses. The final result should look like: (201) 555-1212
The command I have so far is this: s/\(\d\{3}\)\(\d\{3}\)/\1 \2-/g
How might I go about doing this?
Thanks
Just add the parentheses around \1 in your replacement:
s/\(\d\{3\}\)\(\d\{3\}\)/(\1) \2-/g
If you want to go in reverse, and change "(800) 555-1212" to "8005551212", you can use something like this:
s/(\(\d\d\d\))\ \(\d\d\d\)-\(\d\d\d\d\)/\1\2\3/g
Instead of the \d\d\d, you could use \d\{3\}, but that is more trouble to type.
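For comparison, here is a sketch of the same two substitutions with Python's `re.sub` (the function names are hypothetical). The main difference is escaping. In Python, `(` starts a group and `\(` is a literal parenthesis. In Vim's default magic mode it is the other way round, which is why the Vim commands above use `\(...\)` for groups and bare `(` for the literal parenthesis.

```python
import re


def add_parens(number: str) -> str:
    # Same idea as the Vim command s/\(\d\{3\}\)\(\d\{3\}\)/(\1) \2-/g
    return re.sub(r"(\d{3})(\d{3})", r"(\1) \2-", number)


def strip_format(formatted: str) -> str:
    # Reverse direction: "(800) 555-1212" -> "8005551212"
    return re.sub(r"\((\d{3})\) (\d{3})-(\d{4})", r"\1\2\3", formatted)


print(add_parens("2015551212"))        # (201) 555-1212
print(strip_format("(800) 555-1212"))  # 8005551212
```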
