Function that sorts words by punctuation in excel

I have a task to create a reverse-alphabetized list in Excel. I thought it would be easy: I wrote a function that spells each word backwards and sorted the list by that. It would work if my language were English. But my language is Slovak, which uses a number of letters with diacritics, such as á, ä, ô and š, and syllables containing these letters should be grouped together. For example, the words strany, hrany, planý, plány, vraný, vrany should be sorted in the order hrany, strany, vrany, plány, planý, vraný. Instead, they are sorted in the order plány, planý, hrany, strany, vrany, vraný.
I thought that switching the language would be enough, but it seems all collations sort this way. I also tried switching from ISO 8859-2 to Unicode and several other encodings, but that made no difference either.
So my question is: is there any encoding and locale setting in Windows 10 that will do this? And if not, is it possible to do it with a VBA function?
Thanks for any ideas.

I solved this problem myself with a pretty simple solution:
1. Get the hex codes of the characters.
2. Translate them into a unique code containing only ASCII characters (a = aa, á = ab...).
3. Sort this translated row.
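A minimal sketch of the steps above, written in Python rather than VBA. The table VARIANTS, the helper sort_key and the two-character scheme (base letter plus a rank letter) are illustrative assumptions; the answer itself only gives "a = aa, á = ab". Python compares characters directly, so the hex-code lookup of step 1 is implicit. The table covers single Slovak letters only and ignores the digraphs ch, dz and dž, and the key is built from the word read backwards because the question asks for a reverse-alphabetized list.

```python
import unicodedata

# Hypothetical table: each base letter followed by its accented variants,
# in the order they should sort (an assumption; adjust as needed).
VARIANTS = {
    "a": "aáä", "c": "cč", "d": "dď", "e": "eé", "i": "ií",
    "l": "lĺľ", "n": "nň", "o": "oóô", "r": "rŕ", "s": "sš",
    "t": "tť", "u": "uú", "y": "yý", "z": "zž",
}

# Step 2: map every letter to an ASCII-only pair, e.g. a -> "aa", á -> "ab".
ASCII_KEY = {
    variant: base + chr(ord("a") + rank)
    for base, variants in VARIANTS.items()
    for rank, variant in enumerate(variants)
}


def sort_key(word):
    """Build an ASCII-only key from the word read backwards."""
    # NFC makes sure accented letters arrive as single precomposed characters.
    text = unicodedata.normalize("NFC", word.lower())
    # Letters outside the table get the plain suffix "a".
    return "".join(ASCII_KEY.get(ch, ch + "a") for ch in reversed(text))


words = ["strany", "hrany", "planý", "plány", "vraný", "vrany"]
# Step 3: sort by the translated key.
print(sorted(words, key=sort_key))
```

For the six words from the question this yields hrany, strany, vrany, plány, planý, vraný, which is the order the question asks for.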

Related

Arabic words aren't displayed properly in DrRacket

I work with Arabic-script texts in DrRacket, but the characters stand separate; they should be joined to each other.
The second problem is that DrRacket reads them left-to-right, as in Latin script.
When I post them here to show how they look in DrRacket, they get joined and their order gets reversed (which is good, since Arabic script is written right-to-left).
(explode "اعمالنده")
(list "اعمالنده" "اولا")
(list "آبار — آوار كلمه+سنه نظر بیوریله")
This is how it looks in DrRacket.
Why do the characters stand separately, and why isn't their order right-to-left as it should be?

How to get unicode of characters from 55296 to 56319 in Excel

I generated a list of characters in Excel from character codes 1 to 66535.
I am trying to get the code points back with the UNICODE function. However, Excel returns #VALUE! for character codes from 55296 to 56319.
Please advise whether any other function can return the proper code points.
Thank you.

The range you list is a special range in Unicode: the surrogates.
They do have Unicode code points, but you cannot put them in text on their own. Windows uses UTF-16 (originally UCS-2) as its internal encoding, so to represent a code point above 65535 it uses two surrogates: one in the range 0xD800-0xDBFF (high surrogate, which is exactly your 55296 to 56319) followed by one in the range 0xDC00-0xDFFF (low surrogate). By combining these two, you can reach all Unicode code points.
So you should never have a single surrogate, or a mismatched one (e.g. a high surrogate not followed by a low surrogate, or a low surrogate not preceded by a high surrogate).
So just skip such codes. Or better, use them correctly, in pairs, to get characters above 65535.
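The pairing described above follows the standard UTF-16 arithmetic. Here is a small Python sketch of it; the function names are made up for illustration, and this is not an Excel formula.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # 55296..56319
LOW_START, LOW_END = 0xDC00, 0xDFFF    # 56320..57343


def combine_surrogates(high, low):
    """Return the code point encoded by a high/low surrogate pair."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate")
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def split_code_point(code_point):
    """Return the (high, low) surrogate pair for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
```

A lone value from either range has no character of its own, which is consistent with UNICODE returning #VALUE! for 55296 to 56319.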
Note: you cannot get every Unicode character from a single code point. Many characters require combining several code points (there is a whole category of "combining characters" in Unicode). E.g. the zero with an oblique line is encoded as two Unicode characters: the normal zero and a variation selector. Also, precomposed accented characters are very limited (and often with just one accent per character). And that is without going into more complex scripts.
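To illustrate the note, this short Python sketch lists the code points behind two such sequences. The helper name code_points is hypothetical.

```python
import unicodedata


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Digit zero followed by a variation selector: two code points.
slashed_zero = "0\uFE00"
# Precomposed é, decomposed (NFD) into a base letter and a combining accent.
e_acute = unicodedata.normalize("NFD", "\u00E9")

print(code_points(slashed_zero))  # ['U+0030', 'U+FE00']
print(code_points(e_acute))       # ['U+0065', 'U+0301']
```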

Using the len() function in python I am somehow getting the wrong length

So I am learning to code and I am an absolute beginner here. Beginner as in I've been coding for maybe like 5 hours total. Not including the 4.5 hour video I watched at 1.5 times speed for my intro to python.
I am trying to make a hangman game and was messing around with it. I have a separate file full of words that the game randomly draws from, and then the user tries to guess the word. My problem is that after selecting the "hidden_word", len(hidden_word) gives me the wrong length.
They are all single words, and in the text file there are no spaces before or after any of the words. Each word is on a new line, and every word selected has a length 1 greater than it should be. For instance, the word Jinx apparently has 5 letters.
The file is literally just this list, but 45 lines.
Awkward
Bagpipes
Banjo
Bungler
...
My code:
from random import randint
# open words file and choose a hidden word
dictionary = open("words.txt", "r")
words = dictionary.readlines()
hidden_word = words[randint(0, 45)]
dictionary.close()
print(hidden_word)
It always gives me a length of 1 longer than it should be.
Thank you for the help.

As @jonrsharpe pointed out in the comments, the newline character \n counts as a character, and it is included in each item returned by .readlines().
To accurately get the length, you're going to want to strip the whitespace (spaces and newlines, etc.) before trying to find the length:
hidden_word = hidden_word.strip()
print(len(hidden_word))
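Putting this together, a minimal sketch that strips every line up front. The helper names load_words and pick_hidden_word are made up for illustration, and the sample lines stand in for the contents of words.txt. Note also that randint(0, 45) includes both endpoints, so with a 45-line file it can produce index 45, which is out of range; random.choice avoids that.

```python
import random


def load_words(lines):
    """Return the non-empty words from an iterable of lines, newline removed."""
    return [line.strip() for line in lines if line.strip()]


def pick_hidden_word(words):
    """Pick one word at random without computing an index by hand."""
    return random.choice(words)


# Stand-in for the lines of words.txt; each one ends with "\n".
sample_lines = ["Awkward\n", "Bagpipes\n", "Banjo\n", "Jinx\n"]
words = load_words(sample_lines)
hidden_word = pick_hidden_word(words)
print(hidden_word, len(hidden_word))
```

With a real file, the same function accepts the open file object directly: with open("words.txt") as f: words = load_words(f).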

Randomize characters in string?

Unfortunately, I am struggling a bit with what I thought would be an easy task.
I am writing an autocorrect script in AHK for words I frequently mistype. Instead of writing out every possible misspelling of every word, I have written a list of the words I often mistype, in their correct form. I now want to take each item from that list, leave the first character as it is, and randomize two consecutive letters in every possible way, excluding results that are already in the list.
So in pseudo code it would be:
For each word in correctWords
{
    firstLetter = split to chararray(0)
    newWord = split to chararray(>0)
    randomized = firstLetter + newWord.randomizeTwoLetters()
    if (!correctWords.Contains(randomized))
        correctWords.AddToList(randomized)
}
The part I struggle with is obviously randomizeTwoLetters(). How would you go about that?
I hope you can help me. Thank you!

Pulled from the AutoHotkey Help File:
AutoCorrect 4700 Common Words
If you still want to do this yourself (in C#?), look into algorithms such as "Generating a random sequence with no repeats" for starters.
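One possible reading of the question is that "randomizing two consecutive letters in every possible way" means swapping each adjacent pair of letters while keeping the first letter fixed. A Python sketch under that assumption follows. It is not AHK, and the function names adjacent_swaps and misspellings are hypothetical.

```python
def adjacent_swaps(word):
    """Return every variant with one adjacent pair swapped, first letter fixed."""
    variants = []
    for i in range(1, len(word) - 1):
        # Swapping two identical letters would reproduce the original word.
        if word[i] != word[i + 1]:
            variants.append(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return variants


def misspellings(correct_words):
    """Map each correct word to its swapped variants that are not real words."""
    known = set(correct_words)
    return {
        word: [v for v in adjacent_swaps(word) if v not in known]
        for word in correct_words
    }


print(adjacent_swaps("word"))          # ['wrod', 'wodr']
print(misspellings(["form", "from"]))  # {'form': ['fomr'], 'from': ['frmo']}
```

Each resulting pair could then be written out as a hotstring in the AHK script.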

String compare : Comparing 'zürich' and 'zurich' results in -1

I'm trying to do a string compare for 'zürich' and 'zurich'
Something like this:
int compareResult = String.Compare(zürich, zurich);
It returns -1, which causes a problem because I'm using compareResult in an if-else later.
Can someone point me in the right direction on why this happens? Do I need to clean "zürich" before comparing, or is it something else?

You are using the method correctly, but the strings really are different.
To make the comparison behave the way you want, you need to decide whether every comparison should treat ü and other "special" Latin characters as their plain counterparts, i.e. every time it sees ü, it treats it as "u".
If so, you need to pre-process both strings and replace all special characters with regular ones.
There is another thread about it here:
How can I remove accents on a string?
Hope this helps.
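The pre-processing idea described above can be sketched as follows in Python, not C#. It decomposes each character and drops the combining marks; the helper names strip_accents and same_ignoring_accents are made up for illustration. Letters that have no decomposition, such as ø or ß, pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_ignoring_accents(left, right):
    """Compare two strings after removing accents from both."""
    return strip_accents(left) == strip_accents(right)


print(strip_accents("zürich"))                    # zurich
print(same_ignoring_accents("zürich", "zurich"))  # True
```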
