Algorithm for finding what is misspelled in a word

I'm dyslexic, so I'm trying to make a console application to test my spelling. When I misspell a word, I print the correction. The issue is that if I spell "matter" as "mater", I won't notice at first glance that I'm missing a t. So I want to highlight certain types of errors. Assume that the computer knows the word you are trying to spell. Is there an algorithm that can identify missing characters, wrong characters or extra characters, e.g. an extra t in "watter" instead of "water"? The number of missing or extra characters is arbitrary. I should just realise that something is missing or extra, and what it is.
Ideally, it would do something like this.
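A standard technique for this (a suggestion, not something taken from the thread) is to compute the Levenshtein edit distance between the typed word and the target word, then walk back through the distance table to recover one cheapest sequence of edits: an insertion means a character is missing, a deletion means an extra character was typed, and a substitution means a wrong character. The Python sketch below compares plain strings character by character; the function name `spelling_diff` and its `(kind, typed_char, target_char)` output format are illustrative choices.

```python
def spelling_diff(typed: str, target: str) -> list[tuple[str, str, str]]:
    """Align `typed` with `target` and list the errors as (kind, typed_char, target_char).

    kind is "missing" (in target, not typed), "extra" (typed, not in target)
    or "wrong" (typed_char was used where target_char belongs).
    """
    n, m = len(typed), len(target)
    # dist[i][j] = edit distance between typed[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if typed[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # typed[i-1] is extra
                dist[i][j - 1] + 1,         # target[j-1] is missing
                dist[i - 1][j - 1] + cost,  # match or wrong character
            )

    # Walk back from the bottom-right corner to recover one cheapest alignment.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (typed[i - 1] != target[j - 1]):
            if typed[i - 1] != target[j - 1]:
                errors.append(("wrong", typed[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors.append(("extra", typed[i - 1], ""))
            i -= 1
        else:
            errors.append(("missing", "", target[j - 1]))
            j -= 1
    errors.reverse()
    return errors
```

For example, `spelling_diff("mater", "matter")` returns `[("missing", "", "t")]` and `spelling_diff("watter", "water")` returns `[("extra", "t", "")]`. When a doubled letter is involved, several alignments are equally cheap and the backtrace simply picks one. Python's `difflib.SequenceMatcher.get_opcodes()` gives a similar insert/delete/replace breakdown, but it does not guarantee a minimal number of edits.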

Related

How to read a CSV file that uses dots as thousands separators in Excel

So, I've got this huge CSV file containing numbers that use "." as the thousands separator (I guess this is how they do it in Germany). Some of them are negative.
I have to check that the sum equals a certain amount, just to be sure they sent me the correct data. When I simply replace the dots with nothing, I get an incorrect total (close to the total they sent me, but still wrong). And since I can't review the whole file to find whether something is wrong somewhere, I can't be certain whether the issue lies with the data or with something I didn't expect (for example, a line that uses "." as a decimal separator, or more exotic cases I can't even imagine).
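One way to find the rows that break the assumed format, rather than summing blindly, is to validate every value against a strict pattern before converting it. The sketch below is in Python and assumes the German convention of "." for thousands and "," for decimals (so "12,5" means twelve and a half); the names `parse_german_number` and `sum_column` are hypothetical. It uses `Decimal` so the total is not distorted by floating-point rounding.

```python
import re
from decimal import Decimal

# Assumed format: optional minus, groups of three digits separated by ".",
# optional "," decimal part, e.g. "-1.234.567,89".
GERMAN_NUMBER = re.compile(r"-?\d{1,3}(?:\.\d{3})*(?:,\d+)?")


def parse_german_number(text: str) -> Decimal:
    """Convert a German-formatted number to Decimal; raise ValueError if it does not fit the pattern."""
    s = text.strip()
    if not GERMAN_NUMBER.fullmatch(s):
        raise ValueError(f"unexpected number format: {text!r}")
    return Decimal(s.replace(".", "").replace(",", "."))


def sum_column(values):
    """Sum the values that parse cleanly and collect the ones that do not."""
    total = Decimal(0)
    suspicious = []
    for value in values:
        try:
            total += parse_german_number(value)
        except ValueError:
            suspicious.append(value)
    return total, suspicious
```

For example, `sum_column(["1.000.000", "-2.500", "12,5", "3.5"])` returns a total of `Decimal("997512.5")` and flags `"3.5"` as suspicious, because a dot followed by fewer than three digits cannot be a thousands separator. The pattern is deliberately strict: values written without grouping, such as "1234", are flagged too, so loosen it if the file contains them.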
I'm pretty sure there must be a way to make Excel understand that "." is a thousands separator, but so far I haven't managed to get a custom format to do what I want.
Well, that is only half true: I can make it display 1.000.000 instead of 1000000, but I can't make it read 1.000.000 as 1000000.
I also tried my luck at changing the separator in File > Options > Advanced > Use system separators, but it doesn't seem to work at all (when I change it, nothing changes; maybe this feature is bugged).
NB: I'm French and my default separator is a space. Although I could switch the language to English, I can't switch it to German because the language pack is not installed, and I can't install anything on my work computer ("security and blah blah blah").
Thank you for your kind help.
Regards.

Unicode character order problem when text is displayed

I am working on an application that converts text into other characters from the extended ASCII character set, which are then displayed in a custom font.
The program essentially parses the input string with a regex, locates the standard characters and converts them, returning a string with the modified text, which displays correctly when viewed in the correct font.
Every now and again, the function returns a string whose characters are displayed in the wrong order, almost as if they were corrupted or some data were missing from the Unicode double-width spacing. I have examined the binary output and the hex data, and inspected the data in the function before I return it, and everything looks OK, but every once in a while something goes wrong and I can't quite put my finger on it.
To see an example of what I mean when I say the order is weird, take a look at the following piece of converted output from the program and try to highlight it with your mouse. You will see that it doesn't highlight in the order you would expect from how it appears.
Has anyone seen anything like this before, and does anyone have an idea of what is going on?
ך┼♫יἯ╡П♪דἰ
You are mixing various Unicode characters with different LTR/RTL characteristics.
LTR means "left-to-right" and is the direction in which English (and many other Western languages) is written.
RTL is "right-to-left" and is used mostly by Arabic and Hebrew (as well as several other scripts).
By default, when rendering Unicode text, the engine tries to use the directionality of the characters to figure out which direction a given part of the text should go. Normally that works fine, because Hebrew words contain only Hebrew letters and English words only letters from the Latin alphabet, so each chunk has an easily guessable direction that makes sense.
But you are mixing letters from different scripts and with different directionality.
For example, ך is U+05DA HEBREW LETTER FINAL KAF, and you also use two other Hebrew characters. You can use something like this page to list the Unicode characters you used.
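To see which characters in a string are strong right-to-left, Python's standard `unicodedata` module can list each character's code point, bidirectional class and name, much like the character-listing page mentioned above. Among the bidi classes, "L" is strong left-to-right, "R" and "AL" are strong right-to-left, and box-drawing or musical symbols are typically "ON" (other neutral). The helper names `describe` and `has_mixed_direction` are hypothetical.

```python
import unicodedata


def describe(text: str) -> None:
    """Print code point, bidi class and Unicode name for each character."""
    for ch in text:
        name = unicodedata.name(ch, "<no name>")
        print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<3} {name}")


def has_mixed_direction(text: str) -> bool:
    """True if text holds both strong LTR ("L") and strong RTL ("R"/"AL") characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})
```

Calling `describe("ך┼♫יἯ╡П♪דἰ")` on the sample lists the three Hebrew letters with class "R" alongside Greek and Cyrillic letters with class "L" and symbols with class "ON", which is exactly the mix that makes the renderer reorder parts of the string.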
You can either:
avoid letters with the "wrong" directionality, or
make the direction explicit using a LEFT-TO-RIGHT MARK (U+200E) character.
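A minimal Python sketch of the second option, assuming the text is rendered by an engine that follows the Unicode Bidirectional Algorithm: an LRM at the start makes the paragraph direction left-to-right, and an LRM after every strong right-to-left character gives each such letter strong left-to-right neighbours, so it should no longer be grouped and reordered with the characters around it. The function name `force_ltr` is hypothetical; wrapping the whole string in U+202D LEFT-TO-RIGHT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING is a blunter alternative.

```python
import unicodedata

LRM = "\u200E"  # U+200E LEFT-TO-RIGHT MARK


def force_ltr(text: str) -> str:
    """Prefix an LRM and add an LRM after every strong right-to-left character.

    The leading LRM makes the paragraph direction left-to-right; the added
    LRMs isolate each RTL letter between strong LTR characters, so it should
    stay where it is in the stored order.
    """
    out = [LRM]
    for ch in text:
        out.append(ch)
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            out.append(LRM)
    return "".join(out)
```

The marks are invisible but are still characters, so any code that later compares or measures the string has to account for them.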
Edit: last but not least, I just realized that you said "custom font": if you expect the text to be displayed in a specific custom font, then you should really be using one of the private use areas in Unicode. They are explicitly reserved for private use like this (i.e. where the characters don't match the publicly defined glyphs for the code points). That would also avoid surprises like the ones you are getting, where some of the characters you use have different rendering properties.
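A rough Python sketch of the private-use approach: each source character is mapped to its own code point starting at U+E000, the first code point of the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF), and the custom font would then need its glyphs at those code points. Private-use characters have no script-specific behaviour and default to the left-to-right bidi class, so they avoid the reordering seen with the Hebrew letters. The alphabet and the names `encode` and `decode` are assumptions for illustration.

```python
import string

PUA_START = 0xE000  # first code point of the BMP Private Use Area (U+E000..U+F8FF)

# Assumed input alphabet: ASCII letters and digits; extend as needed.
ALPHABET = string.ascii_letters + string.digits

# Each alphabet character gets its own private-use code point; the custom
# font is expected to draw the intended glyph at that code point.
_TO_PUA = str.maketrans({c: chr(PUA_START + i) for i, c in enumerate(ALPHABET)})
_FROM_PUA = str.maketrans({chr(PUA_START + i): c for i, c in enumerate(ALPHABET)})


def encode(text: str) -> str:
    """Replace alphabet characters with private-use code points; other characters pass through."""
    return text.translate(_TO_PUA)


def decode(text: str) -> str:
    """Reverse encode()."""
    return text.translate(_FROM_PUA)
```

Characters outside the assumed alphabet (spaces, punctuation) are left unchanged, so they keep their normal, neutral behaviour.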

VBA string comparison failure

I ran into an interesting issue when comparing two strings. I'm reading data from a file and everything works well. But then a co-worker sent me an input file, which is just a CTRL+C and CTRL+V of the working file. And then a miracle happened! VBA is so confused that it can't compare two simple strings, and I fell off my chair.
If you take a look at the image, you can see that the comparison passed the If condition even though the two strings are the same, which it should not. I'm a bit confused about how this can happen.
So has anyone run into something like this? I'm really starting to think about something like the machine revolution from Terminator. (Both files are saved in Notepad++ and there are no strange characters or anything like that.)
Progress update
So I tried the hints from the comments below and ended up with something like this:
If CStr(Trim(rowArray(4))) <> (CStr("N/A")) Then
The content of rowArray(4) is still the string "N/A", as in the picture above, and Excel still thinks these strings aren't the same. I also saved the file in PSPad, NetBeans and plain Notepad, and the issue is still the same.
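Use the Immediate window to test the contents of the variable:
For i = 1 To Len(rowArray(4)): Print AscW(Mid(rowArray(4), i, 1)): Next
This prints the character code of each character in the string (AscW returns the Unicode code, whereas Asc returns the ANSI code and may show 63, the "?" character, for characters outside the current code page). You can use this to determine which extra character(s) are causing the issue.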
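Outside VBA, the same check can be done by comparing a suspicious value with the expected string and reporting the first position where their code points differ. This is a Python sketch; the function name `first_difference` is hypothetical, and the byte order mark used in the example is only an assumed culprit, since the thread does not say which character was found.

```python
def first_difference(a: str, b: str):
    """Return (index, code point in a, code point in b) at the first position
    where a and b differ, or None if they are identical. Past the end of the
    shorter string, its code point is reported as None."""
    for i in range(max(len(a), len(b))):
        ca = ord(a[i]) if i < len(a) else None
        cb = ord(b[i]) if i < len(b) else None
        if ca != cb:
            return i, ca, cb
    return None
```

For example, `first_difference("\ufeffN/A", "N/A")` returns `(0, 65279, 78)`, i.e. an invisible U+FEFF sits in front of the "N".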

What kind of sign is "‎" and what is it used for

What kind of sign is "‎" and what is it used for (note that there is an invisible character between the quotes)?
I have searched through all my documents and found a lot of them. They messed up my .htaccess file. I think I got them when I copied web addresses from Google for redirects. So it may be worth searching your own documents for this one too :)
It is U+200E LEFT-TO-RIGHT MARK. (A quick way to check out such things is to copy a string containing the character and paste it in the writing area in my Full Unicode input utility, then click on the “Show U+” button there, and use Fileformat.Info character search to check out the name and other properties of the character, on the basis of its U+... number.)
The LEFT-TO-RIGHT MARK sets the writing direction of directionally neutral characters. It does not affect the letters of, e.g., English or Arabic words, but it may change how neutral characters such as parentheses are displayed next to them; in purely English text, though, it should cause no visible change.
But, of course, when text is processed programmatically, as when a web server processes a .htaccess file, such marks are ordinary character data and can make a big difference.
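A Python sketch for cleaning such characters out of a file like .htaccess, assuming the file is UTF-8 encoded: it drops every character in the Unicode general category Cf ("format"), which includes U+200E LEFT-TO-RIGHT MARK as well as U+200B ZERO WIDTH SPACE and U+FEFF. The name `strip_format_chars` is hypothetical. Cf also contains characters that matter in ordinary prose, such as U+200D ZERO WIDTH JOINER, so this is meant for configuration files and URLs rather than arbitrary text.

```python
import unicodedata


def strip_format_chars(text: str) -> tuple[str, int]:
    """Remove invisible format characters (Unicode category Cf) and return
    the cleaned text together with the number of characters removed."""
    kept = [ch for ch in text if unicodedata.category(ch) != "Cf"]
    return "".join(kept), len(text) - len(kept)
```

To clean a file, read it with `open(path, encoding="utf-8")`, pass the contents through `strip_format_chars`, and write the result back only if the returned count is nonzero.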

How to translate Unicode to and from MATLAB?

I have written MATLAB programs that produce plots and tables for chemical substances. I get my input mostly from Excel tables and a local MySQL database. The problem is that quite a few substance names contain Greek letters.
I want to create plots that use exactly the names specified by my colleagues, and also create tables that show the correct symbols.
An example:
If I create an Excel file containing "α-Methylstyrol" in the first cell and read it with [~,~,tmp] = xlsread('test.xlsx'), tmp will contain '(box with question mark)-Methylstyrol'. If I use the string in a plot (title(tmp)), it is shown as '(right arrow)-Methylstyrol'.
So far I have tried the native2unicode and unicode2native commands on the string, but they have no effect. I also tried replacing the characters, but the number of characters I need to replace is growing far too fast, so I'm really hoping there is a more systematic way.
(We know there are also names that don't contain Greek letters, but we try to adhere to guidelines that prefer these names.)
As far as I understand, MATLAB does not support Unicode well. However, it is possible to type Greek letters in plot titles using LaTeX syntax.
title('\alpha-Methanol')
Even though it is not the nicest solution, I think it should be possible to replace Unicode symbols with LaTeX keywords.
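One way to make that replacement systematic is a lookup table applied with `str.translate`. The Python sketch below (the table and the names `to_tex` and `unmapped_non_ascii` are illustrative, not from the thread) maps Greek letters to TeX commands that MATLAB's default TeX interpreter should understand. An explicit table is used rather than deriving commands from Unicode character names, because those names do not always match TeX (U+03BB is named GREEK SMALL LETTER LAMDA). Uppercase letters that look like Latin ones, such as Alpha, have no TeX command and are left unchanged.

```python
# Illustrative table: Greek letters -> TeX commands for MATLAB's default
# "tex" interpreter. Uppercase letters that look like Latin ones
# (Alpha, Beta, ...) have no TeX command and are deliberately left out.
GREEK_TO_TEX = {
    "α": r"\alpha", "β": r"\beta", "γ": r"\gamma", "δ": r"\delta",
    "ε": r"\epsilon", "ζ": r"\zeta", "η": r"\eta", "θ": r"\theta",
    "ι": r"\iota", "κ": r"\kappa", "λ": r"\lambda", "μ": r"\mu",
    "ν": r"\nu", "ξ": r"\xi", "π": r"\pi", "ρ": r"\rho",
    "σ": r"\sigma", "ς": r"\varsigma", "τ": r"\tau", "υ": r"\upsilon",
    "φ": r"\phi", "χ": r"\chi", "ψ": r"\psi", "ω": r"\omega",
    "\u00b5": r"\mu",  # MICRO SIGN, often typed in place of Greek mu
    "Γ": r"\Gamma", "Δ": r"\Delta", "Θ": r"\Theta", "Λ": r"\Lambda",
    "Ξ": r"\Xi", "Π": r"\Pi", "Σ": r"\Sigma", "Υ": r"\Upsilon",
    "Φ": r"\Phi", "Ψ": r"\Psi", "Ω": r"\Omega",
}
_TABLE = str.maketrans(GREEK_TO_TEX)


def to_tex(name: str) -> str:
    """Replace Greek letters from the table with TeX commands; keep everything else."""
    return name.translate(_TABLE)


def unmapped_non_ascii(name: str) -> set[str]:
    """Non-ASCII characters the table does not cover, to spot gaps early."""
    return {ch for ch in name if ord(ch) > 127 and ch not in GREEK_TO_TEX}
```

For example, `to_tex("α-Methylstyrol")` gives `\alpha-Methylstyrol`, the same form as the `title` call above. Two cases are not handled: a letter directly after a Greek letter (e.g. "αMethyl") would run into the command name, and characters such as "_" or "^" in a name are also interpreted by the TeX interpreter.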
I think your problem is that xlsread is not even getting the correct Greek letter out of your sheet.
Just give JExcelApi or Apache POI a try. Both are Java libraries for importing .xls files. In MATLAB you only need to add the jar file to your path via javaaddpath, and the next steps are like basic Java coding.
