How can I spell check words ignoring optional hyphens? - browser

I am using an RTE (TinyMCE) for text creation in a browser. As the user writes, the RTE inserts invisible optional hyphens (code: '&shy;' or '\u00ad') into words above a minimum length so that they can be hyphenated.
My problem now is finding a spellchecker that can check my words even though they contain optional hyphens in addition to letters. The spell checkers I found treated, for example, the word "goldfish" as two words because of the optional hyphen between "gold" and "fish".
Is there a spellchecker that is able to be modified in any way to ignore them? Is there a way to configure a spellchecker to ignore such soft-hyphens? (The solution does not need to be open-source.)
Adding words to a dictionary is not an option.
The solution should work for Safari or Firefox.

I found a solution to my problem, so I'll answer my own question.
Hunspell seems to ignore those soft hyphens and works with Firefox (as a plugin).
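For a checker that does not skip soft hyphens on its own, one possible workaround is to strip U+00AD from the text before it reaches the checker. This is only a minimal Python sketch, assuming the editor content is available as a plain string; `strip_soft_hyphens` is a hypothetical helper name and not part of TinyMCE or Hunspell.

```python
SOFT_HYPHEN = "\u00ad"  # the invisible optional hyphen inserted by the editor


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen (U+00AD) removed."""
    return text.replace(SOFT_HYPHEN, "")


word = "gold\u00adfish"
print(len(word))                 # 9: the soft hyphen counts as a character
print(strip_soft_hyphens(word))  # goldfish
```

The cleaned string should only be used for checking; the original text, with its hyphenation hints, stays in the editor.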

Related

Ellipsis ignored in search engines

I noticed that the ellipsis '...' is ignored by every search engine I tested, including the Stack Overflow search engine, when I tried to search for literature to address this post. Is there any way to avoid this?
Sadly, this is not directly possible: most search engines ignore all punctuation. Google has made an effort to allow certain characters to be searched, like "," or ".", but all of these are special cases and don't follow a general rule.
What you can do, however, is search for "sizeof ellipsis", which brings up the results of thoughtful writers.
Sidenote: "..." in particular is kind of special in common texts. There are lots of different ways to use it in printing, with differences in spacing, bracing, vertical position, etc. There is also the possibility of using the single-character HTML entity &hellip;, which renders as …, as does Unicode U+2026.
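To illustrate how these variants relate, here is a small Python sketch that maps the HTML entity and the single character U+2026 onto three plain full stops. `normalize_ellipsis` is a hypothetical helper name, and whether any search engine performs such a normalization is not something this sketch claims.

```python
import html


def normalize_ellipsis(text: str) -> str:
    """Decode HTML entities, then replace U+2026 with three full stops."""
    decoded = html.unescape(text)           # "&hellip;" becomes U+2026
    return decoded.replace("\u2026", "...")


print(html.unescape("&hellip;") == "\u2026")  # True
print(normalize_ellipsis("sizeof&hellip;"))   # sizeof...
```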

Delimiting quick-open path with fullstops in Sublime Text 3?

I'm making the move to ST3, and I'm having some trouble. I'd like to be able to delimit the quick-open filepath (⌘ + T) with periods instead of slashes or spaces. However, I can't find the setting to do that.
For example:
component.biz_site_promotions.presentation
should be able to open the file that
component biz_site_promotions presentation
would.
Any help would be greatly appreciated!
There is no setting in Sublime that changes the way this works; the search term is always used to directly match the text in the list items (except for space characters).
Note however that the Goto Anything panel uses fuzzy matching on the text that you're entering, so in many cases trying to enter an entire file name is more time consuming anyway.
As an example, to find the file you're mentioning, you could try entering the text cbspp, which in this case is the first letters of all of the parts of the file name in question.
As you add to the search term, the file list immediately filters down to text that matches what you entered; first only filenames that contain a C, then only filenames that contain a C that is followed somewhere after by a B, and so on.
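The filtering described above can be pictured as an in-order subsequence test. The following Python sketch only illustrates that idea; it is not Sublime's actual algorithm (which also ranks the results), and the file paths are made up for the example.

```python
def is_subsequence(term: str, candidate: str) -> bool:
    """True if the characters of term appear in candidate in the same order."""
    remaining = iter(candidate.lower())
    # "ch in remaining" consumes the iterator up to the match, so order matters.
    return all(ch in remaining for ch in term.lower())


# Hypothetical project files.
files = [
    "component/biz_site_promotions/presentation.js",
    "component/base.js",
    "docs/index.html",
]

print([f for f in files if is_subsequence("cbspp", f)])
# ['component/biz_site_promotions/presentation.js']
```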
Depending on the complexity and number of files in your project, you may need to add a few extra characters to narrow things down (e.g. comb_s_pp). Usually this search method either lands you on the exact file you want or filters the list so much that the file you want is easy to find and select.
Additionally, when you select an item and there was more than one possible match, Sublime remembers which item you selected for that particular search term and brings it to the top of the search results next time you do it, under the assumption that you want the same thing again.
As you use Sublime more (and with different projects) you will quickly get a handle on what partial search terms work the best for you.
In addition to finding files, you can do other things with that panel as well, such as jumping to a specific line and/or column or searching inside the file for a search term and jumping directly to it. This applies not only to the current file but also the one that you're about to open.
For more complete details, there is a page in the Unofficial Documentation that covers File Navigation with Goto Anything.
As an extra aside, starting with Sublime Text build 3154, the fuzzy searching algorithm handles spaces differently than previous builds.
Historically, spaces in the search term were essentially ignored and the entire input was treated as one search term to be matched character by character.
Starting in build 3154, spaces are handled by splitting up a single search term into multiple search terms, which are applied one after the other.
This allows multiple search terms to hit out of order. For example, index doc in build 3154 will find doc/index.html, but it won't find it in previous versions because the terms aren't in the right order.
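The difference between the two behaviours can be sketched in Python as follows. This is a simplified model under the assumption that matching is a plain in-order subsequence test; `match_legacy` and `match_split` are hypothetical names, not Sublime APIs.

```python
def is_subsequence(term: str, candidate: str) -> bool:
    """True if the characters of term appear in candidate in the same order."""
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in term.lower())


def match_legacy(query: str, candidate: str) -> bool:
    """Older behaviour: drop spaces and match the input as one term."""
    return is_subsequence(query.replace(" ", ""), candidate)


def match_split(query: str, candidate: str) -> bool:
    """Build 3154 behaviour: every space-separated term must match on its own."""
    return all(is_subsequence(term, candidate) for term in query.split())


print(match_legacy("index doc", "doc/index.html"))  # False
print(match_split("index doc", "doc/index.html"))   # True
```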
As such, assuming you're not currently using such a build (as of right now it's a development build, so only licensed users have access to it), if you keep searching the way you do in your question, you might start getting more results than you expected once you move to a newer build.

How to remove a word from Aspell's British dictionary

When I check my texts with aspell (with the British dictionary), the word "froward" is accepted (because it is a real English word). However, I never use it, so in my texts "froward" is always a misspelling of "forward". Therefore I want aspell to reject "froward".
How can I remove a word from Aspell's standard dictionary? Is there a way to create a "blacklist" of words? There is no way to mark it in .aspell.en.pws, because the personal dictionary only contains a "whitelist".
You can't.
Aspell does not support it.
Submit an issue or a pull request on the official repo if you care.
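If you need this today, one workaround outside Aspell is a separate pass over the text that flags words from your own list. The Python sketch below assumes plain-text input and a hand-maintained set of words; `find_blacklisted` is a hypothetical helper and has nothing to do with Aspell's own interfaces.

```python
import re

# Words to reject even though the dictionary accepts them (maintained by hand).
BLACKLIST = {"froward"}


def find_blacklisted(text, blacklist=BLACKLIST):
    """Return (line_number, word) pairs for every blacklisted word in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() in blacklist:
                hits.append((line_no, word))
    return hits


print(find_blacklisted("Please froward this mail.\nLooking forward to it."))
# [(1, 'froward')]
```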

Lucene, Change Search on one file

I have a question about Lucene.
I have a file that I would like to index and search with different analyzers. My goal is to be able to change how I search.
In one case, I would like to search for an exact phrase with punctuation, e.g. "one,two", and return only exact matches including the punctuation.
I would also like to be able to search for the exact phrase without punctuation, e.g. "one two", as the StandardAnalyzer does.
Essentially I need to change the search functionality on one field.
How can I change the search on the same file? I've tried using two analyzers (standard and whitespace); however, this makes the indexing time very long.
My second thought is to use just a WhitespaceAnalyzer and, when searching, pass a query that further tokenizes each string if needed. However, I am not sure which API offers this, if any does.
Also, is there good reading on how analyzers and tokens work and are implemented?
Thanks
What do you mean when you say you tried two analyzers? Did you duplicate the content into two separate fields with different analyzers? That would be my suggestion.
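To make the two-field idea concrete, here is a conceptual sketch in plain Python. It does not use the Lucene API: the two tokenizer functions only approximate what WhitespaceAnalyzer and StandardAnalyzer produce, and the field names "exact" and "normalized" are made up for the example.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays inside the tokens."""
    return text.split()


def standard_like_tokens(text):
    """Lowercase and keep only runs of word characters (rough approximation)."""
    return re.findall(r"\w+", text.lower())


def index_document(text):
    """Store the same content twice, once per tokenization."""
    return {
        "exact": whitespace_tokens(text),
        "normalized": standard_like_tokens(text),
    }


doc = index_document("count one,two three")
print(doc["exact"])       # ['count', 'one,two', 'three']
print(doc["normalized"])  # ['count', 'one', 'two', 'three']
```

A query for "one,two" would then go against the punctuation-preserving field, and a query for "one two" against the normalized one.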

What kind of sign is "‎" and what is it used for

What kind of sign is "‎" and what is it used for (note that there is an invisible sign there)?
I have searched through all my documents and found a lot of them. They messed up my .htaccess file. I think I got them when I copied web addresses from Google to set up redirects. So take this as a warning to search through your own documents for this one as well :)
It is U+200E LEFT-TO-RIGHT MARK. (A quick way to check out such things is to copy a string containing the character and paste it in the writing area in my Full Unicode input utility, then click on the “Show U+” button there, and use Fileformat.Info character search to check out the name and other properties of the character, on the basis of its U+... number.)
The LEFT-TO-RIGHT MARK sets the writing direction of directionally neutral characters. It does not affect e.g. English or Arabic words, but it may mess up text that contains parentheses for example – though for text in English, there should be no confusion in this sense.
But, of course, when text is processed programmatically, as when a web server processes a .htaccess file, such invisible characters are character data like any other and make a big difference.
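To hunt for such characters in your own files, a small script can report every invisible format character with its position and name. This is a minimal Python sketch, assuming the file content has already been read into a string; `find_format_chars` is a hypothetical helper, and the redirect line with its example.com URL is made up.

```python
import unicodedata


def find_format_chars(text):
    """Return (line, column, code point, name) for each Unicode format character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # General category "Cf" covers invisible format characters such as U+200E.
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


sample = "Redirect 301 /old \u200ehttp://example.com/new"
print(find_format_chars(sample))
# [(1, 19, 'U+200E', 'LEFT-TO-RIGHT MARK')]
```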

Resources