Why are there no single letter TLDs or TLDs with numbers? - dns

Why is it that there are no single-letter TLDs or TLDs containing numbers? (excluding xn-- TLDs)
As far as I know, there are no restrictions against TLDs that either have one letter or contain numbers. There are even TLDs such as .one and .seven that could be replaced with .1 or .7.

Related

Check if string contains consecutive repeated substring

I got an interview problem that asks whether a given string contains a substring repeated immediately after itself. For example:
ATAYTAYUV contains TAY after TAY
AABCD contains A after A
ABCAB contains two AB, but they are not consecutive, so the answer is negative
My idea was to look at the first letter, find its second occurrence, then check letter by letter whether the letters after the first occurrence match the letters after the second occurrence. If they all do, the answer is positive. If not, once I get a mismatch, I can repeat the process starting from the last letter I checked, since I would not be able to get a repeated sequence up to that point.
I am not sure whether the approach is correct or whether it is the most efficient.
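Assume that you are looking for a repeating pattern of length 3. If you write the string shifted right by three positions under itself (and trimmed to the same length), a repeat shows up as a run of 3 consecutive columns where both rows hold the same character:
ATAYTAYUV
   ATAYTA
Here the three columns under the second TAY match, so TAY is repeated.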
Repeat this for all lengths up to N/2.
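A minimal sketch of the shift-and-compare idea, assuming plain Python strings; the function name find_adjacent_repeat is hypothetical. For each length k it compares s[i] with s[i - k] and counts consecutive matching columns; k matches in a row mean the last k characters equal the k characters just before them.

```python
from typing import Optional


def find_adjacent_repeat(s: str) -> Optional[str]:
    """Return a substring that is immediately repeated in s, or None.

    For each length k, s is compared against itself shifted right by k:
    k consecutive positions with s[i] == s[i - k] mean that s[i-k+1 .. i]
    equals the k characters directly before it.
    """
    n = len(s)
    for k in range(1, n // 2 + 1):  # a repeat of length k needs 2k characters
        run = 0  # number of consecutive matching columns so far
        for i in range(k, n):
            if s[i] == s[i - k]:
                run += 1
                if run == k:
                    return s[i - k + 1 : i + 1]
            else:
                run = 0
    return None


print(find_adjacent_repeat("ATAYTAYUV"))  # TAY
print(find_adjacent_repeat("AABCD"))      # A
print(find_adjacent_repeat("ABCAB"))      # None
```

Because shorter lengths are tried first, this returns the shortest repeat it finds. Trying all k up to N/2 costs O(N^2) time overall with O(1) extra space.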

Finding all words in a paragraph whose first three letters are the same?

How can we solve this problem in the best way? Is there any algorithm for solving this?
"In a paragraph, we have to find and print all the words that have the same first 3 letters. Example: we input a paragraph, and as output we get words like:
a) 1. you 2. your 3. yours 4. yourself
b) 1. early 2. earlier 3. earliest
Like this, we get all the words of the paragraph that have the same first 3 letters."
A reasonable solution that's not too hard to code up is to maintain a map of some sort where the keys are the first three letters of each word and the values are the sets of words that start with those three letters. You can scan across the words in the paragraph and, for each one you encounter, take its first three letters, look up the map entry corresponding to those letters, and add the word to that set. You can then iterate over the map at the end, find all sets containing at least two words, and print out each cluster you find.
Overall, the runtime of this approach is O(L), where L is the total length of all the words in the paragraph. To see this, notice that for each word, we do a map lookup on a constant-sized prefix of that word, then copy all the characters of the word into the map. Overall, this visits each character at most a constant number of times.
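A short sketch of the map-of-sets approach, assuming a Python dict as the map and assuming a "word" is a run of ASCII letters compared case-insensitively (the answer does not specify tokenization). The name group_by_prefix is hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def group_by_prefix(paragraph: str, prefix_len: int = 3) -> list[list[str]]:
    # Assumption: words are runs of ASCII letters, compared in lowercase.
    groups: dict[str, dict[str, None]] = defaultdict(dict)
    for word in re.findall(r"[A-Za-z]+", paragraph):
        w = word.lower()
        if len(w) < prefix_len:
            continue  # too short to have a three-letter prefix
        # A dict with None values acts as an insertion-ordered set of words.
        groups[w[:prefix_len]][w] = None
    # Keep only clusters containing at least two distinct words.
    return [list(words) for words in groups.values() if len(words) >= 2]


text = "You said your yours is yourself; early birds came earlier, earliest of all."
for cluster in group_by_prefix(text):
    print(cluster)
```

With a hash map, each word costs one lookup on a constant-sized key plus copying the word, which matches the O(L) bound above (expected time, since hashing is involved).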
A trie keyed on the first three characters, with the word index stored at the leaf, should do the trick.
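One way to read the trie suggestion, sketched with nested Python dicts as trie nodes (an assumption; the answer names no representation). Each word walks three levels deep and its index is appended at the leaf; build_prefix_trie and clusters are hypothetical names.

```python
from __future__ import annotations


def build_prefix_trie(words: list[str], depth: int = 3) -> dict:
    root: dict = {}
    for index, word in enumerate(words):
        if len(word) < depth:
            continue  # too short to have a three-letter prefix
        node = root
        for ch in word[:depth]:  # one trie level per prefix character
            node = node.setdefault(ch, {})
        # The None key holds the leaf's word indices; edges are str keys,
        # so it cannot collide with a child.
        node.setdefault(None, []).append(index)
    return root


def clusters(node: dict, words: list[str]) -> list[list[str]]:
    found = []
    leaf = node.get(None, [])
    if len(leaf) >= 2:  # only prefixes shared by at least two words
        found.append([words[i] for i in leaf])
    for key, child in node.items():
        if key is not None:
            found.extend(clusters(child, words))
    return found


words = ["you", "your", "early", "yours", "earlier", "cat"]
print(clusters(build_prefix_trie(words), words))
```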

determining soundex conversion

When converting the name 'Lukasieicz' to Soundex (LETTER, DIGIT, DIGIT, DIGIT, DIGIT), I come up with L2222.
However, my lecture slides say that the actual answer is supposed to be L2220.
Please explain why my answer is incorrect, or whether the lecture answer is just a typo.
My steps:
Lukasieicz
remove and keep L
ukasieicz
Remove contiguous duplicate characters
ukasieicz
remove A,E,H,I,O,U,W,Y
KSCZ
convert up to first four remaining letters to soundex (as described in lecture directions)
2222
append beginning letter
L2222
If this is American Soundex as defined by the National Archives, you're both wrong. American Soundex consists of one letter and three numbers, so you can have neither L2222 nor L2220. It's L222.
But let's say they added another number for some reason.
The basic substitution gives L2222. But you're supposed to collapse adjacent letters with the same numbers (step 3 below) and then pad with zeros if necessary (step 4).
3. If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
4. If you have too few letters in your word that you can't assign [four] numbers, append with zeros until there are [four] numbers. If you have more than [4] letters, just retain the first [4] numbers.
Lukasieicz # the original word
L_2_2___22 # replace with numbers, leave the gaps in
L_2_2___2 # apply step 3 and squeeze adjacent numbers
L2220 # apply step 4 and pad to four numbers
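A sketch of the rules quoted above in Python, assuming ASCII names; the digits parameter is a hypothetical knob so that digits=4 reproduces the lecture's five-character variant. It applies the squeeze (step 3) while coding and the zero padding (step 4) at the end. It is not MySQL's behavior.

```python
# American Soundex digit classes; A E I O U Y H W get no digit.
CODES = {
    ch: digit
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for ch in letters
}


def soundex(name: str, digits: int = 3) -> str:
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]  # assumption: ASCII
    if not letters:
        return ""
    first = letters[0]
    prev = CODES.get(first, "")  # step 3 also applies to the first letter
    out = []
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)  # squeeze: adjacent equal digits are coded once
        prev = code  # a vowel resets prev, so equal digits across it count twice
    # Step 4: pad with zeros, then keep only the first `digits` numbers.
    return (first + "".join(out) + "0" * digits)[: digits + 1]


print(soundex("Lukasieicz", digits=4))  # L2220
print(soundex("Lukasieicz"))            # L222
print(soundex("Lukacz"))                # L220
```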
We can check how conventional (i.e., three-number) Soundex implementations behave with the shorter Lukacz, which becomes L_2_22. Following rules 3 and 4, it should be L220.
The National Archives recommends an online Soundex calculator, which produces L220. So do PostgreSQL and Text::Soundex, in both its original flavor and its NARA implementation.
$ perl -wle 'use Text::Soundex; print soundex("Lukacz"); print soundex_nara("Lukacz")'
L220
L220
MySQL, predictably, is doing its own thing and returns L200.
The MySQL manual explains: "This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second."
In conclusion, you forgot the squeeze step.

Use of Excel text parsing functions to extract from a string with complex format

I have a list of items in the following format, with some samples:
(CompanyName){space}(PartNumber ending with -){space}(Revision Level).pdf
Company 100-50006- Rev. A.pdf
Company Two 6001241- Rev. CN.pdf
CompanyThree 109581- Rev. B.pdf
My goal is to get three unique pieces of information using Excel: Company Name, Part Number, Revision.
The revision is easy to capture. I am trying to find a way to capture the company name (everything before the first numeric character). I am also trying to find a way to capture the whole part number.
What function can I use to locate the first numeric character, and do a LEFT(A2,LEN(FUNCTION HERE)-1) where the -1 is due to the spacing?
Similarly, I want to do something like MID(A2,LEN(FUNCTION HERE TO FIND BEGINNING NUMERIC),LEN(FUNCTION HERE TO FIND SPACE OR "REV" AND SEGREGATE AFTER SUCH)).
Okay, I don't know if there might be more spaces in the company name, but for the sample you provided, the below formulae work:
=IF(ISERROR(FIND("-",LEFT(A2,FIND(" ",A2,9)))),LEFT(A2,FIND(" ",A2,9)),LEFT(A2,FIND(" ",A2,8)))
=IF(ISERROR(FIND("-",LEFT(A2,FIND(" ",A2,9)))),MID(A2,FIND(" ",A2,9)+1,FIND(" Rev.",A2)-FIND(" ",A2,9)-1),MID(A2,FIND(" ",A2,8)+1,FIND(" Rev.",A2)-FIND(" ",A2,8)-1))
It's a bit long though ^^;
It works for Company Two because T is at position 9, so searching for a space from position 9 finds the space after "Two". For a name like Company, the separator space is at position 8, so the default search from position 9 skips it and finds the space before "Rev." instead; the extracted text then also contains a -, which is what the condition checks for. If there is a -, the separator space must be at position 8, so the formula restarts the search for the space from position 8.
And MID just works on the same principle, with +1 and -1 to remove the extra spaces.
Note: it won't work if the company name contains two or more spaces, e.g. Company the first, or a space after the 9th character, e.g. Companies Twenty.
This may be much easier with the help of even Word's (primitive) wildcard regex. Load the list into Word and, with Use wildcards ticked, Replace All: first ( [0-9]) with ^t\1, then (- ) with \1^t, and load the result back into Excel. This copes with the otherwise tricky issue of the number of spaces in a company name.
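The same split can be expressed as a regular expression outside Excel. This is a swapped-in tool (Python's re module), not an Excel or Word feature, and it assumes that the company name never contains a space followed by a digit, that the part number is digits and hyphens ending in "-", and that the revision follows "Rev. ". PATTERN and split_filename are hypothetical names.

```python
import re

# Assumed layout: "<company> <part ending in -> Rev. <revision>.pdf"
PATTERN = re.compile(r"^(?P<company>.+?) (?P<part>\d[\d-]*-) Rev\. (?P<rev>.+)\.pdf$")


def split_filename(filename: str):
    m = PATTERN.match(filename)
    if m is None:
        return None  # line does not follow the assumed layout
    return m.group("company"), m.group("part"), m.group("rev")


for line in ["Company 100-50006- Rev. A.pdf",
             "Company Two 6001241- Rev. CN.pdf",
             "CompanyThree 109581- Rev. B.pdf"]:
    print(split_filename(line))
```

The lazy company group stops at the first space that is followed by a digit, so any number of spaces inside the company name is handled, like the Word approach.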

Source for word weights?

I am building a very basic result ranking algorithm, and one thing I'd like is a way to determine which words are generally more important in a given phrase. It doesn't have to be exact, just general.
Obviously I can drop any word under 4 letters and identify names. But what other ways are there to pick out the 3 most significant words in a sentence?
In the absence of any other information, it is fair to assume that important words are rare words. Count how many times each word appears in your set of documents. The words with the lowest counts are more important, while the words with the highest counts are less important (if not nearly useless).
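A minimal sketch of the "rare words are important" heuristic, assuming lowercase letter/apostrophe runs as words; the function names are hypothetical. It counts occurrences across the document set, then picks the words of a phrase with the lowest counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Assumption: lowercase runs of letters and apostrophes count as words.
    return re.findall(r"[a-z']+", text.lower())


def corpus_counts(documents) -> Counter:
    # How many times each word appears across the whole document set.
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def most_significant(phrase: str, counts: Counter, k: int = 3) -> list:
    words = list(dict.fromkeys(tokenize(phrase)))  # de-duplicate, keep order
    # Rarest first; unseen words count as 0. sorted() is stable, so ties
    # keep their order in the phrase.
    return sorted(words, key=lambda w: counts[w])[:k]


docs = ["the cat sat on the mat", "the dog sat on the log", "the cat ate"]
print(most_significant("the dog sat on the mat", corpus_counts(docs)))
```

This is close to the inverse-document-frequency part of tf-idf, which counts how many documents contain a word rather than raw occurrences.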
Related reading:
http://en.wikipedia.org/wiki/Stop_words
http://en.wikipedia.org/wiki/Googlewhack
http://en.wikipedia.org/wiki/Statistically_Improbable_Phrases
