What is the exact meaning of lexicographical order? How is it different from alphabetical order?
Lexicographical order is alphabetical order. The other type is numerical ordering. Consider the following values:
1, 10, 2
Those values are in lexicographical order. 10 comes after 2 in numerical order, but 10 comes before 2 in "alphabetical" order.
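The difference is easy to see in Python, whose built-in string comparison is lexicographic. A minimal sketch (the variable names are just for illustration):

```python
values = ["1", "10", "2"]

# Strings compare character by character, so "10" < "2" because "1" < "2"
as_strings = sorted(values)

# Converting each value to an int gives numerical order instead
as_numbers = sorted(values, key=int)

print(as_strings)  # ['1', '10', '2']
print(as_numbers)  # ['1', '2', '10']
```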
Alphabetical order is a specific kind of lexicographical ordering. The term lexicographical often refers to the mathematical rules of sorting. These include, for example, proving logically that sorting is possible. Read more about lexicographical order on Wikipedia.
Alphabetical ordering includes variants that differ in how they handle spaces, uppercase characters, numerals, and punctuation. Purists believe that allowing characters other than a-z makes the sort not "alphabetic", so it must fall into the larger class of "lexicographic". Again, Wikipedia has additional details.
In computer programming, a related question is dictionary order versus ASCII code order. In dictionary order, the uppercase "A" sorts adjacent to the lowercase "a". However, in many programming languages, the default string comparison uses ASCII codes. In ASCII, all uppercase letters come before any lowercase letters, which means that "Z" will sort before "a". This is sometimes called ASCIIbetical order.
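Python is one such language: its default string comparison uses Unicode code points, which coincide with ASCII codes for ASCII text. A short illustration, with a case-insensitive key as one way to get closer to dictionary order (real dictionary collation has more rules than this sketch covers):

```python
names = ["banana", "Apple", "cherry", "Zebra"]

# Default comparison uses code points: every uppercase ASCII letter
# (65-90) sorts before every lowercase one (97-122)
asciibetical = sorted(names)

# casefold() lowercases (aggressively) before comparing, so case is ignored
case_insensitive = sorted(names, key=str.casefold)

print(asciibetical)      # ['Apple', 'Zebra', 'banana', 'cherry']
print(case_insensitive)  # ['Apple', 'banana', 'cherry', 'Zebra']
```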
This simply means "dictionary order", i.e., the way in which words are ordered in a dictionary. To determine which of two words comes first in a dictionary, you compare the words letter by letter, starting from the first position. For example, the word "children" will appear before (and can be considered smaller than) the word "chill", because the first four letters of the two words are the same but the letter at the fifth position in "children" (i.e. d) comes before (or is smaller than) the letter at the fifth position in "chill" (i.e. l). Observe that, lengthwise, "children" is longer than "chill", but length is not the criterion here. For the same reason, an array containing 12345 will appear before an array containing 1235. (Deshmukh, OCP Java SE 11 Programmer I 1Z0815 Study Guide, 2019)
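Python's built-in comparisons of strings and lists follow exactly this rule, so the two examples above can be checked directly. A small sketch (variable names are illustrative):

```python
# Python compares str, list and tuple values lexicographically
words_in_order = "children" < "chill"               # 'd' < 'l' at the fifth position
longer_word_first = len("children") > len("chill")  # length does not decide the order
arrays_in_order = [1, 2, 3, 4, 5] < [1, 2, 3, 5]    # 4 < 5 at the fourth position

print(words_in_order, longer_word_first, arrays_in_order)  # True True True
```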
Lexicographical ordering means dictionary order.
For example, in a dictionary 'ado' comes after 'adieu' because 'o' comes after 'i' in the English alphabet.
This ordering is not based on the length of the string, but on the first position where the letters differ: the string with the smaller letter at that position comes first.
I want to add an answer that is more related to the programming side of the term rather than the mathematical side of it.
Lexicographical order is not always equivalent to "dictionary order"; at least, that definition is not complete in the realm of programming. Rather, it refers to "an ordering based on multiple criteria".
For example, almost all popular programming languages provide standard tools for sorting collections of objects. Now, what if you want to sort a collection by more than one thing? Say you want to sort some items by their price first AND then by their popularity. This is an example of lexicographical order.
For example in Java (8+), you could do something like this:
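import java.util.Collections;
import java.util.Comparator;

// sorts items from the cheapest to the most expensive and,
// among items with the same price, from the most popular to the least popular.
// (A trailing .reversed() would flip BOTH keys, so only popularity is reversed here.)
Collections.sort(items,
        Comparator.comparing(Item::price)
                .thenComparing(Item::popularity, Comparator.reverseOrder())
);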
The Java documentation also uses this term for this kind of ordering, when describing the thenComparing() method:
Returns a lexicographic-order comparator with another comparator.
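Python's sorted() gives the same multi-criteria ordering with a tuple key, because tuples compare lexicographically: the second element is only consulted when the first ones are equal. A sketch mirroring the Java example above; the Item class, its fields and the sample data are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Item:  # hypothetical item type, mirroring Item::price / Item::popularity
    name: str
    price: float
    popularity: int

items = [Item("pen", 2.0, 50), Item("mug", 8.0, 90),
         Item("pad", 2.0, 80), Item("cap", 8.0, 10)]

# Price ascending first; among equal prices, popularity descending
ranked = sorted(items, key=lambda it: (it.price, -it.popularity))

print([it.name for it in ranked])  # ['pad', 'pen', 'mug', 'cap']
```

Negating the popularity only works for numeric fields; for other types, one option is two stable sorts (secondary key first, then primary key).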
Lexicographical order is nothing but dictionary order, i.e. the order in which words appear in a dictionary. For example, take three strings: "short", "shorthand" and "small". In the dictionary, "short" comes before "shorthand", and "shorthand" comes before "small". This is lexicographical order.
Suppose I have a string such as onehhhtwominusthreehhkkseveneightjnine
Now I want to parse this string to get the numbers out of it. For example, this string should return an array, [one,two,minusthree,seven,eight,nine].
The order of the numbers should be maintained.
Can anyone please suggest an optimal way to do this parsing? Thanks.
(You haven't mentioned a programming language?)
I would probably search for "minus" and check the number(s) that follow it. Then search for "one", then "two", noting their indexes. This would provide enough information to map and output the results, and order, that you need.
Another option is to look at each character in order, comparing against each of the 10 choices. I couldn't tell you which is more efficient; I think it depends on the possible total string length. I'd probably write both and profile them.
If the string to search is not of inordinate length, then I suspect that the second approach might be more efficient. This is because, as soon as you have a match, you can skip the (known) number of characters that follow.
That is, if you have "abceightd", once you discover the "e" and that it begins "eight", you can skip four characters. You can also skip the a, b, and c anyway, as they are not the beginning character of any of the 10 choices.
I am assuming your choices are:
one, two, three, four, five, six, seven, eight, nine, minus
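A sketch of the second (character-by-character) approach in Python, assuming the ten choices listed above; the function name extract_numbers is hypothetical. At each position it checks for an optional "minus" followed by a number word and, on a match, skips past the whole word:

```python
WORDS = ["one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]

def extract_numbers(s):
    """Hypothetical helper: single left-to-right scan over s."""
    out = []
    i = 0
    while i < len(s):
        prefix, j = "", i
        if s.startswith("minus", i):
            prefix, j = "minus", i + len("minus")
        for w in WORDS:
            if s.startswith(w, j):
                out.append(prefix + w)
                i = j + len(w)  # skip the known length of the match
                break
        else:
            i += 1  # no number word starts here; move on one character
    return out

print(extract_numbers("onehhhtwominusthreehhkkseveneightjnine"))
# ['one', 'two', 'minusthree', 'seven', 'eight', 'nine']
```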
Assuming that a) you have access to regular expressions in your choice of programming language and b) your possible choices are as Andy G has assumed... then this regular expression can pick out the numbers grouped with their associated minus, if present:
/((?:minus)*(?:one|two|three|four|five|six|seven|eight|nine))/g
Applied to your example string using JavaScript's RegExp.prototype.exec(), for example, this extracts:
one
two
minusthree
seven
eight
nine
You could easily place a space after any minus matched if required. Does this help at all?
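The same regular-expression idea in Python, as a sketch using re.findall (which returns whole matches when the pattern has no capturing groups). This version uses (?:minus)? instead of (?:minus)*, on the assumption that at most one "minus" belongs to a number:

```python
import re

# Optional "minus" followed by one of the nine number words
PATTERN = r"(?:minus)?(?:one|two|three|four|five|six|seven|eight|nine)"

numbers = re.findall(PATTERN, "onehhhtwominusthreehhkkseveneightjnine")
print(numbers)  # ['one', 'two', 'minusthree', 'seven', 'eight', 'nine']
```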
I have a huge list of strings (city names), and I want to find the name of a city even if the user makes a typo.
Example
User types "chcago" and the system finds "Chicago"
Of course I could calculate the Levenshtein distance of the query for all strings in the list but that would be horribly slow.
Is there any efficient way to perform this kind of string-matching?
I think the basic idea is to use Levenshtein distance, but only on a subset of the names. One approach that works if the names are long enough is to use n-grams. You can store n-grams and then use more efficient techniques to require that at least x n-grams match. Alas, your example misspelling shares only 2 of Chicago's 5 3-grams (unless you count partials at the beginning and end).
For shorter names, another approach is to store the letters in each name. So, "Chicago" would turn into 6 "tuples": "c", "h", "i", "a", "g", "o". You would do the same for the name entered and then require that 4 or 5 match. This is a fairly simple match operation, so it can go quite fast.
Then, on this reduced set, apply Levenshtein distance to determine what the closest match is.
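A minimal Python sketch of this filter-then-rank idea, assuming 3-grams, a plain inverted index, and a threshold of 2 shared trigrams (chosen because "chcago" shares only 2 with "Chicago"). The helpers trigrams, build_index, levenshtein and fuzzy_lookup are hypothetical names; levenshtein is the textbook dynamic-programming edit distance:

```python
from collections import Counter, defaultdict

def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(names):
    # Inverted index: trigram -> set of names containing it (built once)
    index = defaultdict(set)
    for name in names:
        for g in trigrams(name):
            index[g].add(name)
    return index

def levenshtein(a, b):
    # Classic DP edit distance, keeping only one previous row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_lookup(query, index, min_shared=2):
    # Count shared trigrams per name, keep candidates over the threshold
    shared = Counter()
    for g in trigrams(query):
        for name in index.get(g, ()):
            shared[name] += 1
    candidates = [n for n, c in shared.items() if c >= min_shared]
    # Run the expensive Levenshtein only on the reduced set
    return min(candidates,
               key=lambda n: levenshtein(query.lower(), n.lower()),
               default=None)

index = build_index(["Chicago", "Chicopee", "Houston", "Boston"])
print(fuzzy_lookup("chcago", index))  # Chicago
```

The threshold would need tuning in practice: a lower value keeps more candidates (more Levenshtein calls), a higher one risks dropping the right city.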
You're asking to determine Levenshtein without using Levenshtein.
You would have to determine how far a word could deviate while still being identifiable, and see whether it would be acceptable to apply this less accurate algorithm. For instance, you could look up commonly swapped letters and limit it to those. Or apply the first/last letter rule from this paper. You could also assume the first few letters are correct, look up the cities in a sorted list, and if you don't find it, apply Levenshtein to the n-1 and n+1 words, where n is the location of the last lookup (or some variant of it).
There are several ideas, but I don't think there is a single best solution for what you are asking, without more assumptions.
An efficient way to search for fuzzy matches on a text string based on Levenshtein distance (or any other metric that obeys the triangle inequality) is a Levenshtein automaton. It is implemented in the Lucene project (Java) and, in particular, in the Lucene.net project (C#). This method works fast, but it is very complex to implement.
I am building a very basic result ranking algorithm, and one thing I'd like is a way to determine which words are generally more important in a given phrase. It doesn't have to be exact, just general.
The obvious steps are dropping any word under 4 letters and identifying names. But what other ways can I pick out the 3 most significant words in a sentence?
In the absence of any other information, it is fair to assume that important words are rare words. Count how many times each word appears in your set of documents. The words with the lowest counts are more important, while the words with the highest counts are less important (if not nearly useless).
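A small Python sketch of this counting idea. The function names, whitespace tokenization, lowercasing, k=3 and the min_len=4 filter (taken from the question) are all assumptions for illustration:

```python
from collections import Counter

def word_counts(documents):
    # Total occurrences of each word across the whole document set
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def most_significant(phrase, counts, k=3, min_len=4):
    # Unique words of the phrase (first-seen order), dropping short words
    words = [w for w in dict.fromkeys(phrase.lower().split()) if len(w) >= min_len]
    # Rarer words first; Counter returns 0 for unseen words, so they rank highest.
    # sorted() is stable, so ties keep their order in the phrase.
    return sorted(words, key=lambda w: counts[w])[:k]

docs = ["the cat sat on the mat", "the dog sat on the log", "quantum cat"]
counts = word_counts(docs)
print(most_significant("the quantum cat sat", counts, k=2, min_len=1))
# ['quantum', 'cat']
```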
Related reading:
http://en.wikipedia.org/wiki/Stop_words
http://en.wikipedia.org/wiki/Googlewhack
http://en.wikipedia.org/wiki/Statistically_Improbable_Phrases
I am going through a round of interviews here. This is one more interview question I had difficulty with.
“A rose is a rose is a rose.” Write an algorithm that prints the number of times a character/word occurs. E.g. A – 3, Rose – 3, Is – 2. Also ensure that when you are printing the results, they are in order of what was present in the original sentence. All this in order n.
I did get a solution that counts the number of occurrences of each word in the sentence, in the order they appear in the original sentence. I used Dictionary<string,int> to do it. However, I did not understand what is meant by "order of n". That is something I need you guys to explain.
There are only 26 letters, so you can use counting sort on them. In your counting sort, you can keep an index that records when each letter was first visited, to preserve the order of occurrence. [They can be sorted by both their count and their occurrence with a sort like radix sort.]
Edit: for words, the first thing anyone would think of is a hash table: insert each word into the hash and count the words that way. The counts can then be sorted in O(n), because all of them lie within 1..n, so you can still use counting sort. To preserve the order of occurrence, you can traverse the string again and reorder words that have the same count.
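A sketch of the 26-bucket idea for characters in Python, assuming only the letters a-z count and case is ignored. Instead of sorting afterwards, it appends each letter to a list the first time it is seen, which keeps the whole thing a single O(n) pass; char_counts_in_order is a hypothetical name:

```python
def char_counts_in_order(text):
    counts = [0] * 26  # one bucket per letter a..z
    order = []         # letter indices in order of first appearance
    for ch in text.lower():
        if "a" <= ch <= "z":
            idx = ord(ch) - ord("a")
            if counts[idx] == 0:
                order.append(idx)  # first visit: remember the position
            counts[idx] += 1
    return [(chr(ord("a") + i), counts[i]) for i in order]

print(char_counts_in_order("A rose is a rose is a rose"))
# [('a', 3), ('r', 3), ('o', 3), ('s', 5), ('e', 3), ('i', 2)]
```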
Order of n means you traverse the string only once, or a constant number of times (so the total work is at most some constant multiple of n), where n is the number of characters in the string.
So your solution, which stores each string and the number of its occurrences, is O(n) (order of n), as you loop through the complete string only once.
However, it uses extra space in the form of the dictionary you created.
Order N refers to Big O computational complexity analysis, where you get a good upper bound on algorithms. It is a theory we cover early in a Data Structures class, so that we can torment (I mean, help) students to gain facility with it as we traverse, in a balanced way, heaps of different trees of knowledge. In your case, they want your algorithm's compute time to grow in proportion to the size of the text.
It's a reference to Big O notation. Basically the interviewer means that you have to complete the task with an O(N) algorithm.
"Order n" is referring to Big O notation. Big O is a way for mathematicians and computer scientists to describe the behavior of a function. When someone specifies searching a string "in order n", that means that the time it takes for the function to execute grows linearly as the length of that string increases. In other words, if you plotted time of execution vs length of input, you would see a straight line.
Saying that your function must be of order n does not mean that it must be exactly O(n); a function with a Big O lower than O(n) would also be acceptable. In your problem's case, this is not possible: to count a letter you must "touch" that letter, so there must be some operation that depends on the input size.
One possible method is to create a hash and a list, then traverse the string linearly. The idea is to use the word as the hash key and increment its value for each occurrence. If the word is not yet in the hash, also add it to the end of the list. After traversing the string, go through the list in order, using the hash values as the counts.
The order of the algorithm is O(n). The hash lookup and list add operations are O(1) (or very close to it).
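The hash-plus-list method above could look like this in Python. This is a sketch: the function name is hypothetical, and lowercasing plus whitespace splitting are assumptions made so that "A" and "a" count as the same word:

```python
def word_counts_in_order(text):
    counts = {}  # word -> number of occurrences (the "hash")
    order = []   # words in order of first appearance (the "list")
    for word in text.lower().split():
        if word not in counts:
            counts[word] = 0
            order.append(word)  # O(1) append on first sighting
        counts[word] += 1       # O(1) average hash update
    return [(w, counts[w]) for w in order]

print(word_counts_in_order("A rose is a rose is a rose"))
# [('a', 3), ('rose', 3), ('is', 2)]
```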
So I have a list of strings
{test,testertest,testing,tester,testingtest}
I want to sort it in descending order. How do you sort strings in general? Is it based on the length, or is it character by character?
How would it work for the example above? I want to sort them in descending order.
In almost any language you’re using, there’s a built-in sort function that sorts strings in lexicographical order, which returns
['test','tester','testertest','testing','testingtest']
for your example. If I wanted this reversed, I would just write sorted(myList, reverse=True) in Python and be done with it. There are plenty of related questions that require a more specialized ordering method (for numbers, dates, etc.), but lexicographic order works on strings containing any kind of data.
Here’s how it works:
from functools import cmp_to_key

def compare(a: str, b: str) -> int:
    # Negative if a sorts first, positive if b sorts first, 0 if equal
    if a and b:
        if a[0] == b[0]:
            # First letters are the same; compare by the rest
            return compare(a[1:], b[1:])
        # Compare the first letters by Unicode code point
        return ord(a[0]) - ord(b[0])
    # They were equal till now; the shorter one shall be sorted first
    return len(a) - len(b)

print(sorted(["testing", "test", "tester"], key=cmp_to_key(compare)))
# ['test', 'tester', 'testing']
I would sort it like this:
testingtest
testing
testertest
tester
test
Assuming C#:
using System;

string[] myStrings = {"test","testertest","testing","tester","testingtest"};
Array.Sort(myStrings);
Array.Reverse(myStrings);
foreach (string s in myStrings)
{
    Console.WriteLine(s);
}
This is not always the ideal way to do it (you could implement a custom comparer instead), but for the trivial example you asked about it is probably the most logical approach.
In computer science, strings are usually sorted character by character, with the preferred sort order being (for a standard English character set):
Null characters first
Followed by whitespace
Followed by symbols
Followed by numeric characters in obvious numerical order
Followed by alphabetic characters in obvious alphabetical order
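Whether lowercase characters come before uppercase ones depends on the comparison used: in plain code-point (ASCII) order every uppercase letter comes before every lowercase letter, while many locale-aware collations put lowercase first.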
So, for example, if we were to compare:
testing
tester
Then "tester" would come before "testing": the first differing character in the strings is the 5th one, and "e" comes before "i".
Similarly, if we were to compare:
test
testing
Then in this case "test" would come first: once again the strings are identical up to the 5th character, where the string "test" ends (i.e. there is no character), and that comes before any character.
Note that this can produce some counter-intuitive results when dealing with numbers. For example, try sorting the strings "50" and "100": you will find that "100" comes before "50". Why? Because the strings differ at character 1, and "5" comes after "1".
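If you need "100" to come after "50", one common workaround (often called natural sort; it is not what the built-in sort does by default) is a key that compares runs of digits as numbers. A minimal Python sketch, with natural_key as a hypothetical name:

```python
import re

def natural_key(s):
    # re.split with a capturing group alternates text and digit runs:
    # "file10" -> ['file', '10', ''], so odd positions are always digits
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

plain = sorted(["50", "100"])                       # ['100', '50']
natural = sorted(["50", "100"], key=natural_key)    # ['50', '100']
files = sorted(["file10", "file2", "file1"], key=natural_key)
print(plain, natural, files)
```

Because text and numbers always sit at the same (even/odd) positions, the key lists never compare a str with an int.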
In nearly all languages there is a function which will do all of the above for you!
You should use that function instead of trying to sort strings yourself! For example:
// C#
string[] myStrings = {"test","testertest","testing","tester","testingtest"};
Array.Sort(myStrings);
In Java you can use natural ordering with
java.util.Collections.sort(list);
To make it descending:
java.util.Collections.reverse(list);
or pass a Comparator that does the reverse sorting, for example java.util.Collections.sort(list, java.util.Collections.reverseOrder());
When comparing two strings to see which sorts first, the comparison is typically done on a character by character basis. If the characters in the first position (e.g., t in your example) are identical, you move to the next character. When two characters differ, that "may" define which string is considered "greater".
However, depending on the locale used and a number of other factors, it is possible for later characters in the two strings being compared to override a difference in an earlier character. For example, in some collations, the diacritics on letters are considered to be of secondary weight. So a primary difference in a later character can override the secondary difference.
When two strings are otherwise identical but one is longer, the longer one is typically considered to be "greater". When sorting in descending order, the "greater" of two strings is sorted first.
Do you want to know whether test should appear after tester in descending order? Or are you particularly interested in sorting strings with similar prefixes?
If it's the latter, I'd suggest a Trie if the input tends to grow very large.
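A rough Python sketch of how a Trie yields sorted output: walking children in character order gives ascending order, and walking them in reverse (emitting a word only after all its extensions) gives descending order. The class and function names are hypothetical, and whether this beats a built-in sort depends on your data:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # how many inserted words end exactly here

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1

def collect(node, prefix, descending, out):
    # Ascending: a word comes before every longer word that extends it
    if not descending:
        out.extend([prefix] * node.count)
    for ch in sorted(node.children, reverse=descending):
        collect(node.children[ch], prefix + ch, descending, out)
    # Descending: the shorter word comes after all of its extensions
    if descending:
        out.extend([prefix] * node.count)

def trie_sort(words, descending=False):
    root = TrieNode()
    for w in words:
        insert(root, w)
    out = []
    collect(root, "", descending, out)
    return out

words = ["test", "testertest", "testing", "tester", "testingtest"]
print(trie_sort(words, descending=True))
# ['testingtest', 'testing', 'testertest', 'tester', 'test']
```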