Why do we use a hashmap when one key can have only one value? - hashmap

What is the use of a hashmap when we can associate only one value with one key? We could directly search for that value instead of searching for the key, couldn't we? Am I right? If not, then please explain.
Key1 ---> 2 -> 5 -> 8 -> 2;
Key2 ---> 15 -> 14;
Key3 ---> 45 -> 15 -> 10;
If it were like this, we could search for values using the key with fewer iterations.

It comes in handy when you know the key of the value you need. In that case the lookup time is constant (it doesn't vary with the size of the collection). Yes, you can search an array instead, but you'd have to iterate over it, which leads to a linear delay (the larger the array, the longer it takes to find the value you need).
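To make that concrete, here is a minimal Java sketch (made-up names and values) contrasting a keyed lookup with a linear scan:

import java.util.HashMap;
import java.util.Map;

public class LookupDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("alice", 31);
        ages.put("bob", 27);

        // Constant time: jump straight to the value via its key.
        Integer bobsAge = ages.get("bob");

        // Linear time: without a key we have to walk the whole array.
        int[] values = { 31, 27 };
        boolean found = false;
        for (int v : values) {
            if (v == 27) { found = true; break; }
        }
        System.out.println(bobsAge + " " + found);
    }
}

The map lookup costs roughly the same whether it holds two entries or two million, while the scan grows with the size of the array.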


Xpages: get count of values in multi-value field in view

I have 100 documents with a multi-value field. Let's say the field holds 5 possible values (Albert, Ben, Chris, Don, Ed). The field must contain at least 1 value, but can contain up to 5.
I need to compute the number of docs that contain each value, so
Albert 56
Ben 22
Chris 79
etc.
This seemed easy. I constructed a view that contains the docs, the first column is the field, and I selected show multiple values as separate entries.
In SSJS, I loop through my master list of values in the field, and for each one do a getAllDocumentsByKey.
myArray = applicationScope.application;
var dc:NotesDocumentCollection;
for (index = 0; index < myArray.length; ++index) {
dc = view1.getAllDocumentsByKey(myArray[index]);
Print(dc.getCount())
}
This gets the first value correctly, but none after. If I just hard code a particular value, it works. But when I call the getAllDocumentsByKey a second time, it doesn't return the right value.
I think this would work fine in LotusScript, but in SSJS I must clear or recycle or reset something, and I don't know what.
Bryan,
Two things to try in this order:
Your first line, myArray = applicationScope.application;, doesn't look right. I suspect that you are not actually getting an array here; I think you are just getting a single value from your applicationScope. Add a print statement on the next line: print(myArray.length);. If it is always equal to one, that is your problem. You are also not using var, and you should declare the variable with a Java type using a colon.
Before the end of your for loop, try setting your collection to null: dc = null;. That way you know for sure that you are getting a new collection in the next iteration.
Are any of your multi-value field values ambiguous? getAllDocumentsByKey(Vector) does partial matches. Unless you actually want that, I would recommend always passing the second parameter and setting it to true; the same goes for getAllEntriesByKey().
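Putting the null reset and the exact-match flag together, a rough Java sketch of the corrected loop could look like the following (the same back-end classes are callable from SSJS; the view and the keyword list are assumed names):

import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;
import lotus.domino.View;

public class KeywordCounts {
    // Prints the per-keyword document count for a view whose first column
    // is the multi-value field, shown as separate entries.
    public static void printCounts(View view1, String[] keywords) throws NotesException {
        for (int i = 0; i < keywords.length; i++) {
            DocumentCollection dc = view1.getAllDocumentsByKey(keywords[i], true); // true = exact match
            System.out.println(keywords[i] + " " + dc.getCount());
            dc.recycle(); // release the handle so the next iteration starts clean (dc = null in SSJS)
        }
    }
}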
An alternative, which will definitely perform better, would be to add a total column to the view. There's a performance hit on view indexing, but you can then use a ViewNavigator with getColumnValues() and getNextSibling(). getCount() performs extremely poorly in LotusScript and will almost certainly perform just as poorly in SSJS/Java. See this blog post from a few years ago: http://www.intec.co.uk/why-you-shouldnt-count-on-a-notesviewnavigator/
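For reference, a rough Java sketch of that total-column idea (again callable from SSJS). It assumes the view is categorized on the field and that a second column is set to display totals, so each category row already carries its count:

import java.util.Vector;
import lotus.domino.NotesException;
import lotus.domino.View;
import lotus.domino.ViewEntry;
import lotus.domino.ViewNavigator;

public class CategoryTotals {
    public static void printTotals(View view) throws NotesException {
        ViewNavigator nav = view.createViewNav();
        ViewEntry entry = nav.getFirst();           // first category row
        while (entry != null) {
            Vector cols = entry.getColumnValues();  // e.g. [category name, total]
            System.out.println(cols.elementAt(0) + " " + cols.elementAt(1));
            ViewEntry next = nav.getNextSibling();  // skip the documents under this category
            entry.recycle();
            entry = next;
        }
    }
}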
If I understand that right, you want to count, across all the documents, how often each of the possible values occurs, so that you end up with a list of 5 values and their corresponding 5 counts, right?
You could loop through all docs, loop through all values, and add entries to a HashMap with the value as the key and an int as the value (incremented each time). Afterwards you have a Map with 5 keys holding the count for each value.
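A minimal Java sketch of that counting loop, assuming the documents come in as a DocumentCollection and the multi-value field is called "names" (an assumed name):

import java.util.HashMap;
import java.util.Map;
import java.util.Vector;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

public class FieldValueTally {
    public static Map<String, Integer> tally(DocumentCollection docs) throws NotesException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        Document doc = docs.getFirstDocument();
        while (doc != null) {
            Vector values = doc.getItemValue("names");  // all values of the multi-value field
            for (Object value : values) {
                Integer current = counts.get((String) value);
                counts.put((String) value, current == null ? 1 : current + 1);
            }
            Document next = docs.getNextDocument(doc);
            doc.recycle();
            doc = next;
        }
        return counts;  // e.g. {Albert=56, Ben=22, Chris=79, ...}
    }
}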
You will never get the right answer with getAllDocumentsByKey(). When a document shows up in one category, it will be missing from the collection for the next category. That's the way it works.
Use a ViewNavigator instead. Build it by category and simply count the ViewEntries.
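A short Java sketch of that approach, one navigator per category value, counting only the document entries (names are assumed):

import lotus.domino.NotesException;
import lotus.domino.View;
import lotus.domino.ViewEntry;
import lotus.domino.ViewNavigator;

public class CategoryEntryCount {
    public static int countInCategory(View view, String category) throws NotesException {
        ViewNavigator nav = view.createViewNavFromCategory(category);
        int count = 0;
        ViewEntry entry = nav.getFirst();
        while (entry != null) {
            if (entry.isDocument()) count++;  // count document entries, not sub-categories
            ViewEntry next = nav.getNext();
            entry.recycle();
            entry = next;
        }
        return count;
    }
}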

Need to understand formula re: Excel search for a string in a table and return string if true

I've adapted this solution from a couple of years ago:
=LOOKUP(2^15,FIND(Keywords,A2),Categories)
I use this for searching within a description field for keywords in a named list, in order to return a corresponding category from an adjacent named list.
However I do not understand the significance of 2^15. Can someone explain?
Also it's unclear in what order the search operates. If two keyword options were "check" and "deposit," and they were assigned to different categories, but both appeared in the same description field cell, how do I know which will be found first? Is it placement in the string, or order in the list?
2^15 is simply an arbitrarily large number, which LOOKUP attempts to find; when it can't find it, it takes the next lowest number.
Effectively, FIND(Keywords,A2) looks for each keyword inside A2: every keyword that appears in A2 produces a number (its position), and every keyword that doesn't produces an error. LOOKUP then searches that array for 2^15; since it can never find a value that large, it settles on the last numeric value in the array and returns the entry in the same position from Categories, ignoring the errors along the way. (For example, if A2 were "monthly deposit fee" and the keywords were check, deposit, rent, the FIND array would be error, 9, error, and LOOKUP would return the Categories entry matching "deposit".) This seems a weird way of doing it; it is likely a holdover from pre-2007, as LOOKUP is now generally kept around only for backwards-compatibility purposes. Also, using 1 instead of 2^15 worked for a couple of simple cases that I tried while writing this up.

Finding similar strings in large datasets

I'm using levenshtein distance to retrieve similar strings from a list. At the moment the list has just a few thousand items, but we'll need to support at least 100k items.
I'm trying to make this more efficient, and one technique I came up with was to calculate the Levenshtein distance only on strings that are of similar length. I also thought about filtering on the initial character, i.e. if the string to search for starts with b then I'll run the calculation only on the strings that start with b. But I'm not sure I could assume this to work all the time.
I was wondering if you all have a better way of getting this done?
Thanks
One way to go would be to hope that a match with a small edit distance contains a short exact match within it. If you assume this, then, given the string ABCDEF, retrieve all stored strings containing ABC, BCD, CDE, or DEF, and compute their edit distances. You may even find that the best match among these is so close that any closer match would have to contain a short exact match inside it as well, so you would have found it already. Otherwise you have to accept that, if you are unlucky, you may miss some good matches or be forced to go through all the possibilities one by one.
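A minimal Java sketch of that filtering idea, using 3-character substrings as the short exact matches (all names are made up, and a real system would likely keep the substring index in a database rather than an in-memory map):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TrigramFilter {
    private final Map<String, List<String>> index = new HashMap<String, List<String>>();

    // Index every 3-character substring of a stored string.
    public void add(String s) {
        for (String gram : trigrams(s)) {
            List<String> bucket = index.get(gram);
            if (bucket == null) { bucket = new ArrayList<String>(); index.put(gram, bucket); }
            bucket.add(s);
        }
    }

    // Compute the edit distance only for strings sharing at least one trigram with the query.
    public String closest(String query) {
        Set<String> candidates = new HashSet<String>();
        for (String gram : trigrams(query)) {
            List<String> bucket = index.get(gram);
            if (bucket != null) candidates.addAll(bucket);
        }
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(query, candidate);
            if (d < bestDist) { bestDist = d; best = candidate; }
        }
        return best;  // may miss a closer match that shares no trigram with the query
    }

    private static List<String> trigrams(String s) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 3 <= s.length(); i++) grams.add(s.substring(i, i + 3));
        return grams;
    }

    // Standard dynamic-programming Levenshtein distance.
    private static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }
}

As noted above, this trades a little recall for a large cut in the number of distance computations.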
As an alternative to building a database of substrings, you could build a suffix array (http://en.wikipedia.org/wiki/Suffix_array) and LCP array from the string obtained by concatenating all the stored strings, separated by a marker character not otherwise used. This takes time and space linear in the input size. You would then search for exact matches by looking for strings in the suffix array starting with ABCDEF, BCDEF, CDEF, and DEF.

Array Algorithms

I have one question to ask about algorithms. I have been asked to write an algorithm for this. I'm not asking you to write the algorithm for me, just let me know the efficient approach I should take:
There is an array of n elements, like the contents of a book or the Bible, and suppose an input string "Gaurav Agarwal" has been inserted into it. You need to fetch the unique elements that are present in the array for that string. Just an algorithm for how you would proceed (the array is unsorted).
If you did not understand, let me know and I will try to explain further.
One good way to find duplicates in an unsorted array is to sort it based on the string elements, so the algorithm for your homework question would be (a sketch follows the list):
Sort the array.
Check the array for the existence of "Gaurav Agarwal". Since the array is sorted, equal strings will be neighboring elements, so keep a counter and increment it until you find the first element that is not equal to the string you're looking for.
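A rough Java sketch of those two steps (names are made up; note that the sort mutates the caller's array):

import java.util.Arrays;

public class SortAndCount {
    public static int count(String[] array, String target) {
        Arrays.sort(array);                          // O(n log n)
        int i = Arrays.binarySearch(array, target);  // lands somewhere inside the run of matches
        if (i < 0) return 0;
        int count = 1;
        for (int j = i - 1; j >= 0 && array[j].equals(target); j--) count++;            // walk left
        for (int j = i + 1; j < array.length && array[j].equals(target); j++) count++;  // walk right
        return count;
    }
}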
It will take some time to sort the string array and then to scan it. I would recommend instead just scanning the array of strings and checking whether the length of your string is the same as the length of the string at the current position in the array. If the lengths are the same, compare the two strings.
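The same idea as a single pass, sketched in Java (in practice String.equals usually checks the length first anyway, so the explicit test mainly documents the intent):

public class LinearScanCount {
    public static int count(String[] array, String target) {
        int count = 0;
        for (String s : array) {
            // Cheap length check first, full comparison only when the lengths match.
            if (s.length() == target.length() && s.equals(target)) count++;
        }
        return count;
    }
}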
I don't think sorting and then searching is the most efficient solution to your problem.
Sorting itself has O(n log n) complexity.
Just doing a brute-force search of the array is more efficient (it has O(n) complexity).
This is the case if you are finding matches for one string or a few strings.
If you are doing this for a lot of input strings instead of only one, then sorting makes sense.
I would proceed in the following steps (see the sketch below):
Use a hash table with chaining, using a hash function that works well for strings.
Find the hash of the new string and search the linked list of the slot corresponding to that hash for duplicates.
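A minimal Java sketch; java.util.HashMap already resolves collisions by chaining internally, so it stands in for the hand-rolled table described above:

import java.util.HashMap;
import java.util.Map;

public class HashTally {
    public static Map<String, Integer> tally(String[] array) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : array) {
            Integer current = counts.get(s);  // hash the string and look in its bucket
            counts.put(s, current == null ? 1 : current + 1);
        }
        return counts;  // counts.get("Gaurav Agarwal") gives the number of occurrences
    }
}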

Lucene number extracting

I have this number extracting problem.
I want to get all matches that don't have a certain number in them.
ex : 125501874, 125001873
Every number that has 55 at position 2 is not to be considered.
The first number's range is 0 to 9 and the second is 1 to 9, so the real range is [01-99]
(we cannot have 00 as the first two numbers).
With Lucene I wanted to add NOT field:[01-99]55*
But it doesn't seem to work. Is there an easy way to find ??55* and disregard it in a Search("NOT field:[01-99]55*")?
Thank you Lucene guru
Lucene can do this very efficiently if one creates an "index-only" field with only the third and fourth digits in it. The complete value can be "stored" (or stored and indexed if other queries use the whole number) in the original field.
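A rough sketch of that indexing scheme against the classic (pre-4.x) Lucene API; class and method names differ in newer versions, and digits34 is a made-up field name:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class NumberFilterSketch {
    // Indexing time: store the full number, and index only the third and fourth digits separately.
    static Document buildDocument(String number) {
        Document doc = new Document();
        doc.add(new Field("number", number, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("digits34", number.substring(2, 4), Field.Store.NO, Field.Index.NOT_ANALYZED));
        return doc;
    }

    // Query time: everything except documents whose third and fourth digits are 55.
    static Query everythingExcept55() {
        BooleanQuery query = new BooleanQuery();
        query.add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("digits34", "55")), BooleanClause.Occur.MUST_NOT);
        return query;
    }
}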
Update: A followup comment asked, "Is [there] a way to create a temporary index on only the second digit?"
Using a ParallelReader "vertically partitions" the fields of an index. One partition could hold the current index, with its fields, while the other is a temporary index with the new field, possibly stored in a RAMDirectory.
Assuming the number is "stored" in the original index, iterate over each document in the original index, retrieve the stored field, parse out the key digits, and add a Document to the temporary index with the new field. As the ParallelReader documentation states, it is imperative that the document numbers match in both indexes.
Thank you erickson. Your solution is probably the best, using ParallelReader, if only I could use temporary indexes; because we cache the search queries, we will need those later.
But like you said before, it's better to start with an index on the relevant digits straight away.
I have another solution.
NOT field:0?55*
NOT field:1?55*
...
NOT field:9?55*
It is efficient enough for the search I'm doing, and it bypasses the leading-wildcard limitation. I wouldn't use it if there were more digits to check or if they were farther from the start.
Now I'm testing this on a million rows and it's pretty efficient for our needs.
