Array Algorithms

I have a question about algorithms. I have been asked to write an algorithm for the following problem. I'm not asking you to write it for me; just tell me what an efficient approach would be:
There is an array of n string elements, such as the words of a book like the Bible, and suppose you have inserted the input string "Gaurav Agarwal" into it. You need to determine whether that string is unique in the array or has duplicates. The array is unsorted; I only need an outline of how you would proceed.
If anything is unclear, let me know and I will try to explain.

One good way to find duplicates in an unsorted array is to sort it by its string elements, so the algorithm for your homework question would be:
Sort the array.
Search the sorted array for "Gaurav Agarwal" (a binary search works here). Because the array is sorted, all copies of the string are adjacent, so keep a counter and increment it until you reach the first element that is not equal to the string you're looking for.
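A minimal Python sketch of the sort-then-count steps, assuming the array is a plain list of strings; the function name `count_occurrences_sorted` and the sample words are illustrative. It uses the standard `bisect` module to locate the first copy, then walks the adjacent run as described.

```python
from bisect import bisect_left


def count_occurrences_sorted(items: list[str], target: str) -> int:
    """Sort the items, then count how many copies of target sit next to each other."""
    ordered = sorted(items)                # O(n log n)
    start = bisect_left(ordered, target)   # first index where target could appear
    count = 0
    # All copies of target are contiguous in the sorted list, so walk the run
    # until the first element that differs from target.
    while start + count < len(ordered) and ordered[start + count] == target:
        count += 1
    return count


words = ["in", "the", "Gaurav Agarwal", "beginning", "Gaurav Agarwal"]
print(count_occurrences_sorted(words, "Gaurav Agarwal"))  # 2
```

A count of 1 means the string is unique; anything larger means it has duplicates.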

Sorting the string array and then scanning it takes extra time. I would recommend simply scanning the array once and, at each position, checking whether the current string has the same length as yours; only when the lengths match do you compare the two strings character by character.
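A sketch of that single pass, assuming a list of strings; `count_occurrences_linear` is an illustrative name. The explicit length check mirrors the answer's suggestion; in Python, `==` on strings already rejects different lengths cheaply, so the check matters most in a language such as C where comparing means walking the characters.

```python
def count_occurrences_linear(items: list[str], target: str) -> int:
    """One pass over the unsorted array: O(n) string checks, no sorting."""
    target_len = len(target)
    count = 0
    for item in items:
        # Cheap length test first; full comparison only for equal-length strings.
        if len(item) == target_len and item == target:
            count += 1
    return count


words = ["in", "the", "Gaurav Agarwal", "beginning", "Gaurav Agarwal"]
print(count_occurrences_linear(words, "Gaurav Agarwal"))  # 2
```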

I don't think sorting and then searching is the most efficient solution to your problem.
Sorting alone takes O(n log n) time.
A brute-force linear scan of the array is more efficient, at O(n).
This holds if you are checking for duplicates of one string or a few strings.
If you need to check many input strings rather than just one, then sorting first makes sense.
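A rough cost model for that trade-off, ignoring constant factors and assuming k query strings checked against an array of n elements (binary search after sorting):

```latex
\begin{aligned}
\text{repeated linear scans:}\quad & T_{\text{scan}} = k\,n \\
\text{sort once, then binary search:}\quad & T_{\text{sort}} = n\log n + k\log n \\
T_{\text{sort}} < T_{\text{scan}} \iff\ & k\,(n - \log n) > n\log n
  \iff k > \frac{n\log n}{n - \log n} \approx \log n
\end{aligned}
```

Under this model, sorting starts to pay off once the number of queries exceeds roughly log n; for a single query the linear scan wins.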

I would proceed in the following steps:
Use a hash table with chaining and a hash function that works well for strings, and insert every element of the array into it.
Compute the hash of the new string and search the linked list in the slot corresponding to that hash for duplicates.
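A small illustrative hash table with separate chaining, assuming 32-bit FNV-1a as the string hash (the answer does not name a function) and Python lists standing in for the linked lists; `ChainedStringTable` is a hypothetical name. In everyday Python, a `dict` or `collections.Counter` already provides this.

```python
class ChainedStringTable:
    """Hash table with separate chaining, for illustration only."""

    def __init__(self, num_buckets: int = 1024) -> None:
        # Each bucket is a chain of the strings that hashed to that slot.
        self.buckets: list[list[str]] = [[] for _ in range(num_buckets)]

    @staticmethod
    def _fnv1a(s: str) -> int:
        """32-bit FNV-1a hash over the UTF-8 bytes of s."""
        h = 0x811C9DC5
        for byte in s.encode("utf-8"):
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF
        return h

    def _chain(self, s: str) -> list[str]:
        return self.buckets[self._fnv1a(s) % len(self.buckets)]

    def add(self, s: str) -> None:
        self._chain(s).append(s)

    def count(self, s: str) -> int:
        # Only the chain in s's slot is scanned, not the whole array.
        return sum(1 for item in self._chain(s) if item == s)


table = ChainedStringTable(num_buckets=64)
for w in ["in", "the", "Gaurav Agarwal", "beginning", "Gaurav Agarwal"]:
    table.add(w)
print(table.count("Gaurav Agarwal"))  # 2
```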


Best way to find if there is a one-typo word from list of given words

How would you solve this problem efficiently?
Suppose we are given a list of words [“apple”, “banana”, “mango”].
If we are given a word that is one typo away from a word in the list,
we output true.
If it is more than one typo away, we output false.
As an optimization, I’ve tried storing the list in a hash table keyed by word length, so that for a given input I only search among words with the same number of letters. Is there a faster optimization we can make for this problem?
One possible optimisation would be to generate all one-typo words for the given list and put them in a map (or some better string lookup structure). Then look up the given word: if it is found, output true, otherwise false. If a typo means substituting one letter and case does not matter, the total number of one-typo words is at most 25*L, where L is the total number of letters in the input list (each letter can be replaced by any of the other 25).
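A sketch of the precomputed one-typo index, assuming a typo is a single-letter substitution over lowercase a-z (insertions and deletions are not covered); the function names are illustrative. Each word of length m contributes 25*m variants, matching the 25*L bound.

```python
import string


def one_typo_variants(word: str) -> set[str]:
    """All strings that differ from word in exactly one position (substitution only)."""
    variants = set()
    for i, original in enumerate(word):
        for letter in string.ascii_lowercase:
            if letter != original:
                variants.add(word[:i] + letter + word[i + 1:])
    return variants


def build_typo_index(words: list[str]) -> set[str]:
    """Union of the one-typo variants of every (lowercased) word in the list."""
    index: set[str] = set()
    for w in words:
        index |= one_typo_variants(w.lower())
    return index


index = build_typo_index(["apple", "banana", "mango"])
print("appke" in index)  # True: one substitution away from "apple"
print("apxke" in index)  # False: two substitutions away
```

Queries should be lowercased the same way. Note that an exact match ("apple") is not in the index; whether zero typos should count as true is left open by the question.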

Similar String Comparison Algorithm

I got this question in a recent interview: a basic string comparison with a little twist. I have an input string, STR1 = 'ABC'. I should return "Same/Similar" when the string to compare, STR2, has any one of these values: 'ACB', 'BAC', 'ABC', 'BCA', 'CAB', 'CBA' (that is, the same characters, the same length, and the same number of occurrences of each character). The only answer that came to mind at the time was to use merge sort or quicksort, since their complexity is O(n log n). Is there a better algorithm to achieve this result?
Sorting both, and comparing the results for equality, is not a bad approach for strings of reasonable lengths.
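Another approach is to use a map/dictionary/object (depending on the language) from character to number of occurrences. First check that the two strings have the same length, since otherwise they cannot match. Then iterate over the first string, incrementing the counts, and iterate over the second string, decrementing them. You can return false as soon as a count goes negative; if that never happens, the strings contain the same characters with the same counts.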
And if your set of possible characters is small enough to be considered constant, you can use an array as the "map", resulting in O(n) worst-case complexity.
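A sketch of the counting approach with an array as the "map", assuming plain ASCII input so a fixed 128-slot array covers every character (non-ASCII input would need a dictionary instead); `is_permutation` is an illustrative name.

```python
def is_permutation(a: str, b: str) -> bool:
    """True if b has exactly the same characters as a, with the same counts.

    Assumes ASCII input, so a fixed 128-slot array can serve as the map.
    """
    if len(a) != len(b):           # different lengths can never match
        return False
    counts = [0] * 128
    for ch in a:
        counts[ord(ch)] += 1
    for ch in b:
        counts[ord(ch)] -= 1
        if counts[ord(ch)] < 0:    # b has more of ch than a does
            return False
    # Equal lengths and no negative count means every count is back to zero.
    return True


print(is_permutation("ABC", "CAB"))  # True
print(is_permutation("ABC", "ABB"))  # False
```

The early length check is what makes the "no negative count" condition sufficient: without it, "AB" would wrongly pass against "ABC".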
If you can use any language, I would opt for a Python dictionary solution. Build two dictionaries, one per string, mapping each character to its number of occurrences. Then compare the dictionaries for equality and return the corresponding result. Because the values are counts, this also works for strings with characters that appear more than once.
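In Python, `collections.Counter` builds exactly such a character-to-count dictionary, so the two-dictionary comparison can be sketched in a line; the function name is illustrative.

```python
from collections import Counter


def is_same_or_similar(str1: str, str2: str) -> bool:
    # Counter maps each character to its number of occurrences;
    # equal Counters mean same characters, same counts, hence same length.
    return Counter(str1) == Counter(str2)


print(is_same_or_similar("ABC", "BCA"))  # True
print(is_same_or_similar("AAB", "ABB"))  # False
```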

Finding similar strings in large datasets

I'm using Levenshtein distance to retrieve similar strings from a list. At the moment the list has just a few thousand items, but we'll need to support at least 100k items.
I'm trying to make this more efficient, and one technique I came up with was to calculate the Levenshtein distance only on strings of similar length. I also thought about filtering on the initial character, e.g. if the search string starts with b, I'd run the calculation only on the strings that start with b. But I'm not sure I can assume this will always work.
I was wondering if you all have a better way of getting this done?
One way to go would be to hope that any string within a small edit distance of the query contains a short exact match with it. If you assume this, then, given the string ABCDEF, retrieve all strings containing ABC, BCD, CDE, or DEF, and compute their edit distances to the query. You may even find that the best match among these is so close that any closer match must have a short match inside it, so you would have found it already. You would have to accept that if you are unlucky you may miss some good matches, or be forced to go through all the possibilities one by one.
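A sketch of that filter-then-verify idea, assuming 3-character substrings (trigrams) as the "short exact match" and an in-memory inverted index in place of a database; all names are illustrative. As the answer warns, a string that shares no trigram with the query is never scored.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def grams(s: str, n: int = 3) -> set[str]:
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_gram_index(strings: list[str], n: int = 3) -> dict[str, set[int]]:
    """Map each n-gram to the ids of the strings containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for idx, s in enumerate(strings):
        for g in grams(s, n):
            index[g].add(idx)
    return index


def closest_matches(query: str, strings: list[str],
                    index: dict[str, set[int]], n: int = 3) -> list[tuple[int, str]]:
    candidates: set[int] = set()
    for g in grams(query, n):
        candidates |= index.get(g, set())
    # Only strings sharing at least one n-gram with the query are scored.
    return sorted((levenshtein(query, strings[i]), strings[i]) for i in candidates)


strings = ["banana", "bandana", "cabana", "orange"]
idx = build_gram_index(strings)
print(closest_matches("banan", strings, idx)[0])  # (1, 'banana')
```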
As an alternative to building a database of substrings, you could build a suffix array and LCP array from a string obtained by concatenating all the stored strings, separating them with a marker character that is not otherwise used. This takes time and space linear in the input size. You would then search for exact matches by looking in the suffix array for suffixes starting with ABCDEF, BCDEF, CDEF, and DEF.
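A simplified sketch of the suffix-array lookup, assuming `"\x00"` as the unused marker character. For brevity it builds the suffix array by plain sorting (far slower than the linear-time constructions the answer refers to) and omits the LCP array; it shows only how a binary search over sorted suffixes finds every stored string containing a given substring. All names are illustrative.

```python
from bisect import bisect_right

SEP = "\x00"  # assumed never to occur in the stored strings


def build_suffix_array(strings: list[str]) -> tuple[str, list[int], list[int]]:
    text = SEP.join(strings) + SEP
    starts, pos = [], 0          # start offset of each stored string in text
    for s in strings:
        starts.append(pos)
        pos += len(s) + 1
    # Naive construction for illustration only (not linear time).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def strings_containing(pattern: str, text: str, sa: list[int], starts: list[int]) -> set[int]:
    """Ids of stored strings containing pattern, via binary search on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:               # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    ids, i = set(), lo
    # Matching suffixes are contiguous in sorted order.
    while i < len(sa) and text[sa[i]:sa[i] + m] == pattern:
        ids.add(bisect_right(starts, sa[i]) - 1)  # which stored string it lies in
        i += 1
    return ids


text, sa, starts = build_suffix_array(["banana", "bandana", "orange"])
print(strings_containing("ana", text, sa, starts))  # {0, 1}
```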

Search String in Cell Efficient Way

It's my first post here, so please bear with me :-).
Problem Background:
I have multiple text files of the form:
A,20120904 0926,37.14,37.14,37.14,37.14,693
ZZ,20120904 1602,1.6,1.6,1.6,1.6,11771
As you might have guessed, these are stock ticks. When I load a file into MATLAB, it creates a structure with an array (for the numerical values) and a cell array (for the strings), which is fine at this point, as I can work with it.
I'd like to find the most efficient way to search the array (~70K lines) for a specific symbol. While it's easy to do a naive or a halving (binary) search, I don't think these approaches are very useful for multiple files and/or multiple searches that extract the beginning and end indices of a given symbol/string.
I've looked into past posts here and read about Rabin-Karp, Bitap and hash tables, but I'm not sure any of them fully answers my needs.
So far, I'm leaning towards running through the cell array once and creating a hash table for each letter (i.e. 'A', 'B', etc.), and then running a naive search, or anything else you might suggest :-). The reason for hashing is that I might use the same file to look up different stock symbols, so I think running through it once and labeling letters will reduce the complexity in the long run.
What are your thoughts on the matter? Am I in the right direction?
I'm using MATLAB, by the way.
Thank you
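You can store all your ticks in a struct array, with each column as a field. Assuming none of the values are empty, you can then pull each field out into its own array:
tickers = {S.tickers};   % cell array; square brackets would glue the ticker strings together
dates = [S.dates];       % numeric field; the field name "dates" is assumed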
You can then easily run queries on these arrays to get the indices you want in your struct array S. You can go further and index the ticks by ticker name, by creating an index with the ticker names as keys.
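A Python sketch of the "index keyed by ticker name" idea (in MATLAB, `containers.Map` could hold the same mapping). It makes one pass over the symbol column and records the first and last row of each symbol, so later lookups for any symbol are a single hash access. It assumes that rows for one symbol are contiguous, so the pair delimits that symbol's block; `build_symbol_index` is an illustrative name.

```python
def build_symbol_index(symbols: list[str]) -> dict[str, tuple[int, int]]:
    """Map each symbol to the (first, last) row index where it appears."""
    index: dict[str, tuple[int, int]] = {}
    for row, sym in enumerate(symbols):
        if sym in index:
            first, _ = index[sym]
            index[sym] = (first, row)   # extend the block to this row
        else:
            index[sym] = (row, row)     # first time this symbol is seen
    return index


lines = [
    "A,20120904 0926,37.14,37.14,37.14,37.14,693",
    "ZZ,20120904 1602,1.6,1.6,1.6,1.6,11771",
]
symbols = [line.split(",")[0] for line in lines]
index = build_symbol_index(symbols)
print(index["ZZ"])  # (1, 1)
```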

Comparing strings in MIPS assembly

I have a bunch of strings in an array that I have defined in the data segment. If I take two of the strings from the array, is it possible to compare them in MIPS to see which has the greater value? How would I do this? Basically, I'm looking to sort the strings alphabetically.
EDIT: This is less of me trying to get help with a specific problem, and more of just a general question that will help me with my approach to the code. Thanks!
If it were me, I'd create a list of pointers to the strings, that is, a list of the addresses of each string. Then you'd write a subroutine that compares two strings given their pointers. Then, when you need to swap two strings, you simply swap their pointers.
You want to avoid swapping the strings themselves: they are probably tightly packed and of different lengths, so swapping two of them would mean shifting everything stored between them to make room. Pointers are simple to swap. You could swap the strings themselves more easily if each one occupied a fixed-size slot (i.e., every string had a fixed maximum length), since then there would be no gaps to shuffle around.
But sorting the pointer list is really the hot tip.
To compare strings, the simplest way is to iterate over the characters of both strings in step and subtract each pair. If the difference is 0, those characters are equal, so move on to the next pair. Otherwise, if the difference is > 0, the first string sorts after the second (so you would swap them); if it is < 0, the first string comes first and they are already in order. If you run out of either string before the other, and they're equal up to that point, the shorter string is less than the longer one.
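A Python simulation of the pointer-sorting approach, assuming NUL-terminated strings packed back to back in a byte buffer that stands in for the data segment, with integer offsets playing the role of pointers; `string_pointers` and `strcmp` are illustrative names. In MIPS, the inner loop of the comparison would load one byte from each string (e.g. with `lbu`), subtract, and branch on the result.

```python
from functools import cmp_to_key

# Strings packed back to back, NUL-terminated, as they might sit in a .data segment.
data = b"pear\x00apple\x00fig\x00apples\x00"


def string_pointers(buf: bytes) -> list[int]:
    """Offsets ("addresses") of each NUL-terminated string in buf."""
    ptrs, start = [], 0
    while start < len(buf):
        ptrs.append(start)
        start = buf.index(0, start) + 1
    return ptrs


def strcmp(buf: bytes, p: int, q: int) -> int:
    """Compare the strings at offsets p and q byte by byte.

    Returns < 0 if the first sorts before the second, > 0 if after, 0 if equal.
    The NUL terminator (0) is smaller than any letter, so a string that is a
    prefix of another automatically sorts first.
    """
    while True:
        diff = buf[p] - buf[q]
        if diff != 0 or buf[p] == 0:
            return diff
        p += 1
        q += 1


ptrs = string_pointers(data)
ptrs.sort(key=cmp_to_key(lambda p, q: strcmp(data, p, q)))  # only the offsets move
print([data[p:data.index(0, p)].decode() for p in ptrs])
# ['apple', 'apples', 'fig', 'pear']
```

The string bytes in `data` never move; only the small list of offsets is reordered, which is the point of the answer's advice.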
