Difference between sorted and unsorted array time complexity - python-3.x

So, I'm very new to programming and computer science, and have been trying to understand the concept of time complexity while working through my code. However, when it comes to sorted arrays, I always get confused about a few things:
In my understanding, solving a problem with a sorted array should give the best-case complexity, and an unsorted array the worst case.
What I always get confused about is: how do we actually take advantage of an array being sorted in a problem that involves searching? Meaning, how will this reduce my time complexity, when I thought I would have to run the loop the same number of times?
For example, if I have an array and want to find two indices whose values add up to a specific target, will it make a difference in time complexity if the array is sorted or unsorted?
Thanks in advance for helping me out.

Let's take a look at your sample problem: find two numbers whose sum equals a given number.
Let's say you have an unsorted array: [2, 8, 1, 3, 6, 7, 5, 4], and the target is 11.
So you look at the first item, 2, and you know that you have to find the number 9 in the array, if it exists. With the unsorted array, you have to do a linear search to determine if 9 exists in the array.
But if you have a sorted array, [1, 2, 3, 4, 5, 6, 7, 8], you have an advantage. When you see the value 2, you know you need to find 9 in the array. But because the list is sorted, you can use binary search. Instead of having to look at every item in the list, you only have to look at a few of them:
A typical binary search will probe the 4th item, then the 6th, then the 7th and 8th -- four probes instead of eight -- and determine that 9 isn't in the array.
In short, searching in an unsorted array takes O(n) time: you potentially have to look at every item to find out if what you're looking for is there. A sorted array lets you speed up the search. Instead of having to examine every item, you only have to examine about log2(n) of them. That makes a huge difference when numbers get large. For example, if your list contains a million items, binary search only has to examine about 20 of them. Sequential search has to look at all million.
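For concreteness, here is a minimal Python sketch of the binary search described above (a standard textbook implementation, not tied to any particular library):

def binary_search(arr, target):
    # Return the index of target in the sorted list arr, or -1 if absent.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # probe the middle of the remaining range
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1

print(binary_search([1, 2, 3, 4, 5, 6, 7, 8], 9))  # -1: probes 4, 6, 7, 8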

The advantage of a sorted array for the "target sum" problem is even better: you don't search the array at all. Instead, you start with pointers at the two ends. If the sum is equal to the target, emit that pair and move both pointers inward. If it is less than the target, increment the lower pointer. Otherwise, decrement the upper pointer. This will find all solutions in O(n) time -- after taking O(n log n) for the sort.
For the case given in the comments, [40, 60, 1, 200, 9, 83, 17] with a target of 100, the process looks like this:
Sort the array:
[1, 9, 17, 40, 60, 83, 200]
Start your pointers at the ends: 1 + 200.
The sum is 201, too large, so decrement the right pointer.
Now looking at 1 + 83. This is too small; increment the left pointer.
Now looking at 9 + 83. This is too small; increment the left pointer.
Now looking at 17 + 83. This is the target; print (17, 83) as a solution and move *both* pointers.
Now looking at 40 + 60. This is the target; print (40, 60) as a solution and move *both* pointers.
The pointers have now met (and passed), so you're done.
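A minimal Python sketch of that two-pointer scan (assuming, as the walkthrough implies, a target of 100; pair_sums is a hypothetical name):

def pair_sums(arr, target):
    # Yield every pair from the sorted list arr whose sum equals target.
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == target:
            yield arr[lo], arr[hi]
            lo += 1                   # move *both* pointers past the match
            hi -= 1
        elif s < target:
            lo += 1                   # sum too small: raise the low end
        else:
            hi -= 1                   # sum too large: lower the high end

print(list(pair_sums(sorted([40, 60, 1, 200, 9, 83, 17]), 100)))
# [(17, 83), (40, 60)]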
This is a lovely example. In general, sorting the array gives you a variety of options for finding things in the array much faster than checking each element in turn. A simple binary search is O(log n), and there are a variety of ways to tune this for a particular application. At worst, a plain binary (log base 2) search will work nicely.
However, sorting an arbitrary list costs O(n log n) as overhead; you need to figure this one-time payment into your application's needs. For instance, if your array is some sort of database sorted by a key value (such as a name or ID number), and you have to perform millions of searches from user requests, then it's almost certainly better to sort the database before you do any searches.
If you want a thorough introduction, research "sorting and searching". One excellent reference is the book by that title by Donald Knuth; it's Vol. 3 of "The Art of Computer Programming".

Related

select sublists with items that have multiple occurrences throughout list

I have a nested list of integers ranging from 1 to 5 (not really). I want to ensure that each integer occurs at least once in the list and, if one is missing, to replace a sublist with a list that contains the missing integer. (I have a full set of possible sublists to choose from.) I'm having trouble working out the syntax for ensuring that the removed list contains only integers that have multiple occurrences, so that I don't recreate the missing-integer problem I'm attempting to solve. Here's an example:
a = [[2], [4], [1], [1, 2], [1, 2, 5]]
Notice 3 is missing. If I randomly choose the second or fifth sublist for replacement, then either the 4 or the 5 will be missing. I need to choose the first, third or fourth sublist, where every element i of the sublist occurs more than once in the full list.
Therefore I want to create a new list of viable selection candidates. I believe the solution should look something like this
b = [item for item in a if sum(a.count(i)) > 1 for i in item]
but Python3 is complaining that
UnboundLocalError: local variable 'i' referenced before assignment.
Any suggestions? Note: the algorithm will need to be able to scale to thousands of sublists, but this would rarely happen because the probability of a missing integer in those cases becomes nearly 0.
Thanks for looking!
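For reference, one way to express that intent is to count occurrences across all sublists first and then filter; a minimal sketch (the flipped clause order and the pre-computed counts are what the original comprehension was missing):

from collections import Counter

a = [[2], [4], [1], [1, 2], [1, 2, 5]]

# Count how often each integer occurs across all sublists.
counts = Counter(i for sub in a for i in sub)

# A sublist is a viable removal candidate only if every one of its
# integers occurs more than once overall.
b = [sub for sub in a if all(counts[i] > 1 for i in sub)]
print(b)  # [[2], [1], [1, 2]]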

Most Efficient Way to Loop Through Columns of Identical Data - Without Duplicates

I have coded something to evaluate the result of every possible combination of inputs, in hopes of optimizing a solution.
I have three identical columns of inputs, and my loop cycles through them all in search of the best combinations of inputs to yield the highest output. Example:
475,475,475
391,391,391
24,24,24
999,999,999
Duplicates are not allowed. I have been able to correct for this within an iteration, but not across iterations. As an example, the first result I evaluate is 475 391 24.
QUESTION: The order of the inputs has no impact on the result I am evaluating. My dataset is so large that it is time consuming to evaluate 475 391 24 and then later evaluate 391 475 24 again, and then 24 391 475 again. Is there a way to design around this? I am unable to manipulate the source data. I have only a modest VBA skillset, but even the basic concept of solving this problem would be helpful. I imagine this is a common problem in many programming languages.
A possible solution would be to use a dictionary in some way: whenever you read the 3 values, sort them, put them in a CSV string in ascending order, then lookup in the dictionary.
It is hard to tell for sure whether this will accelerate your code, because it depends on whether your evaluation function is more or less expensive than the sorting plus the dictionary lookup.
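In Python terms (a VBA version would use a Scripting.Dictionary the same way), this is memoization keyed on a canonical ordering of the three values; a sketch with a stand-in evaluation function:

cache = {}

def expensive_evaluation(a, b, c):
    # Stand-in for the real, order-insensitive evaluation.
    return a * b * c - (a + b + c)

def evaluate_once(a, b, c):
    key = tuple(sorted((a, b, c)))    # canonical, order-insensitive key
    if key not in cache:              # evaluate each distinct set only once
        cache[key] = expensive_evaluation(*key)
    return cache[key]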
Another possibility is to remove duplicates from your data after doing some transformation on it. Suppose your data is in columns A, B and C. Generate data into D, E and F with formulas in the following way:
the first column holds the min: Di --> =MIN(Ai:Ci)
the third column holds the max: Fi --> =MAX(Ai:Ci)
the second column holds the middle: Ei --> =SUM(Ai:Ci) - Di - Fi
Then select the range in D:F, copy and paste it as values, use Remove Duplicates, and finally apply your procedure to the remaining data.
If the order of inputs has no impact, you could first sort the lists. Then choose an item from the first column, an item from the second column that is greater than the first item, and an item from the third that is greater than the second. This ensures that each combination is tried only once, not 6 times (so roughly a 6x speedup if the initial sort is not too burdensome).
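In Python, that "strictly increasing triple" idea is exactly what itertools.combinations produces; a sketch with a stand-in evaluation function (the VBA translation is three nested loops whose lower bounds chain off one another):

from itertools import combinations

def evaluate(a, b, c):
    # Stand-in for the real evaluation function.
    return a * b - c

column = [475, 391, 24, 999]   # one copy; all three columns are identical

# combinations() yields each unordered triple exactly once, in increasing
# order, so no permutation of a triple is ever evaluated twice.
best = max(combinations(sorted(column), 3), key=lambda t: evaluate(*t))
print(best)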

Elements in beginning or end of an array in Binary Search

I have been trying to understand the binary search algorithm, which is quite simple, but I have one question that is bothering me. I might have understood it wrong.
1) When we start searching for an element in an array, we check three scenarios:
i) If the element is the middle element.
ii) If it is greater than the middle element.
iii) If it is less than the middle element.
My question is: I understand this algorithm tries to find the element by dividing the array and using the above checkpoints, but what if the element we are searching for is at the beginning or end of the array? For example, if
a = {12, 14, 15, 16, 17, 18, 19, 20};
and we are looking for the number 12, then why does it have to do all the dividing and checking when it could find it in the first element of the array? Why don't we also check the first and last elements on every binary search iteration, instead of only the three scenarios stated above?
Thanks.

Count no. of words in O(n)

I am on an interview ride here. One more interview question I had difficulties with.
“A rose is a rose is a rose.” Write an algorithm that prints the number of times each character/word occurs, e.g. A – 3, Rose – 3, Is – 2. Also ensure that when you are printing the results, they are in the order in which they were present in the original sentence. All this in order n.
I did get a solution to count the number of occurrences of each word in the sentence, in the order they appear in the original sentence. I used a Dictionary<string,int> to do it. However, I did not understand what is meant by "order of n". That is something I need you guys to explain.
There are only 26 characters, so you can use counting sort to sort them; in your counting sort you can keep an index that records when each character was first visited, to preserve the order of occurrence. [They can also be sorted by their count and their first occurrence with a sort like radix sort.]
Edit: for words, the first thing everyone thinks of is a hash table: insert the words into the hash and count them that way. They can still be sorted in O(n), because all the counts are within 1..n, so counting sort applies; and for the order of occurrence you can traverse the string once and record the first position of each word.
Order of n means you traverse the string only once, or at most some constant multiple of n times, where n is the number of characters in the string.
So your solution that stores each word and the number of its occurrences is O(n), order of n, as you loop through the complete string only once.
However, it uses extra space in the form of the list you created.
Order N refers to Big O computational complexity analysis, where you get a good upper bound on algorithms. It is a theory we cover early in a Data Structures class, so we can torment, I mean help, the student gain facility with it as we traverse, in a balanced way, heaps of different trees of knowledge, all different. In your case they want your algorithm's compute time to grow proportionally to the size of the text as it grows.
It's a reference to Big O notation. Basically the interviewer means that you have to complete the task with an O(N) algorithm.
"Order n" is referring to Big O notation. Big O is a way for mathematicians and computer scientists to describe the behavior of a function. When someone specifies searching a string "in order n", that means that the time it takes for the function to execute grows linearly as the length of that string increases. In other words, if you plotted time of execution vs length of input, you would see a straight line.
Saying that your function must be of order n does not mean that your function must equal O(n); a function with a Big O less than O(n) would also be considered acceptable. In your problem's case, though, that is not possible (because in order to count a letter, you must "touch" that letter, so there must be some operation dependent on the input size).
One possible method is to traverse the string linearly. Then create a hash and a list. The idea is to use the word as the hash key and increment the value for each occurrence. If the word is not yet in the hash, add it to the end of the list. After traversing the string, go through the list in order, using the hash values as the counts.
The order of the algorithm is O(n). The hash lookup and list append operations are O(1) (or very close to it).
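A minimal Python sketch of that hash-plus-list idea; it leans on the fact that Python 3.7+ dicts preserve insertion order, which stands in for the explicit list:

def word_counts(sentence):
    # One O(n) pass; each dict operation is O(1) on average.
    counts = {}
    for word in sentence.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

for word, n in word_counts("A rose is a rose is a rose").items():
    print(word, "-", n)   # a - 3, rose - 3, is - 2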

Algorithm for approximate search in sorted integer list

Consider an array of integers (assumed to be sorted); I would like to find the array index of the integer that is closest to a given integer in the fastest possible way. And in the case where there are multiple possibilities, the algorithm should identify all of them.
Example: consider T=(3, 5, 24, 65, 67, 87, 129, 147, 166), and if the given integer is 144, then the code should identify 147 as the closest integer, and give the array index 7 corresponding to that entry. For the case of 66, the algorithm should identify 65 and 67.
Are there O(1) or at least O(log N) algorithms to do this? Direct implementations of the standard search algorithms (binary search, tree search, hashing, etc.) won't work, since those require an exact match. Is there any way they can be modified to handle approximate search?
I am developing this in C.
Thanks
Do binary search until you get down to a single element.
If there is a match, walk along your neighbors to find other matches.
If there is no match, look at your immediate neighbors to find the closest match.
Properly implemented binary search should do the trick -- as long as you identify the moment when your search range has decreased to just two items. Then you pick the closer one. Complexity: O(log n).
I know this is really old, but for other people looking for an answer:
A regular binary search algorithm with a target value will of course return -1 if the target value is not found.
BUT in that case, the final value of Low/Left will be the index at which the target number would have been inserted in the sorted list.
So in this example, the value of Low at the end of the search will be 7.
This means that if 144 were actually inside the array, 147 would be to its right and 129 to its left. All that's left to do is check whether the target is closer to 147 or to 129, and return that one.
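The question asks for C, but in keeping with the rest of this page, here is a Python sketch of that idea using the standard bisect module; translating it to C is a plain lower-bound binary search followed by the same neighbor check:

import bisect

def closest_indices(arr, target):
    # Return the indices of the element(s) of sorted arr closest to target.
    lo = bisect.bisect_left(arr, target)   # insertion point for target
    candidates = [i for i in (lo - 1, lo) if 0 <= i < len(arr)]
    best = min(abs(arr[i] - target) for i in candidates)
    return [i for i in candidates if abs(arr[i] - target) == best]

T = [3, 5, 24, 65, 67, 87, 129, 147, 166]
print(closest_indices(T, 144))   # [7]    -> 147
print(closest_indices(T, 66))    # [3, 4] -> 65 and 67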
