I have two functions which perform the same task: identifying whether two lists have any common element. I want to analyze their time complexity.
What I know is: a for loop iterated n times gives O(n) complexity. But I am confused about the situation when we use the 'in' operator, e.g. if element in mylist.
Please look at the functions to get a better understanding of the scenario:
list1 = ['a','b','c','d','e']
list2 = ['m','n','o','d']

def func1(list1, list2):
    for i in list1:      # O(n), assuming the number of items in list1 is n
        if i in list2:   # What will be the big-O of this statement?
            return True
    return False

z = func1(list1, list2)
print(z)
I have another function, func2; please help determine its big-O as well:
def func2(list1, list2):
    dict = {}                 # note: this name shadows the built-in "dict"
    for i in list1:
        if i not in dict.keys():
            dict[i] = True
    for j in list2:
        if j in dict.keys():
            return True
    return False

z = func2(list1, list2)
print(z)
What is the time complexity of func1 and func2? Is there any difference in performance between the two functions?
Regarding func1:
searching in a list is a linear operation with respect to the number of elements;
assuming items are randomly ordered and the order of checking is also unrelated, statistically you come across an existing element in n/2 steps on average, and in n steps when it is not found (both simplify to O(n));
since if x in list_ is a linear search as described above, func1 has a complexity of O(n*m), i.e. O(n^2) when both lists have about n elements.
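For reference, the same logic can be written as a one-liner with any(); this is purely stylistic (a sketch, and func1_any is a name introduced here for illustration), so the complexity stays quadratic:

def func1_any(list1, list2):
    # still O(n*m): "i in list2" performs a linear scan of list2
    # for every element taken from list1
    return any(i in list2 for i in list1)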
Regarding func2:
building the dictionary takes O(n) and each lookup is O(1) on average, so func2 runs in O(n + m); that is the improvement over func1. One caveat about the code as written: prefer i not in dict over i not in dict.keys(); in Python 3 both are O(1) on average, but in Python 2 .keys() builds a list and turns the check into a linear search.
instead of a dictionary you may want to consider using a set. It also has O(1) average complexity for checking the existence of an element, and you can create it directly with set(list1) instead of filling it by iterating over the list (manual iteration is slower than direct initialization from a list, but only by a constant factor; it does not affect the big-O).
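A minimal sketch of that set-based idea (func3 is just an illustrative name):

def func3(list1, list2):
    seen = set(list1)                     # O(n) to build the set
    return any(j in seen for j in list2)  # O(1) average per lookup

# equivalently: return not set(list1).isdisjoint(list2)

Overall this is O(n + m) on average, the same big-O as the dictionary version, just more idiomatic.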
Currently, I have nested loops in Python that iterate over lists, but each inner iterable depends on the value selected in the parent loop. Consider this code snippet for the nested loop.
my_combinations = []
list1 = ['foo', 'bar', 'baz']
for l1 in list1:
    list2 = my_func1(l1)          # some user-defined function which queries a dataset
    for l2 in list2:
        list3 = my_func2(l1, l2)  # another user-defined function which queries a dataset
        for l3 in list3:
            my_combinations.append((l1, l2, l3))
Is there an efficient way to get all the permissible combinations (as defined by the my_func1 and my_func2 functions) into the my_combinations list? The number of elements in list1, list2 and list3 runs into 4-5 digits, and this is clearly inefficient right now.
As a thought process: if I had list1, list2 and list3 pre-defined before entering the outermost loop, itertools.product might have given me the required combinations efficiently. However, I don't think I can use it in this situation.
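One direction, sketched under the stated assumption that my_func1 and my_func2 must be called per parent value: turn the nested loops into a generator. This does not reduce the total work (the number of emitted combinations dominates), but it avoids building the giant list up front and lets you consume results lazily:

def iter_combinations(list1, my_func1, my_func2):
    # same traversal as the nested loops, but yields combinations
    # one at a time instead of appending to one huge list
    for l1 in list1:
        for l2 in my_func1(l1):
            for l3 in my_func2(l1, l2):
                yield (l1, l2, l3)

# my_combinations = list(iter_combinations(list1, my_func1, my_func2))

If my_func1 or my_func2 are queried repeatedly with the same arguments, caching their results (e.g. with functools.lru_cache) may buy more than restructuring the loops.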
Can someone explain the last line of this Python code snippet to me?
Cell is just another class. I don't understand how the for loop is being used to store Cell objects into the Column object.
class Column(object):
    def __init__(self, region, srcPos, pos):
        self.region = region
        self.cells = [Cell(self, i) for i in xrange(region.cellsPerCol)]  # Please explain this line.
The line of code you are asking about uses a list comprehension to create a list and assigns the resulting list to self.cells. It is equivalent to:
self.cells = []
for i in xrange(region.cellsPerCol):
    self.cells.append(Cell(self, i))
Explanation:
To best explain how this works, a few simple examples might be instructive in helping you understand the code you have. If you are going to continue working with Python code, you will come across list comprehension again, and you may want to use it yourself.
Note that in each example below, the two code segments are equivalent: they create the same list of values in myList.
For instance:
myList = []
for i in range(10):
    myList.append(i)
is equivalent to
myList = [i for i in range(10)]
List comprehensions can be more complex too. For instance, if some condition determines whether values should go into the list, you can express that with a list comprehension as well.
This example only collects even numbered values in the list:
myList = []
for i in range(10):
    if i % 2 == 0:  # could be written as "if not i % 2" more tersely
        myList.append(i)
and the equivalent list comprehension:
myList = [i for i in range(10) if i%2 == 0]
Two final notes:
You can have "nested" list comprehensions, but they quickly become hard to comprehend :) (see the small example below)
A list comprehension often runs faster than the equivalent for loop, and is therefore a favorite with regular Python programmers who care about efficiency.
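As a quick illustration of the nested case (a small made-up example that flattens a 2x2 matrix):

matrix = [[1, 2], [3, 4]]
# the "for" clauses read left to right, exactly like the
# equivalent nested for loops
flat = [x for row in matrix for x in row]   # [1, 2, 3, 4]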
Ok, one last example showing that you can also apply functions to the items you are iterating over in the list. This uses float() to convert a list of strings to float values:
data = ['3', '7.4', '8.2']
new_data = [float(n) for n in data]
gives:
>>> new_data
[3.0, 7.4, 8.2]
It is the same as if you did this:
def __init__(self, region, srcPos, pos):
    self.region = region
    self.cells = []
    for i in xrange(region.cellsPerCol):
        self.cells.append(Cell(self, i))
This is called a list comprehension.
I have the following two pieces of code that sort a list by swapping pairs of elements:
# Complete the minimumSwaps function below.
def minimumSwaps(arr):
    counter = 0
    val_2_indx = {val: arr.index(val) for val in arr}
    for indx, x in enumerate(arr):
        if x != indx + 1:
            arr[indx] = indx + 1
            s_indx = val_2_indx[indx + 1]
            arr[s_indx] = x
            val_2_indx[indx + 1] = indx
            val_2_indx[x] = s_indx
            counter += 1
    return counter
def minimumSwaps(arr):
    temp = [0] * (len(arr) + 1)
    for pos, val in enumerate(arr):
        temp[val] = pos
    swaps = 0
    for i in range(len(arr)):
        if arr[i] != i + 1:
            swaps += 1
            t = arr[i]
            arr[i] = i + 1
            arr[temp[i + 1]] = t
            temp[t] = temp[i + 1]
            temp[i + 1] = i
    return swaps
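For context, both implementations assume arr is a permutation of 1..n (that is what the arr[i] != i+1 checks rely on). A quick made-up check:

print(minimumSwaps([4, 3, 1, 2]))   # 3 -- note: both functions mutate arr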
The second function works much faster than the first one. However, I was told that a dictionary is faster than a list.
What's the reason here?
A list is a data structure, and a dictionary is a data structure. It doesn't make sense to say one is "faster" than the other, any more than you can say that an apple is faster than an orange. One might grow faster, you might be able to eat the other one faster, and they might both fall to the ground at the same speed when you drop them. It's not the fruit that's faster, it's what you do with it.
If your problem is that you have a sequence of strings and you want to know the position of a given string in the sequence, then consider these options:
You can store the sequence as a list. Finding the position of a given string using the .index method requires a linear search, iterating through the list in O(n) time.
You can store a dictionary mapping strings to their positions. Finding the position of a given string requires looking it up in the dictionary, in O(1) time.
So it is faster to solve that problem using a dictionary.
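A tiny made-up example of the two options:

names = ['alpha', 'beta', 'gamma']                # hypothetical sequence
pos = {name: i for i, name in enumerate(names)}   # built once, in O(n)

names.index('gamma')   # O(n): scans the list -> 2
pos['gamma']           # O(1): hash lookup    -> 2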
But note also that in your first function, you are building the dictionary using the list's .index method, which means doing n linear searches, each in O(n) time; the dictionary is built in O(n^2) time because you are using a list for something lists are slow at. If you build the dictionary without doing linear searches, it will take O(n) time instead:
val_2_indx = { val: i for i, val in enumerate(arr) }
But now consider a different problem. You have a sequence of numbers, and they happen to be the numbers from 1 to n in some order. You want to be able to look up the position of a number in the sequence:
You can store the sequence as a list. Finding the position of a given number requires linear search again, in O(n) time.
You can store them in a dictionary like before, and do lookups in O(1) time.
You can store the inverse sequence in a list, so that lst[i] holds the position of the value i in the original sequence. This works because every permutation is invertible. Now getting the position of i is a simple list access, in O(1) time.
This is a different problem, so it can take a different amount of time to solve. In this case, both the list and the dictionary allow a solution in O(1) time, but it turns out it's more efficient to use a list. Getting by key in a dictionary has a higher constant time than getting by index in a list, because getting by key in a dictionary requires computing a hash, and then probing an array to find the right index. (Getting from a list just requires accessing an array at an already-known index.)
This second problem is the one in your second function. See this part:
temp = [0] * (len(arr) + 1)
for pos, val in enumerate(arr):
    temp[val] = pos
This creates a list temp, where temp[val] = pos whenever arr[pos] == val. This means the list temp is the inverse permutation of arr. Later in the code, temp is used only to get these positions by index, which is an O(1) operation and happens to be faster than looking up a key in a dictionary.
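A small made-up illustration of that inverse relationship:

arr = [3, 1, 2]                    # a permutation of 1..3
temp = [0] * (len(arr) + 1)
for pos, val in enumerate(arr):
    temp[val] = pos
# temp == [0, 1, 2, 0]: value 3 sits at index temp[3] == 0 in arr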
Consider the following problem: I want to keep the elements of list1 that belong to list2. So I can do something like this:
filtered_list = [w for w in list1 if w in list2]
I need to repeat this same procedure for different examples of list1 (about 20000 different examples) and a "constant" (frozen) list2.
How can I speed up the process?
I also know the following properties:
1) list1 has repeated elements, is not sorted, and has about 10000 (ten thousand) items.
2) list2 is a giant sorted list with about 200000 (two hundred thousand) entries, and each of its elements is unique.
The first thing that comes to mind is that maybe I can use some kind of binary search. Is there a way to do this in Python?
Furthermore, I do not mind whether filtered_list preserves the order of items from list1. So maybe I can filter a deduplicated version of list1 against list2 and then restore the repeated items afterwards.
Is there a fast way to do this in Python 3?
Convert list2 to a set:
# do once
set2 = set(list2)
# then every time
filtered_list = [w for w in list1 if w in set2]
x in list2 is a sequential (linear) scan; x in set2 uses the same hashing mechanism as dictionaries, resulting in a very quick O(1) average lookup.
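If you want to see the difference for yourself, here is a rough timing sketch (numbers will vary by machine):

import timeit

setup = "list2 = list(range(200000)); set2 = set(list2)"
print(timeit.timeit("199999 in list2", setup=setup, number=100))   # linear scans
print(timeit.timeit("199999 in set2", setup=setup, number=100))    # hash lookups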
If list1 didn't have duplicates, converting both to sets and taking set intersection would be the way to go:
filtered_set = set1 & set2
but with duplicates you're stuck with iterating over list1 as above.
(As you said, you could even compute the elements to delete using set1 - set2, but then you'd still need a loop to do the deleting. There shouldn't be any difference in performance between filtering keepers and filtering trash; either way you have to iterate over list1, so that's no win over the method above.)
EDIT in response to comment: converting list1 to a Counter might (or might not; testing needed!) speed it up, if you can use it in that form throughout (i.e. you never need the list, you always deal with the Counter). But if you have to preprocess list1 into counter1 every time you do the above operation, it's again no win: creating a Counter also involves a loop over list1.
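For completeness, a hedged sketch of that Counter idea (it does not preserve the order of list1, which the question says is acceptable):

from collections import Counter

counter1 = Counter(list1)                                # one pass over list1
kept = {w: c for w, c in counter1.items() if w in set2}  # one membership test per unique element
filtered_list = list(Counter(kept).elements())           # re-expand multiplicities

Whether this beats the plain comprehension depends on how many duplicates list1 contains; with few duplicates, the extra passes cost more than they save.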
Suppose I've got a list of 2-tuples named listuple and another simple list named list0. I want to generate a list of 1s and -1s by comparing my two given lists.
def Vectomparison(listuple, list0):
    result = []
    for EachElement in listuple:
        if EachElement[0] in list0:
            result.append(1)
        else:
            result.append(-1)
    return result
But I really think this is not a Pythonic approach. Any ideas for making it more compact and Pythonic?
A direct rewrite using a list comprehension and a ternary conditional expression:
def Vectomparison(listuple, list0):
    return [1 if item[0] in list0 else -1 for item in listuple]
Note that item[0] in list0 uses a linear search when list0 is a list. That makes the algorithm's time complexity O(N*M), where N = len(listuple) and M = len(list0).
You can make it faster:
def Vectomparison(listuple, list0):
    set0 = set(list0)
    return [1 if item[0] in set0 else -1 for item in listuple]
This version has a time complexity of O(N + M).
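A quick made-up call to illustrate the output shape:

listuple = [('a', 1), ('x', 2), ('d', 3)]   # hypothetical data
list0 = ['a', 'd']
print(Vectomparison(listuple, list0))       # [1, -1, 1]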