I want to check the occurrence of a particular item in all other items (even as a substring) - python-3.x

I want to count how many items in a list contain a particular item, even as a substring.
n_a = ['28', '4663', '66', '66']
occ_arr = [[0,0]]*len(n_a)
for i in range(len(n_a)):
    count=0
    for j in range(len(n_a)):
        if n_a[i] in n_a[j]:
            count+=1
    occ_arr[i][0] = n_a[i]
    occ_arr[i][1] = count
print(occ_arr)
This is my code.
The result is
[['66', 3], ['66', 3], ['66', 3], ['66', 3]]
but the desired output is
[['28', 1], ['4663', 1], ['66', 3], ['66',3]].
Please help me to figure out what is wrong with the code.

All the sub-lists in your occ_arr list reference the same list, because the * operator copies the reference to one list rather than the list itself, so any change made through one sub-list is reflected in all the others. You should instead use a list comprehension to create distinct sub-lists.
Change:
occ_arr = [[0,0]]*len(n_a)
to:
occ_arr = [[0,0] for _ in range(len(n_a))]
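A quick way to see the difference is to compare the sub-lists with the `is` operator, which checks object identity. A minimal sketch (the names `shared` and `distinct` are only for illustration):

```python
n_a = ['28', '4663', '66', '66']

# Repetition with * copies the reference to one inner list len(n_a) times
shared = [[0, 0]] * len(n_a)
# The comprehension evaluates [0, 0] once per iteration, creating a new list each time
distinct = [[0, 0] for _ in range(len(n_a))]

print(shared[0] is shared[1])      # True: both slots hold the same list object
print(distinct[0] is distinct[1])  # False: each slot holds its own list

# Mutating through index 0 affects every slot of shared, but only slot 0 of distinct
shared[0][0] = '28'
distinct[0][0] = '28'
print(shared)    # [['28', 0], ['28', 0], ['28', 0], ['28', 0]]
print(distinct)  # [['28', 0], [0, 0], [0, 0], [0, 0]]
```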

Changing:
occ_arr = [[0,0]]*len(n_a)
To:
occ_arr = []
for i in range(len(n_a)):
    occ_arr.append([0,0])
Will fix the bug in the program. If you want a one-line statement, use the following list comprehension:
occ_arr = [[0,0] for _ in n_a]
# Add a new [0,0] list for each item in n_a
Altogether, the program becomes (using the one-line solution):
n_a = ['28', '4663', '66', '66']
occ_arr = [[0,0] for _ in n_a]
for i in range(len(n_a)):
    count=0
    for j in range(len(n_a)):
        if n_a[i] in n_a[j]:
            count+=1
    occ_arr[i][0] = n_a[i]
    occ_arr[i][1] = count
print(occ_arr)
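If you do not need to pre-allocate occ_arr at all, each pair can be built directly. A sketch using a nested comprehension and sum() over booleans (True counts as 1), offered as an alternative rather than as a fix of the loop above:

```python
n_a = ['28', '4663', '66', '66']

# For each item, count how many items (including itself) contain it as a substring
occ_arr = [[item, sum(item in other for other in n_a)] for item in n_a]

print(occ_arr)  # [['28', 1], ['4663', 1], ['66', 3], ['66', 3]]
```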
Explanation of bug
The bug occurs because of how list repetition works. In Python every value, including ints and floats, is an object with its own id, and a list stores references to those objects rather than copies of them. The line:
occ_arr = [[0,0]]*len(n_a)
creates one inner list [0,0] and then repeats a reference to it four times (a shallow copy: only the reference is copied, not the list's data), so all four slots point to the same object. This can be shown through the following example:
>>> x = [[0,0]] * 4
>>> for item in x:
...     print(id(item))
4500701640
4500701640
4500701640
4500701640
The exact numbers will be different for you, but all four will be identical.
Hence, when you change the list through one index, you change the single shared object, and the change is visible through every other index. That is why your program was outputting [['66', 3], ['66', 3], ['66', 3], ['66', 3]] rather than [['28', 1], ['4663', 1], ['66', 3], ['66',3]].
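The aliasing only causes trouble when you mutate the shared inner list through an index, as occ_arr[i][0] = ... does. Assigning a whole new list to a slot replaces the reference in that position only, so the following sketch produces the desired output even with the original [[0,0]]*len(n_a) initialisation. It is shown just to illustrate the difference; creating distinct sub-lists up front remains the clearer fix.

```python
n_a = ['28', '4663', '66', '66']
occ_arr = [[0, 0]] * len(n_a)  # all four slots start out pointing at one shared list

for i in range(len(n_a)):
    count = 0
    for j in range(len(n_a)):
        if n_a[i] in n_a[j]:
            count += 1
    # Rebinding slot i to a brand-new list breaks the sharing for that index
    occ_arr[i] = [n_a[i], count]

print(occ_arr)  # [['28', 1], ['4663', 1], ['66', 3], ['66', 3]]
```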

Related

How to get duplicates from a sorted list in O(n) in Python

Here I have a list, and I need to get the duplicates from it. I can use a solution like this:
arr = [1, 1, 2, 3, 4, 4, 5]
def get_duplicates(arr):
    duplicates = []
    for index in range(len(arr)-1):
        if arr[index] == arr[index+1]:
            duplicates.append(arr[index])
    return duplicates
print(*get_duplicates(arr))
OK, but what if a value appears three or more times in my list? I did something like this:
arr = [1, 1, 1, 2, 3, 4, 4, 4, 4, 5]
def get_duplicates(arr):
    duplicates = []
    for index in range(len(arr)-1):
        if arr[index] == arr[index+1]:
            duplicates.append(arr[index])
    return duplicates
print(*set(get_duplicates(arr)))
Do both versions of my code run in O(n)? I just don't know the speed of the set() function in Python, but I think the for loop takes O(n),
and if set() also takes O(n), it doesn't matter, because in the end I will have O(2n) = O(n).
Did I solve the task correctly, or is my code inefficient? Please help.
If you know the right way to do it, please explain.
Here is a version that is clearly O(n):
def get_duplicates(arr):
    last_duplicate = None
    duplicates = []
    # v is arr[i + 1], so compare it with its predecessor arr[i]
    for i, v in enumerate(arr[1:]):
        if v == arr[i] and v != last_duplicate:
            duplicates.append(v)
            last_duplicate = v
    return duplicates
Note that this assumes, as your original code does, that duplicates will be adjacent to one another. It also assumes that the first duplicate is not None.
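On the set() question: building a set from n items takes O(n) time on average, so set(get_duplicates(arr)) keeps the whole thing O(n), although a set does not preserve the original order. For a sorted list, another O(n) option is itertools.groupby, which groups runs of equal adjacent values. This is a swapped-in technique, not the one above; it relies on the same adjacency assumption but, thanks to a private sentinel object, also works when None appears in the list.

```python
from itertools import groupby

_MISSING = object()  # sentinel that cannot collide with any list value

def get_duplicates(arr):
    """Return each value that occurs more than once, assuming equal values are adjacent."""
    duplicates = []
    # groupby yields (value, iterator over the run of equal adjacent values)
    for value, run in groupby(arr):
        next(run)  # skip the first occurrence
        # If the run has a second element, the value is duplicated
        if next(run, _MISSING) is not _MISSING:
            duplicates.append(value)
    return duplicates

print(get_duplicates([1, 1, 2, 3, 4, 4, 5]))           # [1, 4]
print(get_duplicates([1, 1, 1, 2, 3, 4, 4, 4, 4, 5]))  # [1, 4]
```

Each element is visited once, and groupby skips the rest of a run when the outer loop moves on, so the total work stays linear.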

Why doesn't sort() work on a slice of a list in Python, and how can it be done?

I was trying to sort a list from a specific index to the end, so I tried the code below:
x=[4,3,6,1]
x[1::].sort()
print(x)
but the resulting list was not sorted (the output was [4, 3, 6, 1]).
Can someone tell me why this happens and how it can be done?
(Note: My expected output was [4,1,3,6])
x=[4,3,6,1]
# What x[1::].sort() did here:
_y = x[1::]  # slicing creates a new list
_y.sort()    # and sorts that new list in place
del _y       # no variable refers to the new list, so it is discarded; x is unchanged
# As jasonharper said, sorted() returns a new sorted list; assign it back to the slice of x
x[1:] = sorted(x[1::])
print(x)
[4, 1, 3, 6]
This just adds detail to jasonharper's answer.
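Because x[1:] = ... is slice assignment, it replaces the elements of the existing list rather than creating a new one, so any other reference to x sees the sorted result. A minimal sketch wrapping this in a helper; sort_from is a hypothetical name, and it simply forwards key and reverse to sorted():

```python
def sort_from(lst, start, key=None, reverse=False):
    """Hypothetical helper: sort lst[start:] in place and return None, like list.sort()."""
    # sorted() builds a new sorted list; slice assignment writes it back into lst
    lst[start:] = sorted(lst[start:], key=key, reverse=reverse)

x = [4, 3, 6, 1]
alias = x          # a second name for the same list object
sort_from(x, 1)
print(x)           # [4, 1, 3, 6]
print(alias is x)  # True: x was modified in place, not replaced
sort_from(x, 1, reverse=True)
print(x)           # [4, 6, 3, 1]
```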

Python: Convert 2d list to dictionary with indexes as values

I have a 2d list with arbitrary strings like this:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
I want to create a dictionary out of this:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
How do I do this? This answer covers a 1D list with non-repeated values, but I have a 2D list and values can repeat. Is there a generic way of doing this?
Maybe you could use two for-loops:
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
d = {}
overall_idx = 0
for sub_lst in lst:
    for word in sub_lst:
        if word not in d:
            d[word] = overall_idx
            # To increment overall_idx only for words not seen before,
            # uncomment the next line and delete the increment below
            # overall_idx += 1
        overall_idx += 1
print(d)
Output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
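The commented-out line covers the case where indices should count only distinct words. A compact sketch of that variant uses dict.setdefault with len(d) as the next free index (a different technique from the loop above, shown for comparison). The extra sub-list ['a', 'new'] is added purely for illustration, since with the original data both variants give the same result:

```python
lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz'], ['a', 'new']]  # extra sub-list for illustration

d = {}
for sub_lst in lst:
    for word in sub_lst:
        # setdefault only inserts when word is new; len(d) is then the next unused index
        d.setdefault(word, len(d))

print(d)  # {'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3, 'new': 4}
```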
You could first flatten the list of lists into a single list using a nested ('double') list comprehension.
Next, get rid of the duplicates using a dictionary comprehension; we could use a set for that, but it would lose the order (dicts preserve insertion order as of Python 3.7).
Finally, use another dictionary comprehension to get the desired result.
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
# flatten list of lists to a list
flat_list = [item for sublist in lst for item in sublist]
# remove duplicates
ordered_set = {x:0 for x in flat_list}.keys()
# create required output
the_dictionary = {v:i for i, v in enumerate(ordered_set)}
print(the_dictionary)
""" OUTPUT
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
"""
Alternatively, with collections and itertools:
import itertools
from collections import OrderedDict
lst = [['a', 'xyz' , 'tps'], ['rtr' , 'xyz']]
# Repeated keys collapse onto their first position, so the order of first appearance is kept
lstkeys = list(OrderedDict(zip(itertools.chain(*lst), itertools.repeat(None))))
lstdict = {key: i for i, key in enumerate(lstkeys)}
print(lstdict)
output:
{'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
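Since regular dicts keep insertion order from Python 3.7 onward, dict.fromkeys can replace both the OrderedDict and the zip/repeat pairing. A sketch, assuming Python 3.7+:

```python
from itertools import chain

lst = [['a', 'xyz', 'tps'], ['rtr', 'xyz']]

# chain.from_iterable flattens one level; dict.fromkeys drops repeats but keeps first-seen order
unique_words = dict.fromkeys(chain.from_iterable(lst))
lstdict = {word: i for i, word in enumerate(unique_words)}

print(lstdict)  # {'a': 0, 'xyz': 1, 'tps': 2, 'rtr': 3}
```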

Why does a shallow copy behave like a deep copy for a simple list?

I was trying to understand the shallow copy and deep copy concepts in Python. I noticed that most posts/blogs/SO answers explain these concepts using nested lists.
import copy
lst = [[1,2,3],[4,5,6]]
b = copy.copy(lst)
c = copy.deepcopy(lst)
# Shallow copy demo
b[0][0] = 9
print(b)
# >>> [[9, 2, 3], [4, 5, 6]]
print(lst)
# >>> [[9, 2, 3], [4, 5, 6]]
# Deepcopy demo
c[0][0] = 10
print(c)
# >>> [[10, 2, 3], [4, 5, 6]]
print(lst)
# >>> [[9, 2, 3], [4, 5, 6]]
I understood the shallow and deep copy concepts from the simple example above. But when I apply the same concept to a simple (one-dimensional) list, the shallow copy seems to behave like a deep copy.
import copy
lst = [1,2,3]
b = copy.copy(lst)
c = copy.deepcopy(lst)
# Shallow copy demo
b[0] = 0
print(b)
# >>> [0, 2, 3]
print(lst)
# >>> [1,2,3]
# Deepcopy demo
c[0] = 9
print(c)
# >>> [9,2,3]
print(lst)
# >>> [1,2,3]
This suggests that copy.copy(lst) behaves differently and makes a deep copy instead of a shallow copy.
I would like to understand why the behavior of copy.copy() differs between a nested list and a simple list. Also, if I want a shallow copy to work for a simple list, how can I achieve it?
The results that you're getting are not directly related to the "level of depth"; the
most important thing to keep in mind here is the concept of mutability.
Lists are mutable, whereas numeric values are not. That means you can add or modify items in a list, and those operations don't create or destroy the list; they only change it. You can verify that using the built-in function id(), which returns an object's identity (in CPython, its memory address):
lst = [1, 2, 3]
print(id(lst))  # the number printed by this...
lst.append(4)
lst[1] = 0
print(id(lst))  # ...should be the same as the one printed here. That tells us that
                # the variable 'lst' still references the same object, although
                # the object has changed in form (mutated)
Numbers are totally different: an int object can never be changed in place, so giving a numeric
variable a new value makes the name refer to a different object:
a = 5
print(id(a))  # the number printed by this...
a = 6
print(id(a))  # ...should be different from this one, meaning that 'a' now refers
              # to a different int object at a different memory address
On the line
b[0][0] = 9
of your first example, the list at b[0] is being mutated, but it remains the same object, and since b[0] is nothing more than a reference to the same list as lst[0] (because b is a shallow copy), when we print lst we see that it changed too.
In your second example, when you assign b[0] = 0, Python does not modify any shared object: it makes slot 0 of b refer to the int 0 instead of the int object it shared with lst[0]. Since b is a separate list, lst[0] still refers to 1 (and because ints are immutable, reassigning a slot like this is the only way to "change" a number stored in a list).
As I said, this has nothing to do with the level of nesting of compound data structures,
since some of them are immutable (for example, the tuple), and what happened in your example would also happen with those immutable data structures.
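The shallow copy is in fact working for the flat list: copy.copy builds a new outer list whose slots refer to the very same element objects. A sketch checking that with `is`, and showing that if you want changes made through one name to show up in the other, what you need is an alias (a second name for the same list), not a copy:

```python
import copy

lst = [1, 2, 3]
b = copy.copy(lst)

print(b is lst)                             # False: a new outer list
print(all(x is y for x, y in zip(b, lst)))  # True: each slot refers to the same element object

b[0] = 0     # rebinds slot 0 of b only; lst's slot 0 is untouched
print(lst)   # [1, 2, 3]

alias = lst  # no copy at all: both names refer to one list
alias[0] = 0
print(lst)   # [0, 2, 3]
```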
You can read more about the id() built-in function here, and more about
mutable and immutable types here.
Hope this answer helps you!

How to assign values to a dictionary

I am creating a function that is supposed to return a dictionary with keys and values taken from different lists, but I am having problems getting the mean of a list of numbers as the values of the dictionary. I think I am getting the keys right, though.
This is what I have so far:
def exp (magnitudes,measures):
    """return for each magnitude the associated mean of numbers from a list"""
    dict_expe = {}
    for mag in magnitudes:
        dict_expe[mag] = 0
        for mea in measures:
            summ = 0
            for n in mea:
                summ += n
            dict_expe[mag] = summ/len(mea)
    return dict_expe
print(exp(['mag1', 'mag2', 'mag3'], [[1,2,3],[3,4],[5]]))
The output should be:
{mag1 : 2, mag2: 3.5, mag3: 5}
But what I get is always 5 as the value of every key. I thought about the zip() function, but I'm trying to avoid it because I thought it requires both lists to have the same length.
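An average of a sequence is sum(sequence) / len(sequence), so you need to iterate through magnitudes and measures in parallel, calculate each mean (arithmetic average) and store it in a dictionary. In your code the inner loop runs over every list in measures for each magnitude, so each key keeps being overwritten until it holds the mean of the last list, [5], which is why every value is 5.
There are more Pythonic ways to achieve this. All of these examples produce {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0} as the result.
Using a for i in range() loop: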
def exp(magnitudes, measures):
    means = {}
    for i in range(len(magnitudes)):
        means[magnitudes[i]] = sum(measures[i]) / len(measures[i])
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
But if you need both the indices and the values of a list, you can use the for i, val in enumerate(sequence) approach, which is more suitable in this case:
def exp(magnitudes, measures):
    means = {}
    for i, mag in enumerate(magnitudes):
        means[mag] = sum(measures[i]) / len(measures[i])
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
Another problem hides here: the index i belongs to magnitudes, but we also use it to get values from measures. This is not a big deal if magnitudes and measures have the same length, but if magnitudes is longer you will get an IndexError. So using the zip function seems like the best choice here (zip does not require the two lists to be the same length: it simply stops at the end of the shorter one; since Python 3.10 you can pass strict=True if you would rather have a length mismatch raise a ValueError):
def exp(magnitudes, measures):
    means = {}
    for mag, mes in zip(magnitudes, measures):
        means[mag] = sum(mes) / len(mes)
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
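To see the truncation in practice, here is the zip version from above called with one more magnitude than there are measures (the extra 'mag4' is only for illustration):

```python
def exp(magnitudes, measures):
    """Return a dict mapping each magnitude to the mean of its measures."""
    means = {}
    # zip stops at the shorter input, so an unmatched magnitude is silently skipped
    for mag, mes in zip(magnitudes, measures):
        means[mag] = sum(mes) / len(mes)
    return means

result = exp(['mag1', 'mag2', 'mag3', 'mag4'], [[1, 2, 3], [3, 4], [5]])
print(result)  # {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0}
```

No IndexError is raised; 'mag4' simply does not appear in the result, which may or may not be what you want.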
So feel free to use whichever example suits your requirements, and don't forget to add a docstring.
You probably don't need to be quite this Pythonic, but the function can be even shorter with a dictionary comprehension:
def exp(magnitudes, measures):
    return {mag: sum(mes) / len(mes) for mag, mes in zip(magnitudes, measures)}
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
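The standard library also provides statistics.mean, which can stand in for sum(mes) / len(mes). A sketch of the comprehension version using it; note that statistics.mean raises statistics.StatisticsError for an empty list, where the manual division raises ZeroDivisionError, and that for integer inputs it may return an int when the mean is a whole number:

```python
from statistics import mean

def exp(magnitudes, measures):
    """Return a dict mapping each magnitude to the arithmetic mean of its measures."""
    return {mag: mean(mes) for mag, mes in zip(magnitudes, measures)}

result = exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]])
print(result)
```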
