Deque appending dictionary vs Deque(dictionary) in Python - python-3.x

Trying to get my head around adding a dictionary value to a deque.
Take this example:
from collections import deque
graph = {}
graph['key'] = ['value_1', 'value_2', 'value_3']
implement_first = deque()
implement_second = deque(graph['key'])
implement_first.append(graph['key'])
If I print:
print(implement_first)
print(implement_first.popleft())
I get this deque([['value_1', 'value_2', 'value_3']]) and
['value_1', 'value_2', 'value_3']
and if I print:
print(implement_second)
print(implement_second.popleft())
I get this:
deque(['value_1', 'value_2', 'value_3']) and value_1
So what is going on here? Why am I getting a list of lists from implement_first.append(graph['key']), and what does implement_second = deque(graph['key']) actually do?

The following two are not equivalent:
d1 = deque(x)
d2 = deque()
d2.append(x)
The deque constructor takes an iterable and appends all of its elements, so the following two are the same:
d1 = deque(x)
d2 = deque()
for y in x:
    d2.append(y)
Yours does not raise any errors because graph["key"] is a list (an iterable), which can both be turned into a deque and be a single element of a deque.
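For completeness: deque.extend behaves like the constructor and spreads out an iterable's elements, while append always adds exactly one element. A minimal sketch contrasting the three:
from collections import deque

values = ['value_1', 'value_2', 'value_3']
d1 = deque(values)   # constructor consumes the iterable: three elements
d2 = deque()
d2.append(values)    # append adds ONE element: the whole list
d3 = deque()
d3.extend(values)    # extend consumes the iterable, like the constructor
print(d1)  # deque(['value_1', 'value_2', 'value_3'])
print(d2)  # deque([['value_1', 'value_2', 'value_3']])
print(d3)  # deque(['value_1', 'value_2', 'value_3'])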

Related

Python: Create a function that takes a list of integers and strings and returns a new list with the strings filtered out

I am new to coding in Python and I am struggling with a very simple problem. There is the same question for JavaScript on the forum, but it does not help me.
My code is:
def filter_list(l):
    for i in l:
        if i != str():
            l.append(i)
            i = i + 1
    return(l)
print(filter_list([1,2,'a','b']))
If you can help, thanks!
Before I present a solution, here are some problems you need to understand.
str()
str() creates a new instance of the string class, which is the empty string. Comparing an object to it with == will only be true if that object is also an empty string.
print(1 == str())
>>> False
print("some str" == str())
>>> False
print('' == str())
>>> True
iterators (no +1)
You have i = i + 1 in your loop. This doesn't make any sense: i comes from for i in l, meaning i loops over the members of list l. There's no guarantee you can add 1 to it, and on the next iteration i will be rebound to a new value anyway.
l = [1,2,'a']
for i in l:
    print(i)
>>> 1
>>> 2
>>> 'a'
To filter you need a new list
You are appending to l whenever you find an element to keep. This means that when your loop finds an integer, it will append it to the end of the list. Later the loop will reach that integer again in another iteration, append it to the end AGAIN, find it again in the next iteration... forever.
Try it out! See the infinite loop for yourself.
def filter_list(l):
    for i in l:
        print(i)
        if type(i) != str:
            l.append(i)
    return(l)
filter_list([1,2,'a','b'])
Fix 1: Fix the type check
def filter_list(l):
    for i in l:
        if type(i) != str:
            l.append(i)
    return(l)
print(filter_list([1,2,'a','b']))
This loops forever, as discussed above.
Fix 2: Create a new output array to push to
def filter_list(l):
    output = []
    for i in l:
        if type(i) != str:
            output.append(i)
    return output
print(filter_list([1,2,'a','b']))
>>> [1, 2]
There we go.
Fix 3: Do it in idiomatic python
Let's use a list comprehension
l = [1,2,'a','b']
output = [x for x in l if type(x) != str]
print(output)
>>> [1, 2]
A list comprehension returns the leftmost expression x for every element in list l, provided the expression on the right (type(x) != str) is true.
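As an aside, isinstance(x, str) is usually preferred over comparing type() results directly, since it also accepts subclasses of str; the same filter would read:
l = [1, 2, 'a', 'b']
output = [x for x in l if not isinstance(x, str)]
print(output)
>>> [1, 2]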

How to convert a list with two wildcards into a list of lists using python?

I obtained a list of all files in a folder using glob:
lista = glob.glob("*.h5")
The list basically contains files with names like:
abc_000000000_000.h5
abc_000000000_001.h5
abc_000000000_002.h5
......
abc_000000000_011.h5
......
abc_000000001_000.h5
abc_000000001_001.h5
abc_000000001_002.h5
....
abc_000000026_000.h5
abc_000000026_001.h5
....
abc_000000027_000.h5
....
abc_000000027_011.h5
which have the format abc_0*_0*.h5. How do I reshape this into a list of lists? Each inner list would match a pattern like 'abc_000000027_0*.h5', and the outer list would run over the sequence of 'abc_000000*' prefixes, i.e. the first wildcard.
One way to create an input would be:
lista = []
for i in range(115):
    for j in range(14):
        item = "abc_%0.9d_%0.3d" % (i, j)
        lista.append(item)
My attempt (it works, but it is not nice and rather ugly):
listb = glob.glob("*_011.h5")
then, for each item in listb, split the name and glob again, for example:
listc = glob.glob("abc_000000027*.h5")
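Spelled out, that two-step idea could look like the short sketch below (an illustration, assuming every file follows the abc_<digits>_<digits>.h5 naming so the prefix can be recovered with rsplit):
import glob

# group by everything before the last underscore, then glob each prefix
prefixes = sorted({name.rsplit('_', 1)[0] for name in glob.glob('abc_*_*.h5')})
list_of_lists = [sorted(glob.glob(prefix + '_*.h5')) for prefix in prefixes]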
Given:
ls -1
abc_00000001_1.h5
abc_00000001_2.h5
abc_00000001_3.h5
abc_00000002_1.h5
abc_00000002_2.h5
abc_00000002_3.h5
abc_00000003_1.h5
abc_00000003_2.h5
abc_00000003_3.h5
You can use pathlib, itertools.groupby and natural sorting to achieve this:
from pathlib import Path
from itertools import groupby
import re

p = Path('/tmp/t2')

def _k(s):
    s = str(s)
    try:
        return tuple(map(int, re.search(r'_(\d+)_(\d*)', s).groups()))
    except (ValueError, AttributeError):  # no regex match, or an empty group
        return (0, 0)

def k1(s):
    return _k(s)

def k2(s):
    return _k(s)[0]

result = []
files = sorted(p.glob('abc_000000*.h5'), key=k1)
for k, g in groupby(files, key=k2):
    result.append(list(map(str, g)))
Which could be simplified to:
def _k(p):
    try:
        return tuple(map(int, p.stem.split('_')[-2:]))
    except ValueError:
        return (0, 0)

files = sorted(p.glob('abc_000000*_*.h5'), key=lambda e: _k(e))
result = [list(map(str, g)) for k, g in groupby(files, key=lambda e: _k(e)[0])]
Result (in either case):
>>> result
[['/tmp/t2/abc_00000001_1.h5', '/tmp/t2/abc_00000001_2.h5', '/tmp/t2/abc_00000001_3.h5'], ['/tmp/t2/abc_00000002_1.h5', '/tmp/t2/abc_00000002_2.h5', '/tmp/t2/abc_00000002_3.h5'], ['/tmp/t2/abc_00000003_1.h5', '/tmp/t2/abc_00000003_2.h5', '/tmp/t2/abc_00000003_3.h5']]
This could just as easily be a dict:
>>> {k:list(map(str, g)) for k,g in groupby(files, key=k2)}
{1: ['/tmp/t2/abc_00000001_1.h5', '/tmp/t2/abc_00000001_2.h5', '/tmp/t2/abc_00000001_3.h5'],
2: ['/tmp/t2/abc_00000002_1.h5', '/tmp/t2/abc_00000002_2.h5', '/tmp/t2/abc_00000002_3.h5'],
3: ['/tmp/t2/abc_00000003_1.h5', '/tmp/t2/abc_00000003_2.h5', '/tmp/t2/abc_00000003_3.h5']}

faster method for comparing two lists element-wise

I am building a relational DB using python. So far I have two tables, as follows:
>>> df_Patient.columns
[1] Index(['NgrNr', 'FamilieNr', 'DosNr', 'Geslacht', 'FamilieNaam', 'VoorNaam',
'GeboorteDatum', 'PreBirth'],
dtype='object')
>>> df_LaboRequest.columns
[2] Index(['RequestId', 'IsComplete', 'NgrNr', 'Type', 'RequestDate', 'IntakeDate',
'ReqMgtUnit'],
dtype='object')
The two tables are quite big:
>>> df_Patient.shape
[3] (386249, 8)
>>> df_LaboRequest.shape
[4] (342225, 7)
The column NgrNr on df_LaboRequest is a foreign key (FK) and references the homonymous column on df_Patient. In order to avoid any integrity errors, I need to make sure that all the values under df_LaboRequest[NgrNr] are in df_Patient[NgrNr].
With list comprehension I tried the following (to pick up the values that would throw an error):
[x for x in list(set(df_LaboRequest['NgrNr'])) if x not in list(set(df_Patient['NgrNr']))]
Though this is taking ages to complete. Would anyone recommend a faster method (method as a general word, as a synonym for procedure, nothing to do with the pythonic meaning of method) for such a comparison?
One-liners aren't always better.
Don't check for membership in lists. Why on earth would you create a set (which is the recommended data structure for O(1) membership checks) and then cast it to a list, which has O(N) membership checks?
Make the set of df_Patient once, outside the list comprehension, and use that instead of building the set in every iteration:
patients = set(df_Patient['NgrNr'])
lab_requests = set(df_LaboRequest['NgrNr'])
result = [x for x in lab_requests if x not in patients]
Or, if you like to use set operations, simply find the difference of both sets:
result = lab_requests - patients
Alternatively, use pandas isin() function.
patients = patients.drop_duplicates()
lab_requests = lab_requests.drop_duplicates()
result = lab_requests[~lab_requests.isin(patients)]
Let's test how much faster these changes make the code:
import pandas as pd
import random
import timeit
# Make dummy dataframes of patients and lab_requests
randoms = [random.randint(1, 1000) for _ in range(10000)]
patients = pd.DataFrame("patient{0}".format(x) for x in randoms[:5000])[0]
lab_requests = pd.DataFrame("patient{0}".format(x) for x in randoms[2000:8000])[0]
# Do it your way
def fun1(pat, lr):
    return [x for x in list(set(lr)) if x not in list(set(pat))]

# Do it my way: Set operations
def fun2(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return lr_s - pat_s

# Or explicitly iterate over the set
def fun3(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return [x for x in lr_s if x not in pat_s]

# Or using pandas
def fun4(pat, lr):
    pat = pat.drop_duplicates()
    lr = lr.drop_duplicates()
    return lr[~lr.isin(pat)]

# Make sure all 4 functions return the same thing
assert set(fun1(patients, lab_requests)) == set(fun2(patients, lab_requests)) == set(fun3(patients, lab_requests)) == set(fun4(patients, lab_requests))
# Time it
timeit.timeit('fun1(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun1', number=100)
# Output: 48.36615000000165
timeit.timeit('fun2(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun2', number=100)
# Output: 0.10799920000044949
timeit.timeit('fun3(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun3', number=100)
# Output: 0.11038020000069082
timeit.timeit('fun4(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun4', number=100)
# Output: 0.32021789999998873
Looks like we have a ~150x speedup with pandas and a ~450x speedup with set operations!
I don't have pandas installed right now to try this, but you could try removing the list(...) cast. I don't think it adds anything meaningful to the program, and sets are much faster for lookups (e.g. x in set(...)) than lists.
Also, you could try doing this with the pandas API rather than lists and sets; sometimes this is faster. Try searching for unique. Then you could compare the sizes of the two columns and, if they are the same, sort them and do an equality check.
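A sketch of that suggestion using only the pandas API (column and dataframe names taken from the question):
# rows of df_LaboRequest whose NgrNr has no match in df_Patient
mask = ~df_LaboRequest['NgrNr'].isin(df_Patient['NgrNr'].unique())
offending = df_LaboRequest.loc[mask, 'NgrNr'].unique()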

Python, removing elements under nested loops from a list

I have written some code to get the prime numbers up to a certain limit into a list, shown below.
import math
primes = []
for i in range(1, 101):
    primes.append(i)
primes.remove(10) # Just removing for sake of experiment
tot = math.sqrt(len(primes))
for j in range(2, math.ceil(tot), 1):
    for l in range(0, len(primes)):
        k = j**2 + l*j
        primes.remove(k)
primes.remove(12) # Just removing for sake of experiment
print(primes)
This code raises an error when it removes elements inside the nested loops. The error is shown below.
Traceback (most recent call last):
File "/root/PycharmProjects/love/love.py", line 13, in <module>
primes.remove(k)
ValueError: list.remove(x): x not in list
Why is this happening? The code was able to remove an element outside the nested loops, but fails on the elements being removed inside them.
Is there any alternate solution to this problem?
You are removing elements from a list while you are iterating over it, which is something you should never do! When you iterate the list here:
for l in range(0, len(primes)):
you are actually changing len(primes) each time you remove a prime, so the values of k your loops compute may already have been removed from the list. This is what causes the code to act irregularly.
A list comprehension avoids this, because the original list is left intact and a new one is created instead. So you can use a list comprehension to achieve the same result!
primes = []
for i in range(1, 101):
    primes.append(i)

def is_composite(number):
    # True if number is NOT prime
    if number < 2:
        return True  # 0 and 1 are not prime
    for i in range(2, int(number / 2) + 1):
        if number % i == 0:
            return True
    return False

primes = [p for p in primes if not is_composite(p)]
print(primes)
Hope it helps!
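As an aside, the nested loops in the question (k = j**2 + l*j walks through the multiples of j) look like an attempt at the Sieve of Eratosthenes, which marks multiples in a separate flag table instead of removing them from the list being iterated. A minimal sketch of that approach:
import math

limit = 100
is_prime = [True] * (limit + 1)
is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
for j in range(2, int(math.sqrt(limit)) + 1):
    if is_prime[j]:
        # these are exactly the k = j**2 + l*j values from the question
        for k in range(j * j, limit + 1, j):
            is_prime[k] = False
print([n for n, flag in enumerate(is_prime) if flag])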

Python/Pandas element wise union of 2 Series containing sets in each element

I have 2 pandas Series that I know are the same length, where each element is a set(). I want to figure out a computationally efficient way to get the element-wise union of these two Series' sets. I've created a simplified version of the code with fake, short Series to play with below. This implementation is a VERY inefficient way of doing this. There has GOT to be a faster way. My real Series are much longer and I have to do this operation hundreds of thousands of times.
import pandas as pd
set_series_1 = pd.Series([{1,2,3}, {'a','b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2,4,7}, {'a','f','g'}, {0.0, 15.6}])
n = set_series_1.shape[0]
for i in range(0, n):
    set_series_1[i] = set_series_1[i].union(set_series_2[i])
print(set_series_1)
>>> set_series_1
0 set([1, 2, 3, 4, 7])
1 set([a, b, g, f])
2 set([0.0, 2.3, 15.6, 5.4])
dtype: object
I've tried combining the Series into a data frame and using the apply function, but I get an error saying that sets are not supported as dataframe elements.
pir4
After testing several options, I finally came up with a good one... pir4 below.
Testing
def jed1(s1, s2):
    s = s1.copy()
    n = s1.shape[0]
    for i in range(n):
        s[i] = s2[i].union(s1[i])
    return s

def pir1(s1, s2):
    return pd.Series([item.union(s2[i]) for i, item in enumerate(s1.values)], s1.index)

def pir2(s1, s2):
    # Series.iteritems() was removed in pandas 2.0; use .items() on newer versions
    return pd.Series([item.union(s2[i]) for i, item in s1.iteritems()], s1.index)

def pir3(s1, s2):
    return s1.apply(list).add(s2.apply(list)).apply(set)

def pir4(s1, s2):
    return pd.Series([set.union(*z) for z in zip(s1, s2)])
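The trick in pir4 is that set.union is looked up on the set type itself, so set.union(*z) with a pair z = (a, b) is simply a.union(b):
a, b = {1, 2}, {2, 3}
print(set.union(a, b))  # {1, 2, 3}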
