I understand the process of passing a function as a parameter to another function but, coming from a C# background, I don't understand the need for it.
Can someone please make me aware of some scenarios in which this is preferred?
One of the reasons passing a function as a parameter is useful is the concept of lambda functions in Python.
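For the example below, assume two small helper functions along these lines (hypothetical definitions, since the original snippet doesn't show them):
def method1(name):
    print('hello', name)

def method2(func):
    # method2 just calls whatever callable it is handed;
    # the behaviour comes entirely from the caller
    func()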
>>> method2(lambda: method1('world'))
hello world
The benefit of lambda functions is easily visible when they are used with Python's built-in functions map(), filter(), and reduce().
Lambda functions with map()
>>> my_list = list(range(20))
>>> list(map(lambda x: x*2, my_list))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
Lambda with reduce()
>>> from functools import reduce
>>> reduce(lambda x, y: x+y, my_list)
190
Lambda with filter()
>>> list(filter(lambda x: x > 10, my_list))
[11, 12, 13, 14, 15, 16, 17, 18, 19]
Compared to C#, this keeps the code shorter and more direct, since the function is defined and passed at the very place it is called.
Passing functions into functions allows you to parameterise behaviour, much as passing values into functions allows you to parameterise data.
def is_greater(what: int, base: int):
    if what > base:  # fixed behaviour, parameterised data
        print(f'{what} is greater')

def is_valid(what: int, condition: 'Callable'):
    if condition(what):  # parameterised behaviour
        print(f'{what} is valid')
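For instance, the same is_valid can express different checks just by swapping the callable (a brief usage sketch):
is_valid(10, lambda x: x > 5)        # prints: 10 is valid
is_valid(3, lambda x: x % 2 == 0)    # prints nothing, 3 fails the condition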
Some common use-cases include:
map, filter and others that apply some behaviour to iterables. The functions themselves merely implement the "apply to each element" part, but the behaviour can be swapped out:
>>> print(*map(float, ['1', '2', '3.0']))
1.0 2.0 3.0
In such situations, one often uses a lambda to define the behaviour on the fly.
>>> print(sorted(
...     ['Bobby Tables', 'Brian Wayne', 'Charles Chapeau'],
...     key=lambda name: name.split()[1]),  # sort by last name
... )
['Charles Chapeau', 'Bobby Tables', 'Brian Wayne']
Function decorators that wrap a function with additional behaviour.
def print_call(func):
    """Decorator that prints the arguments its target is called with"""
    def wrapped_func(*args, **kwargs):
        print(f'call {func} with {args} and {kwargs}')
        return func(*args, **kwargs)
    return wrapped_func

@print_call
def rolling_sum(*numbers, initial=0):
    totals = [initial]
    for number in numbers:
        totals.append(totals[-1] + number)
    return totals

rolling_sum(1, 10, 27, 42, 5, initial=100)
# call <function rolling_sum at 0x10ed6fd08> with (1, 10, 27, 42, 5) and {'initial': 100}
Every time you see a decorator applied with @, there is a higher-order function at work.
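The @ syntax is just sugar for calling the decorator yourself; spelled out, the example above is equivalent to this sketch:
def rolling_sum(*numbers, initial=0):
    ...

rolling_sum = print_call(rolling_sum)  # what @print_call does behind the scenes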
Callbacks and payloads that are executed at another time, context, condition, thread or even process.
import threading
import time

def call_after(delay: float, func: 'Callable', *args, **kwargs):
    """Call ``func(*args, **kwargs)`` after ``delay`` seconds"""
    time.sleep(delay)
    func(*args, **kwargs)

thread = threading.Thread(
    target=call_after,  # payload for the thread is a function
    args=(1, print, 'Hello World'))
thread.start()
print("Let's see what happens...")
# Let's see what happens...
#
# Hello World
Passing functions instead of values allows you to emulate lazy evaluation.
def as_needed(expensive_computation, default):
    if random_condition():
        return expensive_computation()
    return default
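A hypothetical call site then wraps the costly work in a lambda so it only runs when needed:
# the sum is only computed if random_condition() returns True
value = as_needed(lambda: sum(x * x for x in range(10**6)), default=0)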
I have read that the function apply_async doesn't give ordered results. If I have repeated calls to a function which prints the squares of a list of numbers, I can see from the display that the list is not ordered.
However when the function returns the number instead of printing it and I use .get() to get the values, then I see that the results are ordered.
I have a few questions --
Why are the results from .get() ordered?
If I have a loop which has a variable named a whose value is different for different iterations, will using apply_async cause overwrites of the values of a as it runs the processes in parallel and asynchronously?
Will I be able to save computational time if I run apply instead of apply_async? My code shows that apply is slower than the for loop. Why is that so?
Can we use a function declared within the __main__ block with apply_async?
Here is a small working example:
from multiprocessing import Pool
import time

def f(x):
    return x*x

if __name__ == '__main__':
    print('For loop')
    t1f = time.time()
    for ii in range(20):
        f(ii)
    t2f = time.time()
    print('Time taken for For loop = ', t2f-t1f, ' seconds')

    pool = Pool(processes=4)
    print('Apply async loop')
    t1a = time.time()
    results = [pool.apply_async(f, args=(j,)) for j in range(20)]
    pool.close()
    pool.join()
    t2a = time.time()
    print('Time taken for pool = ', t2a-t1a, ' seconds')
    print([results[hh].get() for hh in range(len(results))])
This results in:
For loop
Time taken for For loop =  5.9604644775390625e-06  seconds
Apply async loop
Time taken for pool =  0.10188460350036621  seconds
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
Why are the results from .get() ordered?
Because the results list is ordered: each AsyncResult in results corresponds to the j it was submitted with, regardless of the order in which the workers finish.
If I have a loop which has a variable named a whose value is different for different iterations, will using apply_async cause overwrites of the values of a as it runs the processes in parallel and asynchronously?
Generally no, but I can't tell without seeing the code.
Will I be able to save computational time if I run apply instead of apply_async? My code shows that apply is slower than the for loop. Why is that so?
No. apply blocks on each call, so there is no parallelism; it is slower than the plain loop because of multiprocessing overhead.
Can we use a function declared within the __main__ block with apply_async?
Yes on *nix, no on Windows, because Windows has no fork().
Also, your time measurement of .apply_async is wrong: you should take t2a after the results have been retrieved, and don't assume the results finish in order:
while not all(r.ready() for r in results):
    time.sleep(0.1)
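As a rough sketch (reusing the same f, pool and range(20) from your code), the timing could look like this:
t1a = time.time()
results = [pool.apply_async(f, args=(j,)) for j in range(20)]
output = [r.get() for r in results]  # .get() blocks until each result is ready
t2a = time.time()                    # now the measurement includes the actual work
print('Time taken for pool = ', t2a - t1a, ' seconds')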
By the way, your work function finishes far too quickly to measure anything meaningful; give it more computation to do for a true benchmark.
I don't manage to pass a list as an argument to func1d in numpy.apply_along_axis(...).
import numpy as np

def test(a, value):
    print(value)
    return a

a = np.zeros((49), dtype=list)
kwargs = {"value": [1, 1, 1]}
zep = np.vectorize(test)
np.apply_along_axis(zep, 0, a, **kwargs)
Out:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/ibpc/osx/lbt/numpy/1.9.2/lib/python3.4/site-packages/nump/lib/shape_base.py", line 91, in apply_along_axis
res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
File "/ibpc/osx/lbt/numpy/1.9.2/lib/python3.4/site-packages/numpy/lib/function_base.py", line 1700, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/ibpc/osx/lbt/numpy/1.9.2/lib/python3.4/site-packages/numpy/lib/function_base.py", line 1769, in _vectorize_call
outputs = ufunc(*inputs)
ValueError: operands could not be broadcast together with shapes (49,) (3,)
So it wants len(kwargs["value"]) == 49, but that's not what I want.
I need to be able to change value as I go (during numpy.apply_along_axis(func1d) I need to update my list).
How can I pass a list as an argument? Or maybe there is another way to solve this problem.
In reality, I have a numpy.array of lists of positions in 3D space for a particle.
Like this:
dim = [49, 49, 49]
dx = 3
origin = [3, 3, 3]
nb_iter = 5

ntoto = np.load("ntoto.npy")
ntoto = ntoto.flatten()

liste_particles = np.zeros((5), dtype=list)
for i in range(len(liste_particles)):
    # nb_iter is just the number of iterations I want to do in calcTrajs
    liste_particles[i] = [[r.uniform(0, 150), r.uniform(0, 150), r.uniform(0, 150)]] * nb_iter

vtraj = np.vectorize(calcTrajs, otypes=[list])
np.apply_along_axis(vtraj, 0, liste_particles)
Here, I have five particles placed randomly. Moreover, I have another numpy.array (shape == (49, 49, 49)) which contains a vector field.
Here is the func1d I need to run:
def calcTrajs(a):
    global ntoto, dim, dx, origin  # ntoto is my vector field
    for b in range(1, len(a)):
        # s2g (space to grid) tells me which grid cell my particle is in,
        # because my vector field lives on a grid
        ijk = s2g(a[b-1], dx, origin, dim)
        # value is the vector that influences my particle
        value = np.asarray(ntoto[flatten3Dto1D(ijk, dim[1], dim[2])])
        try:
            a[b] = list(a[b-1] + value*1000)
        except:
            print("error")
            break
    return a
This function lets me launch a particle in my vector field and compute its trajectory.
As you can see, I used global variables, but I want to pass these variables as arguments rather than globals. ntoto is a numpy.array, dim is a list (the dimensions of my vector field), dx is the cell spacing (my vector field lives on a grid of cells, each containing one vector) and origin is the first point of my grid.
Best regards,
Adam
As I commented, neither vectorize nor apply_along_axis is a speed tool. vectorize can be useful for broadcasting several arrays against each other. apply_along_axis can be useful for iterating over more than 2 dimensions; with only one or two it is overkill. Both are tools that beginners often misuse.
It looks like the apply_along_axis part is ok, though I haven't tested it. The error lies in broadcasting in vectorize.
Especially since you are defining a with object dtype, you should specify a return dtype for vectorize (its otypes parameter). Otherwise it performs a test calculation to determine it.
In [223]: def test(a, value):
     ...:     print(value)
     ...:     return a
In [224]: zep = np.vectorize(test, otypes=['O'])
In [225]: a = np.array([[1,2,3],[4,5]])
In [226]: a
Out[226]: array([list([1, 2, 3]), list([4, 5])], dtype=object)
zep works with a and a scalar:
In [227]: zep(a,1)
1
1
Out[227]: array([list([1, 2, 3]), list([4, 5])], dtype=object)
But when a has 2 items and value has 3 items, I get the same sort of error as you did:
In [228]: zep(a,[1,2,3])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-228-382aaa7a2dc6> in <module>()
----> 1 zep(a,[1,2,3])
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)
2753 vargs.extend([kwargs[_n] for _n in names])
2754
-> 2755 return self._vectorize_call(func=func, args=vargs)
2756
2757 def _get_ufunc_and_otypes(self, func, args):
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)
2829 for a in args]
2830
-> 2831 outputs = ufunc(*inputs)
2832
2833 if ufunc.nout == 1:
ValueError: operands could not be broadcast together with shapes (2,) (3,)
(2,) and (2,) is fine:
In [229]: zep(a,['a','b'])
a
b
Out[229]: array([list([1, 2, 3]), list([4, 5])], dtype=object)
So is (2,) with (2,1), producing a (2,2) output. This is an example of the kind of broadcasting where vectorize can help.
In [230]: zep(a,[['a'],['b']])
a
a
b
b
Out[230]:
array([[list([1, 2, 3]), list([4, 5])],
[list([1, 2, 3]), list([4, 5])]], dtype=object)
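If the extra list should not participate in broadcasting at all, one way around it (a sketch, reusing the test and a defined above) is np.vectorize's excluded parameter, which passes the named argument to every call unchanged:
zep2 = np.vectorize(test, otypes=['O'], excluded={'value'})
zep2(a, value=[1, 2, 3])   # prints [1, 2, 3] once per element of a; no broadcasting error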
I would like to use this CLI template
https://mike.depalatis.net/blog/simplifying-argparse.html
for creating a tool for accessing the EMC Unity REST API.
It appears to be written with Python 3 in mind, particularly the argument helper function.
def argument(*name_or_flags, **kwargs):
    return ([*name_or_flags], kwargs)
I don't believe I understand exactly how the argument function is supposed to work, and thus how I can modify it to work with Python 2.
e.g. if I had a function create_lun that had a few options, I think I need argument to return the list of arguments defined, and thus I would decorate it like so:
@subcommand([argument('-o', '--pool', dest='pool', default="pool_1",
                      type=str, help='Name of Pool of str arg'),
             argument('lun_name', metavar='lun_name', type=str,
                      help='Name of LUN to create in format: ldom-vol#')])
def create_lun(args):
and thus cli.py create_lun lun_name would create the LUN, and -h would show me this syntax.
If that assumption is correct, I would need to translate Python 3's ability to
return ([*name_or_flags], kwargs)
into a Python 2.7 equivalent. Any thoughts on this are greatly appreciated.
The line
return ([*name_or_flags], kwargs)
is the same as
return [*name_or_flags], kwargs
which, aside from producing a list rather than a tuple in the first position, is effectively the same as
return name_or_flags, kwargs
In Python 3, the syntax [*a_list] expands the elements of a list into a new list literal. Its intended purpose was for splicing an existing list into a new list, not simply for making a copy of a list.
In [1]: a = list(range(9))
In [2]: a
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
In [3]: [*a]
Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
In [4]: ['hey', *a, 'there']
Out[4]: ['hey', 0, 1, 2, 3, 4, 5, 6, 7, 8, 'there']
Writing [*name_or_flags] seems like an attempt at obfuscation. Possibly the author wanted to create a new copy of name_or_flags, and in that case a slice would have been enough:
def argument(*name_or_flags, **kwargs):
    return name_or_flags[:], kwargs
If no copy was needed, then the following will suffice.
def argument(*name_or_flags, **kwargs):
    return name_or_flags, kwargs
This will work in both Python 2 and Python 3 and should yield the same results.
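One small caveat: [*name_or_flags] produces a list, while name_or_flags itself is a tuple. If the rest of the template specifically needs a list, list() behaves the same way in both Python versions (a sketch, not from the original template):
def argument(*name_or_flags, **kwargs):
    # list() keeps the first element a list, as [*name_or_flags] did in Python 3
    return list(name_or_flags), kwargs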
I'm still new to Spark / PySpark and have the following question. I have a nested list with IDs in it:
result = [[411, 44, 61], [42, 33], [1, 100], [44, 42]]
What I'm trying to achieve is that if any item of a sublist matches an item in another sublist, both should be merged. The result should look like this:
merged_result = [[411, 44, 61, 42, 33, 44, 42], [1, 100]]
The first list in "result" matches the fourth list. The fourth list matches the second, so all three should be merged into one list. The third list doesn't match any other list, so it stays the same.
I could achieve this by writing loops with Python.
result_after_matching = []
for i in result:
    new_list = i
    for s in result:
        if any(x in i for x in s):
            new_list = new_list + s
    result_after_matching.append(set(new_list))
# merged_result = [[411, 44, 61, 42], [42, 33, 44], [1, 100], [44, 42, 33, 411, 61]]
As this is not the desired output, I would need to repeat the loop and apply another set() over the "merged_result":
set([[411, 44, 61, 42, 33], [42, 33, 44, 411, 61], [1, 100], [44, 42, 33, 411, 61]])
-> [[411, 44, 61, 42, 33], [1, 100]]
As the list of lists and the sublists will keep growing as new data comes in, this will not be the approach to use.
Can anyone tell me if there is a function in Spark / PySpark to match / merge / group / reduce these nested lists much more easily and faster?
Thanks a lot in advance!
MG
Most RDD- or dataframe-based solutions will probably be fairly inefficient. This is because the nature of your problem requires every element of your data set to be compared to every other element, potentially multiple times. This makes it so that distributing the work across a cluster is at best inefficient.
Perhaps a different way to do this would be to reformulate this as a graph problem. If you treat each item in a list as a node on a graph, and each list as a subgraph, then the connected components of a parent graph constructed from the subgraphs will be the desired result. Here is an example using the networkx package in python:
import networkx as nx

result = [[411, 44, 61], [42, 33], [1, 100], [44, 42]]

g = nx.DiGraph()
for subgraph in result:
    g.add_path(subgraph)

u = g.to_undirected()

output = []
for component in nx.connected_component_subgraphs(u):
    output.append(component.nodes())

print(output)
# [[33, 42, 411, 44, 61], [1, 100]]
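On newer networkx releases (2.x) the graph-building calls above have changed (the add_path method and connected_component_subgraphs were removed); a rough equivalent under those versions would be:
import networkx as nx

result = [[411, 44, 61], [42, 33], [1, 100], [44, 42]]

g = nx.Graph()
for sublist in result:
    nx.add_path(g, sublist)  # module-level function in networkx 2.x

output = [list(component) for component in nx.connected_components(g)]
print(output)
# e.g. [[33, 42, 411, 44, 61], [1, 100]] (node order within a component is arbitrary)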
This should be fairly efficient, but if your data is very large it will make sense to use a more scalable graph analysis tool. Spark does have a graph processing library called GraphX:
https://spark.apache.org/docs/latest/graphx-programming-guide.html
Unfortunately the PySpark implementation is lagging behind a bit, so if you intend to use something like this you might be stuck using Scala Spark or a different framework entirely for now.
I think you can use the aggregate action on an RDD. Below I'm putting an example implementation in Scala. Please note that I've used recursion to make it more readable, but to improve performance it's a good idea to reimplement those functions.
def overlap(s1: Seq[Int], s2: Seq[Int]): Boolean =
  s1.exists(e => s2.contains(e))

def mergeSeq(s1: Seq[Int], s2: Seq[Int]): Seq[Int] =
  s1.union(s2).distinct

def mergeSeqWithSeqSeq(s: Seq[Int], ss: Seq[Seq[Int]]): Seq[Seq[Int]] = ss match {
  case Nil => Seq(s)
  case h +: tail =>
    if (overlap(h, s)) mergeSeqWithSeqSeq(mergeSeq(h, s), tail)
    else h +: mergeSeqWithSeqSeq(s, tail)
}

def mergeSeqSeqWithSeqSeq(s1: Seq[Seq[Int]], s2: Seq[Seq[Int]]): Seq[Seq[Int]] = s1 match {
  case Nil => s2
  case h +: tail => mergeSeqWithSeqSeq(h, mergeSeqSeqWithSeqSeq(tail, s2))
}

val result = rdd
  .aggregate(Seq.empty[Seq[Int]]) (
    {case (ss, s) => mergeSeqWithSeqSeq(s, ss)},
    {case (s1, s2) => mergeSeqSeqWithSeqSeq(s1, s2)}
  )
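For reference, a rough PySpark sketch of the same idea (hypothetical helper names; assumes an existing SparkContext sc and the result list from the question) might look like this:
def overlap(s1, s2):
    return any(e in s2 for e in s1)

def merge_lists(s1, s2):
    # union that keeps order and drops duplicates
    return list(dict.fromkeys(list(s1) + list(s2)))

def merge_into(groups, s):
    # fold one sublist into an accumulated list of disjoint groups
    merged, rest = list(s), []
    for g in groups:
        if overlap(g, merged):
            merged = merge_lists(g, merged)
        else:
            rest.append(g)
    return rest + [merged]

def merge_groups(g1, g2):
    # combine partial results coming from different partitions
    out = list(g1)
    for s in g2:
        out = merge_into(out, s)
    return out

rdd = sc.parallelize(result)
merged = rdd.aggregate([], merge_into, merge_groups)
# groups the IDs into e.g. [[1, 100], [42, 33, 411, 44, 61]] (order may differ across runs)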
filter, map, and reduce work perfectly in Python 2. Here is an example:
>>> def f(x):
...     return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]
>>> def cube(x):
...     return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>> def add(x, y):
...     return x+y
...
>>> reduce(add, range(1, 11))
55
But in Python 3, I receive the following outputs:
>>> filter(f, range(2, 25))
<filter object at 0x0000000002C14908>
>>> map(cube, range(1, 11))
<map object at 0x0000000002C82B70>
>>> reduce(add, range(1, 11))
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
reduce(add, range(1, 11))
NameError: name 'reduce' is not defined
I would appreciate it if someone could explain why this is.
You can read about the changes in What's New In Python 3.0. You should read it thoroughly when you move from 2.x to 3.x since a lot has changed.
The rest of this answer is quoted from that documentation.
Views And Iterators Instead Of Lists
Some well-known APIs no longer return lists:
[...]
map() and filter() return iterators. If you really need a list, a quick fix is e.g. list(map(...)), but a better fix is often to use a list comprehension (especially when the original code uses lambda), or rewriting the code so it doesn’t need a list at all. Particularly tricky is map() invoked for the side effects of the function; the correct transformation is to use a regular for loop (since creating a list would just be wasteful).
[...]
Builtins
[...]
Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.
[...]
The functionality of map and filter was intentionally changed to return iterators, and reduce was removed from being a built-in and placed in functools.reduce.
So, for filter and map, you can wrap them with list() to see the results like you did before.
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> list(filter(f, range(2, 25)))
[5, 7, 11, 13, 17, 19, 23]
>>> def cube(x): return x*x*x
...
>>> list(map(cube, range(1, 11)))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>> import functools
>>> def add(x,y): return x+y
...
>>> functools.reduce(add, range(1, 11))
55
>>>
The recommendation now is that you replace your usage of map and filter with generator expressions or list comprehensions. Example:
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> [i for i in range(2, 25) if f(i)]
[5, 7, 11, 13, 17, 19, 23]
>>> def cube(x): return x*x*x
...
>>> [cube(i) for i in range(1, 11)]
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>>
They say that for loops are 99 percent of the time easier to read than reduce, but I'd just stick with functools.reduce.
Edit: The 99 percent figure is pulled directly from the What’s New In Python 3.0 page authored by Guido van Rossum.
As an addendum to the other answers, this sounds like a fine use-case for a context manager that will re-map the names of these functions to ones which return a list and introduce reduce in the global namespace.
A quick implementation might look like this:
from contextlib import contextmanager

@contextmanager
def noiters(*funcs):
    if not funcs:
        funcs = [map, filter, zip]  # etc
    from functools import reduce
    globals()[reduce.__name__] = reduce
    for func in funcs:
        globals()[func.__name__] = lambda *ar, func=func, **kwar: list(func(*ar, **kwar))
    try:
        yield
    finally:
        del globals()[reduce.__name__]
        for func in funcs:
            globals()[func.__name__] = func
With a usage that looks like this:
with noiters(map):
    from operator import add
    print(reduce(add, range(1, 20)))
    print(map(int, ['1', '2']))
Which prints:
190
[1, 2]
Just my 2 cents :-)
Since reduce has been removed from the built-in functions in Python 3, don't forget to import it from functools in your code. Please look at the code snippet below.
import functools
my_list = [10,15,20,25,35]
sum_numbers = functools.reduce(lambda x ,y : x+y , my_list)
print(sum_numbers)
One of the advantages of map, filter and reduce is how legible they become when you "chain" them together to do something complex. However, the built-in syntax isn't legible and is all "backwards". So, I suggest using the PyFunctional package (https://pypi.org/project/PyFunctional/).
Here's a comparison of the two:
flight_destinations_dict = {'NY': {'London', 'Rome'}, 'Berlin': {'NY'}}
PyFunctional version
Very legible syntax. You can say:
"I have a sequence of flight destinations. Out of which I want to get
the dict key if city is in the dict values. Finally, filter out the
empty lists I created in the process."
from functional import seq  # PyFunctional package to allow easier syntax

def find_return_flights_PYFUNCTIONAL_SYNTAX(city, flight_destinations_dict):
    return seq(flight_destinations_dict.items()) \
        .map(lambda x: x[0] if city in x[1] else []) \
        .filter(lambda x: x != [])
Default Python version
It's all backwards. You need to say:
"OK, so, there's a list. I want to filter empty lists out of it. Why?
Because I first got the dict key if the city was in the dict values.
Oh, the list I'm doing this to is flight_destinations_dict."
def find_return_flights_DEFAULT_SYNTAX(city, flight_destinations_dict):
    return list(
        filter(lambda x: x != [],
               map(lambda x: x[0] if city in x[1] else [], flight_destinations_dict.items())
               )
    )
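Either way, a quick call with the dictionary above (a usage sketch) returns the cities that have a flight back to NY:
print(find_return_flights_DEFAULT_SYNTAX('NY', flight_destinations_dict))
# ['Berlin']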
Here are examples of the filter, map and reduce functions.
numbers = [10,11,12,22,34,43,54,34,67,87,88,98,99,87,44,66]
# filter
oddNumbers = list(filter(lambda x: x % 2 != 0, numbers))
print(oddNumbers)
# map
multiplyOf2 = list(map(lambda x: x*2, numbers))
print(multiplyOf2)
# reduce
The reduce function, since it is not commonly used, was removed from the built-in functions in Python 3. It is still available in the functools module, so you can do:
from functools import reduce
sumOfNumbers = reduce(lambda x,y: x+y, numbers)
print(sumOfNumbers)
from functools import reduce

def f(x):
    return x % 2 != 0 and x % 3 != 0

print(*filter(f, range(2, 25)))
# 5 7 11 13 17 19 23

def cube(x):
    return x**3

print(*map(cube, range(1, 11)))
# 1 8 27 64 125 216 343 512 729 1000

def add(x, y):
    return x+y

print(reduce(add, range(1, 11)))
# 55
It works as is. To get the output of map, unpack it with * (as above) or wrap it in list().