One-liner using reduce in Python

I'm trying to take my Python skills (beginner) to the next level, and I'm teaching myself the functools.reduce() function.
I keep running into the same error, and I'd appreciate it if someone could explain what the problem in my code is.
I'm trying to build a simple function that receives a number and returns the sum of its digits:
import functools
def sum_using_reduce(number):
    return str(number)[0] + str(number)[1]
number = 104
print(functools.reduce(sum_using_reduce, number))

Try this:
import functools
number = 104
print(functools.reduce(lambda x, y: x + y, [int(i) for i in str(number)]))
Output: 5
Using your example:
import functools
def sum_using_reduce(x, y) -> int:
    return x + y
print(functools.reduce(sum_using_reduce, [int(i) for i in str(105)]))
Output: 6
Another approach:
import functools
def sum_using_reduce(number: int) -> int:
    return functools.reduce(lambda x, y: x + y, [int(i) for i in str(number)])
print(sum_using_reduce(124))
Output: 7

In your sum_using_reduce function you are adding two strings, which simply concatenates them ("1" + "0" gives "10"). Moreover, you pass an integer as the second argument to reduce, which requires an iterable, and reduce calls its function with two arguments (the accumulated value and the next item), whereas sum_using_reduce accepts only one.
Below is a solution that fixes these problems:
from functools import reduce
number = 104
print(reduce(lambda x, y: x + y, map(int, str(number))))
str(number) converts the number to a string (104 -> "104"), and map(int, ...) then converts every character of that string to an integer, returning a lazy, iterable map object that yields 1, 0, 4.
lambda x, y: x + y is a function that takes two integers and returns their sum.
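To make the folding visible, here is a small sketch (standard library only) that records every call reduce makes. It shows that reduce passes two arguments, the running total and the next digit, which is why a one-argument function cannot work. The digit_sum helper is illustrative, not from the answers above; it uses operator.add and the optional initializer 0 so that an empty iterable returns 0 instead of raising TypeError. It assumes a non-negative integer, since str() of a negative number contains "-".

```python
from functools import reduce
import operator

def digit_sum(number: int) -> int:
    # Assumes number >= 0; 0 is the initializer for the fold
    return reduce(operator.add, map(int, str(number)), 0)

calls = []

def traced_add(acc: int, digit: int) -> int:
    # Record each (accumulated value, next item) pair that reduce passes in
    calls.append((acc, digit))
    return acc + digit

total = reduce(traced_add, map(int, str(104)))
print(total)           # 5
print(calls)           # [(1, 0), (1, 4)]
print(digit_sum(104))  # 5
```

Without an initializer, the first digit becomes the starting accumulator, so only two calls are made for three digits.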

Related

faster method for comparing two lists element-wise

I am building a relational DB using Python. So far I have two tables, as follows:
>>> df_Patient.columns
Index(['NgrNr', 'FamilieNr', 'DosNr', 'Geslacht', 'FamilieNaam', 'VoorNaam',
       'GeboorteDatum', 'PreBirth'],
      dtype='object')
>>> df_LaboRequest.columns
Index(['RequestId', 'IsComplete', 'NgrNr', 'Type', 'RequestDate', 'IntakeDate',
       'ReqMgtUnit'],
      dtype='object')
The two tables are quite big:
>>> df_Patient.shape
(386249, 8)
>>> df_LaboRequest.shape
(342225, 7)
Column NgrNr in df_LaboRequest is a foreign key (FK) that references the column of the same name in df_Patient. To avoid any integrity errors, I need to make sure that all the values in df_LaboRequest['NgrNr'] are also in df_Patient['NgrNr'].
Using a list comprehension, I tried the following (to pick out the values that would throw an error):
[x for x in list(set(df_LaboRequest['NgrNr'])) if x not in list(set(df_Patient['NgrNr']))]
This takes ages to complete, though. Could anyone recommend a faster method (I mean "method" in the general sense of a procedure, not a Python method) for this kind of comparison?
One-liners aren't always better.
Don't check for membership in lists. Why on earth would you create a set (the recommended data structure for O(1) average-case membership checks) and then cast it to a list, which has O(N) membership checks?
Also, because list(set(df_Patient['NgrNr'])) sits in the condition of the comprehension, it is rebuilt on every iteration. Build the set of df_Patient once, outside the list comprehension, and use that instead:
patients = set(df_Patient['NgrNr'])
lab_requests = set(df_LaboRequest['NgrNr'])
result = [x for x in lab_requests if x not in patients]
Or, if you like to use set operations, simply find the difference of both sets:
result = lab_requests - patients
Alternatively, use the pandas isin() method:
patients = df_Patient['NgrNr'].drop_duplicates()
lab_requests = df_LaboRequest['NgrNr'].drop_duplicates()
result = lab_requests[~lab_requests.isin(patients)]
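Here is a quick sanity check of the set-difference and isin() approaches on a tiny, hypothetical pair of tables (only the NgrNr column matters; the RequestId values are made up for the demo). Applying isin() to the full df_LaboRequest column, rather than to the deduplicated values, has the side benefit of returning whole rows, so you can see which requests would violate the FK.

```python
import pandas as pd

# Hypothetical toy tables; the real ones have ~350k rows
df_Patient = pd.DataFrame({"NgrNr": [1, 2, 3, 4]})
df_LaboRequest = pd.DataFrame({"RequestId": [10, 11, 12, 13],
                               "NgrNr": [2, 2, 5, 7]})

# Set difference: FK values that have no matching patient
orphans_set = set(df_LaboRequest["NgrNr"]) - set(df_Patient["NgrNr"])

# pandas: boolean mask of request rows whose NgrNr is absent from df_Patient
mask = ~df_LaboRequest["NgrNr"].isin(df_Patient["NgrNr"])
orphan_rows = df_LaboRequest[mask]

print(orphans_set)                         # {5, 7}
print(orphan_rows["RequestId"].tolist())   # [12, 13]
```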
Let's test how much faster these changes make the code:
import pandas as pd
import random
import timeit
# Make dummy dataframes of patients and lab_requests
randoms = [random.randint(1, 1000) for _ in range(10000)]
patients = pd.DataFrame("patient{0}".format(x) for x in randoms[:5000])[0]
lab_requests = pd.DataFrame("patient{0}".format(x) for x in randoms[2000:8000])[0]
# Do it your way
def fun1(pat, lr):
    return [x for x in list(set(lr)) if x not in list(set(pat))]
# Do it my way: Set operations
def fun2(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return lr_s - pat_s
# Or explicitly iterate over the set
def fun3(pat, lr):
    pat_s = set(pat)
    lr_s = set(lr)
    return [x for x in lr_s if x not in pat_s]
# Or using pandas
def fun4(pat, lr):
    pat = pat.drop_duplicates()
    lr = lr.drop_duplicates()
    return lr[~lr.isin(pat)]
# Make sure all 4 functions return the same thing
assert set(fun1(patients, lab_requests)) == set(fun2(patients, lab_requests)) == set(fun3(patients, lab_requests)) == set(fun4(patients, lab_requests))
# Time it
timeit.timeit('fun1(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun1', number=100)
# Output: 48.36615000000165
timeit.timeit('fun2(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun2', number=100)
# Output: 0.10799920000044949
timeit.timeit('fun3(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun3', number=100)
# Output: 0.11038020000069082
timeit.timeit('fun4(patients, lab_requests)', 'from __main__ import patients, lab_requests, fun4', number=100)
# Output: 0.32021789999998873
Looks like we have a ~150x speedup with pandas and a ~450x speedup with set operations!
I don't have pandas installed right now to try this, but you could try removing the list(...) cast. I don't think it adds anything meaningful to the program, and sets are much faster than lists for membership lookups such as x in set(...).
You could also try doing this with the pandas API rather than with lists and sets; sometimes that is faster. Try searching for unique. Then you could compare the sizes of the two columns and, if they are the same, sort them and do an equality check.

Why does overloading math operators in python depend on what order it is used?

If I create a class such as 'A' below:
class A(object):
    a = 1
    def __truediv__(self, var):
        return self.a / var
and then try to divide an int by an instance of A:
print(3 / A())
Python raises a TypeError. However, if I divide an instance of A by an int:
print(A() / 3)
Python prints 0.3333333333333333.
How can I make the class work so that I can perform mathematical operations in any order?
N.B. NumPy arrays seem to work both ways, i.e.:
import numpy as np
1 / np.arange(1, 5)
np.arange(1, 5) / 1
runs and works as expected.
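Also implement the reflected dunder methods. In your case, that's __rtruediv__(). For 3 / A(), Python first calls int.__truediv__(3, A()), which returns NotImplemented because int doesn't know how to handle an A; Python then tries the reflected A().__rtruediv__(3), and only raises TypeError if that method is missing too.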
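A minimal sketch of the question's class with the reflected method added, assuming the other operand is always a plain number. The same pattern applies to the other operators (__radd__, __rsub__, __rmul__, and so on).

```python
class A:
    a = 1

    def __truediv__(self, other):
        # Handles A() / other
        return self.a / other

    def __rtruediv__(self, other):
        # Handles other / A(), tried after int.__truediv__ returns NotImplemented
        return other / self.a

print(A() / 3)  # 0.3333333333333333
print(3 / A())  # 3.0
```

Note that for subtraction and division the operand order matters, so __rtruediv__ computes other / self.a rather than reusing __truediv__.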

define a "derivation" function in sympy

I am trying to make a derivation Function in sympy (I am using sympy version 1.4), but I am not sure how. In particular, I am trying to define a general function (that could just take sympy variables, not functions for now) that has the following properties:
d(f+g)=d(f)+d(g)
d(f*g)=f*d(g)+d(f)*g
I have tried reading the sympy documentation on defining Functions but I am not sure how to define a Function class that has the above properties for any two symbols.
For some background/context, I know how to make derivations in Mathematica; I would just type
d[f_+g_]:=d[f]+d[g]
d[f_ g_]:=f d[g] + d[f] g
You can write your own rule. The following might get you started:
from sympy import Add, Mul
def d(e, func):
    """
    >>> from sympy.abc import x, y
    >>> from sympy import Function
    >>> D = Function('D')
    >>> d(x + y, D)
    D(x) + D(y)
    """
    if e.is_Add:
        # d(f + g) = d(f) + d(g)
        return Add(*[d(a, func) for a in e.args])
    elif e.is_Mul:
        # Product rule, applied to each factor in turn
        return Add(*[Mul(*(e.args[:i] + (d(e.args[i], func),) + e.args[i+1:]))
                     for i in range(len(e.args))])
    else:
        return func(e)
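To check that the rules behave as intended, the sketch below repeats the function (so the snippet runs on its own, assuming SymPy 1.4 or later) and applies it to a three-factor product. As an extra sanity check, passing ordinary differentiation with respect to x as func (the helper ddx is illustrative) should reproduce SymPy's own diff, since diff also obeys linearity and the product rule.

```python
from sympy import Add, Function, Mul, sin
from sympy.abc import x, y, z

def d(e, func):
    # Linearity over sums, product rule over products, func(e) otherwise
    if e.is_Add:
        return Add(*[d(a, func) for a in e.args])
    if e.is_Mul:
        return Add(*[Mul(*(e.args[:i] + (d(e.args[i], func),) + e.args[i + 1:]))
                     for i in range(len(e.args))])
    return func(e)

D = Function('D')
print(d(x*y*z, D))

def ddx(s):
    # Ordinary derivative with respect to x, used as the "atomic" rule
    return s.diff(x)

expr = x*y + y*sin(x)
print(d(expr, ddx) == expr.diff(x))  # True
```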
Or you could try this with a class:
from sympy import Add, Function, Mul
class d(Function):
    @classmethod
    def eval(cls, e):
        if e.is_Add:
            return Add(*[d(a) for a in e.args])
        elif e.is_Mul:
            return Add(*[Mul(*(e.args[:i] + (d(e.args[i]),) + e.args[i+1:]))
                         for i in range(len(e.args))])
        else:
            return None  # returning None leaves d(e) unevaluated
See also, linapp.

Python/Pandas element wise union of 2 Series containing sets in each element

I have 2 pandas Series that I know are the same length. Each element of each Series is a set. I want to figure out a computationally efficient way to get the element-wise union of these two Series' sets. I've created a simplified version of the code with short, fake Series to play with below. This implementation is a VERY inefficient way of doing this; there has GOT to be a faster way. My real Series are much longer, and I have to do this operation hundreds of thousands of times.
import pandas as pd
set_series_1 = pd.Series([{1,2,3}, {'a','b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2,4,7}, {'a','f','g'}, {0.0, 15.6}])
n = set_series_1.shape[0]
for i in range(0, n):
    set_series_1[i] = set_series_1[i].union(set_series_2[i])
print(set_series_1)
>>> set_series_1
0 set([1, 2, 3, 4, 7])
1 set([a, b, g, f])
2 set([0.0, 2.3, 15.6, 5.4])
dtype: object
I've tried combining the Series into a data frame and using the apply function, but I get an error saying that sets are not supported as dataframe elements.
pir4
After testing several options, I finally came up with a good one... pir4 below.
Testing
def jed1(s1, s2):
    s = s1.copy()
    n = s1.shape[0]
    for i in range(n):
        s[i] = s2[i].union(s1[i])
    return s
def pir1(s1, s2):
    return pd.Series([item.union(s2[i]) for i, item in enumerate(s1.values)], s1.index)
def pir2(s1, s2):
    # Series.iteritems() was removed in pandas 2.0; items() replaces it
    return pd.Series([item.union(s2[i]) for i, item in s1.items()], s1.index)
def pir3(s1, s2):
    return s1.apply(list).add(s2.apply(list)).apply(set)
def pir4(s1, s2):
    return pd.Series([set.union(*z) for z in zip(s1, s2)], s1.index)
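Here is a quick check of pir4 on the question's data (no timings claimed). It avoids per-element indexing into the Series: zip walks both Series directly, and set.union is called once per pair, returning a new set so neither input is modified. The index=s1.index argument assumes the two Series are positionally aligned.

```python
import pandas as pd

def pir4(s1, s2):
    # Element-wise union of aligned Series of sets; keeps s1's index
    return pd.Series([set.union(*z) for z in zip(s1, s2)], index=s1.index)

set_series_1 = pd.Series([{1, 2, 3}, {'a', 'b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2, 4, 7}, {'a', 'f', 'g'}, {0.0, 15.6}])

result = pir4(set_series_1, set_series_2)
print(result.tolist())
```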

Appending result of function into a list

I'm calling a function within a function. I want every pass through the loop (4 iterations) to add the function's result to a list. In bad programmer terms, something like: for each iteration of the loop, run the function, add the result to the list, increment the counter, and go again.
Can you help here?
def genDigit():
    import random
    digit = (random.randint(0, 9))
    print(digit)
    return
def genNumber():
    numList = list
    for counter in range(0, 4):
        # from here on I need to finish the function
Any pointers would be greatly appreciated. I understand in English terms how I would go about finishing this.
Kind regards,
JJP
Instead of printing the digit in your function and then returning nothing, you should return the generated digit. You can then collect those in a list, and return that list. Finally, you print the result.
import random  # import once
def genDigit():
    digit = random.randint(0, 9)
    return digit  # return the digit
def genNumber():
    numList = list()  # add missing ()
    for counter in range(0, 4):
        numList.append(genDigit())  # add digits to list
    return numList  # return the list
print(genNumber())  # now print the entire list
Or shorter:
def genDigit():
    return random.randint(0, 9)
def genNumber():
    return [genDigit() for c in range(4)]
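If you don't need a separate genDigit function, the standard library can draw all the digits in one call. This is a swapped-in technique, not part of the answer above: random.choices (Python 3.6+) samples with replacement. The k parameter on genNumber is a hypothetical addition for flexibility.

```python
import random

def genNumber(k=4):
    # Draw k digits from 0-9, with replacement, in a single call
    return random.choices(range(10), k=k)

digits = genNumber()
print(digits)
```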
