Convert list of numpy.float64 to float in Python quickly - python-3.x

What is the fastest way of converting a list of elements of type numpy.float64 to type float? I am currently using the straightforward for loop iteration in conjunction with float().
I came across this post: Converting numpy dtypes to native python types; however, my question isn't about how to convert types in Python in general, but more specifically about how best to convert an entire list from one type to another as quickly as possible (i.e. in this specific case, from numpy.float64 to float). I was hoping for some secret Python machinery that I hadn't come across that could do it all at once :)

The tolist() method should do what you want. If you have a numpy array, just call tolist():
In [17]: a
Out[17]:
array([ 0.        ,  0.14285714,  0.28571429,  0.42857143,  0.57142857,
        0.71428571,  0.85714286,  1.        ,  1.14285714,  1.28571429,
        1.42857143,  1.57142857,  1.71428571,  1.85714286,  2.        ])
In [18]: a.dtype
Out[18]: dtype('float64')
In [19]: b = a.tolist()
In [20]: b
Out[20]:
[0.0,
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857142,
0.8571428571428571,
1.0,
1.1428571428571428,
1.2857142857142856,
1.4285714285714284,
1.5714285714285714,
1.7142857142857142,
1.857142857142857,
2.0]
In [21]: type(b)
Out[21]: list
In [22]: type(b[0])
Out[22]: float
If, in fact, you really have a Python list of numpy.float64 objects, then @Alexander's answer is great, or you could convert the list to an array and then use the tolist() method. E.g.
In [46]: c
Out[46]:
[0.0,
0.33333333333333331,
0.66666666666666663,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
In [47]: type(c)
Out[47]: list
In [48]: type(c[0])
Out[48]: numpy.float64
@Alexander's suggestion, a list comprehension:
In [49]: [float(v) for v in c]
Out[49]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
Or, convert to an array and then use the tolist() method.
In [50]: np.array(c).tolist()
Out[50]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
If you are concerned about speed, here's a comparison. The input, x, is a Python list of numpy.float64 objects:
In [8]: type(x)
Out[8]: list
In [9]: len(x)
Out[9]: 1000
In [10]: type(x[0])
Out[10]: numpy.float64
Timing for the list comprehension:
In [11]: %timeit list1 = [float(v) for v in x]
10000 loops, best of 3: 109 µs per loop
Timing for conversion to numpy array and then tolist():
In [12]: %timeit list2 = np.array(x).tolist()
10000 loops, best of 3: 70.5 µs per loop
So it is faster to convert the list to an array and then call tolist().

You could use a list comprehension:
floats = [float(np_float) for np_float in np_float_list]

So, out of the possible solutions I've come across (big thanks to Warren Weckesser and Alexander for pointing out all of the best possible approaches), I ran my current method and the one presented by Alexander to get a simple runtime comparison (these two choices follow from the fact that I have a true list of numpy.float64 elements and wish to convert them to float speedily):
Two approaches covered: list comprehension and basic for-loop iteration.
First, here's the code:
import time
import numpy

list1 = []
for i in range(0, 1000):
    list1.append(numpy.float64(i))

list2 = []
t_init = time.time()
for num in list1:
    list2.append(float(num))
t_1 = time.time()
list2 = [float(np_float) for np_float in list1]
t_2 = time.time()

print("t1 run time: {}".format(t_1 - t_init))
print("t2 run time: {}".format(t_2 - t_1))
I ran it four times to get a quick set of results:
>>> run 1
t1 run time: 0.000179290771484375
t2 run time: 0.0001533031463623047
Python 3.4.0
>>> run 2
t1 run time: 0.00018739700317382812
t2 run time: 0.0001518726348876953
Python 3.4.0
>>> run 3
t1 run time: 0.00017976760864257812
t2 run time: 0.0001513957977294922
Python 3.4.0
>>> run 4
t1 run time: 0.0002455711364746094
t2 run time: 0.00015997886657714844
Python 3.4.0
Clearly, of the two approaches compared here, a list comprehension is the faster way to convert a true list of numpy.float64 to float (though, as shown above, converting via np.array(x).tolist() can be faster still).
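For reference, the same comparison can also be run with the timeit module, which repeats each statement many times and is less noisy than single time.time() deltas. A minimal sketch (absolute numbers will vary by machine):
import timeit

setup = "import numpy; x = [numpy.float64(i) for i in range(1000)]"
print(timeit.timeit("[float(v) for v in x]", setup=setup, number=10000))    # list comprehension
print(timeit.timeit("numpy.array(x).tolist()", setup=setup, number=10000))  # via numpy array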

Related

Spurious exponents in sympy expressions

I build sp expressions (import sympy as sp) by multiplying / dividing others.
Sometimes I find results like (expr1)
1.0*K1**1.11e-16*K2**1.11e-16/K3**1.11e-16*K4**0.8
while the correct expression is (expr2)
1.0*K4**0.8
My first question is what are possible ways of converting expr1 into expr2?
I can set an arbitrary threshold for the exponents of, say, 1e-10 to consider them spurious.
I can concoct a recursive checking of the expression, but perhaps there is something more concise/pythonic.
For the time being I can assume all powers are only at the first mul/div level, there are no additions/subtractions involved.
My second question is what are possible reasons for these spurious powers to appear?
There are no calls to sp.N involved, for example.
The way I build my expressions is too intricate to post here, so I welcome suggestions as to possible causes.
I guess this is well within reach of knowledgeable fellows.
In the meantime, I will try conceiving reasonable and representative MCVEs.
Related (some, marginally):
Evaluating coefficients but not exponents in sympy
How to isolate the exponent in Sympy (how to `match` large formulas in Sympy)
Sympy - Find all expressions of a certain level
As noted in the comments, this happens because you are using floats rather than exact rational numbers:
In [16]: from sympy import *
In [17]: x = symbols('x')
In [18]: x**0.11/x**0.1/x**0.01
Out[18]:
-5.20417042793042e-18
x
The problem here is that in (binary) floating point, 0.11, for example, is not exactly equal to 0.1 + 0.01; in fact, none of these floats is exactly equal to the decimal number it is intended to represent, because those numbers cannot be represented exactly in binary.
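This is easy to verify in plain Python; the residual below (evaluated left to right) is exactly the spurious exponent SymPy printed above:
>>> 0.11 - 0.1 - 0.01
-5.204170427930421e-18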
The best approach in general, in cases where you did not mean for a number to be approximate, is to use exact rational numbers. There are several ways of creating exact rational numbers:
In [19]: x**Rational('0.11')/x**Rational('0.1')/x**Rational('0.01')
Out[19]: 1
In [20]: x**Rational(11,100)/x**Rational(10,100)/x**Rational(1,100)
Out[20]: 1
In [22]: x**(S(11)/100)/x**(S(10)/100)/x**(S(1)/100)
Out[22]: 1
Another approach also suggested in the comments is to use nsimplify:
In [23]: x**0.11
Out[23]:
0.11
x
In [24]: nsimplify(x**0.11)
Out[24]:
11
───
100
x
In [25]: x**Rational(0.11)
Out[25]:
7926335344172073
─────────────────
72057594037927936
x
Observe that the exact value of the float 0.11 is not in fact equal to the mathematical number 0.11. What nsimplify does is try to guess what number you really intended, so it performs an inexact conversion from float to Rational. This is useful as a convenience, but it cannot be expected to be reliable in all cases, so it is better to use rational numbers in the first place and keep the calculations exact throughout.
Another way you might end up with these floats is by calling evalf:
In [35]: e = sqrt(x)*pi
In [36]: e
Out[36]: π⋅√x
In [37]: e.evalf()
Out[37]:
0.5
3.14159265358979⋅x
In [38]: nfloat(e)
Out[38]: 3.14159265358979⋅√x
Here the nfloat function can be used to avoid calling evalf on the exponent. This is useful because having floats in exponents of symbols is particularly problematic.
Another approach is to use the functions for making roots, e.g.:
In [39]: x**0.5
Out[39]:
0.5
x
In [40]: x**(1/3)
Out[40]:
0.333333333333333
x
In [41]: sqrt(x)
Out[41]: √x
In [42]: cbrt(x)
Out[42]:
3 ___
╲╱ x
In [43]: root(x, 4)
Out[43]:
4 ___
╲╱ x
Finally, if you have an expression like the one you showed, it is not possible to use nsimplify at that stage, because it will not guess that the small numbers are supposed to be zero. You can, however, manually replace the small floats:
In [49]: K1, K2, K3, K4 = symbols('K1:5')
In [50]: e = 1.0*K1**1.11e-16*K2**1.11e-16/K3**1.11e-16*K4**0.8
In [51]: e
Out[51]:
1.11e-16 1.11e-16 -1.11e-16 0.8
1.0⋅K₁ ⋅K₂ ⋅K₃ ⋅K₄
In [52]: nsimplify(e)
Out[52]:
111 111
─────────────────── ───────────────────
1000000000000000000 1000000000000000000 4/5
K₁ ⋅K₂ ⋅K₄
─────────────────────────────────────────────────
111
───────────────────
1000000000000000000
K₃
In [53]: e.replace(lambda t: isinstance(t, Float), lambda f: f if abs(f) > 1e-10 else 0)
Out[53]:
0.8
1.0⋅K₄
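That one-liner can be wrapped in a small helper if you need it repeatedly. A minimal sketch (chop_small is a hypothetical name, not a SymPy function):
from sympy import Float

def chop_small(expr, tol=1e-10):
    # Replace every Float whose magnitude is below tol with zero.
    return expr.replace(lambda t: isinstance(t, Float) and abs(t) < tol,
                        lambda f: 0)
Applied to e from above, this gives 1.0*K4**0.8, the same result as In [53].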

Using Pandas " | " operator between two boolean Series objects behaving strangely

I have two large pandas Series.
In [32]: mask.shape
Out[32]: (13919455,)
In [33]: t.shape
Out[33]: (13919455,)
Both are boolean Series; mask contains only False, while t contains a few True values:
In [28]: sum(mask)
Out[28]: 0
In [29]: sum(t)
Out[29]: 7724
I would expect that when I apply the pandas OR operator, | , I would get a sum of 7724 and that the operator is commutative.
However, I get the following result:
In [44]: sum(mask|t)
Out[44]: 7565
In [45]: sum(t | mask)
Out[45]: 7724
Is this a bug?
I just figured this out: it's a "feature" of how pandas performs OR operations, aligning the two operands on their indexes first.
It turned out that I had previously dropped some rows from t, so while it was the same length as the other variable, its index extended slightly further.
After resetting the index to the default with Series.reset_index(drop=True), I get the results initially expected.
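A small sketch of the pitfall, with made-up toy data: the two Series have the same length but different index labels, so | aligns on the labels rather than on position:
import pandas as pd

mask = pd.Series([False] * 5)                      # index 0..4
t = pd.Series([True, False, True, False,
               True, False]).drop(0)               # index 1..5 after the drop

print(len(mask), len(t))      # 5 5 -- same length, different labels
print(len(mask | t))          # 6 -- the OR is computed on the union of the indexes

t = t.reset_index(drop=True)  # back to a default RangeIndex 0..4
print((mask | t).sum())       # 2 -- now equal to t.sum(), as expected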

Numpy , Quoted float array to float array

This might be a rookie mistake, but I am currently trying to convert a quoted float array into an actual float array.
I am getting data like "[1.0,2.0,3.0,4.0,5.0,6.0]", which I am trying to convert to [1.0, 2.0, 3.0, 4.0, 5.0, 6.0].
I tried np.asarray(quotedArray, dtype=np.float64), but it fails with the error message ValueError: could not convert string to float: "[1.0,2.0,3.0,4.0,5.0,6.0]"
You can use the json package, and its loads() function to do so:
>>> import json
>>> a = '[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> a
'[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> b = json.loads(a)
>>> b
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
You can also use eval(), though this can sometimes give unwanted behaviour, so if you can, you should avoid quoted lists to begin with.
a = '[1.2, 2, 3.4, 5]'
a = eval(a) # a = [1.2, 2, 3.4, 5], type(a) = <class 'list'>
If you want to play around with eval() it can be used to take in variable names and function names as strings as well.
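If you can't avoid the quoted lists but want to sidestep eval()'s security risks, the standard library's ast.literal_eval accepts only Python literals, and the result can then be handed to NumPy to get the float array the question asked for. A minimal sketch:
import ast
import numpy as np

quoted = "[1.0,2.0,3.0,4.0,5.0,6.0]"
arr = np.asarray(ast.literal_eval(quoted), dtype=np.float64)
print(arr)  # [1. 2. 3. 4. 5. 6.]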

read the scipy.beta distribution parameters from a scipy.stats._continuous_distns.beta_gen object

Having an instance of the beta object, how do I get back the parameters a and b?
There are attributes a and b, but it seems they mean something other than what I expected:
>>> import scipy
>>> scipy.__version__
'0.19.1'
>>> from scipy import stats
>>> my_beta = stats.beta(a=1, b=5)
>>> my_beta.a, my_beta.b
(0.0, 1.0)
Is there a way to get the parameters of the distribution? I could always fit a huge rvs sample but that seems silly :)
When you create a "frozen" distribution with a call such as my_beta = stats.beta(a=1, b=5), the positional and keyword arguments are saved as the attributes args and kwds, respectively, on the returned object. So in your case, you can access those values in the dictionary my_beta.kwds:
In [10]: from scipy import stats
In [11]: my_beta = stats.beta(a=1, b=5)
In [12]: my_beta.kwds
Out[12]: {'a': 1, 'b': 5}
The attributes my_beta.a and my_beta.b are, as you guessed, something different. They define the end points of the support of the probability distribution:
In [13]: my_beta.a
Out[13]: 0.0
In [14]: my_beta.b
Out[14]: 1.0
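Positional arguments end up in args the same way, so if the distribution had been frozen with stats.beta(1, 5), the shape parameters would be found there instead (continuing the same session):
In [15]: my_beta2 = stats.beta(1, 5)
In [16]: my_beta2.args
Out[16]: (1, 5)
In [17]: my_beta2.kwds
Out[17]: {}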

argsort on numpy.array as generator

I am new to Python, so maybe I am doing something wrong. Let me first explain what I want.
I have a huge 1-d numpy.array and I need to know the indices of its n smallest values. I need them for later computation. Of course I can just do something like ind = numpy.argsort(hugearray)[:n].
The problem is that beforehand I don't know how many indices I need; my computation is iterative and fetches indices one by one until there are enough for the computation.
Another thing is that I want a lazy argsort, to avoid creating an entire new array of argsorted values and to prevent unnecessary searching, so I thought of a generator. But truly I don't know how to do that with a numpy.array.
UPD: following hpaulj's answer, I tried to create a generator:
def gargsort(arr):
    arr = arr.copy()
    for i in range(len(arr)):
        k = np.argmin(arr)
        arr[k] = np.iinfo(arr.dtype).max
        yield k
Maybe it's possible to do it better?
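(A caveat with the sketch above: np.iinfo only covers integer dtypes. For float arrays the same idea works with np.inf, or np.finfo(arr.dtype).max, as the sentinel; a hypothetical variant:)
import numpy as np

def gargsort_float(arr):
    arr = arr.astype(float)  # astype copies, so the caller's array is untouched
    for _ in range(len(arr)):
        k = np.argmin(arr)
        arr[k] = np.inf      # mask out the minimum found so far
        yield k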
Here's an iterative approach that appears to be faster than argsort, provided n isn't too large:
In [135]: arr = np.arange(200000)
In [136]: np.random.shuffle(arr)
In [137]: def foo(arr):
     ...:     arr = arr.copy()
     ...:     alist = []
     ...:     for i in range(10):
     ...:         k = np.argmin(arr)
     ...:         alist.append(k)
     ...:         arr[k] = 200000
     ...:     return alist
     ...:
In [138]: foo(arr)
Out[138]: [176806, 180397, 139992, 151809, 59931, 59866, 130026, 191357, 84166, 130359]
In [139]: np.argsort(arr)[:10]
Out[139]:
array([176806, 180397, 139992, 151809, 59931, 59866, 130026, 191357,
84166, 130359], dtype=int32)
In [140]: timeit np.argsort(arr)[:10]
100 loops, best of 3: 15.8 ms per loop
In [141]: timeit foo(arr)
1000 loops, best of 3: 1.69 ms per loop
(I'll comment later if needed).
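Not covered in the answer above, but worth knowing if an upper bound on n is available in advance: np.argpartition finds the n smallest indices without a full sort, and only those n then need ordering. A minimal sketch:
import numpy as np

def smallest_indices(arr, n):
    # Partition so the n smallest elements come first (in arbitrary order),
    # then sort just those n to get them in ascending order.
    idx = np.argpartition(arr, n)[:n]
    return idx[np.argsort(arr[idx])]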
