This might be a rookie mistake, but I am trying to convert a quoted float array into an actual float array.
I am getting data like "[1.0,2.0,3.0,4.0,5.0,6.0]", which I am trying to convert to [1.0,2.0,3.0,4.0,5.0,6.0].
I tried np.asarray(quotedArray, dtype=np.float64), but it fails with the error message ValueError: could not convert string to float: "[1.0,2.0,3.0,4.0,5.0,6.0]".
You can use the json package, and its loads() function to do so:
>>> import json
>>> a = '[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> a
'[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> b = json.loads(a)
>>> b
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
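If you then need a NumPy array rather than a plain list, you can pass the parsed list straight to np.asarray (a minimal sketch continuing the session above):
>>> import numpy as np
>>> np.asarray(b, dtype=np.float64)
array([1., 2., 3., 4., 5., 6.])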
You can use eval(), even though it can sometimes give unwanted behaviour, so if you can, you should avoid quoted lists to begin with.
a = '[1.2, 2, 3.4, 5]'
a = eval(a) # a = [1.2, 2, 3.4, 5], type(a) = <class 'list'>
If you want to play around with eval(), it can also take variable names and function names as strings.
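If the input is untrusted, a safer alternative is ast.literal_eval from the standard library, which only evaluates Python literals (a minimal sketch):
>>> import ast
>>> ast.literal_eval('[1.0,2.0,3.0,4.0,5.0,6.0]')
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]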
One way to make a numpy array hashable is setting it to read-only. This has worked for me in the past. But when I use such a numpy array in a tuple, the whole tuple is no longer hashable, which I do not understand. Here is the sample code I put together to illustrate the problem:
import numpy as np
npArray = np.ones((1,1))
npArray.flags.writeable = False
print(npArray.flags.writeable)
keySet = (0, npArray)
print(keySet[1].flags.writeable)
myDict = {keySet : 1}
First I create a simple numpy array and set it to read-only. Then I add it to a tuple and check if it is still read-only (which it is).
When I want to use the tuple as key in a dictionary, I get the error TypeError: unhashable type: 'numpy.ndarray'.
Here is the output of my sample code:
False
False
Traceback (most recent call last):
File "test.py", line 10, in <module>
myDict = {keySet : 1}
TypeError: unhashable type: 'numpy.ndarray'
What can I do to make my tuple hashable and why does Python show this behavior in the first place?
You claim that
One way to make a numpy array hashable is setting it to read-only
but that's not actually true. Setting an array to read-only just makes it read-only. It doesn't make the array hashable, for multiple reasons.
The first reason is that an array with the writeable flag set to False is still mutable. You can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False. And even without touching the array itself, you could mutate its data through another view that has writeable=True:
>>> x = numpy.arange(5)
>>> y = x[:]  # y is a view sharing x's data
>>> x.flags.writeable = False
>>> x
array([0, 1, 2, 3, 4])
>>> y[0] = 5
>>> x
array([5, 1, 2, 3, 4])
Second, for hashability to be meaningful, objects must first be equatable: == must return a boolean and must be an equivalence relation. NumPy arrays don't do that. The purpose of hash values is to quickly locate equal objects, but when your objects don't even have a built-in notion of equality, there's not much point to providing hashes.
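For instance, == on two arrays returns an element-wise array of booleans, not a single boolean (a minimal illustration):
>>> numpy.arange(3) == numpy.arange(3)
array([ True,  True,  True])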
You're not going to get hashable tuples with arrays inside. You're not even going to get hashable arrays. The closest you can get is to put some other representation of the array's data in the tuple.
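For example, the array's raw bytes are hashable (a minimal sketch; note this ignores shape and dtype, so distinct arrays with identical bytes would collide):
>>> key = (0, npArray.tobytes())
>>> myDict = {key: 1}
>>> myDict[key]
1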
The fastest way to hash a numpy array is likely via its raw bytes, using tostring() (renamed tobytes() in newer NumPy versions).
In [11]: %timeit hash(y.tostring())
What you could do, rather than use a tuple, is define a class:
class KeySet(object):
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr
    def __hash__(self):
        # hash the array via its raw bytes alongside the other key part
        return hash((self.i, hash(self.arr.tostring())))
    def __eq__(self, other):
        # needed so two KeySets with equal contents match as dict keys
        return self.i == other.i and (self.arr == other.arr).all()
Now you can use it in a dict:
In [21]: ks = KeySet(0, npArray)
In [22]: myDict = {ks: 1}
In [23]: myDict[ks]
Out[23]: 1
The following code works well in Python 2, but after migrating to Python 3 it no longer works.
How do I change this code for Python 3?
for i, idx in enumerate(indices):
    user_id, item_id = idx
    feature_seq = np.array(map(lambda x: user_id, item_id))
    X[i, :len(item_id), :] = feature_seq  # ---- error here ----
error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'
Thank you.
In Python 3, map returns an iterator, not a list.
You can use numpy.fromiter to get an array from a map object, though it works only for 1-D data.
Example:
a = map(lambda x: x, range(10))
b = np.fromiter(a, dtype=int)
b
Output:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
For multidimensional arrays, refer to Reconcile np.fromiter and multidimensional arrays in Python
In Python 3, map is like a generator. You need to wrap it in list() to produce a list that np.array can use, e.g.
np.array(list(map(...)))
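Applied to the code in the question, that would look something like this (a sketch, assuming the intent of the original lambda is to repeat user_id once per item):
feature_seq = np.array(list(map(lambda x: user_id, item_id)))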
What's going on here? I'm sure there's a simple solution/fact I'm overlooking. I just can't understand why I can't save values/changes to a NumPy array in this manner.
>>> import numpy as np
>>> memoize = []
>>> parameters = np.array([1, 2]).astype(np.float64)
>>> memoize.append(parameters)
>>> parameters -= np.array([0.5, -0.5])
>>> memoize.append(parameters)
>>> memoize
[array([ 0.5, 2.5]), array([ 0.5, 2.5])]
I expected the answer to be
[array([ 1., 2.]), array([ 0.5, 2.5])]
Does it have anything to do with a list being mutable?
The issue you're having is that parameters is a mutable value, and you're appending multiple references to it while mutating it in place. If you rebound the variable to a new array each time, you wouldn't have an issue.
Try changing
parameters -= np.array([0.5, -0.5])
to
parameters = parameters - np.array([0.5, -0.5])
The original version makes an in-place modification to parameters. The second version makes a new array with copied data. This is probably a little slower, but it does what you want in this situation.
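Alternatively, if you want to keep the in-place update, you could append a copy so later mutations don't affect what is already stored (a minimal sketch):
memoize.append(parameters.copy())
parameters -= np.array([0.5, -0.5])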
What is the fastest way of converting a list of elements of type numpy.float64 to type float? I am currently using the straightforward for loop iteration in conjunction with float().
I came across this post: Converting numpy dtypes to native python types; however, my question isn't how to convert types in Python, but rather how best to convert an entire list from one type to another as quickly as possible (in this specific case, numpy.float64 to float). I was hoping for some secret Python machinery that I hadn't come across that could do it all at once :)
The tolist() method should do what you want. If you have a numpy array, just call tolist():
In [17]: a
Out[17]:
array([ 0. , 0.14285714, 0.28571429, 0.42857143, 0.57142857,
0.71428571, 0.85714286, 1. , 1.14285714, 1.28571429,
1.42857143, 1.57142857, 1.71428571, 1.85714286, 2. ])
In [18]: a.dtype
Out[18]: dtype('float64')
In [19]: b = a.tolist()
In [20]: b
Out[20]:
[0.0,
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857142,
0.8571428571428571,
1.0,
1.1428571428571428,
1.2857142857142856,
1.4285714285714284,
1.5714285714285714,
1.7142857142857142,
1.857142857142857,
2.0]
In [21]: type(b)
Out[21]: list
In [22]: type(b[0])
Out[22]: float
If, in fact, you really have a python list of numpy.float64 objects, then @Alexander's answer is great, or you could convert the list to an array and then use the tolist() method. E.g.
In [46]: c
Out[46]:
[0.0,
0.33333333333333331,
0.66666666666666663,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
In [47]: type(c)
Out[47]: list
In [48]: type(c[0])
Out[48]: numpy.float64
@Alexander's suggestion, a list comprehension:
In [49]: [float(v) for v in c]
Out[49]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
Or, convert to an array and then use the tolist() method.
In [50]: np.array(c).tolist()
Out[50]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
If you are concerned with the speed, here's a comparison. The input, x, is a python list of numpy.float64 objects:
In [8]: type(x)
Out[8]: list
In [9]: len(x)
Out[9]: 1000
In [10]: type(x[0])
Out[10]: numpy.float64
Timing for the list comprehension:
In [11]: %timeit list1 = [float(v) for v in x]
10000 loops, best of 3: 109 µs per loop
Timing for conversion to numpy array and then tolist():
In [12]: %timeit list2 = np.array(x).tolist()
10000 loops, best of 3: 70.5 µs per loop
So it is faster to convert the list to an array and then call tolist().
You could use a list comprehension:
floats = [float(np_float) for np_float in np_float_list]
Out of the possible solutions I came across (big thanks to Warren Weckesser and Alexander for pointing out the best approaches), I timed my current method against the one presented by Alexander, since I have a true list of numpy.float64 elements and wish to convert them to float speedily.
Two approaches covered: a list comprehension and basic for-loop iteration.
First here's the code:
import time
import numpy

list1 = []
for i in range(0, 1000):
    list1.append(numpy.float64(i))

list2 = []
t_init = time.time()
for num in list1:
    list2.append(float(num))
t_1 = time.time()
list2 = [float(np_float) for np_float in list1]
t_2 = time.time()

print("t1 run time: {}".format(t_1 - t_init))
print("t2 run time: {}".format(t_2 - t_1))
I ran four times to give a quick set of results:
>>> run 1
t1 run time: 0.000179290771484375
t2 run time: 0.0001533031463623047
Python 3.4.0
>>> run 2
t1 run time: 0.00018739700317382812
t2 run time: 0.0001518726348876953
Python 3.4.0
>>> run 3
t1 run time: 0.00017976760864257812
t2 run time: 0.0001513957977294922
Python 3.4.0
>>> run 4
t1 run time: 0.0002455711364746094
t2 run time: 0.00015997886657714844
Python 3.4.0
Clearly, of these two approaches, Python's list comprehension is the faster way to convert a true list of numpy.float64 to float.
I'm currently using np.loadtxt to load some mixed data into a structured numpy array. I do some calculations on a few of the columns to output later. For compatibility reasons I need to maintain a specific output format so I'd like to insert those columns at specific points and use np.savetxt to export the array in one shot.
A simple setup:
import numpy as np
x = np.zeros((2,),dtype=('i4,f4,a10'))
x[:] = [(1,2.,'Hello'),(2,3.,'World')]
newcol = ['abc','def']
For this example I'd like to make newcol the 2nd column. I'm very new to Python (coming from MATLAB). From my searching all I've been able to find so far are ways to append newcol to the end of x to make it the last column, or x to newcol to make it the first column. I also turned up np.insert but it doesn't seem to work on a structured array because it's technically a 1D array (from my understanding).
What's the most efficient way to accomplish this?
EDIT1:
I investigated np.savetxt a little further and it seems like it can't be used with a structured array, so I'm assuming I would need to loop through and write each row with f.write. I could specify each column explicitly (by field name) with that approach and not have to worry about the order in my structured array, but that doesn't seem like a very generic solution.
For the above example my desired output would be:
1, abc, 2.0, Hello
2, def, 3.0, World
This is a way to add a field to the array at the position you require:
from numpy import zeros, empty

def insert_dtype(x, position, new_dtype, new_column):
    if x.dtype.fields is None:
        raise ValueError("`x` must be a structured numpy array")
    new_desc = x.dtype.descr
    new_desc.insert(position, new_dtype)
    y = empty(x.shape, dtype=new_desc)
    for name in x.dtype.names:
        y[name] = x[name]
    y[new_dtype[0]] = new_column
    return y
x = zeros((2,), dtype='i4,f4,a10')
x[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
new_dt = ('my_alphabet', '|S3')
new_col = ['abc', 'def']
x = insert_dtype(x, 1, new_dt, new_col)
Now x looks like
array([(1, 'abc', 2.0, 'Hello'), (2, 'def', 3.0, 'World')],
dtype=[('f0', '<i4'), ('my_alphabet', 'S3'), ('f1', '<f4'), ('f2', 'S10')])
The solution is adapted from here.
To print the recarray to a file, you could use something like the following (note that rec2csv was removed from matplotlib.mlab in newer matplotlib versions):
from matplotlib.mlab import rec2csv
rec2csv(x,'foo.txt')
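If rec2csv isn't available, a minimal sketch with plain file writing achieves the desired output (assuming the structured array x from above; under Python 3 the string fields come back as bytes and need decoding):
# write each row of the structured array by hand
with open('foo.txt', 'w') as f:
    for row in x:
        fields = [v.decode() if isinstance(v, bytes) else str(v) for v in row]
        f.write(', '.join(fields) + '\n')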