Why doesn't a list append reflect variable changes? - python-3.x

What's going on here? I'm sure there's a simple solution/fact I'm overlooking. I just can't understand why I can't save values/changes to a NumPy array in this manner.
>>> import numpy as np
>>> memoize = []
>>> parameters = np.array([1, 2]).astype(np.float64)
>>> memoize.append(parameters)
>>> parameters -= np.array([0.5, -0.5])
>>> memoize.append(parameters)
>>> memoize
[array([ 0.5, 2.5]), array([ 0.5, 2.5])]
I expected the answer to be
[array([ 1., 2.]), array([ 0.5, 2.5])]
Does it have anything to do with a list being mutable ?

The issue you're having is that parameters is a mutable value, and you're appending multiple references to it while mutating it in place. If you rebound the variable to a new array each time, you wouldn't have an issue.
Try changing
parameters -= np.array([0.5, -0.5])
to
parameters = parameters - np.array([0.5, -0.5])
The original version makes an in-place modification to parameters. The second version makes a new array with copied data. This is probably a little slower, but it does what you want in this situation.

Related

Numpy , Quoted float array to float array

This might be a rookie mistake
but currently, I am trying to convert a float array with is quoted to an actual float array
I am getting data like "[1.0,2.0,3.0,4.0,5.0,6.0]" which I am trying to convert to [1.0,2.0,3.0,4.0,5.0]
I tried this np.asarray(quotedArray,dtype=np.float64)
but its failing with error message ValueError: could not convert string to float: "[1.0,2.0,3.0,4.0,5.0,6.0]"
You can use the json package, and its loads() function to do so:
>>> import json
>>> a = '[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> a
'[1.0,2.0,3.0,4.0,5.0,6.0]'
>>> b = json.loads(a)
>>> b
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
You can use eval() even though this can sometimes give unwanted behaviour, so if you can, you should avoid quoted lists to begin with.
a = '[1.2, 2, 3.4, 5]'
a = eval(a) # a = [1.2, 2, 3.4, 5], type(a) = <class 'list'>
If you want to play around with eval() it can be used to take in variable names and function names as strings as well.

Why is taking a slice of a list which is assigned to another list not changing the original?

I have a class that is a representation of a mathematical tensor. The tensor in the class, is stored as a single list, not lists inside another list. That means [[1, 2, 3], [4, 5, 6]] would be stored as [1, 2, 3, 4, 5, 6].
I've made a __setitem__() function and a function to handle taking slices of this tensor while it's in single list format. For example slice(1, None, None) would become slice(3, None, None) for the list mentioned above. However when I assign this slice a new value, the original tensor isn't updated.
Here is what the simplified code looks like
class Tensor:
def __init__(self, tensor):
self.tensor = tensor # Here I would flatten it, but for now imagine it's already flattened.
def __setitem__(self, slices, value):
slices = [slices]
temp_tensor = self.tensor # any changes to temp_tensor should also change self.tensor.
for s in slices: # Here I would call self.slices_to_index(), but this is to keep the code simple.
temp_tensor = temp_tensor[slice]
temp_tensor = value # In my mind, this should have also changed self.tensor, but it hasn't.
Maybe i'm just being stupid and can't see why this isn't working. Maybe my actual questions isn't just ' why doesn't this work?' but also 'is there a better way to do this?'. Thanks for any help you can give me.
NOTES:
Each 'dimension' of the list must have the same shape, so [[1, 2, 3], [4, 5]] isn't allowed.
This code is massively simplified as there are many other helper functions and stuff like that.
in __init__() I would flatten the list but as I just said to keep things simple I left that out, along with self.slice_to_index().
You should not think of python variables as in c++ or java. Think of them as labels you place on values. Check this example:
>>> l = []
>>> l.append
<built-in method append of list object at 0x7fbb0d40cf88>
>>> l.append(10)
>>> l
[10]
>>> ll = l
>>> ll.append(10)
>>> l
[10, 10]
>>> ll
[10, 10]
>>> ll = ["foo"]
>>> l
[10, 10]
As you can see, ll variable first points to the same l list but later we just make it point to another list. Modifying the later ll won't modify the original list pointed by l.
So, in your case if you want self.tensor to point to a new value, just do it:
class Tensor:
def __init__(self, tensor):
self.tensor = tensor # Here I would flatten it, but for now imagine it's already flattened.
def __setitem__(self, slices, value):
slices = [slices]
temp_tensor = self.tensor # any changes to the list pointed by temp_tensor will be reflected in self.tensor since it is the same list
for s in slices:
temp_tensor = temp_tensor[slice]
self.tensor = value

Array entry used in function turns from nan to 0 numpy python

I made a simple function that produces a weighted average of several time series using supplied weights. It is designed to handle missing values (NaNs), which is why I am not using numpy's supplied average function.
However, when I feed it my array containing missing values, the array has its nan values replaced by 0s! I would have assumed that since I am changing the name of the array and it is not a global variable this should not happen. I want my X array to retain its original form including the nan value
I am a relative novice using python (obviously).
Example:
X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]]) # 3 time series to be weighted together
weights = np.array([[1,1,1]]) # simple example with weights for each series as 1
def WeightedMeanNaN(Tseries, weights):
## calculates weighted mean
N_Tseries = Tseries
Weights = np.repeat(weights, len(N_Tseries), axis=0) # make a vector of weights matching size of time series
loc = np.where(np.isnan(N_Tseries)) # get location of nans
Weights[loc] = 0
N_Tseries[loc] = 0
Weights = Weights/Weights.sum(axis=1)[:,None] # normalize each row so that weights sum to 1
WeightedAve = np.multiply(N_Tseries,Weights)
WeightedAve = WeightedAve.sum(axis=1)
return WeightedAve
WeightedMeanNaN(Tseries = X, weights = weights)
Out[161]: array([2. , 2. , 1.5])
In:X
Out:
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 0.]]) # no longer nan!! ```
Where you call
loc = np.where(np.isnan(N_Tseries)) # get location of nans
Weights[loc] = 0
N_Tseries[loc] = 0
You remove all NaNs and set them to zeros.
To reverse this you could iterate over the array and replace zeros with NaNs.
However, this would also set regular zeros to Nans.
So it turns out this is a mistake caused by me being used to working in Matlab. Python treats arguments supplied to the function as pointers to the original object. In contrast, Matlab creates copies that are discarded when the function ends.
I solved my problem by adding ".copy()" when assigning variables in the function, so that the first line in the function above becomes:
N_Tseries = Tseries.copy().
However, one thing that puzzles me is that some people have suggested that using Tseries[:] should also create a copy of Tseries rather than a pointer to the original variable. This did not work for me though.
I found this answer useful:
Python function not supposed to change a global variable

read the scipy.beta distribution parameters from a scipy.stats._continuous_distns.beta_gen object

Having an instance of the beta object, how do I get back the parameters a and b?
There are properties a and b, but it seems they mean something else as I expected:
>>> import scipy
>>> scipy.__version__
'0.19.1'
>>> from scipy import stats
>>> my_beta = stats.beta(a=1, b=5)
>>> my_beta.a, my_beta.b
(0.0, 1.0)
Is there a way to get the parameters of the distribution? I could always fit a huge rvs sample but that seems silly :)
When you create a "frozen" distribution with a call such as my_beta = stats.beta(a=1, b=5), the positional and keyword arguments are saved as the attributes args and kwds, respectively, on the returned object. So in your case, you can access those values in the dictionary my_beta.kwds:
In [10]: from scipy import stats
In [11]: my_beta = stats.beta(a=1, b=5)
In [12]: my_beta.kwds
Out[12]: {'a': 1, 'b': 5}
The attributes my_beta.a and my_beta.b are, as you guessed, something different. They define the end points of the support of the probability distribution:
In [13]: my_beta.a
Out[13]: 0.0
In [14]: my_beta.b
Out[14]: 1.0

How to make a tuple including a numpy array hashable?

One way to make a numpy array hashable is setting it to read-only. This has worked for me in the past. But when I use such a numpy array in a tuple, the whole tuple is no longer hashable, which I do not understand. Here is the sample code I put together to illustrate the problem:
import numpy as np
npArray = np.ones((1,1))
npArray.flags.writeable = False
print(npArray.flags.writeable)
keySet = (0, npArray)
print(keySet[1].flags.writeable)
myDict = {keySet : 1}
First I create a simple numpy array and set it to read-only. Then I add it to a tuple and check if it is still read-only (which it is).
When I want to use the tuple as key in a dictionary, I get the error TypeError: unhashable type: 'numpy.ndarray'.
Here is the output of my sample code:
False
False
Traceback (most recent call last):
File "test.py", line 10, in <module>
myDict = {keySet : 1}
TypeError: unhashable type: 'numpy.ndarray'
What can I do to make my tuple hashable and why does Python show this behavior in the first place?
You claim that
One way to make a numpy array hashable is setting it to read-only
but that's not actually true. Setting an array to read-only just makes it read-only. It doesn't make the array hashable, for multiple reasons.
The first reason is that an array with the writeable flag set to False is still mutable. First, you can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False. Second, even without touching the array itself, you could mutate its data through another view that has writeable=True.
>>> x = numpy.arange(5)
>>> y = x[:]
>>> x.flags.writeable = False
>>> x
array([0, 1, 2, 3, 4])
>>> y[0] = 5
>>> x
array([5, 1, 2, 3, 4])
Second, for hashability to be meaningful, objects must first be equatable - == must return a boolean, and must be an equivalence relation. NumPy arrays don't do that. The purpose of hash values is to quickly locate equal objects, but when your objects don't even have a built-in notion of equality, there's not much point to providing hashes.
You're not going to get hashable tuples with arrays inside. You're not even going to get hashable arrays. The closest you can get is to put some other representation of the array's data in the tuple.
The fastest way to hash a numpy array is likely tostring.
In [11]: %timeit hash(y.tostring())
What you could do is rather than use a tuple define a class:
class KeySet(object):
def __init__(self, i, arr):
self.i = i
self.arr = arr
def __hash__(self):
return hash((self.i, hash(self.arr.tostring())))
Now you can use it in a dict:
In [21]: ks = KeySet(0, npArray)
In [22]: myDict = {ks: 1}
In [23]: myDict[ks]
Out[23]: 1

Resources