Difference of two IntVectors in Rpy - rpy2

a and b are both rpy2 IntVectors:
<IntVector - Python:0x10676dfc8 / R:0x7fc714d64948>
[ 81, NA_integer_, NA_integer_, ..., 120, 46, NA_integer_]
How can I calculate the b - a difference? I want the result as an IntVector.

Try using the R operator attribute .ro:
In [1]: from rpy2 import robjects
In [2]: x = robjects.IntVector(range(10))
In [3]: y = robjects.IntVector(range(10))
In [4]: x.ro-y
Out[4]:
<IntVector - Python:0x1067d3830 / R:0x102d6ef20>
[ 0, 0, 0, ..., 0, 0, 0]
In [5]: x.ro+y
Out[5]:
<IntVector - Python:0x1067d3cf8 / R:0x102d6eec8>
[ 0, 2, 4, ..., 14, 16, 18]

from rpy2.robjects import r

subtract = r('''function(x, y) x - y''')
subtract(b, a)
The good thing about this solution is that it can handle not only IntVectors but any R type.
The bad thing is that passing commands as strings to the R interpreter is ugly.


Numpy choice based on another array

I want to select some elements from an array main_array at the indexes where a condition on another array holds. For example, y should contain [14, 15, 16] in arbitrary order:
import numpy as np
main_array = np.array([11,12,13,14,15,16])
selector = np.array([0,1,2,3,3,3])
x = np.random.choice(main_array, 3, replace=False) # This works
y = np.random.choice(main_array, 3, replace=False, p=np.where(selector>2)) # This fails
However, I get ValueError: 'p' must be 1-dimensional
What is the correct way to limit selection to indexes based on another array?
One way to do it is in two parts: first select the allowed elements, then sample from them:
import numpy as np
main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])
x = np.random.choice(main_array, 3, replace=False)
z = main_array[selector > 2]
y = np.random.choice(z, len(z), replace=False)
print(f"x={x}")
print(f"z={z}")
print(f"y={y}")
The output is
x=[16 14 13]
z=[14 15 16]
y=[16 15 14]
Another way is to set the probabilities to zero where the mask doesn't apply:
import numpy as np
main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])
x = np.random.choice(main_array, 3, replace=False)
p = 1 * (selector > 2)
y = np.random.choice(main_array, 3, replace=False, p=p / np.sum(p))
print(y)
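A third variant (a sketch, not from the answers above): select the allowed indexes first with np.flatnonzero and sample among those, which also keeps the original positions available if you need them:

```python
import numpy as np

rng = np.random.default_rng(0)
main_array = np.array([11, 12, 13, 14, 15, 16])
selector = np.array([0, 1, 2, 3, 3, 3])

# indexes where the condition holds
idx = np.flatnonzero(selector > 2)
# sample positions without replacement, then look the values up
y = main_array[rng.choice(idx, 3, replace=False)]
print(sorted(y))  # [14, 15, 16] (all three qualify, so only the order varies)
```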

Array conforming shape of a given variable

I need to do some calculations with a NetCDF file.
So I have two variables with following dimensions and sizes:
A [time | 1] x [lev | 12] x [lat | 84] x [lon | 228]
B [lev | 12]
What I need is to produce a new array, C, that is shaped as (1,12,84,228) where B contents are propagated to all dimensions of A.
Usually, this is easily done in NCL with the conform function. I am not sure what the equivalent is in Python.
Thank you.
The numpy.broadcast_to function can do something like this, although in this case it requires B to have a couple of extra trailing size-1 dimensions added to it to satisfy the numpy broadcasting rules:
>>> import numpy
>>> B = numpy.arange(12)
>>> B
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> B = B.reshape(12, 1, 1)
>>> B.shape
(12, 1, 1)
>>> C = numpy.broadcast_to(B, (1, 12, 84, 228))
>>> C.shape
(1, 12, 84, 228)
>>> C[0, :, 0, 0]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> C[-1, :, -1, -1]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

How to find the index position of items in a pandas list which satisfy a certain condition?

How can I find the index position of items in a list which satisfy a certain condition?
Like suppose, I have a list like:
myList = [0, 100, 335, 240, 300, 450, 80, 500, 200]
And the condition is to find out the position of all elements within myList which lie between 0 and 300 (both inclusive).
I am expecting the output as:
output = [0, 1, 3, 4, 6, 8]
How can I do this in pandas?
Also, how do I find the index of the maximum element within the subset of elements which satisfy the condition? In the above case, out of the elements which satisfy the condition, 300 is the maximum and its index is 4, so I need to retrieve its index.
I have been trying many ways but not getting the desired result. Please help, I am new to the programming world.
You can try this,
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [0, 100, 335, 240, 300, 450, 80, 500, 200]})
>>> index = list(df[(df.a >= 0) & (df.a <= 300)].index)
>>> df.loc[index].idxmax()
a 4
dtype: int64
or using the list,
>>> l = [0, 100, 335, 240, 300, 450, 80, 500, 200]
>>> index = [(i, v) for i, v in enumerate(l) if v >= 0 and v <= 300]
>>> [t[0] for t in index]
[0, 1, 3, 4, 6, 8]
>>> sorted(index, key=lambda x: x[1])[-1][0]
4
As Grzegorz Skibinski suggests, numpy can take over much of this computation:
>>> import numpy as np
>>> l = [0, 100, 335, 240, 300, 450, 80, 500, 200]
>>> index = np.array([[i, v] for i, v in enumerate(l) if v >= 0 and v <= 300])
>>> index[:,0]
array([0, 1, 3, 4, 6, 8])
>>> index[index.argmax(0)[1]][0]
4
You can use numpy for that purpose:
import numpy as np
myList = np.array([0, 100, 335, 240, 300, 450, 80, 500, 200])
res = np.where((myList >= 0) & (myList <= 300))[0]
print(res)
# and to get the maximum:
res2 = res[myList[res].argmax()]
print(res2)
Output:
[0 1 3 4 6 8]
4
This is between in pandas:
import pandas as pd
myList = [0, 100, 335, 240, 300, 450, 80, 500, 200]
s = pd.Series(myList)
s.index[s.between(0, 300)]
Output:
Int64Index([0, 1, 3, 4, 6, 8], dtype='int64')
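The same between mask also answers the follow-up about the index of the maximum qualifying element, since the filtered Series keeps its original index labels; a small sketch:

```python
import pandas as pd

myList = [0, 100, 335, 240, 300, 450, 80, 500, 200]
s = pd.Series(myList)
mask = s.between(0, 300)
# idxmax returns the original index label of the largest value
print(s[mask].idxmax())  # 4
```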

What is this operation in numpy called?

I've been going over the numpy docs looking for a specific operation. The words I would use for this are "overlay" or "mask" but the numpy concepts of those words don't seem to match mine.
I want to take two arrays, one dense and one sparse and combine them thusly:
[ 1, 2, 3, 4, 5 ]
X [ N, N, 10, N, 12 ]
= [ 1, 2, 10, 4, 12 ]
where X is the operation and N is None, Null, -1, or some other sentinel value.
How is this achieved in numpy/python3?
You can use np.where:
import numpy as np

# pick a special value
N = -1
dns = [1, 2, 3, 4, 5]
sprs = [N, N, 10, N, 12]
# this is important, otherwise the comparison below
# is not done element by element
sprs = np.array(sprs)
# tada!
np.where(sprs == N, dns, sprs)
# array([ 1,  2, 10,  4, 12])
When called with three arguments m, a, b, np.where "mixes" a and b, taking elements from a where m is True and from b where it is False.
You can "fill" the masked array with np.ma.filled(..) [numpy-doc], for example:
>>> a
array([1, 2, 3, 4, 5])
>>> b
masked_array(data=[--, --, 10, --, 12],
mask=[ True, True, False, True, False],
fill_value=999999)
>>> b.filled(a)
array([ 1, 2, 10, 4, 12])
>>> np.ma.filled(b, a)
array([ 1, 2, 10, 4, 12])
Here we thus fill the masked values from b with the corresponding values of a.
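For completeness, here is one way (a sketch, with -1 as an assumed sentinel) to build such a masked array from a plain array before filling it:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
sparse = np.array([-1, -1, 10, -1, 12])  # -1 marks "missing"
# mask every occurrence of the sentinel
b = np.ma.masked_equal(sparse, -1)
# masked slots are taken element-wise from a, the rest from b
print(b.filled(a))  # array equal to [1, 2, 10, 4, 12]
```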

numpy apply_along_axis vectorisation

I am trying to implement a function that takes each column of a numpy 2d array and returns a scalar result of certain calculations. My current code looks like the following:
import numpy as np

img = np.array([
    [ 0,  5, 70,  0,  0,  0],
    [10, 50,  4,  4,  2,  0],
    [50, 10,  1, 42, 40,  1],
    [10,  0,  0,  6, 85, 64],
    [ 0,  0,  0,  1,  2, 90]])

def get_y(stride):
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

np.apply_along_axis(get_y, 0, img)
>> array([ 2. , 1. , 0. , 2. , 2.5, 3.5])
It works as expected; however, performance isn't great: the real dataset has ~2k rows and ~20-50 columns per frame, arriving 60 times a second.
Is there a way to speed-up the process, perhaps by not using np.apply_along_axis function?
Here's one vectorized approach that sets the zeros to NaN, which lets us use np.nanmax and np.nanstd to compute the max and std values while ignoring the zeros, like so -
imgn = np.where(img==0, np.nan, img)
mx = np.nanmax(imgn,0) # np.max(img,0) if all are positive numbers
st = np.nanstd(imgn,0)
mask = img > mx - 1.5*st
out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
Runtime test -
In [94]: img = np.random.randint(-100,100,(2000,50))
In [95]: %timeit np.apply_along_axis(get_y, 0, img)
100 loops, best of 3: 4.36 ms per loop
In [96]: %%timeit
...: imgn = np.where(img==0, np.nan, img)
...: mx = np.nanmax(imgn,0)
...: st = np.nanstd(imgn,0)
...: mask = img > mx - 1.5*st
...: out = np.arange(mask.shape[0]).dot(mask)/mask.sum(0)
1000 loops, best of 3: 1.33 ms per loop
Thus, we are seeing a 3x+ speedup.
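One caveat worth noting: get_y filters with stride > 0, while the NaN trick only replaces exact zeros, so the two agree as long as img has no negative values (true for the original img, though not for the randint(-100,100) timing data). A quick sanity check on the original img:

```python
import numpy as np

img = np.array([
    [ 0,  5, 70,  0,  0,  0],
    [10, 50,  4,  4,  2,  0],
    [50, 10,  1, 42, 40,  1],
    [10,  0,  0,  6, 85, 64],
    [ 0,  0,  0,  1,  2, 90]])

def get_y(stride):
    stride_vals = stride[stride > 0]
    pix_thresh = stride_vals.max() - 1.5 * stride_vals.std()
    return np.argwhere(stride > pix_thresh).mean()

expected = np.apply_along_axis(get_y, 0, img)

# vectorized version: zeros -> NaN, per-column max/std, then mean index
imgn = np.where(img == 0, np.nan, img)
mx = np.nanmax(imgn, 0)
st = np.nanstd(imgn, 0)
mask = img > mx - 1.5 * st
out = np.arange(mask.shape[0]).dot(mask) / mask.sum(0)

print(np.allclose(out, expected))  # True
```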
