numpy matrix not functioning as intended - python-3.x

This is my code:
import random
import numpy as np
import math
populacao = 5
x_min = -10
x_max = 10
nbin = 4
def fitness(xy, populacao, resultado):
    fit = np.matrix(resultado)
    xy_fit = np.append(xy, fit.T, axis = 1)
    xy_fit_sorted = xy_fit[np.argsort(xy_fit[:,-1].T),:]
    return xy_fit_sorted
def codifica(x, x_min, x_max,n):
    x = float(x)
    xdec = round((x-x_min)/(x_max-x_min)*(2**n-1))
    xbin = int(bin(xdec)[2:])
    return(xbin)
xy = np.array([[1, 2],[3,4],[0,0],[-5,-1],[9,-2]])
resultado = np.array([5, 25, 0, 26, 85])
print(xy)
xy_fit_sorted = np.array(fitness(xy, populacao, resultado))
print(xy_fit_sorted)
parents = (xy_fit_sorted[:,:2])
print(parents)
The problem I'm having is that to select the first 2 rows of "xy_fit_sorted", I'm doing this strange thing:
parents = (xy_fit_sorted[:,:2])
Instead of what makes sense in my mind:
parents = (xy_fit_sorted[:1,:])
It's like the whole matrix is in one line.

I'm not sure what most of your code is doing, so here's just a guess: are you thrown off by the shape of xy_fit_sorted being (1, 5, 3), having an extra zero axis?
That could be fixed e.g. by constructing xy_fit without the use of np.matrix:
xy_fit = np.append(xy, resultado[:, np.newaxis], axis=1)
Then xy_fit_sorted comes out with a shape of (5, 3).
The underlying issue is that an np.matrix is always 2-D. When indexing xy_fit[...] you intend to index with a 1-D vector. But because xy_fit is an np.matrix, xy_fit[:,-1].T is not a vector but a 2-D matrix of shape (1, 5), and so is the result of np.argsort. Indexing with that 2-D index gives xy_fit_sorted an extra dimension as well.
Note that the numpy doc says about np.matrix anyhow:
It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.
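As a rough sketch of the array-only approach described above, using the data from the question: np.column_stack is my own choice here for attaching the fitness column (the np.append call shown earlier works the same way), and `order` is just an illustrative name.

```python
import numpy as np

xy = np.array([[1, 2], [3, 4], [0, 0], [-5, -1], [9, -2]])
resultado = np.array([5, 25, 0, 26, 85])

# Attach the fitness values as a third column; everything stays a plain ndarray.
xy_fit = np.column_stack([xy, resultado])

# argsort on a 1-D column returns a 1-D index vector, so no extra axis appears.
order = np.argsort(xy_fit[:, -1])
xy_fit_sorted = xy_fit[order]
print(xy_fit_sorted.shape)  # (5, 3)

# Row slicing now behaves as expected: the first two rows are the two fittest.
parents = xy_fit_sorted[:2, :]
print(parents)
```

Note that `[:2, :]` selects the first two rows, while `[:1, :]` would select only the first one.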

Related

Python, Extract spline coefficient

I am using Python 3 and SciPy.
I have 3D points (x, y, z).
From them I make a spline using scipy.interpolate.splprep:
x_points = np.linspace(0, 2*np.pi, 10)
y_points = np.sin(x_points)
z_points = np.cos(x_points)
path = np.vstack([x_points, y_points, z_points])
tck, u = sc.splprep(path, k=3, s=0)
I wish to get the coefficients of spline[i].
For example, for the last spline piece:
sp9 = a9 + b9(x-x4) + c9(x-x4)^2 + d9(x-x4)^3
I know that tck is (t, c, k), a tuple containing the vector of knots, the B-spline coefficients, and the degree of the spline.
But I don't see how I can get this spline function and plot only it.
I tried using this method:
import numpy as np
import scipy.interpolate as sc
x_points = np.linspace(0, 2*np.pi, 10)
y_points = np.sin(x_points)
z_points = np.cos(x_points)
path = np.vstack([x_points, y_points, z_points])
tck, u = sc.splprep(path, k=3, s=0)
p = sc.PPoly.from_spline(tck)
but I'm getting this error on the last line:
p = sc.PPoly.from_spline(tck) File
"C:\Users...\Python38\lib\site-packages\scipy\interpolate\interpolate.py",
line 1314, in from_spline cvals = np.empty((k + 1, len(t)-1),
dtype=c.dtype)
AttributeError: 'list' object has no attribute 'dtype'
The coefficients in the tck tuple are in the B-spline basis. If you want to convert them to the power basis, you can use PPoly.from_spline(tck).
An obligatory note, however: converting between bases incurs numerical errors.
EDIT. First, as it's splprep, you'll need to convert the list-of-arrays c into a proper numpy array and transpose (it's a known wart of splprep). Then, as it turns out, PPoly.from_spline does not handle multidimensional c (this might be a nice pull request to the scipy repository), so you'll need to e.g. loop over the dimensions. Something along the lines of (continuing from your OP):
t, c, k = tck
cc = np.asarray(c) # cc.shape is (3, 10) now
spl0 = sc.PPoly.from_spline((t, cc.T[:, 0], k)) # column 0 of cc.T: all 10 coefficients of component 0
print(spl0.c) # here are your coefficients for the component 0
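A minimal sketch of the "loop over the dimensions" idea, assuming the same path as in the question. The names `pieces`, `u_test` and `max_err` are illustrative only; the comparison against splev is there to show that the conversion reproduces the curve up to rounding error.

```python
import numpy as np
import scipy.interpolate as sc

x_points = np.linspace(0, 2 * np.pi, 10)
y_points = np.sin(x_points)
z_points = np.cos(x_points)
path = np.vstack([x_points, y_points, z_points])
tck, u = sc.splprep(path, k=3, s=0)

t, c, k = tck
# One PPoly per coordinate: c is a list with one 1-D coefficient array per dimension.
pieces = [sc.PPoly.from_spline((t, np.asarray(ci), k)) for ci in c]

# pieces[d].c[:, i] holds the power-basis coefficients (highest power first)
# of coordinate d on the interval [t[i], t[i + 1]].
print(pieces[0].c.shape)

# Sanity check on interior parameter values: the piecewise polynomials
# should agree with the original B-spline evaluation.
u_test = np.linspace(0.01, 0.99, 50)
reference = sc.splev(u_test, tck)
max_err = max(np.max(np.abs(p(u_test) - r)) for p, r in zip(pieces, reference))
print(max_err)
```

Each polynomial is expressed in the curve parameter u (which runs from 0 to 1), not in x, so the local variable on interval i is (u - t[i]).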

Could someone please help me with sklearn.metrics.roc_curve's use and what does the function expect?

I am trying to construct 2 numpy ndarray-s from a networkx Graph's data structures that look like a list of tuples and a simple list. I would like to make a roc curve where
the validation set is the above mentioned list of tuples of the edges of a G graph that I was trying to construct like this:
x = []
for i in G_orig.nodes():
    for j in G_orig.nodes():
        if j > i and (i, j) not in G.edges():
            if (i, j) in G_orig.edges():
                x.append((i, j, 1))
            else:
                x.append((i, j, 0))
y_validation = np.array(x)
It looks something like this: [(1, 344, 1), (2, 23, 0), (3, 5, 0), ...... (333, 334, 1)].
The first 2 numbers mean 2 nodes, the 3rd one means whether there is an edge between them. 1 means edge, 0 means no edge.
Then roc_curve expects something called y_score in the documentation. I have a list for that made with a method called preferential attachment, therefore I named it pref_att_types. I tried to make a numpy array of it in case the roc_curve expects only it.
positive_class_predicted_probabilities = np.array(pref_att_types)
Then I just did what we used in class:
FPRs, TPRs, thresholds = roc_curve(y_validation,
                                   positive_class_predicted_probabilities,
                                   pos_label=1)
It is literally just Ctrl C + Ctrl V. But it raises a ValueError saying 'multiclass-multioutput format is not supported'. Please note that I am not a programmer, just someone who studies to be a mathematics analyst.
The first argument, y_true, needs to be just the true labels, in this case the 0/1 values without the pair of nodes. Just be sure that the indices of the arrays y_validation and pref_att_types match, so that each score belongs to the same node pair as its label.
The code below draws the ROC curves for two RF models:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
# rf1 and rf2 are already-fitted random forest classifiers; X_test, y_test are the test data
# create arrays of predicted class probabilities
y_test_predict1_probaRF = rf1.predict_proba(X_test)
y_test_predict2_probaRF = rf2.predict_proba(X_test)
# column 1 holds the probability of the positive class
RFfpr1, RFtpr1, thresholds = roc_curve(y_test, y_test_predict1_probaRF[:,1])
RFfpr2, RFtpr2, thresholds = roc_curve(y_test, y_test_predict2_probaRF[:,1])
def plot_roc_curve (fpr, tpr, label = None):
    plt.plot(fpr, tpr, linewidth = 2, label = label)
    plt.plot([0,1], [0,1], "k--")
    plt.axis([0,1,0,1])
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
plot_roc_curve (RFfpr1,RFtpr1,"RF1")
plot_roc_curve (RFfpr2,RFtpr2,"RF2")
plt.legend()
plt.show()
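Applied to the data from the question, stripping the node pairs off y_validation might look like the sketch below. The four tuples are taken from the question's sample output; the scores are made-up placeholder values standing in for pref_att_types, which the question does not show.

```python
import numpy as np
from sklearn.metrics import roc_curve

# (node_i, node_j, label) triples, as built in the question
y_validation = np.array([(1, 344, 1), (2, 23, 0), (3, 5, 0), (333, 334, 1)])

# Hypothetical scores, one per node pair, in the same order as y_validation
positive_class_predicted_probabilities = np.array([0.9, 0.2, 0.4, 0.7])

# roc_curve wants a 1-D array of labels, so keep only the third column
y_true = y_validation[:, 2]

FPRs, TPRs, thresholds = roc_curve(y_true,
                                   positive_class_predicted_probabilities,
                                   pos_label=1)
print(FPRs, TPRs)
```

The scores do not have to be probabilities; any value where larger means "more likely an edge" works, as long as the ordering matches y_true.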

How can I interpolate a numpy array so that it becomes a certain length?

I have three numpy arrays each with different lengths:
A.shape = (3401,)
B.shape = (2200,)
C.shape = (4103,)
I would like to average the three arrays to produce a new array with size of the largest array (in this case C):
D.shape = (4103,)
Problem is, I don't think I can do this without adding "fake" data to A and B, by interpolation.
How can I perform interpolation on the first two numpy arrays so that they are of the same length as array C?
Do I even need to interpolate here?
First thing that comes to mind is zoom from scipy:
The array is zoomed using spline interpolation of the requested order.
Code:
import numpy as np
from scipy.ndimage import zoom
A = np.random.rand(3401)
B = np.random.rand(2200)
C = np.ones(4103)
for arr in [A, B]:
    zoom_rate = C.shape[0] / arr.shape[0]
    arr = zoom(arr, zoom_rate)
    print(arr.shape)
Output:
(4103,)
(4103,)
I think the simplest option is to do the following:
D = np.concatenate([np.average([A[:2200], B, C[:2200]], axis=0),
                    np.average([A[2200:3401], C[2200:3401]], axis=0),
                    C[3401:]])
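Another option, sketched here under the assumption that each array samples the same underlying range at evenly spaced points, is plain linear interpolation with np.interp. The helper name `resample` is my own, not a NumPy function.

```python
import numpy as np

def resample(arr, new_len):
    """Linearly interpolate a 1-D array onto new_len evenly spaced points."""
    old_x = np.linspace(0.0, 1.0, num=len(arr))
    new_x = np.linspace(0.0, 1.0, num=new_len)
    return np.interp(new_x, old_x, arr)

rng = np.random.default_rng(0)
A = rng.random(3401)
B = rng.random(2200)
C = rng.random(4103)

# Stretch every array to the length of the longest one, then average elementwise.
target = max(len(A), len(B), len(C))
D = np.mean([resample(a, target) for a in (A, B, C)], axis=0)
print(D.shape)  # (4103,)
```

Unlike the truncation approach, this uses all of A and B across the full length of D, at the cost of introducing interpolated values.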

dask array map_blocks, with differently shaped dask array as argument

I'm trying to use dask.array.map_blocks to process a dask array, using a second dask array with different shape as an argument. The use case is firstly running some peak finding on a 2-D stack of images (4-dimensions), which is returned as a 2-D dask array of np.objects. Ergo, the two first dimensions of the two dask arrays are the same. The peaks are then used to extract intensities from the 4-dimensional dataset. In the code below, I've omitted the peak finding part. Dask version 1.0.0.
import numpy as np
import dask.array as da
def test_processing(data_chunk, position_chunk):
    output_array = np.empty(data_chunk.shape[:-2], dtype='object')
    for index in np.ndindex(data_chunk.shape[:-2]):
        islice = np.s_[index]
        intensity_list = []
        data = data_chunk[islice]
        positions = position_chunk[islice]
        for x, y in positions:
            intensity_list.append(data[x, y])
        output_array[islice] = np.array(intensity_list)
    return output_array
data = da.random.random(size=(4, 4, 10, 10), chunks=(2, 2, 10, 10))
positions = np.empty(data.shape[:-2], dtype='object')
for index in np.ndindex(positions.shape):
    positions[index] = np.arange(10).reshape(5, 2)
# dtype=object is used instead of np.object, which newer NumPy releases no longer provide
data_output = da.map_blocks(test_processing, data, positions, dtype=object,
                            chunks=(2, 2), drop_axis=(2, 3))
data_output.compute()
This gives the error "ValueError: Can't drop an axis with more than 1 block. Please use atop instead.", which I'm guessing is due to positions having 3 dimensions, while data has 4 dimensions.
The same function, but without the positions dask array, works fine:
import numpy as np
import dask.array as da
def test_processing(data_chunk):
    output_array = np.empty(data_chunk.shape[:-2], dtype='object')
    for index in np.ndindex(data_chunk.shape[:-2]):
        islice = np.s_[index]
        intensity_list = []
        data = data_chunk[islice]
        positions = [[5, 2], [1, 3]]
        for x, y in positions:
            intensity_list.append(data[x, y])
        output_array[islice] = np.array(intensity_list)
    return output_array
data = da.random.random(size=(4, 4, 10, 10), chunks=(2, 2, 10, 10))
# dtype=object is used instead of np.object, which newer NumPy releases no longer provide
data_output = da.map_blocks(test_processing, data, dtype=object,
                            chunks=(2, 2), drop_axis=(2, 3))
data_computed = data_output.compute()
This has been fixed in more recent versions of dask: running the same code on version 2.3.0 of dask works fine.

How to get a theano function to return an array of the same length as another tensor variable

I am really new to Theano, and I am just trying to figure out some basic functionality. I have a tensor variable x, and I would like the function to return a tensor variable y of the same shape, but filled with the value 0.2. I am not sure how to define y.
For example, if x = [1,2,3,4,5], then I would like y = [0.2, 0.2, 0.2, 0.2, 0.2]
from theano import tensor, function
y = tensor.dmatrix('y')
masked_array = function([x],y)
There's probably a dozen different ways to do this and which is best will depend on the context: how this piece of code/functionality fits into the wider program.
Here's one approach:
import theano
import theano.tensor as tt
x = tt.vector()
y = tt.ones_like(x) * 0.2
f = theano.function([x], outputs=y)
print(f([1, 2, 3, 4, 5]))
