Apply this function to a 2D numpy Matrix vector operations only

Apply this function to a 2D numpy Matrix vector operations only - python-3.x

guys, I have this function
def averageRating(a,b):
avg = (float(a)+float(b))/2
return round(avg/25)*25
Currently, I am looping over my np array which is just a 2D array that has numerical values. What I want to be able to do is have "a" be the 1st array and "b" be the 2nd array and get the average per row and what I want for my return is just an array with the values. I have used mean but could not find a way to edit it and have the round() part or multiple (avg*25)/25.
My goal is to get rid of looping and replace it with a vectorized operations because of how slow looping is.
Sorry for the question new to python and numpy.

def averageRating(a,b):
avg = (np.average(a,axis=1) + np.average(b,axis=1))/2
return np.round(avg,0)
This should do what you are looking for if I understand the question correctly. Specifying axis = 1 in np.average will give the average of the rows (axis = 0 would be the average of the columns). And the 0 in np.round will round to 0 decimal places, changing it will change the number of decimal places you round to. Hope that helps!

def averageRating(a, b):
averages = []
for i in range( len(a) ):
averages.append( (a[i] + b[i]) / 2 )
return averages
Giving your arrays are of equal length this should be a simple resolution.
This doesn't eliminate the use of for loops, however, it will be computationally cheaper than the current approach.

Related

Efficient sparse matrix column change

I'm implementing an efficient PageRank algorithm so I'm using sparse matrices. I'm close, but there's one problem. I have a matrix where I want the sum of each column to be one. This is easy to implement, but the problem occurs when I get a matrix with a zero column.
In this case, I want to set each element in the column to be 1/(n-1) where n is the dimension of the matrix. I divide by n-1 and not n because I wish to keep the diagonals zero, always.
How can I implement this efficiently? My naive solution is to just determine the sum of each column and then find the column indices that are zero and replace the entire column with an 1/(n-1) value like so:
# naive approach (too slow!)
# M is my nxn sparse matrix where each column sums to one
col_sums = M.sum(axis=0)
for i in range(n):
if col_sums[0,i] == 0:
# set entire column to 1/(n-1)
M[:, i] = 1/(n-1)
# make sure diagonal is zeroed
M[i,i] = 0
My M matrix is very very very large and this method simply doesn't scale. How can I do this efficiently?

You can't add new nonzero values without reallocating and copying the underlying data structure. If you expect these zero columns to be very common (> 25% of the data) you should handle them in some other way, or you're better off with a dense array.
Otherwise try this:
import scipy.sparse
M = scipy.sparse.rand(1000, 1000, density=0.001, format='csr')
nz_col_weights = scipy.sparse.csr_matrix(M.shape, dtype=M.dtype)
nz_col_weights[:, M.getnnz(axis=0) == 0] = 1 / (M.shape[0] - 1)
nz_col_weights.setdiag(0)
M += nz_col_weights
This has only two allocation operations

Excel: How to find closest number in table, many times

Excel
Need to find nearest float in a table, for each integer 0..99
https://www.excel-easy.com/examples/closest-match.html explains a great technique for finding the CLOSEST number from an array to a constant cell.
I need to perform this for many values (specifically, find nearest to a vertical list of integers 0..99 from within a list of floats).
Array formulas don't allow the compare-to value (integers) to change as we move down the list of integers, it treats it like a constant location.
I tried Tables, referring to the integers (works) but the formula from the above web site requires an Array operation (F2, control shift Enter), which are not permitted in Tables. Correction: You can enter the formula, control-enter the array function for one cell, copy the formulas, then insert table. Don't change the search cell reference!
Update:
I can still use array operations, but I manually have to copy the desired function into each 100 target cells. No biggie.
Fixed typo in formula. See end of question for details about "perfection".
Example code:
AI4=some integer
AJ4=MATCH(MIN(ABS(Table[float_column]-AI4)), ABS(Table[float_column]-AI4), 0)
repeat for subsequent integers in AI5...AI103
Example data:
0.1 <= matches 0
0.5
0.95 <= matches 1
1.51 <= matches 2
2.89
Consider the case where target=5, and 4.5, 5.5 exist in the list. One gives -0.5 and the other +0.5. Searching for ABS(-.5) will give the first one. Either one is decent, unless your data is non-monotonic.
This still needs a better solution.
Thanks in advance!

I had another problem, which pushed to a better solution.
Specifically, since the Y values for the X that I am interested in can be at varying distances in X, I will interpolate X between the X point before and after. Ie search for less than or equal, also greater than or equal, interpolate the desired X, then interpolate the Y values.
I could go a step further and interpolate N - 1 to N + 1, which will give cleaner results for noisy data.

Convert series to float pandas

i have a function that is supposed to chain link a list of daily returns in a dataframe, but when i pass the column, the function is returning a series, rather than a float
def my_aggfunc(x):
y = np.exp(np.log1p(x).cumsum())
return y
if however i change the second line to be
np.sum(x)
this returns a float
Any ideas pls?

np.log1p(x) is an array.
np.log1p(x).cumsum() is another array of the same size.
np.exp(np.log1p(x).cumsum()) is yet another array.
I'm assuming you didn't want cumsum you wanted sum
np.exp(np.log1p(x).sum())

From the np.exp docs:
Calculate the exponential of all elements in the input array.
Returns:   out : ndarray   Output array, element-wise exponential of x.
So y is an array.

Pandas: Filling random empty rows with data

I have a dataframe with several currently-empty columns. I want a fraction of these filled with data drawn from a normal distribution, while all the rest are left blank. So, for example, if 60% of the elements should be blank, then 60% would be, while the other 40% would be filled. I already have the normal distribution, via numpy, but I'm trying to figure out how to choose random rows to fill. Currently, the only way I can think of involves FOR loops, and I would rather avoid that.
Does anyone have any ideas for how I could fill empty elements of a dataframe at random? I have a bit of the code below, for the random numbers.
data.loc[data['ColumnA'] == 'B', 'ColumnC'] = np.random.normal(1000, 500, rowsB).astype('int64')

piRSquared's advice is good. We are left guessing what to solve.
Having just looked through some of the latest unanswered pandas questions there are worse.
import pandas as pd
import numpy as np
#some redundancy here as i make an empty dataframe -pretending i start like you with a Dataframe.
df = pd.DataFrame(index = range(11),columns=list('abcdefg'))
num_cells = np.product(df.shape)
# make a 2-dim array with number from 1 to number cells.
arr =np.arange(1,num_cells+1)
#inplace shuffle - this is the key randomization operation
np.random.shuffle(arr)
arr = arr.reshape(df.shape)
#place the shuffled values, normalized to the number of cells, into my dateframe.
df = pd.DataFrame(index = df.index,columns = df.columns,data=arr/np.float(num_cells))
#use applymap to set keep 40% of cells as ones, the other 60% as nan.
df = df.applymap(lambda x: 1 if x > 0.6 else np.nan)
# now sample a full set from normal distribution
# but when multiplying the nans will cause the sampled value to nullify, whilst the multiply by 1 will retain the sample value.
df * np.random.normal(1000,500,df.shape)
Thus you are left with a random 40% of the cells containing a draw from your normal distribution.
If your dataframe was large you could assume the stability of the uniform rand() function. Here i didn't do that and rather determined explicitly how many cells are above and below the threshold.

masking a double over a string

This is a question in MatLab...
I have two matrices, one being a (5 x 1 double) :
1
2
3
1
3
And the second matrix being a (5 x 3 string), with spaces where no character appears :
a
bc
def
g
hij
I am trying to get an output such that a (5 x 1 string) is created and outputs the nth value from each line of matrix two, where n is the value in matrix one. I am unsure how to do this using a mask which would be able to handle much larger matrces. My target matrix would have the following :
a
c
f
g
j
Thank you very much for the help!!!

There are so many ways you can accomplish this task. I'll give you two.
Method #1 - Generate linear indices and access elements
Use sub2ind to generate a set of linear indices that correspond to the row and column locations you want to access in your matrix. You'll note that the column locations are the ones changing, but the row locations are always increasing by 1 as you want to access each row. As such, given your string matrix A, and your columns you want to access stored in ind, just do this:
A = ['a '; 'bc '; 'def'; 'g ';'hij'];
ind = [1 2 3 1 3];
out = A(sub2ind(size(A), (1:numel(ind)).', ind(:)))
out =
a
c
f
g
j
Method #2 - Create a sparse matrix, convert to logical and access
Alternatively, you can create a sparse matrix through sparse where the non-zero entries are rows vary from 1 up to as many elements as you have in ind and the columns vary like what you have given us.
S = sparse((1:numel(ind)).',ind(:),true,size(A,1),size(A,2));
A = A.'; out = A(S.');
Be mindful that you are trying to access each element in a row-major fashion, yet MATLAB will do this in a column-major format. As such, we would need to transpose our data matrix, and also take our sparse matrix and transpose that too. The end result should give you the same order as Method #1.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Apply this function to a 2D numpy Matrix vector operations only - python-3.x

Related

Efficient sparse matrix column change

Excel: How to find closest number in table, many times

Convert series to float pandas

Pandas: Filling random empty rows with data

masking a double over a string

Categories

Resources