masking a double over a string - string

This is a question in MatLab...
I have two matrices, one being a (5 x 1 double) :
1
2
3
1
3
And the second matrix being a (5 x 3 string), with spaces where no character appears :
a
bc
def
g
hij
I am trying to get an output such that a (5 x 1 string) is created and outputs the nth value from each line of matrix two, where n is the value in matrix one. I am unsure how to do this using a mask which would be able to handle much larger matrces. My target matrix would have the following :
a
c
f
g
j
Thank you very much for the help!!!

There are so many ways you can accomplish this task. I'll give you two.
Method #1 - Generate linear indices and access elements
Use sub2ind to generate a set of linear indices that correspond to the row and column locations you want to access in your matrix. You'll note that the column locations are the ones changing, but the row locations are always increasing by 1 as you want to access each row. As such, given your string matrix A, and your columns you want to access stored in ind, just do this:
A = ['a '; 'bc '; 'def'; 'g ';'hij'];
ind = [1 2 3 1 3];
out = A(sub2ind(size(A), (1:numel(ind)).', ind(:)))
out =
a
c
f
g
j
Method #2 - Create a sparse matrix, convert to logical and access
Alternatively, you can create a sparse matrix through sparse where the non-zero entries are rows vary from 1 up to as many elements as you have in ind and the columns vary like what you have given us.
S = sparse((1:numel(ind)).',ind(:),true,size(A,1),size(A,2));
A = A.'; out = A(S.');
Be mindful that you are trying to access each element in a row-major fashion, yet MATLAB will do this in a column-major format. As such, we would need to transpose our data matrix, and also take our sparse matrix and transpose that too. The end result should give you the same order as Method #1.

Related

Efficient sparse matrix column change

I'm implementing an efficient PageRank algorithm so I'm using sparse matrices. I'm close, but there's one problem. I have a matrix where I want the sum of each column to be one. This is easy to implement, but the problem occurs when I get a matrix with a zero column.
In this case, I want to set each element in the column to be 1/(n-1) where n is the dimension of the matrix. I divide by n-1 and not n because I wish to keep the diagonals zero, always.
How can I implement this efficiently? My naive solution is to just determine the sum of each column and then find the column indices that are zero and replace the entire column with an 1/(n-1) value like so:
# naive approach (too slow!)
# M is my nxn sparse matrix where each column sums to one
col_sums = M.sum(axis=0)
for i in range(n):
if col_sums[0,i] == 0:
# set entire column to 1/(n-1)
M[:, i] = 1/(n-1)
# make sure diagonal is zeroed
M[i,i] = 0
My M matrix is very very very large and this method simply doesn't scale. How can I do this efficiently?
You can't add new nonzero values without reallocating and copying the underlying data structure. If you expect these zero columns to be very common (> 25% of the data) you should handle them in some other way, or you're better off with a dense array.
Otherwise try this:
import scipy.sparse
M = scipy.sparse.rand(1000, 1000, density=0.001, format='csr')
nz_col_weights = scipy.sparse.csr_matrix(M.shape, dtype=M.dtype)
nz_col_weights[:, M.getnnz(axis=0) == 0] = 1 / (M.shape[0] - 1)
nz_col_weights.setdiag(0)
M += nz_col_weights
This has only two allocation operations

split multiple values into two columns based on single seprator

I am new to pandas.I have a situation I want to split length column into two columns a and b.Values in length column are in pair.I want to compare first pair smaller value should be in a nad larger in b.then compare next pair on same row and smaller in a,larger in b.
I have hundred rows.I think I can not use str.split because there are multiple values and same delimiter.I have no idea how to do it
The output should be same like this.
Any help will be appreciated
length a b
{22.562,"35.012","25.456",37.342,24.541,38.241} 22.562,25.45624.541 35.012,37.342,38.241
{21.562,"37.012",25.256,36.342} 31.562,25.256 37.012,36.342
{22.256,36.456,26.245,35.342,25.56,"36.25"} 22.256,26.245,25.56 36.456,35.342,36.25
I have tried
df['a'] = df['length'].str.split(',').str[0::2]
df['b'] = df['length'].str.split(',').str[1::3]
through this ode column b output is perfect but col a is printing first full pair then second.. It is not giving only 0,2,4th values
The problem comes from the fact that your length column is made of set not lists.
Here is a way to do what you want by casting your length column as list:
df['length'] = [list(x) for x in df.length] # We cast the sets as lists
df['a'] = [x[0::2] for x in df.length]
df['b'] = [x[1::2] for x in df.length]
Output:
length a \
0 [35.012, 37.342, 38.241, 22.562, 24.541, 25.456] [35.012, 38.241, 24.541]
1 [25.256, 36.342, 21.562, 37.012] [25.256, 21.562]
2 [35.342, 36.456, 36.25, 22.256, 25.56, 26.245] [35.342, 36.25, 25.56]
b
0 [37.342, 22.562, 25.456]
1 [36.342, 37.012]
2 [36.456, 22.256, 26.245]

Two regular loops with using given values for a parameter in MATLAB

I have an S1 (21x21) matrix and a W (21x21) matrix given. I define a matrix results with each element as a matrix as results = {W};
and then, I have two regular for loops such that it runs all the values in index1 and then goes to the second index; but each time it should take a specific value of k for example.
There are also two given vectors cos and ens each having dimension 21x1. Here is the code:
rowsP=21;
M=0;
beta=0.9;
p=0.5;
q=0.5;
k= [1:rowsP-1];
for j=1:rowsP-k
for i=1:rowsP-k
R(i,j) = ( S1(i,end-k) - cos(j+k) ) *ens(j)-0.001*M +
beta*(p*results{k}(i,end-j)+q*results{k}(i+1,end-j));
results{k+1}=fliplr(R);
end
end
I am getting the error
Matrix Dimensions must agree.
So I am trying to calculate a matrix results each time using two for loops given results{1}=W (a given matrix) given k=1.
Then flipping the matrix left to right, I get results{2} which then helps to calculate R again but for k=2. And this is then repeated until k=21.
As you see, I keep dropping the last column of each successive R, the matrix results should be appended each time giving a row of 21 elements each cell having 21x21 matrix (the given matrix W) and then a matrix of 20x20 and then 19x19 and so on... until a matrix of 1x1. I am unable to solve the problem as Matlab only does 1 iteration and then does not compute the correct answer. I keep getting two cells in results with a 21x21 matrix (the one given) and the next 20x20 matrix.
I tried with another for loop for k, but in that case, for a given k, starting from k=1, it runs the whole code for j and then i, but it does not solve my problem.

Use dynamic programming to find a subset of numbers whose sum is closest to given number M

Given a set A of n positive integers a1, a2,... a3 and another positive integer M, I'm going to find a subset of numbers of A whose sum is closest to M. In other words, I'm trying to find a subset A′ of A such that the absolute value |M - 􀀀 Σ a∈A′| is minimized, where [ Σ a∈A′ a ] is the total sum of the numbers of A′. I only need to return the sum of the elements of the solution subset A′ without reporting the actual subset A′.
For example, if we have A as {1, 4, 7, 12} and M = 15. Then, the solution subset is A′ = {4, 12}, and thus the algorithm only needs to return 4 + 12 = 16 as the answer.
The dynamic programming algorithm for the problem should run in
O(nK) time in the worst case, where K is the sum of all numbers of A.
You construct a Dynamic Programming table of size n*K where
D[i][j] = Can you get sum j using the first i elements ?
The recursive relation you can use is: D[i][j] = D[i-1][j-a[i]] OR D[i-1][j] This relation can be derived if you consider that ith element can be added or left.
Time complexity : O(nK) where K=sum of all elements
Lastly you iterate over entire possible sum you can get, i.e. D[n][j] for j=1..K. Which ever is closest to M will be your answer.
For dynamic algorithm, we
Define the value we would work on
The set of values here is actually a table.
For this problem, we define value DP[i , j] as an indicator for whether we can obtain sum j using first i elements. (1 means yes, 0 means no)
Here 0<=i<=n, 0<=j<=K, where K is the sum of all elements in A
Define the recursive relation
DP[i+1 , j] = 1 , if ( DP[i,j] == 1 || DP[i,j-A[i+1]] ==1)
Else, DP[i+1, j] = 0.
Don't forget to initialize the table to 0 at first place. This solves boundary and trivial case.
Calculate the value you want
Through bottom-up implementation, you can finally fill the whole table.
Now, things become easy. You just need to find out the closest value to M in the table whose value is one.
Here, just work on DP[n][j], since n covers the whole set. Find the closest j to M whose value is 1.
Time complexity is O(kn), since you iterate k*n times in total.

How to change stringified numbers in data frame into pure numeric values in R

I have the following data.frame:
employee <- c('John Doe','Peter Gynn','Jolie Hope')
# Note that the salary below is in stringified format.
# In reality there are more such stringified numerical columns.
salary <- as.character(c(21000, 23400, 26800))
df <- data.frame(employee,salary)
The output is:
> str(df)
'data.frame': 3 obs. of 2 variables:
$ employee: Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
$ salary : Factor w/ 3 levels "21000","23400",..: 1 2 3
What I want to do is to convert the change the value from string into pure number
straight fro the df variable. At the same time preserve the string name for employee.
I tried this but won't work:
as.numeric(df)
At the end of the day I'd like to perform arithmetic on these numeric
values from df. Such as df2 <- log2(df), etc.
Ok, there's a couple of things going on here:
R has two different datatypes that look like strings: factor and character
You can't modify most R objects in place, you have to change them by assignment
The actual fix for your example is:
df$salary = as.numeric(as.character(df$salary))
If you try to call as.numeric on df$salary without converting it to character first, you'd get a somewhat strange result:
> as.numeric(df$salary)
[1] 1 2 3
When R creates a factor, it turns the unique elements of the vector into levels, and then represents those levels using integers, which is what you see when you try to convert to numeric.

Resources