I have to create a sparse matrix in Python using a function similar to the MATLAB function
S = sparse(i,j,v,m,n), where i, j, and v are such that S(i(k),j(k)) = v(k) and the size of S is specified as m-by-n.
I have chosen scipy.sparse.csr_matrix to do this. My code looks something like the following:
arg_shape=np.array([ndof,ndof])
K = csr_matrix((arg_data,(arg_x,arg_y)),shape=arg_shape)
Here ndof=786432, and arg_data, arg_x, and arg_y are numpy arrays, all of the same shape, i.e. (150994944,).
When I run this code, I get the following error:
ValueError: row index exceeds matrix dimensions
In Matlab the code looks like this and works:
K = sparse(arg_x,arg_y,arg_data, ndof, ndof);
Could anyone please help me with the following points:
1). Is scipy.sparse.csr_matrix a good replacement for the Matlab sparse function?
2). If yes, what is the mistake I am making in the code?
Thank you very much.
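One possible cause, offered only as a guess: if arg_x and arg_y were carried over from MATLAB, they are 1-based, so the largest index equals ndof, which is out of range for Python's 0-based indexing. The sketch below uses small made-up triplets (rows_1based, cols_1based and vals are hypothetical stand-ins, not the real data) to show the index shift and that csr_matrix sums repeated (row, col) pairs, as MATLAB's sparse does.

```python
import numpy as np
from scipy.sparse import csr_matrix

ndof = 4  # small stand-in for the real matrix size

# Hypothetical triplets using MATLAB-style 1-based indices.
rows_1based = np.array([1, 2, 4, 4])
cols_1based = np.array([1, 3, 4, 4])
vals = np.array([10.0, 20.0, 30.0, 5.0])

# Shift to 0-based indices and pass the shape as a plain tuple.
K = csr_matrix((vals, (rows_1based - 1, cols_1based - 1)), shape=(ndof, ndof))

# The two entries at (4, 4) in MATLAB terms are summed into one value.
print(K.toarray())
```

If the indices are already 0-based, checking arg_x.max() and arg_y.max() against ndof is a quick way to find the offending entries.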
Here is background information to the problem I am encountering:
1) output is a cell array, each cell contains a matrix of size = 1024 x 1024, type = double
2) labelboutput is a cell array which is identical to output, except that each matrix has been binarized.
3) I am using the function regionprops to extract the mean intensity and centroid values for ROIs (there are multiple ROIs in each image) for each cell of output
4) props is a 5 x 1 struct with 2 fields (centroid and mean intensity)
The problem: I would like to take the mean intensity values for each ROI in every matrix and export to excel. Here is what I have so far:
for i = 1:size(output,2)
props = regionprops(labelboutput{1,i},output{1,i},'MeanIntensity','Centroid');
end
for i = 1:size(output,2)
meanValues = getfield(props(1:length(props),'MeanIntensity'));
end
writetable(struct2table(props), 'advanced_test.xlsx');
There seem to be a few issues:
1) my getfield command is not working and gets the error: "Index exceeds matrix dimensions"
2) when the information is being stored into props, it overwrites the values for each matrix. How do I make props a 5 x n (where n = number of cells in output)?
Please help!!
1) my getfield command is not working and gets the error: "Index exceeds matrix dimensions"
An easier way to get numeric values out of the same field in an array of structs, as an array is: [structArray.fieldName]. In your case this will be:
meanValues = [props.MeanIntensity];
2) when the information is being stored into props, it overwrites the values for each matrix. How do I make props a 5 x n (where n = number of cells in output)?
One option would be to preallocate an empty cell of the necessary dimensions and then fill it in with your regionprops output. Like this:
props = cell(1,size(output,2));
for k = 1:size(output,2)
props{k} = regionprops(labelboutput{1,k},output{1,k},'MeanIntensity','Centroid');
end
for k = 1:size(output,2)
meanValues{k} = [props{k}.MeanIntensity];
end
...
Another option would be to combine your loops so that you can use your matrix data before it is overwritten. Like this:
for i = 1:size(output,2)
props = regionprops(labelboutput{1,i},output{1,i},'MeanIntensity','Centroid');
meanValues = [props.MeanIntensity];
% update this call to place props in non-overlapping parts of your file (e.g. append)
% writetable(struct2table(props), 'advanced_test.xlsx');
end
The drawback of this second approach is that it puts a file I/O step inside your loop, which can really slow things down. You will also need to adjust your writetable call so that it places each resulting table in a non-overlapping region of 'advanced_test.xlsx'.
I have variable 'x_data' sized 360x190, I am trying to select particular rows of data.
x_data_train = []
x_data_train = np.append([x_data_train,
x_data[0:20,:],
x_data[46:65,:],
x_data[91:110,:],
x_data[136:155,:],
x_data[181:200,:],
x_data[226:245,:],
x_data[271:290,:],
x_data[316:335,:]],axis = 0)
I get the following error :
TypeError: append() missing 1 required positional argument: 'values'
where did I go wrong ?
If I am using
x_data_train = []
x_data_train.append(x_data[0:20,:])
x_data_train.append(x_data[46:65,:])
x_data_train.append(x_data[91:110,:])
x_data_train.append(x_data[136:155,:])
x_data_train.append(x_data[181:200,:])
x_data_train.append(x_data[226:245,:])
x_data_train.append(x_data[271:290,:])
x_data_train.append(x_data[316:335,:])
the size of the output is 8 instead of 160 rows.
Update:
In MATLAB, I load the text file and x_data is a variable with 360 rows and 190 columns.
If I want to select rows 1 to 20, 46 to 65, and so on, I simply write
x_data_train = xdata([1:20,46:65,91:110,136:155,181:200,226:245,271:290,316:335], :);
and the resulting x_data_train is the array I want.
How can I do that in Python? My attempt gives a list of 8 sub-arrays of size 20*192 each, but I want a single 160*192 array.
Short version: the most idiomatic and fastest way to do what you want in Python is this (assuming x_data is a numpy array):
x_data_train = np.vstack([x_data[0:20,:],
x_data[45:65,:],
x_data[90:110,:],
x_data[135:155,:],
x_data[180:200,:],
x_data[225:245,:],
x_data[270:290,:],
x_data[315:335,:]])
Note that each slice starts one lower than in your attempt. Python indexing is 0-based and slices exclude the end index, so MATLAB's 46:65 is 45:65 in Python (20 rows), whereas 46:65 would give only 19 rows.
This can be shortened (but made very slightly slower) by doing:
x_data_train = x_data[np.r_[0:20,45:65,90:110,135:155,180:200,225:245,270:290,315:335], :]
For your case where you have a lot of indices I think it helps readability, but in cases where there are fewer indices I would use the first approach.
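As a quick check of the two approaches, here is a small sketch. It assumes the goal is MATLAB rows 1:20, 46:65, ... (1-based, inclusive), which correspond to 0:20, 45:65, ... in numpy; x_data is filled with made-up values purely for illustration.

```python
import numpy as np

# Stand-in for the real 360x190 data set.
x_data = np.arange(360 * 190).reshape(360, 190)

# Approach 1: stack the row blocks explicitly.
stacked = np.vstack([x_data[0:20, :], x_data[45:65, :],
                     x_data[90:110, :], x_data[135:155, :],
                     x_data[180:200, :], x_data[225:245, :],
                     x_data[270:290, :], x_data[315:335, :]])

# Approach 2: build one index array with np.r_ and index once.
rows = np.r_[0:20, 45:65, 90:110, 135:155, 180:200, 225:245, 270:290, 315:335]
indexed = x_data[rows, :]

print(stacked.shape, indexed.shape)
print(np.array_equal(stacked, indexed))
```

Both results have 160 rows and the same contents, so the choice between them is mostly a matter of readability.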
Long version:
There are several different issues at play here.
First, in python, [] makes a list, not an array like in MATLAB. Lists are more like 1D cell arrays. They can hold any data type, including other lists, but they cannot have multiple dimensions. The equivalent of MATLAB matrices in Python are numpy arrays, which are created using np.array.
Second, [x, y] in Python always creates a list where the first element is x and the second element is y. In MATLAB [x, y] can do one of several completely different things depending on what x and y are. In your case, you want to concatenate. In Python, you need to explicitly concatenate. For two lists, there are several ways to do that. The simplest is using x += y, which modifies x in-place by putting the contents of y at the end. You can combine multiple lists by doing something like x += y + z + w. If you want to keep x unchanged, you can assign to a new variable using something like z = x + y. Finally, you can use x.extend(y), which is roughly equivalent to x += y but works with some data types besides lists.
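The list operations described above can be sketched as follows. The variable names are arbitrary and chosen only for this illustration.

```python
x = [1, 2]
y = [3, 4]

nested = [x, y]      # a list holding two lists, NOT a concatenation
z = x + y            # new list; x and y are left unchanged

x_inplace = [1, 2]
x_inplace += y       # modifies x_inplace in place

w = [1, 2]
w.extend((3, 4))     # extend accepts any iterable, here a tuple

print(nested)
print(z, x_inplace, w)
```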
For numpy arrays, you need to use a slightly different approach. While Python lists can grow in place, strictly speaking neither MATLAB matrices nor numpy arrays can be resized in place. MATLAB pretends to allow this, but it is really creating a new matrix behind the scenes (which is why you get a warning if you try to resize a matrix in a loop). Numpy requires you to be more explicit about creating a new array. The simplest approach is to use np.hstack, which concatenates arrays horizontally (or np.vstack or np.dstack for vertical and depth concatenation, respectively). So you could do z = np.hstack([v, w, x, y]). There is also an np.append function in numpy (arrays have no append method), but it copies the entire array on every call and flattens its inputs unless you pass axis, so it is rarely the right tool.
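A minimal sketch of the three stacking functions, using two small made-up arrays. np.append is included only to show that it returns a new array rather than growing its first argument in place.

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

h = np.hstack([a, b])   # side by side
v = np.vstack([a, b])   # one on top of the other
d = np.dstack([a, b])   # stacked along a new third axis

# np.append builds and returns a new array; a itself keeps its shape.
appended = np.append(a, b, axis=0)

print(h.shape, v.shape, d.shape)
print(a.shape, appended.shape)
```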
Third, what a list's append method does is create one new element at the end of the target list and put whatever append is called with in that element. So if you do x.append([1,2,3]), it adds one new element to the end of list x containing the list [1,2,3]. In MATLAB terms it would be more like x = [x, {{1,2,3}}], where x is a cell array.
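This is also why the list in the question ends up with 8 elements rather than 160 rows. A short sketch, with x_data as a hypothetical stand-in for the real array and only two pieces for brevity:

```python
import numpy as np

x = [0]
x.append([1, 2, 3])   # adds ONE element, which is itself a list
y = [0]
y.extend([1, 2, 3])   # adds three separate elements

# The same thing happens with array slices: append collects the
# pieces in a list, it does not join them into one array.
x_data = np.zeros((360, 190))  # stand-in data
pieces = []
pieces.append(x_data[0:20, :])
pieces.append(x_data[45:65, :])
joined = np.vstack(pieces)     # explicit concatenation into one array

print(x, y)
print(len(pieces), joined.shape)
```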
Fourth, Python makes heavy use of "methods", which are basically functions attached to data (it is a bit more complicated than that in practice, but those complexities aren't really relevant here). Recent versions of MATLAB have added them as well, but they aren't really integrated into MATLAB data types like they are in Python. So where in MATLAB you would usually use sum(x), for numpy arrays you would use x.sum(). In this case, assuming you were appending to a list (which you aren't), you wouldn't use np.append(x, y), you would use x.append(y).
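To make the method-versus-function distinction concrete, here is a tiny sketch with made-up values. It also shows that append is a method of lists, not of numpy arrays.

```python
import numpy as np

x = np.array([1, 2, 3])
total_method = x.sum()     # method attached to the array
total_func = np.sum(x)     # equivalent function form

items = [1, 2]
items.append(3)            # append is a list method

# numpy arrays do not have an append method at all.
array_has_append = hasattr(x, "append")

print(total_method, total_func, items, array_has_append)
```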
Finally, in MATLAB x:y creates a matrix of values from x to y. In Python, however, it creates a "slice", which doesn't actually contain all the values and so can be processed much more quickly by lists and numpy arrays. However, you can't really work with multiple slices like you do in your example (nor does it make sense to, because slices in numpy don't make copies like they do in MATLAB, while using multiple indices does make a copy). You can get something close to what you have in MATLAB using np.r_, which creates a numpy array based on indices and slices. So to reproduce your example in numpy, where xdata is a numpy array, you can do xdata[np.r_[0:20,45:65,90:110,135:155,180:200,225:245,270:290,315:335], :] (the start of each range is one lower than in MATLAB because Python indexing is 0-based and slices exclude the end index).
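The view-versus-copy difference mentioned above can be seen in a small sketch. The array here is made up and much smaller than the real data.

```python
import numpy as np

xdata = np.arange(12).reshape(6, 2)

view = xdata[1:3, :]            # a single slice gives a view
idx = np.r_[0:2, 4:6]           # np.r_ expands the ranges into an index array
picked = xdata[idx, :]          # indexing with an array gives a copy

shares_view = np.shares_memory(xdata, view)
shares_picked = np.shares_memory(xdata, picked)

view[0, 0] = -1                 # writes through to xdata
picked[0, 0] = 99               # does not touch xdata

print(idx)
print(shares_view, shares_picked)
print(xdata[1, 0], xdata[0, 0])
```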
More information on x_data and np might be needed to solve this but...
First: You're creating 2 copies of the same list: np and x_data_train
Second: Your indexes on x_data are strange
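Third: np.append(arr, values, axis=None) needs two positional arguments, arr and values. You are passing a single list (which becomes arr) plus axis, so values is missing.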
I'm pretty sure revisiting your indexes on x_data will be where you solve the current error, but it will result in another error related to passing 2 values to append.
And I'm also sure you want
x_data_train.append(object)
not
x_data_train = np.append(object)
and you may actually want
x_data_train.extend([objects])
More on append vs extend here: append vs. extend
I'm trying to port a MiniZinc model to Choco. I know how to define variables and other basic things, but despite having read the tutorial and some code examples, I have trouble defining some non-trivial constraints.
Could someone give me some advice on how to translate the following code (just z) into Choco Solver style?
array[1..n,1..n] of int: c;
array[1..n] of var 0..10: next;
var 0..sum(c): z = sum(i in 1..n)(c[i,next[i]]);
Thanks!
I believe you know how to post a sum constraint so the non trivial part lies in the c[i,next[i]] which retrieves the integer in matrix c at row i and column next[i]. The problem is that next[i] is a variable so you cannot use it directly to access a (Java) array.
You need to use the element constraint (that is also in minizinc):
/**
* Creates an element constraint: value = table[index]
*
* @param value an integer variable taking its value in table
* @param table an array of integer values
* @param index an integer variable representing the value of value in table
*/
default Constraint element(IntVar value, int[] table, IntVar index)
As you work with a matrix, you need to do that for each row and then post a sum on them.
Note also that in Java, array cells are accessed from 0 to n-1 (in minizinc it is from 1 to n), so you may need to update the model accordingly or use an offset.
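To make the decomposition concrete, here is a plain-Python illustration (not Choco code) of what the per-row element lookups and the final sum compute once next has been fixed to some values. The matrix c and the list next_1based are made-up examples, and the "- 1" is the offset from MiniZinc's 1-based indices to 0-based arrays.

```python
# Hypothetical cost matrix and a fixed assignment of next (MiniZinc-style, 1..n).
c = [[0, 4, 7],
     [4, 0, 2],
     [7, 2, 0]]
next_1based = [2, 3, 1]

# One "element" lookup per row: value_i = c[i][next_i - 1].
row_values = [c[i][nxt - 1] for i, nxt in enumerate(next_1based)]

# The objective is the sum of the per-row values.
z = sum(row_values)

print(row_values, z)
```

In the solver, each entry of row_values would be its own integer variable tied to row i by an element constraint, with the offset handled either by shifting the domain of next or by an offset argument if the API version in use provides one.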
Hope this helps
https://www.cosling.com/
When you take centiles of a variable in Stata, e.g.
*set directory
cd "C:\Etc\Etc Etc\"
*open data file
use "dataset.dta",clear
*get centiles
centile var1, centile(1,5(5)95,99)
is there some way to record the resulting centile table to excel? The centile values are stored in r(c_#), where # indicates the centile at which you want the data. But I need a vector of the values at all the centiles, more or less as it appears in the output window.
I have attempted to use foreach loop to get the centiles into a vector, as follows:
*Create column of centiles
foreach i in r(centiles) {
xx[1,`i']=r(c_`i')
}
without success.
Thanks
EDIT:
I've since found this to work:
matrix X = 0,0
forvalues i=1/21 {
matrix X = `i',round(r(c_`i'),.001)\ X
}
The only inconveniences are: 1) I have to include a first row of 0,0 in the output, which I then have to drop. 2) In this case I have 21 centiles, but it would be nice to automate the number of centiles in case I want to change it, for example something like this:
forvalues i=1/r(n_cent) {
matrix X = `i',round(r(c_`i'),.001)\ X
}
But the "i=1/r(n_cent)" is invalid syntax. Any advice as to how I might overcome these two inconveniences would be much appreciated.
Thanks
You can use the following syntax.
Load some data and compute the percentiles.
sysuse auto, clear
centile price, centile(1,5(5)95,99)
The matrix that is supposed to contain the results has to be initialized. This matrix is called X. It has as many rows as there are centiles requested via the centile command. It has two columns. At this stage, the matrix is populated with zeroes.
matrix X = J(`=wordcount("`r(centiles)'")', 2, 0)
The following loop steps through the results of the centile command and replaces the zeroes in matrix X with the appropriate results. The first column of the matrix contains the centile (1, 5, 10, ...) and the second column contains the corresponding value.
forvalues i = 1 / `=wordcount("`r(centiles)'")' {
local cent: word `i' of `r(centiles)'
matrix X[`i', 1] = `cent'
matrix X[`i', 2] = r(c_`i')
}
Print the results:
matrix list X
If you are using round(), you are likely doing something wrong. There are few reasons to deliberately lose precision in the data; you can always display as many digits as you like using format in one way or another (either applied to the data, or as an option of list or matrix list).
I wrote the epctile command, which returns percentiles as an estimation command, i.e., in the e(b) vector, so the results are usable immediately; type findit epctile to download it.
You can modify your proposal as follows:
local thenumlist 1, 5(5)95, 99
centile variable, centile(`thenumlist')
forvalues i=1/`=r(n_cent)' {
matrix X = nullmat(X) \ r(c_`i')
}
numlist "`thenumlist'"
matrix rownames X = `r(numlist)'
matrix list X, format(%9.3f)