I was reading this answer on SO where someone was pulling a mat4 attribute.
When setting up the vertex attrib array, there was one thing I noticed:
gl.vertexAttribPointer(row3Location, floatsPerRow, gl.FLOAT,
false, bytesPerMatrix, row3Offset);
I understand that the mat4 being supplied takes up 4 attribute slots, but why do we pass bytesPerMatrix as the stride instead of something like bytesPerRow? Shouldn't each attribute slot pull 16 bytes from its offset instead of 64?
This is how I imagine a stride of 16 bytes and offsets being multiples of 16.
0000111122223333444455556666777788889999AAAABBBBCCCCDDDDEEEEFFFF
^---------------
^---------------
^---------------
^---------------
And this is how I imagine a stride of 64 bytes and offsets being multiples of 16.
0000111122223333444455556666777788889999AAAABBBBCCCCDDDDEEEEFFFF
^---------------------------------------------------------------
^---------------------------------------------------------------
^---------------------------------------------------------------
^---------------------------------------------------------------
^ considerable overlap when pulling attributes for matrix
So, clearly my mental model of stride and offset is wrong. How does this actually work? Why does the stride need to be the size of the whole matrix when this attribute is only pulling the equivalent of a vec4 at a time?
The stride is how many bytes to skip to get to the next value for that attribute. For a mat3 there are 3 attributes, one for each row of the matrix. The data for each attribute, assuming you put your matrices in a buffer linearly next to each other, is:
| Matrix0 | Matrix1 | Matrix2 | ...
| row0 | row1 | row2 | row0 | row1 | row2 | row0 | row1 | row2 | ...
| x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | ...
So the first attribute wants the data for row0 of each matrix. To get from row0 in the first matrix to row0 in the second matrix you move forward bytesPerMatrix bytes:
| Matrix0 | Matrix1 | Matrix2 | ...
| row0 | row1 | row2 | row0 | row1 | row2 | row0 | row1 | row2 | ...
| x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | x,y,z | ...
| --- bytesPerMatrix -->|
| --- bytesPerMatrix -->|
| --- bytesPerMatrix -->|
Stride is how many bytes to skip to get to the next value, not how many bytes to read. How many bytes to read is defined by the size and type parameters of the attribute, as in:
const size = 3;
const type = gl.FLOAT;
const normalize = false;
const stride = bytesPerMatrix;
const offset = row * bytesPerRow;
gl.vertexAttribPointer(location, size, type, normalize, stride, offset);
So above, because size = 3 and type = gl.FLOAT, it will read 12 bytes (3 floats × 4 bytes each).
The process is:

1. start at offset bytes in the buffer
2. read 12 bytes from the buffer and apply them to the attribute's value
3. add stride to the offset
4. go to step 2

for however many vertices you asked it to process.
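As an illustration of that loop, here is a small Python simulation of the pull process. The packed buffer and the pull_attribute helper are made-up stand-ins for what the GPU does with a mat4, not real WebGL calls:

```python
import struct

floats_per_row = 4
bytes_per_row = floats_per_row * 4      # 16 bytes per vec4 row
bytes_per_matrix = 4 * bytes_per_row    # 64 bytes per mat4

# two 4x4 matrices packed one after another, filled with 0..31
buffer = struct.pack("32f", *range(32))

def pull_attribute(buf, size, stride, offset, vertex_count):
    """Mimic the loop above: start at offset, read `size` floats,
    then add stride to move to the next vertex's value."""
    values = []
    for v in range(vertex_count):
        values.append(struct.unpack_from(f"{size}f", buf, offset + v * stride))
    return values

# the attribute for row 1: offset = 1 * bytesPerRow, stride = bytesPerMatrix
row1 = pull_attribute(buffer, floats_per_row, bytes_per_matrix, bytes_per_row, 2)
print(row1)  # [(4.0, 5.0, 6.0, 7.0), (20.0, 21.0, 22.0, 23.0)]
```

With a stride of only bytes_per_row, vertex 1 would read floats 8..11, i.e. row 2 of the same matrix rather than row 1 of the next one, which is exactly the overlap pictured in the question.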
Note: that assumes you actually put your data in the buffer one matrix after another. You don't have to do that. You could put all row0s next to each other, followed by all row1s, followed by all row2s. Or you could even put all row0s in a different buffer from row1s and row2s. I don't think I've ever seen anyone do that, but I'm just pointing out that the layout described at the top is not set in stone; it's just the most common way.
Given is a typical pandas dataframe with "relational data":

| Column1 | Column2 | Column3 |
|---------|---------|---------|
| A       | 1       | C       |
| B       | 2       | C       |
| A       | 2       | C       |
| A       | 1       | C       |
| ...     | ...     | ...     |
I am trying to calculate the probabilities between all column values with length 2, meaning the tuple (A,1) --> 0.66, (A,2) --> 0.33, (B,2) --> 1, (2,B) --> 0.5 and so on.
I am expecting the result back in a list similar to:
[
[A,1,0.66],
[A,2,0.33],
[B,2,1],
[2,B,0.5],
...
]
Currently, my approach is really inefficient (even while using multiprocessing). Simplified, I am iterating over all possibilities without any Cython.
# iterating through all columns
for colname in colnames:
    # evaluating all other columns except the one under assessment
    for x in [x for x in colnames if x != colname]:
        # through groupby we get their counts
        groups = df.groupby([colname, x]).size().reset_index(name='counts')
        # for each group we
        for index, row in groups.iterrows():
            # calculate their probability over the entire population
            # of the column and push it in the result list
            result.append([row[colname], row[x], row["counts"] / df[x].count()])
What is the most efficient way to complete this transformation?
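Not a definitive answer, but one common way to speed this up is to keep everything inside groupby and divide the pair counts by the per-group totals instead of iterating rows. This is a sketch under the assumption that the desired number is P(second value | first value), which matches the example (A,1) → 0.66 above; note the original loop divided by the whole column count instead. The toy frame is made up for illustration:

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({"Column1": ["A", "B", "A", "A"],
                   "Column2": [1, 2, 2, 1],
                   "Column3": ["C", "C", "C", "C"]})

result = []
for a, b in permutations(df.columns, 2):
    # one count per (value-of-a, value-of-b) pair
    counts = df.groupby([a, b]).size()
    # divide each pair count by the total for its first value,
    # giving P(b-value | a-value)
    probs = counts / counts.groupby(level=0).transform("sum")
    result.extend([va, vb, p] for (va, vb), p in probs.items())

# result holds entries like ['A', 1, 0.66...], ['B', 2, 1.0], [2, 'B', 0.5]
```

This replaces the inner iterrows loop with vectorized Series arithmetic; only the outer loop over column pairs remains, which for 50 columns is 2450 cheap groupby calls rather than row-by-row appends.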
I am trying to set up a table that will allow for very quick classification of sites across about 50 different characteristics. The method I have thought of but am unsure if it's possible is as follows.
Worksheet A: the raw data about 100R x 50C with each cell
describing a characteristic of that row where the last column is the
overall classification.
Worksheet B: a table of about 5R x 50C with the columns
corresponding to the columns in Worksheet A.
A row of Worksheet B would look something like:
* | * | * | 1 | * | 3 | * | Y | * | ... | * | * | * |
And a row from Worksheet A that corresponds with this data would look something like:
A | B | C | 1 | 5 | 3 | Z | Y | 1 | ... | F | 2 | X | High Priority
Where the asterisks indicate a wildcard where I don't care what the content is. All of the other cells would be required conditions. Then I was thinking of applying an array formula on the last column to get the classification. Something like:
{=IF(AND(A2:BV2='Worksheet B'!$A$2:$BV$2), "High Priority", "Low Priority")}
But Excel takes the asterisks as literal string content, so the comparison evaluates to FALSE.
Is there a way to make this work? Or an alternative method that would be just as simple to implement?
I got to the bottom of it with a reasonably elegant solution. Please post criticisms if there is a situation where this won't work.
{=IF(SUM(IF(A2:BV2='Worksheet B'!A2:BV2,0,1))=COUNTIF('Worksheet B'!A2:BV2,"x"),"Top Priority","Low Priority")}
Where x is for those cells in which I don't care about the outcome. So instead of "*", I am using "x" in the cells above such that Worksheet B is more like:
x | x | x | 1 | x | 3 | x | Y | x | ... | x | x | x |
If anyone is interested, the formula works by counting all of the mismatched cells and comparing that count against the number of cells containing "x" in the criteria row. If the two numbers are equal, every mismatch falls on a cell we don't care about, which means all of the cells we do care about match.
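The counting trick may be easier to see outside Excel. A small Python analogue (a hypothetical helper, not part of the workbook) applies the same rule: count the mismatches, count the wildcards, and classify when the two counts are equal:

```python
def classify(data_row, criteria_row, wildcard="x"):
    """Same logic as the array formula: a row matches when every
    mismatch falls on a wildcard cell of the criteria row."""
    mismatches = sum(1 for d, c in zip(data_row, criteria_row) if d != c)
    wildcards = sum(1 for c in criteria_row if c == wildcard)
    return "High Priority" if mismatches == wildcards else "Low Priority"

criteria = ["x", "x", "x", "1", "x", "3", "x", "Y"]
print(classify(["A", "B", "C", "1", "5", "3", "Z", "Y"], criteria))  # High Priority
print(classify(["A", "B", "C", "2", "5", "3", "Z", "Y"], criteria))  # Low Priority
```

The same caveat applies to both versions: if a data cell happens to contain the wildcard character itself, it matches at a wildcard position, the mismatch count drops below the wildcard count, and the row is misclassified.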
I am trying to create a scatter plot with the y values stored in the y1,y2,y3 columns. The objective is to obtain a color for each column. This works fine. My problem is that blank values are also plotted if the column is computed with an equation. Concretely, E2, E3 and E4 should not appear in the plot.
y1 and y2 are pure values, whereas y3 is computed with a simple if condition. For example, E2 cell is defined as =IF(C2="", 40,""). I also tried to check the 'Hidden and Empty Cells' option, as suggested in this post, but nothing happens.
| \ | A | B  | C  | D | E  |
|---|---|----|----|---|----|
| 1 | x | y1 | y2 |   | y3 |
| 2 | 1 |    | 14 |   |    |
| 3 | 2 | 6  | 45 |   |    |
| 4 | 3 | 12 | 6  |   |    |
| 5 | 4 | 4  |    |   | 40 |
I have a Rpy2 data frame as <class 'rpy2.robjects.vectors.DataFrame'>. How can I convert it to a Python list or tuple with every row as an element? Thanks!
I figured it out. I hope this helps if you are looking for an answer:
output = [tuple([df[j][i] for j in range(df.ncol)]) for i in range(df.nrow)]
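To see what that comprehension is doing, here is the same transpose on a plain-Python stand-in (two toy columns instead of a real rpy2 frame; df[j][i] in the one-liner indexes column j, row i the same way):

```python
# stand-in for the rpy2 frame: a list of columns
cols = [[1, 2, 3], ["a", "b", "c"]]
ncol, nrow = len(cols), len(cols[0])

# same shape as the rpy2 one-liner: one tuple per row
output = [tuple(cols[j][i] for j in range(ncol)) for i in range(nrow)]
print(output)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```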
I recently stumbled over one potential problem. Given a data frame from R:
| | a | c | b | d |
|---|-------|---|---|-----|
| 1 | info1 | 2 | 1 | op1 |
| 2 | info2 | 3 | 2 | 3 |
| 3 | info3 | 4 | 3 | 3 |
| 4 | info4 | 5 | 4 | 3 |
| 5 | info5 | 6 | 5 | 3 |
| 6 | info6 | 7 | 6 | 3 |
| 7 | 9 | 8 | 7 | 3 |
(Yes, I know: mixed data types in one column, i.e. str and float, are maybe not realistic, but the same holds true for factor-only columns.)
The conversion will show the integer codes for columns a and d instead of the actual labels usually intended. The issue is as stated in the rpy2 manual:
R’s factors are somewhat peculiar: they aim at representing a memory-efficient vector of labels, and in order to achieve it are implemented as vectors of integers to which are associated a (presumably shorter) vector of labels. Each integer represents the position of the label in the associated vector of labels.
The following rough draft code is a step towards handling this case:
colnames = list(dataframe.colnames)
rownames = list(dataframe.rownames)
col2data = []
for cn, col in dataframe.items():
    if isinstance(col, robjects.vectors.FactorVector):
        colevel = tuple(col.levels)
        ncol = []
        for i in col:
            # factor values are 1-based indices into the levels vector
            ncol.append(colevel[i - 1])
    else:
        ncol = tuple(col)
    col2data.append((cn, ncol))
col2data.append(('rownames', rownames))
col2data = dict(col2data)
The output is a dict mapping column names to their values. Using a loop and transposing the list of lists will generate the output as needed.
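That last transpose step can be sketched with zip; the toy col2data dict and colnames list below stand in for the ones built above:

```python
# hypothetical stand-in for col2data after the loop above
col2data = {"a": ("info1", "info2"), "b": (1, 2), "rownames": ["1", "2"]}
colnames = ["a", "b"]

# zip(*columns) transposes columns into one tuple per row
rows = list(zip(*(col2data[c] for c in colnames)))
print(rows)  # [('info1', 1), ('info2', 2)]
```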
I have noticed that the sum of squares in my models can change fairly radically with even the slightest adjustment to my models. Is this normal? I'm using SPSS 16, and both models presented below used the same data and variables with only one small change: categorizing one of the variables as either a 2-level or a 3-level variable.
Details: using a 2 x 2 x 6 mixed-model ANOVA, with the 6 being the repeated measure, I get the following in the between-group analysis:
------------------------------------------------------------
Source | Type III SS | df | MS | F | Sig
------------------------------------------------------------
intercept | 4086.46 | 1 | 4086.46 | 104.93 | .000
X | 224.61 | 1 | 224.61 | 5.77 | .019
Y | 2.60 | 1 | 2.60 | .07 | .80
X by Y | 19.25 | 1 | 19.25 | .49 | .49
Error | 2570.40 | 66 | 38.95 |
Then, when I use the exact same data but a slightly different model in which variable Y has 3 levels instead of 2 levels I get the following
------------------------------------------------------------
Source | Type III SS | df | MS | F | Sig
------------------------------------------------------------
intercept | 3603.88 | 1 | 3603.88 | 90.89 | .000
X | 171.89 | 1 | 171.89 | 4.34 | .041
Y | 19.23 | 2 | 9.62 | .24 | .79
X by Y | 17.90 | 2 | 17.90 | .80 | .80
Error | 2537.76 | 64 | 39.65 |
I don't understand why variable X would have a different sum of squares simply because variable Y gets divided up into 3 levels instead of 2. This is also the case in the within-groups analysis.
Please help me understand :D
Thank you in advance
Pat
The Type III sum of squares for X tells you how much you gain when you add X to a model that already includes all the other terms. It appears that the 3-level Y variable is a much better predictor than the 2-level one: its SS went from 2.60 to 19.23. (This can happen, for example, if the effect of Y is quadratic: a cut at the vertex is not very predictive, but cutting into three groups would be.) Thus there is less left for X to explain, so its SS decreases.
Just adding to what Aniko has said: the reason variable X has a different sum of squares when variable Y is divided into 3 levels instead of 2 is that the SS formula for each factor depends on the number of samples in each treatment. When you change the number of levels in one factor, you change the number of samples in each treatment, and this has an impact on the SS value for all the other factors.