How to create a table of random 0's and 1's (with weights) in Excel? - excel

I would like to create a random, square table of 0's and 1's like so
0 1 0
0 0 1
0 1 0
but only bigger (around 14 x 14). The diagonal should be all 0's and the table should be symmetric across the diagonal. For example, this would not be good:
0 1 1
0 0 0
0 1 0
I would also like to have control over the number of 1's that appear, or at least the probability that 1's appear.
I only need to make a few such tables, so this does not have to be fully automated by any means. I do not mind at all doing a lot of the work by hand.
If possible, I would greatly prefer doing this without coding in VBA (small code in cells is okay of course) since I do not know it at all.

Edit: amended so as to return a symmetrical array, as requested by the OP.
=LET(λ,RANDARRAY(ξ,ξ),IF(1-MUNIT(ξ),GESTEP(MOD(MMULT(λ,TRANSPOSE(λ)),1),1-ζ),0))
where ξ is the side length of the returned square array and ζ is an approximation as to the probability of a non-diagonal entry within that array being unity.
As ξ increases, so does the accuracy of ζ.
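For anyone who wants to check the required structure outside Excel, here is a minimal Python sketch. It does not reproduce the MMULT-based formula above. Instead it uses a simpler technique: draw each entry above the diagonal independently with probability p, then mirror it below the diagonal. The function name random_symmetric_binary and its parameters n, p and seed are hypothetical, chosen only for illustration.

```python
import random


def random_symmetric_binary(n, p, seed=None):
    """Return an n x n list of lists of 0/1 values.

    The diagonal is all zeros and the table is symmetric across it.
    Each off-diagonal pair is 1 with probability p (an assumption of
    this sketch, not something the Excel formula guarantees exactly).
    """
    rng = random.Random(seed)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # strictly above the diagonal
            bit = 1 if rng.random() < p else 0
            table[i][j] = bit      # upper triangle
            table[j][i] = bit      # mirrored entry keeps the table symmetric
    return table


if __name__ == "__main__":
    for row in random_symmetric_binary(14, 0.3):
        print(" ".join(str(v) for v in row))
```

Because each pair is drawn once and copied, the expected number of 1's is p * n * (n - 1), and passing a seed makes a table reproducible.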

Related

What if my dataset contains only 0 and 1? Can I check for correlation for them and get the significant results in Excel?

My data set looks like this:
P T O
1 1 0
1 0 1
1 1 1
0 1 0
1 1 0
My doubt is that we only have two values, i.e. zero and one. That would logically mean that correlation cannot compute the level of significance. My assumption is that in this case the coefficient would be calculated from the occurrence of the four combinations, i.e. {(1,1),(1,0),(0,1),(0,0)}, rather than from the magnitude of change in the variables. But conversely, if the coefficient works only on the magnitude of change, then is this the right method for my data set?
Could anyone tell me whether I am on the right track, or whether calculating such a coefficient yields no significance?
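To make the four-combination idea concrete: for two 0/1 columns, Pearson's correlation can be written purely in terms of the counts of the four combinations (this form is usually called the phi coefficient). The Python sketch below computes both versions for the P and T columns of the data above so they can be compared. The function names are hypothetical, and the sketch computes the coefficient only, not a significance test.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Correlation of two 0/1 sequences from the four combination counts."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None  # a constant column has no defined correlation
    return (n11 * n00 - n10 * n01) / denom


def pearson(x, y):
    """Ordinary Pearson correlation, computed from deviations from the mean."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Columns P and T from the data set in the question.
P = [1, 1, 1, 0, 1]
T = [1, 0, 1, 1, 1]

if __name__ == "__main__":
    print(phi_coefficient(P, T), pearson(P, T))
```

Both functions return the same value for 0/1 data, which suggests the two views in the question (combination counts versus magnitude of change) describe the same coefficient.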

What does the fifth column of feColorMatrix stand for exactly

In mozilla's doc for feColorMatrix it is stated that
The <feColorMatrix> SVG filter element changes colors based on a
transformation matrix. Every pixel's color value (represented by an
[R,G,B,A] vector) is matrix multiplied to create a new color.
However in feColorMatrix there are 5 columns, not 4.
In an excellent article that can be considered as a classical reference it is stated that:
The matrix here is actually calculating a final RGBA value in its
rows, giving each RGBA channel its own RGBA channel. The last number
is a multiplier.
But that does not explain a lot. As far as I understand, since applying the filter modifies exactly the R, G, B and A channels and nothing else, there is no need for this additional parameter. There is indirect evidence for that in the article itself: all of the numerous examples of feColorMatrix-based filters it provides have zeroes as the fifth component. Also, why is it a multiplier?
In another famous article it is stated that:
For the other rows, you are creating each of the rgba output values as
the sum of the rgba input values multiplied by the corresponding
matrix value, plus a constant.
Calling it an added constant makes more sense than calling it a multiplier; however, it is still unclear what the fifth component in feColorMatrix stands for and what is unachievable without it, so that is my question.
My last hope was the W3C reference, but it is surprisingly vague as well.
The specification is clear although you do need to understand matrix math. The fifth column is a fixed offset. It's useful because if you want to add a specific amount of R/G/B/A to your output, that column is the only way to do it. Or if you want to recolor something to a specific color, that's also the way to do it.
For example - if you have multiple opaque colors in your input, but you want to recolor everything to rgba(255,51,0,1) then this is the matrix you would use.
0 0 0 0 1.0
0 0 0 0 0.2
0 0 0 0 0
0 0 0 1 0
aka
<feColorMatrix type="matrix" values="0 0 0 0 1 0 0 0 0 0.2 0 0 0 0 0 0 0 0 1 0"/>
Try out these sliders for yourself:
https://codepen.io/mullany/pen/qJCDk
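The arithmetic behind the fixed offset can be sketched in a few lines of Python. Each output channel is the sum of the four input channels times the first four entries of its row, plus the fifth entry times a constant 1. The helper name apply_color_matrix is hypothetical, channel values are assumed to be in the 0..1 range, the clamping step is an assumption of this sketch, and color-space handling is ignored.

```python
def apply_color_matrix(matrix, rgba):
    """Apply a 4x5 color matrix to one (r, g, b, a) pixel with 0..1 channels."""
    r, g, b, a = rgba
    vec = (r, g, b, a, 1.0)  # the trailing 1 is what the fifth column multiplies
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    # Assumption of this sketch: results are clamped to the valid 0..1 range.
    return tuple(min(1.0, max(0.0, c)) for c in out)


# The recolor matrix from the answer: ignore input color, keep input alpha.
RECOLOR = [
    [0, 0, 0, 0, 1.0],
    [0, 0, 0, 0, 0.2],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

if __name__ == "__main__":
    pixel = apply_color_matrix(RECOLOR, (0.3, 0.6, 0.9, 1.0))
    print(tuple(round(c * 255) for c in pixel[:3]), pixel[3])
```

With the first four columns zeroed, the color channels depend only on the fifth column, which is why any opaque input ends up as the same color.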

Measure difference between two distributions

I have a distance vector of a sample program. I am trying to quantify how similar they are. I was using Euclidean distance between sample groups (each value belongs to a bucket, we compare bucket by bucket), which works fine. But there are too many comparisons that need to be done for a large number of samples.
I was wondering if there is an efficient way to build an index to compare the samples. The samples look like this:
Sample:1 = {25 0 17 3 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}
Sample:2 = {25 1 16 2 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}
Sample:3 = {25 3 16 2 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0}
There exist many ways to characterise the "difference between two distributions". A specific and targeted answer requires more details concerning e.g. the underlying probability distribution(s).
It all depends on how you define a difference between two distributions. To give you two ideas:
A Kolmogorov-Smirnov test is a non-parametric test, that measures the "distance" between two cumulative/empirical distribution functions.
The Kullback-Leibler divergence measures the "distance" between two distributions in the language of information theory as a change in entropy.
Update [a year later]
Upon revisiting this post it might be important to emphasise a few things:
The standard two-sample Kolmogorov-Smirnov (KS) test assumes the underlying distribution to be continuous. For discrete data (which the data from the original post seems to be), an alternative may be to use a bootstrap version of the two-sample KS test, as in Matching::ks.boot. More details can be found on e.g. Cross Validated: Can I use Kolmogorov-Smirnov to compare two empirical distributions? and on Wikipedia: Two-sample Kolmogorov–Smirnov test.
Provided the sample data in the original post is representative, I don't think there will be a very meaningful answer from either a KS-statistic based test or the KL divergence (or really any other test for that matter). The reason being that the values from every sample are essentially all zero (to be precise, >80% of the values are zero). That in combination with the small sample size of 21 values per sample means that there is really not a lot "left" to characterise any underlying distribution.
In more general terms (and ignoring the limitations pointed out in the previous point), to calculate the KL divergence for all pairwise combinations one could do the following
library(entropy)
library(tidyverse)
# lst is assumed to be a list of count vectors, one per sample
expand.grid(1:length(lst), 1:length(lst)) %>%
    rowwise() %>%
    mutate(KL = KL.empirical(lst[[Var1]], lst[[Var2]]))
Since the KL divergence is not symmetric, we will need to calculate both the upper and lower triangular parts of the pairwise KL divergence matrix. In the interest of reducing compute time one could instead make use of a symmetrised KL divergence, which requires calculating the KL divergence only for the upper or lower triangular parts of the pairwise KL divergence matrix (although the symmetrised KL divergence versions themselves require calculating both KL divergences, i.e. KL(1->2) and KL(2->1) but this may be done through an optimised routine).
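As a rough illustration of the upper-triangle idea, here is a minimal Python sketch using only the standard library. It is not a port of the entropy package; the function names are hypothetical, the symmetrised version is taken to be the sum of both directions (one common choice), and a direction is reported as infinite when the second sample has an empty bucket where the first does not.

```python
from itertools import combinations
from math import inf, log


def kl_divergence(counts_p, counts_q):
    """Plug-in KL(p || q) from two bucket-count vectors of equal length."""
    total_p, total_q = sum(counts_p), sum(counts_q)
    result = 0.0
    for cp, cq in zip(counts_p, counts_q):
        if cp == 0:
            continue          # 0 * log(0 / q) is taken as 0
        if cq == 0:
            return inf        # p puts mass where q has none
        p, q = cp / total_p, cq / total_q
        result += p * log(p / q)
    return result


def symmetrised_kl(a, b):
    """Sum of both directions; symmetric in its arguments."""
    return kl_divergence(a, b) + kl_divergence(b, a)


def pairwise_upper(samples):
    """Symmetrised KL for each pair (i, j) with i < j only."""
    return {(i, j): symmetrised_kl(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)}


# The three samples from the question.
SAMPLES = [
    [25, 0, 17, 3, 5] + [0] * 16,
    [25, 1, 16, 2, 6] + [0] * 16,
    [25, 3, 16, 2, 4] + [0] * 16,
]

if __name__ == "__main__":
    for pair, value in pairwise_upper(SAMPLES).items():
        print(pair, value)
```

In this sketch, any pair involving the first sample comes out infinite because its second bucket is empty, which echoes the caveat above about how little these mostly-zero samples can say about an underlying distribution.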

Plotting Different values in a line in Excel

Scenario: I am trying to plot values in a line: I have Max, Min, lower bound 1, upper bound 1, median value and my "Ret" value (which will change at each row, and each row would have its own line "graph"). Each of these data points (max, min, bounds...) has a numerical value.
Problem: I already tried all the graphing options in Excel, but can't seem to find any way to get the wanted outcome.
Question: Is there a direct way to do that in Excel?
This is what I am trying to achieve (each row will have one of these graphs, once I find out how to do it, I will write a VBA macro to automate this):
Apparently, the best way to do this is to assign a second value to all the rows and, instead of plotting a single column of values, plot each row as a coordinate. This answer came as advice from a user in another forum, in reply to the same question posted here (goo.gl/icL38d).
For a sample data:
Value X Y
min -5 0
max 5 0
median 0 0
lowb1 -2 0
lowb2min -4 0
upb1 2 0
upb2 4 0
Target 3 0
I plotted this as a scatterplot with the coordinates, and configured the target data point to stand out. The result was very close to the originally intended one.

EXCEL Count number of weeks in month based on date

I am trying to look up a value in a matrix based on a given date. The matrix has the first day of the week along the vertical axis, and the first day of the month along the horizontal axis.
For a given day, e.g. 31/08/15 I would like to match the exact date to the vertical axis of the matrix (i.e. 31/08/15), and the month to the horizontal axis (1/08/15).
So in the example below, an input of 31/08/15 should provide an output of 3.
01/06/2015 01/07/2015 01/08/2015 01/09/2015
03/08/2015 1 0 0 0
10/08/2015 0 2 0 0
17/08/2015 0 0 3 0
24/08/2015 0 0 0 4
31/08/2015 0 0 3 0
I am trying and failing with index and match formulae.
I have tried the following:
=index(area where to look, match(31/08/15,first column,0),match(and(month(31/08/15),year(31/08/15)),(and(month(first row),year(first row)),0)
Hope this is clear, thanks!
You can use an INDEX function with two MATCH functions to supply both the row and column.
The formula in D8 is,
=INDEX($B$2:$E$6,MATCH(C8,$A$2:$A$6,0),MATCH(DATE(YEAR(C8),MONTH(C8),1),$B$1:$E$1,0))
I'm a little concerned about the dates matching exactly down column A but a little maths manipulation with the WEEKDAY function would take care of that.
=INDEX($B$2:$E$6,MATCH(C9-WEEKDAY(C9, 2)+1,$A$2:$A$6,0),MATCH(DATE(YEAR(C9),MONTH(C9),1),$B$1:$E$1,0))
Here you go:
=INDEX($B$2:$E$6,MATCH(DATE(2015,8,31),$A$2:$A$6,),MATCH(DATE(2015,8,1),$B$1:$E$1,))
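The logic of the formulas above (snap the date to the Monday of its week for the row, and to the first of its month for the column) can be sketched in Python for checking results by hand. The names WEEK_STARTS, MONTH_STARTS, VALUES and lookup are hypothetical; the data is the matrix from the question, and weeks are assumed to start on Monday, as WEEKDAY(date, 2) implies.

```python
from datetime import date, timedelta

# Row labels (first day of week) and column labels (first day of month).
WEEK_STARTS = [date(2015, 8, 3), date(2015, 8, 10), date(2015, 8, 17),
               date(2015, 8, 24), date(2015, 8, 31)]
MONTH_STARTS = [date(2015, 6, 1), date(2015, 7, 1),
                date(2015, 8, 1), date(2015, 9, 1)]
VALUES = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 4],
    [0, 0, 3, 0],
]


def lookup(day):
    """Return the matrix value for the week and month that contain `day`."""
    week_start = day - timedelta(days=day.weekday())  # Monday of that week
    month_start = day.replace(day=1)                  # first day of that month
    row = WEEK_STARTS.index(week_start)    # raises ValueError if not listed
    col = MONTH_STARTS.index(month_start)  # raises ValueError if not listed
    return VALUES[row][col]


if __name__ == "__main__":
    print(lookup(date(2015, 8, 31)))
```

A date whose week or month is not in the labels raises ValueError here, which plays the same role as #N/A from an exact MATCH.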
