Create a matrix with random numbers when required shape is derived from other variables - j

I want to create a matrix with random numbers in J programming language when the required shape is derived from other variables.
I could create such a matrix with ? 3 5 $ 0 if I specify its shape using literal integers. But I am struggling to find a way to create such a matrix when the shape is # y and # x instead of the 3 and 5 shown in the example above.
I have tried ? 0 $~ # y, # x and it has not worked.
I think I need some way to apply # over a list of variables and return a list of numbers which should be placed after $~, somewhat like map functionality of other languages. Is there a way to do this?

I think that ?@:$ is what you are looking for:
3 5 ?@:$ 0
0.031974 0.272734 0.792653 0.439747 0.136448
0.332198 0.00904103 0.7896 0.78304 0.682833
0.27289 0.855249 0.0922516 0.185466 0.257876
The general structure for this is x u@:v y <-> u (x v y), where u and v are the verbs and x and y are the arguments.
Hope this helps.
Rereading your question, it looks as if you want the shape to be based on the number of items in the arguments. Here I would use # to count the items in each argument, then use , to create the left argument for $&0, and apply ? to the result.
3 4 5 (?@:($&0 @:,))&# 5 3 3 4 5
0.179395 0.456545 0.805514 0.471521 0.0967092
0.942029 0.30713 0.228288 0.693909 0.338689
0.632752 0.618275 0.100224 0.959804 0.517927
Is this closer to what you had in mind?
And as is often the case, I thought of another approach overnight:
3 4 5 ?@0:"0/ 1 2 3 4 5
0.271366 0.291846 0.0493541 0.72488 0.47988
0.50287 0.980205 0.58541 0.778901 0.0755205
0.0114588 0.523955 0.535905 0.5333 0.984908
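For readers coming from an array library like numpy, the underlying goal (derive the shape of a random matrix from the item counts of two other variables) looks like this; this is an analogy only, not J, and the values of x and y are made up:

```python
import numpy as np

# Analogy only (numpy, not J): build a random matrix whose shape comes
# from the item counts of two other variables. x and y are illustrative.
x = [10, 20, 30]        # 3 items, playing the role of J's x
y = [1, 2, 3, 4, 5]     # 5 items, playing the role of J's y

m = np.random.rand(len(x), len(y))  # uniform values in [0, 1), shape (3, 5)
print(m.shape)  # (3, 5)
```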


retrieve original factor loadings using Factor Analysis in Scikit learn

So I am trying to do a toy example where I know the factors in advance, and I want to back them out using FactorAnalysis or PCA in scikit-learn.
Let's say I have defined 4 random X factors and 10 Y dependent variables:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA, FactorAnalysis

# number of obs
N = 10000
n_factors = 4
n_variables = 10
# 4 random factors ~ N(0,1)
X = np.random.normal(size=(N, n_factors))
# loadings for the 10 Y dependent variables
loadings = pd.DataFrame(np.round(np.random.normal(0, 2, size=(n_factors, n_variables)), 2))
# Y without unique variance
Y_hat = X.dot(loadings)
There is no random noise here so if I run the PCA it will show that 4 factors explain all the variance as one would expect:
pca=PCA(n_components=n_factors)
pca.fit(Y_hat)
np.cumsum(pca.explained_variance_ratio_)
array([0.47940185, 0.78828548, 0.93573719, 1. ])
So far so good. In the next step I ran the factor analysis and reconstituted Y from the calculated loadings and factor scores:
fa=FactorAnalysis(n_components=n_factors, random_state=0,rotation=None)
X_fa = fa.fit_transform(Y_hat)
loadings_fa=pd.DataFrame(fa.components_)
Y_hat_fa=X_fa.dot(loadings_fa)+np.mean(Y_hat,axis=0)
print((Y_hat_fa-Y_hat).max())
print((Y_hat_fa-Y_hat).min())
6.039613253960852e-13
-5.577760475716786e-13
So the original variables and the reconstituted variables from FA match almost exactly.
However, the loadings don't match at all, and neither do the factors:
loadings_fa-loadings
0 1 2 3 4 5 6 7 8 9
0 1.70402 -3.37357 3.62861 -0.85049 -6.10061 11.63636 3.06843 -6.89921 4.17525 3.90106
1 -1.38336 5.00735 0.04610 1.50830 0.84080 -0.44424 -1.52718 3.53620 3.06496 7.13725
2 1.63517 -1.95932 2.71208 -2.34872 -2.10633 4.50955 3.45529 -1.44261 0.03151 0.37575
3 0.27463 3.89216 2.00659 -2.18016 1.99597 -1.85738 2.34128 6.40504 -0.55935 4.13107
From quick calculations the factors from FA are not even well correlated with the original factors.
I am looking for a good theoretical explanation of why I can't back out the original factors and loadings; I am not necessarily looking for a code example.
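Although the question asks for theory rather than code, the key fact (the factor model Y = XL is only identified up to a rotation, since (XQ)(QᵀL) = XL for any orthogonal Q) can be demonstrated in a few lines. The names and sizes below are illustrative, not taken from the question:

```python
import numpy as np

# Sketch of rotational indeterminacy: the factor model Y = X @ L is
# invariant under any orthogonal rotation Q, because
# (X @ Q) @ (Q.T @ L) = X @ (Q @ Q.T) @ L = X @ L.
rng = np.random.default_rng(0)
N, n_factors, n_variables = 1000, 4, 10
X = rng.normal(size=(N, n_factors))
L = rng.normal(0, 2, size=(n_factors, n_variables))
Y = X @ L

# Build a random orthogonal matrix Q via QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(n_factors, n_factors)))
X_rot, L_rot = X @ Q, Q.T @ L

# The rotated "factors" and "loadings" look nothing like X and L,
# yet reproduce Y exactly -- so FA has no way to prefer the originals:
print(np.allclose(X_rot @ L_rot, Y))  # True
```

This is why FactorAnalysis can reproduce Y_hat almost exactly while returning loadings and scores that differ from the originals by an arbitrary rotation.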

Reduce operation in Spark with constant values gives a constant result irrespective of input

ser = sc.parallelize([1, 2, 3, 4, 5])
freq = ser.reduce(lambda x, y: 1 + 2)
print(freq)  # answer is 3
If I run the reduce operation with constant values, it just gives the sum of those 2 numbers, so in this case the answer is just 3. I was expecting it would be 3+3+3+3 = 12, as there are 5 elements and the summation would happen 4 times. I'm not able to understand the internals of reduce here. Any help please?
You're misunderstanding what reduce does. It does not apply an aggregation operation (which you assume to be sum, for some reason) to a mapping of all the elements (which you suppose is what lambda x, y: 1 + 2 does).
Reducing that RDD will, roughly speaking, do something like this:
call your lambda with 1, 2 -> lambda returns 3
carry 3 and call lambda with 3, 3 -> lambda returns 3
carry 3 and call lambda with 3, 4 -> lambda returns 3
carry 3 and call lambda with 3, 5 -> lambda returns 3
The reduce method returns the last value, which is 3.
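The call sequence above can be reproduced locally with Python's functools.reduce, which folds a list pairwise in exactly the same way (a local sketch, no Spark cluster needed):

```python
from functools import reduce

# Plain-Python emulation of the Spark reduce above: reduce calls the
# lambda pairwise, carrying each return value forward into the next call.
ser = [1, 2, 3, 4, 5]
freq = reduce(lambda x, y: 1 + 2, ser)
print(freq)  # 3 -- the lambda ignores its arguments, so every carry is 3
```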
If your intention is to compute 1 + 2 for each element in the RDD, then you need to map and then reduce, something like:
freq = ser.map(lambda x: 1 + 2).reduce(lambda a,b: a+b) #see how reduce works
#which you can rewrite as
freq = ser.map(lambda x: 1 + 2).sum()
But the result of this is 15, not 12 (as there are 5 elements). I don't know of any operation that computes a mapped value for each "reduction" step and allows further reduction.
It's likely that this is the wrong question to ask, but you could possibly get 12 by using the map & reduce option above and skipping one element. I strongly doubt this is what you really want, though: the function passed to reduce must be commutative and associative precisely because it can be called an arbitrary number of times, depending on how the RDD is partitioned.

Mutable variables in Haskell?

I'm starting to wrap my head around Haskell and do some exciting experiments. And there's one thing I just seem to be unable to comprehend (previous "imperativist" experience talks maybe).
Recently, I was yearning to implement an integer division function, as if there were no multiply/divide operations. An immensely interesting brain-teaser which led to great confusion.
divide x y =
  if x < y then 0
  else 1 + divide (x - y) y
I compiled it and it... works(!). That's mind-blowing. However, I was told that variables are immutable in Haskell. How is it that with each recursive step the variable x keeps its value from the previous step? Or is my glorious compiler lying to me? Why does it work at all?
Your x here doesn't change during one function call (i.e., after creation) - that's exactly what immutable means. What does change is the value of x across multiple (recursive) calls. In a single stack frame (function call), the value of x is constant.
An example of the execution of your code, for a simple case:
call divide 8 3 -- (x = 8, y = 3), stack: divide 8 3
step 1: x < y ? NO
step 2: 1 + divide 5 3
call divide 5 3 -- (x = 5, y = 3), stack: divide 8 3, divide 5 3
step 1: x < y ? NO
step 2: 1 + divide 2 3
call divide 2 3 -- (x = 2, y = 3), stack: divide 8 3, divide 5 3, divide 2 3
step 1: x < y ? YES
return: 0 -- unwinding bottom call
return 1 + 0 -- stack: divide 8 3, divide 5 3, unwinding middle call
return 1 + 1 + 0 -- stack: divide 8 3
I am aware that the above notation is not formalized in any way, but I hope it helps you understand what recursion is about: x can have different values in different calls, because each call is a separate instance, and thus has its own instance of x.
x is actually not a variable, but a parameter, and isn't that different from parameters in imperative languages.
Maybe it'd look more obvious with explicit return statements?
-- for illustrative purposes only, doesn't actually work
divide x y =
  if x < y
    then return 0
    else return 1 + divide (x - y) y
You're not mutating x, just stacking up several function calls to calculate your desired result with the values they return.
Here's the same function in Python:
def divide(x, y):
    if x < y:
        return 0
    else:
        return 1 + divide(x - y, y)
Looks familiar, right? You can translate this to any language that allows recursion, and none of them would require you to mutate a variable.
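To watch the calls stack up, here is an instrumented sketch of the Python translation (the name divide_traced and the depth parameter are my additions for tracing):

```python
# Instrumented variant of the recursive division function: each recursive
# call gets its own x and y, so nothing is mutated; depth only indents
# the trace so the nesting of calls is visible.
def divide_traced(x, y, depth=0):
    print("  " * depth + f"call divide {x} {y}")
    if x < y:
        result = 0
    else:
        result = 1 + divide_traced(x - y, y, depth + 1)
    print("  " * depth + f"return {result}")
    return result

divide_traced(8, 3)  # three nested calls; the outermost returns 2
```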
Other than that, yes, your compiler is lying to you. Because you're not allowed to directly mutate values, the compiler can make a lot of extra assumptions based on your code, which helps translate it to efficient machine code, and at that level there's no escaping mutability. The major benefit is that compilers are way less likely to introduce mutability-related bugs than us mortals.

counting results from a defined matrix

So I am very new to programming, and Haskell is the first language I'm learning. The problem I'm having is probably a very simple one, but I simply cannot find an answer, no matter how much I search.
Basically, what I have is a 3x3 matrix, and each of the elements has a number from 1 to 3. This matrix is predefined; now all I need to do is create a function which, when I input 1, 2 or 3, tells me how many elements in the matrix have that value.
I've been trying various things, but none of them appear to be allowed. For example, I've defined 3 variables, one for each of the possible numbers, and tried to define them by
value w =
  let a = 0
      b = 0
      c = 0
  in
    if matrix 1 1 == 1 then a = a + 1
    else if matrix 1 1 == 2 then b = b + 1
etc. etc. for every combination and field.
Ignoring the wrong syntax (which I'm really struggling with), the fact that I can't use "=" with "if ... then" is my biggest problem. Is there a way to bypass this, or maybe a way to use "stored data" from previously defined functions?
I hope I made my question somewhat clear; as I said, I've only been at programming for 2 days now, and I just can't seem to find a way to make this work!
By default, Haskell doesn't use updateable variables. Instead, you typically make a new value, and pass it somewhere else (e.g., return it from a function, add it into a list, etc).
I would approach this in two steps: get a list of the elements from your matrix, then count the elements with each value.
-- get list of elements using list comprehension
elements = [matrix i j | i <- [1..3], j <- [1..3]]
-- define counting function
count (x,y,z) (1:tail) = count (x+1,y,z) tail
count (x,y,z) (2:tail) = count (x,y+1,z) tail
count (x,y,z) (3:tail) = count (x,y,z+1) tail
count scores [] = scores
-- use counting function
(a,b,c) = count (0,0,0) elements
There are better ways of accumulating scores, but this seems closest to what your question is looking for.
Per comments below, an example of a more idiomatic counting method, using foldl and an accumulation function addscore instead of the count function above:
-- define accumulation function
addscore (x,y,z) 1 = (x+1,y,z)
addscore (x,y,z) 2 = (x,y+1,z)
addscore (x,y,z) 3 = (x,y,z+1)
-- use accumulation function
(a,b,c) = foldl addscore (0,0,0) elements
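For comparison, the same tallying can be sketched in Python; the flattened matrix values below are made up for illustration, playing the role of `elements` above:

```python
from collections import Counter

# A hypothetical 3x3 matrix flattened to a list, like `elements` above.
elements = [1, 2, 2, 3, 1, 3, 3, 1, 2]

# Counter tallies every value in one pass, without any mutable counters
# in our own code -- the same idea as the foldl accumulation.
counts = Counter(elements)
a, b, c = counts[1], counts[2], counts[3]
print(a, b, c)  # 3 3 3
```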

Get quadratic equation term of a graph in R

I need to find the quadratic equation term of a graph I have plotted in R.
When I do this in Excel, the term appears in a text box on the chart, but I'm unsure how to move it to a cell for subsequent use (to apply to values requiring calibration), or indeed how to ask for it in R. If it can be obtained in R, can it be saved as an object for future calculations?
This seems like it should be a straightforward request in R, but I can't find any similar questions. Many thanks in advance for any help anyone can provide on this.
All the answers provide aspects of what you appear to want to do, but none thus far brings it all together. Let's consider Tom Liptrot's answer as an example:
fit <- lm(speed ~ dist + I(dist^2), cars)
This gives us a fitted linear model with a quadratic in the variable dist. We extract the model coefficients using the coef() extractor function:
> coef(fit)
(Intercept) dist I(dist^2)
5.143960960 0.327454437 -0.001528367
So your fitted equation (subject to rounding due to printing) is:
\hat{speed} = 5.143960960 + (0.327454437 * dist) + (-0.001528367 * dist^2)
(where \hat{speed} is the fitted values of the response, speed).
If you want to apply this fitted equation to some data, then we can write our own function to do it:
myfun <- function(newdist, model) {
coefs <- coef(model)
res <- coefs[1] + (coefs[2] * newdist) + (coefs[3] * newdist^2)
return(res)
}
We can apply this function like this:
> myfun(c(21,3,4,5,78,34,23,54), fit)
[1] 11.346494 6.112569 6.429325 6.743024 21.386822 14.510619 11.866907
[8] 18.369782
for some new values of distance (dist), which is what you appear to want to do from the question. However, in R we don't normally do things like this, because why should the user have to know how to form fitted or predicted values for all the different types of models that can be fitted in R?
In R, we use standard methods and extractor functions. In this case, if you want to apply the "equation", that Excel displays, to all your data to get the fitted values of this regression, in R we would use the fitted() function:
> fitted(fit)
1 2 3 4 5 6 7 8
5.792756 8.265669 6.429325 11.608229 9.991970 8.265669 10.542950 12.624600
9 10 11 12 13 14 15 16
14.510619 10.268988 13.114445 9.428763 11.081703 12.122528 13.114445 12.624600
17 18 19 20 21 22 23 24
14.510619 14.510619 16.972840 12.624600 14.951557 19.289106 21.558767 11.081703
25 26 27 28 29 30 31 32
12.624600 18.369782 14.057455 15.796751 14.057455 15.796751 17.695765 16.201008
33 34 35 36 37 38 39 40
18.688450 21.202650 21.865976 14.951557 16.972840 20.343693 14.057455 17.340416
41 42 43 44 45 46 47 48
18.038887 18.688450 19.840853 20.098387 18.369782 20.576773 22.333670 22.378377
49 50
22.430008 21.93513
If you want to apply your model equation to some new data values not used to fit the model, then we need to get predictions from the model. This is done using the predict() function. Using the distances I plugged into myfun above, this is how we'd do it in a more R-centric fashion:
> newDists <- data.frame(dist = c(21,3,4,5,78,34,23,54))
> newDists
dist
1 21
2 3
3 4
4 5
5 78
6 34
7 23
8 54
> predict(fit, newdata = newDists)
1 2 3 4 5 6 7 8
11.346494 6.112569 6.429325 6.743024 21.386822 14.510619 11.866907 18.369782
First, we create a new data frame with a component named "dist" containing the new distances for which we want predictions from our model. It is important to note that this data frame includes a variable with the same name as the variable used when we created the fitted model. The new data frame must contain all the variables used to fit the model, but in this case we only have one, dist. Note also that we don't need to include anything about dist^2; R will handle that for us.
Then we use the predict() function, giving it our fitted model and providing the new data frame just created as argument 'newdata', giving us our new predicted values, which match the ones we did by hand earlier.
Something I glossed over is that predict() and fitted() are really a whole group of functions. There are versions for lm() models, for glm() models etc. They are known as generic functions, with methods (versions if you like) for several different types of object. You the user generally only need to remember to use fitted() or predict() etc whilst R takes care of using the correct method for the type of fitted model you provide it. Here are some of the methods available in base R for the fitted() generic function:
> methods(fitted)
[1] fitted.default* fitted.isoreg* fitted.nls*
[4] fitted.smooth.spline*
Non-visible functions are asterisked
You will possibly get more than this depending on what other packages you have loaded. The * just means you can't refer to those functions directly, you have to use fitted() and R works out which of those to use. Note there isn't a method for lm() objects. This type of object doesn't need a special method and thus the default method will get used and is suitable.
You can add a quadratic term to the formula in lm to get the fit you are after. You need to use I() around the term you want to square, as in the example below:
plot(speed ~ dist, cars)
fit1 = lm(speed ~ dist, cars) #fits a linear model
abline(fit1) #puts line on plot
fit2 = lm(speed ~ I(dist^2) + dist, cars) #fits a model with a quadratic term
fit2line = predict(fit2, data.frame(dist = -10:130))
lines(-10:130 ,fit2line, col=2) #puts line on plot
To get the coefficients from this use:
coef(fit2)
I don't think it is possible in Excel, as it only provides functions to get coefficients for a linear regression (SLOPE, INTERCEPT, LINEST) or an exponential one (GROWTH, LOGEST), though you may have more luck using Visual Basic.
As for R you can extract model coefficients using the coef function:
mdl <- lm(y ~ poly(x,2,raw=T))
coef(mdl) # all coefficients
coef(mdl)[3] # only the 2nd order coefficient
I guess you mean that you plot X vs Y values in Excel or R, and in Excel use the "Add trendline" functionality. In R, you can use the lm function to fit a linear model to your data; this also gives you the "r squared" term (see the examples on the linked page).
