In J, how do we generate a list of numbers that come from a normal distribution with a specified mean and variance?

In J, I know that we can generate a list of uniform random numbers and use some sort of inverse function to get a list of normally distributed numbers. But is there a quick way to achieve this?

How to generate, say, a 3-by-4 matrix B with elements b distributed as b ~ N(5,0.9^2):
Way #1:
load 'stats/distribs'
B=. 5 0.9 rnorm 3 4
Way #2:
load 'math/mt'
NB. real b ~ N(5,0.9^2)
B=. 5 0.9 randnf_mt_ 3 4
NB. complex b ~ N(5+i*6,0.9^2)
B=. 5j6 0.9 randnc_mt_ 3 4
NB. quaternion b ~ N(5+i*6+j*7+k*8,0.9^2)
B=. 5j6 7j8 0.9 randnq_mt_ 3 4
Both use the Box-Muller approach.
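For comparison, here is a minimal Python sketch of the Box-Muller idea that both addons rely on. It is not the code of stats/distribs or math/mt; randn_matrix is a hypothetical helper name, and the (mean, standard deviation) argument order simply mirrors the J calls above.

```python
import math
import random

def randn_matrix(mean, sd, rows, cols, rng=None):
    """Hypothetical helper: rows-by-cols list of N(mean, sd^2) draws via Box-Muller."""
    rng = rng or random.Random()

    def standard_pair():
        # Box-Muller: two independent uniforms give two independent N(0, 1) values
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

    needed = rows * cols
    z = []
    while len(z) < needed:
        z.extend(standard_pair())
    z = z[:needed]
    # shift and scale: mean + sd * z is distributed as N(mean, sd^2)
    return [[mean + sd * z[r * cols + c] for c in range(cols)] for r in range(rows)]

B = randn_matrix(5, 0.9, 3, 4)  # analogue of  5 0.9 rnorm 3 4
```

Passing a seeded random.Random as rng makes the draws reproducible, which is handy when testing.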


Excel: Combine multiple matrices into single matrix

I'm organizing an Excel sheet containing various matrix calculations.
In order to obtain elegant code, I need to glue two n-by-1 matrices (n a natural number), computed from earlier data, together into a single 2n-by-1 matrix and use this new matrix inside a formula.
I don't want to display either the n-by-1 or the 2n-by-1 matrices, since they are only needed inside my calculations and serve no intuitive purpose.
I want a solution that uses only existing Excel functions, within Office 365 (version 1902).
Conceptually, I would like to use column F, which is E1:E5 stacked below D1:D5, in a subsequent formula, where D and E are, e.g., COS(A1:A5*C1) and SIN(A1:A5*C1):
    A      B   C     D      E      F
1   28,98      4392  0,66   0,75   0,66
2   30,00            -0,03  1,00   -0,03
3   29,99            -0,76  -0,65  -0,76
4   28,43            0,51   -0,86  0,51
5   15,04            -0,31  0,95   -0,31
6                                  0,75
7                                  1,00
8                                  -0,65
9                                  -0,86
10                                 0,95
Note: The numbers serve merely as an illustration.
PS: Forgive my inexperience in formulating questions; any advice is welcome.

x-.y And what about intersection?

x-.y includes all items of x except for those that are cells of y
But what if I want to get all items that are cells of x and of y?
I can achieve this by
x -.^:2 y
But it requires running an expensive operation twice.
Is there a better solution?
e. is often useful when working with sets.
x e. y
gives a list of matches:
for each item of x return 1 if it exists in the "set" y, 0 otherwise.
1 2 3 4 e. 5 9 2
0 1 0 0
Then,
x (e. # [) y
selects those elements that do exist in both lists.
1 2 3 4 (e. # [) 5 9 2
2
5 8 (e. # [) i.12
5 8
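The same membership-then-select pattern, sketched in Python for comparison. It assumes hashable items, and member and intersect are illustrative names, not J primitives:

```python
def member(x, y):
    # like J's  x e. y : 1 for each item of x that occurs in y, else 0
    ys = set(y)
    return [1 if v in ys else 0 for v in x]

def intersect(x, y):
    # like J's  x (e. # [) y : the fork copies x using the membership mask
    return [v for v, keep in zip(x, member(x, y)) if keep]

print(member([1, 2, 3, 4], [5, 9, 2]))     # mask, as in the first J example
print(intersect([1, 2, 3, 4], [5, 9, 2]))  # selected items
```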
Doing -. twice is the classic way of implementing intersection in J.
The inefficiency is minor: it is a constant factor. In general, you should not concern yourself with efficiency issues in J unless they exceed a factor of 2; when you have resource problems, you generally want to focus on issues of a factor of 1000 or more.
Put differently, if ([-.-.) or -.^:2 is too slow for you, then -. would also be too slow for you. (This can happen on extremely large data sets where the underlying implementation has been inefficient; recent versions of J have had some work done to correct this.)
Disappointing, perhaps, but practical.
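To see why the cost is only a constant factor, here is a Python sketch of x -. x -. y, which is what x -.^:2 y computes. Each set difference is one hashed pass over x, so the double version is two linear passes. It assumes hashable items; less and intersect_twice are illustrative names:

```python
def less(x, y):
    # like J's  x -. y : items of x that do not occur in y (order and duplicates kept)
    ys = set(y)
    return [v for v in x if v not in ys]

def intersect_twice(x, y):
    # like J's  x -.^:2 y , i.e.  x -. (x -. y) : two passes instead of one
    return less(x, less(x, y))
```

As with the J version, duplicates in x survive: intersect_twice([2, 2, 3], [2]) keeps both 2s.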

Can I specify the order of evaluation, and how a matrix gets multiplied, in Python?

I'm taking a numerical linear algebra course and I've chosen Python as my language (I want to be employable). Is there a way to evaluate (AB)C versus A(BC), where A, B, C are conformable matrices? I want to check CPU time and operation count for each. In addition, is there a way to force Python to calculate AB as a sum of outer products, and as a matrix whose entries are the inner products of the rows of A and the columns of B? I'm new to Python and haven't had any luck with a Google search, which is rare. I'm using Python 3.5, which uses @ for matrix multiplication. I searched for resources on doing numerical linear algebra in Python but haven't found anything useful. Thanks for your help.
The answer to the first question is trivial: write (A @ B) @ C versus A @ (B @ C). CPython is smart enough to know that it is too dumb to rewrite expressions: the language definition does not require that the special methods (such as __add__ for +) have any particular properties. (In fact, float addition is not associative.)
>>> from dis import dis
>>> dis('(a + b) + c')
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 BINARY_ADD
7 LOAD_NAME 2 (c)
10 BINARY_ADD
11 RETURN_VALUE
>>> dis('a + (b + c)')
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 LOAD_NAME 2 (c)
9 BINARY_ADD
10 BINARY_ADD
11 RETURN_VALUE
Replace + with @ and the output is the same, except that BINARY_ADD becomes BINARY_MATRIX_MULTIPLY.
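For the operation-count part of the question, a quick sketch: with the textbook algorithm, multiplying an m-by-n matrix by an n-by-p matrix takes m*n*p scalar multiplications, so the two parenthesizations can differ greatly. The function names and the example shapes are illustrative:

```python
def matmul_cost(m, n, p):
    # scalar multiplications for (m-by-n) @ (n-by-p) with the textbook algorithm
    return m * n * p

def chain_costs(m, n, p, q):
    # A is m-by-n, B is n-by-p, C is p-by-q
    left = matmul_cost(m, n, p) + matmul_cost(m, p, q)   # (A @ B) @ C
    right = matmul_cost(n, p, q) + matmul_cost(m, n, q)  # A @ (B @ C)
    return left, right

print(chain_costs(10, 100, 5, 50))  # (A @ B) @ C is ten times cheaper here
```

For CPU time, wrapping each expression in time.perf_counter() calls (or using the timeit module) measures the same two orders directly.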
I do not understand the details of the second question, but you should be able to define your own Matrix class with a __matmul__ method that does what you want. It could be either pure Python or built on numpy.
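A minimal pure-Python sketch of such a class. The `method` switch is a hypothetical design choice that selects between the two formulations from the question: inner products of rows and columns, or a sum of outer products. It is illustrative only and far slower than numpy:

```python
class Matrix:
    """Illustrative pure-Python matrix; method is "inner" or "outer"."""

    def __init__(self, rows, method="inner"):
        self.rows = [list(r) for r in rows]
        self.method = method

    @property
    def shape(self):
        return len(self.rows), len(self.rows[0])

    def col(self, j):
        return [r[j] for r in self.rows]

    def _inner(self, other):
        # entry (i, j) is the inner product of row i of self and column j of other
        m, _ = self.shape
        _, p = other.shape
        return [[sum(a * b for a, b in zip(self.rows[i], other.col(j)))
                 for j in range(p)] for i in range(m)]

    def _outer(self, other):
        # sum over k of the outer product (column k of self)(row k of other)
        m, n = self.shape
        _, p = other.shape
        out = [[0] * p for _ in range(m)]
        for k in range(n):
            a_col, b_row = self.col(k), other.rows[k]
            for i in range(m):
                for j in range(p):
                    out[i][j] += a_col[i] * b_row[j]
        return out

    def __matmul__(self, other):
        if self.shape[1] != other.shape[0]:
            raise ValueError("matrices are not conformable")
        rows = self._outer(other) if self.method == "outer" else self._inner(other)
        return Matrix(rows, self.method)

A = Matrix([[1, 2], [3, 4]], method="outer")
print((A @ Matrix([[5, 6], [7, 8]])).rows)
```

Because both strategies go through __matmul__, (A @ B) @ C and A @ (B @ C) work unchanged, and a counter could be added inside _inner or _outer to tally multiplications.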

Excel replace numbers with custom unequal ranges

Suppose I have the following data
Name Output
A 0.1
B 7
C 0.4
D 0.9
E 1.1
F 12
G 22
I would like to replace the output variable with custom ranges:
Name Output Output_2
A 0.1 0m-0.3m
B 7 6y-10y
C 0.4 0.4m-0.6m
D 0.9 0.7m-1y
E 1.1 1y-5y
F 12 11y-20y
G 22 21y-40y
Right now, I am doing this (a long list of nested IFs)
=IF([@Tenor]<= 0.25, "0m-3m", IF([@Tenor]<=0.5, "4m-6m", IF([@Tenor] <= 1, "7m-1y",IF([@Tenor]<=5,"2y-5y",IF([@Tenor]<=10,"6y-10y",IF([@Tenor]<20,"11y-20y",IF([@Tenor]<40,"20y-40y")))))))
It works, but I am concerned that as the number of ranges increases, this will be painful to write. I was hoping I could write down the ranges somewhere and ask Excel to look them up, something like a case statement.
Let's assume your first three columns are A, B and C (like in the second code block you posted). Add the following data into columns E and F (this will be your mapping data):
Output Output2
0 0m-3m
0.25 4m-6m
0.5 7m-1y
1 2y-5y
5 6y-10y
10 11y-20y
20 20y-40y
40 20y-40y
Then enter the following formula in cell C2 and drag it down:
=INDEX($F:$F,MATCH(B2,$E:$E,1))
UPDATE: you can do this even more simply with an approximate-match VLOOKUP:
=VLOOKUP(B2,$E:$F,2,1)
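Both formulas rely on approximate matching in an ascending table: they return the row of the largest threshold that is less than or equal to the value. A Python sketch of the same rule using bisect, with thresholds and labels taken from the mapping table above (lookup_bucket is an illustrative name):

```python
from bisect import bisect_right

# ascending lower bounds and their labels, copied from the mapping table
THRESHOLDS = [0, 0.25, 0.5, 1, 5, 10, 20, 40]
LABELS = ["0m-3m", "4m-6m", "7m-1y", "2y-5y", "6y-10y", "11y-20y", "20y-40y", "20y-40y"]

def lookup_bucket(value):
    # index of the largest threshold <= value, like MATCH(value, E:E, 1)
    i = bisect_right(THRESHOLDS, value) - 1
    if i < 0:
        # Excel would give #N/A for a value below the first threshold
        raise ValueError(f"{value} is below the first threshold")
    return LABELS[i]

print([lookup_bucket(v) for v in [0.1, 7, 0.4, 0.9, 1.1, 12, 22]])
```

One behavioral difference from the nested IF: a value exactly on a boundary, such as 0.25, falls into the upper bucket (4m-6m) here, whereas the `<=` chain puts it in 0m-3m.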

loop rolling algorithm

I have come up with the term loop rolling myself with the hope that it does
not overlap with an existing term. Basically I'm trying to come up with an
algorithm to find loops in a printed text.
Some examples, from simple to complicated:
Example1
Given:
a a a a a b c d
I want to say:
5x(a) b c d
or algorithmically:
for 1 .. 5
print a
end
print b
print c
print d
Example2
Given:
a b a b a b a b c d
I want to say:
4x(a b) c d
or algorithmically:
for 1 .. 4
print a
print b
end
print c
print d
Example3
Given:
a b c d b c d b c d b c e
I want to say:
a 3x(b c d) b c e
or algorithmically:
print a
for 1 .. 3
print b
print c
print d
end
print b
print c
print e
This doesn't remind me of any algorithm that I know of. I feel that some of the
problems can be ambiguous, but finding one of the solutions is enough for me
for now. Efficiency is always welcome but not mandatory. How can I do this?
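One simple starting point is a greedy heuristic. It is not guaranteed to be optimal and it does not nest loops. Scan left to right and, at each position, pick the block length whose consecutive repetitions cover the most tokens, preferring the shorter block on ties. A Python sketch; roll and render are illustrative names:

```python
def roll(tokens):
    """Greedy left-to-right loop rolling; returns tokens and (count, block) pairs."""
    out, i, n = [], 0, len(tokens)
    while i < n:
        best_len, best_count = 1, 1
        # try every block length that could repeat at least twice from position i
        for length in range(1, (n - i) // 2 + 1):
            block = tokens[i:i + length]
            count = 1
            while tokens[i + count * length:i + (count + 1) * length] == block:
                count += 1
            # keep the block covering the most tokens; strict > favors shorter blocks
            if count > 1 and count * length > best_count * best_len:
                best_len, best_count = length, count
        if best_count > 1:
            out.append((best_count, tokens[i:i + best_len]))
            i += best_count * best_len
        else:
            out.append(tokens[i])
            i += 1
    return out

def render(rolled):
    # format as in the examples above, e.g. "5x(a) b c d"
    return " ".join(f"{item[0]}x({' '.join(item[1])})" if isinstance(item, tuple) else item
                    for item in rolled)

print(render(roll("a b c d b c d b c d b c e".split())))
```

On the three examples above this prints the desired forms, but a left-to-right greedy choice can miss a shorter overall representation on other inputs.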
EDIT
First of all, thanks for all the discussion. I have adapted an LZW algorithm
from Rosetta Code and ran it on my
input:
abcdbcdbcdbcdef
which gave me:
a
b
c
d
8 => bc
10 => db
9 => cd
11 => bcd
e
f
where I have a dictionary of:
a a
c c
b b
e e
d d
f f
8 bc
9 cd
10 db
11 bcd
12 dbc
13 cdb
14 bcde
15 ef
7 ab
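A compact Python sketch of LZW that reproduces the output above. It shows why LZW builds a dictionary of previously seen phrases instead of detecting consecutive repeats. Numbering new entries from 7 only mirrors the listing above, since the adapted Rosetta Code program itself is not shown, and lzw_compress is an illustrative name:

```python
def lzw_compress(text, first_code=7):
    # single symbols stand for themselves; each new phrase gets the next integer code
    table = {ch: ch for ch in text}
    next_code = first_code
    out, w = [], ""
    for ch in text:
        wc = w + ch
        if wc in table:
            w = wc                     # keep extending the current known phrase
        else:
            out.append(table[w])       # emit the longest known phrase
            table[wc] = next_code      # remember the phrase plus one symbol
            next_code += 1
            w = ch
    if w:
        out.append(table[w])
    return out, table

codes, table = lzw_compress("abcdbcdbcdbcdef")
print(codes)
```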
It looks good for compression but it's not quite what I wanted. What I need
is more like compression in the algorithmic representation from my examples
which would have:
consecutive sequences (if a sequence is repeating, there would be no other
sequence in between)
no dictionary but only loops
irreducible
with maximum sequence sizes (which would minimize the algorithmic
representation)
and let's say nested loops are allowed (contrary to what I said before in
the comment)
I'll start with an algorithm that gives maximum sequence sizes. Although it does not always minimize the algorithmic representation, it may be used as an approximation algorithm, or it may be extended to an optimal one.
Start by constructing the suffix array for your text, along with the LCP array.
Sort an array of indexes into the LCP array so that indexes of larger LCP values come first. This groups together repeating sequences of the same length and allows processing the sequences greedily, starting from the maximum sequence sizes.
Extract the suffix array entries, grouped by LCP value (by group I mean all the entries with the selected LCP value as well as all entries with larger LCP values), and sort them by position in the text.
Filter out entries whose positional difference is not equal to the LCP. For the remaining entries, take prefixes of length equal to the LCP. This gives all possible sequences in the text.
Add the sequences, sorted by starting position, to an ordered collection (for example, a binary search tree). Sequences are added in the order in which they appear in the sorted LCP array, so longer sequences are added first. A sequence is added only if it is independent of the others or if one of them is completely nested inside the other. Intersecting intervals are ignored. For example, in caba caba bab the sequence ab intersects with caba, so it is ignored. But in cababa cababa babab one instance of ab is dropped, 2 instances are completely inside the larger sequence, and 2 instances are completely outside of it.
At the end, this ordered collection contains all the information needed to produce the algorithmic representation.
Example:
Text                   ababcabab
Suffix array           ab abab ababcabab abcabab b bab babcabab bcabab cabab
LCP array              2 4 2 0 1 3 1 0
Sorted LCP             4 3 2 2 1 1 0 0
Positional difference  5 5 2 2 2 2 - -
Filtered LCP           - - 2 2 - - - -
Filtered prefixes      (ab ab) (ab ab)
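The first steps of this example can be reproduced with a short Python sketch. Two simplifications are assumed: the suffix array is built naively by sorting suffixes, which is fine for illustration but not for large texts, and only adjacent suffix-array pairs are compared, so the grouping of step 3 and the interval collection of step 5 are not implemented. suffix_array, lcp_array and tandem_pairs are illustrative names:

```python
def suffix_array(text):
    # naive construction: sort start positions by the suffix they begin
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    # lcp[k] = length of the common prefix of suffixes sa[k] and sa[k + 1]
    def common(a, b):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n
    return [common(text[sa[k]:], text[sa[k + 1]:]) for k in range(len(sa) - 1)]

def tandem_pairs(text):
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    # visit LCP entries from the largest value down (sorted() is stable on ties)
    order = sorted(range(len(lcp)), key=lambda k: -lcp[k])
    found = []
    for k in order:
        length = lcp[k]
        if length == 0:
            continue
        p, q = sorted((sa[k], sa[k + 1]))
        # positional difference equal to the LCP: text[p:q] is repeated right at q
        if q - p == length:
            found.append((p, text[p:p + length]))
    return sa, lcp, order, found

sa, lcp, order, found = tandem_pairs("ababcabab")
print([lcp[k] for k in order], found)
```

Each (position, prefix) pair in found marks a prefix that occurs at that position and again immediately after it, which corresponds to the (ab ab) groups in the table.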
Sketch of an algorithm producing the minimal algorithmic representation:
Start with the first 4 steps of the previous algorithm. The fifth step must be modified: it is no longer possible to ignore intersecting intervals, so every sequence is added to the collection. Since the collection now contains intersecting intervals, it is better to implement it as a more advanced data structure, for example an interval tree.
Then recursively determine the length of the algorithmic representation for all sequences that contain nested sequences, starting from the smallest ones. When every sequence has been evaluated, compute the optimal algorithmic representation for the whole text. The algorithm for processing either a sequence or the whole text uses dynamic programming: allocate a matrix with the number of columns equal to the text/sequence length and the number of rows equal to the length of the algorithmic representation; doing an in-order traversal of the interval tree, update this matrix with all sequences possible for each text position; when more than one value for some cell is possible, either choose any of them or give preference to longer or shorter sub-sequences.
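As a point of comparison, here is a deliberately simpler, swapped-in technique. It is a brute-force interval dynamic program over all substrings, not the interval-tree procedure described above. It assumes a hypothetical cost model where each printed symbol costs 1 and each loop adds 1, so "5x(a) b c d" costs 5. Each span is either split in two or encoded as one block repeated count times, and nested loops come for free. It is roughly quartic in the input length, so it is only suitable for short inputs. minimal_roll is an illustrative name:

```python
from functools import lru_cache

def minimal_roll(tokens):
    """Return (cost, text) of a cheapest loop form under the assumed cost model."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def best(i, j):
        # cheapest encoding of toks[i:j], assuming j > i
        if j - i == 1:
            return 1, toks[i]
        cost, text = None, None
        # option 1: split into two independently encoded parts
        for k in range(i + 1, j):
            c1, t1 = best(i, k)
            c2, t2 = best(k, j)
            if cost is None or c1 + c2 < cost:
                cost, text = c1 + c2, t1 + " " + t2
        # option 2: the whole span is one block repeated at least twice
        n = j - i
        for p in range(1, n // 2 + 1):
            if n % p == 0 and toks[i:j] == toks[i:i + p] * (n // p):
                c, t = best(i, i + p)
                if c + 1 < cost:
                    cost, text = c + 1, f"{n // p}x({t})"
        return cost, text

    return best(0, len(toks)) if toks else (0, "")

print(minimal_roll("a b a b a b a b c d".split()))
```

When several encodings share the minimal cost, the first one found wins, mirroring the answer's remark that any of the tied options may be chosen.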
