All list items up to, and including, the first repeated item - j

Consider:
x =. 0 1 2 3 4 1 3 4 99
v =. [ {.~ (>: # i.&1 # (##~. = #\))
v x NB. => 0 1 2 3 4 1
The behavior is correct. But as you can see, v is shamefully verbose. Is there a better solution?

You want the monad ~: (nub sieve):
v =: {.~ 1 + 0 i.~ ~:
x =: 0 1 2 3 4 1 3 4 99
v x
0 1 2 3 4 1
Code review:
Outside code-golf contexts, don't use #\ in place of i.##. It's too cutesy, hard to maintain, and won't be recognized by the special-code optimizer.
Don't assign to the names x, y, u, v, m, or n (except in special circumstances, and always locally in an explicit context).

Related

Use # (Copy) as selection or filter on 2-d array

The J primitive Copy (#) can be used as a filter function, such as
k =: i.8
(k>3) # k
4 5 6 7
That's essentially
0 0 0 0 1 1 1 1 # i.8
The question is if the right-hand side of # is 2-d or higher rank shaped array, how to make a selection using #, if possible. For example:
k =: 2 4 $ i.8
(k > 3) # k
I got length error
What is the right way to make such a selection?
You can use the appropriate verb rank to get something like a 2d-selection:
(2 | k) #"1 1 k
1 3
5 7
but the requested axes have to be filled with 0s (or !.) to keep the correct shape:
(k > 3) #("1 1) k
0 0 0 0
4 5 6 7
(k > 2) #("1 1) k
3 0 0 0
4 5 6 7
You have to better define select for dimensions > 1 because now you have a structure. How do you discard values? Do you keep empty "cells"? Do you replace with 0s? Is structure important for the result?
If, for example, you only need the "values where" then just ravel , the array:
(,k > 2) # ,k
3 4 5 6 7
If you need to "replace where", then you can use amend }:
u =: 5 :'I. , 5 > y' NB. indices where 5 > y
0 u } k
0 0 0 0
0 5 6 7
z =: 3 2 4 $ i.25
u =: 4 :'I. , (5 > y) +. (0 = 3|y)' NB. indices where 5>y or 3 divides y
_999 u } z
_999 _999 _999 _999
_999 5 _999 7
8 _999 10 11
_999 13 14 _999
16 17 _999 19
20 _999 22 23

List of all permutations

Verbs C. A. is related to permutations.
And they have very complicated documentation.
I want just get all possible permutations (n!)
For example for elements 1 2 3
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
Left argument of A. is a list of permutation indeces.
Right argument of A. is the list to be permuted.
The initial (unpermuted) list has index 0 and it goes on from there lexicographically [*].
Egs:
(0) A. 'a';'b';'c'
┌─┬─┬─┐
│a│b│c│
└─┴─┴─┘
(1 0) A. 1 2 3
1 3 2
1 2 3
(0 1 2) A. 5 1 2
5 1 2
5 2 1
1 5 2
To get all permutations of a list, you request all (! #y) (factorial of number of elements of list y to be permuted) of them, by requesting all indeces 0 ... (n-1): i. (! # y):
(i.!#y) A. y
[*]: Lexicographically by the implied list i. # y. That is, A. always permutes the simple list 0 ... n and then applies this permutation to your initial list: permutation { initial_list.

Combining pairs in a string (Matlab)

I have a string:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'
How can I combine pairs which have the last character of 1 pair is the first character of the follow pairs into strings? And the new strings must contain all of the character 'A','B','C','D','E','F' , 'G', those characters are appeared in the sup_pairs string.
The expected output should be:
S1 = 'BAEFCGD' % because BA will be followed by AE in sup_pairs string, so we combine BAE, and so on...we continue the rule to generate S1
S2 = 'DFCEABG'
If I have AB, BC and BD, the generated strings should be both : ABC and ABD .
If there is any repeated character in the pairs like : AB BC CA CE . We will skip the second A , and we get ABCE .
This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.
First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p=strsplit(sup_pairs,' ');
m=cell2mat(p(:));
m=m-'?';
A=sparse(m(:,1),m(:,2),1);
The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:
>> full(A)
ans =
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
As you can see, the edge BA, which translates to subscript (3,2) is equal to 1.
Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:
treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');
Output:
S1 = BAEFCGD
Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about not hitting the default limitation on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth will be the least of your worries.
function full_paths = dft_all(A, current_path)
% A - adjacency matrix of graph
% current_path - initially just the start node (root)
% full_paths - cell array containing all paths from initial root to a leaf
n = size(A, 1); % number of nodes in graph
full_paths = cell(1,0); % return cell array
unvisited_mask = ones(1, n);
unvisited_mask(current_path) = 0; % mask off already visited nodes (path)
% multiply mask by array of nodes accessible from last node in path
unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);
% add restriction on length of paths to keep (numel == n)
if isempty(unvisited_nodes) && (numel(current_path) == n)
full_paths = {current_path}; % we've found a leaf node
return;
end
% otherwise, still more nodes to search
for node = unvisited_nodes
new_path = dft_all(A, [current_path node]); % add new node and search
if ~isempty(new_path) % if this produces a new path...
full_paths = {full_paths{1,:}, new_path{1,:}}; % add it to output
end
end
end
This is a normal Depth-first traversal except for the added condition on the length of the path in line 15:
if isempty(unvisited_nodes) && (numel(current_path) == n)
The first half of the if condition, isempty(unvisited_nodes) is standard. If you only use this part of the condition you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n) enforces the length of the path.
I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. But there are no edges into or out of node 1 because I was apparently planning on using a trick that I never got around to telling you about. Rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to all of the other real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:
A =
0 1 1 1 1 1 1 1
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.
Here's the output of the function using the updated matrix:
>> dft_all(A, 1)
ans =
{
[1,1] =
1 2 3 8 5 7 4 6
[1,2] =
1 3 2 6 7 4 8 5
[1,3] =
1 3 8 5 2 6 7 4
[1,4] =
1 3 8 5 7 4 6 2
[1,5] =
1 4 6 2 3 8 5 7
[1,6] =
1 5 7 4 6 2 3 8
[1,7] =
1 6 2 3 8 5 7 4
[1,8] =
1 6 7 4 8 5 2 3
[1,9] =
1 7 4 6 2 3 8 5
[1,10] =
1 8 5 7 4 6 2 3
}

How can I implement a grouping algorithm in J?

I'm trying to implement A006751 in J. It's pretty easy to do in Haskell, something like:
concat . map (\g -> concat [show $ length g, [g !! 0]]) . group . show
(Obviously that's not complete, but it's the basic heart of it. I spent about 10 seconds on that, so treat it accordingly.) I can implement any of this fairly easily in J, but the part that eludes me is a good, idiomatic J algorithm that corresponds to Haskell's group function. I can write a clumsy one, but it doesn't feel like good J.
Can anyone implement Haskell's group in good J?
Groups are usually done with the /. adverb.
1 1 2 1 </. 'abcd'
┌───┬─┐
│abd│c│
└───┴─┘
As you can see, it's not sequential. Just make your key sequential like so (essentially determining if an item is different from the next, and do a running sum of the resulting 0's and 1's):
neq =. 13 : '0, (}. y) ~: (}: y)'
seqkey =. 13 : '+/\neq y'
(seqkey 1 1 2 1) </. 'abcd'
┌──┬─┬─┐
│ab│c│d│
└──┴─┴─┘
What I need then is a function which counts the items (#), and tells me what they are ({. to just pick the first). I got some inspiration from nubcount:
diffseqcount =. 13 : ',(seqkey y) (#,{.)/. y'
diffseqcount 2
1 2
diffseqcount 1 2
1 1 1 2
diffseqcount 1 1 1 2
3 1 1 2
If you want the nth result, just use power:
diffseqcount(^:10) 2 NB. 10th result
1 3 2 1 1 3 2 1 3 2 2 1 1 3 3 1 1 2 1 3 2 1 2 3 2 2 2 1 1 2
I agree that /. ( Key ) is the best general method for applying verbs to groups in J. An alternative in this case, where we need to group consecutive numbers that are the same, is dyadic ;. (Cut):
1 1 0 0 1 0 1 <(;.1) 3 1 1 1 2 2 3
┌─┬─────┬───┬─┐
│3│1 1 1│2 2│3│
└─┴─────┴───┴─┘
We can form the frets to use as the left argument as follows:
1 , 2 ~:/\ 3 1 1 1 2 2 3 NB. inserts ~: in the running sets of 2 numbers
1 1 0 0 1 0 1
Putting the two together:
(] <;.1~ 1 , 2 ~:/\ ]) 3 1 1 1 2 2 3
┌─┬─────┬───┬─┐
│3│1 1 1│2 2│3│
└─┴─────┴───┴─┘
Using the same mechanism as suggested previously:
,#(] (# , {.);.1~ 1 , 2 ~:/\ ]) 3 1 1 1 2 2 3
1 3 3 1 2 2 1 3
If you are looking for a nice J implementation of the look-and-say sequence then I'd suggest the one on Rosetta Code:
las=: ,#((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)#]^:(1+i.#[)
5 las 1 NB. left arg is sequence length, right arg is starting number
11 21 1211 111221 312211

How to count the frequency of a element in APL or J without loops

Assume I have two lists, one is the text t, one is a list of characters c. I want to count how many times each character appears in the text.
This can be done easily with the following APL code.
+⌿t∘.=c
However it is slow. It take the outer product, then sum each column.
It is a O(nm) algorithm where n and m are the size of t and c.
Of course I can write a procedural program in APL that read t character by character and solve this problem in O(n+m) (assume perfect hashing).
Are there ways to do this faster in APL without loops(or conditional)? I also accept solutions in J.
Edit:
Practically speaking, I'm doing this where the text is much shorter than the list of characters(the characters are non-ascii). I'm considering where text have length of 20 and character list have length in the thousands.
There is a simple optimization given n is smaller than m.
w ← (∪t)∩c
f ← +⌿t∘.=w
r ← (⍴c)⍴0
r[c⍳w] ← f
r
w contains only the characters in t, therefore the table size only depend on t and not c. This algorithm runs in O(n^2+m log m). Where m log m is the time for doing the intersection operation.
However, a sub-quadratic algorithm is still preferred just in case someone gave a huge text file.
NB. Using "key" (/.) adverb w/tally (#) verb counts
#/.~ 'abdaaa'
4 1 1
NB. the items counted are the nub of the string.
~. 'abdaaa'
abd
NB. So, if we count the target along with the string
#/.~ 'abc','abdaaa'
5 2 1 1
NB. We get an extra one for each of the target items.
countKey2=: 4 : '<:(#x){.#/.~ x,y'
NB. This subtracts 1 (<:) from each count of the xs.
6!:2 '''1'' countKey2 10000000$''1234567890'''
0.0451088
6!:2 '''1'' countKey2 1e7$''1234567890'''
0.0441849
6!:2 '''1'' countKey2 1e8$''1234567890'''
0.466857
NB. A tacit version
countKey=. [: <: ([: # [) {. [: #/.~ ,
NB. appears to be a little faster at first
6!:2 '''1'' countKey 1e8$''1234567890'''
0.432938
NB. But repeating the timing 10 times shows they are the same.
(10) 6!:2 '''1'' countKey 1e8$''1234567890'''
0.43914
(10) 6!:2 '''1'' countKey2 1e8$''1234567890'''
0.43964
Dyalog v14 introduced the key operator (⌸):
{⍺,⍴⍵}⌸'abcracadabra'
a 5
b 2
c 2
r 2
d 1
The operand function takes a letter as ⍺ and the occurrences of that letter (vector of indices) as ⍵.
I think this example, written in J, fits your request. The character list is longer than the text (but both are kept short for convenience during development.) I have not examined timing but my intuition is that it will be fast. The tallying is done only with reference to characters that actually occur in the text, and the long character set is looked across only to correlate characters that occur in the text.
c=: 80{.43}.a.
t=: 'some {text} to examine'
RawIndicies=: c i. ~.t
Mask=: RawIndicies ~: #c
Indicies=: Mask # RawIndicies
Tallies=: Mask # #/.~ t
Result=: Tallies Indicies} (#c)$0
4 20 $ Result
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 0
0 0 1 0 0 0 2 1 2 0 0 0 1 3 0 0 0 2 0 0
4 20 $ c
+,-./0123456789:;<=>
?#ABCDEFGHIJKLMNOPQR
STUVWXYZ[\]^_`abcdef
ghijklmnopqrstuvwxyz
As noted in other answers, the key operator does this directly. However the classic APL way of solving this problem is still worth knowing.
The classic solution is "sort, shift, and compare":
c←'missippi'
t←'abcdefghijklmnopqrstuvwxyz'
g←⍋c
g
1 4 7 0 5 6 2 3
s←c[g]
s
iiimppss
b←s≠¯1⌽s
b
1 0 0 1 1 0 1 0
n←b/⍳⍴b
n
0 3 4 6
k←(1↓n,⍴b)-n
k
3 1 2 2
u←b/s
u
imps
And for the final answer:
z←(⍴t)⍴0
z
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z[t⍳u]←k
z
0 0 0 0 0 0 0 0 3 0 0 0 1 0 0 2 0 0 2 0 0 0 0 0 0 0
This code is off the top of my head, not ready for production. Have to look for empty cases - the boolean shift is probably not right for all cases....
"Brute force" in J:
count =: (i.~~.) ({,&0) (]+/"1#:=)
Usage:
'abc' count 'abdaaa'
4 1 0
Not sure how it's implemented internally, but here are the timings for different input sizes:
6!:2 '''abcdefg'' count 100000$''abdaaaerbfqeiurbouebjkvwek''' NB: run time for #t = 100000
0.00803909
6!:2 '''abcdefg'' count 1000000$''abdaaaerbfqeiurbouebjkvwek'''
0.0845451
6!:2 '''abcdefg'' count 10000000$''abdaaaerbfqeiurbouebjkvwek''' NB: and for #t = 10^7
0.862423
We don't filter input date prior to 'self-classify' so:
6!:2 '''1'' count 10000000$''1'''
0.244975
6!:2 '''1'' count 10000000$''1234567890'''
0.673034
6!:2 '''1234567890'' count 10000000$''1234567890'''
0.673864
My implementation in APL (NARS2000):
(∪w),[0.5]∪⍦w←t∩c
Example:
c←'abcdefg'
t←'abdaaaerbfqeiurbouebjkvwek'
(∪w),[0.5]∪⍦w←t∩c
a b d e f
4 4 1 4 1
Note: showing only those characters in c that exist in t
My initial thought was that this was a case for the Find operator:
T←'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C←'MISSISSIPPI'
X←+/¨T⍷¨⊂C
The used characters are:
(×X)/T
IMPS
Their respective frequencies are:
X~0
4 1 2 4
I've only run toy cases so I have no idea what the performance is, but my intuition tells me it should be cheaper that the outer product.
Any thoughts?

Resources