Use # (Copy) as selection or filter on 2-d array - j

The J primitive Copy (#) can be used as a filter function, such as
k =: i.8
(k>3) # k
4 5 6 7
That's essentially
0 0 0 0 1 1 1 1 # i.8
The question is: if the right-hand argument of # is a 2-d or higher-rank array, how can I make a selection using #, if that is possible at all? For example:
k =: 2 4 $ i.8
(k > 3) # k
I got a length error.
What is the right way to make such a selection?

You can use the appropriate verb rank to get something like a 2d-selection:
(2 | k) #"1 1 k
1 3
5 7
but rows that keep fewer elements are padded with 0s (or with a fill of your choice via !.) so that the result stays rectangular:
(k > 3) #("1 1) k
0 0 0 0
4 5 6 7
(k > 2) #("1 1) k
3 0 0 0
4 5 6 7
You have to define more precisely what "select" means for rank > 1, because now the array has structure. How do you discard values? Do you keep empty "cells"? Do you replace with 0s? Is structure important for the result?
If, for example, you only need the "values where", then just ravel (,) the array:
(,k > 2) # ,k
3 4 5 6 7
If you need to "replace where", then you can use amend }:
u =: 4 :'I. , 5 > y' NB. dyadic verb (x u} y calls x u y): indices where 5 > y
0 u } k
0 0 0 0
0 5 6 7
z =: 3 2 4 $ i.25
u =: 4 :'I. , (5 > y) +. (0 = 3|y)' NB. indices where 5>y or 3 divides y
_999 u } z
_999 _999 _999 _999
_999 5 _999 7
8 _999 10 11
_999 13 14 _999
16 17 _999 19
20 _999 22 23


Print values and their associated indices when working with vectors / matrices in J

If I want to count how many values in a vector or matrix are less than a given value,
I can use +/ (a < 20). But what if I want to know both each matching value and its index,
something like (2(value) 5(index)) as a table? I looked at i., i: (which give the first and last position) and I. Does sorting first help?
A very common pattern in J is the creation of a mask from a filter and applying an action on and/or using the masked data in a hook or fork:
((actions) (filter)) (data)
For example:
NB. Random array
a =: ? 20 $ 10
6 3 9 0 3 3 0 6 2 9 2 4 6 8 7 4 6 1 7 1
NB. Filter and mask
f =: 5 < ]
m =: f a
1 0 1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0
NB. Values of a on m
m # a
6 9 6 9 6 8 7 6 7
NB. Indices of a on m
I. m
0 2 7 9 12 13 14 16 18
NB. Joint results
(I.m) ,: (m # a)
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7
In other words, in this case you have m&# and f acting on a and I. acting on m. Notice that the final result can be derived from an action on m alone by commuting the arguments of copy #~:
(I. ,: (a #~ ])) m
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7
and a can be pulled out from the action on m like so:
a ( (]I.) ,: (#~ ])) m
But since m itself is derived from an action (f) on a, we can write:
a ( (]I.) ,: (#~ ])) (f a)
which is a simple monadic hook y v (f y) → (v f) y.
Therefore:
action =: (]I.) ,: (#~ ])
filter =: 5 < ]
data =: a
(action filter) data
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7

How to take mean of 3 values before flag change 0 to 1 - python

I have a dataframe with columns A, B and flag. I want to calculate the mean of the 2 values before flag changes from 0 to 1, and record the value when flag changes from 0 to 1 and the value when flag changes from 1 to 0.
# Input dataframe
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'flag':[0,0,0,0,1,1,1,0,0,0,0,0]})
# Expected output
df_out=pd.DataFrame({'A_mean_before_flag_change':[5.5],
'B_mean_before_flag_change':[5],
'A_value_before_change_flag':[7],
'B_value_before_change_flag':[6]})
Here is a more general solution:
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'flag':[0,0,0,0,1,1,1,0,0,1,0,1]})
print (df)
A B flag
0 1 1 0
1 3 3 0
2 4 4 0
3 7 6 0
4 8 8 1
5 11 11 1
6 1 1 1
7 15 19 0
8 20 20 0
9 15 15 1
10 16 16 0
11 87 87 1
First create groups from a mask that marks each 0 immediately followed by a 1 in flag, taking the cumulative sum from the bottom up:
m1 = df['flag'].eq(0) & df['flag'].shift(-1).eq(1)
df['g'] = m1.iloc[::-1].cumsum()
print (df)
A B flag g
0 1 1 0 3
1 3 3 0 3
2 4 4 0 3
3 7 6 0 3
4 8 8 1 2
5 11 11 1 2
6 1 1 1 2
7 15 19 0 2
8 20 20 0 2
9 15 15 1 1
10 16 16 0 1
11 87 87 1 0
Then filter out groups with fewer than N rows:
N = 4
df1 = df[df['g'].map(df['g'].value_counts()).ge(N)].copy()
print (df1)
A B flag g
0 1 1 0 3
1 3 3 0 3
2 4 4 0 3
3 7 6 0 3
4 8 8 1 2
5 11 11 1 2
6 1 1 1 2
7 15 19 0 2
8 20 20 0 2
Keep the last N rows of each group:
df2 = df1.groupby('g').tail(N)
Finally aggregate with mean and last (note the list [['A','B']] for selecting several columns):
d = {'mean':'_mean_before_flag_change', 'last': '_value_before_change_flag'}
df3 = df2.groupby('g')[['A','B']].agg(['mean','last']).sort_index(axis=1, level=1).rename(columns=d)
df3.columns = df3.columns.map(''.join)
print (df3)
A_value_before_change_flag B_value_before_change_flag \
g
2 20 20
3 7 6
A_mean_before_flag_change B_mean_before_flag_change
g
2 11.75 12.75
3 3.75 3.50
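The reversed cumulative sum is the least obvious step above, so here is a minimal plain-Python sketch of the same grouping idea, without pandas. It is only a cross-check under stated assumptions: the lists A and flag copy the sample data, and the helper names (m1, g, groups, means) are illustrative, not part of the answer's code.

```python
# Plain-Python sketch of the grouping step; sample data copied from above.
A = [1, 3, 4, 7, 8, 11, 1, 15, 20, 15, 16, 87]
flag = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
N = 4

# True where a 0 is immediately followed by a 1 (the row before a rising edge).
m1 = [flag[i] == 0 and i + 1 < len(flag) and flag[i + 1] == 1
      for i in range(len(flag))]

# Cumulative sum taken from the bottom up, so each marked row ends its group.
g = [0] * len(flag)
running = 0
for i in reversed(range(len(flag))):
    running += m1[i]
    g[i] = running

# Collect column A per group label, keep groups with at least N rows,
# and average the last N rows of each kept group.
groups = {}
for label, value in zip(g, A):
    groups.setdefault(label, []).append(value)
means = {label: sum(vals[-N:]) / N
         for label, vals in groups.items() if len(vals) >= N}

print(g)      # [3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 0]
print(means)  # {3: 3.75, 2: 11.75}
```

The labels and the two means for column A agree with the g column and df3 printed above. Note that the last N rows of group 2 include rows where flag is already 1, exactly as in the pandas version.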
I'm assuming that this needs to work for cases with more than one rising edge and that the consecutive values and averages get appended to the output lists:
# the first step is to extract the rising and falling edges using diff(), identify sections and length
df['flag_diff'] = df.flag.diff().fillna(0)
df['flag_sections'] = (df.flag_diff != 0).cumsum()
df['flag_sum'] = df.flag.groupby(df.flag_sections).transform('sum')
# then you can get the relevant indices by checking for the rising edges
rising_edges = df.index[df.flag_diff==1.0]
val_indices = [i-1 for i in rising_edges]
avg_indices = [(i-2,i-1) for i in rising_edges]
# and finally iterate over the relevant sections
df_out = pd.DataFrame()
df_out['A_mean_before_flag_change'] = [df.A.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['B_mean_before_flag_change'] = [df.B.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['A_value_before_change_flag'] = [df.A.loc[idx] for idx in val_indices]
df_out['B_value_before_change_flag'] = [df.B.loc[idx] for idx in val_indices]
df_out['length'] = [df.flag_sum.loc[idx] for idx in rising_edges]
df_out.index = rising_edges
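As a sanity check on the rising-edge logic, the following is a small dependency-free sketch that reproduces the expected output from the question. The function name summarize_rising_edges and the WINDOW constant are hypothetical helpers introduced here; the data are the question's input lists.

```python
# Sample data from the question (single rising edge at index 4).
A = [1, 3, 4, 7, 8, 11, 1, 15, 20, 15, 16, 87]
B = [1, 3, 4, 6, 8, 11, 1, 19, 20, 15, 16, 87]
flag = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
WINDOW = 2  # assumed: number of rows averaged before each rising edge


def summarize_rising_edges(values, flag, window=WINDOW):
    """Return (mean of the `window` values before the edge, last value before it)
    for every position where flag goes from 0 to 1."""
    rows = []
    for i in range(1, len(flag)):
        # Skip edges that do not have `window` rows in front of them.
        if flag[i - 1] == 0 and flag[i] == 1 and i >= window:
            before = values[i - window:i]
            rows.append((sum(before) / window, values[i - 1]))
    return rows


print(summarize_rising_edges(A, flag))  # [(5.5, 7)]
print(summarize_rising_edges(B, flag))  # [(5.0, 6)]
```

Unlike the loc-based list comprehensions above, this sketch skips a rising edge that occurs too close to the start of the data instead of slicing with a negative label; whether that is the desired behaviour is an assumption.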

Replace items in an array with J verb `I.`

Here is a simple replace for a rank-1 list using the I. verb:
y=: _3 _2 _1 1 2 3
0 (I. y<0) } y
The result is
0 0 0 1 2 3
How do I do such a replacement for a rank-2 matrix?
For example,
y2 =: 2 3 $ _3 _2 _1 1 2 3
0 (I. y2<0) } y2
I got (J806)
|index error
| 0 (I.y2<0)}y2
The reason seems to be
(I. y2 < 0)
gives
0 1 2
0 0 0
which isn't taken well by }.
The simplest answer for this problem is to use dyadic >. (Larger of) ...
0 >. y2
0 0 0
1 2 3
If you want to use a more general conditional replacement criterion, then the following form may be useful:
(0 > y2)} y2 ,: 0
0 0 0
1 2 3
If you want it as a verb then you can use the gerund form (v1`v2)} y ↔ (v1 y)} (v2 y) :
(0 > ])`(0 ,:~ ])} y2
0 0 0
1 2 3
If your question is more about scatter index replacement then that is possible too. You need to get the 2D indices of positions you want to replace, for example:
4 $. $. 0 > y2
0 0
0 1
0 2
Now box those indices and use dyadic }:
0 (<"1 (4 $. $. 0 > y2)) } y2
0 0 0
1 2 3
Again you can turn this into a verb using a gerund left argument to dyadic } (x (v0`v1`v2)} y ↔ (x v0 y) (x v1 y)} (x v2 y)) like this:
0 [`([: (<"1) 4 $. [: $. 0 > ])`]} y2
0 0 0
1 2 3
Or
100 101 102 [`([: (<"1) 4 $. [: $. 0 > ])`]} y2
100 101 102
1 2 3
To tidy this up a bit you could define getIdx as separate verb...
getIdx=: 4 $. $.
0 [`([: <"1@getIdx 0 > ])`]} y2
0 0 0
1 2 3
This is not a good solution. My original approach was to change the rank of the test so that it looks at each row separately, but that does not work in the general case (see comments below).
[y2 =: 2 3 $ _3 _2 _1 1 2 3
_3 _2 _1
1 2 3
I. y2<0
0 1 2
0 0 0
0 (I. y2<0)"1 } y2 NB. Rank of 1 applies to each row of y2
0 0 0
1 2 3

Understanding J From

In J:
a =: 2 3 $ 1 2 3 4 5 6
Gives:
1 2 3
4 5 6
Which is a 2 3 shaped array.
If I do:
0 1 { a
Noting that 0 1 is a list of shape 2, I expected to get back:
1 2 3 4 5 6
But got the following instead:
1 2 3
4 5 6
Reading the documentation I was expecting the shape of the index to kinda govern the shape of the answer.
Can someone clarify what I am missing here?
Higher-dimensional arrays may help make this clear. An array with n dimensions has items with n-1 dimensions. When you select an item from ({) a three-dimensional array, your result is a two-dimensional array:
1 { i. 5 3 4
12 13 14 15
16 17 18 19
20 21 22 23
When you select multiple items from an array, the items are assembled into a new array, using each atom of x to select an item of y. This might be where you picked up the idea that the shape of x affects the shape of the result.
2 1 0 2 { 'set'
test
$ 2 1 0 2
4
$ 'test'
4
The shape of the result is the shape of x followed by the shape of the items of y. So, if you have a two-dimensional x taking two-dimensional items from a three-dimensional y, you will have a four-dimensional result:
(2 2 $ 1 1 0 1) { i. 5 3 4
12 13 14 15
16 17 18 19
20 21 22 23
12 13 14 15
16 17 18 19
20 21 22 23
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
$ (2 2 $ 1 1 0 1) { i. 5 3 4
2 2 3 4
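For readers who think more easily in nested lists, the shape rule can be sketched in Python. This is only an analogy under stated assumptions: take and shape are hypothetical helpers written for this illustration, arrays are modelled as non-empty rectangular nested lists, and none of it is how J is implemented.

```python
def shape(a):
    """Shape of a non-empty rectangular nested list, as a tuple."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)


def take(x, y):
    """Mimic x { y: replace every atom of x by the corresponding item of y."""
    if isinstance(x, list):
        return [take(xi, y) for xi in x]
    return y[x]


# Same values as i. 5 3 4: five items, each a 3-by-4 table.
y = [[[12 * i + 4 * j + k for k in range(4)] for j in range(3)]
     for i in range(5)]
x = [[1, 1], [0, 1]]

result = take(x, y)
print(shape(result))                     # (2, 2, 3, 4)
print(shape(x) + shape(y)[1:])           # (2, 2, 3, 4)

# The case from the question: two indices, each selecting a whole row.
a = [[1, 2, 3], [4, 5, 6]]
print(take([0, 1], a))                   # [[1, 2, 3], [4, 5, 6]]
```

The last line shows why 0 1 { a returns the 2 3 table: each index picks a complete row, and the two rows are assembled into a 2 by 3 result.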
One final note: the monadic Ravel (,) will reduce the result to a list (one-dimensional array).
, 0 1 { 2 3 $ 1 2 3 4 5 6
1 2 3 4 5 6
, i. 2 2 2 2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
From ({) selects the items of a noun. For 2 3 $ 1 2 3 4 5 6 the items are the two rows because items are the components that make up the noun.
[ a=. 2 3 $ 1 2 3 4 5
1 2 3
4 5 1
0 { a
1 2 3
If you just had 1 2 3 then the items would be the individual atoms.
[ b=. 1 2 3
1 2 3
0 { b
1
If you used 1 3 $ 1 2 3 then there is only one item and the result would be
[ c=. 1 3 $ 1 2 3
1 2 3
0 { c
1 2 3
The number of items can be found with Tally (#), and is the lead dimension of the Shape ($) of the noun.
$ a
2 3
$ b
3
$ c
1 3
# a
2
# b
3
# c
1

Combining pairs in a string (Matlab)

I have a string:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'
How can I chain pairs into strings, where the last character of one pair is the first character of the pair that follows it? Each new string must contain every one of the characters 'A', 'B', 'C', 'D', 'E', 'F', 'G' that appear in the sup_pairs string.
The expected output should be:
S1 = 'BAEFCGD' % because BA will be followed by AE in sup_pairs string, so we combine BAE, and so on...we continue the rule to generate S1
S2 = 'DFCEABG'
If I have AB, BC and BD, both ABC and ABD should be generated.
If a character is repeated in the pairs, as in AB BC CA CE, we skip the second A and get ABCE.
This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.
First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p=strsplit(sup_pairs,' ');
m=cell2mat(p(:));
m=m-'?';
A=sparse(m(:,1),m(:,2),1);
The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:
>> full(A)
ans =
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
As you can see, the edge BA, which translates to subscript (3,2) is equal to 1.
Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:
treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');
Output:
S1 = BAEFCGD
Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about not hitting the default limitation on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth will be the least of your worries.
function full_paths = dft_all(A, current_path)
% A - adjacency matrix of graph
% current_path - initially just the start node (root)
% full_paths - cell array containing all paths from initial root to a leaf
n = size(A, 1); % number of nodes in graph
full_paths = cell(1,0); % return cell array
unvisited_mask = ones(1, n);
unvisited_mask(current_path) = 0; % mask off already visited nodes (path)
% multiply mask by array of nodes accessible from last node in path
unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);
% add restriction on length of paths to keep (numel == n)
if isempty(unvisited_nodes) && (numel(current_path) == n)
full_paths = {current_path}; % we've found a leaf node
return;
end
% otherwise, still more nodes to search
for node = unvisited_nodes
new_path = dft_all(A, [current_path node]); % add new node and search
if ~isempty(new_path) % if this produces a new path...
full_paths = {full_paths{1,:}, new_path{1,:}}; % add it to output
end
end
end
This is a normal depth-first traversal except for the added condition on the length of the path in the if statement:
if isempty(unvisited_nodes) && (numel(current_path) == n)
The first half of the if condition, isempty(unvisited_nodes) is standard. If you only use this part of the condition you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n) enforces the length of the path.
I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. But there are no edges into or out of node 1 because I was apparently planning on using a trick that I never got around to telling you about. Rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to all of the other real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:
A =
0 1 1 1 1 1 1 1
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.
Here's the output of the function using the updated matrix:
>> dft_all(A, 1)
ans =
{
[1,1] =
1 2 3 8 5 7 4 6
[1,2] =
1 3 2 6 7 4 8 5
[1,3] =
1 3 8 5 2 6 7 4
[1,4] =
1 3 8 5 7 4 6 2
[1,5] =
1 4 6 2 3 8 5 7
[1,6] =
1 5 7 4 6 2 3 8
[1,7] =
1 6 2 3 8 5 7 4
[1,8] =
1 6 7 4 8 5 2 3
[1,9] =
1 7 4 6 2 3 8 5
[1,10] =
1 8 5 7 4 6 2 3
}
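The same search can be sketched in Python working directly on the letters, which avoids the numeric encoding and the need to strip the dummy node afterwards. This is a rough equivalent, not a translation of dft_all: the function name hamiltonian_paths is hypothetical, and looping over every start letter stands in for the dummy root.

```python
def hamiltonian_paths(pairs):
    """Return every string that follows the pair edges and uses each letter once."""
    # Build a directed graph: letter -> set of letters that may follow it.
    edges = {}
    for first, second in pairs.split():
        edges.setdefault(first, set()).add(second)
        edges.setdefault(second, set())
    letters = sorted(edges)
    found = []

    def extend(path):
        # A path that already contains every letter is a complete answer.
        if len(path) == len(letters):
            found.append("".join(path))
            return
        # Otherwise try every unvisited successor of the last letter.
        for nxt in sorted(edges[path[-1]]):
            if nxt not in path:
                extend(path + [nxt])

    for start in letters:  # plays the role of the dummy root node
        extend([start])
    return found


paths = hamiltonian_paths('BA CE DF EF AE FC GD DA CG EA AB BG')
print(len(paths))           # 10
print('BAEFCGD' in paths)   # True
print('DFCEABG' in paths)   # True
```

The ten strings correspond to the ten paths listed above once node 1 is dropped and the numbers are mapped back to letters. Like the MATLAB version, this takes exponential time in the worst case, and it does not implement the "skip a repeated character" rule from the question.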
