Print values and it's associated index when working with vectors / matrix in J - search

If I want to check how many values in a vector or matrix are less than a given value
I can use +/ (a < 20). But what if I wanted to know both the specific value and it's index.
Something like (2(value) 5(index)) as a table. I looked at i., i: (which give first and last position) and I. Does sorting first help?

A very common pattern in J is the creation of a mask from a filter and applying an action on and/or using the masked data in a hook or fork:
((actions) (filter)) (data)
For example:
NB. Random array
a =: ? 20 $ 10
6 3 9 0 3 3 0 6 2 9 2 4 6 8 7 4 6 1 7 1
NB. Filter and mask
f =: 5 < ]
m =: f a
1 0 1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0
NB. Values of a on m
m # a
6 9 6 9 6 8 7 6 7
NB. Indices of a on m
I. m
0 2 7 9 12 13 14 16 18
NB. Joint results
(I.m) ,: (m # a)
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7
In other words, in this case you have m&# and f acting on a and I. acting on m. Notice that the final result can be derived from an action on m alone by commuting the arguments of copy #~:
(I. ,: (a #~ ]) m
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7
and a can be pulled out from the action on m like so:
a ( (]I.) ,: (#~ ])) m
But since m itself is derived from an action (f) on a, we can write:
a ( (]I.) ,: (#~ ])) (f a)
which is a simple monadic hook y v (f y) → (v f) y.
Therefore:
action =: (]I.) ,: (#~ ])
filter =: 5 < ]
data =: a
(action filter) data
0 2 7 9 12 13 14 16 18
6 9 6 9 6 8 7 6 7

Related

pandas fill 0s with mean based on rows that match a condition in another column

I have a dataframe like below in which I need to replace the 0s with the mean of the rows where the parent_key matches the self_key.
Input DataFrame: df= pd.DataFrame ({'self_key':['a','b','c','d','e','e','e','f','f','f'],'parent_key':[np.nan,'a','b','b','c','c','c','d','d','d'], 'value':[0,0,0,0,4,6,14,12,8,22],'level':[1,2,3,3,4,4,4,4,4,4]})
The row 3 has self_key of 'd' so I would need to replace its 0 value in column 'value' with the mean of rows 7,8,9 to fill with the correct value of 14. Since the lower levels feed into the higher levels I would need to do it from lowest level to highest to fill out the dataframe as well but when I do the below code it doesn't work and I get the error "ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional". How can I fill in the 0s with the means from lowest level to highest?
df['value']=np.where((df['value']==0) & (df['level']==3), df['value'].groupby(df.where(df['parent_key']==df['self_key'])).transform('mean'), df['value'])
Input
self_key parent_key value level
0 a NaN 0 1
1 b a 0 2
2 c b 0 3
3 d b 0 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4
My approach is to repeat the above code 3 times and change the level from 3 to 2 to 1, but its not working for even level 3.
Expected Ouput:
self_key parent_key value level
0 a NaN 11 1
1 b a 11 2
2 c b 8 3
3 d b 14 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4
If I understand your problem correctly, you are trying to compute mean in a bottom-up fashion by filtering dataframe on certain keys. If so, then following should solve it:
for l in range(df["level"].max()-1, 0, -1):
df_sub = df[(df["level"] == l) & (df["value"] == 0)]
self_keys = df_sub["self_key"].tolist()
for k in self_keys:
df.loc[df_sub[df_sub["self_key"] == k].index, "value"] = df[df["parent_key"] == k]["value"].mean()
[Out]:
self_key parent_key value level
0 a 11 1
1 b a 11 2
2 c b 8 3
3 d b 14 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4

Pandas Dataframe show Count with Group by and Aggregate

I have this data
ID Value1 Value2 Type Type2
1 3 1 A X
2 2 2 A X
3 5 3 B Y
4 2 4 B Z
5 6 8 C Z
6 7 9 C Z
7 8 0 C L
8 3 2 D M
9 4 3 D M
10 6 5 D M
11 8 7 D M
Right now i am able to generate this output using this code
pandabook.groupby(['Type','Type2'],as_index=False)['Value1', 'Value2'].agg({'Value1': 'sum','Value2': 'sum'})
ID Value 1 Value2 Type Type2
1 5 3 A X
2 5 3 B Y
3 2 5 B Z
4 13 17 C Z
5 8 0 C L
6 21 17 D M
I want to show the Aggregated count as well, as show in this example
How can i achieve this output ?
Add new value to dictionary with size function, remove as_index=False for prevent:
ValueError: cannot insert Type, already exists
and last rename with reset_index:
df = pandabook.groupby(['Type','Type2']).agg({'Value1': 'sum','Value2': 'sum', 'Type':'size'})
df = df.rename(columns={'Type':'Count'}).reset_index()
print (df)
Type Type2 Value1 Value2 Count
0 A X 5 3 2
1 B Y 5 3 1
2 B Z 2 4 1
3 C L 8 0 1
4 C Z 13 17 2
5 D M 21 17 4

Use # (Copy) as selection or filter on 2-d array

The J primitive Copy (#) can be used as a filter function, such as
k =: i.8
(k>3) # k
4 5 6 7
That's essentially
0 0 0 0 1 1 1 1 # i.8
The question is if the right-hand side of # is 2-d or higher rank shaped array, how to make a selection using #, if possible. For example:
k =: 2 4 $ i.8
(k > 3) # k
I got length error
What is the right way to make such a selection?
You can use the appropriate verb rank to get something like a 2d-selection:
(2 | k) #"1 1 k
1 3
5 7
but the requested axes have to be filled with 0s (or !.) to keep the correct shape:
(k > 3) #("1 1) k
0 0 0 0
4 5 6 7
(k > 2) #("1 1) k
3 0 0 0
4 5 6 7
You have to better define select for dimensions > 1 because now you have a structure. How do you discard values? Do you keep empty "cells"? Do you replace with 0s? Is structure important for the result?
If, for example, you only need the "values where" then just ravel , the array:
(,k > 2) # ,k
3 4 5 6 7
If you need to "replace where", then you can use amend }:
u =: 5 :'I. , 5 > y' NB. indices where 5 > y
0 u } k
0 0 0 0
0 5 6 7
z =: 3 2 4 $ i.25
u =: 4 :'I. , (5 > y) +. (0 = 3|y)' NB. indices where 5>y or 3 divides y
_999 u } z
_999 _999 _999 _999
_999 5 _999 7
8 _999 10 11
_999 13 14 _999
16 17 _999 19
20 _999 22 23

Avoiding a repeated verb name in a train

Consider a dyadic verb g, defined in terms of a dyadic verb f:
g=. [ f&.|: f
Is it possible to rewrite g so that the f term appears only once, but the behavior is unchanged?
UPDATE: Local Context
This question came up as part of my solution to this problem, which "expanding" a matrix in both directions like so:
Original Matrix
1 2 3
4 5 6
7 8 9
Expanded Matrix
1 1 1 1 2 3 3 3 3
1 1 1 1 2 3 3 3 3
1 1 1 1 2 3 3 3 3
1 1 1 1 2 3 3 3 3
4 4 4 4 5 6 6 6 6
7 7 7 7 8 9 9 9 9
7 7 7 7 8 9 9 9 9
7 7 7 7 8 9 9 9 9
7 7 7 7 8 9 9 9 9
My solution was to expand the matrix rows first using:
f=. ([ # ,:#{.#]) , ] , [ # ,:#{:#]
And then to apply that same solution under the transpose to expand the columns of the already row-expanded matrix:
3 ([ f&.|: f) m
And I noticed that it wasn't possible to write my solution with making the temporary verb f, or repeating its definition inline...
Try it online!
Knowing the context helps. You can also approach this using (|:#f)^:(+: x) y. A tacit (and golfed) solution would be 0&(|:{.,],{:)~+:.
(>: i. 3 3) (0&(|:{.,],{:)~+:) 2
1 1 1 2 3 3 3
1 1 1 2 3 3 3
1 1 1 2 3 3 3
4 4 4 5 6 6 6
7 7 7 8 9 9 9
7 7 7 8 9 9 9
7 7 7 8 9 9 9
I don't think it is possible. The right tine is going to be the result of x f y and the left tine is x The middle tine will transpose and apply f to the arguments and then transpose the result back. If you take the right f out then there is not a way to have x f y and if the middle f is removed then you do not have f applied to the transpose.
My guess is that you are looking for a primitive that will accomplish the same result with only one mention of f, but I don't know of one.
Knowing the J community someone will prove me wrong!

Understanding J From

In J:
a =: 2 3 $ 1 2 3 4 5 6
Gives:
1 2 3
4 5 6
Which is a 2 3 shaped array.
If I do:
0 1 { a
I (noting that 0 1 is a 2 shaped list) expected to have back:
1 2 3 4 5 6
But got the following instead:
1 2 3
4 5 6
Reading the documentation I was expecting the shape of the index to kinda govern the shape of the answer.
Can someone clarify what I am missing here?
Higher-dimensional arrays may help make this clear. An array with n dimensions has items with n-1 dimensions. When you select an item from ({) a three-dimensional array, your result is a two-dimensional array:
1 { i. 5 3 4
12 13 14 15
16 17 18 19
20 21 22 23
When you select multiple items from an array, the items are assembled into a new array, using each atom of x to select a item of y. This might be where you picked up the idea that the shape of x affects the shape of the result.
2 1 0 2 { 'set'
test
$ 2 1 0 2
4
$ 'test'
4
The dimensions of the result is equal to the dimensions of x plus the dimensions of the items of y. So, if you have a two-dimensional x taking two-dimensional items from a three-dimensional y, you will have a four-dimensional result:
(2 2 $ 1 1 0 1) { i. 5 3 4
12 13 14 15
16 17 18 19
20 21 22 23
12 13 14 15
16 17 18 19
20 21 22 23
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
$ (2 2 $ 1 1 0 1) { i. 5 3 4
2 2 3 4
One final note: the monadic Ravel (,) will reduce the result to a list (one-dimensional array).
, 0 1 { 2 3 $ 1 2 3 4 5 6
1 2 3 4 5 6
, i. 2 2 2 2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
From ({) selects the items of a noun. For 2 3 $ 1 2 3 4 5 6 the items are the two rows because items are the components that make up the noun.
[ a=. 2 3 $ 1 2 3 4 5
1 2 3
4 5 1
0 { a
1 2 3
If you just had 1 2 3 then the items would be the individual atoms.
[ b=. 1 2 3
1 2 3
0 { b
1
If you used 1 3 $ 1 2 3 then there is only one item and the result would be
[ c=. 1 3 $ 1 2 3
1 2 3
0 { c
1 2 3
The number of items can be found with Tally (#), and is the lead dimension of the Shape ($) of the noun.
$ a
2 3
$ b
3
$ c
1 3
# a
2
# b
3
# c
1

Resources