I have a table like the following
0 1 2 3
4 5 6 7
8 9 10 11
and I want to make the following structure.
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
Could anyone please help me?
And in J there is always another way!
]a=. i. 3 4
0 1 2 3
4 5 6 7
8 9 10 11
('' ;1 0 0 1) <;.1 a
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
This uses the dyadic cut conjunction (;.) with the general form of x u ;. n y
y is the argument that we would like to partition, x specifies where the partitions are to be put, n is positive if we would like the frets (the partition positions) included in the result and a value of 1 means that we work from left to right, and u is the verb that we would like to apply to the partition.
One tricky point:
x is ('';1 0 0 1) because we want the entire first dimension of the array (rows) after which the 1's indicate the partition start. In this case we take all the rows and make the first partition the first 3 columns, and the final 1 makes the last partition its own column.
There is much going on in this solution, and that allows it to be used in many different ways, depending on the needs of the programmer.
The title of your question ("slicing table into two parts and box it afterwards") suggests that the example you sketch may not reflect what you want to learn.
My impression is that you think of your resulting noun as a two-axis table boxed into two sections. The main problem with that interpretation is that boxes divide their contents very thoroughly. It takes special effort to make the numbers in your second box look like they've been trimmed from the structure in the first box. Such effort is rarely worthwhile.
If it is natural to need to take the 3 7 11 and remove it as a unit from the structure in which it occurs, there is an advantage to making it a row of the table, rather than a column. A 2-axis table is always a list of 1-axis lists. If your problem is a matter of segregating items, this orientation of the atoms makes it simpler to do.
Putting this into practice, here we deal with rows instead of columns:
aa=: |:i.3 4
aa
0 4 8
1 5 9
2 6 10
3 7 11
(}: ; {:) aa
+------+------+
|0 4 8|3 7 11|
|1 5 9| |
|2 6 10| |
+------+------+
The program, in parentheses, can be read literally as "curtail link tail". This is the sort of program I'd expect from the title of your question.
Part of effective J programming is orienting the data (nouns) so that they are more readily manipulated by the programs (verbs).
Here is one way:
]a=: i. 3 4
0 1 2 3
4 5 6 7
8 9 10 11
3 ({."1 ; }."1) a
┌──────┬──┐
│0 1 2│ 3│
│4 5 6│ 7│
│8 9 10│11│
└──────┴──┘
In other words "take the first 3 items in each row of a and Link (;) with the result of dropping the first 3 items in each row of a"
Other methods and/or structures may be more appropriate depending on the exact use case.
Related
I have 2 large DataFrames with the same set of columns but different values. I need to combine the values in respective columns (A and B here, maybe be more in actual data) into single values in the same columns (see required output below). I have a quick way of implementing this using np.vectorize and df.to_numpy() but I am looking for a way to implement this strictly with pandas. Criteria here is first readability of code then time complexity.
df1 = pd.DataFrame({'A':[1,2,3,4,5], 'B':[5,4,3,2,1]})
print(df1)
A B
0 1 5
1 2 4
2 3 3
3 4 2
4 5 1
and,
df2 = pd.DataFrame({'A':[10,20,30,40,50], 'B':[50,40,30,20,10]})
print(df2)
A B
0 10 50
1 20 40
2 30 30
3 40 20
4 50 10
I have one way of doing it which is quite fast -
#This function might change into something more complex
def conc(a,b):
return str(a)+'_'+str(b)
conc_v = np.vectorize(conc)
required = pd.DataFrame(conc_v(df1.to_numpy(), df2.to_numpy()), columns=df1.columns)
print(required)
#Required Output
A B
0 1_10 5_50
1 2_20 4_40
2 3_30 3_30
3 4_40 2_20
4 5_50 1_10
Looking for an alternate way (strictly pandas) of solving this.
Criteria here is first readability of code
Another simple way is using add and radd
df1.astype(str).add(df2.astype(str).radd('-'))
A B
0 1-10 5-50
1 2-20 4-40
2 3-30 3-30
3 4-40 2-20
4 5-50 1-10
I am trying to find a way to get which place a certain value comes in in a range of other values. Or another way to look at it is I want to turn the SMALL function inside out. The SMALL function looks like this, "=SMALL(array, k)" where it looks for the k place in an array. I need the find what the k is. I hope that makes sense, but if not here is one more example.
A B C D
1 Kate 12.25 =PLACE($B$1:$B$13,B1) 4
2 Cindy 15.80 =PLACE($B$1:$B$13,B2) 10
3 Mark 11.85 =PLACE($B$1:$B$13,B3) 3
4 Thomas 12.98 =PLACE($B$1:$B$13,B4) 5
5 George 13.58 =PLACE($B$1:$B$13,B5) 7
6 Kim 14.52 =PLACE($B$1:$B$13,B6) 9
7 Tim 11.54 =PLACE($B$1:$B$13,B7) 2
8 Frank 12.99 =PLACE($B$1:$B$13,B8) 6
9 Fran 17.85 =PLACE($B$1:$B$13,B9) 11
10 Caroline 14.25 =PLACE($B$1:$B$13,B10) 8
11 Alex 19.20 =PLACE($B$1:$B$13,B11) 12
12 Lilly 25.20 =PLACE($B$1:$B$13,B12) 13
13 Peter 11.22 =PLACE($B$1:$B$13,B13) 1
Also I can easily do it with VBA, but I do not want to. I want to do it in cell and I do not want to have to sort them first.
RANK function does this, i.e. in your case this formula in row 1 copied down
=RANK(B1,B$1:B$13,1)
The 1 at the end indicates the order of ranking (ascending here). Leave that out and you get descending ranking
Unboxing or opening boxes with different sizes causes padding with 0 for numerals and a space with literals:
v=.1 4 8 ; 2 6 4 ; 6 8 4 5; 7 8 9; 6 3 7 4 9
>v
1 4 8 0 0
2 6 4 0 0
6 8 4 5 0
7 8 9 0 0
6 3 7 4 9
The fit (!.) conjunction is usually the thing to use for these things, but
>!. _1 v
Is not supported and throws a domain error.
I've got this, but with very large arrays it's not very fast:
(>./ # every y) {.!. _1 every y
Is there an efficient way to define the padding value for opening boxes?
Setting
f =: 3 :'(>./ # every y) {.!. _1 every y'
g =: _1&paddedOpen
and (in the same spirit as your f):
h =: 3 : '((>./# &> y)&($!._1))#> y'
I get the following performances for time and space:
(100&(6!:2) ,: 7!:2) &.> 'f L';'g L';'h L'
┌─────────┬─────────┬─────────┐
│ 0.045602│0.0832403│0.0388146│
│4.72538e6│1.76356e7│4.72538e6│
└─────────┴─────────┴─────────┘
where L is a large array:
L =. (<#(+i.)/)"1 ? 50000 2 $ 10
You can slightly improve f by making it terse; for example:
f =: ] {.!._1&>~ >./#:(#&>)
I don't think that there is much room for more improvements.
My guess is that doing the padding directly will be the path to efficiency, especially if the need is restricted to a specific structure of data (as perhaps suggested by your example.) This solution has not been subjected to performance analysis, but it shows one way to do the padding yourself.
Here I'm making the assumption that the task involves always going from boxed lists to a table, and that the data is always numeric. Additional assert. statements may be worth adding to qualify that the right argument is as expected.
v=.1 4 8 ; 2 6 4 ; 6 8 4 5; 7 8 9; 6 3 7 4 9 NB. example data
paddedOpen=: dyad define
assert. 0 = # $ x
Lengths=. #&> y
PadTo=. >./ Lengths
Padding=. x #~&.> PadTo - Lengths
y ,&> Padding
)
_1 paddedOpen v
1 4 8 _1 _1
2 6 4 _1 _1
6 8 4 5 _1
7 8 9 _1 _1
6 3 7 4 9
It is only important to first pad with a customized value when the default value cannot be used as an intermediary. If the default value can be used in passing, it will be faster to let the default padding occur then replace all default values with the preferred value. From the nature of your question I assume the default value has meaning in the main domain, so simple replacement won't serve.
Please leave comments informing us of the relative performance of different techniques, or at least whether one does or does not prove fast enough for your purposes.
The i. primitive produces a list of integers:
i. 10
0 1 2 3 4 5 6 7 8 9
If I want to produce several short lists in a row, I do this:
;i."0 each [ 2 3 4
0 1 0 1 2 0 1 2 3
(the result I want)
Boxing (that each) is a crutch here, because without it, i."0 produces a matrix.
i."0 [ 2 3 4
0 1 0 0
0 1 2 0
0 1 2 3
(the result I don't want)
Is there a better way to not have i."0 format the output to a matrix, but an array?
No, I believe you can't do any better than your current solution. There is no way for i."0 to return a vector.
The "0 adverb forces i. to accept scalars, and i. returns vectors. i. has no way of knowing that your input was a vector rather than a scalar. According to The J primer the result shape is the concatenation of the frame of the argument and the result.
The shortest "box-less" solution I've found so far is
(*#$"0~#&,i."0) 2 3 4
which is still longer than just using ;i. each 2 3 4
Imagine that I want to take the numbers from 1 to 3 and form a matrix such that each possible pairing is represented, e.g.,
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Here is the monadic verb I formulated in J to do this:
($~ (-:## , 2:)) , ,"0/~ 1+i.y
Originally I had thought that ,"0/~ 1+i.y would be sufficient, but unfortunately that produces the following output:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
In other words, its shape is 3 3 2 and I want something whose shape is 9 2. The only way I could think of to fix it is to pour all of the data into a new shape. I'm convinced there must be a more concise way to do this. Anyone know?
Reshaping your intermediate result can be simplified. Removing the topmost axis is commonly done with ,/ so in your case the completed phrase could be ,/ ,"0/~ 1+i.y
One way (which uses { as a monad in its capacity for permutation cataloguing):
>,{ 2#<1+i.y
EDIT:
Some fun to be had with this scheme:
All possible permutations:
>,{ y#<1+i.y
Configurable number in sequence:
>,{ x#<1+i.y
I realize this question is old, but there is a simpler way to do it: count to 9 in trinary, and add 1.
1 + 3 3 #: i.9
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
The 3 3 & #: gives you two digits. The general 'base 3' verb is 3 & #.^:_1.