Opening (unboxing) boxes of different sizes pads the result with 0 for numeric data and with spaces for literal (character) data:
v=.1 4 8 ; 2 6 4 ; 6 8 4 5; 7 8 9; 6 3 7 4 9
>v
1 4 8 0 0
2 6 4 0 0
6 8 4 5 0
7 8 9 0 0
6 3 7 4 9
The fit conjunction (!.) is usually the thing to use for this, but
>!. _1 v
is not supported and raises a domain error.
I have this, but it is slow on very large arrays:
(>./ # every y) {.!. _1 every y
Is there an efficient way to define the padding value for opening boxes?
Setting
f =: 3 :'(>./ # every y) {.!. _1 every y'
g =: _1&paddedOpen
and (in the same spirit as your f):
h =: 3 : '((>./# &> y)&($!._1))@> y'
I get the following performances for time and space:
(100&(6!:2) ,: 7!:2) &.> 'f L';'g L';'h L'
┌─────────┬─────────┬─────────┐
│ 0.045602│0.0832403│0.0388146│
│4.72538e6│1.76356e7│4.72538e6│
└─────────┴─────────┴─────────┘
where L is a large array:
L =. (<@(+i.)/)"1 ? 50000 2 $ 10
You can improve f slightly by making it tacit; for example:
f =: ] {.!._1&>~ >./@:(#&>)
I don't think that there is much room for more improvements.
My guess is that doing the padding directly will be the path to efficiency, especially if the need is restricted to a specific structure of data (as your example perhaps suggests). This solution has not been performance-tested, but it shows one way to do the padding yourself.
Here I assume that the task always involves going from boxed lists to a table, and that the data is always numeric. It may be worth adding further assert. statements to check that the right argument is as expected.
v=.1 4 8 ; 2 6 4 ; 6 8 4 5; 7 8 9; 6 3 7 4 9 NB. example data
paddedOpen=: dyad define
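assert. 0 = # $ x                    NB. the fill value must be a scalar
Lengths=. #&> y                      NB. length of each boxed list
PadTo=. >./ Lengths                  NB. length of the longest list
Padding=. x #~&.> PadTo - Lengths    NB. a boxed run of fill values for each list
y ,&> Padding                        NB. append each run to its list and open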
)
_1 paddedOpen v
1 4 8 _1 _1
2 6 4 _1 _1
6 8 4  5 _1
7 8 9 _1 _1
6 3 7  4  9
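For readers who want to see the same steps outside J, here is a rough Python sketch of what paddedOpen does. The function name padded_open is hypothetical, and the sketch assumes the input is a list of plain numeric lists. Each line corresponds to one line of the J definition.

```python
from typing import List, Sequence

def padded_open(fill: int, boxes: Sequence[Sequence[int]]) -> List[List[int]]:
    """Pad every list to the length of the longest one using `fill`."""
    lengths = [len(b) for b in boxes]             # Lengths=. #&> y
    pad_to = max(lengths, default=0)              # PadTo=. >./ Lengths
    # Padding=. x #~&.> PadTo - Lengths, then y ,&> Padding
    return [list(b) + [fill] * (pad_to - n) for b, n in zip(boxes, lengths)]

v = [[1, 4, 8], [2, 6, 4], [6, 8, 4, 5], [7, 8, 9], [6, 3, 7, 4, 9]]
for row in padded_open(-1, v):
    print(*row)
```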
Padding with a custom value up front only matters when the default value cannot serve as an intermediate placeholder. If the default value can be used in passing, it will be faster to let the default padding happen and then replace every default value with the preferred one. From the nature of your question, I assume the default value has meaning in your data, so simple replacement won't work.
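Here is a minimal Python sketch of the pad-then-replace idea. It assumes the default fill (0 here) never occurs in the real data. The name pad_then_replace is hypothetical, and the check at the top refuses any input where that assumption fails.

```python
from typing import List, Sequence

def pad_then_replace(boxes: Sequence[Sequence[int]], fill: int, default: int = 0) -> List[List[int]]:
    """Pad with the default value, then turn every default into `fill`.

    Only valid when `default` never occurs in the real data, so check that first.
    """
    if any(default in b for b in boxes):
        raise ValueError("default value occurs in the data; replacing it would corrupt real values")
    width = max((len(b) for b in boxes), default=0)
    padded = [list(b) + [default] * (width - len(b)) for b in boxes]    # default padding
    return [[fill if x == default else x for x in row] for row in padded]  # replacement step

print(pad_then_replace([[1, 4, 8], [6, 8, 4, 5]], -1))
```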
Please leave comments informing us of the relative performance of different techniques, or at least whether one does or does not prove fast enough for your purposes.
The rather verbose fork I came up with is
({. , (>:@[ }. ]))
E.g.,
3 ({. , (>:@[ }. ])) 0 1 2 3 4 5
0 1 2 4 5
Works great, but is there a more idiomatic way? What is the usual way to do this in J?
Yes, the idiomatic J way is to use triple boxing (three levels of boxing) for the index:
(<<<5) { i.10
0 1 2 3 4 6 7 8 9
(<<<1 3) { i.10
0 2 4 5 6 7 8 9
It's mentioned briefly in the Dictionary entry for { (From):
Note that the result in the very last dyadic example, that is, (<<<_1){m , is all except the last item.
and a bit more in Learning J: Chapter 6 - Indexing: 6.2.5 Excluding Things.
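As a cross-check on what (<<<i) { y selects, here is a small Python sketch of complementary indexing. The name all_except is hypothetical. It treats negative indices (such as J's _1) as counting from the end and rejects out-of-range indices.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def all_except(indices: Sequence[int], items: Sequence[T]) -> List[T]:
    """Keep every item whose position is not listed, like (<<<indices) { items."""
    n = len(items)
    excluded = set()
    for i in indices:
        if not -n <= i < n:
            raise IndexError(f"index {i} out of range for length {n}")
        excluded.add(i % n)       # map negative indices (J's _1) to positions from the end
    return [x for pos, x in enumerate(items) if pos not in excluded]

print(all_except([5], range(10)))      # like (<<<5) { i.10
print(all_except([1, 3], range(10)))   # like (<<<1 3) { i.10
```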
Another approach is to use the monadic and dyadic forms of # (Tally and Copy). This idiom of using Copy to remove an item is something that I use frequently.
The hook (i. i.@#) uses Tally (monadic #) and monadic and dyadic i. (Integers and Index Of) to generate the filter mask:
2 (i. i.@#) 'abcde'
1 1 0 1 1
which Copy (dyadic #) uses to omit the appropriate item.
2 ((i. i.@#) # ]) 0 1 2 3 4 5
0 1 3 4 5
2 ((i. i.@#) # ]) 'abcde'
abde
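The mask trick relies on how dyadic i. behaves with a single index x. The expression x i. i.#y looks up each position in the one-item list x, so a match gives 0 and a miss gives #x, which is 1. Here is a Python sketch of the same mask-and-copy idea. The names keep_mask and drop_at are hypothetical, and the sketch assumes a single index.

```python
from itertools import compress
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def keep_mask(index: int, n: int) -> List[int]:
    """1 everywhere except at `index`, like index (i. i.@#) y."""
    return [0 if pos == index else 1 for pos in range(n)]

def drop_at(index: int, items: Sequence[T]) -> List[T]:
    """Remove one item by copying through the mask, like index ((i. i.@#) # ]) y."""
    return list(compress(items, keep_mask(index, len(items))))

print(keep_mask(2, len("abcde")))
print(drop_at(2, [0, 1, 2, 3, 4, 5]))
print("".join(drop_at(2, "abcde")))
```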
I have a table like the following
0 1  2  3
4 5  6  7
8 9 10 11
and I want to make the following structure.
┌──────┬──┐
│0 1  2│ 3│
│4 5  6│ 7│
│8 9 10│11│
└──────┴──┘
Could anyone please help me?
And in J there is always another way!
]a=. i. 3 4
0 1  2  3
4 5  6  7
8 9 10 11
('' ;1 0 0 1) <;.1 a
┌──────┬──┐
│0 1  2│ 3│
│4 5  6│ 7│
│8 9 10│11│
└──────┴──┘
This uses the dyadic form of the Cut conjunction (;.), whose general form is x u;.n y.
y is the array we would like to partition, and x marks where the partitions start. n = 1 means that each 1 in x starts a new partition and that the fret (the marked position) is included in that partition; a negative n would exclude the frets, and n = 2 would treat each 1 as the end of a partition instead. u is the verb applied to each partition, here < (Box).
One tricky point:
x is ('';1 0 0 1) because we want the entire first axis of the array (the rows), and along the second axis the 1s mark where each partition starts. In this case we take all the rows and make the first partition from the first 3 columns, and the final 1 makes the last column a partition of its own.
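Here is a rough Python sketch of what this particular cut does. It splits each row into column groups that start at each 1 in the fret list. The name cut_columns is hypothetical, and the sketch assumes the fret list has one entry per column and starts with 1.

```python
from typing import List, Sequence

def cut_columns(frets: Sequence[int], table: Sequence[Sequence[int]]) -> List[List[List[int]]]:
    """Group the columns of `table`; a new group starts at every 1 in `frets`."""
    starts = [j for j, f in enumerate(frets) if f]
    if not starts or starts[0] != 0:
        raise ValueError("this sketch assumes frets begins with 1")
    bounds = starts + [len(frets)]
    # each group keeps all rows, restricted to the columns between two frets
    return [[list(row[a:b]) for row in table] for a, b in zip(bounds, bounds[1:])]

a = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
left, right = cut_columns([1, 0, 0, 1], a)
print(left)
print(right)
```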
There is much going on in this solution, and that allows it to be used in many different ways, depending on the needs of the programmer.
The title of your question ("slicing table into two parts and box it afterwards") suggests that the example you sketch may not reflect what you want to learn.
My impression is that you think of your resulting noun as a two-axis table boxed into two sections. The main problem with that interpretation is that boxes divide their contents very thoroughly. It takes special effort to make the numbers in your second box look like they've been trimmed from the structure in the first box. Such effort is rarely worthwhile.
If it is natural to need to take the 3 7 11 and remove it as a unit from the structure in which it occurs, there is an advantage to making it a row of the table, rather than a column. A 2-axis table is always a list of 1-axis lists. If your problem is a matter of segregating items, this orientation of the atoms makes it simpler to do.
Putting this into practice, here we deal with rows instead of columns:
aa=: |:i.3 4
aa
0 4  8
1 5  9
2 6 10
3 7 11
(}: ; {:) aa
+------+------+
|0 4  8|3 7 11|
|1 5  9|      |
|2 6 10|      |
+------+------+
The program, in parentheses, can be read literally as "curtail link tail". This is the sort of program I'd expect from the title of your question.
Part of effective J programming is orienting the data (nouns) so that they are more readily manipulated by the programs (verbs).
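For comparison, here is the same reorientation and split sketched in Python (the names are illustrative). It transposes first so the columns become rows, then separates all rows but the last from the last one.

```python
from typing import List, Sequence, Tuple

def curtail_link_tail(rows: Sequence[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Pair all rows but the last with the last row, like (}: ; {:)."""
    return list(rows[:-1]), rows[-1]

a = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
aa = [list(col) for col in zip(*a)]      # like |: : the columns become rows
body, last = curtail_link_tail(aa)
print(body)
print(last)
```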
Here is one way:
]a=: i. 3 4
0 1  2  3
4 5  6  7
8 9 10 11
3 ({."1 ; }."1) a
┌──────┬──┐
│0 1  2│ 3│
│4 5  6│ 7│
│8 9 10│11│
└──────┴──┘
In other words: "take the first 3 items in each row of a, and Link (;) that with the result of dropping the first 3 items from each row of a".
Other methods and/or structures may be more appropriate depending on the exact use case.
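Here is a Python sketch of the same take/drop split (split_rows is a hypothetical name). It assumes 0 <= n <= row length. Unlike J's Take, Python slicing does not pad when n is too large, and a negative n means something different for slices than for J's Take and Drop.

```python
from typing import List, Sequence, Tuple

def split_rows(n: int, table: Sequence[Sequence[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Split every row after its first n items: (n {."1 table) ; (n }."1 table)."""
    taken = [list(row[:n]) for row in table]     # like n {."1 table
    dropped = [list(row[n:]) for row in table]   # like n }."1 table
    return taken, dropped

a = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(split_rows(3, a))
```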
I have N sets, with n1, n2, n3, ..., nN distinct members respectively.
How do I generate all n1*n2*n3*...*nN possible combinations of them?
For example,
[6] [4 5] [1 2 3 4]
will give
6 4 1
6 4 2
6 4 3
6 4 4
6 5 1
6 5 2
6 5 3
6 5 4
I want to do this in MATLAB, but a general algorithm would also be fine.
An easy solution is to simulate counting, that is, repeatedly adding 1!
Start with a list of indices 0 0 0, one per set, pointing into your values. That gives you the value 6 4 1 in your example.
Then add 1 to the last index.
You now have indices 0 0 1, so 6 4 2.
And so on.
At 0 0 4 the last index overflows (its set has only 4 values), so it wraps to 0 and carries 1 into the next position. Your indices become 0 1 0, giving 6 5 1.
Keep doing that, and keep a counter of the visited possibilities. There are 1 * 2 * 4 = 8 possibilities, so it's easy to know when you are done.
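Here is a minimal Python sketch of this counter. It treats the index list as a mixed-radix number whose i-th digit wraps at the size of the i-th set. The name odometer is hypothetical.

```python
from typing import Iterator, List, Sequence

def odometer(sets: Sequence[Sequence[int]]) -> Iterator[List[int]]:
    """Yield every combination by repeatedly adding 1 to a mixed-radix index list."""
    if any(len(s) == 0 for s in sets):
        return                                   # an empty set leaves nothing to combine
    total = 1
    for s in sets:
        total *= len(s)                          # n1 * n2 * ... * nN combinations
    idx = [0] * len(sets)                        # start at 0 0 ... 0
    for _ in range(total):                       # the counter tells us when we are done
        yield [s[i] for s, i in zip(sets, idx)]
        pos = len(idx) - 1                       # add 1 to the last index
        while pos >= 0:
            idx[pos] += 1
            if idx[pos] < len(sets[pos]):
                break                            # no overflow, stop carrying
            idx[pos] = 0                         # overflow: wrap to 0 and carry left
            pos -= 1

for combo in odometer([[6], [4, 5], [1, 2, 3, 4]]):
    print(*combo)
```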
I think you're looking for the Cartesian product of sets.
This should help:
cartprod(N1,N2,N3, ...)
http://www.mathworks.com/matlabcentral/fileexchange/5475-cartprod-cartesian-product-of-multiple-sets
There's another one here
set = {n1, n2, n3, ...}
allcomb(set{:})
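Outside MATLAB, Python's standard itertools.product computes the same Cartesian product. It varies the last set fastest, which is the same order as the counting approach above. The variable names here are illustrative.

```python
from itertools import product

sets = [[6], [4, 5], [1, 2, 3, 4]]
combos = [list(c) for c in product(*sets)]   # last set varies fastest
for c in combos:
    print(*c)
```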
The i. primitive produces a list of integers:
i. 10
0 1 2 3 4 5 6 7 8 9
If I want to produce several short lists in a row, I do this:
;i."0 each [ 2 3 4
0 1 0 1 2 0 1 2 3
(the result I want)
Boxing (that each) is a crutch here, because without it, i."0 produces a matrix.
i."0 [ 2 3 4
0 1 0 0
0 1 2 0
0 1 2 3
(the result I don't want)
Is there a better way to stop i."0 from formatting the output as a matrix, so that it produces a flat list instead?
No, I believe you can't do any better than your current solution. There is no way for i."0 to return a vector.
The "0 (Rank 0) makes i. apply to each scalar separately, and i. returns a vector for each one; it has no way of knowing that your input was a vector rather than a scalar. According to the J Primer, the shape of the result is the frame of the argument followed by the shape of the individual results, so results of different lengths are padded (with 0 here) to a common shape.
The shortest "box-less" solution I've found so far is
(*@$"0~#&,i."0) 2 3 4
which is still longer than just using ;i. each 2 3 4
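For readers unfamiliar with J, the two approaches can be mirrored in Python as a sketch. The names are hypothetical, and only non-negative lengths are handled. The first approach concatenates directly. The second builds the 0-padded table that i."0 produces together with a 0/1 mask, then copies through the mask.

```python
from typing import List, Sequence

def ragged_iota(ns: Sequence[int]) -> List[int]:
    """Concatenate 0..n-1 for each n, like ; i. each ns."""
    return [i for n in ns for i in range(n)]

def ragged_iota_via_mask(ns: Sequence[int]) -> List[int]:
    """Same result through a padded table and a 0/1 mask, like (*@$"0~ #&, i."0) ns."""
    width = max(ns, default=0)
    table = [[i if i < n else 0 for i in range(width)] for n in ns]  # i."0 ns, padded with 0
    mask = [[1 if i < n else 0 for i in range(width)] for n in ns]   # *@$"0~ ns
    flat_table = [x for row in table for x in row]                  # ravel (,)
    flat_mask = [m for row in mask for m in row]
    return [x for x, m in zip(flat_table, flat_mask) if m]           # Copy (#)

print(ragged_iota([2, 3, 4]))
print(ragged_iota_via_mask([2, 3, 4]))
```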
I've imported a text file and tried to make a histogram:
pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t")
hist <- as.numeric(pichman$WS)
However, the numbers I get are different from the values in my dataset. At first I thought this was because the column contained text, so I deleted the text:
table(pichman$WS)
ws <- pichman$WS[pichman$WS!="Down" & pichman$WS!="NoData"]
However, I am still getting very high numbers. Does anyone have an idea why?
I suspect you are having a problem with factors. For example,
> x = factor(4:8)
> x
[1] 4 5 6 7 8
Levels: 4 5 6 7 8
> as.numeric(x)
[1] 1 2 3 4 5
> as.numeric(as.character(x))
[1] 4 5 6 7 8
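The pitfall can be modelled in Python as a rough sketch. A factor is stored as integer codes pointing into a vector of level labels, so converting the codes gives positions, not values. make_factor and the other names are hypothetical, and Python's plain string sort only approximates R's locale-dependent ordering of levels.

```python
from typing import List, Sequence, Tuple

def make_factor(values: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Rough model of an R factor: sorted level labels plus 1-based integer codes."""
    levels = sorted(set(values))                        # plain string sort of the labels
    code_of = {label: i + 1 for i, label in enumerate(levels)}
    return [code_of[v] for v in values], levels

def as_numeric_of_factor(codes: Sequence[int]) -> List[float]:
    """What as.numeric(x) gives for a factor: the internal codes, not the labels."""
    return [float(c) for c in codes]

def as_numeric_of_labels(codes: Sequence[int], levels: Sequence[str]) -> List[float]:
    """What as.numeric(as.character(x)) gives: the labels parsed as numbers."""
    return [float(levels[c - 1]) for c in codes]

codes, levels = make_factor(["12.5", "3.1", "12.5", "7.0"])
print(levels)
print(as_numeric_of_factor(codes))
print(as_numeric_of_labels(codes, levels))
```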
Some comments:
You mention that your vector contains the strings "Down" and "NoData". What do you expect as.numeric to do with these values?
In read.csv, try using the argument stringsAsFactors=FALSE
Are you sure it's sep="/t" and not sep="\t"?
Use the command head(pichman) to check the first few rows of your data.
Also, it's very tricky to guess what your problem is when you don't provide data. A minimal working example is always preferable. For example, I can't run the command pichman <- read.csv(file="picman.txt", header=TRUE, sep="/t") since I don't have access to the data set.
As csgillespie said, stringsAsFactors defaults to TRUE (in R versions before 4.0.0), which converts any text column to a factor. So even after deleting the text, you still have a factor in your data frame.
As for the conversion, there is a faster way to do it, so I include it here for reference:
> x <- factor(sample(4:8,10,replace=T))
> x
[1] 6 4 8 6 7 6 8 5 8 4
Levels: 4 5 6 7 8
> as.numeric(levels(x))[x]
[1] 6 4 8 6 7 6 8 5 8 4
This shows that it works.
The timings :
> x <- factor(sample(4:8,500000,replace=T))
> system.time(as.numeric(as.character(x)))
user system elapsed
0.11 0.00 0.11
> system.time(as.numeric(levels(x))[x])
user system elapsed
0 0 0
It's a big improvement, although the conversion is not always the bottleneck. It does matter if you have a big data frame with many columns to convert.
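The speedup presumably comes from parsing each distinct level once instead of parsing every element. Here is a Python sketch of that idea. The names and the sample data are hypothetical, and the codes are 1-based as in R.

```python
from typing import List, Sequence

def as_numeric_via_levels(codes: Sequence[int], levels: Sequence[str]) -> List[float]:
    """Like as.numeric(levels(x))[x]: parse each level once, then index by code."""
    numeric_levels = [float(lab) for lab in levels]   # one parse per distinct level
    return [numeric_levels[c - 1] for c in codes]     # cheap lookups, no further parsing

levels = ["4", "5", "6", "7", "8"]
codes = [3, 1, 5, 3, 4]                               # hypothetical factor codes
print(as_numeric_via_levels(codes, levels))
```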