What exactly is #^:_1 - j

So I came across this bit of code on the j website:
mask #!.fill^:_1 lst
where mask is a bit list.
Makes sense as far as it goes. The result is the obverse of mask&#, applied to lst, with the unknown values replaced with fill.
However, it doesn't seem to generalize:
2 2 (#!._^:_1) 3 3 4 4
yields a domain error, rather than "3 4", as you might expect.
What exactly is #^:_1, and why isn't it a proper obverse of #?

I believe (#!._^:_1) spreads out the right argument by either taking the indexed value if the position has a one or filling in with the fill value if it is a zero.
(1 1 0 1 0 1) (#!._^:_1) 3 3 4 4
3 3 _ 4 _ 4
It doesn't generalize completely because values other than 1 or 0 will result in the domain error that you see. See case 6 on this dictionary page. http://www.jsoftware.com/help/dictionary/d202n.htm
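As a rough illustration of the expand behavior described above, here is a small Python sketch (not J). The function name expand and the use of ValueError as a stand-in for J's domain error are assumptions made for this example only.

```python
import math


def expand(mask, items, fill):
    """Place items where mask is 1 and fill where mask is 0."""
    if any(bit not in (0, 1) for bit in mask):
        # Stand-in for the domain error J reports for a non-boolean mask.
        raise ValueError("mask must contain only 0s and 1s")
    if sum(mask) != len(items):
        # Assumption of this sketch: the 1s must line up with the items.
        raise ValueError("number of 1s in mask must match number of items")
    source = iter(items)
    return [next(source) if bit else fill for bit in mask]


print(expand([1, 1, 0, 1, 0, 1], [3, 3, 4, 4], math.inf))
# [3, 3, inf, 4, inf, 4]
```

Calling expand([2, 2], [3, 3, 4, 4], math.inf) raises ValueError in this sketch, mirroring the domain error in the question.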
You might also look at the way that complex numbers interact with the standard (non-obverse) version of #, as this seems more generalizable.
2j1 #!._ 3 3 4 4
3 3 _ 3 3 _ 4 4 _ 4 4 _
2j1 1j2 3j0 1j1 #!._ 3 3 4 4
3 3 _ 3 _ _ 4 4 4 4 _
In this case the real component of the complex argument mjn makes m copies of the corresponding right item and the imaginary component inserts n fill values.
http://www.jsoftware.com/help/dictionary/d400.htm
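The copies-then-fills rule for a complex left argument can be sketched in Python as follows. This is only an illustration: each J number mjn is written as a hypothetical (m, n) tuple, and copy_with_fill is a name invented for this example.

```python
def copy_with_fill(counts, items, fill):
    """For each (m, n) pair, emit m copies of the item followed by n fills."""
    if len(counts) == 1:
        counts = counts * len(items)  # a single count applies to every item
    result = []
    for (copies, fills), item in zip(counts, items):
        result.extend([item] * copies)
        result.extend([fill] * fills)
    return result


INF = float("inf")
print(copy_with_fill([(2, 1)], [3, 3, 4, 4], INF))
# [3, 3, inf, 3, 3, inf, 4, 4, inf, 4, 4, inf]
print(copy_with_fill([(2, 1), (1, 2), (3, 0), (1, 1)], [3, 3, 4, 4], INF))
# [3, 3, inf, 3, inf, inf, 4, 4, 4, 4, inf]
```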


Use a split function in every row of one column of a data frame

I have a rather big pandas data frame (more than 1 million rows) with columns containing either strings or numbers. Now I would like to split the strings in one column before the expression "is applied".
An example to explain what I mean:
What I have:
a b description
2 4 method A is applied
10 5 titration is applied
3 1 computation is applied
What I am looking for:
a b description
2 4 method A
10 5 titration
3 1 computation
I tried the following,
df.description = df.description.str.split('is applied')[0]
But this didn't bring the desired result.
Any ideas how to do it? :-)
You are close, need str[0]:
df.description = df.description.str.split(' is applied').str[0]
Alternative solution:
df.description = df.description.str.extract(r'(.*)\s+is applied', expand=False)
print (df)
a b description
0 2 4 method A
1 10 5 titration
2 3 1 computation
But for better performance use list comprehension:
df.description = [x.split(' is applied')[0] for x in df.description]
You can also use replace:
df.description = df.description.str.replace(' is applied','')
df
a b description
0 2 4 method A
1 10 5 titration
2 3 1 computation
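For readers without pandas at hand, the same idea can be tried on a plain Python list standing in for the description column. This is a minimal sketch; before_marker is a hypothetical helper, and the last sample string is made up to show what happens when the marker is missing.

```python
def before_marker(text, marker=" is applied"):
    """Return the part of text before the first occurrence of marker."""
    head, _sep, _tail = text.partition(marker)
    return head  # the whole string when the marker is absent


descriptions = [
    "method A is applied",
    "titration is applied",
    "computation is applied",
    "no marker here",  # invented row, not from the question
]
print([before_marker(d) for d in descriptions])
# ['method A', 'titration', 'computation', 'no marker here']
```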

How can the AVERAGEIFS function be translated into MATLAB?

I am working at moving my data over from Excel to Matlab. I have some data that I want to average based on multiple criteria. I can accomplish this by looping, but want to do it with matrix operations only, if possible.
Thus far, I have managed to do so with a single criterion, using accumarray as follows:
data=[
1 3
1 3
1 3
2 3
2 6
2 9];
accumarray(data(:,1),data(:,2))./accumarray(data(:,1),1);
Which returns:
3
6
Corresponding to the averages of items 1 and 2, respectively. I have at least three other columns that I need to include in this averaging but don't know how I can add that in. Any help is much appreciated.
For your single column, you don't need to call accumarray twice, you can provide a function handle to mean as the fourth input
mu = accumarray(data(:,1), data(:,2), [], @mean);
For multiple columns, you can use the row indices as the second input to accumarray and then use those from within the anonymous function to access the rows of your data to operate on.
data = [1 3 5
1 3 10
1 3 8
2 3 7
2 6 9
2 9 12];
tmp = accumarray(data(:,1), 1:size(data, 1), [], @(rows){mean(data(rows,2:end), 1)});
means = cat(1, tmp{:});
% 3.0000 7.6667
% 6.0000 9.3333
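If "multiple criteria" means grouping on several key columns at once (one reading of the question, as in AVERAGEIFS), the logic can be sketched in plain Python. This is not MATLAB and not the accumarray approach above; average_ifs and its parameter names are assumptions for illustration, and column indices are 0-based.

```python
from collections import defaultdict


def average_ifs(rows, key_columns, value_column):
    """Mean of value_column for each distinct combination of key_columns."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        totals[key] += row[value_column]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


data = [
    (1, 3, 5),
    (1, 3, 10),
    (1, 3, 8),
    (2, 3, 7),
    (2, 6, 9),
    (2, 9, 12),
]
# One criterion: average column 1 grouped by column 0.
print(average_ifs(data, key_columns=[0], value_column=1))
# {(1,): 3.0, (2,): 6.0}
# Two criteria: average column 2 grouped by columns 0 and 1 together.
print(average_ifs(data, key_columns=[0, 1], value_column=2))
```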

Understanding solution to online test

The question is in the following link:
http://www.spoj.com/problems/AEROLITE/
Input:
1 1 1 1
0 0 6 3
1 1 1 2
[and 7 test cases more]
Output:
6
57
8
[and 7 test cases more]
How does the output come from the input?
Consider the outputs corresponding to the following letters:
a. 1 1 1 1 = 6
b. 0 0 6 3 = 57
c. 1 1 1 2 = 8
Restating the definitions from the problem in a more tactical way, the 4 inputs correspond to the following:
The number of "{}" pairs
The number of "[]" pairs
The number of "()" pairs
The max depth when generating the output
The output is a single number: the count of regular expressions that use exactly those pairs, have a depth exactly equal to the fourth input, and obey the priority rules that "()" cannot contain "{}" or "[]" and "[]" cannot contain "{}".
The walkthrough below shows how to arrive at the outputs, but it doesn't try to break the problem into sub-problems. Hopefully, it will at least help you connect the numbers and start to find the sub-problems yourself.
Taking those examples explicitly, start with "a" for 1 1 1 1 = 6:
The inputs mean a depth of 1 with 1 pair each of "{}", "[]", "()". With no nesting allowed, this is the number of orderings of 3 distinct pairs, so 3! = 6.
Actual: {}[](), {}()[], []{}(), [](){}, (){}[], ()[]{}
Then go to "c" for 1 1 1 2 = 8
This is like "a" except that the expression must now reach a depth of exactly 2 (d = 2 instead of 1), so the 6 flat arrangements from "a" no longer count, and {[()]} is too deep (depth 3).
** One pair nested in another with the third alongside: {[]}(), (){[]}, {()}[], []{()}, [()]{}, {}[()]
** Two pairs side by side inside "{}": {[]()}, {()[]}
6 + 2 = 8
Finally, consider "b" where we use 6 "()" pairs at d = 3.
Because only parentheses exist here, this is a Catalan number Cn where n = 6, restricted by depth (For more on this: https://en.wikipedia.org/wiki/Catalan_number). C(6) = 132 balanced arrangements in total; 89 of them have a depth of at most 3, and 32 of those have a depth of at most 2, which leaves 89 - 32 = 57 with a depth of exactly 3.
Alternatively and much more tediously, you can iterate over all the arrangements of parentheses and keep only those whose depth is exactly 3 to get to 57 records:
** d = 1 gives just ()()()()()(), which is not counted
** d = 2 gives examples like (())()()()(), ()(())()()(), ()()(())()(), ()()()(())(), ()()()()(()), and so on, which are not counted either
** d = 3 gives examples like ((()))()()(), ()((()))()(), ()()((()))(), ()()()((())), and so on; these are the 57 that count
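The three sample outputs can be cross-checked with a brute-force sketch in Python. It counts expressions whose deepest nesting is exactly the given depth, which is the reading that reproduces 6, 57 and 8. The function name count_expressions is an assumption, and the code is only meant for tiny inputs, not as a solution that would pass the judge.

```python
from functools import lru_cache


def count_expressions(n_curly, n_square, n_round, depth):
    """Count valid expressions using exactly the given pairs whose
    deepest nesting level equals depth."""

    @lru_cache(maxsize=None)
    def walk(remaining, stack, deepest):
        if not stack and sum(remaining) == 0:
            return 1 if deepest == depth else 0
        total = 0
        if stack:
            # Close the innermost open bracket.
            total += walk(remaining, stack[:-1], deepest)
        for kind in range(3):  # 0 = {}, 1 = [], 2 = ()
            # "()" may hold only "()", "[]" may hold "[]" and "()",
            # and "{}" may hold anything.
            if remaining[kind] and (not stack or kind >= stack[-1]):
                new_depth = len(stack) + 1
                if new_depth <= depth:
                    left = remaining[:kind] + (remaining[kind] - 1,) + remaining[kind + 1:]
                    total += walk(left, stack + (kind,), max(deepest, new_depth))
        return total

    return walk((n_curly, n_square, n_round), (), 0)


print(count_expressions(1, 1, 1, 1))  # 6
print(count_expressions(0, 0, 6, 3))  # 57
print(count_expressions(1, 1, 1, 2))  # 8
```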

rearranging data in excel

I'm not sure how to ask this question without illustrating it, so here goes:
I have results from a test which has tested people's attitudes (scores 1-5) to different voices on 16 different scales. The data set looks like this (where P1, P2, P3 are participants and A, B, C are voices):
Aformal Apleasant Acool Bformal etc
P1 2 3 1 4
P2 5 4 2 4
P3 1 2 4 3
However, I want to rearrange my data to look like this:
formal pleasant cool
P1A 3 3 5
P1B 2 1 6
P1C etc
P1D
This would mean a lot more rows (multiple rows per participant), and a lot fewer columns. Is it doable without having to manually reenter all the scores in a new excel file?
Sure, no problem. I just hacked this solution:
L M N O P Q
person# voice# formal pleasant cool
1 1 P1A 2 3 1
1 2 P1B 4 5 2
1 3 P1C 9 9 9
2 1 P2A 5 4 2
2 2 P2B 4 4 1
2 3 P2C 9 9 9
3 1 P3A 1 2 4
3 2 P3B 3 3 2
3 3 P3C 9 9 9
Basically, in columns L and M, I made two columns with index numbers. Voice numbers go from 1 to 3 and repeat every 3 rows because there are nv=3 voices (increase this if you have voices F, G, H...). Person numbers are also repeated for 3 rows each, because there are nv=3 voices.
To make the row headers (P1A, etc.), I used this formula: ="P" & L2 & CHAR(64+M2) at P1A and copied down.
To make the new table of values, I used this formula: =OFFSET(B$2,$L2-1,($M2-1)*3) at P1A-formal, and copied down and across. Note that B$2 corresponds to the cell address for P1-Aformal in the original table of values (your example).
I've used this indexing trick I don't know how many times to quickly rearrange tables of data inherited from other people.
EDIT: Note that the index columns are also made (semi)automatically using a formula. The first nv=3 rows are made manually, and then subsequent rows refer to them. For example, the formula in L5 is =L2+1 and the formula in M5 is =M2. Both are copied down to the end of the table.
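The index arithmetic behind the OFFSET trick can also be sketched outside Excel. The Python below is only an illustration: wide_to_long is a hypothetical name, and the sample scores are the voice A and B values from the table in this answer.

```python
def wide_to_long(wide_rows, n_voices, n_scales):
    """Turn one row per participant into one row per (participant, voice)."""
    long_rows = []
    for person, scores in enumerate(wide_rows, start=1):
        for voice in range(1, n_voices + 1):
            # Same idea as ="P" & L2 & CHAR(64+M2)
            label = "P" + str(person) + chr(64 + voice)
            # Same idea as the column shift ($M2-1)*3 in OFFSET
            start = (voice - 1) * n_scales
            long_rows.append((label, *scores[start:start + n_scales]))
    return long_rows


wide = [
    [2, 3, 1, 4, 5, 2],  # P1: Aformal Apleasant Acool Bformal Bpleasant Bcool
    [5, 4, 2, 4, 4, 1],  # P2
    [1, 2, 4, 3, 3, 2],  # P3
]
for row in wide_to_long(wide, n_voices=2, n_scales=3):
    print(row)
# ('P1A', 2, 3, 1)
# ('P1B', 4, 5, 2)
# ('P2A', 5, 4, 2)
# ('P2B', 4, 4, 1)
# ('P3A', 1, 2, 4)
# ('P3B', 3, 3, 2)
```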

How to filter a list in J?

I'm currently learning the fascinating J programming language, but one thing I have not been able to figure out is how to filter a list.
Suppose I have the arbitrary list 3 2 2 7 7 2 9 and I want to remove the 2s but leave everything else unchanged, i.e., my result would be 3 7 7 9. How on earth do I do this?
The short answer
2 (~: # ]) 3 2 2 7 7 2 9
3 7 7 9
The long answer
I have the answer for you, but first you should get familiar with some details. Here we go.
Monads, dyads
There are two types of verbs in J: monads and dyads. The former accept only one parameter, the latter accept two parameters.
For example passing a sole argument to a monadic verb #, called tally, counts the number of elements in the list:
# 3 2 2 7 7 2 9
7
The dyadic verb #, which accepts two arguments (left and right), is called copy. It copies each element of the right list as many times as specified by the corresponding element of the left list (the left argument may also be a single number):
0 0 0 3 0 0 0 # 3 2 2 7 7 2 9
7 7 7
Fork
There's a notion of fork in J, which is a series of 3 verbs applied to their arguments, dyadically or monadically.
Here's the diagram of a kind of fork I used in the first snippet:
x (F G H) y
G
/ \
F H
/ \ / \
x y x y
It describes the order in which verbs are applied to their arguments. Thus these applications occur:
2 ~: 3 2 2 7 7 2 9
1 0 0 1 1 0 1
The ~: (not equal) is dyadic in this example and results in a list of boolean values which are true when an argument doesn't equal 2. This was the F application according to diagram.
The next application is H:
2 ] 3 2 2 7 7 2 9
3 2 2 7 7 2 9
] (identity) can be a monad or a dyad, but it always returns the right argument passed to it. There's an opposite verb, [, which returns... yes, the left argument! :)
So far, so good. F and H after application returned these values accordingly:
1 0 0 1 1 0 1
3 2 2 7 7 2 9
The only step to perform is the G verb application.
As I noted earlier, the verb #, which is dyadic (accepts two arguments), allows us to duplicate the items from the right argument as many times as specified in the respective positions in the left argument. Hence:
1 0 0 1 1 0 1 # 3 2 2 7 7 2 9
3 7 7 9
We've just got the list filtered out of 2s.
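The fork evaluation walked through above, x (F G H) y becoming (x F y) G (x H y), can be mimicked in Python. This is a sketch for intuition only; fork, not_equal, copy and right are names invented here to stand in for the J train and for ~:, # and ].

```python
def fork(f, g, h):
    """Dyadic fork: x (F G H) y  ->  (x F y) G (x H y)."""
    return lambda x, y: g(f(x, y), h(x, y))


def not_equal(x, ys):
    """Like J's dyadic ~: with a scalar left argument."""
    return [int(x != y) for y in ys]


def copy(counts, ys):
    """Like J's dyadic # with non-negative integer counts."""
    return [y for n, y in zip(counts, ys) for _ in range(n)]


def right(x, ys):
    """Like J's dyadic ]: ignore the left argument."""
    return ys


remove_value = fork(not_equal, copy, right)
print(not_equal(2, [3, 2, 2, 7, 7, 2, 9]))     # [1, 0, 0, 1, 1, 0, 1]
print(remove_value(2, [3, 2, 2, 7, 7, 2, 9]))  # [3, 7, 7, 9]
```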
Reference
Slightly different kinds of fork, hook and other primitives (including the ones mentioned above) are described in these two documents:
A Brief J Reference (175 KiB)
Easy-J. An Introduction to the World's most Remarkable Programming Language (302 KiB)
Other useful sources of information are the Jsoftware site with its wiki and a few mailing list archives on the internet.
Just to be sure it's clear, the direct way - to answer the original question - is this:
3 2 2 7 7 2 9 -. 2
This returns
3 7 7 9
The more elaborate method - generating the boolean and using it to compress the vector - is more APLish.
To answer the other question in the very long post, to return the first element and the number of times it occurs, is simply this:
({. , {. +/ .= ]) 1 4 1 4 2 1 3 5
1 3
This is a fork using "{." to get the first item, "{. +/ . = ]" to add up the number of times the first item equals each element, and "," as the middle verb to concatenate these two parts.
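For comparison, the same "first item and how often it occurs" computation reads like this in Python. It is a plain restatement rather than a translation of the J train, and first_and_count is a hypothetical name.

```python
def first_and_count(items):
    """Return the first item and the number of times it occurs in items."""
    first = items[0]
    return [first, sum(1 for item in items if item == first)]


print(first_and_count([1, 4, 1, 4, 2, 1, 3, 5]))
# [1, 3]
```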
Also:
2 ( -. ~ ]) 3 2 2 7 7 2 9
3 7 7 9
There are a million ways to do this. It bothers me, vaguely, that these things don't evaluate strictly right to left; I'm an old APL programmer and I think of things as right to left even when they ain't.
If it were a thing that I was going to put into a program where I wanted to pull out some number and the number was a constant, I would do the following:
(#~ 2&~:) 1 3 2 4 2 5
1 3 4 5
This is a hook sort of thing, I think. The right half of the expression generates the truth vector regarding which are not 2, and then the octothorpe on the left has its arguments swapped so that the truth vector is the left argument to copy and the vector is the right argument. I am not sure that a hook is faster or slower than a fork with an argument copy.
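The hook described above can be sketched in Python as (G H) y becoming y G (H y). All names here (hook, copy_swapped, differs_from_2) are invented for illustration and stand in for the J train and for #~ and 2&~:.

```python
def hook(g, h):
    """Monadic hook: (G H) y  ->  y G (H y)."""
    return lambda y: g(y, h(y))


def copy_swapped(ys, counts):
    """Like J's #~ : copy with its arguments swapped."""
    return [y for n, y in zip(counts, ys) for _ in range(n)]


def differs_from_2(ys):
    """Like J's 2&~: : 1 where the item is not 2, else 0."""
    return [int(y != 2) for y in ys]


drop_twos = hook(copy_swapped, differs_from_2)
print(drop_twos([1, 3, 2, 4, 2, 5]))
# [1, 3, 4, 5]
```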
+/3<+/"1(=2&{"1)/:~S:_1{;/5 6$1+i.6
156
The program above answers the question, "For all possible combinations of Yahtzee dice, how many have 4 or 5 matching numbers in one roll?" It generates all the permutations, in boxes, sorts each box individually, unboxing them as a side effect, and extracts column 2, comparing the box against their own column 2, in the only successful fork or hook I've ever managed to write. The theory is that if there is a number that appears in a list of 5, three or more times, if you sort the list the middle number will be the number that appears with the greatest frequency. I have attempted a number of other hooks and/or forks and every one has failed because there is something I just do not get. Anyway that truth table is reduced to a vector, and now we know exactly how many times each group of 5 dice matched the median number. Finally, that number is compared to 3, and the number of successful compares (greater than 3, that is, 4 or 5) are counted.
The next program, shown after the following paragraph, answers the question, "For all possible 8 digit numbers made from the symbols 1 through 5, with repetition, how many are divisible by 4?"
I know that you need only determine how many within the first 25 are divisible by 4 and multiply, but the program runs more or less instantly. At one point I had a much more complex version of this program that generated the numbers in base 5 so that individual digits were between 0 and 4, added 1 to the numbers thus generated, and then put them into base 10. That was something like 1+(8$5)#:i.5^8
+/0=4|,(8$10)#. >{ ;/ 8 5$1+i.5
78125
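Both counts quoted above (156 and 78125) can be cross-checked by brute force in Python. This is an independent check, not a translation of the J programs; the function names are made up for the example.

```python
from itertools import product


def count_four_or_more_alike(n_dice=5, n_faces=6):
    """Count rolls in which the median face appears more than 3 times."""
    total = 0
    for roll in product(range(1, n_faces + 1), repeat=n_dice):
        ordered = sorted(roll)
        median = ordered[n_dice // 2]
        if ordered.count(median) > 3:
            total += 1
    return total


def count_divisible_by_4(n_digits=8, symbols=(1, 2, 3, 4, 5)):
    """Count n-digit numbers built from symbols that are divisible by 4."""
    total = 0
    for digits in product(symbols, repeat=n_digits):
        number = int("".join(map(str, digits)))
        if number % 4 == 0:
            total += 1
    return total


print(count_four_or_more_alike())  # 156
print(count_divisible_by_4())      # 78125
```

The second loop visits all 5^8 = 390625 digit strings, so it takes a moment longer than the first.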
As long as I have solely verb trains and selection, I don't have a problem. When I start having to repeat my argument within the verb so that I'm forced to use forks and hooks I start to get lost.
For example, here is something I can't get to work.
((1&{~+/)*./\(=1&{))1 1 1 3 2 4 1
I always get Index Error.
The point is to output two numbers, one that is the same as the first number in the list, the second which is the same as the number of times that number is repeated.
So this much works:
*./\(=1&{)1 1 1 3 2 4 1
1 1 1 0 0 0 0
I compare the first number against the rest of the list. Then I do an insertion of an and compression - and this gives me a 1 so long as I have an unbroken string of 1's, once it breaks the and fails and the zeros come forth.
I thought that I could then add another set of parens, get the lead element from the list again, and somehow record those numbers, the eventual idea would be to have another stage where I apply the inverse of the vector to the original list, and then use $: to get back for a recursive application of the same verb. Sort of like the quicksort example, which I thought I sort of understood, but I guess I don't.
But I can't even get close. I will ask this as a separate question so that people get proper credit for answering.
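To pin down the intended result, here is one way the goal could be written in Python, assuming the second number is meant to be the length of the unbroken leading run (which is what the and-scan above computes). The name first_and_leading_run is hypothetical; accumulate plays the role of the J scan *./\.

```python
from itertools import accumulate


def first_and_leading_run(items):
    """Return the first item and the length of its unbroken leading run."""
    first = items[0]
    matches = [int(item == first) for item in items]
    # Running "and": stays 1 until the first mismatch, then 0 forever.
    scan = list(accumulate(matches, lambda a, b: a and b))
    return [first, sum(scan)]


print(first_and_leading_run([1, 1, 1, 3, 2, 4, 1]))
# [1, 3]
```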
