x-.y And what about intersection? - j

x-.y includes all items of x except for those that are cells of y
But what if I want to get all items that are cells of x and of y?
I can achieve this by
x -.^:2 y
But it requires running an expensive operation twice.
Is there a better solution?

e. is often useful when working with sets.
x e. y
gives a list of matches:
for each item of x return 1 if it exists in the "set" y, 0 otherwise.
1 2 3 4 e. 5 9 2
0 1 0 0
Then,
x (e. # [) y
selects those elements that do exist in both lists.
1 2 3 4 (e. # [) 5 9 2
2
5 8 (e. # [) i.12
5 8

Doing -. twice is the classic way of implementing intersection in J.
The inefficiency is minor (a constant factor - and, in general, you should not concern yourself with efficiency issues in J unless they exceed a factor of 2 - when you have resource problems you're generally going to want to focus on the factor of 1000 or greater issues).
Put differently, if ([-.-.) or -.^:2 is too slow for you then -. would also be too slow for you. (This can happen on extremely large data sets where the underlying implementation has been inefficient. Recent versions of J have had some work done, to correct this issue.)
Disappointing, perhaps, but practical.

TRUE/FALSE ← VLOOKUP ← Identify the ROW! of the first negative value within a column

Firstly, we have an array of predetermined factors, ie. V-Z;
their attributes are 3, the first two (•xM) multiplied giving the 3rd.
f ... factors
• ... cap, the values in the data set may increase max
m ... fixed multiplier
p ... let's call it power
This is a separate, standalone array .. we'd access with eg. VLOOKUP
f   •   m   pwr
V   1   9   9
W   2   8   16
X   3   7   21
Y   4   6   24
Z   5   5   25
—————————————————————————————————————————————
Then we have 6 columns, in which the actual data to be processed is in, & thereof derive the next-level result, based on the interaction of both samples introduced.
In addition, there are added two columns, for balance & profit.
Here's a short, 6-row data sample:
f   •   m   bal   profit
V   2   3   377   1
Y   2   3   156   7
Y   1   1   122   0
X   1   2   -27   2
Z   3   3   223   3
—————————————————————————————————————————————
Ultimately, starting at the end, we are comparing IF -27 inverted → so 27 is within the X's power range ie. 21 (as per the first sample) .. which is then fed into a bigger formula, beyond the scope of this post.
This can be done with VLOOKUP, all fine by now.
—————————————————————————————————————————————
To get to that .. for the working example, we are focusing coincidentally on row5, since that's the one with the first negative value in the 'balance' column, so ..
on factorX = which factor exactly is to us unknown &
balance -27 = which we have to locate amongst potentially dozens to hundreds of rows.
Why!?
Once we know that the factor is X, based on the * & multiplier pertaining to it, then we also know which 'power' (top array) to compare -27, as the identified first negative value in the balance column, to.
Is that clear?
I'd like to know the formula on how to achieve that, & (get to) move on with the broader-scope work.
—————————————————————————————————————————————
The main issue for me is that I don't know how to identify the first negative value, or the row that -27 belongs to, and then how to use that information to get the X, i.e. to identify the factor type. The factor is positioned to the left of the balance column and, to the best of my knowledge, I cannot use a negative column index number (so that route is out of the question anyway).
To recap;
IF(21>27) = IF(-21<-27)
27 → LOCATE ROW with the first negative number (-27)
21 → IDENTIFY the FACTOR TYPE, same row as (-27)
→ VLOOKUP pwr, based on factor type identified (top array, 4th column right)
→ invert either 21 to a negative number or (-27) to the positive number
= TRUE/FALSE
Guessing at your layout, I'll assume your first table is in columns A to D and the second in columns G to K.
You could find the letter of that factor with something like this:
=INDEX(G:G,XMATCH(TRUE,INDEX(J:J<0)))
INDEX(J:J<0) converts that column to TRUE and FALSE depending on being negative or not and with XMATCH you find the first TRUE. You could then use that in VLOOKUP:
=VLOOKUP(INDEX(G:G,XMATCH(TRUE,INDEX(J:J<0))),A:D,4,0)
That would return the 21. You can use the same concept to find the -27, and with ABS get its "positive value":
=VLOOKUP(INDEX(G:G,XMATCH(TRUE,INDEX(J:J<0))),A:D,4,0) > ABS(INDEX(J:J,XMATCH(TRUE,INDEX(J:J<0))))
That should return true or false in the comparison

Sum of arrays with repeated indices

How can I add an array of numbers to another array by indices, especially with repeated indices? Like this:
x
1 2 3 4
idx
0 1 0
y
5 6 7
] x add idx;y NB. (1 + 5 + 7) , (2 + 6) , 3 , 4
13 8 3 4
All nouns (x, idx, y) can have millions of items, so I need a fast 'add' verb.
UPDATE
Solution (thanks to Dan Bron):
cumIdx =: 1 : 0
:
'i z' =. y
n =. ~. i
x n}~ (n{x) + i u//. z
)
(1 2 3 4) + cumIdx (0 1 0);(5 6 7)
13 8 3 4
For now, a short answer in the "get it done" mode:
data =. 1 2 3 4
idx =. 0 1 0
updat =. 5 6 7
cumIdx =: adverb define
:
n =. ~. m
y n}~ (n{y) + m +//. x
)
updat idx cumIdx data NB. 13 8 3 4
In brief:
1. Start by grouping the update array (in your post, y¹) where your index array has the same value, and taking the sum of each group.
   - Accomplish this using the adverb key (/.) with sum (+/) as its verbal argument, deriving a dyadic verb whose arguments are idx on the left and the update array (your y, my updat) on the right.
2. Get the nub (~.) of your index array.
3. Select these (unique) indices from your value array (your x, my data).
   - This will, by definition, have the same length as the cumulative sums we calculated in (1.).
4. Add these to the cumulative sum.
5. Now you have your final updates to the data; updat and idx have the same length, so you just merge them into your value array using }, as you did in your code.
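For readers who find the steps easier to follow in another language, here is a rough Haskell analogy. It is only a sketch under stated assumptions, not a translation of what J does internally: the name addByIndex is made up for this example, Data.Map.fromListWith stands in for key with +/, and positions that receive no update simply add 0.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helper (not from the original post): add each update
-- value to the data element at the matching index, summing updates
-- that share an index first.
addByIndex :: [Int] -> [Int] -> [Int] -> [Int]
addByIndex idx updat dat =
  [ v + Map.findWithDefault 0 i sums | (i, v) <- zip [0 ..] dat ]
  where
    -- Group the updates by index and sum each group (the role of +//. in J).
    sums = Map.fromListWith (+) (zip idx updat)

-- addByIndex [0,1,0] [5,6,7] [1,2,3,4] == [13,8,3,4]
```

The map of sums never holds more entries than there are distinct indices, which matches the point about keeping the update array small.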
Since we kept the update array small (never greater than its original length), this should have decent performance on larger inputs, though I haven't run any tests. The only performance drawback is the double computation of the nub of idx (once explicitly with ~. and once implicitly with /.), though since your values are integers, this should be relatively cheap; it's one of J's stronger areas, performance-wise.
¹ I realize renaming your arrays makes this answer more verbose than it needs to be. However, since you named your primary data x rather than y (which is the convention), if I had just kept your naming convention, then when I invoked cumIdx, the names of the nouns inside the definition would have the opposite meanings to the ones outside the definition, which I thought would cause greater confusion. For this reason, it's best to keep "primary data" on the right (y), and "control data" on the left (x). You might also consider constraining your use of the special names x, y, u, v, m and n to where they're already implicitly defined by invoking an explicit definition; definitely never change their nameclasses.
This approach also uses key (/.) but is a bit simpler.
It is likely to use more space than Dan Bron's, especially for big updates.
addByIdx=: {{ (m , i.# y) +//. x,y }}
updat idx addByIdx data
13 8 3 4

Haskell multivariable Lambda function for lists

I am confused on how this is computed.
Input: groupBy (\x y -> (x*y `mod` 3) == 0) [1,2,3,4,5,6,7,8,9]
Output: [[1],[2,3],[4],[5,6],[7],[8,9]]
First, does x and y refer to the current and the next element?
Second, is this saying that it will group the elements that equal 0 when it is modded by 3? If so, how come there are elements that are not equal to 0 when modded by 3 in the output?
Found here: http://zvon.org/other/haskell/Outputlist/groupBy_f.html
To answer your second question: We compare two elements by multiplying them and seeing if the result is divisible by 3. "So why are there elements in the output not divisible by 3?" If they aren't divisible, that doesn't filter them out (that's what filter does); rather, when the predicate fails, the element goes into a separate group. When it succeeds, the element goes into the current group.
As to your first question, this took me a little while to figure out... x and y aren't two consecutive elements; rather, y is the current element and x is the first element in the current group. (!)
1 * 2 = 2; 2 `mod` 3 = 2; 1 and 2 go in separate groups.
2 * 3 = 6; 6 `mod` 3 = 0; 2 and 3 go in the same group.
2 * 4 = 8; 8 `mod` 3 = 2; 4 gets put in a different group.
...
Notice, on that last line, we're looking at 2 and 4 — not 3 and 4, as you might reasonably expect.
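The behaviour traced above can be reproduced with a small re-implementation. This is a sketch for illustration only: groupBySketch is a made-up name, and it is written in the span-based shape that, as far as I know, the library definition of groupBy also uses.

```haskell
-- Hypothetical re-implementation for illustration; assumed to mirror
-- the span-based definition of Data.List.groupBy.
groupBySketch :: (a -> a -> Bool) -> [a] -> [[a]]
groupBySketch _ [] = []
groupBySketch eq (x : xs) = (x : ys) : groupBySketch eq zs
  where
    -- Every candidate is compared with x, the head of the current group,
    -- not with the element immediately before it.
    (ys, zs) = span (eq x) xs

example :: [[Int]]
example = groupBySketch (\x y -> (x * y `mod` 3) == 0) [1 .. 9]
-- example == [[1],[2,3],[4],[5,6],[7],[8,9]]
```

Because span keeps testing against the same x, the check for 4 is made against 2 rather than 3, which is why 4 starts a new group.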
First, does x and y refer to the current and the next element?
Roughly, yes.
Second, is this saying that it will group the elements that equal 0 when it is modded by 3? If so, how come there are elements that are not equal to 0 when modded by 3 in the output?
The lambda defines a relation between two integers x and y, which holds whenever the product x*y is a multiple of 3. Since 3 is prime, x must be a multiple of 3 or y must be such.
For the input [1,2,3,4,5,6,7,8,9], it is first checked whether 1 is in relation with 2. This is false, so 1 gets its own singleton group [1]. Then, we proceed with 2 and 3: now the relation holds, so 2,3 will share their group. Next, we check whether 2 and 4 are in relation: this is false. So, the group is [2,3] and not any larger. Then we proceed with 4 and 5 ...
I must confess that I do not like this example very much, since the relation is not an equivalence relation (because it is not transitive). Because of this, the exact result of groupBy is not guaranteed: the implementation might test the relation on 3,4 (true) instead of 2,4 (false), and build a group [2,3,4] instead.
Quoting from the docs:
The predicate is assumed to define an equivalence.
So, once this contract is violated, there are no guarantees on what the output of groupBy might be.
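To make the lack of a guarantee concrete, here is a sketch of a different but equally plausible grouping strategy, one that compares each element with its immediate predecessor. The name groupByAdjacent is hypothetical and this is not how the standard library behaves; it only shows that a non-transitive relation gives a different answer under a different strategy.

```haskell
-- Hypothetical variant: compare each element with the previous one
-- instead of with the first element of the group.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent _ [] = []
groupByAdjacent rel (x : xs) = go [x] x xs
  where
    -- acc holds the current group in reverse; prev is its latest element.
    go acc _ [] = [reverse acc]
    go acc prev (y : ys)
      | rel prev y = go (y : acc) y ys
      | otherwise = reverse acc : go [y] y ys

adjacentExample :: [[Int]]
adjacentExample = groupByAdjacent (\x y -> (x * y `mod` 3) == 0) [1 .. 9]
-- adjacentExample == [[1],[2,3,4],[5,6,7],[8,9]]
```

With a genuine equivalence relation the two strategies agree, which is why the documentation can afford to leave the choice unspecified.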
The groupBy function takes a list and returns a list of lists such that each sublist in the result contains only equal elements, based on the equality function you provide.
In this case, you are trying to find all sublists where for all sublist elements x and y, mod (x*y) 3 == 0 (and the ones where it doesn't == 0). Slightly weird, but there you go. groupBy only looks at adjacent elements, so sort the list first to reduce the number of duplicate groups.

Need Hint for ProjectEuler Problem

What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
I could easily brute force the solution in an imperative programming language with loops. But I want to do this in Haskell and not having loops makes it much harder. I was thinking of doing something like this:
[n | n <- [1..], d <- [1..20], n `mod` d == 0] !! 0
But I know that won't work because "d" will make the condition equal True at d = 1. I need a hint on how to make it so that n mod d is calculated for [1..20] and can be verified for all 20 numbers.
Again, please don't give me a solution. Thanks.
As with many of the Project Euler problems, this is at least as much about math as it is about programming.
What you're looking for is the least common multiple of a set of numbers, which happen to be in a sequence starting at 1.
A likely tactic in a functional language is trying to make it recursive based on figuring out the relation between the smallest number divisible by all of [1..n] and the smallest number divisible by all of [1..n+1]. Play with this with some smaller numbers than 20 and try to understand the mathematical relation or perhaps discern a pattern.
Instead of a search until you find such a number, consider instead a constructive algorithm, where, given a set of numbers, you construct the smallest (or least) positive number that is evenly divisible by (aka "is a common multiple of") all those numbers. Look at the algorithms there, and consider how Euclid's algorithm (which they mention) might apply.
Can you think of any relationship between two numbers in terms of their greatest common divisor and their least common multiple? How about among a set of numbers?
If you look at it, it seems to be a list filtering operation: an infinite list of numbers, filtered on whether each number is divisible by all numbers from 1 to 20.
So what we need is a function which takes a list of integers and an integer and tells whether the integer is divisible by all the numbers in the list
isDivisible :: [Int] -> Int -> Bool
and then use this in List filter as
filter (isDivisible [1..20]) [1..]
Now as Haskell is a lazy language, you just need to take the required number of items (in your case you need just one hence List.head method sounds good) from the above filter result.
I hope this helps you. This is a simple solution and there will be many other single line solutions for this too :)
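One possible way to fill in that signature is sketched below. The body of isDivisible and the helper name smallestFor are assumptions made for this example, and the usage deliberately checks the smaller range [1..10] rather than the full problem.

```haskell
-- One possible body for the isDivisible signature given above.
isDivisible :: [Int] -> Int -> Bool
isDivisible ds n = all (\d -> n `mod` d == 0) ds

-- Hypothetical helper: first positive number divisible by every element
-- of ds. Laziness means only as much of [1 ..] is built as is needed.
smallestFor :: [Int] -> Int
smallestFor ds = head (filter (isDivisible ds) [1 ..])

-- smallestFor [1 .. 10] == 2520
```

This is still a brute-force search, so it slows down quickly as the range grows.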
Alternative answer: You can just take advantage of the lcm function provided in the Prelude.
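A minimal sketch of that idea, assuming a fold is an acceptable way to combine the numbers pairwise (the name smallestMultiple is made up here):

```haskell
-- lcm comes from the Prelude; folding it over the range combines
-- the numbers one at a time, starting from 1.
smallestMultiple :: Integer -> Integer
smallestMultiple n = foldl lcm 1 [1 .. n]

-- smallestMultiple 10 == 2520
-- smallestMultiple 20 == 232792560
```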
For efficiently solving this, go with Don Roby's answer. If you just want a little hint on the brute force approach, translate what you wrote back into English and see how it differs from the problem description.
You wrote something like "filter the product of the positive naturals and the positive naturals from 1 to 20"
what you want is more like "filter the positive naturals by some function of the positive naturals from 1 to 20"
You have to get mathy in this case. You are gonna do a foldl through [1..20], starting with an accumulator n = 1. For each number p of that list, you only proceed if p is prime. For such a prime p, find the largest integer q such that p^q <= 20, and multiply n by p^q. Once the foldl finishes, n is the number you want.
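The fold described here might look roughly like the following. All three names are hypothetical, and the primality test is deliberately naive since the range is tiny.

```haskell
-- Naive trial-division primality test; fine for numbers up to 20.
isPrime :: Int -> Bool
isPrime p = p > 1 && all (\d -> p `mod` d /= 0) [2 .. p - 1]

-- Largest power of p that does not exceed limit (assumes 2 <= p <= limit).
maxPower :: Int -> Int -> Int
maxPower limit p = last (takeWhile (<= limit) (iterate (* p) p))

-- Fold over [1 .. limit], multiplying the accumulator by the largest
-- allowed power of each prime and skipping the non-primes.
smallestByPrimes :: Int -> Int
smallestByPrimes limit = foldl step 1 [1 .. limit]
  where
    step n p
      | isPrime p = n * maxPower limit p
      | otherwise = n

-- smallestByPrimes 20 == 232792560
```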
A possible brute force implementation would be
head [n|n <- [1..], all ((==0).(n `mod`)) [1..20]]
but in this case it would take way too long. The all function tests whether a predicate holds for all elements of a list. The composition ((==0).(n `mod`)) is short for the lambda (\d -> mod n d == 0).
So how could you speed up the calculation? Let's factorize our divisors in prime factors, and search for the highest power of every prime factor:
2 = 2
3 = 3
4 = 2^2
5 = 5
6 = 2 * 3
7 = 7
8 = 2^3
9 = 3^2
10 = 2 * 5
11 = 11
12 = 2^2 * 3
13 = 13
14 = 2 * 7
15 = 3 * 5
16 = 2^4
17 = 17
18 = 2 * 3^2
19 = 19
20 = 2^2 * 5
--------------------------------
max= 2^4*3^2*5*7*11*13*17*19
Using this number we have:
all ((==0).(2^4*3^2*5*7*11*13*17*19 `mod`)) [1..20]
--True
Hey, it is divisible by all numbers from 1 to 20. Not very surprising. E.g. it is divisible by 15 because it "contains" the factors 3 and 5, and it is divisible by 16, because it "contains" the factor 2^4. But is it the smallest possible number? Think about it...

How to filter a list in J?

I'm currently learning the fascinating J programming language, but one thing I have not been able to figure out is how to filter a list.
Suppose I have the arbitrary list 3 2 2 7 7 2 9 and I want to remove the 2s but leave everything else unchanged, i.e., my result would be 3 7 7 9. How on earth do I do this?
The short answer
2 (~: # ]) 3 2 2 7 7 2 9
3 7 7 9
The long answer
I have the answer for you, but first you should get familiar with some details. Here we go.
Monads, dyads
There are two types of verbs in J: monads and dyads. The former accept only one parameter, the latter accept two parameters.
For example passing a sole argument to a monadic verb #, called tally, counts the number of elements in the list:
# 3 2 2 7 7 2 9
7
A verb #, which accepts two arguments (left and right), is called copy, it is dyadic and is used to copy elements from the right list as many times as specified by the respective elements in the left list (there may be a sole element in the list also):
0 0 0 3 0 0 0 # 3 2 2 7 7 2 9
7 7 7
Fork
There's a notion of fork in J, which is a series of 3 verbs applied to their arguments, dyadically or monadically.
Here's the diagram of a kind of fork I used in the first snippet:
x (F G H) y
      G
    /   \
   F     H
  / \   / \
 x   y x   y
It describes the order in which verbs are applied to their arguments. Thus these applications occur:
2 ~: 3 2 2 7 7 2 9
1 0 0 1 1 0 1
The ~: (not equal) is dyadic in this example and results in a list of boolean values which are true when an argument doesn't equal 2. This was the F application according to diagram.
The next application is H:
2 ] 3 2 2 7 7 2 9
3 2 2 7 7 2 9
] (identity) can be a monad or a dyad, but it always returns the right argument passed to it. There's also an opposite verb, [, which returns... yes, the left argument! :)
So far, so good. F and H after application returned these values accordingly:
1 0 0 1 1 0 1
3 2 2 7 7 2 9
The only step to perform is the G verb application.
As I noted earlier, the verb #, which is dyadic (accepts two arguments), allows us to duplicate the items from the right argument as many times as specified in the respective positions in the left argument. Hence:
1 0 0 1 1 0 1 # 3 2 2 7 7 2 9
3 7 7 9
We've just got the list filtered out of 2s.
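For readers coming from Haskell, the shape of the fork can be imitated with an ordinary higher-order function. This is only an analogy: fork, notEqual, copy, right and removeAll are hypothetical names, and copy handles just the boolean case of J's #.

```haskell
-- x (F G H) y  becomes  g (f x y) (h x y)
fork :: (a -> b -> c) -> (c -> d -> e) -> (a -> b -> d) -> a -> b -> e
fork f g h x y = g (f x y) (h x y)

-- Stand-in for dyadic ~: (not equal) between a scalar and a list.
notEqual :: Int -> [Int] -> [Bool]
notEqual x = map (/= x)

-- Stand-in for dyadic # (copy), restricted to a boolean left argument.
copy :: [Bool] -> [Int] -> [Int]
copy mask ys = [y | (True, y) <- zip mask ys]

-- Stand-in for dyadic ] (return the right argument).
right :: Int -> [Int] -> [Int]
right _ ys = ys

-- The fork (~: # ]) written with the stand-ins above.
removeAll :: Int -> [Int] -> [Int]
removeAll = fork notEqual copy right

-- removeAll 2 [3,2,2,7,7,2,9] == [3,7,7,9]
```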
Reference
Slightly different kinds of fork, hook and other primitives (including the ones mentioned above) are described in these two documents:
A Brief J Reference (175 KiB)
Easy-J. An Introduction to the World's most Remarkable Programming Language (302 KiB)
Other useful sources of information are the Jsoftware site with their wiki and a few mail list archives in internets.
Just to be sure it's clear, the direct way - to answer the original question - is this:
3 2 2 7 7 2 9 -. 2
This returns
3 7 7 9
The more elaborate method - generating the boolean and using it to compress the vector - is more APLish.
To answer the other question in the very long post, to return the first element and the number of times it occurs, is simply this:
({. , {. +/ .= ]) 1 4 1 4 2 1 3 5
1 3
This is a fork using "{." to get the first item, "{. +/ . = ]" to add up the number of times the first item equals each element, and "," as the middle verb to concatenate these two parts.
Also:
2 ( -. ~ ]) 3 2 2 7 7 2 9
3 7 7 9
There are a million ways to do this - it bothers me, vaguely, that these things don't evaluate strictly right to left; I'm an old APL programmer and I think of things as right to left even when they ain't.
If it were a thing that I was going to put into a program where I wanted to pull out some number and the number was a constant, I would do the following:
(#~ 2&~:) 1 3 2 4 2 5
1 3 4 5
This is a hook sort of thing, I think. The right half of the expression generates the truth vector regarding which are not 2, and then the octothorpe on the left has its arguments swapped so that the truth vector is the left argument to copy and the vector is the right argument. I am not sure that a hook is faster or slower than a fork with an argument copy.
+/3<+/"1(=2&{"1)/:~S:_1{;/5 6$1+i.6
156
The above program answers the question, "For all possible combinations of Yahtzee dice, how many have 4 or 5 matching numbers in one roll?" It generates all the permutations, in boxes, sorts each box individually, unboxing them as a side effect, and extracts column 2, comparing the box against their own column 2, in the only successful fork or hook I've ever managed to write. The theory is that if there is a number that appears in a list of 5, three or more times, if you sort the list the middle number will be the number that appears with the greatest frequency. I have attempted a number of other hooks and/or forks and every one has failed because there is something I just do not get. Anyway that truth table is reduced to a vector, and now we know exactly how many times each group of 5 dice matched the median number. Finally, that number is compared to 3, and the number of successful compares (greater than 3, that is, 4 or 5) are counted.
This program answers the question, "For all possible 8 digit numbers made from the symbols 1 through 5, with repetition, how many are divisible by 4?"
I know that you need only determine how many within the first 25 are divisible by 4 and multiply, but the program runs more or less instantly. At one point I had a much more complex version of this program that generated the numbers in base 5 so that individual digits were between 0 and 4, added 1 to the numbers thus generated, and then put them into base 10. That was something like 1+(8$5)#:i.5^8
+/0=4|,(8$10)#. >{ ;/ 8 5$1+i.5
78125
As long as I have solely verb trains and selection, I don't have a problem. When I start having to repeat my argument within the verb so that I'm forced to use forks and hooks I start to get lost.
For example, here is something I can't get to work.
((1&{~+/)*./\(=1&{))1 1 1 3 2 4 1
I always get Index Error.
The point is to output two numbers, one that is the same as the first number in the list, the second which is the same as the number of times that number is repeated.
So this much works:
*./\(=1&{)1 1 1 3 2 4 1
1 1 1 0 0 0 0
I compare the first number against the rest of the list. Then I do an insertion of an and compression - and this gives me a 1 so long as I have an unbroken string of 1's, once it breaks the and fails and the zeros come forth.
I thought that I could then add another set of parens, get the lead element from the list again, and somehow record those numbers, the eventual idea would be to have another stage where I apply the inverse of the vector to the original list, and then use $: to get back for a recursive application of the same verb. Sort of like the quicksort example, which I thought I sort of understood, but I guess I don't.
But I can't even get close. I will ask this as a separate question so that people get proper credit for answering.
