What is the most efficient way to insert a number into an ordered array? - j

Inserting numbers into a sorted array is a very common need.
Mathematica stack exchange for reference. Its priority queue appears extremely fast.
https://mathematica.stackexchange.com/questions/224249/best-way-to-insert-element-into-an-ordered-list-at-the-correct-position
For J:
Using the built in sort /:~y where y is the sorted array -
examples: /:~ y,45.3 or /:~ y,3 6 67.7
would be the first choice.
Quicksort and bubblesort are much slower.
For a single insert rotating the number into the sorted list -
insert=: 4 : 0
NB. insert x into ordered array y
where=: x (< i. 1:) y
z=: (($y)-where)|.(where|.y),x
)
is around 2.5 times faster on y=.i.10000000 than /:~y
anything faster?

Binary search I. is the best you can do to find an element in a sorted array.
[;.0 is the best you can do to extract a subarray.
These are very fast on their own:
y =: i.1e7
100 timespacex'y I. 35'
2.94e_6 896
100 timespacex'(35 ,: _) ];.0 y'
4.85e_6 1856
The bottleneck seems to be appending the new element thus copying the list to make the new list.
100 timespacex'35,~(35 ,: _) ];.0 y'
0.0368233 1.3422e8
insert=: 4 : 0
NB. insert x into ordered array y
where=: x (< i. 1:) y
z=: (($y)-where)|.(where|.y),x
)
insert2=: 4 : 0
where=: y I. x
NB. z =: where ({.,x,}.) y
z =: (where ];.0 y),x,((where ,: _) ];.0 y)
)
y =: i.1e7
(35 insert y) -: (35 insert2 y)
1
100 timespacex'35 insert y'
0.0745025 2.68437e8
100 timespacex'35 insert2 y'
0.0738203 2.68437e8
Other, more complicated, ways may offer better results, depending on what you need and what kind of input you have.

Related

Sum of arrays with repeated indices

How can I add an array of numbers to another array by indices? Especially with repeated indices. Like that
x
1 2 3 4
idx
0 1 0
y
5 6 7
] x add idx;y NB. (1 + 5 + 7) , (2 + 6) , 3 , 4
13 8 3 4
All nouns (x, idx, y) can be millions of items and I need to fast 'add' verb.
UPDATE
Solution (thanks to Dan Bron):
cumIdx =: 1 : 0
:
'i z' =. y
n =. ~. i
x n}~ (n{x) + i u//. z
)
(1 2 3 4) + cumIdx (0 1 0);(5 6 7)
13 8 3 4
For now, a short answer in the "get it done" mode:
data =. 1 2 3 4
idx =. 0 1 0
updat =. 5 6 7
cumIdx =: adverb define
:
n =. ~. m
y n}~ (n{y) + m +//. x
)
updat idx cumIdx data NB. 13 8 3 4
In brief:
Start by grouping the update array (in your post, y¹) where your index array has the same value, and taking the sum of each group
Accomplish this using the adverb key (/.) with sum (+/) as its verbal argument, deriving a dyadic verb whose arguments are idx on the left and the update array (your y, my updat) on the right.
Get the nub (~.) of your index array
Select these (unique) indices from your value array (your x, my data)
This will, by definition, have the same length as the cumulative sums we calculated in (1.)
Add these to the cumulative sum
Now you have your final updates to the data; updat and idx have the same length, so you just merge them into your value array using }, as you did in your code
Since we kept the update array small (never greater than its original length), this should have decent performance on larger inputs, though I haven't run any tests. The only performance drawback is the double computation of the nub of idx (once explicitly with ~. and once implicitly with /.), though since your values are integers, this should be relatively cheap; it's one of J's stronger areas, performance-wise.
¹ I realize renaming your arrays makes this answer more verbose than it needs to be. However, since you named your primary data x rather than y (which is the convention), if I had just kept your naming convention, then when I invoked cumIdx, the names of the nouns inside the definition would have the opposite meanings to the ones outside the definition, which I thought would cause greater confusion. For this reason, it's best to keep "primary data" on the right (y), and "control data" on the left (x).You might also consider constraining your use of the special names x,y,u,v,m and n to where they're already implicitly defined by invoking an explicit definition; definitely never change their nameclasses.
This approach also uses key (/.) but is a bit more simplistic in its approach.
It is likely to use more space especially for big updates than Dan Bron's.
addByIdx=: {{ (m , i.## y) +//. x,y }}
updat idx addByIdx data
13 8 3 4

Handle unknown number of columns while pandas csv read

I got the data from stdin like
x
x
x y
x
x y z
...
and I want to create pandas DataFrame based on that input
df = pd.read_csv(sys.stdin, sep='\t', header=None)
, but the problem here is that my data on third line has more values than the first one and I got
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
The questions is: How can I handle this error when I don't know the longest elements chain(separated by \t).
The whole task can be performed in a single instruction,
without any counting of elements in each row.
I prepared such an example reading from a string, using io.StringIO:
df = pd.DataFrame([ ln.rstrip().split('\t') for ln in
io.StringIO(txt).readlines() ]).fillna('')
The list comprehension converts each source line into a list
of fragments (between Tabs).
Then this list comprehension is the data parameter to pd.DataFrame
and note that such a list of rows can contain rows of different length.
I added also fillna('') to convert each NaN into an empty string
(you are free to delete it if you wish).
To run the test, I defined the source variable as:
txt = '''x
x
x y
x
x y z
x
x y z v'''
end executed the above code, getting:
0 1 2 3
0 x
1 x
2 x y
3 x
4 x y z
5 x
6 x y z v
In the target version, replace reading from a string with reading from
stdin.

Haskell multivariable Lambda function for lists

I am confused on how this is computed.
Input: groupBy (\x y -> (x*y `mod` 3) == 0) [1,2,3,4,5,6,7,8,9]
Output: [[1],[2,3],[4],[5,6],[7],[8,9]]
First, does x and y refer to the current and the next element?
Second, is this saying that it will group the elements that equal 0 when it is modded by 3? If so, how come there are elements that are not equal to 0 when modded by 3 in the output?
Found here: http://zvon.org/other/haskell/Outputlist/groupBy_f.html
To answer your second question: We compare two elements by multiplying them and seeing if the result is divisible by 3. "So why are there elements in the output not divisible by 3?" If they aren't divisible, that doesn't filter them out (that's what filter does); rather, when the predicate fails, the element goes into a separate group. When it succeeds, the element goes into the current group.
As to your first question, this took me a little while to figure out... x and y aren't two consecutive elements; rather, y is the current element and x is the first element in the current group. (!)
1 * 2 = 2; 2 `mod` 3 = 2; 1 and 2 go in separate groups.
2 * 3 = 6; 6 `mod` 3 = 0; 2 and 3 go in the same group.
2 * 4 = 8; 8 `mod` 3 = 2; 4 gets put in a different group.
...
Notice, on that last line, we're looking at 2 and 4 — not 3 and 4, as you might reasonably expect.
First, does x and y refer to the current and the next element?
Roughly, yes.
Second, is this saying that it will group the elements that equal 0 when it is modded by 3? If so, how come there are elements that are not equal to 0 when modded by 3 in the output?
The lambda defines a relation between two integers x and y, which holds whenever the product x*y is a multiple of 3. Since 3 is prime, x must be a multiple of 3 or y must be such.
For the input [1,2,3,4,5,6,7,8,9], it is first checked whether 1 is in relation with 2. This is false, so 1 gets its own singleton group [1]. Then, we proceed we 2 and 3: now the relation holds, so 2,3 will share their group. Next, we check whether 2 and 4 are in relation: this is false. So, the group is [2,3] and not any larger. Then we proceed with 4 and 5 ...
I must confess that I do not like this example very much, since the relation is not an equivalence relation (because it is not transitive). Because of this, the exact result of groupBy is not guaranteed: the implementation might test the relation on 3,4 (true) instead of 2,4 (false), and build a group [2,3,4] instead.
Quoting from the docs:
The predicate is assumed to define an equivalence.
So, once this contract is violated, there are no guarantees on what the output of groupBy might be.
The groupBy function takes a list and returns a list of lists such that each sublist in the result contains only equal elements, based on the equality function you provide.
In this case, you are trying to find all subsets where for all sublist elements x and y, mod (x*y) 3 == 0 (and the ones where it doesn't == 0). Slightly weird, but there you go. groupBy only looks at adjacent elements. sort the list to reduce the number of duplicate sets.

column and row join a second box data

I want to column join
┌─┬─┬─┐
│1│1│2│
│2│4│4│
│3│9│6│
└─┴─┴─┘
and I'd like to put a=.1 2 3 as the fourth row, and then put b=.1 1 1 1 as the first column to the new boxed data. How can I do this easily? Do I have to ravel the whole thing and compute the dimention on my own in order to box it again?
Also, if I want the data i.8 to be 2 rows, do I have to calculate the other dimension 4(=8/2) in order to form a matrix 2 4$i.8? And then box it ;/2 4$i.8? Can I just specify one dimension, either the number of row or columns and ask automatic boxing or forming the matrix?
The answer to your question will involve learning about &. , the 'Under' conjunction, which is tremendously useful in J.
m
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
└─┴─┴─┘
a=. 1 2 3
b=. 1 1 1 1
So we want to add each item of a to each boxed column of m . It would be perfect if we could unbox the column using unbox(>), append the item of a to the column using append (,) and then rebox the column using box (<). This undo, act, redo cycle is exactly what Under (&.) does. It undoes both its right and left arguments ( m and a ) using the verb to its right, then applies the verb to its left, then uses the reverse of the verb to its right on the result. In practice,
m , &. > a
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
The fact that a is unboxed when it was never boxed to begin with means that it is not changed, while m is unboxed before (,) is applied to each a . In fact this is used so often in J that &. > is assigned the name 'each'.
m , each a
┌─┬─┬─┐
│1│1│2│
│2│2│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
Prepending a boxed version of b requires first giving it an extra dimension with laminate (,:) then transposing (|:) b and finally boxing (<) the result. The step of adding the extra dimension is required because transposing swaps the indices and b start as a one-dimensional list.
(<#|:#,:b)
┌─┐
│1│
│1│
│1│
│1│
└─┘
The rest is easy as we just use append (,) to join the boxed b with (m, each a)
(<#|:#,: b) , m , each a
┌─┬─┬─┬─┐
│1│1│1│2│
│1│2│2│4│
│1│3│9│6│
│1│1│2│3│
└─┴─┴─┴─┘
Brackets around (<#|:#,: b) are necessary to force the correct order of execution.
For the second question, you can use i. n m to create a n X m array, which may help.
i. 4 2
0 1
2 3
4 5
6 7
i. 2 4
0 1 2 3
4 5 6 7
but perhaps I am misunderstanding your intentions here.
Hope this helps, bob
append a (with rank): ,"x a
You can simply append (,) a to your unboxed (>) input but you have to be careful with the append rank. You want to append each "item" of a, so you have right rank of "0". You want to apend to a 2-cell so you have a left rank of "2". Therefore, the , you need has rank "2 0. After the append, you rebox your data to a 2-cell with <"2.
<"2(>in)(,"2 0) a
┌─┬─┬─┐
│1│1│2│
│2│4│4│
│3│9│6│
│1│2│3│
└─┴─┴─┘
prepend b: b,
If your b has the right shape you prepend it with b,. The shape you seem to use is (boxed) 4 1:
b =: < 4 1$ 1
┌─┐
│1│
│1│
│1│
│1│
└─┘
b,in
┌─┬─┬─┬─┐
│1│1│1│2│
│1│2│4│4│
│1│3│9│6│
│1│1│2│3│
└─┴─┴─┴─┘

J # not working as expected

I'm just starting to try to pick up the J language, and am confused by the following:
1 2 +/#{ i.4
1 2
+/ 1 2 { i.4
3
when in the documentation for # it says: "x u#v y ↔ u x v y"
I assume I'm just mistaking one part of speech for another, but can't figure it out
also, how do I tell what type of speech a name is?
NB. u b. 0 returns the rank of u
NB. the rank of a verb determines the arguments it applies to at a time
NB. monadic + y applies to atoms; dyadic x + y applies to pairs of atoms
+ b. 0
0 0 0
NB. monadic +/ y and dyadic x +/ y apply to anything (unbounded rank)
+/ b. 0
_ _ _
NB. monadic { y applies to arrays of atoms;
NB. dyadic x { y applies to pairs of atoms and anything
{ b. 0
1 0 _
NB. u # v has the rank of v
+/#{ b. 0
1 0 _
NB. since 1 2 { i.4 returns atoms at a time, +/ works on atoms
+/"0 [ 1 2 { i.4
1 2
NB. u #: v has unbounded rank
+/#:{ b. 0
_ _ _
NB. +/ applies to all of 1 2 { i.4 at once
+/"_ [ 1 2 { i.4
3
NB. mechanical translation to tacit form
13 : '+/ x { y'
[: +/ {
Wikipedia has, in my biased opinion, a decent writeup on rank, and what it means in the context of the different parts of "speech" in J.
But to answer the original question, J's trace facility can be useful for understanding how its grammar works:
require'trace'
trace '1 2 +/#{ i.4'
This will take you step by step through the parsing process, showing the words being consumed by each production rule and the result each generates.
Ah, I think I may have figured it out, I need to use #: instead of #
1 2 +/#:{ i.4
3
which is what I wanted. Guess I'm going to have to read up some more on rank, which is the only difference between # and #:

Resources