How to force grouping of monadic verbs? - j

I came up with an incorrect J verb in my head, which would find the proportion of redundant letters in a string. I started with just a bunch of verbs with no precedence defined, and tried grouping inwards:
c=. 'cool' NB. The test data string, 1/4 is redundant.
box =. 5!:2 NB. The verb to show the structure of another verb in a box.
p=.%#~.%# NB. First attempt. Meant to read "inverse of (tally of unique divided by tally)".
box < 'p'
┌─┬─┬────────┐
│%│#│┌──┬─┬─┐│
│ │ ││~.│%│#││
│ │ │└──┴─┴─┘│
└─┴─┴────────┘
p2=.%(#~.%#) NB. The first tally is meant to be in there with the nub sieve, so paren everything after the inverse monad.
box < 'p2'
┌─┬────────────┐
│%│┌─┬────────┐│
│ ││#│┌──┬─┬─┐││
│ ││ ││~.│%│#│││
│ ││ │└──┴─┴─┘││
│ │└─┴────────┘│
└─┴────────────┘
p3=. %((#~.)%#) NB. The first tally is still not grouped with the nub sieve, so paren the two together directly.
box < 'p3'
┌─┬────────────┐
│%│┌──────┬─┬─┐│
│ ││┌─┬──┐│%│#││
│ │││#│~.││ │ ││
│ ││└─┴──┘│ │ ││
│ │└──────┴─┴─┘│
└─┴────────────┘
p3 c NB. Looks about right, so test it!
|length error: p3
| p3 c
(#~.)c NB. Unexpected error, but I guessed as to what part had the problem.
|length error
| (#~.)c
My question is, why did my approach to grouping fail with this length error, and how should I have grouped it to get the desired effect?
(I assume it is something to do with turning it into a hook instead of grouping, or it just not realising it needs to use the monad forms, but I don't know how to verify or get around it if so.)

Fork and compose.
(# ~.) is a hook. This is probably what you're not expecting. (# ~.) 'cool' is applying ~. to 'cool' to give you 'col'. But as it's a monadic hook, it is then attempting 'cool' # 'col', which isn't what you're intending and which gives a length error.
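You can replay the hook's two steps by hand to see exactly where the length error comes from:
~. 'cool'   NB. nub: the unique characters
col
'cool' # 'col'   NB. the hook's dyadic step: 4 items on the left, 3 on the right
|length error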
To get 0.25 as the ratio of redundant characters in a string, don't use the reciprocal (%). You just subtract from 1 the ratio of unique characters. This is pretty straightforward with a fork:
(1 - #&~. % #) 'cool'
0.25
p9 =. 1 - #&~. % #
box < 'p9'
┌─┬─┬──────────────┐
│1│-│┌────────┬─┬─┐│
│ │ ││┌─┬─┬──┐│%│#││
│ │ │││#│&│~.││ │ ││
│ │ ││└─┴─┴──┘│ │ ││
│ │ │└────────┴─┴─┘│
└─┴─┴──────────────┘
Compose (&) ensures that you tally (#) the nub (~.) together, so that the fork grabs it as a single verb. The fork is a series of three verbs that applies the first and third verb, and then applies the middle verb to the results. So #&~. % # is the fork, where #&~. is applied to the string, resulting in 3. # is applied, resulting in 4. And then % is applied to those results, as 3 % 4, giving you 0.75. That's our ratio of unique characters.
1 - is just to get us 0.25 instead of 0.75. % 0.75 is the same as 1 % 0.75, which gives you 1.33333.
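You can check each piece of the fork separately to see the numbers from the walkthrough above:
#&~. 'cool'   NB. tally of the nub
3
# 'cool'   NB. tally of the string
4
3 % 4
0.75
1 - 0.75
0.25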

Related

How to use a vector to cache results within a Haskell function?

I have a computationally expensive vector I want to index into inside a function, but since the table is never used anywhere else, I don't want to pass the vector around, but access the precomputed values like a memoized function.
The idea is:
cachedFunction :: Int -> Int
cachedFunction ix = table ! ix
  where table = <vector creation>
One aspect I've noticed is that all memoization examples I've seen deal with recursion, where even if a table is used to memoize, values in the table depend on other values in the table. That is not the case for me: computed values are found using a trial-and-error approach, but each element is independent of the others.
How do I achieve the cached table in the function?
You had it almost right. The problem is, your example is basically scoped like this:
                    ┌──────────────────────────────────┐
cachedFunction ix = │ table ! ix                       │
                    │  where table = <vector creation> │
                    └──────────────────────────────────┘
i.e. table is not shared between different ix. This is regardless of the fact that it happens to not depend on ix (which is obvious in this example, but not in general). Therefore it would not be useful to keep it in memory, and Haskell doesn't do it.
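You can watch the recomputation happen with Debug.Trace (a minimal sketch in the style of the example further down; note that compiling with optimizations may let GHC float table out and change this behavior):
import Debug.Trace

uncachedFunction :: Int -> Int
uncachedFunction ix = table + ix
  where table = traceShow "Computed" 0

main :: IO ()
main = do
  print $ uncachedFunction 1  -- prints "Computed", then 1
  print $ uncachedFunction 2  -- prints "Computed" again, then 2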
But you can change that by pulling the ix argument into the result with its associated where-block:
cachedFunction = \ix -> table ! ix
  where table = <vector creation>
i.e.
                 ┌──────────────────────────────────┐
cachedFunction = │ \ix -> table ! ix                │
                 │  where table = <vector creation> │
                 └──────────────────────────────────┘
or shorter,
cachedFunction = (<vector creation> !)
In this form, cachedFunction is a constant applicative form, i.e. despite having a function type it is treated by the compiler as a constant value. It's not a value you could ever evaluate to normal form, but it will keep the same table around (which can't depend on ix; it doesn't have it in scope) when using it for evaluating the lambda function inside.
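As a concrete sketch of this form (assuming Data.Vector, with a made-up expensive per-index computation standing in for your trial-and-error work):
import qualified Data.Vector as V

-- hypothetical stand-in for the real per-index computation
expensive :: Int -> Int
expensive i = i * i

cachedFunction :: Int -> Int
cachedFunction = (table V.!)
  where table = V.generate 1000 expensive  -- built once, shared by every call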
According to this answer, GHC will never recompute values declared at the top level of a module. So by moving your table up to the top level of your module, it will be evaluated lazily (once) the first time it's ever needed, and then never recomputed again. We can see the behavior directly with Debug.Trace (the example uses a simple integer rather than a vector, for simplicity):
import Debug.Trace
cachedFunction :: Int -> Int
cachedFunction ix = table + ix
table = traceShow "Computed" 0
main :: IO ()
main = do
  print 0
  print $ cachedFunction 1
  print $ cachedFunction 2
Outputs:
0
"Computed"
1
2
We see that table is not computed until cachedFunction is called, and it's only computed once, even though we call cachedFunction twice.

Will the native buffer owned by BytesMut keep growing?

Suppose I have a BytesMut object and keep writing data into it, then split frames off of it according to the frame segmentation format.
My understanding is that this memory is constantly being reallocated. So the question is: I can make the capacity smaller by repeatedly splitting, but at what point is the underlying contiguous memory actually freed?
If I keep splitting data off the front, won't the memory usage grow larger and larger? Or perhaps my understanding is wrong and different BytesMut objects get different native buffers when split, but then how is that done?
#[test]
fn test_bytesmut_growth() {
    use bytes::{BufMut, BytesMut};
    let mut bm = BytesMut::with_capacity(16);
    for i in 0..10000 {
        bm.put(&b"1234567890"[..]);
        let front = bm.split();
        drop(front);
    }
    //println!("current cap={}, len={}", bm.capacity(), bm.len());
}
If you split a BytesMut object into two, the two objects will still share the underlying reference-counted buffer. Here's an attempt at a visualization, containing a few implementation details. Before splitting:
Underlying buffer, ref count 1
┌────────────────────────────────┐
│0123456789ABCDEFGHIJ │
└▲───────────────────────────────┘
│
│ first
│ ┌────────────┐
├──┤ptr │
│ │len: 20 │
│ │cap: 32 │
└──┤data │
└────────────┘
After calling let second = first.split_off(10), we will get
Underlying buffer, ref count 2
┌────────────────────────────────┐
│0123456789ABCDEFGHIJ │
└▲─────────▲─────────────────────┘
│ │
│ └──────────┐
│ first │ second
│ ┌────────────┐ │ ┌────────────┐
├──┤ptr │ └─────┤ptr │
│ │len: 10 │ │len: 10 │
│ │cap: 10 │ │cap: 22 │
├──┤data │ ┌─────┤data │
│ └────────────┘ │ └────────────┘
│ │
└────────────────────┘
Once we drop first, we will have
Underlying buffer, ref count 1
┌────────────────────────────────┐
│0123456789ABCDEFGHIJ │
└▲─────────▲─────────────────────┘
│ │
│ │
│ │ second
│ │ ┌────────────┐
│ └────────────────┤ptr │
│ │len: 10 │
│ │cap: 22 │
└──────────────────────────┤data │
└────────────┘
If you now call second.reserve(10), or call an operation that implicitly reserves, like writing more than fits in the current capacity, the BytesMut implementation can detect that second actually owns its underlying buffer, since the reference count is one. The implementation then may be able to reuse spare capacity in the buffer by moving the existing buffer contents to the beginning, so after second.reserve(20), the result could look like this:
Underlying buffer, ref count 1
┌────────────────────────────────┐
│ABCDEFGHIJ │
└▲───────────────────────────────┘
│
│ second
│ ┌────────────┐
├──┤ptr │
│ │len: 10 │
│ │cap: 32 │
└──┤data │
└────────────┘
However, the conditions for this optimization to be applied are not guaranteed. The documentation for reserve() states (emphasis mine)
Before allocating new buffer space, the function will attempt to reclaim space in the existing buffer. If the current handle references a view into a larger original buffer, and all other handles referencing part of the same original buffer have been dropped, then the current view can be copied/shifted to the front of the buffer and the handle can take ownership of the full buffer, provided that the full buffer is large enough to fit the requested additional capacity.
This optimization will only happen if shifting the data from the current view to the front of the buffer is not too expensive in terms of the (amortized) time required. The precise condition is subject to change; as of now, the length of the data being shifted needs to be at least as large as the distance that it’s shifted by. If the current view is empty and the original buffer is large enough to fit the requested additional capacity, then reallocations will never happen.
In summary, this optimization is only guaranteed if the reference count is one and the view is empty. This is the case in your example, so your code is guaranteed to reuse the buffer.
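The sequence in the diagrams above can be replayed directly (a sketch; the exact capacity numbers may vary with allocator and crate version):
use bytes::{BufMut, BytesMut};

fn main() {
    let mut first = BytesMut::with_capacity(32);
    first.put(&b"0123456789ABCDEFGHIJ"[..]);

    // Both handles now share the same reference-counted buffer.
    let mut second = first.split_off(10);
    println!("first:  len={}, cap={}", first.len(), first.capacity());
    println!("second: len={}, cap={}", second.len(), second.capacity());

    // Drop `first`, making `second` the sole owner...
    drop(first);
    // ...so reserve() can shift the contents to the front and
    // reclaim the buffer instead of allocating a new one.
    second.reserve(20);
    println!("second after reserve: len={}, cap={}", second.len(), second.capacity());
}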
According to the documentation, BytesMut::split 'Removes the bytes from the current view, returning them in a new BytesMut handle.
Afterwards, self will be empty, but will retain any additional capacity that it had before the operation.'
This is done by creating a new BytesMut (which is then owned by front) that contains exactly the items of bm, after which bm is modified such that it contains only the remaining empty capacity. This way, BytesMut::split doesn't allocate any new memory.
You then drop the BytesMut (owned by front), making it so that there is no view into the memory owned by the backing Vec, from the start of the Vec until the start of bm. When you then put, the implementation first checks if there is enough space before the view of bm, but still inside the backing Vec, and tries to store the data there.
Because the amount of memory you put is the same as the memory 'freed' by dropping front, the implementation is able to store that data before the view of bm, and no new memory is allocated.
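You can confirm this by re-enabling the println! in your test: the capacity never grows beyond the original 16, no matter how many iterations run (a sketch of the same loop):
use bytes::{BufMut, BytesMut};

fn main() {
    let mut bm = BytesMut::with_capacity(16);
    for _ in 0..10_000 {
        bm.put(&b"1234567890"[..]);
        drop(bm.split()); // release the front of the buffer again
    }
    // The capacity does not grow with the iteration count:
    // each put() reused the space reclaimed by dropping `front`.
    println!("cap={}, len={}", bm.capacity(), bm.len());
}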

Are variables used in nested functions considered global?

This may be a dumb question, so I apologise in advance. This is for Julia, but I guess the question is not language specific.
There is advice in Julia that global variables should not be used in functions, but there is a case where I am not sure whether a variable is global or local: a variable defined in a function that is global from the point of view of a nested function. For example, in the following,
a=2;
f(x)=a*x;
variable a is considered global. However, if we were to wrap this all in another function, would a still be considered global for f? For example,
function g(a)
    f(x)=a*x
end
We don't use a as an input for f, so it's global in that sense, but it's still only defined in the scope of g, so it's local in that sense. I am not sure. Thank you.
You can check directly that what @DNF commented is indeed the case (i.e. that the variable a is captured in a closure).
Here is the code:
julia> function g(a)
f(x)=a*x
end
g (generic function with 1 method)
julia> v = g(2)
(::var"#f#1"{Int64}) (generic function with 1 method)
julia> dump(v)
f (function of type var"#f#1"{Int64})
a: Int64 2
In this example your function g returns a function. I bind the variable v to the returned function so that I can inspect it.
If you dump the value bound to the v variable you can see that the a variable is stored in the closure.
A variable stored in a closure should not be a problem for the performance of your code. This is a typical pattern, used e.g. when optimizing some function conditional on some parameter (captured in a closure).
As you can see in this code:
julia> @code_warntype v(10)
MethodInstance for (::var"#f#1"{Int64})(::Int64)
from (::var"#f#1")(x) in Main at REPL[1]:2
Arguments
#self#::var"#f#1"{Int64}
x::Int64
Body::Int64
1 ─ %1 = Core.getfield(#self#, :a)::Int64
│ %2 = (%1 * x)::Int64
└── return %2
everything is type stable so such code is fast.
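For example, this is how one might fix a parameter before handing an objective to an optimizer (a hypothetical sketch):
julia> objective(p) = x -> (x - p)^2   # p is captured in the closure
objective (generic function with 1 method)
julia> f = objective(3.0);
julia> f(1.0)
4.0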
There are some situations, though, in which boxing happens. They should be rare: they occur when your function is so complex that the compiler is not able to prove that boxing is unnecessary. Most of the time this happens when you assign a value to the variable captured in a closure:
julia> function foo()
x::Int = 1
return bar() = (x = 1; x)
end
foo (generic function with 1 method)
julia> dump(foo())
bar (function of type var"#bar#6")
x: Core.Box
contents: Int64 1
julia> @code_warntype foo()()
MethodInstance for (::var"#bar#1")()
from (::var"#bar#1")() in Main at REPL[1]:3
Arguments
#self#::var"#bar#1"
Locals
x::Union{}
Body::Int64
1 ─ %1 = Core.getfield(#self#, :x)::Core.Box
│ %2 = Base.convert(Main.Int, 1)::Core.Const(1)
│ %3 = Core.typeassert(%2, Main.Int)::Core.Const(1)
│ Core.setfield!(%1, :contents, %3)
│ %5 = Core.getfield(#self#, :x)::Core.Box
│ %6 = Core.isdefined(%5, :contents)::Bool
└── goto #3 if not %6
2 ─ goto #4
3 ─ Core.NewvarNode(:(x))
└── x
4 ┄ %11 = Core.getfield(%5, :contents)::Any
│ %12 = Core.typeassert(%11, Main.Int)::Int64
└── return %12

replace within boxed structure

I have the following (for example) data
'a';'b';'c';'a';'b';'a'
┌─┬─┬─┬─┬─┬─┐
│a│b│c│a│b│a│
└─┴─┴─┴─┴─┴─┘
and I'd like to replace all 'a' with a number, 3, and 'b' with another number 4, and get back
┌─┬─┬─┬─┬─┬─┐
│3│4│c│3│4│3│
└─┴─┴─┴─┴─┴─┘
how can I do that?
Thanks for the help.
rplc
If that was a string (like 'abcaba') there would be the easy solution of rplc:
'abcaba' rplc 'a';'3';'b';'4'
34c343
amend }
If you need to have it like boxed data (if, for example, 'a' represents something more complex than a character or atom), then maybe you can use amend }:
L =: 'a';'b';'c';'a';'b';'a'
p =: I. (<'a') = L NB. positions of 'a' in L
p
0 3 5
(<'3') p } L NB. 'amend' "3" on those positions
┌─┬─┬─┬─┬─┬─┐
│3│b│c│3│b│3│
└─┴─┴─┴─┴─┴─┘
putting the above into a dyad:
f =: 4 :'({.x) (I.({:x) = y) } y' NB. amend '{.x' in positions where '{:x' = y
('3';'a') f L
┌─┬─┬─┬─┬─┬─┐
│3│b│c│3│b│3│
└─┴─┴─┴─┴─┴─┘
which you can use in more complex settings:
]L =: (i.5);'abc';(i.3);'hello world';(<1;2)
┌─────────┬───┬─────┬───────────┬─────┐
│0 1 2 3 4│abc│0 1 2│hello world│┌─┬─┐│
│ │ │ │ ││1│2││
│ │ │ │ │└─┴─┘│
└─────────┴───┴─────┴───────────┴─────┘
((1;2);(i.3)) f L
┌─────────┬───┬─────┬───────────┬─────┐
│0 1 2 3 4│abc│┌─┬─┐│hello world│┌─┬─┐│
│ │ ││1│2││ ││1│2││
│ │ │└─┴─┘│ │└─┴─┘│
└─────────┴───┴─────┴───────────┴─────┘
btw, {.y is the first item of y; {:y is the last item of y
bottom line
Here's a little utility you can put in your toolbox:
tr =: dyad def '(y i.~ ({." 1 x),y) { ({:" 1 x) , y'
] MAP =: _2 ]\ 'a';3; 'b';4
+-+-+
|a|3|
+-+-+
|b|4|
+-+-+
MAP tr 'a';'b';'c';'a';'b';'a'
+-+-+-+-+-+-+
|3|4|c|3|4|3|
+-+-+-+-+-+-+
just above the bottom line
The utility tr is a verb which takes two arguments (a dyad): the right argument is the target, and the left argument is the mapping table. The table must have two columns, and each row represents a single mapping. To make just a single replacement, a vector of two items is acceptable (i.e. 1D list instead of 2D table, so long as the list is two items long).
Note that the table must have the same datatype as the target (so, if you're replacing boxes, it must be a table of boxes; if characters, then a table of characters; numbers for numbers, etc).
And, since we're doing like-for-like mapping, the cells of the mapping table must have the same shape as the items of the target, so it's not suitable for tasks like string substitution, which may require shape-shifting. For example, ('pony';'horse') tr 'I want a pony for christmas' won't work (though, amusingly, 'pony horse' tr&.;: 'I want a pony for christmas' would, for reasons I won't get into).
way above the bottom line
There's no one, standard answer to your question. That said, there is a very common idiom to do translation (in the tr, or mapping 1:1, sense):
FROM =: ;: 'cat dog bird'
TO =: ;: 'tiger wolf pterodactyl'
input=: ;: 'cat bird dog bird bird cat'
(FROM i. input) { TO
+-----+-----------+----+-----------+-----------+-----+
|tiger|pterodactyl|wolf|pterodactyl|pterodactyl|tiger|
+-----+-----------+----+-----------+-----------+-----+
To break this down, the primitive i. is the lookup function and the primitive { is the selection function (mnemonic: i. gives you the index of the elements you're looking for).
But the simplistic formulation above only applies when you want to replace literally everything in the input, and FROM is guaranteed to be total (i.e. the items of the input are constrained to whatever is in FROM).
These constraints make the simple formulation appropriate for tasks like case conversion of strings, where you want to replace all the letters, and we know the total universe of letters in advance (i.e. the alphabet is finite).
But what happens if we don't have a finite universe? What should we do with unrecognized items? Well, anything we want. This need for flexibility is the reason that there is no one, single translation function in J: instead, the language gives you the tools to craft a solution specific to your needs.
For example, one very common extension to the pattern above is the concept of substitution-with-default (for unrecognized items). And, because i. is defined to return #FROM (one past the largest valid index) for items not found in the lookup, the extension is surprisingly simple: we just extend the replacement list by one item, i.e. just append the default!
DEFAULT =: <'yeti'
input=: ;: 'cat bird dog horse bird monkey cat iguana'
(FROM i. input) { TO,DEFAULT
+-----+-----------+----+----+-----------+----+-----+----+
|tiger|pterodactyl|wolf|yeti|pterodactyl|yeti|tiger|yeti|
+-----+-----------+----+----+-----------+----+-----+----+
Of course, this is destructive in the sense it's not invertible: it leaves no information about the input. Sometimes, as in your question, if you don't know how to replace something, it's best to leave it alone.
Again, this kind of extension is surprisingly simple, and, once you see it, obvious: you extend the lookup table by appending the input. That way, you're guaranteed to find all the items of the input. And replacement is similarly simple: you extend the replacement list by appending the input. So you end up replacing all unknown items with themselves.
( (FROM,input) i. input) { TO,input
+-----+-----------+----+-----+-----------+------+-----+------+
|tiger|pterodactyl|wolf|horse|pterodactyl|monkey|tiger|iguana|
+-----+-----------+----+-----+-----------+------+-----+------+
This is the strategy embodied in tr.
above the top line: an extension
BTW, when writing utilities like tr, J programmers will often consider the N-dimensional case, because that's the spirit of the language. As it stands, tr requires a 2-dimensional mapping table (and, by accident, will accept a 1-dimensional list of two items, which can be convenient). But there may come a day when we want to replace a plane inside a cube, or a cube inside a hypercube, etc (common in business intelligence applications). We may wish to extend the utility to cover these cases, should they ever arise.
But how? Well, we know the mapping table must have at least two dimensions: one to hold multiple simultaneous substitutions, and another to hold the rules for replacement (i.e. one "row" per substitution and two "columns" to identify an item and its replacement). The key here is that's all we need. To generalize tr, we merely need to say we don't care about what's beneath those dimensions. It could be a Nx2 table of single characters, or an Nx2 table of fixed-length strings, or an Nx2 table of matrices for some linear algebra purpose, or ... who cares? Not our problem. We only care about the frame, not the contents.
So let's say that, in tr:
NB. Original
tr =: dyad def '(y i.~ ({." 1 x),y) { ({:" 1 x) , y'
NB. New, laissez-faire definition
tr =: dyad def '(y i.~ ({."_1 x),y) { ({:"_1 x) , y'
A taxing change, as you can see ;). Less glibly: the rank operator " can take positive or negative arguments. A positive argument lets the verb address the content of its input, whereas a negative argument lets the verb address the frame of its input. Here, "1 (positive) applies {. to the rows of x, whereas "_1 (negative) applies it to the "rows" of x, where "rows" in scare-quotes simply means the items along the first dimension, even if they happen to be 37-dimensional hyperrectangles. Who cares?
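(To see the difference "1 vs "_1 makes, try a rank-3 array:)
$ {."1 i. 2 3 4   NB. head of each rank-1 row
2 3
$ {."_1 i. 2 3 4   NB. head of each item (a rank-2 plane)
2 4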
Well, one guy cares. The original definition of tr let the laziest programmer write ('dog';'cat') tr ;: 'a dog makes the best pet' instead of (,:'dog';'cat') tr ;: 'a dog makes the best pet'. That is, the original tr (completely accidentally) allowed a simple list as a mapping table, which of course isn't a Nx2 table, even in an abstract, virtual sense (because it doesn't have at least two dimensions). Maybe we'd like to retain this convenience. If so, we'd have to promote degenerate arguments on the user's behalf:
tr =: dyad define
x=. ,:^:(1 = #@$) x NB. promote a rank-1 list to a one-row table
(y i.~ ({."_1 x),y) { ({:"_1 x) , y
)
After all, laziness is a prime virtue of a programmer.
Here's the simplest way I can think of to accomplish what you have asked for:
(3;3;3;4;4) 0 3 5 1 4} 'a';'b';'c';'a';'b';'a'
┌─┬─┬─┬─┬─┬─┐
│3│4│c│3│4│3│
└─┴─┴─┴─┴─┴─┘
here's another approach
(<3) 0 3 5} (<4) 1 4} 'a';'b';'c';'a';'b';'a'
┌─┬─┬─┬─┬─┬─┐
│3│4│c│3│4│3│
└─┴─┴─┴─┴─┴─┘
Hypothetically speaking, you might want to generalize this kind of expression, or you might want an alternative. I think the other posters here have pointed out ways of doing that. But sometimes just seeing the simplest form can be interesting.
By the way, here's how I got my above indices (with some but not all of the irrelevancies removed):
I. (<'a') = 'a';'b';'c';'a';'b';'a'
0 3 5
('a') =S:0 'a';'b';'c';'a';'b';'a'
1 0 0 1 0 1
('a') -:S:0 'a';'b';'c';'a';'b';'a'
1 0 0 1 0 1
I.('a') -:S:0 'a';'b';'c';'a';'b';'a'
0 3 5
I.('b') -:S:0 'a';'b';'c';'a';'b';'a'
1 4

How do Suffix Trees work?

I'm going through the data structures chapter in The Algorithm Design Manual and came across Suffix Trees.
The example states:
Input:
XYZXYZ$
YZXYZ$
ZXYZ$
XYZ$
YZ$
Z$
$
Output: (the resulting suffix tree is shown as an image, not reproduced here)
I'm not able to understand how that tree gets generated from the given input string. Suffix trees are used to find a given Substring in a given String, but how does the given tree help towards that? I do understand another given example of a trie (image not reproduced here), but if that trie gets compacted to a suffix tree, then what would it look like?
The standard efficient algorithms for constructing a suffix tree are definitely nontrivial. The main algorithm for doing so is called Ukkonen's algorithm and is a modification of the naive algorithm with two extra optimizations. You are probably best off reading this earlier question for details on how to build it.
You can construct suffix trees by using the standard insertion algorithms on radix tries to insert each suffix into the tree, but doing so will take O(n²) time, which can be expensive for large strings.
As for doing fast substring searching, remember that a suffix tree is a compressed trie of all the suffixes of the original string (plus some special end-of-string marker). If a string S is a substring of the initial string T, then S is a prefix of some suffix of T. So if you have a trie of all the suffixes of T, you can just do a search to see whether S is a prefix of any of the strings in that trie. If so, then S must be a substring of T, since all its characters exist in sequence somewhere in T. The suffix tree substring search algorithm is precisely this search applied to the compressed trie, where you follow the appropriate edges at each step.
Hope this helps!
I'm not able to understand how that tree gets generated from the given input string.
You essentially create a patricia trie with all the suffixes you've listed. When inserting into a patricia trie, you search the root for a child starting with the first char from the input string, if it exists you continue down the tree but if it doesn't then you create a new node off the root. The root will have as many children as unique characters in the input string ($, a, e, h, i, n, r, s, t, w). You can continue that process for each character in the input string.
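That insertion loop is easy to sketch in code (a hypothetical Python sketch, building the plain, uncompressed suffix trie as a dict of dicts):
def build_suffix_trie(s):
    # Insert every suffix of s into a trie; each node is a dict
    # mapping a character to its child node.
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})  # reuse the child if present, else create it
    return root

trie = build_suffix_trie("XYZXYZ$")
print(sorted(trie.keys()))  # ['$', 'X', 'Y', 'Z'] -- one child per distinct character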
Suffix trees are used to find a given Substring in a given String, but how does the given tree help towards that?
If you are looking for the substring "hen", start searching from the root for a child which starts with "h". Then keep matching the input against the string stored in child "h" until you reach the end of that string or hit a mismatch. If you match all of child "h"'s string, i.e. the input "hen" matches the "he" in child "h", then move on to the children of "h" looking for "n"; if you fail to find a child beginning with "n", the substring doesn't exist.
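The search described above is just a walk down from the root. On the uncompressed trie from the sketch earlier it looks like this (the patricia version additionally matches within each node's stored string):
def contains_substring(trie, pattern):
    # A substring of s is exactly a prefix of some suffix of s,
    # i.e. a path starting at the root of the suffix trie.
    node = trie
    for ch in pattern:
        if ch not in node:
            return False  # no child continues with ch: not a substring
        node = node[ch]
    return True

trie = build_suffix_trie("the$their$there$was$when$")
print(contains_substring(trie, "hen"))   # True
print(contains_substring(trie, "henx"))  # False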
Compact Suffix Trie code:
└── (black)
├── (white) as
├── (white) e
│ ├── (white) eir
│ ├── (white) en
│ └── (white) ere
├── (white) he
│ ├── (white) heir
│ ├── (white) hen
│ └── (white) here
├── (white) ir
├── (white) n
├── (white) r
│ └── (white) re
├── (white) s
├── (white) the
│ ├── (white) their
│ └── (white) there
└── (black) w
├── (white) was
└── (white) when
Suffix Tree code:
String = the$their$there$was$when$
End of word character = $
└── (0)
├── (22) $
├── (25) as$
├── (9) e
│ ├── (10) ir$
│ ├── (32) n$
│ └── (17) re$
├── (7) he
│ ├── (2) $
│ ├── (8) ir$
│ ├── (31) n$
│ └── (16) re$
├── (11) ir$
├── (33) n$
├── (18) r
│ ├── (12) $
│ └── (19) e$
├── (26) s$
├── (5) the
│ ├── (1) $
│ ├── (6) ir$
│ └── (15) re$
└── (29) w
├── (24) as$
└── (30) hen$
A suffix tree basically just compacts runs of letters together when there are no choices to be made. For example, if you look at the right side of the trie in your question, after you've seen a w, there are really only two choices: was and when. In the trie, the as in was and the hen in when each still have one node for each letter. In a suffix tree, you'd put those together into two nodes holding as and hen, so the right side of your trie would turn into:
└── w
    ├── as
    └── hen
