IIF function in VBA - Excel

I am debugging some VBA code and I have an IIF call whose behavior I don't understand. It is IIF(constant And i, "TRUE", "FALSE"), where i takes values from 0 to n.
I know that this function is equivalent to IF in Excel, but in this case what is the condition? What does the expression number And number mean? I ran this sample:
Sub Sample()
    For i = 0 To 20
        Cells(i + 1, 1) = i
        Cells(i + 1, 2) = 4 And i
    Next i
End Sub
And I get the values below:

The expression 4 And i performs a bitwise AND operation. Example:
6: 0 0 0 0 0 1 1 0
4: 0 0 0 0 0 1 0 0
Result: 0 0 0 0 0 1 0 0
11: 0 0 0 0 1 0 1 1
4: 0 0 0 0 0 1 0 0
Result: 0 0 0 0 0 0 0 0
So 6 AND 4 is 4 while 11 AND 4 is 0.
Now 0 is seen as False and everything not 0 is seen as True. Used in an IIF-statement, all numbers that have the 3rd bit (counting from right) set will execute the True-part of the IIF, all others the False.
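If it helps to see the pattern outside VBA, here is a quick Python sketch of the same idea (my own illustration; Python's & plays the role of VBA's And on numbers):
# 4 in binary is 100, so 4 & i is non-zero exactly when bit 2 (value 4) of i is set
for i in range(21):
    condition = 4 & i                      # either 0 or 4
    print(i, condition, "TRUE" if condition else "FALSE")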

Your And operator does a bitwise comparison:
result = expression1 And expression2
When applied to numeric values, the And operator performs a bitwise comparison of identically positioned bits in two numeric expressions and sets the corresponding bit in result according to the following table.
Bit in expression1    Bit in expression2    Bit in result
1                     1                     1
1                     0                     0
0                     1                     0
0                     0                     0
Since the logical and bitwise operators have a lower precedence than other arithmetic and relational operators, any bitwise operations should be enclosed in parentheses to ensure accurate results.
So converting 4 into binary gives 100. I have done that in the 3rd column below, where I converted all numbers of the 1st column to binary; the other two columns are just padded with zeros for better visibility:
Looking at the table above, you only get a 1 in the result if both bits in the same position are 1; otherwise you get 0. That means you always get 4 as the result whenever i has a 1 in the 3rd bit from the right (see the red bits), and 0 otherwise.

Generate data following specified pattern in J

I'm getting my feet wet with J and, to get the ball rolling, decided to write a function that:
gets integer N;
spits out a table that follows this pattern:
(example for N = 4)
1
0 1
0 0 1
0 0 0 1
i.e. in each row the number of zeroes increases from 0 up to N - 1.
However, being a newbie, I'm stuck. My current labored (and incorrect) solution for the N = 4 case looks like:
(4 # ,: 0 1) #~/"1 1 (1 ,.~/ i.4)
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
And the problem with it is twofold:
it's not general enough and looks kinda ugly (parens and " usage);
trailing zeroes - as I understand, all arrays in J are homogeneous, so in my case every row should be boxed.
Like that:
┌───────┐
│1 │
├───────┤
│0 1 │
├───────┤
│0 0 1 │
├───────┤
│0 0 0 1│
└───────┘
Or I should use strings (e.g. '0 0 1') which will be padded with spaces instead of zeroes.
So, what I'm kindly asking here is:
please provide an idiomatic J solution for this task with explanation;
criticize my attempt and point out how it could be finished.
Thanks in advance!
Like so many challenges in J, sometimes it is better to keep your focus on your result and find a different way to get there. In this case, what your initial approach is doing is creating an identity matrix. I would use
=/~@:i. 4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
You have correctly identified the issue with the trailing 0's and the fact that J will pad out with 0's to avoid ragged arrays. Boxing avoids this padding since each row is self-contained.
So create your lists first. I would use overtake to get the extra 0's
4{.1
1 0 0 0
The next line uses 1: to return 1 as a verb and boxes the overtakes from 1 to 4
(>:@:i. <@:{."0 1:) 4
+-+---+-----+-------+
|1|1 0|1 0 0|1 0 0 0|
+-+---+-----+-------+
Since we want this reversed and then made into strings, we add ":@:|.@: to the process.
(>:#:i. <#:":#:|.#:{."0 1:) 4
+-+---+-----+-------+
|1|0 1|0 0 1|0 0 0 1|
+-+---+-----+-------+
Then we unbox
>@:(>:@:i. <@:":@:|.@:{."0 1:) 4
1
0 1
0 0 1
0 0 0 1
I am not sure this is the way everyone would solve the problem, but it works.
An alternative solution does not use boxing; it uses the dyadic j. (complex) and the fact that
1j4 # 1
1 0 0 0 0
(1 j. 4) # 1
1 0 0 0 0
(1 #~ 1 j. ]) 4
1 0 0 0 0
So, I create a list for each integer in i. 4, then reverse them and make them into strings. Since they are now strings, the extra padding is done with blanks.
(1 ":#:|.#:#~ 1 j. ])"0#:i. 4
1
0 1
0 0 1
0 0 0 1
Taking this step by step to hopefully explain a little better.
i.4
0 1 2 3
Which is then applied to (1 ":@:|.@:#~ 1 j. ]) an atom at a time, hence the use of "0
Breaking down what is going on within the parentheses: I first take the rightmost three verbs, which form a fork.
( 1 j. ])"0@:i.4
1 1j1 1j2 1j3
Now, effectively that gives me
1 ":#:|.#:#~ 1 1j1 1j2 1j3
The middle tine of the fork becomes the verb acting on the two noun arguments. The ~ swaps the arguments, so it becomes equivalent to
1 1j1 1j2 1j3 ":@:|.@:# 1
which, because of the way @: works, is the same as
": |. 1 1j1 1j2 1j3 # 1
I haven't shown the results of these components because using the "0 on the fork changes how the arguments are sent to the middle tine and assembled later. I'm hoping that there is enough here that, with some hand waving, the explanation suffices.
The jump from tacit to explicit can be a big one, so it may be a better exercise to write the same verb explicitly to see if it makes more sense.
lowerTriangle =: 3 : 0
rightArg=. i. y
complexCopy=. 1 j. rightArg
1 (":@:|.@:#~)"0 complexCopy
)
lowerTriangle 4
1
0 1
0 0 1
0 0 0 1
lowerTriangle 5
1
0 1
0 0 1
0 0 0 1
0 0 0 0 1
See what happens when you 'get the ball rolling'? I guess the thing about J is that the ball goes down a pretty steep slope no matter where you begin. Exciting, eh?

how to remove outermost logic?

How do I remove the outermost logic from a formula? For example, column D holds the result of
AND(OR(A,B),C)
and column E should hold, as a binary number, the result of the same expression with the outermost logic removed:
OR(A,B)
A  B  C  result (D)  after extract (E)
0 0 0 0 0
0 0 1 0 0
0 1 0 0 1
0 1 1 1 1
1 0 0 0 1
1 0 1 1 1
1 1 0 0 1
1 1 1 1 1
I tried this in Excel:
=IF(NOT(AND(D2,C2))=TRUE,1,0)
but it cannot remove the outermost logic.
The formulas used for the result (D), the extract (E), and my attempt:
0 0 0 =IF(AND(OR(A2,B2),C2)=TRUE,1,0) =IF(OR(A2,B2)=TRUE,1,0) =IF(NOT(AND(D2,C2))=TRUE,1,0)
0 0 1 =IF(AND(OR(A3,B3),C3)=TRUE,1,0) =IF(OR(A3,B3)=TRUE,1,0) =IF(NOT(AND(D3,C3))=TRUE,1,0)
0 1 0 =IF(AND(OR(A4,B4),C4)=TRUE,1,0) =IF(OR(A4,B4)=TRUE,1,0) =IF(NOT(AND(D4,C4))=TRUE,1,0)
0 1 1 =IF(AND(OR(A5,B5),C5)=TRUE,1,0) =IF(OR(A5,B5)=TRUE,1,0) =IF(NOT(AND(D5,C5))=TRUE,1,0)
1 0 0 =IF(AND(OR(A6,B6),C6)=TRUE,1,0) =IF(OR(A6,B6)=TRUE,1,0) =IF(NOT(AND(D6,C6))=TRUE,1,0)
1 0 1 =IF(AND(OR(A7,B7),C7)=TRUE,1,0) =IF(OR(A7,B7)=TRUE,1,0) =IF(NOT(AND(D7,C7))=TRUE,1,0)
1 1 0 =IF(AND(OR(A8,B8),C8)=TRUE,1,0) =IF(OR(A8,B8)=TRUE,1,0) =IF(NOT(AND(D8,C8))=TRUE,1,0)
1 1 1 =IF(AND(OR(A9,B9),C9)=TRUE,1,0) =IF(OR(A9,B9)=TRUE,1,0) =IF(NOT(AND(D9,C9))=TRUE,1,0)
By "remove the outermost logic", I assume you want to remove the IF function.
One thing to note is that in a formula like =IF(AND(OR(A2,B2),C2)=TRUE,1,0) you never need the =TRUE test. =IF(AND(OR(A2,B2),C2),1,0) will work exactly the same.
There are a couple of ways to convert a boolean (i.e. a TRUE/FALSE value) into an integer without the explicit IF. One is --AND(OR(A2,B2),C2). Another is INT(AND(OR(A2,B2),C2)).
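So, for instance, the column-E formula from the question, =IF(OR(A2,B2)=TRUE,1,0), can be written simply as =--OR(A2,B2) or =INT(OR(A2,B2)) (my own combination of the suggestions above); both return 1 or 0 directly without any IF.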

Logical not on a scipy sparse matrix

I have a bag-of-words representation of a corpus stored in an D by W sparse matrix word_freqs. Each row is a document and each column is a word. A given element word_freqs[d,w] represents the number of occurrences of word w in document d.
I'm trying to obtain another D by W matrix not_word_occs where, for each element of word_freqs:
If word_freqs[d,w] is zero, not_word_occs[d,w] should be one.
Otherwise, not_word_occs[d,w] should be zero.
Eventually, this matrix will need to be multiplied with other matrices which might be dense or sparse.
I've tried a number of methods, including:
not_word_occs = (word_freqs == 0).astype(int)
This works for toy examples, but results in a MemoryError for my actual data (which is approx. 18,000 x 16,000).
I've also tried np.logical_not():
word_occs = sklearn.preprocessing.binarize(word_freqs)
not_word_occs = np.logical_not(word_freqs).astype(int)
This seemed promising, but np.logical_not() does not work on sparse matrices, giving the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Any ideas or guidance would be appreciated.
(By the way, word_freqs is generated by sklearn's preprocessing.CountVectorizer(). If there's a solution that involves converting this to another kind of matrix, I'm certainly open to that.)
The complement of the nonzero positions of a sparse matrix is dense. So if you want to achieve your stated goals with standard numpy arrays, you will require quite a bit of RAM. Here's a quick and totally unscientific hack to give you an idea of how many arrays of that sort your computer can handle:
>>> import numpy as np
>>> a = []
>>> for j in range(100):
... print(j)
... a.append(np.ones((16000, 18000), dtype=int))
My laptop chokes at j=1. So unless you have a really good computer, even if you can get the complement (you can, with
>>> compl = np.ones(S.shape,int)
>>> compl[S.nonzero()] = 0
), memory will be an issue.
One way out may be to not explicitly compute the complement; call it C = B1 - A, where B1 is the same-shape matrix completely filled with ones and A is the adjacency (0/1) pattern of your original sparse matrix. For example, the matrix product XC can be written as XB1 - XA, so you have one multiplication with the sparse A and one with B1, which is actually cheap because it boils down to computing row sums. The point here is that you can compute this without computing C first.
A particularly simple example would be multiplication with a one-hot vector. Such a multiplication just selects a column (if multiplying from the right) or a row (if multiplying from the left) of the other matrix. That means you just need to find that column or row of the sparse matrix and take its complement (no problem for a single slice), and if you do this for a one-hot matrix, then, as above, you needn't compute the complement explicitly.
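Here is a minimal scipy/numpy sketch of that trick (my own illustration; the toy sizes, the matrix X and all variable names are assumptions, not from the question):
import numpy as np
from scipy import sparse

D, W = 6, 8                                   # tiny stand-ins for ~18,000 x ~16,000
word_freqs = sparse.random(D, W, density=0.2, format='csr')
A = (word_freqs != 0).astype(int)             # 0/1 occurrence pattern, still sparse

X = np.random.rand(3, D)                      # something to multiply by the complement

# C = ones((D, W)) - A, but we never build it:
# X @ ones((D, W)) has every column equal to the row sums of X
XA = (A.T @ X.T).T                            # sparse @ dense -> dense, shape (3, W)
XC = X.sum(axis=1, keepdims=True) - XA        # row sums broadcast across the columns

# sanity check against the explicit dense computation (only feasible at toy size)
C = np.ones((D, W)) - A.toarray()
assert np.allclose(XC, X @ C)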
Make a small sparse matrix:
In [743]: freq = sparse.random(10,10,.1)
In [744]: freq
Out[744]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in COOrdinate format>
the repr(freq) shows the shape, elements and format.
In [745]: freq==0
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:213: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
", try using != instead.", SparseEfficiencyWarning)
Out[745]:
<10x10 sparse matrix of type '<class 'numpy.bool_'>'
with 90 stored elements in Compressed Sparse Row format>
If I do your first action, I get a warning and a new array with 90 (out of 100) stored terms. That "not" is no longer sparse.
In general, numpy functions do not work when applied to sparse matrices; to work, they have to delegate the task to sparse methods. But even if logical_not worked, it wouldn't solve the memory issue.
Here is an example of using Pandas.SparseDataFrame:
In [42]: X = (sparse.rand(10, 10, .1) != 0).astype(np.int64)
In [43]: X = (sparse.rand(10, 10, .1) != 0).astype(np.int64)
In [44]: d1 = pd.SparseDataFrame(X.toarray(), default_fill_value=0, dtype=np.int64)
In [45]: d2 = pd.SparseDataFrame(np.ones((10,10)), default_fill_value=1, dtype=np.int64)
In [46]: d1.memory_usage()
Out[46]:
Index 80
0 16
1 0
2 8
3 16
4 0
5 0
6 16
7 16
8 8
9 0
dtype: int64
In [47]: d2.memory_usage()
Out[47]:
Index 80
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
dtype: int64
math:
In [48]: d2 - d1
Out[48]:
0 1 2 3 4 5 6 7 8 9
0 1 1 0 0 1 1 0 1 1 1
1 1 1 1 1 1 1 1 1 0 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 0 1 1
4 1 1 1 1 1 1 1 1 1 1
5 0 1 1 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1 1 1
7 0 1 1 0 1 1 1 0 1 1
8 1 1 1 1 1 1 0 1 1 1
9 1 1 1 1 1 1 1 1 1 1
source sparse matrix:
In [49]: d1
Out[49]:
0 1 2 3 4 5 6 7 8 9
0 0 0 1 1 0 0 1 0 0 0
1 0 0 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0
5 1 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0
7 1 0 0 1 0 0 0 1 0 0
8 0 0 0 0 0 0 1 0 0 0
9 0 0 0 0 0 0 0 0 0 0
memory usage:
In [50]: (d2 - d1).memory_usage()
Out[50]:
Index 80
0 16
1 0
2 8
3 16
4 0
5 0
6 16
7 16
8 8
9 0
dtype: int64
PS if you can't build the whole SparseDataFrame at once (because of memory constraints), you can use an approach similar to one used in this answer

Combining pairs in a string (Matlab)

I have a string:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'
How can I combine pairs, where the last character of one pair is the first character of the following pair, into strings? The new strings must contain all of the characters 'A','B','C','D','E','F','G' that appear in the sup_pairs string.
The expected output should be:
S1 = 'BAEFCGD' % because BA will be followed by AE in sup_pairs string, so we combine BAE, and so on...we continue the rule to generate S1
S2 = 'DFCEABG'
If I have AB, BC and BD, the generated strings should be both ABC and ABD.
If there is any repeated character in the pairs, like AB BC CA CE, we will skip the second A and get ABCE.
This, like all good things in life, is a graph problem. Each letter is a node, and each pair is an edge.
First we must transform your string of pairs into a numeric format so we can use the letters as subscripts. I will use A=2, B=3, ..., G=8:
sup_pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG';
p=strsplit(sup_pairs,' ');
m=cell2mat(p(:));
m=m-'?';
A=sparse(m(:,1),m(:,2),1);
The sparse matrix A is now the adjacency matrix (actually, more like an adjacency list) representing our pairs. If you look at the full matrix of A, it looks like this:
>> full(A)
ans =
0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
As you can see, the edge BA, which translates to subscript (3,2) is equal to 1.
Now you can use your favorite implementation of Depth-first Search (DFS) to perform a traversal of the graph from your starting node of choice. Each path from the root to a leaf node represents a valid string. You then transform the path back into your letter sequence:
treepath=[3,2,6,7,4,8,5];
S1=char(treepath+'?');
Output:
S1 = BAEFCGD
Here's a recursive implementation of DFS to get you going. Normally in MATLAB you have to worry about not hitting the default limitation on recursion depth, but you're finding Hamiltonian paths here, which is NP-complete. If you ever get anywhere near the recursion limit, the computation time will be so huge that increasing the depth will be the least of your worries.
function full_paths = dft_all(A, current_path)
% A - adjacency matrix of graph
% current_path - initially just the start node (root)
% full_paths - cell array containing all paths from initial root to a leaf
n = size(A, 1); % number of nodes in graph
full_paths = cell(1,0); % return cell array
unvisited_mask = ones(1, n);
unvisited_mask(current_path) = 0; % mask off already visited nodes (path)
% multiply mask by array of nodes accessible from last node in path
unvisited_nodes = find(A(current_path(end), :) .* unvisited_mask);
% add restriction on length of paths to keep (numel == n)
if isempty(unvisited_nodes) && (numel(current_path) == n)
full_paths = {current_path}; % we've found a leaf node
return;
end
% otherwise, still more nodes to search
for node = unvisited_nodes
new_path = dft_all(A, [current_path node]); % add new node and search
if ~isempty(new_path) % if this produces a new path...
full_paths = {full_paths{1,:}, new_path{1,:}}; % add it to output
end
end
end
This is a normal Depth-first traversal except for the added condition on the length of the path in line 15:
if isempty(unvisited_nodes) && (numel(current_path) == n)
The first half of the if condition, isempty(unvisited_nodes) is standard. If you only use this part of the condition you'll get all paths from the start node to a leaf, regardless of path length. (Hence the cell array output.) The second half, (numel(current_path) == n) enforces the length of the path.
I took a shortcut here because n is the number of nodes in the adjacency matrix, which in the sample case is 8 rather than 7, the number of characters in your alphabet. But there are no edges into or out of node 1 because I was apparently planning on using a trick that I never got around to telling you about. Rather than run DFS starting from each of the nodes to get all of the paths, you can make a dummy node (in this case node 1) and create an edge from it to all of the other real nodes. Then you just call DFS once on node 1 and you get all the paths. Here's the updated adjacency matrix:
A =
0 1 1 1 1 1 1 1
0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 1
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
If you don't want to use this trick, you can change the condition to n-1, or change the adjacency matrix not to include node 1. Note that if you do leave node 1 in, you need to remove it from the resulting paths.
Here's the output of the function using the updated matrix:
>> dft_all(A, 1)
ans =
{
[1,1] =
1 2 3 8 5 7 4 6
[1,2] =
1 3 2 6 7 4 8 5
[1,3] =
1 3 8 5 2 6 7 4
[1,4] =
1 3 8 5 7 4 6 2
[1,5] =
1 4 6 2 3 8 5 7
[1,6] =
1 5 7 4 6 2 3 8
[1,7] =
1 6 2 3 8 5 7 4
[1,8] =
1 6 7 4 8 5 2 3
[1,9] =
1 7 4 6 2 3 8 5
[1,10] =
1 8 5 7 4 6 2 3
}
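For comparison, here is a rough Python sketch of the same dummy-node idea described above (a dummy start node connected to every real node, then a depth-first search that keeps only full-length paths). The pair string comes from the question; the function and variable names are my own, not the answer's MATLAB code:
def all_full_paths(adj, path, n_real):
    # return every path that starts at the dummy node and visits all real nodes once
    last = path[-1]
    nxt = [v for v in adj.get(last, []) if v not in path]
    if not nxt:
        # keep the path only if it covers the dummy node plus every real node
        return [path[1:]] if len(path) == n_real + 1 else []
    paths = []
    for v in nxt:
        paths.extend(all_full_paths(adj, path + [v], n_real))
    return paths

pairs = 'BA CE DF EF AE FC GD DA CG EA AB BG'.split()
letters = sorted(set(''.join(pairs)))           # ['A', 'B', ..., 'G']
adj = {0: list(range(1, len(letters) + 1))}     # dummy node 0 points at every letter
for a, b in pairs:
    adj.setdefault(letters.index(a) + 1, []).append(letters.index(b) + 1)

for p in all_full_paths(adj, [0], len(letters)):
    print(''.join(letters[i - 1] for i in p))   # BAEFCGD and DFCEABG are among the results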

How to count the frequency of an element in APL or J without loops

Assume I have two lists, one is the text t, one is a list of characters c. I want to count how many times each character appears in the text.
This can be done easily with the following APL code.
+⌿t∘.=c
However it is slow: it takes the outer product, then sums each column.
It is an O(nm) algorithm, where n and m are the sizes of t and c.
Of course I can write a procedural program in APL that reads t character by character and solves this problem in O(n+m) (assuming perfect hashing).
Are there ways to do this faster in APL without loops (or conditionals)? I also accept solutions in J.
Edit:
Practically speaking, I'm doing this where the text is much shorter than the list of characters (the characters are non-ASCII). I'm considering cases where the text has length around 20 and the character list has length in the thousands.
There is a simple optimization given n is smaller than m.
w ← (∪t)∩c
f ← +⌿t∘.=w
r ← (⍴c)⍴0
r[c⍳w] ← f
r
w contains only the characters in t, therefore the table size depends only on t and not on c. This algorithm runs in O(n^2 + m log m), where m log m is the time for the intersection operation.
However, a sub-quadratic algorithm is still preferred just in case someone gave a huge text file.
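For reference, the procedural O(n+m) hash-table approach mentioned in the question looks like this in Python (a sketch for comparison only, not an APL/J answer):
from collections import Counter

def char_frequencies(t, c):
    # count how many times each character of the list c occurs in the text t
    freq = Counter(t)                        # O(n): one pass over the text
    return [freq.get(ch, 0) for ch in c]     # O(m): one lookup per character of c

print(char_frequencies('abdaaa', 'abc'))     # [4, 1, 0]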
NB. Using "key" (/.) adverb w/tally (#) verb counts
#/.~ 'abdaaa'
4 1 1
NB. the items counted are the nub of the string.
~. 'abdaaa'
abd
NB. So, if we count the target along with the string
#/.~ 'abc','abdaaa'
5 2 1 1
NB. We get an extra one for each of the target items.
countKey2=: 4 : '<:(#x){.#/.~ x,y'
NB. This subtracts 1 (<:) from each count of the xs.
6!:2 '''1'' countKey2 10000000$''1234567890'''
0.0451088
6!:2 '''1'' countKey2 1e7$''1234567890'''
0.0441849
6!:2 '''1'' countKey2 1e8$''1234567890'''
0.466857
NB. A tacit version
countKey=. [: <: ([: # [) {. [: #/.~ ,
NB. appears to be a little faster at first
6!:2 '''1'' countKey 1e8$''1234567890'''
0.432938
NB. But repeating the timing 10 times shows they are the same.
(10) 6!:2 '''1'' countKey 1e8$''1234567890'''
0.43914
(10) 6!:2 '''1'' countKey2 1e8$''1234567890'''
0.43964
Dyalog v14 introduced the key operator (⌸):
{⍺,⍴⍵}⌸'abcracadabra'
a 5
b 2
c 2
r 2
d 1
The operand function takes a letter as ⍺ and the occurrences of that letter (vector of indices) as ⍵.
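For readers without Dyalog, the same idea can be mimicked in Python (my rough analogue of Key, not the ⌸ operator itself): collect the indices of each letter, then apply a function to each group.
s = 'abcracadabra'
groups = {}
for i, ch in enumerate(s):
    groups.setdefault(ch, []).append(i)   # each letter maps to its vector of indices
for ch, idx in groups.items():            # insertion order = order of first appearance
    print(ch, len(idx))                   # a 5, b 2, c 2, r 2, d 1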
I think this example, written in J, fits your request. The character list is longer than the text (but both are kept short for convenience during development.) I have not examined timing but my intuition is that it will be fast. The tallying is done only with reference to characters that actually occur in the text, and the long character set is looked across only to correlate characters that occur in the text.
c=: 80{.43}.a.
t=: 'some {text} to examine'
RawIndicies=: c i. ~.t
Mask=: RawIndicies ~: #c
Indicies=: Mask # RawIndicies
Tallies=: Mask # #/.~ t
Result=: Tallies Indicies} (#c)$0
4 20 $ Result
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 0
0 0 1 0 0 0 2 1 2 0 0 0 1 3 0 0 0 2 0 0
4 20 $ c
+,-./0123456789:;<=>
?#ABCDEFGHIJKLMNOPQR
STUVWXYZ[\]^_`abcdef
ghijklmnopqrstuvwxyz
As noted in other answers, the key operator does this directly. However the classic APL way of solving this problem is still worth knowing.
The classic solution is "sort, shift, and compare":
c←'missippi'
t←'abcdefghijklmnopqrstuvwxyz'
g←⍋c
g
1 4 7 0 5 6 2 3
s←c[g]
s
iiimppss
b←s≠¯1⌽s
b
1 0 0 1 1 0 1 0
n←b/⍳⍴b
n
0 3 4 6
k←(1↓n,⍴b)-n
k
3 1 2 2
u←b/s
u
imps
And for the final answer:
z←(⍴t)⍴0
z
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z[t⍳u]←k
z
0 0 0 0 0 0 0 0 3 0 0 0 1 0 0 2 0 0 2 0 0 0 0 0 0 0
This code is off the top of my head, not ready for production. Have to look for empty cases - the boolean shift is probably not right for all cases....
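The same sort-shift-compare idea translates almost line for line into numpy, for readers who don't read APL (my own translation; the edge cases mentioned above, such as an empty text or a text with a single distinct character, apply here too):
import numpy as np

c = np.array(list('missippi'))        # the text, following the answer's variable names
t = np.array(list('abcdefghijklmnopqrstuvwxyz'))

s = np.sort(c)                        # 'iiimppss'
b = s != np.roll(s, 1)                # True where a new run of equal characters starts
n = np.flatnonzero(b)                 # 0 3 4 6   (run start positions)
k = np.diff(np.append(n, s.size))     # 3 1 2 2   (run lengths = counts)
u = s[b]                              # i m p s   (the distinct characters)

z = np.zeros(t.size, dtype=int)
z[np.searchsorted(t, u)] = k          # scatter the counts into the alphabet positions
print(z)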
"Brute force" in J:
count =: (i.~~.) ({,&0) (]+/"1@:=)
Usage:
'abc' count 'abdaaa'
4 1 0
Not sure how it's implemented internally, but here are the timings for different input sizes:
6!:2 '''abcdefg'' count 100000$''abdaaaerbfqeiurbouebjkvwek''' NB. run time for #t = 100000
0.00803909
6!:2 '''abcdefg'' count 1000000$''abdaaaerbfqeiurbouebjkvwek'''
0.0845451
6!:2 '''abcdefg'' count 10000000$''abdaaaerbfqeiurbouebjkvwek''' NB. and for #t = 10^7
0.862423
We don't filter the input data prior to 'self-classify', so:
6!:2 '''1'' count 10000000$''1'''
0.244975
6!:2 '''1'' count 10000000$''1234567890'''
0.673034
6!:2 '''1234567890'' count 10000000$''1234567890'''
0.673864
My implementation in APL (NARS2000):
(∪w),[0.5]∪⍦w←t∩c
Example:
c←'abcdefg'
t←'abdaaaerbfqeiurbouebjkvwek'
(∪w),[0.5]∪⍦w←t∩c
a b d e f
4 4 1 4 1
Note: showing only those characters in c that exist in t
My initial thought was that this was a case for the Find operator:
T←'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C←'MISSISSIPPI'
X←+/¨T⍷¨⊂C
The used characters are:
(×X)/T
IMPS
Their respective frequencies are:
X~0
4 1 2 4
I've only run toy cases so I have no idea what the performance is, but my intuition tells me it should be cheaper than the outer product.
Any thoughts?
