Is there a way to prevent heap exhaustion in Common Lisp - garbage-collection

I am not too familiar with garbage collection in Lisp, and I wonder how it is possible to manage it in order to avoid the fatal error Heap exhausted during garbage collection in *inferior-lisp*.
SLIME 2.20 and SBCL 2.0.1
Heap exhausted during garbage collection: 0 bytes available, 16 requested.
Gen Boxed Code Raw LgBox LgCode LgRaw Pin Alloc Waste Trig WP GCs Mem-age
1 19685 0 1 0 0 0 5 644710992 359856 21474836 19686 0 1.0000
2 29304 0 1 0 0 0 13 960070208 196032 21474836 29305 0 0.0000
3 0 0 0 0 0 0 0 0 0 21474836 0 0 0.0000
4 0 0 0 0 0 0 0 0 0 21474836 0 0 0.0000
5 367 1 130 34 0 15 51 17101008 823088 38575844 547 13 0.0000
6 485 2 221 55 0 10 0 24716944 612720 2000000 773 0 0.0000
7 15224 0 1 0 0 0 0 498860064 32736 2000000 15225 0 0.0000
Total bytes allocated = 2145459216
Dynamic-space-size bytes = 2147483648
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 84761(tid 0x700000026000):
Heap exhausted, game over.
Error opening /dev/tty: Device not configured
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
I am using an algorithm to solve combinatorial problems, and as you can guess, the search space grows exponentially. There is no point in increasing the dynamic-space-size, because that will not solve the issue. So the idea is to stop the process before the heap is exhausted, for instance by signalling a condition when a memory limit is reached.
Any help is welcome.

It seems that there is no definitive answer to my question.
I found some guidelines with which I will manage the issue.
Here are the links for the record: https://sourceforge.net/p/sbcl/mailman/message/30184781/ and https://sourceforge.net/p/sbcl/mailman/message/18414749/
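For the record, the approach those threads point to can be sketched as follows: have the search periodically compare the dynamic-space usage against a soft limit and signal a condition before SBCL itself runs out. This is only a sketch under the assumption that sb-kernel:dynamic-usage (an SBCL internal) and sb-ext:dynamic-space-size behave as described on your SBCL version; run-search below is a hypothetical stand-in for the actual search entry point.
(defparameter *soft-limit*
  (floor (* 0.8 (sb-ext:dynamic-space-size)))
  "Stop the search once roughly 80% of the dynamic space is in use.")

(define-condition memory-limit-reached (error)
  ((usage :initarg :usage :reader memory-limit-usage)))

(defun check-memory-limit ()
  ;; SB-KERNEL:DYNAMIC-USAGE is an SBCL internal that reports the bytes
  ;; currently allocated in the dynamic space; verify it on your version.
  (let ((usage (sb-kernel:dynamic-usage)))
    (when (> usage *soft-limit*)
      (error 'memory-limit-reached :usage usage))))

;; In the combinatorial search, call (CHECK-MEMORY-LIMIT) at the top of each
;; iteration and wrap the whole search so it stops cleanly instead of dying
;; during GC; RUN-SEARCH stands for your own entry point:
;; (handler-case (run-search)
;;   (memory-limit-reached (c)
;;     (format t "Stopped early: ~D bytes in use.~%" (memory-limit-usage c))))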

Related

dsbulk unload is failing on large table

I am trying to unload data from a huge table; below is the command used and its output.
$ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx --datastax-java-driver.basic.contact-points 123.123.123.123 -query "select count(*) from sometable with where on clustering column and partial pk -- allow filtering" --connector.name json --driver.protocol.compression LZ4 --connector.json.mode MULTI_DOCUMENT -maxConcurrentFiles 1 -maxRecords -1 -url dsbulk --executor.continuousPaging.enabled false --executor.maxpersecond 2500 --driver.socket.timeout 240000
Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
total | failed | rows/s | p50ms | p99ms | p999ms
1 | 1 | 0 | 109,790.10 | 110,058.54 | 110,058.54
Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
Here are the dsbulk records --
cassandra#somehost> cd logs
cassandra#somehost> cd COUNT_20210423-070104-108326/
cassandra#somehost> ls
operation.log unload-errors.log
cassandra#somehost> cat operation.log
2021-04-23 07:01:04 WARN Setting dsbulk.driver.protocol.compression is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.protocol.compression instead.
2021-04-23 07:01:04 WARN Setting dsbulk.driver.auth.* is deprecated and will be removed in a future release; please configure the driver directly using --datastax-java-driver.advanced.auth-provider.* instead.
2021-04-23 07:01:04 INFO Operation directory: /home/cassandra/logs/COUNT_20210423-070104-108326
2021-04-23 07:02:55 WARN Operation COUNT_20210423-070104-108326 completed with 1 errors in 1 minute and 50 seconds.
2021-04-23 07:02:55 INFO Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:55 INFO Memory usage: used: 212 MB, free: 1,922 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:55 INFO Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:55 INFO Throughput: 0 reads/second
2021-04-23 07:02:55 INFO Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
2021-04-23 07:02:58 INFO Final stats:
2021-04-23 07:02:58 INFO Records: total: 1, successful: 0, failed: 1
2021-04-23 07:02:58 INFO Memory usage: used: 251 MB, free: 1,883 MB, allocated: 2,135 MB, available: 27,305 MB, total gc count: 4, total gc time: 98 ms
2021-04-23 07:02:58 INFO Reads: total: 1, successful: 0, failed: 1, in-flight: 0
2021-04-23 07:02:58 INFO Throughput: 0 reads/second
2021-04-23 07:02:58 INFO Latencies: mean 109,790.10, 75p 110,058.54, 99p 110,058.54, 999p 110,058.54 milliseconds
cassandra#somehost> cat unload-errors.log
Statement: com.datastax.oss.driver.internal.core.cql.DefaultBoundStatement#1083fef9 [0 values, idempotence: <UNSET>, CL: <UNSET>, serial CL: <UNSET>, timestamp: <UNSET>, timeout: <UNSET>]
SELECT batch_id from .... allow filtering (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.toErrorPage(ResultSubscription.java:534)
at com.datastax.oss.dsbulk.executor.api.subscription.ResultSubscription.lambda$fetchNextPage$1(ResultSubscription.java:372)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.setFinalError(CqlRequestHandler.java:447) [4 skipped]
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.access$700(CqlRequestHandler.java:94)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processRetryVerdict(CqlRequestHandler.java:859)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.processErrorResponse(CqlRequestHandler.java:828)
at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler$NodeResponseCallback.onResponse(CqlRequestHandler.java:655)
at com.datastax.oss.driver.internal.core.channel.InFlightHandler.channelRead(InFlightHandler.java:257)
at java.lang.Thread.run(Thread.java:748) [24 skipped]
Caused by: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)
Cassandra's system.log snippet ----
DEBUG [ScheduledTasks:1] 2021-04-23 00:01:48,539 MonitoringTask.java:152 - 1 operations timed out in the last 5015 msecs:
<SELECT * FROM my query being run with limit - LIMIT 5000>, total time 10004 msec, timeout 10000 msec/cross-node
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,540 MessagingService.java:1302 - RANGE_SLICE messages were dropped in last 5000 ms: 0 internal and 1 cross node
. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 10299 ms
INFO [ScheduledTasks:1] 2021-04-23 00:02:38,551 StatusLogger.java:114 -
Pool Name Active Pending Completed Blocked All Time Blocked
ReadStage 1 0 1736872997 0 0
ContinuousPagingStage 0 0 586 0 0
RequestResponseStage 0 0 1483193130 0 0
ReadRepairStage 0 0 9079516 0 0
CounterMutationStage 0 0 0 0 0
MutationStage 0 0 351841038 0 0
ViewMutationStage 0 0 0 0 0
CommitLogArchiver 0 0 32961 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 12034828 0 0
MemtableReclaimMemory 0 0 68612 0 0
PendingRangeCalculator 0 0 9 0 0
AntiCompactionExecutor 0 0 0 0 0
GossipStage 0 0 20137208 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 3798 0 0
MigrationStage 0 0 8 0 0
MemtablePostFlush 0 0 338955 0 0
PerDiskMemtableFlushWriter_0 0 0 66297 0 0
ValidationExecutor 0 0 247600 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 41757 0 0
InternalResponseStage 0 0 525242 0 0
AntiEntropyStage 0 0 767527 0 0
CacheCleanupExecutor 0 0 0 0 0
Native-Transport-Requests 0 0 958717934 0 65
CompactionManager 0 0
MessagingService n/a 0/0
Cache Type Size Capacity KeysToSave
KeyCache 104857216 104857600 all
RowCache 0 0 all
The problem is that you are running an unload command in DSBulk to perform a SELECT COUNT(*), which means it has to do a full table scan just to return a single row.
In addition, ALLOW FILTERING is not recommended unless you restrict the query to a single partition. In any case, the performance of ALLOW FILTERING is very unpredictable even in optimal situations.
I suggest that you instead use the DSBulk count command which is optimised for counting rows or partitions in Cassandra. For details, see the Counting data with DSBulk example.
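As a rough sketch of what that could look like (the keyspace name my_keyspace is a placeholder, credentials and contact point as in your unload command; double-check the option names against the DSBulk 1.8 docs):
$ dsbulk count -h 123.123.123.123 -u xxxx -p xxxx -k my_keyspace -t sometable
dsbulk count splits the scan into token ranges spread across the nodes instead of funnelling a single SELECT COUNT(*) through one coordinator.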
There are also additional examples in the DSBulk Counting blog post, which Alex Ott already linked in his answer. Cheers!
Expand select count(*) from sometable with where on clustering column and partial pk -- allow filtering with an additional condition on the token ranges, like this: and token(full_pk) > :start and token(full_pk) <= :end (keeping your existing conditions on the clustering column and partial pk). In this case, DSBulk will perform many queries against specific token ranges that are sent to multiple nodes, and won't put all the load onto a single node as in your case.
Look into the documentation for the -query option, and at the 4th blog post in this series of posts about DSBulk, which provides more information and examples: 1, 2, 3, 4, 5, 6
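To illustrate the shape of such a query (a sketch only; the part in angle brackets stands for your existing WHERE clause, and full_pk stands for the full partition key column list, as above):
$ dsbulk unload -h 123.123.123.123 -u xxxx -p xxxx --connector.name json \
  -query "select count(*) from sometable where <your clustering column / partial pk conditions> and token(full_pk) > :start and token(full_pk) <= :end allow filtering"
DSBulk recognizes the :start and :end placeholders and runs the query once per token range, so the reads are spread across the replicas instead of hitting a single node.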

Excel: Match and Index based on range

I am stumped by the following problem and not sure how to accomplish it in Excel. Here is an example of the data:
A B
1 Date Stock_Return
2 Jan-95 -5.2%
3 Feb-95 2.1%
4 Mar-95 3.7%
5 Apr-95 6.9%
6 May-95 6.5%
7 Jun-95 -5.6%
8 Jul-95 6.6%
9 Aug-95 6.2%
What I would like is to have the dates returned whose returns fall within a certain range, sorted from low to high.
For example:
1 2 3 4 5
Below -7% 0 0 0 0 0
-7% to -5% Jun-95 Jan-95 0 0 0
-5% to -3% 0 0 0 0 0
-3% to 0% 0 0 0 0 0
0% to 3% Feb-95 0 0 0 0
3% to 5% Mar-95 0 0 0 0
5% to 7% Aug-95 May-95 Jul-95 Apr-95 0
I thought INDEX and MATCH might make the most sense, but when I drag across columns it doesn't work. Any help is very much appreciated.
You can use the AGGREGATE function:
=IFERROR(AGGREGATE(14,6,$A$2:$A$9/(($B$2:$B$9>$D2)*($B$2:$B$9<=$E2)),COLUMN(A1)),"0")
If you have Excel O365, you can use the FILTER function:
F2: =IFERROR(TRANSPOSE(FILTER($A$2:$A$9,(F2<=$B$2:$B$9)*(G2>=$B$2:$B$9))),"")
and fill down.

Cassandra write performance

We have this Cassandra cluster and would like to know whether its current performance is normal and what we can do to improve it.
The cluster is formed of 3 nodes located in the same datacenter, with a total capacity of 465GB and 2GB of heap per node. Each node has 8 cores and 8GB of RAM. Versions of the different components are cqlsh 5.0.1 | Cassandra 2.1.11.872 | DSE 4.7.4 | CQL spec 3.2.1 | Native protocol v3
The workload is described as follows:
The keyspace uses the org.apache.cassandra.locator.SimpleStrategy placement strategy and a replication factor of 3 (this is very important for us)
The workload consists mainly of write operations into a single table. The table schema is as follows:
CREATE TABLE aiceweb.records (
process_id timeuuid,
partition_key int,
collected_at timestamp,
received_at timestamp,
value text,
PRIMARY KEY ((process_id, partition_key), collected_at, received_at)
) WITH CLUSTERING ORDER BY (collected_at DESC, received_at ASC)
AND read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.1
AND gc_grace_seconds = 864000
AND bloom_filter_fp_chance = 0.01
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
AND comment = ''
AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
AND default_time_to_live = 0
AND speculative_retry = '99.0PERCENTILE'
AND min_index_interval = 128
AND max_index_interval = 2048;
Write operations come from a Node.js based API server, using the Node.js driver provided by DataStax (recently updated from version 2.1.1 to 3.2.0). The code in charge of performing the write requests groups write operations per primary key and additionally limits the request size to 500 INSERTs per request. The write operation is performed as a BATCH. The only options explicitly set are prepare:true, logged:false.
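For reference, the write path described above corresponds roughly to the following sketch with the DataStax Node.js driver (a sketch only; the contact point, row fields, and the writeGroup helper are placeholders, not the actual application code):
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['10.0.0.1'], keyspace: 'aiceweb' });
const insert = 'INSERT INTO records (process_id, partition_key, collected_at, received_at, value) VALUES (?, ?, ?, ?, ?)';

// One unlogged batch per (process_id, partition_key) group, capped at 500 statements.
function writeGroup(rows, callback) {
  const queries = rows.slice(0, 500).map(function (row) {
    return { query: insert, params: [row.processId, row.partitionKey, row.collectedAt, row.receivedAt, row.value] };
  });
  client.batch(queries, { prepare: true, logged: false }, callback);
}
Since all statements in a group target the same partition, an unlogged batch is still applied atomically on that partition while avoiding the cross-partition batch anti-pattern.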
OpsCenter reflects a historical level of less than one request per second over the last year with this setup (each write request being a BATCH of up to 500 operations directed to the same table and the same partition). Write request latency has been at 1.6ms for 90% of requests for almost the entire year, but lately it has increased to more than 2.6ms for 90% of requests. OS load has been below 2.0 and disk utilization has been below 5% most of the time, with a few peaks at 7%. Average heap usage has been 1.3GB the entire year, with peaks at 1.6GB, and this peak has been rising during the last month.
The problem with this setup is that API performance has been degrading the entire year. Currently the BATCH operation can take from 300ms up to more than 12s (leading to an operation timeout). In some cases the Node.js driver reports all Cassandra nodes as down even when OpsCenter reports all nodes alive and healthy.
Compaction stats always show 0 on each node, and nodetool tpstats shows something like:
Pool Name Active Pending Completed Blocked All time blocked
CounterMutationStage 0 0 10554 0 0
ReadStage 0 0 687567 0 0
RequestResponseStage 0 0 767898 0 0
MutationStage 0 0 393407 0 0
ReadRepairStage 0 0 411 0 0
GossipStage 0 0 1314414 0 0
CacheCleanupExecutor 0 0 48 0 0
MigrationStage 0 0 0 0 0
ValidationExecutor 0 0 126 0 0
Sampler 0 0 0 0 0
MemtableReclaimMemory 0 0 497 0 0
InternalResponseStage 0 0 126 0 0
AntiEntropyStage 0 0 630 0 0
MiscStage 0 0 0 0 0
CommitLogArchiver 0 0 0 0 0
MemtableFlushWriter 0 0 485 0 0
PendingRangeCalculator 0 0 4 0 0
MemtablePostFlush 0 0 7879 0 0
CompactionExecutor 0 0 263599 0 0
AntiEntropySessions 0 0 3 0 0
HintedHandoff 0 0 8 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Any help or suggestion with this problem will be deeply appreciated. Feel free to request any other information you need to analyze it.
Best regards
Are you staying at the same number of requests, or is the workload growing?
Looks like the server is overloaded (maybe network).
I would try to find a reproducer and run it with tracing enabled; hopefully that will help you understand what the issue is (especially if you compare it to a trace where the latency is good).
There is an example of how to enable query tracing and retrieve the output via the Node.js driver examples: retrieve-query-trace.js (can be found at https://github.com/datastax/nodejs-driver).
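A minimal sketch of that approach with the Node.js driver (assuming the 3.x callback API used by that example; the contact point and the sample statement below are placeholders based on the schema above):
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['10.0.0.1'], keyspace: 'aiceweb' });
const query = 'INSERT INTO records (process_id, partition_key, collected_at, received_at, value) VALUES (?, ?, ?, ?, ?)';
const params = [cassandra.types.TimeUuid.now(), 1, new Date(), new Date(), 'sample']; // placeholder row

// Trace one representative write, then pull the trace events back.
client.execute(query, params, { prepare: true, traceQuery: true }, function (err, result) {
  if (err) throw err;
  client.metadata.getTrace(result.info.traceId, function (err, trace) {
    if (err) throw err;
    console.log('Coordinator %s, total %d microseconds', trace.coordinator, trace.duration);
    trace.events.forEach(function (event) {
      console.log('%d us | %s | %s', event.sourceElapsed, event.source, event.activity);
    });
  });
});
Comparing a slow trace against a fast one should show where the time goes (coordinator wait, replica writes, and so on).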

Excel Offset is returning a #VALUE! error when the height is >1 (Excel 2010)

I've set up Names with the intention of using them to return data ranges for a line chart. The X values are "GI", "IE" and "EE". The Y value is "DATE".
However, my "DATE" and "GI" names are returning "#VALUE!" errors - whereas IE and EE are not.
So far, I have found that this error occurs when the height value (CountIf below) is more than 1.
The cell range, and beyond to 2000-and-something, are dynamically generated from user selections to form a Date Range. Ergo the use of CountIf rather than CountA.
Any help would be much appreciated. This is the last leg of a difficult workbook!
DATE:
=OFFSET(Graph!$B$8,0,0,COUNTIF(Graph!$B$8:$B$2927,">"&0)-1)
GI:
=OFFSET(Graph!$C$8,0,0,COUNTIF(Graph!$C$8:$C$2927,">"&0)-1)
IE:
=OFFSET(Graph!$D$8,0,0,COUNTIF(Graph!$D$8:$D$2927,">"&0)-1)
EE:
=OFFSET(Graph!$E$8,0,0,COUNTIF(Graph!$E$8:$E$2927,">"&0)-1)
Information:
B C D E
7 DATE GI IE EE
8 25/04/2011 0 0 0
9 26/04/2011 0 0 0
10 27/04/2011 0 0 0
11 28/04/2011 0 0 0
12 29/04/2011 0 0 0
13 30/04/2011 0 0 0
14 01/05/2011 0 0 0
15 02/05/2011 0 0 0
16 03/05/2011 0 0 0
17 04/05/2011 0 0 0
18 05/05/2011 0 0 0
19 06/05/2011 0 0 0
20 07/05/2011 0 0 0
21 08/05/2011 0 0 0
22 09/05/2011 0 0 0
23 10/05/2011 18000 0 0
24 11/05/2011 18000 0 0
25 12/05/2011 18000 0 0
26 13/05/2011 18000 0 0
27 14/05/2011 18000 0 0
28 15/05/2011 18000 0 0
29 16/05/2011 18000 0 0
30 17/05/2011 18000 0 0
31 18/05/2011 18000 0 0
32 19/05/2011 18000 0 0
33 20/05/2011 18000 0 0
34 21/05/2011 18000 0 0
35 22/05/2011 18000 0 0
This formula should create the correct named Range for date:
=OFFSET(Sheet1!$B$8,0,0,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For GI:
=OFFSET(Sheet1!$B$8,0,1,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For IE:
=OFFSET(Sheet1!$B$8,0,2,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For EE:
=OFFSET(Sheet1!$B$8,0,3,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
(D4 contains the end date dropdown.)
In the data selection for the graph, it is important to write the named Range including the sheet it's on, e.g.: =Sheet1!nrDate instead of just =nrDate.
Please let me know if this works for you.
So, based on your data, and going a slightly different route than OFFSET (the OFFSET route should work), I used the INDEX route.
For the x axis I used
=INDEX($B$9:$B$36,MATCH($C$5,$B$9:$B$36,0)):INDEX($B$9:$B$36,MATCH($D$5,$B$9:$B$36,0))
I used a defined name of X_axis
For the y axis I used
=INDEX($C$9:$C$36,MATCH($C$5,$B$9:$B$36,0)):INDEX($C$9:$C$36,MATCH($D$5,$B$9:$B$36,0))
I used a defined name of Y_axis. For your second series on the Y axis, you would need to change the reference range from C9:C36 to the appropriate column that lines up with your dates.
When defining the series, I had to use the workbook name in conjunction with the named range, so the series data looked like this:
Proof of Concept

How to count the frequency of an element in APL or J without loops

Assume I have two lists, one is the text t, one is a list of characters c. I want to count how many times each character appears in the text.
This can be done easily with the following APL code.
+⌿t∘.=c
However it is slow. It takes the outer product, then sums each column.
It is an O(nm) algorithm, where n and m are the sizes of t and c.
Of course I can write a procedural program in APL that reads t character by character and solves this problem in O(n+m) (assuming perfect hashing).
Are there ways to do this faster in APL without loops (or conditionals)? I also accept solutions in J.
Edit:
Practically speaking, I'm doing this where the text is much shorter than the list of characters (the characters are non-ASCII). I'm considering a text of length 20 and a character list with length in the thousands.
There is a simple optimization given that n is smaller than m.
w ← (∪t)∩c
f ← +⌿t∘.=w
r ← (⍴c)⍴0
r[c⍳w] ← f
r
w contains only the characters in t, therefore the table size depends only on t and not on c. This algorithm runs in O(n^2 + m log m), where m log m is the time for the intersection operation.
However, a sub-quadratic algorithm is still preferred, just in case someone gives a huge text file.
NB. Using "key" (/.) adverb w/tally (#) verb counts
#/.~ 'abdaaa'
4 1 1
NB. the items counted are the nub of the string.
~. 'abdaaa'
abd
NB. So, if we count the target along with the string
#/.~ 'abc','abdaaa'
5 2 1 1
NB. We get an extra one for each of the target items.
countKey2=: 4 : '<:(#x){.#/.~ x,y'
NB. This subtracts 1 (<:) from each count of the xs.
6!:2 '''1'' countKey2 10000000$''1234567890'''
0.0451088
6!:2 '''1'' countKey2 1e7$''1234567890'''
0.0441849
6!:2 '''1'' countKey2 1e8$''1234567890'''
0.466857
NB. A tacit version
countKey=. [: <: ([: # [) {. [: #/.~ ,
NB. appears to be a little faster at first
6!:2 '''1'' countKey 1e8$''1234567890'''
0.432938
NB. But repeating the timing 10 times shows they are the same.
(10) 6!:2 '''1'' countKey 1e8$''1234567890'''
0.43914
(10) 6!:2 '''1'' countKey2 1e8$''1234567890'''
0.43964
Dyalog v14 introduced the key operator (⌸):
{⍺,⍴⍵}⌸'abcracadabra'
a 5
b 2
c 2
r 2
d 1
The operand function takes a letter as ⍺ and the occurrences of that letter (vector of indices) as ⍵.
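If you need the counts aligned with your full character list c, zeros included, the same prepend-and-subtract trick used in the J answer above carries over to key; a hedged sketch, assuming c has no duplicate characters:
c←'abc' ⋄ t←'abdaaa'
¯1+{≢⍵}⌸c,t∩c
4 1 0
Prepending c guarantees every wanted character gets a key in c's order, t∩c throws away characters outside c, and the ¯1+ removes the extra occurrence contributed by c itself.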
I think this example, written in J, fits your request. The character list is longer than the text (but both are kept short for convenience during development). I have not examined the timing, but my intuition is that it will be fast. The tallying is done only with reference to characters that actually occur in the text, and the long character set is scanned only to correlate the characters that occur in the text.
c=: 80{.43}.a.
t=: 'some {text} to examine'
RawIndicies=: c i. ~.t
Mask=: RawIndicies ~: #c
Indicies=: Mask # RawIndicies
Tallies=: Mask # #/.~ t
Result=: Tallies Indicies} (#c)$0
4 20 $ Result
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 0
0 0 1 0 0 0 2 1 2 0 0 0 1 3 0 0 0 2 0 0
4 20 $ c
+,-./0123456789:;<=>
?#ABCDEFGHIJKLMNOPQR
STUVWXYZ[\]^_`abcdef
ghijklmnopqrstuvwxyz
As noted in other answers, the key operator does this directly. However the classic APL way of solving this problem is still worth knowing.
The classic solution is "sort, shift, and compare":
c←'missippi'
t←'abcdefghijklmnopqrstuvwxyz'
g←⍋c
g
1 4 7 0 5 6 2 3
s←c[g]
s
iiimppss
b←s≠¯1⌽s
b
1 0 0 1 1 0 1 0
n←b/⍳⍴b
n
0 3 4 6
k←(1↓n,⍴b)-n
k
3 1 2 2
u←b/s
u
imps
And for the final answer:
z←(⍴t)⍴0
z
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z[t⍳u]←k
z
0 0 0 0 0 0 0 0 3 0 0 0 1 0 0 2 0 0 2 0 0 0 0 0 0 0
This code is off the top of my head, not ready for production. Have to look for empty cases - the boolean shift is probably not right for all cases....
"Brute force" in J:
count =: (i.~~.) ({,&0) (]+/"1#:=)
Usage:
'abc' count 'abdaaa'
4 1 0
Not sure how it's implemented internally, but here are the timings for different input sizes:
6!:2 '''abcdefg'' count 100000$''abdaaaerbfqeiurbouebjkvwek''' NB. run time for #t = 100000
0.00803909
6!:2 '''abcdefg'' count 1000000$''abdaaaerbfqeiurbouebjkvwek'''
0.0845451
6!:2 '''abcdefg'' count 10000000$''abdaaaerbfqeiurbouebjkvwek''' NB. and for #t = 10^7
0.862423
We don't filter the input data prior to 'self-classify', so:
6!:2 '''1'' count 10000000$''1'''
0.244975
6!:2 '''1'' count 10000000$''1234567890'''
0.673034
6!:2 '''1234567890'' count 10000000$''1234567890'''
0.673864
My implementation in APL (NARS2000):
(∪w),[0.5]∪⍦w←t∩c
Example:
c←'abcdefg'
t←'abdaaaerbfqeiurbouebjkvwek'
(∪w),[0.5]∪⍦w←t∩c
a b d e f
4 4 1 4 1
Note: showing only those characters in c that exist in t
My initial thought was that this was a case for the Find operator:
T←'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C←'MISSISSIPPI'
X←+/¨T⍷¨⊂C
The used characters are:
(×X)/T
IMPS
Their respective frequencies are:
X~0
4 1 2 4
I've only run toy cases so I have no idea what the performance is, but my intuition tells me it should be cheaper than the outer product.
Any thoughts?
