MemSQL code generation has failed: Failed to codegen - singlestore

I have a workstation with 250 GB of RAM and a 4 TB SSD. MemSQL has a table that contains 1 billion records, each with 44 columns, totaling about 500 GB of data. When I run the following query on that table
SELECT count(*) ct,name,age FROM research.all_data group by name having count(*) >100 order by ct desc
I got the following error:
MemSQL code generation has failed
I restarted the server, and after that I got another error:
Not enough memory available to complete the current request. The request was not processed
I gave the server a maximum memory of 220 GB and a max_table_memory of 190 GB.
Why could that error happen?
Why is MemSQL consuming 140 GB of memory even though I am using a columnstore?

For "MemSQL code generation has failed", check the tracelog (http://docs.memsql.com/docs/trace-log) on the MemSQL node where the error was hit for more details - this can mean a lot of different things.
MemSQL needs memory to process query results, hold some metadata, etc. even though columnstore data lives on disk. Check memsql status info to see what is using memory - https://knowledgebase.memsql.com/hc/en-us/articles/208759276-What-is-using-memory-on-my-leaves-.
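As a rough sketch of that second check, running SHOW STATUS EXTENDED on the node that hit the error lists the memory-related counters; the variable names in the comment are typical examples and may differ by MemSQL version:
-- Look for large allocators such as Alloc_table_memory, Buffer_manager_memory
-- or Alloc_skiplist_tower (names are indicative and vary by version).
SHOW STATUS EXTENDED;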

Related

Cassandra 2.2.3: repeatedly facing "writing large partition" error even after multiple repairs

We have a production Cassandra cluster with 2 datacenters of 6 nodes each. We keep encountering a large partition warning. We ran 2 successful repairs, but the issue is still not resolved. How can I analyze and fix this?
BigTableWriter.java:184 - Writing large partition system_distributed/repair_history:rf_key_space:my_table (108140638 bytes)
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 1171896
Mismatch (Blocking): 808
Mismatch (Background): 131
Pool Name Active Pending Completed
Large messages n/a 11 0
Small messages n/a 0 48881938
Gossip messages n/a 0 113659
The system_distributed.repair_history table is not one that you really need to concern yourself with. Unfortunately, this can happen when a lot of repairs have been run. With 2.2, the only real solution is to TRUNCATE that table every now and then.
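For reference, that cleanup is a single CQL statement run from cqlsh (the table name is the one from the warning above); the repair history is purely diagnostic, so truncating it loses nothing important:
-- run from cqlsh on any node
TRUNCATE system_distributed.repair_history;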

Java Heap Space issue in Grakn 1.6.0

I have a dataset of 100 nodes and 165 relations to be inserted into one keyspace. My Grakn image has 4 CPU cores and 3 GB of memory. While trying to insert the data I get the error [grpc-request-handler-4] ERROR grakn.core.server.Grakn - Uncaught exception at thread [grpc-request-handler-4] java.lang.OutOfMemoryError: Java heap space. I noticed that the image used 346% CPU but only 1.46 GB of RAM. Another finding in the log was Caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1, use getErrors() for more: Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=3cb85440): io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel)
Could you please help me with this?
It sounds like Cassandra ran out of memory - currently, Grakn spawns two processes: one for Cassandra and one for the Grakn server. You can increase the memory limits with the following flags (unix):
SERVER_JAVAOPTS=-Xms1G STORAGE_JAVAOPTS=-Xms2G ./grakn server start
This would give the server 1 GB and the storage engine (Cassandra) 2 GB of memory, for instance. 3 GB may be a bit on the low end once your data grows, so keep these flags in mind :)

Cassandra: secondary index building during the addition of a new node lasts forever

I'm trying to add a new node to our cluster (Cassandra 2.1.11, 16 nodes, 32 GB RAM, 2x3 TB HDD, 8-core CPU, 1 datacenter, 2 racks, about 700 GB of data on each node). After the new node starts, the data (approx. 600 GB total) from the 16 existing nodes is successfully transferred to it and the building of secondary indexes begins. The secondary index build looks normal; I see info about the successful completion of some secondary index builds and some stream tasks:
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,153 StreamResultFuture.java:180 - [Stream #856adc90-8ddd-11e5-a4be-69bddd44a709] Session with /192.168.21.66 is complete
INFO [StreamReceiveTask:9] 2015-11-22 02:15:23,152 SecondaryIndexManager.java:174 - Index build of [docs.docs_ex_pl_ph_idx, docs.docs_lo_pl_ph_idx, docs.docs_author_login_idx, docs.docs_author_extid_idx, docs.docs_url_idx] complete
Currently 9 out of 16 streams have finished successfully, according to the logs. Everything looks fine except for one issue: the process has already lasted 5 full days. There are no errors in the logs and nothing suspicious, apart from the extremely slow progress.
nodetool compactionstats -H
shows
Secondary index build ... docs 882,4 MB 1,69 GB bytes 51,14%
So the index build is running and making some progress, but very slowly, about 1% every half hour or so.
The only significant difference between the new node and the existing nodes is that the Cassandra Java process has 21k open files, compared to about 300 open files on any existing node, and 80k files in the data directory on the new node compared to 300-500 files in the data directory on any existing node.
Is this normal? At this speed it looks like I'll spend 16 weeks or so adding 16 more nodes.
I know this is an old question, but we ran into this exact issue with 2.1.13 using DTCS. We were able to fix it in our test environment by increasing memtable flush thresholds to 0.7 - which didn't make any sense to us, but may be worth trying.
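If it helps, the change lives in cassandra.yaml; assuming the "flush threshold" referred to is memtable_cleanup_threshold (the 2.1 default is 1 / (memtable_flush_writers + 1)), the edit looked roughly like this:
# cassandra.yaml on the affected node (Cassandra 2.1.x)
# raising this lets memtables fill further before a flush is triggered
memtable_cleanup_threshold: 0.7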

Memory management scenario with MongoDB & Node.js

I'm implementing a medium-scale marketing e-commerce affiliate site, which has the following estimates:
Total Size of Data: 5 - 10 GB
Indexes on Data: 1 GB approx (which I wanted to be in memory)
Disk Size (fast I/O): 20-25 GB
Memory: 2 GB
App development: node.js
Working set estimation per query: average 1-2 KB, maximum 20-30 KB of text-based article
I'm trying to understand whether MongoDB would be the right choice of database. The index should fit in memory, but I have noticed that after a query MongoDB also keeps the result set in memory as a cache. Within 8 hours I expect the queries to touch almost 95% of the data. In that scenario, how will MongoDB manage the limited memory, given that the Node.js app instance also runs on the same server?
Would MongoDB be the right choice for this scenario, or should I go for another JSON-based NoSQL database?

Timeout with Cassandra and Hector

I've started working with Cassandra. I downloaded Cassandra (1.1.1) to my Windows PC and started it. Everything works fine.
I then began to reimplement an old application (in Java, using Hector 1.1) which imports about 200,000,000 records for 4 tables, to be inserted into 4 column families. After importing about 2,000,000 records I get a timeout exception and Cassandra stops responding to requests:
2012-07-03 15:35:43,299 WARN - Could not fullfill request on this host CassandraClient<localhost:9160-16>
2012-07-03 15:35:43,300 WARN - Exception: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
....
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20269)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:922)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:908)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
The last entries inside the logfile are:
INFO 15:35:31,678 Writing Memtable-cf2#678837311(7447722/53551072 serialized/live bytes, 262236 ops)
INFO 15:35:32,810 Completed flushing \var\lib\cassandra\data\keySpaceName\cf2\keySpaceName-cf2-hd-205-Data.db (3292685 bytes) for commitlog position ReplayPosition(segmentId=109596147695328, position=131717208)
INFO 15:35:33,282 Compacted to [\var\lib\cassandra\data\keySpaceName\cf3\keySpaceName-cf3-hd-29-Data.db,]. 33.992.615 to 30.224.481 (~88% of original) bytes for 282.032 keys at 1,378099MB/s. Time: 20.916ms.
INFO 15:35:33,286 Compacting [SSTableReader(path='\var\lib\cassandra\data\keySpaceName\cf4\keySpaceName-cf4-hd-8-Data.db'), SSTableReader(path='\var\lib\cassandra\data\keySpaceName\cf4\keySpaceName-cf4-hd-6-Data.db'), SSTableReader(path='\var\lib\cassandra\data\keySpaceName\cf4\keySpaceName-cf4-hd-7-Data.db'), SSTableReader(path='\var\lib\cassandra\data\keySpaceName\cf4\keySpaceName-cf4-hd-5-Data.db')]
INFO 15:35:34,871 Compacted to [\var\lib\cassandra\data\keySpaceName\cf4\keySpaceName-cf4-hd-9-Data.db,]. 4.249.270 to 2.471.543 (~58% of original) bytes for 30.270 keys at 1,489916MB/s. Time: 1.582ms.
INFO 15:35:41,858 Compacted to [\var\lib\cassandra\data\keySpaceName\cf2\keySpaceName-cf2-hd-204-Data.db,]. 48.868.818 to 24.033.164 (~49% of original) bytes for 135.367 keys at 2,019011MB/s. Time: 11.352ms.
I created the 4 column families like this:
ColumnFamilyDefinition cf1 = HFactory.createColumnFamilyDefinition(
    "keyspacename",
    "cf1",
    ComparatorType.ASCIITYPE);
The column families have the following column counts:
16 columns
14 columns
7 columns
5 columns
The keyspace is created with replication factor 1 and the default (simple) strategy.
I insert the records (rows) with 'Mutator#addInsertion'.
Any advice on avoiding this exception?
Regards
WM
That exception is basically Cassandra saying that it's far enough behind on mutations that it won't complete your requests before they time out. Assuming your PC isn't a beast, you should probably throttle your requests. I suggest sleeping for a while after catching that exception and then retrying; there's no harm in accidentally writing the same row twice, and Cassandra should catch up on writes pretty quickly.
If you were in a production environment, I would look more closely at other reasons why the node might be performing poorly.
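A minimal sketch of that sleep-and-retry throttling, reusing the Hector types shown above (the retry count and back-off interval are arbitrary placeholders, not tuned values):
import me.prettyprint.hector.api.exceptions.HTimedOutException;
import me.prettyprint.hector.api.mutation.Mutator;

// Hypothetical helper: flush the queued insertions, backing off and retrying
// when Cassandra reports a timeout so it can catch up on pending mutations.
static void executeWithBackoff(Mutator<String> mutator) throws InterruptedException {
    int attempts = 0;
    while (true) {
        try {
            mutator.execute();              // sends the pending batch_mutate
            return;
        } catch (HTimedOutException e) {
            if (++attempts > 5) {
                throw e;                    // still timing out after several tries
            }
            Thread.sleep(1000L * attempts); // simple linear back-off before retrying
        }
    }
}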
