Cassandra: too many sstables - node goes down instantly

Using Cassandra 2.2.8.
I'm in a situation where too many SSTables (98,000+) have been created for a single table, and many more for other column families. The node keeps crashing, complaining about insufficient memory for the JRE. I've tried increasing the Linux nofile limit to 200K and max_heap_size to 16G, but to no avail!
I'm looking for ways to reduce the number of SSTables (compaction?) and keep the node up long enough to do the maintenance.
Thanks in advance!
Errors:
There is insufficient memory for the Java Runtime Environment to continue.
Out of Memory Error (os_linux.cpp:2627), pid=22667, tid=139622017013504
--------------- T H R E A D ---------------
Current thread (0x00007efc78b83000): JavaThread "MemtableFlushWriter:2" daemon [_thread_in_vm, id=22726, stack(0x00007efc48b61000,0x00007efc48ba2000)]
Stack: [0x00007efc48b61000,0x00007efc48ba2000], sp=0x00007efc48b9f730, free space=249k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xab97ea] VMError::report_and_die()+0x2ba
V [libjvm.so+0x4f9dcb] report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x8b
V [libjvm.so+0x91a7c3] os::Linux::commit_memory_impl(char*, unsigned long, bool)+0x103
V [libjvm.so+0x91ad19] os::pd_commit_memory(char*, unsigned long, unsigned long, bool)+0x29
V [libjvm.so+0x91502a] os::commit_memory(char*, unsigned long, unsigned long, bool)+0x2a
JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)

I would treat this as a dead node situation:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
After you finish the procedure, the node will have far fewer SSTables. What bothers me is how you got into this situation in the first place. Can you provide some schema, insert, delete, and TTL-related information, and describe the workload?

Related

V8 BigInt size in memory?

Is there a way to get the occupied memory size in bytes of BigInt numbers?
let a = BigInt(99999n)
console.log(a.length) // yields undefined
Thanks
V8 developer here. There is generally no way to determine the occupied memory size of an object, and BigInts are no exception. Why do you want to access it?
As far as the internal implementation in V8 is concerned, a BigInt has a small object header (currently two pointer sizes; this might change over time), and then a bit for every bit of the BigInt, rounded up to a multiple of the pointer size. 99999 is a 17-bit number, so in your example let a = 99999n (the BigInt(99999n) wrapper is superfluous!), the allocated BigInt will consume (2 + Math.ceil(17/64)) * 64 bits = 192 bits = 24 bytes on a 64-bit system.
It may or may not make sense to add length-related properties or methods (.bitLength?) to BigInts in the future. If you have a use case, I suggest you file an issue at https://github.com/tc39/proposal-bigint/issues so that it can be discussed.
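If it helps to see that arithmetic spelled out, here is a rough, purely illustrative sketch (in C; estimate_bigint_bytes is a made-up helper mirroring the formula above, not a V8 API):

/* Estimate the heap footprint of a BigInt on a 64-bit V8 build, per the
 * description above: a two-pointer header plus the payload rounded up to
 * whole 64-bit words. Illustrative only; not an official V8 interface. */
#include <stdio.h>
#include <stdint.h>

static uint64_t estimate_bigint_bytes(uint64_t bit_length)
{
    const uint64_t word_bytes = 8;                     /* pointer size on 64-bit */
    uint64_t payload_words = (bit_length + 63) / 64;   /* round bits up to 64-bit words */
    return (2 + payload_words) * word_bytes;           /* header + payload */
}

int main(void)
{
    /* 99999 is a 17-bit number, so this prints 24, matching the answer above. */
    printf("%llu bytes\n", (unsigned long long)estimate_bigint_bytes(17));
    return 0;
}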

PostgreSQL out of memory: Linux OOM killer

I am having issues with a large query, which I suspect comes down to wrong settings in my postgresql.conf. My setup is PostgreSQL 9.6 on Ubuntu 17.10 with 32 GB RAM and a 3 TB HDD. The query runs pgr_dijkstraCost to create an OD matrix of ~10,000 points in a network of 25,000 links. The resulting table is therefore expected to be very big (~100,000,000 rows with columns from, to, costs). However, a simple test such as select x,1 as c2,2 as c3 from generate_series(1,90000000) succeeds.
The query plan:
QUERY PLAN
--------------------------------------------------------------------------------------
Function Scan on pgr_dijkstracost  (cost=393.90..403.90 rows=1000 width=24)
  InitPlan 1 (returns $0)
    ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
          ->  Seq Scan on building_nodes b  (cost=0.00..166.85 rows=11985 width=4)
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
          ->  Seq Scan on building_nodes b_1  (cost=0.00..166.85 rows=11985 width=4)
This leads to a crash of PostgreSQL:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
normally and possibly corrupted shared memory.
Running dmesg, I could trace it down to an out-of-memory problem:
Out of memory: Kill process 5630 (postgres) score 949 or sacrifice child
[ 5322.821084] Killed process 5630 (postgres) total-vm:36365660kB, anon-rss:32344260kB, file-rss:0kB, shmem-rss:0kB
[ 5323.615761] oom_reaper: reaped process 5630 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[11741.155949] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[11741.155953] postgres cpuset=/ mems_allowed=0
When running the query I can also observe with top that my RAM goes down to 0 before the crash. The amount of committed memory just before the crash:
$ grep Commit /proc/meminfo
CommitLimit: 18574304 kB
Committed_AS: 42114856 kB
I would expect the HDD to be used to write/buffer temporary data when RAM is not enough, but the available space on my HDD does not change during processing. So I began to dig for missing settings (expecting issues due to my relocated data directory), following different sites:
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
https://www.credativ.com/credativ-blog/2010/03/postgresql-and-linux-memory-management
My original postgresql.conf settings are the defaults, except for the changed data directory:
data_directory = '/hdd_data/postgresql/9.6/main'
shared_buffers = 128MB # min 128kB
#huge_pages = try # on, off, or try
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
#work_mem = 4MB # min 64kB
#maintenance_work_mem = 64MB # min 1MB
#replacement_sort_tuples = 150000 # limits use of replacement selection sort
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB # min 100kB
dynamic_shared_memory_type = posix # the default is the first option
I changed the config:
shared_buffers = 128MB
work_mem = 40MB # min 64kB
maintenance_work_mem = 64MB
I reloaded with sudo service postgresql reload and tested the same query, but found no change in behavior. Does this simply mean that such a large query cannot be done? Any help is appreciated.
I'm having similar trouble, but not with PostgreSQL (which is running happily): what is happening is simply that the kernel cannot allocate more RAM to the process, whichever process it is.
It would certainly help to add some swap to your configuration.
To check how much RAM and swap you have, run: free -h
On my machine, here is what it returns:
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.3Gi       928Mi       865Mi       1.5Gi       1.3Gi
Swap:          9.4Gi       7.1Gi       2.2Gi
You can clearly see that my machine is quite overloaded: about 8 GB of RAM and 9 GB of swap, of which 7 GB are used.
When the RAM-hungry process got killed on out of memory, I saw both RAM and swap at 100% usage.
So, allocating more swap may alleviate your problem.

increase stack size to 20 gb giving integer overflow error

I need to run a C program which will use around 20 GB of RAM while executing. I took help from Change stack size for a C++ application in Linux during compilation with GNU compiler.
I am trying to expand the stack size on Linux using setrlimit, but when I try to assign 20*1024*1024*1024 to rlim_cur, the compiler warns:
warning: integer overflow in expression [-Woverflow]
How do I expand the stack?
The calculation 20*1024*1024*1024 is performed on int constants, so the result has type int. On your x86_64 platform, int is not wide enough to represent such large numbers (greater than 2^31 - 1); that is what the compiler is telling you.
To do it right, use a type that has 64 bits. The description of setrlimit shows that rlim_cur has type rlim_t, so it seems natural to use this 64-bit type:
... = (rlim_t)20*1024*1024*1024
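Putting it together, a minimal sketch (assuming a 64-bit Linux system whose hard stack limit permits a 20 GB soft limit):

/* Raise the soft stack limit to 20 GB. The multiplication is done in rlim_t
 * (64 bits), so it cannot overflow the way the plain int expression did. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    rl.rlim_cur = (rlim_t)20 * 1024 * 1024 * 1024;   /* 20 GB soft limit */

    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");   /* fails if rlim_cur exceeds the hard limit */
        return 1;
    }

    printf("stack soft limit is now %llu bytes\n",
           (unsigned long long)rl.rlim_cur);
    return 0;
}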

Can Valgrind show the value of the leaked memory?

I am wondering if there is a possibility in valgrind to show the value of the leaked memory, such as (NOT a real valgrind output!):
==15060== 12 bytes (***HERE***) in 1 blocks are definitely lost in loss record 1 of 1
==15060== at 0x4C2AAA4: operator new[](unsigned long) (in vgpreload_memcheck-amd64-linux.so)
==15060== by 0x5DC8236: char* allocate(unsigned long, char const*, long) (mem.h:149)
==15060== by 0x5EAC286: trim(char const*, nap_compiler const*) (file.cpp:107)
Where ***HERE*** would show the exact value of the string being leaked. I've been looking all over the documentation but have found nothing. Maybe someone more familiar with the tool can point out how to achieve this! (I'm not afraid of compiling it myself :) )
GDB server in Valgrind version >= 3.8.0 provides the monitor command
block_list
which will output the addresses of the leaked blocks.
You can then examine the leaked memory content using GDB commands such as x.
For more information, see
http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
and
http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands
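If you want something to experiment on, here is a tiny, hypothetical leaking program (not taken from the question) that produces a "definitely lost" block whose contents you could then inspect with GDB's x command through Valgrind's gdbserver:

/* Deliberately leak a string with a known value. Running this under memcheck
 * with the embedded gdbserver enabled lets you list the leaked block
 * addresses (block_list) and then dump the bytes at such an address from
 * GDB, e.g. with x/s. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *secret = strdup("the value we want to see");  /* heap copy with a known value */
    secret = NULL;   /* drop the only reference, so memcheck reports it as definitely lost */
    (void)secret;    /* keep the compiler quiet about the unused value */
    return 0;
}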

How to find out the amount of free physical memory under linux (in c)

Assume I want to cache certain calculations, but having the cache swapped out to disk would incur an I/O penalty that would more than defeat the whole purpose of caching.
This means I need to be able to find out how much physical RAM is left (including cached memory, assuming I can push that out, and allowing for some slack should buffering increase). I have looked into /proc/meminfo and know how to read it; I am just not sure how to combine the numbers to get what I want. Code is not necessary; once I know what I need, I can write it myself.
I will not have root on the box this needs to run on, but the box should be reasonably quiet otherwise: no large amounts of disk I/O, and no other processes claiming a lot of memory in a burst. The OS is a fairly recent Linux with overcommit turned on. This will obviously need to work without triggering the OOM killer.
The numbers don't need to be exact down to the megabyte; I assume the value will be roughly in the 1 to 7 GiB range depending on the box, but getting within about 100 MB would be great.
It'd definitely be preferable if the estimate were to err on the smallish side.
Unix systems have the standard sysconf() function (Open Group man page, Linux man page).
Using this function, you can get the amount of available physical memory:
#include <unistd.h>  /* sysconf */

unsigned long long ps = sysconf(_SC_PAGESIZE);      /* size of a page, in bytes */
unsigned long long pn = sysconf(_SC_AVPHYS_PAGES);  /* number of currently available physical pages */
unsigned long long availMem = ps * pn;
As an alternative to H2CO3's answer, you can read from /proc/meminfo.
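For example, here is a minimal sketch of that approach (it assumes a kernel recent enough to report MemAvailable in /proc/meminfo, i.e. 3.14+; mem_available_bytes is just an illustrative helper name):

/* Read the MemAvailable line from /proc/meminfo and return it in bytes,
 * or 0 if it cannot be determined. On older kernels without MemAvailable
 * you would have to combine MemFree, Buffers and Cached yourself. */
#include <stdio.h>

static unsigned long long mem_available_bytes(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    unsigned long long kib = 0;

    if (!f)
        return 0;

    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "MemAvailable: %llu kB", &kib) == 1)
            break;                 /* /proc/meminfo reports the value in kB (really KiB) */
    }
    fclose(f);
    return kib * 1024ULL;
}

int main(void)
{
    printf("MemAvailable: %llu bytes\n", mem_available_bytes());
    return 0;
}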
For me, statfs worked well.
#include <sys/vfs.h>

struct statfs buf;
size_t available_mem;

if (statfs("/", &buf) == -1)
    available_mem = 0;
else
    available_mem = buf.f_bsize * buf.f_bfree;  /* block size * free blocks */
