AWS SageMaker kernel appears to have died and restarts - python-3.x

I am getting a kernel error while trying to retrieve data from an API that spans 100 pages. The data size is huge, but the same code runs fine when executed on Google Colab or on a local machine.
The error I see in a window is:
Kernel Restarting
The kernel appears to have died. It will restart automatically.
I am using an ml.m5.xlarge instance with 1000 GB of storage allocated, and there are no pre-saved datasets on the instance. Also, the expected data size is around 60 GB, split into multiple datasets of 4 GB each.
Can anyone help?

The kernel most likely dies because it runs out of memory. I think you could try not to load all the data into memory at once (see the sketch below), or switch to a beefier instance type. According to https://aws.amazon.com/sagemaker/pricing/instance-types/, ml.m5.xlarge has only 16 GB of memory.
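For example, here is a minimal sketch that fetches the API page by page and writes each page straight to disk, so only one page sits in memory at a time (the URL, page parameter, and file names are assumptions for illustration, not from the question):

import requests  # assumed HTTP client

for page in range(1, 101):  # the question mentions roughly 100 pages
    resp = requests.get("https://api.example.com/data", params={"page": page})
    resp.raise_for_status()
    # persist each page immediately instead of accumulating all pages in a list
    with open(f"page_{page:03d}.json", "w") as f:
        f.write(resp.text)

The saved files can then be processed one at a time (for example with a per-file pandas.read_json), keeping peak memory close to the size of a single page.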
Jun

Related

Hazelcast causing heavy traffic between nodes

NOTE: The root cause was found in application code using Hazelcast that started to execute after 15 minutes and retrieved almost the entire data set, so the issue is NOT in Hazelcast. I am leaving the question here in case anyone sees the same side effect from similar code.
What can cause heavy traffic between 2 Hazelcast nodes (v3.12.12, also tried 4.1.1)?
The cluster holds maps with a lot of data; no new entries are added or removed during that time, only map values are updated.
Java 11, memory usage 1.5 GB out of 12 GB, no full GCs identified.
According to JFR, the high I/O comes from:
com.hazelcast.internal.networking.nio.NioThread.processTaskQueue()
In a chart of network I/O (attached to the original post, not shown here), traffic jumps from 15 to 60 MB about 15 minutes after start. From the application's perspective, nothing changed after these 15 minutes.
This smells like garbage collection; you are most likely running into long GC pauses. Check your GC logs, which you can enable using verbose GC settings on all members. If there are back-to-back GCs, there are a couple of things you can do:
increase the heap size
tune your GC; I'd look into G1 (with -XX:MaxGCPauseMillis set to a reasonable number) and/or ZGC.
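For example, a sketch of member JVM flags assuming Java 11 (the heap size, pause target, log file name, and jar name are placeholders, not taken from the question):
java -Xms12g -Xmx12g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=gc.log -jar your-member.jar
The -Xlog:gc* option produces the GC log you can then inspect for back-to-back or unusually long pauses.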

Java Heap Space issue in Grakn 1.6.0

I have data of 100 nodes and 165 relations to be inserted into one keyspace. My Grakn image has 4 CPU cores and 3 GB of memory. While trying to insert the data I get the error:
[grpc-request-handler-4] ERROR grakn.core.server.Grakn - Uncaught exception at thread [grpc-request-handler-4] java.lang.OutOfMemoryError: Java heap space
I noticed that the image used 346% CPU but only 1.46 GB of RAM. Another finding in the log was:
Caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1, use getErrors() for more: Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=3cb85440): io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel)
Could you please help me with this?
It sounds like Cassandra ran out of memory. Currently, Grakn spawns two processes: one for Cassandra and one for the Grakn server. You can increase the memory limits with the following flags (unix):
SERVER_JAVAOPTS=-Xms1G STORAGE_JAVAOPTS=-Xms2G ./grakn server start
This would give the server 1 GB and the storage engine (Cassandra) 2 GB of memory, for instance. 3 GB may be a bit on the low end once your data grows, so keep these flags in mind :)

MySQL Insert operation are slow on Linux RDS server

I am trying to insert 2 million records into a MySQL 5.7.10 RDS server. It takes almost 40 minutes to insert the data in the Linux environment, whereas the same data is inserted in 28 minutes on the Windows platform. Please check the Workbench output: after 20 minutes it writes only 100-200 records per second, whereas before that it writes 1,211-2,000 records per second.
On Linux I am using an SSD disk, yet the insert still takes a long time.
My hardware configuration is:
SSD disk
RAM: 122 GB
CPU: 16 cores
My MySQL configuration is:
innodb_buffer_pool_size = 80G
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M
innodb_buffer_pool_instances = 28
tmp_table_size = 4G
max_heap_table_size = 4G
table_open_cache = 32262
innodb_file_per_table = ON
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
Please check and help with this.
Thanks in advance.
tmp_table_size = 4G -- lower to 1G
max_heap_table_size = 4G -- lower to 1G
table_open_cache = 32262 -- lower to, say, 1000
What filesystem (xfs, ext4, etc)? RAID?
Please show us the insert command(s). Where is the source data coming from (same drive, different machine, etc)?
More:
Batch the INSERTs, but wrap BEGIN and COMMIT around every 100-1000 rows; a sketch follows below.
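For example, a minimal Python sketch of batched, per-transaction commits (the pymysql driver, the table name demo_table, and the column names are assumptions for illustration, not from the question):

import pymysql  # assumed MySQL driver; any DB-API connector works similarly

conn = pymysql.connect(host="your-rds-endpoint", user="app", password="...", database="test")
rows = [(i, "value-%d" % i) for i in range(2_000_000)]  # stand-in for the real data
BATCH = 1000  # commit every 1000 rows instead of every row

with conn.cursor() as cur:
    for start in range(0, len(rows), BATCH):
        cur.executemany(
            "INSERT INTO demo_table (id, payload) VALUES (%s, %s)",
            rows[start:start + BATCH],
        )
        conn.commit()  # one transaction per batch
conn.close()

pymysql runs with autocommit disabled by default, so each executemany call plus commit() forms one transaction of BATCH rows, which is what the BEGIN/COMMIT advice above amounts to.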
There is a single line setting that can resolve the issue:
innodb_flush_log_at_trx_commit = 2

Couchdb views crashing for large documents

CouchDB keeps crashing whenever I try to build the index of the views of a design document that emits values for large documents. The total size of the database is 40 MB, and I guess the documents are about 5 MB each. We are talking about large JSON documents without any attachments.
What concerns me is that I have 2.5 GB of free RAM before trying to access the views, but as soon as I try to access them, CPU usage rises to 99% and all the free RAM gets eaten by erl.exe before the indexing fails with exit code 1.
Here is the log:
[info] 2016-11-22T22:07:52.263000Z couchdb@localhost <0.212.0> -------- couch_proc_manager <0.15603.334> died normal
[error] 2016-11-22T22:07:52.264000Z couchdb@localhost <0.15409.334> b9855eea74 rexi_server throw:{os_process_error,{exit_status,1}} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
Views that skip these documents can be accessed without issue. What general guidelines could you give me to deal with this kind of situation? I am using CouchDB 2.0 on Windows.
Many thanks
Update: I tried limiting the number of view server instances to 1 and varying the maximum RAM allowed for couchjs, but it keeps crashing. I also noticed that even though CouchDB is supposed to pass only one document at a time to the view server, erl.exe keeps eating all the available RAM (3 GB used to update three 5 MB docs...). Initially I thought this could be because of the multiple couchjs instances, but apparently this isn't the case.
Update: I made some progress; now the indexing appears to progress well for just under 10 minutes, and then erl.exe crashes. I have posted the dump here (just to clarify, "well" means 99% CPU usage and a completely frozen screen).

Solr Indexing Time

Solr 1.4 is doing great with respect to indexing on a dedicated physical server (Windows Server 2008). Indexing around 1 million full-text documents (around 4 GB in size) takes around 20 minutes with a heap size of 512M-1G and 4 GB of RAM.
However, when using Solr on a VM with 4 GB of RAM, it took 50 minutes to index the first time. Note that there are no network delays and no RAM issues. When I increased the RAM to 8 GB and increased the heap size, the indexing time increased to 2 hours, which was really strange. Apart from SQL Server, no other process is running, and there are no network delays. However, I have not checked file I/O; could that be the bottleneck? Does Solr have any issues running in a virtualized environment?
I read a paper today by Brian & Harry, "ON THE RESPONSE TIME OF A SOLR SEARCH ENGINE IN A VIRTUALIZED ENVIRONMENT", which claims that performance deteriorates when RAM is increased while Solr is running on a VM, but that is with respect to query times, not indexing times.
I am a bit confused as to why it took longer on the VM when I repeated the same test a second time with increased heap size and RAM.
I/O on a VM will always be slower than on dedicated hardware. This is because the disk is virtualized and I/O operations must pass through an extra abstraction layer. Indexing requires intensive I/O operations, so it's not surprising that it runs more slowly on a VM. I don't know why adding RAM causes a slowdown though.
