MySQL insert operations are slow on Linux RDS server - linux

I am trying to insert 2 million records into a MySQL 5.7.10 RDS server. It takes almost 40 minutes to insert the data in the Linux environment, whereas the same data is inserted in 28 minutes on the Windows platform. Please check the Workbench output: after 20 minutes it is writing only 100-200 records per second, whereas before that it was writing 1211-2000 records per second.
On Linux I am using an SSD disk, but it is still taking a long time to insert.
My hardware configuration is:
SSD disk
RAM: 122 GB
CPU: 16 cores
My MySQL configuration is:
innodb_buffer_pool_size=80G
innodb_log_file_size=1G
innodb_log_buffer_size=64M
innodb_buffer_pool_instances=28
tmp_table_size=4G
max_heap_table_size=4G
table_open_cache=32262
innodb_file_per_table=ON
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
Team, please check and help with this.
Thanks in advance.

tmp_table_size=4G -- lower to 1G
max_heap_table_size=4G -- lower to 1G
table_open_cache=32262 -- lower to, say, 1000
What filesystem (xfs, ext4, etc)? RAID?
Please show us the insert command(s). Where is the source data coming from (same drive, different machine, etc)?
Batch the INSERTs -- and put BEGIN and COMMIT around every 100-1000 rows.
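For example, in Python with the mysql-connector-python driver it might look like the sketch below; the table, columns and connection details are placeholders, and the point is one COMMIT per ~1000 rows rather than per row.

import mysql.connector

# Hypothetical table/columns and connection details -- adjust to your schema.
rows = [(i, "value-%d" % i) for i in range(2_000_000)]   # the records to load

conn = mysql.connector.connect(host="my-rds-endpoint", user="app",
                               password="***", database="mydb")
conn.autocommit = False
cur = conn.cursor()

BATCH = 1000
sql = "INSERT INTO my_table (col1, col2) VALUES (%s, %s)"

for i in range(0, len(rows), BATCH):
    cur.executemany(sql, rows[i:i + BATCH])   # insert one batch of rows
    conn.commit()                             # one commit (one log flush) per batch

cur.close()
conn.close()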

There is a single line setting that can resolve the issue:
innodb_flush_log_at_trx_commit = 2

Related

PostgreSQL 12 - large cache files in "base" directory

I have noticed that my small 8MB database is cached by PostgreSQL 12 in this folder:
/var/lib/postgresql/12/main/base/16384
This folder contains multiple 1 GB files, for example "16417", which weighs 1073741824 bytes.
How do I remove these files and limit the cache file space to a maximum of 100 GB? Right now it uses as much space as it can and crashes my disk (no space left).
In the postgresql.conf file I have changed these options:
temp_file_limit = 10000000
shared_buffers = 128MB
checkpoint_timeout = 12h
max_wal_size = 100MB
min_wal_size = 80MB
but unfortunately it did not help.
What else can I do to resolve this issue? In one hour these files grew to 80 GB...
EDIT: This issue occurs even with default settings. My system is Ubuntu 18.04.4.
This is not a cache, these are the actual tables and indexes. If you mess with these files, you will break your database and lose data.
Figure out what database 16384 is:
SELECT datname FROM pg_database WHERE oid = 16384;
Then connect to that database and figure out what 16417 is:
SELECT relname, relnamespace::regnamespace, relkind
FROM pg_class WHERE relfilenode = 16417;
If the size of that object is bigger than it should be, perhaps you have a bloated table or index, and VACUUM (FULL) on that table can make it smaller (but don't forget that the table is inaccessible while it is being rewritten!).
Again, make sure you don't manipulate any of those files yourself.
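As an illustration only, here is a minimal psycopg2 sketch of that workflow, assuming the lookup above pointed at a hypothetical table public.big_table (connection details are placeholders):

import psycopg2

# Placeholder connection and table name -- use whatever the pg_class lookup returned.
conn = psycopg2.connect(dbname="mydb", user="postgres")
conn.autocommit = True      # VACUUM cannot run inside a transaction block
cur = conn.cursor()

# Total on-disk size of the relation (table + indexes + TOAST).
cur.execute("SELECT pg_size_pretty(pg_total_relation_size('public.big_table'))")
print("size before:", cur.fetchone()[0])

# Rewrite the table to reclaim bloat; the table is locked while this runs.
cur.execute("VACUUM (FULL) public.big_table")

cur.execute("SELECT pg_size_pretty(pg_total_relation_size('public.big_table'))")
print("size after:", cur.fetchone()[0])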

AWS Sagemaker Kernel appears to have died and restarts

I am getting a kernel error while trying to retrieve data from an API that spans 100 pages. The data size is huge, but the code runs fine when executed on Google Colab or on a local machine.
The error I see in a window is:
Kernel Restarting
The kernel appears to have died. It will restart automatically.
I am using an ml.m5.xlarge machine with a memory allocation of 1000 GB, and there are no pre-saved datasets in the instance. Also, the expected data size is around 60 GB, split into multiple datasets of 4 GB each.
Can anyone help?
I think you could try not loading all the data into memory, or switch to a beefier instance type. According to https://aws.amazon.com/sagemaker/pricing/instance-types/, ml.m5.xlarge has 15 GB of memory.
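A rough sketch of the first suggestion, streaming each page to disk as it arrives instead of holding ~60 GB in RAM; the API URL, pagination parameter and output path are hypothetical:

import json
import requests

BASE_URL = "https://api.example.com/data"   # placeholder endpoint
OUT_DIR = "/home/ec2-user/SageMaker/data"   # on-disk storage, not RAM

for page in range(1, 101):                  # the ~100 pages mentioned above
    resp = requests.get(BASE_URL, params={"page": page}, timeout=120)
    resp.raise_for_status()
    # Persist the page immediately and let it go out of scope, so memory
    # usage stays at roughly one page at a time instead of the full dataset.
    with open(f"{OUT_DIR}/page_{page:03d}.json", "w") as f:
        json.dump(resp.json(), f)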

YCSB low read throughput cassandra

The YCSB Endpoint benchmark would have you believe that Cassandra is the golden child of NoSQL databases. However, recreating the results on our own boxes (8 cores with hyperthreading, 60 GB memory, two 500 GB SSDs), we are seeing dismal read throughput for workload b (read mostly, i.e. 95% read, 5% update).
The cassandra.yaml settings are exactly the same as the Endpoint settings, barring the different IP addresses and our disk configuration (one SSD for data, one for the commit log). While their throughput is ~38,000 operations per second, ours is ~16,000 more or less regardless of the number of threads/client nodes, i.e. one worker node with 256 threads will report ~16,000 ops/sec, while 4 nodes will each report ~4,000 ops/sec.
I've set the readahead value to 8 KB for the SSD data drive. The custom workload file is below.
When analyzing disk I/O and CPU usage with iostat, the read throughput is consistently ~200,000 KB/s, which suggests that the YCSB cluster throughput should be higher (records are 100 bytes). ~25-30% of CPU appears to be in %iowait and 10-25% in use by the user.
top and nload stats do not show an obvious bottleneck (<50% memory usage, and 10-50 Mbit/s on a 10 Gb/s link).
# The name of the workload class to use
workload=com.yahoo.ycsb.workloads.CoreWorkload
# There is no default setting for recordcount but it is
# required to be set.
# The number of records in the table to be inserted in
# the load phase or the number of records already in the
# table before the run phase.
recordcount=2000000000
# There is no default setting for operationcount but it is
# required to be set.
# The number of operations to use during the run phase.
operationcount=9000000
# The offset of the first insertion
insertstart=0
insertcount=500000000
core_workload_insertion_retry_limit = 10
core_workload_insertion_retry_interval = 1
# The number of fields in a record
fieldcount=10
# The size of each field (in bytes)
fieldlength=10
# Should read all fields
readallfields=true
# Should write all fields on update
writeallfields=false
fieldlengthdistribution=constant
readproportion=0.95
updateproportion=0.05
insertproportion=0
readmodifywriteproportion=0
scanproportion=0
maxscanlength=1000
scanlengthdistribution=uniform
insertorder=hashed
requestdistribution=zipfian
hotspotdatafraction=0.2
hotspotopnfraction=0.8
table=usertable
measurementtype=histogram
histogram.buckets=1000
timeseries.granularity=1000
The key was increasing native_transport_max_threads in the cassandra.yaml file.
Along with the increased settings in the comment (more connections in the YCSB client as well as more concurrent reads/writes in Cassandra), Cassandra jumped to ~80,000 ops/sec.
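For reference, the relevant cassandra.yaml knobs look like this; the values are examples only, not the exact numbers used above, so tune them to your own hardware:

# cassandra.yaml -- example values only
native_transport_max_threads: 256
concurrent_reads: 64     # common guidance: ~16 x number of data drives
concurrent_writes: 64    # common guidance: ~8 x number of cores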

High disk I/O on Cassandra nodes

Setup:
We have a 3-node Cassandra cluster with around 850 GB of data on each node. We have an LVM setup for the Cassandra data directory (currently consisting of 3 drives: 800 GB + 100 GB + 100 GB) and a separate (non-LVM) volume for cassandra_logs.
Versions:
Cassandra v2.0.14.425
DSE v4.6.6-1
Issue:
After adding the 3rd (100 GB) volume to the LVM on each node, all the nodes show very high disk I/O and go down quite often; the servers also become inaccessible and don't stay stable, so we need to reboot them every 10-15 minutes.
Other Info:
We have the DSE-recommended server settings (vm.max_map_count, file descriptor limits) configured on all nodes
RAM on each node: 24 GB
CPU on each node: 6 cores / 2600 MHz
Disk on each node: 1000 GB (data dir) / 8 GB (logs)
As I suspected, you are having throughput problems with your disk. Here's what I looked at to give you background. The nodetool tpstats output from your three nodes had these lines:
Pool Name     Active   Pending   Completed   Blocked   All time blocked
FlushWriter        0         0          22         0                  8
FlushWriter        0         0          80         0                  6
FlushWriter        0         0          38         0                  9
The column I'm concerned about is All time blocked. As a ratio to completed (8/22, 6/80 and 9/38 here), you have a lot of blocking. The FlushWriter is responsible for flushing memtables to disk to keep the JVM from running out of memory or creating massive GC problems. The memtable is an in-memory representation of your tables. As your nodes take more writes, the memtables start to fill and need to be flushed. That operation is a long sequential write to disk. Bookmark that; I'll come back to it.
When flushwriters are blocked, the heap starts to fill. If they stay blocked, you will see the requests starting to queue up and eventually the node will OOM.
Compaction might be running as well. Compaction is a long sequential read of SSTables into memory and then a long sequential flush of the merge sorted results. More sequential IO.
So all these operations on disk are sequential, not random IOPS. If your disk is not able to handle simultaneous sequential reads and writes, IOWait shoots up, requests get blocked, and then Cassandra has a really bad day.
You mentioned you are using Ceph. I haven't seen a successful deployment of Cassandra on Ceph yet. It will hold up for a while and then tip over on sequential load. Your easiest solution in the short term is to add more nodes to spread out the load. The medium term is to find some ways to optimize your stack for sequential disk loads, but that will eventually fail. Long term is get your data on real disks and off shared storage.
I have told consulting clients this for years when using Cassandra: "If your storage has an Ethernet plug, you are doing it wrong." Good rule of thumb.

Cassandra latency patterns under constant load

I've got pretty unusual latency patterns in my production setup:
the whole cluster (3 machines: 48 GB RAM, 7,500 RPM disks, 6 cores each) shows latency spikes every 10 minutes, all machines at the same time.
See this screenshot.
I checked the log files and it seems there are no compactions taking place at that time.
I've got 2k reads and 5k reads/sec. No optimizations have been made so far.
Caching is set to "ALL", and the hit rate for the row cache is ~0.7.
Any ideas? Is tuning memtable size an option?
Best,
Tobias
