ArangoDB asynchronous log invokes msync too much

Here is the question: why does ArangoDB sync data thousands of times per second in async mode? Is this a misconfiguration on my side or expected behavior?
Recently I have been testing async inserts with ArangoDB and MongoDB. In my tests, the average latency of ArangoDB is 2x that of MongoDB. After tuning I found that their I/O behavior differs, and I think this is the root cause of the poor async-insert performance of ArangoDB.
ArangoDB: invokes msync continuously, thousands of times per second, as shown below. This causes excessive iowait and heavy jbd2 activity.
05:42:21.138119 msync(0x7f50fdd75000, 4096, MS_SYNC) = 0 <0.000574>
05:42:21.138843 msync(0x7f50fdd75000, 8192, MS_SYNC) = 0 <0.000558>
05:42:21.139541 msync(0x7f50fdd76000, 4096, MS_SYNC) = 0 <0.000351>
05:42:21.139928 msync(0x7f50fdd76000, , MS_SYNC) = 0 <0.000555>
05:42:21.140532 msync(0x7f50fdd77000, 4096, MS_SYNC) = 0 <0.000318>
05:42:21.141002 msync(0x7f50fdd77000, 8192, MS_SYNC) = 0 <0.000714>
05:42:21.141755 msync(0x7f50fdd78000, 4096, MS_SYNC) = 0 <0.000345>
05:42:21.142133 msync(0x7f50fdd78000, 4096, MS_SYNC) = 0 <0.000725>
MongoDB: invokes fdatasync only a few times per second.
Test Env:
All tests run in one VM: 8 vCPU, 24 GB memory, 120 GB disk, CentOS 6.7.
It's a single-threaded async insert test based on the Java driver with YCSB.
Conf for Arango:
v2.8.7
Server, scheduler, and V8 thread counts are all set to 1.
The collection is created with waitForSync set to false, and insert requests are sent with waitForSync false.
Start cmd:
/usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp-path /var/tmp/arangod --log.tty --supervisor --wal.sync-interval=1000
Collection properties:
{
  "doCompact" : true,
  "journalSize" : 33554432,
  "isSystem" : false,
  "isVolatile" : false,
  "waitForSync" : false,
  "keyOptions" : {
    "type" : "traditional",
    "allowUserKeys" : true
  },
  "indexBuckets" : 8
}
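For reference, a collection with these properties could be created from arangosh roughly like this (just a sketch; the collection name usertable is an example, not taken from the test):
// sketch: create a collection with the properties listed above
db._create("usertable", {
  waitForSync : false,       // do not fsync on every single write
  doCompact : true,
  journalSize : 33554432,    // 32 MB journal files
  indexBuckets : 8,
  keyOptions : { type : "traditional", allowUserKeys : true }
});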
Detailed trace log:
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef3318 - 0x7ff9beef37f2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef37f8 - 0x7ff9beef3cd2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef3cd8 - 0x7ff9beef41b2, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef41b8 - 0x7ff9beef4692, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef4698 - 0x7ff9beef4b72, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef4b78 - 0x7ff9beef5052, length: 1242, wfs: false
2016-04-19T06:59:36Z [12065] TRACE [arangod/Wal/SynchronizerThread.cpp:213] syncing logfile 6627612014716, region 0x7ff9beef5058 - 0x7ff9beef5532, length: 1242, wfs: false

ArangoDB, as a multi-model database, can cover more use cases than MongoDB. While it can act as a replacement, the additional features also impose different requirements on the default configuration settings and implementation details.
When you work with, for example, graphs and want to keep them persistent, you can lower the probability that data is actually lost by syncing more frequently.
ArangoDB does these syncs in a separate thread. When trying to reproduce your setup, we found that this thread actually syncs more often than the sync-interval configuration value in /etc/arangodb/arangod.conf would suggest:
[wal]
sync-interval=10000
We fixed this; it improves the performance a bit when writing locally via Foxx or the arangod emergency console (which you get when starting arangod with the --console parameter instead of daemon mode).
However, it doesn't significantly change the performance when, for example, using arangosh to sequentially insert 100,000 documents:
var time = require("internal").time;
var s = time();
db._drop('test');
db._create('test');
for (var i = 0; i < 100000; i++) { db.test.save({ i: i }); }
require("internal").print(time() - s);
In general, your numbers are similar to those in our performance comparison, so this is what is to be expected with ArangoDB 2.8.
Currently you can use the bulk import facility to reduce the overhead of the HTTP communication.
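As a rough sketch (endpoint, collection name, and file are placeholders), the bulk import API lets you send many documents in a single HTTP request instead of one request per document:
# sketch: import newline-delimited JSON documents in one request
curl -X POST --data-binary @docs.jsonl \
  "http://localhost:8529/_api/import?type=documents&collection=test"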

Related

Error generating the report: org.apache.jmeter.report.core.SampleException: Could not read metadata

I'm trying to run JMeter load-testing scripts in non-GUI mode to generate an HTML report with the command below:
./jmeter.sh -n -t "/home/dsbloadtest/DSB_New_21_01_2022/apache-jmeter-5.4.3/dsb_test_plans/SERVICE_BOOKING.jmx" -l /home/dsbloadtest/DSB_New_21_01_2022/apache-jmeter-5.4.3/dsb_test_results/testresults.csv -e -o /home/dsbloadtest/DSB_New_21_01_2022/apache-jmeter-5.4.3/dsb_test_results/HTMLReports
It was working fine, but now I am not getting results; instead the run ends as shown below:
summary = 0 in 00:00:00 = ******/s Avg: 0 Min: 9223372036854775807 Max: -9223372036854775808 Err: 0 (0.00%)
Tidying up ... # Fri Apr 01 11:22:40 IST 2022 (1648792360414)
Error generating the report: org.apache.jmeter.report.core.SampleException: Could not read metadata !
... end of run
I have tried to generate the HTML report in JMeter non-GUI mode.
summary = 0 in 00:00:00 = ******/s Avg: 0 Min: 9223372036854775807 Max: -9223372036854775808 Err: 0 (0.00%)
If you see this summary line, it means that JMeter didn't execute any Sampler: your testresults.csv is empty and you don't have any data to generate the dashboard from.
The reason for the test failure can normally be figured out from the jmeter.log file (see the sketch after this list); the most common mistakes are:
the file referenced in the CSV Data Set Config doesn't exist
the JMeter Plugins used in the test are not installed for this particular JMeter instance
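As a quick sketch (assuming the log is written as jmeter.log in the working directory), you can scan it for problems like this:
# sketch: look for errors and exceptions in the JMeter log
grep -iE "error|exception" jmeter.log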

Unable to use preloaded uWSGI cheaper algorithm

I'm unable to use uWSGI's busyness cheaper algorithm, although it appears to be preloaded in my installation. Does it still require an explicit install?
If so, where can I download the standalone plugin package from?
Any help is appreciated, thank you.
uWSGI Information
uwsgi --version
2.0.19.1
uwsgi --cheaper-algos-list
*** uWSGI loaded cheaper algorithms ***
busyness
spare
backlog
manual
--- end of cheaper algorithms list ---
uWSGI Configuration File
[uwsgi]
module = myapp:app
socket = /path/to/myapp.sock
stats = /path/to/mystats.sock
chmod-socket = 766
socket-timeout = 60 ; Set internal sockets timeout
logto = /path/to/logs/%n.log
log-maxsize = 5000000 ; Max size before rotating file
disable-logging = true ; Disable built-in logging
log-4xx = true ; But log 4xx
log-5xx = true ; And 5xx
strict = true ; Enable strict mode (placeholder cannot be used)
master = true ; Enable master process
enable-threads = true ; Enable threads
vacuum = true ; Delete sockets during shutdown
single-interpreter = true ; Do not use multiple interpreters (single web app per uWSGI process)
die-on-term = true ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true ; Exit if no app can be loaded
harakiri = 300 ; Forcefully kill hung workers after desired time in seconds
max-requests = 1000 ; Restart workers after this many requests
max-worker-lifetime = 3600 ; Restart workers after this many seconds
reload-on-rss = 1024 ; Restart workers after this much resident memory (this is per worker)
worker-reload-mercy = 60 ; How long to wait for workers to reload before forcefully killing them
cheaper-algo = busyness ; Specify the cheaper algorithm here
processes = 16 ; Maximum number of workers allowed
threads = 4 ; Number of threads per worker allowed
thunder-lock = true ; Specify thunderlock activation
cheaper = 8 ; Number of workers to keep idle
cheaper-initial = 8 ; Workers created at startup
cheaper-step = 4 ; Number of workers to spawn at once
cheaper-overload = 30 ; Check the busyness of the workers at this interval (in seconds)
uWSGI Log
*** Operational MODE: preforking+threaded ***
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x15e8f10 pid: 14479 (default app)
spawned uWSGI master process (pid: 14479)
spawned uWSGI worker 1 (pid: 14511, cores: 4)
spawned uWSGI worker 2 (pid: 14512, cores: 4)
spawned uWSGI worker 3 (pid: 14516, cores: 4)
spawned uWSGI worker 4 (pid: 14520, cores: 4)
spawned uWSGI worker 5 (pid: 14524, cores: 4)
spawned uWSGI worker 6 (pid: 14528, cores: 4)
spawned uWSGI worker 7 (pid: 14529, cores: 4)
spawned uWSGI worker 8 (pid: 14533, cores: 4)
THIS LINE --> unable to find requested cheaper algorithm, falling back to spare <-- THIS LINE
OS Information
Red Hat Enterprise Linux Server release 7.7 (Maipo)
Other Details
uWSGI was installed using pip
The inline comments were messing things up for me; the format below fixed the issue:
########################################################
# #
# Cheaper Algo and Worker Count #
# #
########################################################
cheaper-algo = busyness
processes = 16
threads = 4
thunder-lock = true
cheaper = 8
cheaper-initial = 8
cheaper-step = 4
cheaper-overload = 30
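A minimal sketch of the difference, assuming (as the fix above suggests) that the trailing inline comment was being read as part of the option value, so uWSGI looked for an algorithm name it couldn't find and fell back to spare:
; before: inline comment sits on the same line as the value
cheaper-algo = busyness ; Specify the cheaper algorithm here
; after: comment moved to its own line, value left clean
; Specify the cheaper algorithm here
cheaper-algo = busyness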

What does -1000 mean in a Spark exit status?

I'm doing something with Spark SQL and got the error below:
YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 1 for reason Container marked as failed: container_1568946404896_0002_02_000002 on host: worker1. Exit status: -1000.
Diagnostics: [2019-09-20 10:43:11.474] Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@76430b7c rejected from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@16970b [Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
I'm trying to figure it out by checking the meaning of Exit status: -1000; however, googling returns no useful information.
The -1000 status is not even mentioned in this thread.
Any comments are welcome, thanks.

MySQL Seconds_Behind_Master very high

Hi, we have MySQL master-slave replication; the master is MySQL 5.6 and the slave is MySQL 5.7. Seconds_Behind_Master is 245,000. How can I make it catch up faster? Right now it is taking more than 6 hours to catch up 100,000 seconds.
My slave has 128 GB of RAM. Below is my my.cnf:
[mysqld]
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
innodb_buffer_pool_size = 110G
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
# These are commonly set, remove the # and set as required.
basedir = /usr/local/mysql
datadir = /disk1/mysqldata
port = 3306
#server_id = 3
socket = /var/run/mysqld/mysqld.sock
user=mysql
log_error = /var/log/mysql/error.log
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
join_buffer_size = 256M
sort_buffer_size = 128M
read_rnd_buffer_size = 2M
#copied from old config
#key_buffer = 16M
max_allowed_packet = 256M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
#disabling query_cache_size and type, for replication purpose, need to enable it when going live
query_cache_size = 0
#query_cache_size = 64M
#query_cache_type = 1
query_cache_type = OFF
#GroupBy
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
#sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
enforce-gtid-consistency
gtid-mode = ON
log_slave_updates=0
slave_transaction_retries = 100
#replication related changes
server-id = 2
relay-log = /disk1/mysqllog/mysql-relay-bin.log
log_bin = /disk1/mysqllog/binlog/mysql-bin.log
binlog_do_db = brandmanagement
#replicate_wild_do_table=brandmanagement.%
replicate-wild-ignore-table=brandmanagement.t_gnip_data_recent
replicate-wild-ignore-table=brandmanagement.t_gnip_data
replicate-wild-ignore-table=brandmanagement.t_fb_rt_data
replicate-wild-ignore-table=brandmanagement.t_keyword_tweets
replicate-wild-ignore-table=brandmanagement.t_gnip_data_old
replicate-wild-ignore-table=brandmanagement.t_gnip_data_new
binlog_format=row
report-host=10.125.133.220
report-port=3306
#sync-master-info=1
read-only=1
net_read_timeout = 7200
net_write_timeout = 7200
innodb_flush_log_at_trx_commit = 2
sync_binlog=0
sync_relay_log_info=0
max_relay_log_size=268435456
There are lots of possible solutions, but I'll go with the simplest one. Do you have enough network bandwidth to send all changes over the network? You're using the "row" binlog format, which may be good in the case of random, unindexed updates. But if you're changing a lot of rows with statements that use indexes, then the "mixed" binlog format may be better.
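If you want to try that, a minimal sketch of the commands involved (note that binlog_format is changed on the master, only affects newly written binlog events, and should be checked against your GTID and application constraints first):
-- on the slave: monitor whether Seconds_Behind_Master is dropping
SHOW SLAVE STATUS\G
-- on the master: write new changes in MIXED format instead of ROW
SET GLOBAL binlog_format = 'MIXED';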

DocumentDB performance issues

When running DocumentDB queries from C# code on my local computer, a simple DocumentDB query takes about 0.5 seconds on average. As another example, getting a reference to a document collection takes about 0.7 seconds on average. Is this to be expected? Below is my code for checking if a collection exists; it is pretty straightforward, but is there any way of improving the poor performance?
// Create a new instance of the DocumentClient
var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);
// Get the database with the id=FamilyRegistry
var database = client.CreateDatabaseQuery().Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();
var stopWatch = new Stopwatch();
stopWatch.Start();
// Get the document collection with the id=FamilyCollection
var documentCollection = client.CreateDocumentCollectionQuery("dbs/"
+ database.Id).Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
var ts = stopWatch.Elapsed;
// Format and display the TimeSpan value.
var elapsedTime = String.Format("{0:00} seconds, {1:00} milliseconds",
ts.Seconds,
ts.Milliseconds );
Console.WriteLine("Time taken to get a document collection: " + elapsedTime);
Console.ReadKey();
Average output on local computer:
Time taken to get a document collection: 0 seconds, 752 milliseconds
In another piece of my code, I'm doing 20 small document updates of about 400 bytes each in JSON size, and it still takes 12 seconds in total. I'm only running from my development environment, but I was expecting better performance.
In short, this can be done end to end in ~9 milliseconds with DocumentDB. I'll walk through the changes required, and why/how they impact results below.
The very first query always takes longer in DocumentDB because it does some setup work (fetching the physical addresses of DocumentDB partitions). The next couple of requests take a little longer while the connection pools warm up. The subsequent queries will be as fast as your network (the read latency in DocumentDB is very low thanks to SSD storage).
For example, if you modify your code above to measure 10 readings instead of just the first one, as shown below:
using (DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey))
{
long totalRequests = 10;
var database = client.CreateDatabaseQuery().Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();
Stopwatch watch = new Stopwatch();
for (int i = 0; i < totalRequests; i++)
{
watch.Start();
var documentCollection = client.CreateDocumentCollectionQuery("dbs/"+ database.Id)
.Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();
Console.WriteLine("Finished read {0} in {1}ms ", i, watch.ElapsedMilliseconds);
watch.Reset();
}
}
Console.ReadKey();
I get the following results running from my desktop in Redmond against the Azure West US data center, i.e. about 50 milliseconds. These numbers may vary based on the network connectivity and distance of your client from the Azure DC hosting DocumentDB:
Finished read 0 in 217ms
Finished read 1 in 46ms
Finished read 2 in 51ms
Finished read 3 in 47ms
Finished read 4 in 46ms
Finished read 5 in 93ms
Finished read 6 in 48ms
Finished read 7 in 45ms
Finished read 8 in 45ms
Finished read 9 in 51ms
Next, I switch to Direct/TCP connectivity from the default of Gateway to improve the latency from two hops to one, i.e., change the initialization code to:
using (DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey, new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp }))
Now the operation to find the collection by ID completes within 23 milliseconds:
Finished read 0 in 197ms
Finished read 1 in 117ms
Finished read 2 in 23ms
Finished read 3 in 23ms
Finished read 4 in 25ms
Finished read 5 in 23ms
Finished read 6 in 31ms
Finished read 7 in 23ms
Finished read 8 in 23ms
Finished read 9 in 23ms
How about when you run the same code from an Azure VM or Worker Role in the same Azure DC? The same operation completes in about 9 milliseconds!
Finished read 0 in 140ms
Finished read 1 in 10ms
Finished read 2 in 8ms
Finished read 3 in 9ms
Finished read 4 in 9ms
Finished read 5 in 9ms
Finished read 6 in 9ms
Finished read 7 in 9ms
Finished read 8 in 10ms
Finished read 9 in 8ms
So, to summarize:
For performance measurements, please allow for a few measurement samples to account for startup/initialization of the DocumentDB client.
Please use TCP/Direct connectivity for lowest latency.
When possible, run within the same Azure region.
If you follow these steps, you'll be able to get the best performance numbers out of DocumentDB.
