MemSQL speed test VS dbbench speed test - singlestore

I'm running MemSQL speed test which shows 638K inserts per second (in the web UI).
Likewise, I'm running the dbbench benchmark tool to simulate the same load, but I only get a throughput of about 20K inserts per second (dbbench runs on the same machine as MemSQL).
I'm confused, is there something that I'm missing?
Here is how I run dbbench:
dbbench --host=127.0.0.1 --port=3306 dbbench.ini
dbbench.ini:
[setup]
query=CREATE DATABASE IF NOT EXISTS speed_test
query=CREATE TABLE IF NOT EXISTS speed_test.tbl (id INT AUTO_INCREMENT PRIMARY KEY, val INT)
[teardown]
query=DROP DATABASE speed_test
[inserts]
query=insert into speed_test.tbl (val) values(5)
concurrency=10

Running single-value inserts will be much slower than multi-row inserts, and a concurrency of 10 is far too low to saturate the cluster. The bottleneck in this dbbench config is all the round-trips, each of which does very little work in the database.
Try:
batching more values into each insert statement
increasing the concurrency (see the example below)
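For example, an [inserts] job along these lines (the batch size of 10 values and the concurrency of 100 are only illustrative starting points, not tuned values):
[inserts]
query=insert into speed_test.tbl (val) values (5),(5),(5),(5),(5),(5),(5),(5),(5),(5)
concurrency=100
From there, keep increasing the batch size and concurrency until throughput stops improving.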

Related

Postgres CPU utilisation shot up. Any insights for my case?

My postgres instance CPU utilisation has shot up recently. I'm trying to identify the root cause. I will add the details below.
My postgres database instance running on GCP has the following configuration:
PostgreSQL 9.6
vCPUs: 1
Memory: 3.75 GB
SSD storage: 15 GB
I'm running 5 databases on this DB server, all connected to a Node.js app.
I use Sequelize as my ORM and recently upgraded Sequelize from 4.6.x to 5.8.6.
Before this upgrade the CPU utilization would usually stay below 20 percent. After the upgrade, I see a lot of fluctuation in the CPU utilization graph, and it hits 100 percent far too often. Also, when it hits 100%, my services stop working as expected (because they can't interact with the DB).
I tried running this query:
SELECT "usesysid", "backend_start", "xact_start","query_start", "state_change", "state", "query" FROM pg_stat_activity ORDER BY "query_start" DESC
It returns the current sessions and their queries, but I'm not sure if this info is enough to find out which query could be causing the issue.
I also ran this query:
SELECT max(now() - xact_start) FROM pg_stat_activity WHERE state IN ('idle in transaction', 'active');
and it returns max = 1 day 01:42:10.987635. I think this is alarming, but I don't know how to put this info to use.
Another thing I think is worth mentioning is that I have started using Sequelize's bulk update.
Its syntax is something like this:
Model.bulkCreate(scalesToUpdate, {
updateOnDuplicate: [
'field1',
'field2'
],
})
And, this gets translated into SQL like below:
INSERT INTO "mymodel" ("id","field1","field2","field3","field4","field5","field6","field7") VALUES (') ON CONFLICT ("id") DO UPDATE SET "field3"=EXCLUDED."field3","field4"=EXCLUDED."field4","field6"=EXCLUDED."field6","field7"=EXCLUDED."field7"
And, this query gets fired 5 times per second. Could this be the culprit?
Any insight into this is highly appreciated.
You could try the following:
Increase the machine type to have one more core (vCPUs = 2).
It might be that Sequelize 5.8.6 requires more resources than the old version. You could install a monitoring/statistics tool, run the queries you listed, and review which query has the highest resource usage.
If that query runs 5 times per second, it could well be using more resources; test it with one of those tools to get a better picture.
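For example, one way to do this, assuming the pg_stat_statements extension is available on the instance (that availability is an assumption; it isn't mentioned in the question):
-- Requires pg_stat_statements; on managed instances it may need to be
-- enabled via shared_preload_libraries / database flags first.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top statements by total execution time, to see whether the ON CONFLICT
-- upsert (or something else) dominates the CPU.
SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

-- The 1-day-old transaction is also worth chasing down: list the oldest
-- open transactions and the backends holding them.
SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC
LIMIT 10;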

How does Cassandra stress test determine threadcount?

I ran a Cassandra stress test and the output reported thread counts ranging from 4 to 913. What causes Cassandra to increase the thread count, and when does it stop?
When you run cassandra-stress, it performs a series of test rounds.
It starts with a small number of threads and reports the result, then keeps raising the thread count until throughput stops improving; the upper limit appears to depend on the cluster the stress tool is attached to and allowed to connect to.
At the end it prints the results of all the rounds together with the number of threads used in each one.
On the system I tested, it was able to run up to 32 threads; the test completed at that count and the results were displayed.
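If you want to pin the thread count yourself instead of letting the tool ramp it up, you can pass an explicit rate; this is a sketch, and the exact flag syntax can vary between cassandra-stress versions:
cassandra-stress write n=1000000 -rate threads=32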

Inserting 1000 rows into Azure Database takes 13 seconds?

Can anyone please tell me why it might be taking 12+ seconds to insert 1000 rows into a SQL database hosted on Azure? I'm just getting started with Azure, and this is (obviously) absurd...
Create Table xyz (ID int primary key identity(1,1), FirstName varchar(20))
GO
create procedure InsertSomeRows as
set nocount on
Declare @StartTime datetime = getdate()
Declare @X int = 0;
While @X < 1000
Begin
insert into xyz (FirstName) select 'john'
Set @X = @X+1;
End
Select count(*) as Rows, DateDiff(SECOND, @StartTime, GetDate()) as SecondsPassed
from xyz
GO
Exec InsertSomeRows
Exec InsertSomeRows
Exec InsertSomeRows
GO
Drop Table xyz
Drop Procedure InsertSomeRows
Output:
Rows SecondsPassed
----------- -------------
1000 11
Rows SecondsPassed
----------- -------------
2000 13
Rows SecondsPassed
----------- -------------
3000 14
It's likely the performance tier you are on that is causing this. With a Standard S0 tier you only have 10 DTUs (Database throughput units). If you haven't already, read up on the SQL Database Service Tiers. If you aren't familiar with DTUs it is a bit of a shift from on-premises SQL Server. The amount of CPU, Memory, Log IO and Data IO are all wrapped up in which service tier you select. Just like on premises if you start to hit the upper bounds of what your machine can handle things slow down, start to queue up and eventually start timing out.
Run your test again just as you have been doing, but then use the Azure Portal to watch the DTU % used while the test is underway. If you see that the DTU % is getting maxed out, then the issue is that you've chosen a service tier that doesn't have enough resources to handle the load you've applied without slowing down. If the speed isn't acceptable, move up to the next service tier until it is. You pay more for more performance.
I'd recommend not paying too close attention to the service tier based on this test, but rather on the actual load you want to apply to the production system. This test will give you an idea and a better understanding of DTUs, but it may or may not represent the actual throughput you need for your production loads (which could be even heavier!).
Don't forget that in Azure SQL DB you can also scale your Database as needed so that you have the performance you need but can then back down during times you don't. The database will be accessible during most of the scaling operations (though note it can take a time to do the scaling operation and there may be a second or two of not being able to connect).
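For reference, scaling can be done from the Azure Portal or with T-SQL; this is a sketch in which the database name and target service objective are placeholders:
-- Run against the logical server (e.g. while connected to master);
-- [MyDb] and 'S3' are placeholders for your database and target tier.
ALTER DATABASE [MyDb] MODIFY (SERVICE_OBJECTIVE = 'S3');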
Two factors made the biggest difference. First, I wrapped all the inserts into a single transaction. That got me from 100 inserts per second to about 2500. Then I upgraded the server to a PREMIUM P4 tier and now I can insert 25,000 per second (inside a transaction.)
It's going to take some getting used to using an Azure server and what best practices give me the results I need.
My theory: Each insert is one log IO. Here, this would be 100 IOs/sec. That sounds like a reasonable limit on an S0. Can you try with a transaction wrapped around the inserts?
So wrapping the inserts in a single transaction did indeed speed this up. Inside the transaction it can insert about 2,500 rows per second.
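A minimal sketch of the transaction-wrapped version of the original test procedure (same xyz table as above; the procedure name is just illustrative):
create procedure InsertSomeRowsInTran as
set nocount on
Declare @StartTime datetime = getdate()
Declare @X int = 0;
Begin Transaction
While @X < 1000
Begin
    insert into xyz (FirstName) select 'john'
    Set @X = @X + 1;
End
Commit Transaction
Select count(*) as Rows, DateDiff(SECOND, @StartTime, GetDate()) as SecondsPassed from xyz
GO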
So that explains it. Now the results are no longer catastrophic. I would now advise looking at metrics such as the Azure dashboard DTU utilization and wait stats. If you post them here I'll take a look.
One way to improve performance is to look at the wait stats of the query.
Looking at wait stats will show you the exact bottleneck while a query is running; in your case it turned out to be log IO. Look here to learn more about this approach: SQL Server Performance Tuning Using Wait Statistics
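For example, in Azure SQL Database the database-scoped wait statistics can be inspected with something like this:
-- Top waits for this database since the stats were last cleared, ordered by
-- total wait time; log-related waits (e.g. LOG_RATE_GOVERNOR) would point at
-- the log IO limit of the service tier.
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_db_wait_stats
ORDER BY wait_time_ms DESC;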
I also recommend changing the while loop to something set-based, if this query is not just a throwaway test and you run it often.
Set based solution:
create proc usp_test
(
@n int
)
as
begin
begin try
begin tran
insert into yourtable
select n, 'John'
from numbers
where n < @n
commit tran
end try
begin catch
--catch errors (for example, roll back and rethrow)
if @@trancount > 0 rollback tran
end catch
end
You will have to create a numbers table for this to work; one way to do that is sketched below.
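A minimal sketch of one way to build such a numbers table (the table name matches the procedure above; the 100,000-row count is arbitrary):
-- Generate 1..100000 by cross-joining a system catalog against itself
create table numbers (n int primary key);

insert into numbers (n)
select top (100000) row_number() over (order by (select null))
from sys.all_objects a
cross join sys.all_objects b;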
I had terrible performance problems with updates & deletes in Azure until I discovered a few techniques:
Copy data to a temporary table and make updates in the temp table, then copy back to a permanent table when done (see the sketch below).
Create a clustered index on the table being updated (partitioning didn't work as well)
For inserts, I am using bulk inserts and getting acceptable performance.
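A sketch of the temp-table staging approach from the first point above; all of the table and column names here are hypothetical:
-- Stage the rows you need into a temp table
SELECT Id, SomeColumn
INTO #staging
FROM dbo.TargetTable
WHERE SomeFilter = 1;

-- Do the heavy updates against the temp table
UPDATE #staging SET SomeColumn = SomeColumn + 1;

-- Copy the results back to the permanent table in one pass
UPDATE t
SET t.SomeColumn = s.SomeColumn
FROM dbo.TargetTable t
JOIN #staging s ON s.Id = t.Id;

DROP TABLE #staging;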

oracle: Is there a way to check what sql_id downgraded to serial or lesser degree over the period of time

I would like to know if there is a way to check sql_ids that were downgraded to either serial or lesser degree in an Oracle 4-node RAC Data warehouse, version 11.2.0.3. I want to write a script and check the queries that are downgraded.
SELECT NAME, inst_id, VALUE FROM GV$SYSSTAT
WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
OR UPPER (NAME) LIKE '%PARALLELIZED%' OR UPPER (NAME) LIKE '%PX%'
NAME VALUE
queries parallelized 56083
DML statements parallelized 6
DDL statements parallelized 160
DFO trees parallelized 56249
Parallel operations not downgraded 56128
Parallel operations downgraded to serial 951
Parallel operations downgraded 75 to 99 pct 0
Parallel operations downgraded 50 to 75 pct 0
Parallel operations downgraded 25 to 50 pct 119
Parallel operations downgraded 1 to 25 pct 2
Does it ever refresh? What conclusion can be drawn from above output? Is it for a day? month? hour? since startup?
This information is stored as part of Real-Time SQL Monitoring. But it requires licensing the Diagnostics and Tuning packs, and it only stores data for a short period of time.
Oracle 12c can supposedly store SQL Monitoring data for longer periods of time. If you don't have Oracle 12c, or if you don't have those options licensed, you'll need to create your own monitoring tool.
Real-Time SQL Monitoring of Parallel Downgrades
select /*+ parallel(1000) */ * from dba_objects;
select sql_id, sql_text, px_servers_requested, px_servers_allocated
from v$sql_monitor
where px_servers_requested <> px_servers_allocated;
SQL_ID SQL_TEXT PX_SERVERS_REQUESTED PX_SERVERS_ALLOCATED
6gtf8np006p9g select /*+ parallel ... 3000 64
Creating a (Simple) Historical Monitoring Tool
Simplicity is the key here. Real-Time SQL Monitoring is deceptively simple and you could easily spend weeks trying to recreate even a tiny portion of it. Keep in mind that you only need to sample a very small amount of all activity to get enough information to troubleshoot. For example, just store the results of GV$SESSION or GV$SQL_MONITOR (if you have the license) every minute. If the query doesn't show up from sampling every minute then it's not a performance issue and can be ignored.
For example: create a table such as downgrade_check(sql_id, total), and create a DBMS_SCHEDULER job that periodically inserts the per-SQL_ID session counts from GV$SESSION into it, as sketched below. Keep in mind that the count from GV$SESSION will rarely be exactly the same as the DOP.
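A minimal sketch of that sampling job; the job name and one-minute interval are illustrative choices, and you would probably also want a timestamp column on the table:
create table downgrade_check (sql_id varchar2(100), total number);

begin
  dbms_scheduler.create_job(
    job_name        => 'DOWNGRADE_CHECK_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin
                          insert into downgrade_check
                          select sql_id, count(*) from gv$session
                          where sql_id is not null group by sql_id;
                          commit;
                        end;',
    repeat_interval => 'FREQ=MINUTELY',
    enabled         => TRUE);
end;
/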
Other Questions
V$SYSSTAT is updated pretty frequently (every few seconds?), and represents the total number of events since the instance started.
It's difficult to draw many conclusions from those numbers. From my experience, having only about 2% of your statements downgraded is a good sign; you likely have good (usually default) settings and not too many parallel jobs running at once.
However, some parallel queries run for seconds and some run for weeks. If the wrong job is downgraded even a single downgrade can be disastrous. Storing some historical session information (or using DBA_HIST_ACTIVE_SESSION_HISTORY) may help you find out if your critical jobs were affected.
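For example, with the Diagnostics Pack licensed, something like this against DBA_HIST_ACTIVE_SESSION_HISTORY shows how many parallel-slave sessions were sampled per statement over the last day; the one-day window is arbitrary:
-- QC_SESSION_ID is populated for parallel execution slaves, so this gives a
-- rough per-SQL_ID picture of how much parallel activity was actually seen.
select sql_id, count(distinct session_id) as px_sessions_sampled
from dba_hist_active_session_history
where sample_time > systimestamp - interval '1' day
  and qc_session_id is not null
  and sql_id is not null
group by sql_id
order by px_sessions_sampled desc;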

In Cassandra 1.2 - CQL 3 is it possible to abort a secondary index build?

Been using a 6GB dataset with each source record being ~1KB in length when I accidentally added an index on a column that I am pretty sure has a 100% cardinality.
Tried dropping the index from cqlsh, but by that point the two-node cluster had gone into a runaway death spiral with the load average surpassing 20 on each node, and cqlsh hung on the drop command for 30 minutes. Since this was just a test setup, I shut down and destroyed the cluster and restarted.
This is a fairly disconcerting problem as it makes me fear a scenario where a junior developer is on a production cluster and they set an index on a similar high cardinality column. I scanned through the documentation and looked at the options in nodetool but there didn't seem to be anything along the lines of "abort job or abort building index".
Test environment:
2x m1.xlarge EC2 instances with 2 Raid 0 ephemeral disks
Dataset was 6GB, 1KB per record.
My question in summary: Is it possible to abort the process of building a secondary index, and/or to stop or postpone running builds (indexing, compaction) until a later date?
nodetool -h node_address stop index_build
See: http://www.datastax.com/docs/1.2/references/nodetool#nodetool-stop
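To see what is currently running before (and after) issuing the stop, nodetool's compaction view can be used; the operation-type argument for stop covers other build types as well:
# List in-progress compactions and index builds
nodetool -h node_address compactionstats
# Other operation types (e.g. COMPACTION, CLEANUP, SCRUB, VALIDATION) can be stopped the same way
nodetool -h node_address stop COMPACTION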
