Aggregate function taking too long to execute in PostgreSQL on Linux

I recently deployed a PostgreSQL database on a Linux server.
One of the stored procedures takes around 20 to 24 seconds. I executed the same stored procedure against an empty database (no rows returned) and it takes the same time. I found that the slowness is caused by the aggregate function.
If I remove ARRAY_TO_JSON(ARRAY_AGG(...)), the result is fetched within a second.
Below is my code snippet:
SELECT ARRAY_TO_JSON(ARRAY_AGG(ROW_TO_JSON(A))) FROM (
select billservice.billheaderid,billservice.billserviceid AS billserviceid,.....(around 120 columns in select ).....
)A;
Explain Execution Plan:
Previously I had deployed the PostgreSQL database on a Windows server, where the same stored procedure takes only around 1 to 1.5 seconds.
In both cases I tested with the same database and the same amount of data. Both servers have the same hardware configuration (RAM, processor) and the same PostgreSQL configuration.
While the stored procedure executes on the Linux server, CPU usage goes to 100%.
Let me know if you have any solution for this.

Related

Slow bulk insert to Azure database

We are running an elastic pool in Azure with multiple databases. One of our larger imports takes longer than we are used to. During these imports we ran at 6 cores as a test. All databases are allowed to use all cores.
On our local environment, it inserts about 100k records per second; however, the same dataset on Azure does about 1k per second (our VM) to 4k per second (dev laptop).
During this insert, the database only uses 14% log IO, 5% CPU and 0% DataIO.
When setting up a new database using the DTU model at P2, we see the same behavior. So we are not even hitting the limits of the database.
The table contains about 36 columns which are all required.
We have tried this using BulkInsert in the following way, with different batch sizes:
BulkConfig b = new BulkConfig();
b.BatchSize = 100000;
await dbcontext.BulkInsertAsync(entities, b);
We also tried standard Entity Framework AddRange calls with smaller batches. We even went as far as using hand-written SqlBulkCopy methods, but nothing helped.
Now the question is mainly: is this a software issue? Are we running into issues in our Azure DB? Do we need to change the way we do bulk imports?
Edit:
Attempted to run the import using the TempDB Setting in BulkInsert, however this also does not increase performance. LogIO is still at 14%.

Iterate through the dataset on the application layer, invoking a
stored procedure for each row that will perform an INSERT/UPDATE
action based on the existence of a record with a certain key. If the
number of records to upsert is limited, this strategy may work well;
otherwise, roundtrips and log writes will have a major influence on
speed.
To minimise roundtrips and log writes and increase throughput, use
bulk insert approaches like the SqlBulkCopy class in ADO.NET to
upload the full dataset to Azure SQL Database and then execute all
the INSERT/UPDATE (or MERGE) operations in a single batch. Overall
execution times may be reduced from hours to minutes/seconds using
this method.
Here is a discussion related to the same scenario: Optimize Azure SQL Database Bulk Upsert scenarios.
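The second strategy above (bulk load into a staging table, then apply every INSERT/UPDATE as one set-based batch) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Azure SQL Database, and the table names target and staging, the columns id and name, and the helper bulk_upsert are all hypothetical. On Azure SQL the load step would typically be SqlBulkCopy and the apply step could be a MERGE.

```python
import sqlite3

def bulk_upsert(conn, rows):
    """Load rows into a staging table, then apply them in one set-based batch."""
    cur = conn.cursor()
    # Hypothetical staging table; a real import would bulk copy into it.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("DELETE FROM staging")
    # One bulk load instead of one stored-procedure call per row.
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)
    # Update every row that already exists in the target.
    cur.execute(
        """
        UPDATE target
        SET name = (SELECT s.name FROM staging s WHERE s.id = target.id)
        WHERE id IN (SELECT id FROM staging)
        """
    )
    # Insert every row that does not exist yet.
    cur.execute(
        """
        INSERT INTO target (id, name)
        SELECT s.id, s.name FROM staging s
        WHERE s.id NOT IN (SELECT id FROM target)
        """
    )
    # A single commit covers the whole batch.
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")
bulk_upsert(conn, [(1, "new"), (2, "added")])
result = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
```

The point of the design is that the number of statements sent to the server stays constant no matter how many rows are upserted, which is what removes the per-row roundtrips and log writes.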

MariaDB got much faster, and I can't find the cause?

I have a concern about my MariaDB 10.4.12 database query execution time, which is getting much faster without any update to my database schema or data. While a speed-up is always welcome, I am concerned about the root cause of this speed-up, especially since I have not rolled out any changes in the last 24 hours. This specific query has sped up 60x overnight.
I have a NodeJS web application that filters a large dataset into "reporting" pages, which typically take 10-12 seconds to load. My main table has 3.5 million rows and the base query involves many joins, date comparisons, and text comparisons. There is room for fine-tuning the query, but it worked for what it was designed to do and I could live with 10 second load times. I noticed this morning, though, that my queries were executed in less than 1 second, without any recent changes on my part.
The most recent change to the application was pushed out five days ago, which affected the amount of data being pulled into this database. A separate application on the same server reaches out to a data set every 10 minutes and replicates these rows into the same database the "reporting" application communicates with. Up until this update, the query was collecting and inserting ~80,000 rows on average, taking about 8-10 seconds to fully replicate the data into this database. My change five days ago reduced the rows being inserted to ~20,000 on average.
Other clues:
phpMyAdmin still takes 10-12 seconds to run the query, while the MySQL command-line tool takes less than 1 second
The MariaDB temp directory was changed to a larger partition 7 days ago
The query was tested to be slow (10-12 seconds) 24 hours ago
The query is still slow on a pre-production server that runs the same application with an identical MySQL instance running (same schema and data)
My current running theory is that the ~80,000 inserts were not being executed in the time range being reported by NodeJS (8-10 seconds for the inserts), and they were instead waiting in the MariaDB temp directory until they could be fully written to the database. That would suggest that the database was constantly bogged down by these writes, and reducing the number to ~20k allowed the database to insert faster, allowing the select queries to run faster this morning.
Should I be concerned about this speed up? Could MariaDB have found a faster way to index my data? Am I going crazy?
Thank you.

Don't worry. This kind of thing can be caused by contention (multiple database clients using the database concurrently) and all sorts of other things.
(Cherish this moment. Performance usually goes the other direction.)
You can test for correctness to increase your confidence level. Check a few older and a few newer records to see if they still contain good data.
Or run a full-table-scan query, something like this:
SELECT COUNT(*), AVG(some_number_column), MIN(some_text_column) FROM mytable
That will take a while but it will hit every row in the table.
You probably don't need to do this, but it's a way to double check (and tell your boss, "I double checked.")

10 seconds, then 1 second. That is "normal".
The first was run when none of the data was cached in RAM; the second was with all cached.
Run it a third time; it will be 1 second again.
Restart MariaDB and run it again; it will again take 10 seconds.
Walk away from the machine for a long time; don't touch the table. It might be back to 10 seconds. For this, look at size of RAM and innodb_buffer_pool_size. Also look for big table scans that bump everything out of cache.

Azure SQL Database update performance

We're migrating some databases from an Azure VM running SQL Server to Azure SQL. The current VM is a Standard DS12 v2 with two 1TB SSDs attached.
We are using an elastic pool at the P1 performance level. We're early days in this, so nothing else is really running in the pool.
At any rate, we are doing an ETL process that involves a handful of ~20M row tables. We bulk load these tables and then update some attributes to help with the rest of the process.
For example, I am currently running the following update:
UPDATE A
SET A.CompanyId = B.Id
FROM etl.TRANSACTIONS AS A
LEFT OUTER JOIN dbo.Company AS B
ON A.CO_ID = B.ERPCode
TRANSACTIONS is ~ 20M rows; Company is fewer than 50.
I'm already 30 minutes into running this update which is far beyond what will be acceptable. The usage meter on the Pool is hovering around 40%.
For reference, our Azure VM runs this in about 2 minutes.
I load this table via the bulk copy and this update is already beyond what it took to load the entire table.
Any suggestions on speeding up this (and other) updates?

We are using an elastic pool at the P1 performance level.
I'm not sure how this translates to your VM's performance level, or what criteria you are using to compare the two.
I would recommend the steps below, since no execution plan was provided.
1. Check whether there is any wait type while the update is running:
select
session_id,
start_time,
command,
db_name(ec.database_id) as dbname,
blocking_session_id,
wait_type,
last_wait_type,
wait_time,
cpu_time,
logical_reads,
reads,
writes,
((database_transaction_log_bytes_used +database_transaction_log_bytes_reserved)/1024)/1024 as logusageMB,
txt.text,
pln.query_plan
from sys.dm_exec_requests ec
cross apply
sys.dm_exec_sql_text(ec.sql_handle) txt
outer apply
sys.dm_exec_query_plan(ec.plan_handle) pln
left join
sys.dm_tran_database_transactions trn
on trn.transaction_id=ec.transaction_id
The wait type gives you a lot of information that can be used to troubleshoot.
2. You can also use the query below to watch, in parallel, what is happening with the query:
set statistics profile on
your update query
Then run the query below in a separate window:
select
session_id,physical_operator_name,
row_count,actual_read_row_count,estimate_row_count,estimated_read_row_count,
rebind_count,
rewind_count,
scan_count,
logical_read_count,
physical_read_count
from
sys.dm_exec_query_profiles
where session_id=your sessionid;
As per your question, there doesn't seem to be an issue with DTU, so I don't see much of a problem on that front.

Slow performance solved in one case:
I have recently had severe problems with slow Azure updates that made it nearly unusable. It was updating only 1000 rows in 1 second. So 1M rows was 1000 seconds. I believe this is due to logging in Azure, but I haven't done enough research to be certain. Opening a MS support incident went nowhere. I finally solved the issue using two techniques:
Copy the data to a temporary table and make the updates in the temp table. So in the above case, try copying the 50 rows to a temp table and then back again after the updates. There is no or minimal logging in this case.
My copying back was still slow (I had a few hundred thousand rows), so I created a clustered index on that table. Update duration dropped by a factor of 4-5.
I am using an S1 (20 DTU) database. It is still about 5 times slower than a dedicated instance, but that is fantastic performance for the price.

The real answer to this issue is that SQL Azure will spill to the tempdb much faster than you would expect if you are used to using a well provisioned VM or physical machine.
You can tell that this is happening by recording the actual execution query plan. Look for the warning icon:
The popup will complain about the spill:
At any rate, if you see this, it is likely that you're trying to do too much in the statement.
The Microsoft support person suggested updating the statistics, but this did not change the situation for us.
What seems to be working is the traditional advice to break the inserts up into smaller batches.
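The batching advice can be sketched for the UPDATE from the question as follows. This is a rough illustration, not the poster's actual fix: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Azure SQL, and the integer key column id, the lower-case table and column names, and the helper update_in_batches are assumptions made for the example.

```python
import sqlite3

def update_in_batches(conn, batch_size):
    """Apply the company lookup in primary-key ranges, committing after each range."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM transactions").fetchone()
    if low is None:
        return 0  # empty table, nothing to do
    batches = 0
    start = low
    while start <= high:
        end = start + batch_size - 1
        # Same effect as the LEFT JOIN update: unmatched rows get NULL.
        conn.execute(
            """
            UPDATE transactions
            SET company_id = (SELECT c.id FROM company c
                              WHERE c.erp_code = transactions.co_id)
            WHERE id BETWEEN ? AND ?
            """,
            (start, end),
        )
        conn.commit()  # keep each transaction (and its log usage) small
        batches += 1
        start = end + 1
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, erp_code TEXT)")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, co_id TEXT, company_id INTEGER)")
conn.executemany("INSERT INTO company VALUES (?, ?)", [(1, "A"), (2, "B")])
rows = [(i, "A" if i % 2 else "B") for i in range(1, 10)] + [(10, "Z")]
conn.executemany("INSERT INTO transactions (id, co_id) VALUES (?, ?)", rows)
batches = update_in_batches(conn, 4)
mapped = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE company_id IS NOT NULL"
).fetchone()[0]
```

Splitting by key range keeps each statement's working set small, which is the property that should make a tempdb spill less likely; the right batch size would have to be found by measurement on the actual service tier.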

Slow performance first queries on SQL Azure

I have a very small database (50MB) and I'm on a basic plan. There will be only a single user, but we need to create many databases (always one per user) since they will be used for training purposes. Each database is created by doing the following statement:
CREATE DATABASE Training1 AS COPY OF ModelDatabase1
We seem to be getting very very slow performance when we first query this database, afterwards it seems acceptable.
To give you an idea: we have a SP: StartupEvents that runs when the application is started. This query takes 25 seconds to run the first time. This seems incredible since the database is very small, and the tables the query calls don't contain many records. If we run this procedure afterwards it executes immediately...
How can we avoid this?

SQL Server linked server query error

I am using a large stored procedure that runs many linked server queries. If I run this stored procedure manually it runs fine, but if I call it from an exe using multi-threading, each execution raises "Cannot get the data of the row from the OLE DB provider "SQLNCLI11" for linked server "linkedserver1"." and "Row handle referred to a deleted row or a row marked for deletion." The stored procedure is also very slow compared with the same procedure without linked server queries. Please give me some tips to improve its performance and fix the issue mentioned above.
Thanks

If you are querying over linked servers, you will see a decrease in performance. Could it be possible that the procedures are affecting the same results - therefore giving you exceptions? If so you might be looking at dirty reads. Is that OK for your result set?
From the looks of it you seem to have to call the procedures sequentially and not in parallel. What you can do is cache the data on a server, and sync the updates etc, in batches.
