pt-kill show database in log - percona

We've enabled pt-kill on a few of our servers, but without actually killing anything, only to monitor slow queries for now.
The only problem is that the log doesn't contain the database, only the query. Is there a way to log which database each query was executed against?
# 2012-09-12T10:31:23 KILL 419539612 (Query 138 sec) SELECT blog.*, blog_text.*, user.*
FROM blog AS blog
INNER JOIN blog_text AS blog_text ON (blog.firstblogtextid = blog_text.blogtextid)
INNER JOIN blog_user AS blog_user ON (blog_user.bloguserid = blog.userid)
LEFT JOIN user AS user ON (user.userid = blog_text.userid)
WHERE 1=1
AND blog.state = 'visible'
AND blog.dateline <= 1347438544
AND blog.pending = 0
AND blog_user.options_guest & 1
AND ~blog.options & 8
ORDER BY blog.dateline DESC
LIMIT 15

Logging the database for the query is not currently a feature of pt-kill (as of version 2.1.x).
This feature has been requested in the past:
https://bugs.launchpad.net/percona-toolkit/+bug/1015804
But it has not been implemented yet.

It's being run by vBulletin, if that helps. You can speed up the query dramatically by indexing some of the referenced fields in the "blog" and "blog_user" tables.
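For example, a composite index covering the equality filters and the sort column might look like the sketch below (the index name and column choice are assumptions based on the query's WHERE and ORDER BY clauses, not on the actual vBulletin schema, so verify with EXPLAIN before applying it):
CREATE INDEX idx_blog_state_pending_dateline ON blog (state, pending, dateline);
The bitwise predicates (blog_user.options_guest & 1, ~blog.options & 8) cannot use an index and will still be evaluated row by row after the lookup.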

Related

Script returning a limited number of records compared to the query

I tried to convert an SQL query into a Gosu script (Guidewire). My script works for only a limited number of records.
This is the SQL query
select PolicyNumber,* from pc_policyperiod
where ID in ( Select ownerID from pc_PRActiveWorkflow
where ForeignEntityID in (Select id from pc_workflow where State=3))
This is my script
var workFlowIDQuery = Query.make(Workflow).compare(Workflow#State, Relop.Equals, WorkflowState.TC_COMPLETED)
    .select({QuerySelectColumns.path(Paths.make(entity.Workflow#ID))}).transformQueryRow(\row -> row.getColumn(0)).toTypedArray()
var prActiveWorkFlowQuery = Query.make(PRActiveWorkflow).compareIn(PRActiveWorkflow#ForeignEntity, workFlowIDQuery)
    .select({QuerySelectColumns.path(Paths.make(entity.PRActiveWorkflow#Owner))}).transformQueryRow(\row -> row.getColumn(0)).toTypedArray()
var periodQuery = Query.make(PolicyPeriod).compareIn(PolicyPeriod#ID, prActiveWorkFlowQuery).select()
for (period in periodQuery) {
    print(period.PolicyNumber)
}
Can anyone find a cause for why the script returns a limited number of records, or suggest improvements?
I would suggest writing a single Gosu query that selects PolicyPeriod and joins the three entities via their foreign keys.
I am not sure if the PolicyPeriod ID is the same as the PRActiveWorkflow ID. Can you elaborate on the relation between the PolicyPeriod and PRActiveWorkflow entities?

ADW - Query performance issues

I have an Azure SQL Data Warehouse set up at DW500c (Gen2), and I have a Data Vault model in it with several tables.
I am trying to execute one query that I think is taking too much time.
Here is the query I have been executing:
SELECT
H_PROFITCENTER.[BK_PROFITCENTER]
,H_ACCOUNT.[BK_ACCOUNT]
,H_LOCALCURRENCY.[BK_CURRENCY]
,H_DOCUMENTCURRENCY.[BK_CURRENCY]
,H_COSTCENTER.[BK_COSTCENTER]
,H_COMPANY.[BK_COMPANY]
,H_CURRENCY.[BK_CURRENCY]
,H_INTERNALORDER.[BK_INTERNALORDER]
,H_VERSION.[BK_VERSION]
,H_COSTELEMENT.[BK_COSTELEMENT]
,H_CALENDARDATE.[BK_DATE]
,H_VALUETYPEREPORT.[BK_VALUETYPEREPORT]
,H_FISCALPERIOD.[BK_FISCALPERIOD]
,H_COUNTRY.[BK_COUNTRY]
,H_FUNCTIONALAREA.[BK_FUNCTIONALAREA]
,SLADI.[LINE_ITEM]
,SLADI.[AMOUNT]
,SLADI.[CREDIT]
,SLADI.[DEBIT]
,SLADI.[QUANTITY]
,SLADI.[BALANCE]
,SLADI.[LOADING_DATE]
FROM [dwh].[L_ACCOUNTINGDOCUMENTITEMS] AS LADI
INNER JOIN [dwh].[SL_ACCOUNTINGDOCUMENTITEMS] AS SLADI ON LADI.[HK_ACCOUNTINGDOCUMENTITEMS] = SLADI.[HK_ACCOUNTINGDOCUMENTITEMS]
LEFT JOIN dwh.H_PROFITCENTER AS H_PROFITCENTER ON H_PROFITCENTER.[HK_PROFITCENTER] = LADI.[HK_PROFITCENTER]
LEFT JOIN dwh.H_ACCOUNT AS H_ACCOUNT ON H_ACCOUNT.[HK_ACCOUNT] = LADI.[HK_ACCOUNT]
LEFT JOIN dwh.H_CURRENCY AS H_LOCALCURRENCY ON H_LOCALCURRENCY.[HK_CURRENCY] = LADI.[HK_LOCALCURRENCY]
LEFT JOIN dwh.H_CURRENCY AS H_DOCUMENTCURRENCY ON H_DOCUMENTCURRENCY.[HK_CURRENCY] = LADI.[HK_DOCUMENTCURRENCY]
LEFT JOIN dwh.H_COSTCENTER AS H_COSTCENTER ON H_COSTCENTER.[HK_COSTCENTER] = LADI.[HK_COSTCENTER]
LEFT JOIN dwh.H_COMPANY AS H_COMPANY ON H_COMPANY.[HK_COMPANY] = LADI.[HK_COMPANY]
LEFT JOIN dwh.H_CURRENCY AS H_CURRENCY ON H_CURRENCY.[HK_CURRENCY] = LADI.[HK_CURRENCY]
LEFT JOIN dwh.H_INTERNALORDER AS H_INTERNALORDER ON H_INTERNALORDER.[HK_INTERNALORDER] = LADI.[HK_INTERNALORDER]
LEFT JOIN dwh.H_VERSION AS H_VERSION ON H_VERSION.[HK_VERSION] = LADI.[HK_VERSION]
LEFT JOIN dwh.H_COSTELEMENT AS H_COSTELEMENT ON H_COSTELEMENT.[HK_COSTELEMENT] = LADI.[HK_COSTELEMENT]
LEFT JOIN dwh.H_DATE AS H_CALENDARDATE ON H_CALENDARDATE.[HK_DATE] = LADI.[HK_CALENDARDATE]
LEFT JOIN dwh.H_VALUETYPEREPORT AS H_VALUETYPEREPORT ON H_VALUETYPEREPORT.[HK_VALUETYPEREPORT] = LADI.[HK_VALUETYPEREPORT]
LEFT JOIN dwh.H_FISCALPERIOD AS H_FISCALPERIOD ON H_FISCALPERIOD.[HK_FISCALPERIOD] = LADI.[HK_FISCALPERIOD]
LEFT JOIN dwh.H_COUNTRY AS H_COUNTRY ON H_COUNTRY.[HK_COUNTRY] = LADI.[HK_COUNTRY]
LEFT JOIN dwh.H_FUNCTIONALAREA AS H_FUNCTIONALAREA ON H_FUNCTIONALAREA.[HK_FUNCTIONALAREA] = LADI.[HK_FUNCTIONALAREA]
This query takes 22 minutes to execute.
I should mention that it returns around 1,200,000,000 rows.
[L_ACCOUNTINGDOCUMENTITEMS] and [SL_ACCOUNTINGDOCUMENTITEMS] are hash-distributed on the [HK_ACCOUNTINGDOCUMENTITEMS] column, and all other tables were created with replicated table distribution.
Also, I activated automatic statistics creation in Azure Data Warehouse.
Can anyone help me understand how I can speed it up?
Here are some things to try to see if you can make this faster:
Create a table using 'Create Table as Select' (CTAS) with the ROUND_ROBIN option for your query and take the timing of that (a sketch is given after these suggestions). I have a feeling that returning that large a number of rows to your client could be a big contributor to the time. If the CTAS finishes in, let's say, 5 minutes, you can safely say that the rest of the time is taken by the return operation.
If not, you can materialize some of the left joins into a table and then add that table to the main query to see if that finishes faster.
You can also look at explain plans to see if you can cut down some steps by aligning the tables on a common key.
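A minimal sketch of that CTAS timing test, assuming a hypothetical target table name and abbreviating the column list and joins to the first few tables from the query above:
CREATE TABLE dwh.ACCOUNTINGDOCUMENTITEMS_CTAS_TEST
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    H_PROFITCENTER.[BK_PROFITCENTER]
    ,H_ACCOUNT.[BK_ACCOUNT]
    ,SLADI.[AMOUNT]
    ,SLADI.[LOADING_DATE]
    -- add the remaining columns and LEFT JOINs exactly as in the query above
FROM [dwh].[L_ACCOUNTINGDOCUMENTITEMS] AS LADI
INNER JOIN [dwh].[SL_ACCOUNTINGDOCUMENTITEMS] AS SLADI ON LADI.[HK_ACCOUNTINGDOCUMENTITEMS] = SLADI.[HK_ACCOUNTINGDOCUMENTITEMS]
LEFT JOIN dwh.H_PROFITCENTER AS H_PROFITCENTER ON H_PROFITCENTER.[HK_PROFITCENTER] = LADI.[HK_PROFITCENTER]
LEFT JOIN dwh.H_ACCOUNT AS H_ACCOUNT ON H_ACCOUNT.[HK_ACCOUNT] = LADI.[HK_ACCOUNT];
If the CTAS finishes quickly, most of the 22 minutes is being spent streaming the ~1.2 billion rows back to the client rather than in the joins themselves.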

Alternative to sp_depends in Azure Data Warehouse

I need to get the list of tables used in a stored procedure; however, sp_depends is not supported in Azure Data Warehouse.
The other alternative I thought of is to get the stored procedure code from INFORMATION_SCHEMA.ROUTINES and then run a script to extract the [schema].[tablename] references from the stored procedure definition, but the issue there is storing the whole stored procedure in a variable. VARCHAR(MAX) has a limit of 8000 characters for storage, and if my procedure exceeds this limit I won't be able to get the complete table list.
Try using sys.sql_expression_dependencies. The following query may help you:
SELECT ReferencingObjectType = o1.type,
       ReferencingObject = SCHEMA_NAME(o1.schema_id) + '.' + o1.name,
       ReferencedObject = SCHEMA_NAME(o2.schema_id) + '.' + ed.referenced_entity_name,
       ReferencedObjectType = o2.type
FROM sys.sql_expression_dependencies ed
INNER JOIN sys.objects o1
    ON ed.referencing_id = o1.object_id
INNER JOIN sys.objects o2
    ON ed.referenced_id = o2.object_id
WHERE o1.type IN ('P', 'TR', 'V', 'TF')
ORDER BY ReferencingObjectType, ReferencingObject
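If you only need the tables referenced by one specific stored procedure, you can narrow the result by adding a filter on the referencing object's name to the WHERE clause above (the procedure name here is hypothetical):
AND o1.name = 'usp_MyProcedure'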

Azure SQL Data Warehouse hanging or not responding to simple query after large BCP operation

I have a preview version of Azure SQL Data Warehouse running that was working fine until I imported a large table (~80 GB) through BCP. Now none of the tables, including the small ones, respond even to a simple query:
select * from <MyTable>
Queries against the sys tables still work:
select * from sys.objects
The BCP process was left running over the weekend, so any statistics update should have been done by now. Is there any way to figure out what is making this happen? Or at least to see what is currently running, in case anything is blocking?
I'm using SQL Server Management Studio 2014 to connect to the Data Warehouse and executing queries.
#user5285420 - run the code below to get a good view of what's going on. You should be able to find the query easily by looking at the value in the "command" column. Can you confirm whether the BCP command still shows status = 'Running' when the query steps are all complete?
select top 50
(case when requests.status = 'Completed' then 100
when progress.total_steps = 0 then 0
else 100 * progress.completed_steps / progress.total_steps end) as progress_percent,
requests.status,
requests.request_id,
sessions.login_name,
requests.start_time,
requests.end_time,
requests.total_elapsed_time,
requests.command,
errors.details,
requests.session_id,
(case when requests.resource_class is NULL then 'N/A'
else requests.resource_class end) as resource_class,
(case when resource_waits.concurrency_slots_used is NULL then 'N/A'
else cast(resource_waits.concurrency_slots_used as varchar(10)) end) as concurrency_slots_used
from sys.dm_pdw_exec_requests AS requests
join sys.dm_pdw_exec_sessions AS sessions
on (requests.session_id = sessions.session_id)
left join sys.dm_pdw_errors AS errors
on (requests.error_id = errors.error_id)
left join sys.dm_pdw_resource_waits AS resource_waits
on (requests.resource_class = resource_waits.resource_class)
outer apply (
select count (steps.request_id) as total_steps,
sum (case when steps.status = 'Complete' then 1 else 0 end ) as completed_steps
from sys.dm_pdw_request_steps steps where steps.request_id = requests.request_id
) progress
where requests.start_time >= DATEADD(hour, -24, GETDATE())
ORDER BY requests.total_elapsed_time DESC, requests.start_time DESC
Check out the resource utilization and possibly other issues from https://portal.azure.com/
You can also run sp_who2 from SSMS to get a snapshot of what threads are active and whether there's some crazy blocking chain that's causing problems.

Efficiency of using multiple subselects

I'm trying to gather multiple related pieces of data for a master account and create a view (e.g. overdue balance, account balance, debt recovery status, interest hold). Will this approach be efficient? The database platforms are Informix, Oracle, and SQL Server. Looking at statistics on Informix, I'm getting just one sequential scan of auubmast. I assume the sub-selects are quite efficient because they filter down to the account number immediately. I may need many sub-selects before I'm finished. On top of the question of efficiency, are there any other 'tidy' approaches?
Thank you.
select
auubmast.acc_num,
auubmast.cls_cde,
auubmast.acc_typ,
(select
sum(auubtrnh.trn_bal)
from auubtrnh, aualtrcd
where aualtrcd.trn_cde = auubtrnh.trn_cde
and auubtrnh.acc_num = auubmast.acc_num
and (auubtrnh.due_dte < current or aualtrcd.trn_typ = 'I')
) as ovd_bal,
(select
sum(auubytdb.ytd_bal)
from auubytdb, auubsvgr
where auubytdb.acc_num = auubmast.acc_num
and auubsvgr.svc_grp = auubmast.svc_grp
and auubytdb.bil_yer = auubsvgr.bil_yer
) as acc_bal,
(select
max(cur_stu)
from audemast
where mdu_acc = auubmast.acc_num
and mdu_ref = 'UB'
) as drc_stu,
(select
hol_typ
from aualhold
where mdu_acc = auubmast.acc_num
and mdu_ref = 'UB'
and pro_num = 2601
and (hol_til is null or hol_til > current)
) as int_hld
from auubmast
In general, the answer to this is that correlated subqueries should be avoided whenever possible.
Using them will result in a full table scan for your view, which is bad. The only times you want to use subqueries like this are when you can limit the range of the main select to only a few rows, or when there really is no other choice.
When you're running into situations like this, you might want to consider adding columns and precalculating them on an update trigger, rather than using subqueries. This will save your database a thrashing.
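If you do keep the view, one way to avoid the per-row correlated lookups, sketched here for the overdue-balance column only (table and column names are taken from the query above; exact derived-table syntax varies slightly by platform, and current is Informix syntax, SYSDATE on Oracle, GETDATE() on SQL Server), is to pre-aggregate in a derived table and join it once per account:
select
m.acc_num,
m.cls_cde,
m.acc_typ,
ovd.ovd_bal
from auubmast m
left join (
select auubtrnh.acc_num, sum(auubtrnh.trn_bal) as ovd_bal
from auubtrnh, aualtrcd
where aualtrcd.trn_cde = auubtrnh.trn_cde
and (auubtrnh.due_dte < current or aualtrcd.trn_typ = 'I')
group by auubtrnh.acc_num
) ovd on ovd.acc_num = m.acc_num
The aggregate is then computed once over the child table instead of being re-run for every row of auubmast, and each of the other sub-selects could be turned into its own derived table in the same way.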
