Tracing in Dynamics AX 2012 AOS during a load test - performance-testing

I'm currently preparing a load test for a Dynamics AX application. We're exploring the possibilities for enabling tracing on the AOS during the load test. I'm worried the trace might impact the performance of the server and therefore influence the test results.
Is it possible to do tracing during a load test without influencing the results?

It is true that if you turn on AX tracing (e.g., by editing Ax32Serv.exe.config or using WCF tracing), you will create a performance impact under load (and also end up with some huge logs). FH-Inway's comment that tracing causes only about 4% overhead may be right (it depends on the load), but under load the real problem is the gigabytes of logs you get from just one minute of having tracing on. The same is true if you use the Trace Parser method of monitoring.
If requests are failing under load, you should also be able to find exception messages returned to you in the failed requests.
You can also view all the exceptions AX throws through the AOS GUI (see the System --> Exceptions item) or query AX's exception table directly with something like:
SELECT COUNT(*) AS theCount, EXCEPTION, theDESCRIPTION
FROM (
    SELECT SE.CREATEDDATETIME, SE.EXCEPTION,
           SUBSTRING(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(CONVERT(VARCHAR(max), SE.DESCRIPTION), CHAR(13), ' '), CHAR(10), ' '), CHAR(9), ' '), (CHAR(13) + CHAR(10)), ' '), CHAR(32), ' '), 1, 250) AS theDESCRIPTION
    FROM SYSEXCEPTIONTABLE SE
    LEFT JOIN AIFEXCEPTIONMAP T2 ON (SE.RECID = T2.EXCEPTIONID)
    LEFT JOIN PARTITIONS T3 ON (SE.DATAPARTITION = T3.RECID)
    WHERE SE.CREATEDDATETIME > '2015-12-15 02:25:50'
) AS InnerS
--WHERE theDESCRIPTION LIKE '%deadlock%'
GROUP BY EXCEPTION, theDESCRIPTION
ORDER BY theCount DESC
You can isolate your issue to the SQL tier or the AOS tier in AX by using the SQL Server based Performance Analyzer for Dynamics AX (DynPerf), which has almost no performance impact.
If you are under load, my suggestion would be to start with Performance Analyzer (DynPerf), or use SQL Server's own DMVs to get an idea of which queries are performing poorly, and then find the specific user action each query relates to.
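If you go the DMV route, a generic SQL Server starting point (not AX-specific, and only a sketch; adjust the ORDER BY to whichever resource you care about) is something like:
-- top statements by total elapsed time, with per-execution averages in microseconds
SELECT TOP 20
       qs.execution_count,
       qs.total_worker_time  / qs.execution_count AS avg_cpu_microsec,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microsec,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_elapsed_time DESC;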
Finally, AX Perfmon counters can give you some information about what is happening.
The different options for monitoring AX are listed in the TechNet article "Tools for monitoring performance [AX 2012]". As you can see, there is a plethora of monitoring options in AX, just as in any other system.
If you are getting a lot of exceptions, run each scenario in the load test in isolation until you find the test and the data that throws the exception.
Load testing is a great way to figure out how you are going to monitor and troubleshoot problems in production. If you can't do it in the load test, how will you do it when real users are having problems?
You do not say how you are loading the system. Are you going through the AIF? You also do not state why you want the tracing. Is it merely to monitor response time? If you are using the Visual Studio load test tools and you have properly wrapped your requests with the Begin/End timer methods, your load tool will collect response times for you.

"Is it possible to do tracing during a load test without influencing the results?"
Not really. If tracing didn't have a performance impact, it would always be on. The mere act of tracing causes additional work, so it must affect the results.
If you're performance testing against a baseline system with the same load test, you could turn tracing on in both environments, so the overhead affects both sets of results roughly equally.

Related

Native Transport Requests All time blocked

Trying to understand more about Native-Transport-Requests!
As I understand it, these are CQL requests, and if the limit is exceeded they show up in the all-time-blocked NTR count.
My question is: how do I monitor these requests in real time and get some kind of report on them?
I see settings like max_queued_native_transport_requests and native_transport_max_threads. How do these settings affect the all-time-blocked count?
Have a look at JIRA-11363.
Also check this discussion for more info.
The recommendation is to start with the default values and tune from there. The default values are:
max_queued_native_transport_requests: 1024
native_transport_max_threads: 128
Monitor your nodes and, if you see an increasing number of blocked Native-Transport-Requests, increase max_queued_native_transport_requests.
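As a rough sketch of that monitoring loop (the yaml keys are the ones named above; nodetool tpstats is the standard way to see the thread pool counters, and its "All time blocked" column for the Native-Transport-Requests pool is the number to watch):
# on each node: the Native-Transport-Requests line of the thread pool stats
# shows active, pending, completed and all-time-blocked counts
nodetool tpstats | grep -i "Native-Transport-Requests"

# cassandra.yaml - start from the defaults and only raise the queue size
# if the all-time-blocked counter keeps growing under load
native_transport_max_threads: 128
max_queued_native_transport_requests: 1024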
Also, I think it's worth checking these discussions: 1, 2

How to get the net response time of a single system during performance testing using LoadRunner or other tools

There are multiple systems involved in my application before the proper response reaches the final system.
The image above shows that the request first goes from system 1 to system 2 and so on to system n, and the final response is visible in system 1. I want to know how to get the net response time of one request from system 2 to system 3, or from system 1 to system 2, and so on. I am a beginner in performance testing. Please let me know how we can achieve this.
Thank you so much!
Integrate an APM tool at systems 1 through n (assuming your systems are supported by your APM tool), or do log analysis of messages with timestamps indicating the start and completion of each event. As long as you have a unique correlating key in the logs, you can reconstruct timing records for the various event types, which are key to understanding performance in your n-tier model.
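If you go the log-analysis route, the core idea is just subtracting timestamps that share a correlation key across systems. A minimal sketch in Python (the log format, event names, and file names are made up for illustration):
# reconstruct per-hop response times from logs that share a correlation id
# assumed log line format: "<ISO timestamp> <correlation_id> <event>"
# e.g. "2023-01-01T10:00:00.125 req-42 system2_received"
from datetime import datetime
from collections import defaultdict

events = defaultdict(dict)
for path in ["system1.log", "system2.log", "system3.log"]:   # hypothetical files
    with open(path) as f:
        for line in f:
            ts, corr_id, event = line.split()
            events[corr_id][event] = datetime.fromisoformat(ts)

for corr_id, e in events.items():
    if "system1_sent" in e and "system2_received" in e:
        hop = (e["system2_received"] - e["system1_sent"]).total_seconds()
        print(f"{corr_id}: system 1 -> system 2 took {hop:.3f}s")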

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with an nginx server for read/write access. My index has about 750k small-to-medium documents. I have a fairly continuous stream of minimal writes (mainly updates) to the content. The speed and consistency I get with search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail, fail with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user-friendly message saying the documents couldn't be retrieved (however, they would still have to wait ~10 seconds to see an error message).
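For that workaround specifically, the knob would be on the nginx side; a minimal sketch of such a proxy block (assuming nginx proxies directly to Elasticsearch on port 9200, which is not shown in the question):
location / {
    proxy_pass http://127.0.0.1:9200;
    # fail fast instead of waiting out the 30 s timeout seen in the question
    proxy_read_timeout 10s;
}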
Another of my thoughts was to give reads priority over writes: any time someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable, and it may not even be possible.
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how I can reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked due to information being written through the write-only alias?
I'm sure someone else has come across this before; what is the typical solution to make sure a user can always read data from the index, with a higher priority than writes? I would consider increasing our server power if required. Currently we have two m2.xlarge EC2 instances: one is the primary and the other the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
"url":"127.0.0.1:9200\/_mget",
"content_type":null,
"http_code":100,
"header_size":25,
"request_size":221,
"filetime":-1,
"ssl_verify_result":0,
"redirect_count":0,
"total_time":30.391506,
"namelookup_time":7.5e-5,
"connect_time":0.0593,
"pretransfer_time":0.059303,
"size_upload":167002,
"size_download":0,
"speed_download":0,
"speed_upload":5495,
"download_content_length":-1,
"upload_content_length":167002,
"starttransfer_time":0.119166,
"redirect_time":0,
"certinfo":[
],
"primary_ip":"127.0.0.1",
"redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU would hit ~80-98% (no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at the cluster or index level, and I've updated indices.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done at runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200

{
  "persistent" : {
    "indices.store.throttle.max_bytes_per_sec" : "5mb"
  }
}
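For reference, the same update can be sent with curl (host and port as used in the question):
curl -XPUT 'http://127.0.0.1:9200/_cluster/settings' -d '{
  "persistent" : {
    "indices.store.throttle.max_bytes_per_sec" : "5mb"
  }
}'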
So far Paramedic is showing a decrease in CPU usage: from an average of ~5-25% down to an average of ~1-5%. Hopefully this helps me avoid the 90%+ spikes that were locking up my queries before; I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will also take more CPU into account.

Memory leak in IPython.parallel module?

I'm using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:
def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)
which is invoked by this function:
def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)
    # wait for all results to come back
    client.wait(async_results)
    # and retrieve their values
    values = [ar.get() for ar in async_results]
    # unzip data from original tuple
    totalCost, totalGrad = zip(*values)
    avgGrad = np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)
    return (avgCost, avgGrad)
If I run the code:
client = Client(profile="ssh")
client[:].execute("import numpy as np")
lview = client.load_balanced_view()
for i in xrange(100):
    eval(theta, client, lview, data)
the memory usage keeps growing until I eventually run out (76GB of memory). I've simplified evalPoint to do nothing in order to make sure it wasn't the culprit.
The first part of eval was copied from IPython's documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straight-forward, so I don't think that's responsible for the memory leak. Additionally, I've tried manually deleting objects in eval and calling gc.collect() with no luck.
I was hoping someone with IPython.parallel experience could point out something obvious I'm doing wrong, or could confirm that this is in fact a memory leak.
Some additional facts:
I'm using Python 2.7.2 on Ubuntu 11.10
I'm using IPython version 0.12
I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
The only thing I've found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)
update
Also, I tried switching logging from memory to SQLiteDB, in case that was the problem, but still have the same problem.
response(1)
The memory consumption is definitely in the controller (I could verify this by (a) running the client on another machine and (b) watching top). I hadn't realized that a non-SQLiteDB backend would still consume memory, so I hadn't bothered purging.
If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().
If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.
If I use SQLite and try to purge, I get the following error:
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string
So, I think if I use DictDB, I might be okay (I'm going to try a run tonight). I'm not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).
Is it the controller process that is growing, or the client, or both?
The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).
The client itself caches all of its own results in its results dict, so this, too, will result in growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and is not affected by the controller's db backend.
This is a known issue in IPython, but for now, you should be able to clear the references manually by deleting the entries in the client's results/metadata dicts and if your view is sticking around, it has its own results dict:
# ...
# and retrieve their values
values = [ar.get() for ar in async_results]
# clear references to the local cache of results:
for ar in async_results:
    for msg_id in ar.msg_ids:
        del lview.results[msg_id]
        del client.results[msg_id]
        del client.metadata[msg_id]
Or, you can purge the entire client-side cache with simple dict.clear():
view.results.clear()
client.results.clear()
client.metadata.clear()
Side note:
Views have their own wait() method, so you shouldn't need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.
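Putting that advice together, here is a minimal sketch of eval() reworked along these lines: only the view is passed in, and the local result caches are cleared after each batch. It is illustrative only, based on the methods and dicts named in the answer above.
import numpy as np

def eval(theta, lview, data):
    # submit one task per data point through the load-balanced view
    async_results = [lview.apply_async(evalPoint, point, theta) for point in data]
    lview.wait(async_results)            # the view can wait on its own jobs
    values = [ar.get() for ar in async_results]

    # drop the cached results so the client process does not keep growing
    for ar in async_results:
        for msg_id in ar.msg_ids:
            lview.results.pop(msg_id, None)
            lview.client.results.pop(msg_id, None)
            lview.client.metadata.pop(msg_id, None)

    totalCost, totalGrad = zip(*values)
    return (np.mean(totalCost, axis=0), np.mean(totalGrad, axis=0))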

Oracle query - ORA-01652: unable to extend temp segment, but only in some versions of SQL*Plus

This one has me rather confused. I've written a query which runs fine from my development client but fails on the production client with the error "ORA-01652: unable to extend temp segment by....". In both instances the database and user are the same. On my development machine (MS Windows) I've got SQL*Plus (Release 9.0.1.4.0) and Toad 9.0 (both using version 9.0.4.0.1 of oci.dll). Both run the code without errors.
However, when I run the same file, against the same database, using the same username/password, from a different machine, this time with client version 10.2.0.4.0 (from the 10.2.0.4-1 Oracle Instant Client), I get the error.
It does occur reproducibly.
Unfortunately I've only got limited access to the dictionary views on the database which is set up as read-only (can't even get an explain plan!).
I've tried working around the problem by tuning the query (I suspect that there is a large interim result set which is subsequently trimmed down) but have not managed to change the behaviour at either client.
It may be possible to deploy a different version of the client on the machine causing the problems - but currently that looks like downgrading to a previous version.
Any ideas?
TIA
Update
Based on Gary's answer below, I had a look at the glogin.sql scripts - the only difference was that 'SET SQLPLUSCOMPATIBILITY 8.1.7' was present on the working client but absent on the failing client - adding it did not resolve the problem.
I also tried
alter session set workarea_size_policy=manual;
alter session set hash_area_size=1048576000;
and
alter session set sort_area_size=1048576000;
to no avail :(
Update 2
I managed to reproduce the same behaviour, this time talking to an Oracle 8i backend. In this case the database was read-write, which allowed me to confirm that the different clients were, as I suspected, generating different plans. But why?
Looking at the output of 'SHOW PARAMETERS' both clients reported exactly the same settings!
Years ago I worked on a DR database that was fully READONLY, and even the TEMP tablespace wasn't writable. Any query that tried to spill to temp would fail (even if the temp space to be used was pretty trivial).
If this is the same situation, I wouldn't be surprised if there was a login.sql (or glogin.sql or a logon trigger) that does an ALTER SESSION to set larger PGA memory value for the session, and/or changes the optimizer goal to FIRST_ROWS.
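If such a script or trigger exists, the kinds of statements to look for are session-level settings like these (values purely illustrative, echoing the parameters already mentioned):
-- possible contents of a glogin.sql / logon trigger that would change plans
ALTER SESSION SET optimizer_mode = FIRST_ROWS;
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 1048576000;
ALTER SESSION SET hash_area_size = 1048576000;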
If you can, compare the results of the following from both clients:
select * from v$parameter
where ismodified != 'FALSE';
Also, from each client, generate an execution plan for the problem SQL:
EXPLAIN PLAN FOR SELECT ...;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
See if it is coming up with different query plans.
Not really an answer - but a bit more information....
Our local DBAs were able to confirm that the 16GB (!) TEMP tablespace was indeed being used and had filled up, but only from the Linux clients (I was able to recreate the error by making an oci8 call from PHP). In the case of the sqlplus client, I was actually using exactly the same file to run the query on both clients (copied via scp without text conversion, so the line endings were CRLF - i.e. byte for byte the same as what was running on the Windows client).
So the only rational solution was that the 2 client stacks were resulting in different execution plans!
Running the query from both clients approximately simultaneously on a DBMS with very little load gave the same result - meaning that the two clients also generated different sqlids for the query.
(and also Oracle was ignoring my hints - I hate when it does that).
There is no way Oracle should be doing this - even if the client were doing some internal munging of the query before presenting it to the DBMS (which would give rise to the different sqlids), the client stack used should be totally transparent regarding the choice of execution plan - this should only ever change based on the content of the query and the state of the DBMS.
The problem was complicated by not being able to see any explain plans - but for the query to use up so much temporary tablespace, it had to be doing a very ugly join (at least partially Cartesian) before filtering the result set. Adding hints to override this had no effect. So I resolved the problem by splitting the query into two cursors and doing a nested lookup using PL/SQL. A very ugly solution, but it solved my immediate problem. Fortunately I just need to generate a text file.
For the benefit of anyone finding themselves in a similar pickle:
DECLARE
    CURSOR query_outer IS
        SELECT some_primary_key, some_other_stuff
        FROM atable
        WHERE ....;
    CURSOR query_details (p_some_pk atable.some_primary_key%TYPE) IS
        SELECT COUNT(*), SUM(avalue)
        FROM btable
        WHERE fk = p_some_pk
        AND ....;
BEGIN
    FOR m IN query_outer
    LOOP
        FOR n IN query_details(m.some_primary_key)
        LOOP
            dbms_output.put_line(....);
        END LOOP;
    END LOOP;
END;
The more I use Oracle, the more I hate it!
A bit more information - I've run into the same problem connecting to a different database - this time an Oracle 8i. And I can get EXPLAIN plans.
Although in this case the query completed on both clients, it took 50% longer running from Linux SQL*Plus 10.2.0.4.0 than from WinXP SQL*Plus 8.0.6.0.
As I suspected, the plans are different - but both clients are using the default settings and the same optimizer mode. The optimizer seems to think the plan generated from the Linux client has a lower cost than the one from the XP client (although it takes longer to run). How does the optimizer prune the search path for potential plans?
