KQL time-based query showing zeros rather than no results - Azure

So I've got an HTTP function that writes logs to App Insights when it's invoked.
I want to know when a period of time elapses during which the HTTP function isn't called.
traces | where message contains "function invoked" | summarize count() by bin(timestamp, 10m)
This works, but it only returns bins for the times where logs are present.
What I want is to see how many requests have hit this endpoint in each interval up to now. Rather than showing "no results", it should produce a table with each datetime and a value of 0,
so that I can show a flat line.

make-series operator
This works if you have at least one data point
traces
| where message has "function invoked"
| make-series count() on timestamp from ago(1h) to now() step 10m
| render timechart
If you might have no data points at all, you will need a small trick, because make-series produces no output over an empty input:
union traces, (print timestamp = now(1ms))
| make-series countif(message has "function invoked" and timestamp <= now()) on timestamp from ago(1h) step 10m
| render timechart

Related

Can we find out if there is an increase in the number of requests to a given page?

If I have a web app in Azure with Application Insights configured, is there a way to tell if there was an increase in the number of requests to a given page?
I know we can get the "Delta" of performance in a given time slice compared to the previous period, but it doesn't seem like we can do this for requests.
For example, I'd like to answer questions like: "Which pages in the last hour had the highest % increase in requests, compared to the previous period?"
Does anyone know how to do this, or can it be done via the AppInsights query language?
Thanks!
I'm not sure whether it can be done using the Portal; I don't think so. But I came up with the following Kusto query:
requests
| where timestamp > ago(2h) and timestamp < ago(1h)
| summarize previousPeriod = todouble(count()) by url
| join (
    requests
    | where timestamp > ago(1h)
    | summarize lastHour = todouble(count()) by url
) on url
| project url, previousPeriod, lastHour, change = ((lastHour - previousPeriod) / previousPeriod) * 100
| order by change desc
This is about the increase/decrease in the amount of traffic per url; you can change count() to, for example, avg(duration) to get the increase/decrease of the average duration, as sketched below.
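For instance, a sketch of that duration variant, keeping the same structure as the query above and only swapping the aggregation (illustrative, not tested):
requests
| where timestamp > ago(2h) and timestamp < ago(1h)
| summarize previousPeriod = avg(duration) by url
| join (
    requests
    | where timestamp > ago(1h)
    | summarize lastHour = avg(duration) by url
) on url
| project url, previousPeriod, lastHour, change = ((lastHour - previousPeriod) / previousPeriod) * 100
| order by change desc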

Azure Response Time Monitoring per Url with a range

I am trying to configure a dashboard consisting of a few business-critical functions that we need to monitor for performance against our SLAs.
For example, a landing page URL that retrieves records needs to be fast, and the accepted SLA is:
Green < 1sec
Amber 1 sec - 2 secs
Red > 2 secs
We were able to configure the same in Splunk based on flat-file logs. However, we have not been able to configure anything similar in Azure.
As of now I have not been able to create a dashboard for this requirement. Any type of graphical representation is OK for us. Based on this monitoring we may need to react and improve performance over time when it slows down.
You can use the below Kusto query in application insights:
requests
| where timestamp > ago(2h) //set the time range
| where url == "http://localhost:54917/" //set the url here
| summarize avg_time = avg(duration)
| extend my_result = case(
    avg_time <= 1000, "good",   // 1000 milliseconds
    avg_time <= 2000, "normal", // 2000 milliseconds
    "bad"
)
Note:
1. The unit of avg_time is milliseconds.
2. When avg_time <= 1000 milliseconds, the dashboard shows "good"; when <= 2000 milliseconds, it shows "normal"; when > 2000 milliseconds, it shows "bad".
The query result (change it to Chart):
Then in dashboard:
An approximate solution that can serve your purpose:
Use a request-duration vs. time chart along with reference lines for your SLA thresholds.
That way you can see, at any moment, whether the response time is below or above a threshold.
// Response time trend
// Chart request duration over the last 12 hours
requests
| where timestamp > ago(12h)
| summarize avgRequestDuration = avg(duration) by bin(timestamp, 10m) // use a time grain of 10 minutes
| extend Green = 200
| extend Amber = 400
| extend Red = 800
| render timechart
It would look something like the chart below.
I think it is much more useful than the previous UI, which has a meter-like feel and only gives you a health indication at that moment; with a continuous time plot you get a better picture of the trend.
If you run the same query in Azure Workbooks, you can use the "thresholds" renderer in grids or tiles to apply if/then/else-style rules for the color of each range,
which would get you:
You can then pin that grid/tiles/graph to an Azure dashboard. (If the query uses a Workbooks time range parameter, it will inherit the dashboard's time range and auto-update as well.)

Spark Structured Streaming - Log the internal progress of a query

Let's assume the following setting: I have a stream of events, and I want some specific events to trigger an action. A concrete case could be a stream of customers' orders: if an order meets a certain set of conditions, I want to send the customer a notification/SMS. At the same time, I want to track how fast I am processing the messages and monitor which order met which condition.
For notifications, I use Spark Structured Streaming code consisting of several operations:
from pyspark.sql.functions import col

df_orders = spark.readStream.format("eventhubs").options(**conf).load()
(df_orders
.filter(col('sms_consent') == True)
.filter(col('order_price') > 1000)
.dropDuplicates(['order_id', 'customer_id'])
.writeStream
.format('eventhubs')
.options(**conf)
.start()
)
Now I want to build a "monitoring/reporting" solution, which will export the following data for every incoming order:
+----------+----------------------+----------------------+----------------------+-------------------------+---------------------+
| order_id | filtered_sms_consent | filtered_order_price | time_messageReceived | time_processingFinished | time_sentToEventHub |
+----------+----------------------+----------------------+----------------------+-------------------------+---------------------+
| 1        | True                 | None                 | 9:40:00              | 9:41:00                 | None                |
| 2        | False                | False                | 9:41:00              | 9:42:00                 | 9:42:21             |
| 3        | False                | True                 | 9:43:00              | 9:45:00                 | None                |
+----------+----------------------+----------------------+----------------------+-------------------------+---------------------+
(The shape does not matter - the table can be de-pivoted into a more "log-like" structure...)
My experiments:
First, I thought about using Spark listeners (StreamingQueryListener), since the listeners can log things such as the query state, average processing time, etc. But I couldn't find any way to match a specific event (order_id) with the data from the query listener.
Next, I wrote a separate query for monitoring while keeping the query for the actual logic. The issue is that, since these are two separate queries, each is executed independently, so the timestamps are off. I managed to bind them together using the foreachBatch() approach (see the sketch below). This, however, runs into a problem with dropDuplicates (the query must be split in two) and it feels very "heavy" (it slows down the execution quite a bit).
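For illustration, a rough sketch of what such a foreachBatch() monitoring hook could look like, assuming df_orders from above; the derived column names, the sink format and the output path are hypothetical:
from pyspark.sql import functions as F

def monitor_batch(batch_df, batch_id):
    # For every order in the micro-batch, record which filter would drop it
    # and when the batch was processed, then append to a monitoring sink.
    report = (batch_df
        .withColumn("filtered_sms_consent", F.col("sms_consent") != True)
        .withColumn("filtered_order_price", F.col("order_price") <= 1000)
        .withColumn("time_processingFinished", F.current_timestamp())
        .select("order_id", "filtered_sms_consent",
                "filtered_order_price", "time_processingFinished"))
    report.write.mode("append").parquet("/mnt/monitoring/orders")  # hypothetical path

(df_orders
    .writeStream
    .foreachBatch(monitor_batch)
    .start())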
Dream:
What I would love to have is something like:
(df_orders
 .log('order_id {}: processing started at {}'.format(col('order_id'), time.now()))
 .filter(col('sms_consent') == True)
 .log('order_id {}: filtered on sms_consent'.format(col('order_id')))
 .filter(col('order_price') > 1000)
 .log('order_id {}: filtered on order_price'.format(col('order_id')))
 ...
)
or to have this information in the Spark logs by default, with some means of extracting it.
How is this achievable?
You can create a UDF that sends logs to whatever storage you need and call it during streaming, so that data is sent from each worker. It can be slow.
You can also create a UDF that writes to the standard Spark logs; a rough sketch is shown below. To view the logs you then need to collect them from all nodes. I used Logstash to collect the local logs from all nodes and Kibana as a dashboard.
If you need to log time-series data, you can use the Spark metrics system https://spark.apache.org/docs/latest/monitoring.html#metrics and https://github.com/groupon/spark-metrics for custom metrics. This allows you to create a UDF and send custom metrics during streaming.
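As a minimal sketch of that logging UDF (the logger name and the way it is wired into the pipeline are assumptions, and Spark's optimizer may reorder or skip the call, so treat the output as approximate):
import logging

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

logger = logging.getLogger("order_monitoring")

@F.udf(returnType=StringType())
def log_order(order_id):
    # Runs on the executors, so each worker writes to its own local log;
    # collect those files with Logstash/Filebeat and browse them in Kibana.
    logger.warning("order_id %s: reached this stage of the pipeline", order_id)
    return str(order_id)

# Wire the UDF in between stages, e.g.:
# df = df.withColumn("order_id", log_order(F.col("order_id")))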

How can I consume more than the reserved number of request units with Azure Cosmos DB?

We have reserved various numbers of RUs per second for our various collections. I'm trying to optimize this to save money. For each response from Cosmos, we log the request charge property to Application Insights. I have one analytics query that returns the average number of request units per second and one that returns the maximum.
// Average RU/s consumed over the window
let start = datetime(2019-01-24 11:00:00);
let end = datetime(2019-01-24 21:00:00);
customMetrics
| where name == 'RequestCharge' and start < timestamp and timestamp < end
| project timestamp, value, Database=tostring(customDimensions['Database']), Collection=tostring(customDimensions['Collection'])
| make-series sum(value) default=0 on timestamp in range(start, end, 1s) by Database, Collection
| mvexpand sum_value to typeof(double), timestamp limit 36000
| summarize avg(sum_value) by Database, Collection
| order by Database asc, Collection asc

// Maximum RU/s consumed in any single second
let start = datetime(2019-01-24 11:00:00);
let end = datetime(2019-01-24 21:00:00);
customMetrics
| where name == 'RequestCharge' and start <= timestamp and timestamp <= end
| project timestamp, value, Database=tostring(customDimensions['Database']), Collection=tostring(customDimensions['Collection'])
| summarize sum(value) by Database, Collection, bin(timestamp, 1s)
| summarize arg_max(sum_value, *) by Database, Collection
| order by Database asc, Collection asc
The averages are fairly low, but the maxima can be unbelievably high in some cases. An extreme example is a collection with a reservation of 1,000 RU/s, an average usage of 15.59 and a maximum usage of 63,341 RU/s.
My question is: how can this be? Are my queries wrong? Is throttling not working? Or does throttling only work over a longer period of time than a single second? I have checked for request throttling on the Azure Cosmos DB overview dashboard (response code 429), and there was none.
I have to answer my own question. I found two problems:
1. Application Insights logs an inaccurate timestamp. I added a timestamp as a custom dimension, and within a given minute I get different seconds in my custom timestamp, but the built-in timestamp is one second past the minute for many of these records. That is why I got (false) peaks in request charge; a sketch of a query that bins on such a custom timestamp is shown below.
2. We did have throttling. When viewing request throttling in the portal, I have to select a specific database. If I try to view request throttling for all databases, it looks like there is none.
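For illustration, binning on the logged client-side timestamp instead of the built-in one could look roughly like this (the dimension name ClientTimestamp is hypothetical):
customMetrics
| where name == 'RequestCharge'
| extend clientTimestamp = todatetime(customDimensions['ClientTimestamp'])
| project clientTimestamp, value, Database=tostring(customDimensions['Database']), Collection=tostring(customDimensions['Collection'])
| summarize sum(value) by Database, Collection, bin(clientTimestamp, 1s)
| summarize arg_max(sum_value, *) by Database, Collection
| order by Database asc, Collection asc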

Efficient Cassandra DB design to retrieve a summary of time series financial data

I am looking to use the Apache Cassandra database to store a time series of 1-minute OHLCV financial data for ~1000 symbols. This will need to be updated in real time as data is streamed in. All entries older than 24 hours are not needed and should be discarded.
Assuming there are 1000 symbols with entries for each minute from the past 24 hrs, the total number of entries will amount to 1000*(60*24) = 1,440,000.
I am interested in designing this database to efficiently retrieve a slice of all symbols from the past [30m, 1h, 12h, 24h] with fast query times. Ultimately, I need to retrieve the OHLCV that summarises this slice. The resulting output would be {symbol, FIRST(open), MAX(high), MIN(low), LAST(close), SUM(volume)} of the slice for each symbol. This essentially summarises the 1m OHLCV entries and creates a [30m, 1h, 12h, 24h] OHLCV from the time of the query. E.g. if I want to retrieve the past 1h OHLCV at 1:32pm, the query will give me a 1h OHLCV that represents data from 12:32pm-1:32pm.
What would be a good design to meet these requirements? I am not concerned with the database's memory footprint on the hard drive. The real issue is achieving fast query times while staying light on CPU and RAM.
I have come up with a simple and naive way to store each record with clustering ordered by time:
CREATE TABLE symbols (
    symbol text,
    time timestamp,
    open double,
    high double,
    low double,
    close double,
    volume double,
    PRIMARY KEY ((symbol), time)  -- one partition per symbol, rows clustered by time
) WITH CLUSTERING ORDER BY (time DESC);
But I am not sure how to select from this to meet my requirements. I would rather design it specifically for my query, and duplicate data if necessary.
Any suggestions will be much appreciated.
While not based on Cassandra, Axibase Time Series Database can be quite relevant to this particular use case. It supports SQL with time-series syntax extensions to aggregate data into periods of arbitrary length.
An OHLCV query for a 15-minute window might look as follows:
SELECT date_format(datetime, 'yyyy-MM-dd HH:mm:ss', 'US/Eastern') AS time,
FIRST(t_open.value) AS open,
MAX(t_high.value) AS high,
MIN(t_low.value) AS low,
LAST(t_close.value) AS close,
SUM(t_volume.value) AS volume
FROM stock.open AS t_open
JOIN stock.high AS t_high
JOIN stock.low AS t_low
JOIN stock.close AS t_close
JOIN stock.volume AS t_volume
WHERE t_open.entity = 'ibm'
AND t_open.datetime >= '2018-03-29T14:32:00Z' AND t_open.datetime < '2018-03-29T15:32:00Z'
GROUP BY PERIOD(15 MINUTE, END_TIME)
ORDER BY datetime
Note the GROUP BY PERIOD clause above which does all the work behind the scenes.
Query results:
| time                | open    | high   | low     | close  | volume |
|---------------------|---------|--------|---------|--------|--------|
| 2018-03-29 10:32:00 | 151.8   | 152.14 | 151.65  | 152.14 | 85188  |
| 2018-03-29 10:47:00 | 152.18  | 152.64 | 152     | 152.64 | 88065  |
| 2018-03-29 11:02:00 | 152.641 | 153.04 | 152.641 | 152.69 | 126511 |
| 2018-03-29 11:17:00 | 152.68  | 152.75 | 152.43  | 152.51 | 104068 |
You can use a Type 4 JDBC driver, API clients or just curl to run these queries.
I'm using sample 1-minute data for the above example which you can download from Kibot as described in these compression tests.
Also, ATSD supports scheduled queries to materialize minutely data into OHLCV bars of longer duration, say for long-term retention.
Disclaimer: I work for Axibase.
