In k6 - How to achieve different TPS for individual transactions within a single script

I would like to achieve a different TPS for each API call within a single script.
Currently I am using stages, minIterationDuration, and the sleep function for think time. However, using a different think time per API doesn't help in achieving the various TPS targets. I have already tried the --rps parameter, but it distributes the TPS evenly across the APIs.
Below is my sample code:
import { group, sleep } from "k6";

// function1..function3 come from my own modules (not shown here)
export default function () {
  group("T01_API1", function1.function); // Example: 50 TPS for API 1
  group("T02_API2", function2.function); // Example: 100 TPS for API 2
  sleep(2);
  group("T03_API3", function3.function); // Example: 20 TPS for API 3
  sleep(2);
}
Stages:
{
  "minIterationDuration": "4s",
  "stages": [
    { "duration": "15s", "target": 15 }, // Ramp up
    { "duration": "1h", "target": 15 },  // Steady state
    { "duration": "10s", "target": 0 }   // Ramp down
  ]
}
Please note that the above is sample code, not the entire script. I can't execute multiple instances of k6 due to a system limitation, so I need to control this within a single script file. Let me know your thoughts.

The new k6 v0.27.0 has arrival-rate executors (to set iterations per second, i.e. TPS) and supports multiple scenarios, each of which can have a different TPS. See the examples in https://github.com/loadimpact/k6/releases/tag/v0.27.0 and https://k6.io/docs/using-k6/scenarios/
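For instance, a script along the following lines (the URLs, VU counts, and exported function names are placeholders, not part of the original question) gives each API its own arrival rate while staying in a single k6 process:

import http from "k6/http";

export const options = {
  scenarios: {
    api1: {
      executor: "constant-arrival-rate",
      rate: 50,             // ~50 iterations started per timeUnit, i.e. ~50 TPS
      timeUnit: "1s",
      duration: "1h",
      preAllocatedVUs: 100, // placeholder; size this to your response times
      exec: "api1",
    },
    api2: {
      executor: "constant-arrival-rate",
      rate: 100,            // ~100 TPS
      timeUnit: "1s",
      duration: "1h",
      preAllocatedVUs: 200,
      exec: "api2",
    },
    api3: {
      executor: "constant-arrival-rate",
      rate: 20,             // ~20 TPS
      timeUnit: "1s",
      duration: "1h",
      preAllocatedVUs: 50,
      exec: "api3",
    },
  },
};

// Placeholder endpoints standing in for the question's function1..function3.
export function api1() { http.get("https://example.com/api1"); }
export function api2() { http.get("https://example.com/api2"); }
export function api3() { http.get("https://example.com/api3"); }

Since all three scenarios run inside one script, this also satisfies the single-instance constraint mentioned in the question.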

Related

What is the most efficient way of frequently getting the last tweets from 1000+ accounts using the Twitter API?

I have a list of approximately 1,500 Twitter accounts (that may or may not have tweeted) for which I want to retrieve the latest tweets (max 100) every ~20 minutes. Considering the rate limits of Twitter API v2, what is the most efficient way of doing this without hitting the rate limits (https://developer.twitter.com/en/docs/twitter-api/rate-limits)?
As far as I understand, there is no way of getting tweets from multiple users at the same time using https://api.twitter.com/2/users/<twitter id>/tweets, and iterating through the 1,500 accounts to get the last tweets will make you hit the rate limit of ~900 requests per 15 minutes.
Is there a bulk request that can do this? Is adding them all to a Twitter list and getting the latest tweets from there the only real option here?
I need this for a Node.js application, but the issue is more about how to solve it at the Twitter API level.
The Twitter search API is publicly available at /2/tweets/search/all. You can also use /2/tweets/search/recent.
Using this, you can search for tweets from multiple accounts at once using the OR operator:
(from:twitter OR from:elonmusk)
Returns:
{
  "data": [
    {
      "id": "1540059169771978754",
      "text": "we would know"
    },
    {
      "id": "1540058653155278849",
      "text": "ratios build character"
    },
    {
      "id": "1539759270501023744",
      "text": "RT @NASA: The landmark law #TitleIX opened up a universe of possibility for women, including Janet Petro, the 1st woman director of @NASAKe…"
    }
    // ...
  ]
}
Note that this has a stricter rate limit, and there is a limit on how many characters you can use in your search query (probably 512).
You can add extra fields like author_id via tweet.fields if you need them.
If you cannot get by with this, then you may be able to combine API endpoints, since rate limits are applied per-endpoint. For example, search half via the searching endpoint, and the other half via the individual user endpoints.
If this still doesn't work, you're right (from everything that I've found); you will need to either:
Increase your cache time from 20 minutes to something more like 30-45 minutes
Create a list
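If you do go the search route, here is a rough sketch of the batching approach (not from the original answer): pack "from:" clauses into OR queries under the ~512-character limit, then call the recent search endpoint once per batch. It assumes Node.js 18+ (global fetch) and a BEARER_TOKEN environment variable.

// Placeholder list standing in for the ~1,500 accounts
const USERNAMES = ["twitter", "elonmusk"];

// Pack usernames into "(from:a OR from:b OR ...)" queries under maxLen chars
function buildQueries(usernames, maxLen = 512) {
  const queries = [];
  let current = [];
  for (const name of usernames) {
    const candidate = `(${[...current, `from:${name}`].join(" OR ")})`;
    if (candidate.length > maxLen && current.length > 0) {
      queries.push(`(${current.join(" OR ")})`);
      current = [];
    }
    current.push(`from:${name}`);
  }
  if (current.length > 0) queries.push(`(${current.join(" OR ")})`);
  return queries;
}

// One request per batched query against the recent search endpoint
async function fetchRecent(query) {
  const url = new URL("https://api.twitter.com/2/tweets/search/recent");
  url.searchParams.set("query", query);
  url.searchParams.set("max_results", "100");
  url.searchParams.set("tweet.fields", "author_id,created_at");
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.BEARER_TOKEN}` },
  });
  return res.json();
}

With ~1,500 accounts and roughly 20-30 "from:" clauses per 512-character query, this comes to around 50-75 requests per refresh instead of 1,500, which fits comfortably under the per-endpoint limits.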

AWS DocumentDB Performance Issue with Concurrency of Aggregations

I'm working with DocumentDB in AWS, and I've been having trouble when I try to read from the same collection simultaneously with different aggregation queries.
The issue is not that I cannot read from the database, but rather that it takes a long time to complete the queries. It doesn't matter whether I trigger the queries simultaneously or one after the other.
I'm using a Lambda function with Node.js to run my code, and I'm using Mongoose to handle the connection to the database.
Here's some sample code that I put together to illustrate my problem:
function query1() {
  return Collection.aggregate([ /* ... */ ]); // aggregation pipeline omitted
}

function query2() {
  return Collection.aggregate([ /* ... */ ]);
}

function query3() {
  return Collection.aggregate([ /* ... */ ]);
}
It takes the same amount of time if I run them using Promise.all:
Promise.all([query1(), query2(), query3()])
as it does if I run them waiting for the previous one to finish:
query1().then(result1 => query2().then(result2 => query3()))
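For reference, a minimal way to time both approaches (run inside an async function; the labels are arbitrary):

console.time("parallel");
await Promise.all([query1(), query2(), query3()]);
console.timeEnd("parallel");

console.time("sequential");
await query1();
await query2();
await query3();
console.timeEnd("sequential");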
Meanwhile, if I run each query in a different Lambda execution, it takes significantly less time for each individual query to finish (between 1 and 2 seconds).
So if they were running in parallel, the whole execution should finish in roughly the time of the slowest query (2 seconds), not the 7 seconds it takes now.
So my guess is that the DocumentDB instance is running the queries in sequence no matter how I send them. The collection has around 19,000 documents with a total size of almost 25 MB.
When I check the metrics of the instance, the CPUUtilization is barely over 8% and the available RAM only drops by 20 MB. So I don't think the delay has to do with the size of the instance.
Do you know why DocumentDB is behaving like this? Is there a configuration that I can change to run the aggregations in parallel?

Get Last Value of a Time Series with Azure Time Series Insights

How can I query the last (most recent) event, along with its timestamp, within a time series?
The approach described here does not work for me, as I cannot guarantee that the most recent event is within a fixed time window. The event might have been received hours or days ago in my case.
The LAST() function returns the last events, and the Get Series API should preserve the actual event timestamps according to the documentation, but I am a bit confused by the results I am getting back from this API. I get multiple results (sometimes not even sorted by timestamp) and have to find the latest value on my own.
Also, I noticed that the query result does not actually reflect the latest ingested value. The latest ingested value is only contained in the result set if I ingest it multiple times.
Is there any more straightforward or reliable way to get the last value of a time series with Azure Time Series Insights?
The most reliable way to get the last known value, at the moment, is to use the AggregateSeries API.
You can use the last() aggregation in a variable calculating the last event property and the last timestamp property. You must provide a search span in the query, so you will still have to "guess" when the latest value could have occurred.
Some options are to always have a larger search span than what you may need (e.g. if a sensor sends data every day, you may input a search span of a week to be safe) or use the Availability API to get the time range and distribution of the entire data set across all TSIDs and use that as the search span. Keep in mind that having large search spans will affect performance of the query.
Here's an example of an LKV (last known value) query:
"aggregateSeries": {
"searchSpan": {
"from": "2020-02-01T00:00:00.000Z",
"to": "2020-02-07T00:00:00.000Z"
},
"timeSeriesId": [
"motionsensor"
],
"interval": "P30D",
"inlineVariables": {
"LastValue": {
"kind": "aggregate",
"aggregation": {
"tsx": "last($event['motion_detected'].Bool)"
}
},
"LastTimestamp": {
"kind": "aggregate",
"aggregation": {
"tsx": "last($event.$ts)"
}
}
},
"projectedVariables": [
"LastValue",
"LastTimestamp"
]
}
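A sketch of how that payload might be sent from Node.js follows; the environment FQDN, token handling, and api-version are assumptions to verify against the current TSI Query API documentation.

const environmentFqdn = "<your-environment>.env.timeseries.azure.com"; // placeholder
const token = "<azure-ad-bearer-token>"; // placeholder; acquire via Azure AD

const body = { aggregateSeries: { /* payload from above */ } };

const res = await fetch(
  `https://${environmentFqdn}/timeseries/query?api-version=2020-07-31`, // assumed version
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  }
);
// The projected variables (LastValue, LastTimestamp) come back in the
// response; see the API docs for the exact shape.
const result = await res.json();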

Parallel For-Each vs Scatter-Gather in Mule

I have multiple records:
{
  "item_id": 1,
  "key1": "data1"
}
{
  "item_id": 2,
  "key1": "data1"
}
{
  "item_id": 2,
  "key1": "data1"
}
{
  "item_id": 1,
  "key1": "data1"
}
I do not want to process them sequentially. There can be more than 200 records. Should I process them using parallel for-each or scatter-gather? Which approach would be best for my requirement?
I do not need the accumulated response, but if there is some exception while processing any one of the records (I hit an API for each record based on an if condition), the processing of the other records must remain unaffected.
Why not then use the VM module: break the collection into its individual records and push them to a VM queue? Then have another flow with a VM listener pick up the individual records (in parallel) and process them.
Here are more details: https://docs.mulesoft.com/mule-runtime/4.2/reliability-patterns
Scatter-gather is meant for cases where you have a static number of routes. Imagine one route to send to the HR system and another to the Accounting system.
For processing a variable number of records you should use parallel for-each.
Use for-each with async, parallel for-each, or a JMS pattern. Scatter-gather receives one payload for all routes, so you won't be able to cycle through a collection with it.

host.json in Azure Functions 2.0 doesn't seem to respect queues properties

My host.json file is this:
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout": "00:01:00",
      "batchSize": 2,
      "maxDequeueCount": 2,
      "newBatchThreshold": 1
    }
  }
}
Yet I can see that there are at least 50 concurrent sessions running. Is there a known issue with this, or should I be doing something different to limit the number of concurrent functions running from a queue trigger?
I've seen WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT, but it doesn't appear to be fully supported yet.
From MSDN - Azure Functions:
batchSize
The number of queue messages that the Functions runtime retrieves simultaneously and processes in parallel. When the number being processed gets down to the newBatchThreshold, the runtime gets another batch and starts processing those messages. So the maximum number of concurrent messages being processed per function is batchSize plus newBatchThreshold. This limit applies separately to each queue-triggered function.
So, the number of concurrent operations is limited by this constraint. In the case of the sample config, that would be at most 3 concurrent operations (batchSize 2 + newBatchThreshold 1). However, if there are more messages queued, they will be processed after the current batch. Depending on the speed at which each message is processed, this can amount to a large number of messages per second.
As you are monitoring the end result, this may appear as a large number of operations in a very small time window, but they are not necessarily concurrent.
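One caveat worth checking: these host.json limits apply per host instance, so if the Consumption plan scales the app out, the totals multiply. A back-of-the-envelope check (the instance count below is hypothetical):

// Per-instance cap for a queue-triggered function, per the quoted docs:
const batchSize = 2;
const newBatchThreshold = 1;
const perInstance = batchSize + newBatchThreshold; // 3

// If the platform scales out to N instances, each applies its own cap,
// so the observed total can approach N * perInstance.
const instances = 17; // hypothetical scale-out
console.log(instances * perInstance); // 51 -- roughly the ~50 sessions observed

That would be consistent with seeing ~50 concurrent sessions despite the config, and with wanting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to cap the scale-out.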
