Redis Node - Querying a list of 250k items of ~15 bytes takes at least 10 seconds - node.js

I'd like to query a whole list of 250k items of ~15 bytes each.
Each item (a set of coordinates) is a ~15-byte string of the form xxxxxx_xxxxxx_xxxxxx.
I'm storing them using this function:
function setLocation({ id, lat, lng }) {
  const str = `${id}_${lat}_${lng}`
  client.lpush('locations', str, (err, status) => {
    console.log('pushed:', status)
  })
}
Using Node.js, a single lrange('locations', 0, -1) call takes between 10 and 15 seconds.
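For reference, a minimal sketch of the read being timed (an assumption: client is the same node_redis instance used in setLocation above):
console.time('lrange')
client.lrange('locations', 0, -1, (err, items) => {
  console.timeEnd('lrange') // ~10-15 s for the 250k items described above
  console.log('items returned:', err ? err : items.length)
})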
Slowlog from Redis Labs: [screenshot]
I tried using sets instead; same results.
According to this post, this shouldn't take more than a few milliseconds.
What am I doing wrong here?
Update:
I'm using an instance on Redis Labs.

Related

Lost Clients PBI

I am trying to get the number of lost clients per month. The code I'm using for the measure is as follows:
LostClientsRunningTotal =
VAR currdate = MAX('Date'[Date])
VAR turnoverinperiod = [Turnover]
VAR clients =
    ADDCOLUMNS(
        Client,
        "Turnover Until Now",
        CALCULATE(
            [Turnover],
            DATESINPERIOD('Date'[Date], currdate, -1, MONTH)
        ),
        "Running Total Turnover",
        [RunningTotalTurnover]
    )
VAR lostclients =
    FILTER(
        clients,
        [Running Total Turnover] > 0 &&
        [Turnover Until Now] = 0
    )
RETURN
    IF(turnoverinperiod > 0, COUNTROWS(lostclients))
The problem is that the measure returns the running total, as shown in the screenshot below:
[screenshot: running total of lost clients per month]
What I need is the lost clients per month so I tried to use the dateadd function to get the lost clients of the previous month and then subtract the current.
The desired result would be, for Nov-22 for instance, 629 (December running total) - 544 (November running total) = 85.
For some reason the DATEADD function is not returning the desired result and I can't make heads or tails of it.
Can you tell me how should I approach this issue please? Thank you in advance.
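For what it's worth, here is one rough sketch of that subtraction (an unverified assumption based on the description above; it reuses the [LostClientsRunningTotal] measure and the 'Date' table from the post, and the new measure name is hypothetical):
LostClientsPerMonth =
VAR CurrentRT = [LostClientsRunningTotal]
VAR NextMonthRT =
    CALCULATE(
        [LostClientsRunningTotal],
        DATEADD('Date'[Date], 1, MONTH)
    )
RETURN
    NextMonthRT - CurrentRT
Whether the shift should be +1 or -1 month depends on how the running total is anchored, which is exactly the ambiguity described in the question.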

Push aggregated metrics to the Prometheus Pushgateway from clustered Node JS

I've run my Node apps in a cluster with the Prometheus client implemented. I've mutated the metrics (for testing purposes), so I've got these results:
# HELP otp_generator_generate_failures A counter counts the number of generate OTP fails.
# TYPE otp_generator_generate_failures counter
otp_generator_generate_failures{priority="high"} 794
# HELP otp_generator_verify_failures A counter counts the number of verify OTP fails.
# TYPE otp_generator_verify_failures counter
otp_generator_verify_failures{priority="high"} 802
# HELP smsblast_failures A counter counts the number of SMS delivery fails.
# TYPE smsblast_failures counter
smsblast_failures{priority="high"} 831
# HELP send_success_calls A counter counts the number of send API success calls.
# TYPE send_success_calls counter
send_success_calls{priority="low"} 847
# HELP send_failure_calls A counter counts the number of send API failure calls.
# TYPE send_failure_calls counter
send_failure_calls{priority="high"} 884
# HELP verify_success_calls A counter counts the number of verify API success calls.
# TYPE verify_success_calls counter
verify_success_calls{priority="low"} 839
# HELP verify_failure_calls A counter counts the number of verify API failure calls.
# TYPE verify_failure_calls counter
verify_failure_calls{priority="high"} 840
FYI: I have adopted the given example from: https://github.com/siimon/prom-client/blob/master/example/cluster.js
Then I pushed the metrics and everything worked fine in the process. But when I checked the gateway portal, it showed results different from the metrics above.
My suspicion is that the pushed metrics come from a single instance instead of being the aggregated metrics of the several running instances. Is that right?
So, has anyone ever solved this problem?
Oh, this is my push code:
const { Pushgateway, register } = require('prom-client')

const promPg = new Pushgateway(
  config.get('prometheus.pushgateway'),
  { timeout: 5000 },
  register
)

promPg.push({ jobName: 'sms-otp-middleware' }, (err, resp, body) => {
  console.log(`Error: ${err}`)
  console.log(`Body: ${body}`)
  console.log(`Response status: ${resp.statusCode}`)
  res.status(200).send('metrics_pushed')
})
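One thing worth trying (an assumption, not a verified fix for this setup): when several workers push under the same job name, each push replaces the previous one on the Pushgateway, so only one instance's numbers survive. The groupings option in prom-client's Pushgateway params can give every worker its own grouping key; the instance label below is hypothetical:
// Sketch: push from each worker under its own grouping key so pushes
// from different processes don't overwrite each other on the Pushgateway.
promPg.push(
  {
    jobName: 'sms-otp-middleware',
    groupings: { instance: String(process.pid) } // hypothetical per-worker label
  },
  (err, resp, body) => {
    if (err) console.error('push failed:', err)
    else console.log('pushed metrics for worker', process.pid, 'status:', resp.statusCode)
  }
)
Prometheus can then aggregate the per-worker series at query time.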

DocumentDB performance issues

When running DocumentDB queries from C# code on my local computer, a simple query takes about 0.5 seconds on average. As another example, getting a reference to a document collection takes about 0.7 seconds on average. Is this to be expected? Below is my code for checking whether a collection exists. It is pretty straightforward, but is there any way to improve the poor performance?
// Create a new instance of the DocumentClient
var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);

// Get the database with the id=FamilyRegistry
var database = client.CreateDatabaseQuery()
    .Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();

var stopWatch = new Stopwatch();
stopWatch.Start();

// Get the document collection with the id=FamilyCollection
var documentCollection = client.CreateDocumentCollectionQuery("dbs/" + database.Id)
    .Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();

stopWatch.Stop();

// Get the elapsed time as a TimeSpan value.
var ts = stopWatch.Elapsed;

// Format and display the TimeSpan value.
var elapsedTime = String.Format("{0:00} seconds, {1:00} milliseconds",
    ts.Seconds,
    ts.Milliseconds);

Console.WriteLine("Time taken to get a document collection: " + elapsedTime);
Console.ReadKey();
Average output on local computer:
Time taken to get a document collection: 0 seconds, 752 milliseconds
In another piece of my code I'm doing 20 small document updates that are about 400 bytes each in JSON size and it still takes 12 seconds in total. I'm only running from my development environment but I was expecting better performance.
In short, this can be done end to end in ~9 milliseconds with DocumentDB. I'll walk through the changes required, and why/how they impact results below.
The very first query always takes longer in DocumentDB because it does some setup work (fetching physical addresses of DocumentDB partitions). The next couple requests take a little bit longer to warm the connection pools. The subsequent queries will be as fast as your network (the latency of reads in DocumentDB is very low due to SSD storage).
For example, if you modify your code above to measure 10 reads instead of just the first one, as shown below:
using (DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey))
{
    long totalRequests = 10;
    var database = client.CreateDatabaseQuery()
        .Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();

    Stopwatch watch = new Stopwatch();
    for (int i = 0; i < totalRequests; i++)
    {
        watch.Start();
        var documentCollection = client.CreateDocumentCollectionQuery("dbs/" + database.Id)
            .Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();
        Console.WriteLine("Finished read {0} in {1}ms", i, watch.ElapsedMilliseconds);
        watch.Reset();
    }
}
Console.ReadKey();
I get the following results running from my desktop in Redmond against the Azure West US data center, i.e. about 50 milliseconds. These numbers may vary based on the network connectivity and distance of your client from the Azure DC hosting DocumentDB:
Finished read 0 in 217ms
Finished read 1 in 46ms
Finished read 2 in 51ms
Finished read 3 in 47ms
Finished read 4 in 46ms
Finished read 5 in 93ms
Finished read 6 in 48ms
Finished read 7 in 45ms
Finished read 8 in 45ms
Finished read 9 in 51ms
Next, I switch to Direct/TCP connectivity from the default of Gateway to improve the latency from two hops to one, i.e., change the initialization code to:
using (DocumentClient client = new DocumentClient(
    new Uri(EndpointUrl),
    AuthorizationKey,
    new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp }))
Now the operation to find the collection by ID completes within 23 milliseconds:
Finished read 0 in 197ms
Finished read 1 in 117ms
Finished read 2 in 23ms
Finished read 3 in 23ms
Finished read 4 in 25ms
Finished read 5 in 23ms
Finished read 6 in 31ms
Finished read 7 in 23ms
Finished read 8 in 23ms
Finished read 9 in 23ms
What about running the same test from an Azure VM or Worker Role in the same Azure DC? The same operation completes in about 9 milliseconds!
Finished read 0 in 140ms
Finished read 1 in 10ms
Finished read 2 in 8ms
Finished read 3 in 9ms
Finished read 4 in 9ms
Finished read 5 in 9ms
Finished read 6 in 9ms
Finished read 7 in 9ms
Finished read 8 in 10ms
Finished read 9 in 8ms
So, to summarize:
For performance measurements, please allow for a few measurement samples to account for startup/initialization of the DocumentDB client.
Please use TCP/Direct connectivity for lowest latency.
When possible, run within the same Azure region.
If you follow these steps, you'll get the best performance numbers DocumentDB can offer.

Mnesia pagination with fragmented table

I have a mnesia table configured as follow:
-record(space, {id, t, s, a, l}).

mnesia:create_table(space, [{disc_only_copies, nodes()},
                            {frag_properties, [{n_fragments, 400}, {n_disc_copies, 1}]},
                            {attributes, record_info(fields, space)}]),
I have at least 4 million records in this table for test purposes.
I have implemented something like this: Pagination search in Erlang Mnesia
fetch_paged() ->
    MatchSpec = {'_', [], ['$_']},
    {Record, Continuation} = mnesia:activity(async_dirty, fun mnesia:select/4,
                                             [space, [MatchSpec], 10000, read], mnesia_frag).

next_page(Cont) ->
    mnesia:activity(async_dirty, fun() -> mnesia:select(Cont) end, mnesia_frag).
When I execute the pagination functions, each batch contains between 3,000 and 8,000 records, but never 10,000.
What do I have to do to get consistent batches of 10,000?
The problem is that you expect mnesia:select/4, which is documented as:
select(Tab, MatchSpec, NObjects, Lock) -> transaction abort | {[Object],Cont} | '$end_of_table'
to return up to the NObjects limit, NObjects being 10,000 in your example.
But the same documentation also says:
For efficiency the NObjects is a recommendation only and the result may contain anything from an empty list to all available results.
and that's the reason you are not getting consistent batches of 10,000 records: NObjects is not a limit but a recommended batch size.
If you want exactly 10,000 records, you have no option other than writing your own function; but select/4 is written this way for optimization purposes, so the code you write will most probably be slower than the original.
BTW, you can find the mnesia source code on https://github.com/erlang/otp/tree/master/lib/mnesia/src
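As an illustration, here is a minimal sketch of such a wrapper (the names fetch_at_least/1 and collect/3 are assumptions, not part of mnesia); it keeps following the select continuation until at least N records have been accumulated or the table is exhausted:
fetch_at_least(N) ->
    MatchSpec = [{'_', [], ['$_']}],
    First = mnesia:activity(async_dirty, fun mnesia:select/4,
                            [space, MatchSpec, N, read], mnesia_frag),
    collect(N, [], First).

%% '$end_of_table' means there is nothing left to fetch.
collect(_N, Acc, '$end_of_table') ->
    {Acc, '$end_of_table'};
collect(N, Acc, {Recs, Cont}) when length(Acc) + length(Recs) >= N ->
    {Acc ++ Recs, Cont};
collect(N, Acc, {Recs, Cont}) ->
    Next = mnesia:activity(async_dirty,
                           fun() -> mnesia:select(Cont) end, mnesia_frag),
    collect(N, Acc ++ Recs, Next).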

Using reduce with a composite key in couchdb view returns no result on GET

I have a couchdb view with the following map function:
function(doc) {
  if (doc.date_of_operation) {
    date_triple = doc.date_of_operation.split("/");
    d = new Date(date_triple[2], date_triple[1] - 1, date_triple[0], 0, 0, 0, 0);
    emit([d, doc.name], 1);
  }
}
When I issue a GET request for this, I get the whole view's data (2.8MB):
$ curl -X GET http://somehost:5984/ops-db/_design/ops-views/_view/counts
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2751k 0 2751k 0 0 67456 0 --:--:-- 0:00:41 --:--:-- 739k
However, when I add a reduce function:
function (key, values, rereduce) {
  return sum(values);
}
I no longer get any data when using curl:
$ curl -X GET http://somehost:5984/ops-db/_design/ops-views/_view/counts
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 42 0 42 0 0 7069 0 --:--:-- --:--:-- --:--:-- 8400
The result looks like this:
{"rows":[
{"key":null,"value":27065}
]}
This view's map and reduce functions were added using the Futon interface, and when the Reduce checkbox is checked there, I do get one row for every (date, name) pair with the values accumulated for that pair. What changes when the view is queried through a GET?
When you call the view through curl, try passing in the parameters needed to trigger the reduce and the grouping.
For example, explicitly tell CouchDB to run the reduce function:
$ curl -X GET http://somehost:5984/ops-db/_design/ops-views/_view/counts?reduce=true
Or pass the group and group_level params, as shown below.
You can read more on the available options here (under the Querying Options section).
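For example, assuming you want one row per [date, name] key (what Futon shows with Reduce checked and grouping enabled):
$ curl -X GET 'http://somehost:5984/ops-db/_design/ops-views/_view/counts?reduce=true&group=true'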
The reduce should simply be the built-in:
_sum
So a simple "view" would look like this:
{
  "_id": "_design/foo",
  "_rev": "2-6145338c3e47cf0f311367a29787757c",
  "language": "javascript",
  "views": {
    "test1": {
      "map": "function(doc) {\n emit(null, 1);\n}",
      "reduce": "_sum"
    }
  }
}
