How to implement node-lru-cache? - node.js

I've developed a real-time app with Node.js, Socket.io and MongoDB. One requirement is that when a user loads a specific page, some 20,000 points with x and y coordinates that fall between two specific dates are fetched from MongoDB and rendered on a map for the client. If the user reloads, the whole process is repeated. I'm not sure how to insert these points into a cache, and under what key, so that when the user reloads, the values can be fetched easily from the cache.
Any suggestions? Thanks!

You could:
- completely write your own caching layer
- use an existing caching library (e.g. the lru-cache module by isaacs, which is probably the most popular in this field; a minimal sketch follows below)
- use Redis as a cache (it can set a TTL on inserted docs); there is already a mongoose-redis-cache module, maybe that helps
and potentially x other solutions. It depends on the scale of your data, the number of requests and so on.
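For example, a minimal sketch with isaacs' lru-cache, keyed on the requested date range (this assumes the pre-v7 API with max/maxAge; v7+ renamed maxAge to ttl, and the MongoDB query function here is a placeholder):

```javascript
// Minimal sketch with isaacs' lru-cache (pre-v7 API: max/maxAge).
// The cache key is derived from the requested date range.
const LRU = require('lru-cache');

const pointCache = new LRU({
  max: 50,                 // keep at most 50 cached result sets
  maxAge: 1000 * 60 * 10   // expire entries after 10 minutes
});

// Placeholder for the existing MongoDB query that returns the
// ~20,000 points between the two dates.
async function queryPointsFromMongo(from, to) {
  return []; // replace with the real find({ date: { $gte: from, $lte: to } })
}

async function getPoints(from, to) {
  const key = `${from.toISOString()}_${to.toISOString()}`;
  let points = pointCache.get(key);
  if (!points) {
    points = await queryPointsFromMongo(from, to);
    pointCache.set(key, points);
  }
  return points;
}
```

On a reload with the same two dates the key matches, so the 20,000 points come straight from the cache instead of MongoDB.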

The caching is something your database does for you in this case. MongoDB relies on the operating system's memory-mapped I/O for storage. A general purpose OS will usually keep the most frequently used pages in memory. If you still want to use an additional cache, the obvious key to use for coordinates would be a Geohash.
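If you go the Geohash route, here is a hypothetical sketch using the ngeohash module (not named in the answer; any geohash implementation would do) to derive cache keys from coordinates:

```javascript
// Hypothetical sketch: deriving cache keys from geohashes with the
// ngeohash module (an assumption; any geohash library works the same way).
const geohash = require('ngeohash');

// A 5-character geohash covers a cell of roughly 5 km x 5 km,
// so nearby points share the same cache key.
function cellKey(lat, lon, precision = 5) {
  return geohash.encode(lat, lon, precision);
}

console.log(cellKey(37.8324, -122.2708)); // logs a 5-character cell id shared by nearby points
```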

The runtime-memcache library implements LRU and a few other caching schemes in JavaScript. It works with Node.js and is written in TypeScript.
It uses a modified doubly linked list to achieve O(1) get, set and remove.
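The O(1) idea is small enough to sketch in plain JavaScript. This is not runtime-memcache's code; it leans on Map preserving insertion order rather than an explicit doubly linked list:

```javascript
// Illustrative O(1) LRU, not runtime-memcache's implementation.
// A Map keeps insertion order, so the first key is always the
// least recently used one.
class SimpleLRU {
  constructor(limit = 100) {
    this.limit = limit;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // move the key to the "most recent" end
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      // evict the least recently used entry (first key in the Map)
      this.map.delete(this.map.keys().next().value);
    }
  }
  remove(key) {
    return this.map.delete(key);
  }
}
```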

Related

Mainline DHT: why is the hash in ping different from the hash in find_node?

I am working with a Mainline DHT implementation and I have seen some strange behaviour.
Let's say I know a node's IP and port: 1.1.1.1:7777. I send a "find_node" request to it with my own node hash as the target. I get 8 nodes back; let's say the first one has hash abcdeabcdeabcdeabcde and IP 2.2.2.2:8888.
Now I send a "ping" request to 2.2.2.2:8888 and that node responds with a completely different hash than the one I got from 1.1.1.1:7777 in the "find_node" response. And I see this is not an isolated case. What's going on? Why are the hashes of the same node different when they come from two different sources? Thanks for the answer.
This may be a malicious node that does not keep its node ID consistent in an effort to get into as many routing tables as possible. It might be doing that for data harvesting or DoS amplification purposes.
Generally you shouldn't put too much trust in anything that remote nodes report, and you should sanitize the data. In the case of a node not keeping its ID consistent you should remove it from your routing table and disregard results returned in its queries. I have listed a bunch of possible sanitizing approaches beyond BEP42 in the documentation of my own DHT implementation.
Another possibility is that the node B simply changed its ID in the meantime (e.g. due to a restart) and node A either has not updated it yet or does not properly keep track of ID changes. But this shouldn't be happening too frequently.
And I see this is not an isolated case.
Overall I would only expect this behavior from a tiny fraction of the network, so you should compare the number of unique IP addresses sending bogus responses to the number of unique IPs sending sane ones. It's easy to get these kinds of statistics wrong if your implementation is naive and gets tricked by malicious nodes into contacting even more malicious nodes.
But during a lookup you may see this more frequently in the terminal phase, when you get polluted data from nodes that do not sanitize their routing tables properly. As one example, old libtorrent versions did not (see the related issue; note that I'm not singling out libtorrent here, many implementations are crappy in this area).
It could be that 2.2.2.2:8888 does not know its external port/address, or hasn't updated it yet, hence the different hashes.
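A minimal sketch of the ID-consistency check described above, assuming contacts are indexed by IP:port (the routing-table method here is a made-up placeholder):

```javascript
// Hypothetical sketch: track the last node ID seen per IP:port and
// evict contacts whose ID changes between responses.
const seenIds = new Map(); // "ip:port" -> node ID (hex string)

function checkNodeId(ip, port, nodeIdHex, routingTable) {
  const key = `${ip}:${port}`;
  const previous = seenIds.get(key);
  if (previous && previous !== nodeIdHex) {
    // Inconsistent ID: drop the contact and distrust its results.
    routingTable.remove(key); // placeholder for your routing-table API
    return false;
  }
  seenIds.set(key, nodeIdHex);
  return true;
}
```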

A way to quickly purge a very large list of URLs from Varnish

My Varnish server caches a map tile server, which is updated in real time from OpenStreetMap every minute. Frequently an entire area of the map needs to be invalidated, i.e. 10,000 or even 100,000 tiles at once. Each tile is a single URL (no variances).
Is there an efficient way to run such large scale Varnish invalidation? Ideally objects should remain in cache (so that grace period would continue to work unless a URL flag of nograce is passed in), but marked as no longer valid. Ideally this tight loop would be implemented in VCL itself.
Example URL: http://example.org/tile/10/140/11.pbf (no variance, no query portion) where the numbers are {zoom}/{x}/{y}, and the list of those numbers (i.e. 100,000 at a time) is generated externally every minute and stored in a file. BTW, most likely most of those URLs won't even be in cache.
The answer depends a lot on what those URLs look like. Options are:
Using multiple soft purges [1] (beware of the 'soft' part; you'll need the purge VMOD for that) triggered by an external loop (sorry, you cannot do that in VCL; see the sketch after the links below). Soft purges set the TTL to 0 instead of fully removing objects from the storage.
Using a simple ban [2]. However, bans will completely (and lazily) remove matching objects from the storage (i.e. there is no 'soft' flavour for bans).
Using the xkey VMOD [3]. The VMOD provides a 'soft' invalidation option, but I'm not sure whether a surrogate index would help for your use case.
[1] https://varnish-cache.org/docs/trunk/reference/vmod_purge.html
[2] https://varnish-cache.org/docs/trunk/users-guide/purging.html#bans
[3] https://github.com/varnish/varnish-modules/blob/master/docs/vmod_xkey.rst
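As a rough sketch of the external loop for option [1], assuming your vcl_recv maps the PURGE method to purge.soft() and that the externally generated tile list contains one URL path per line (host, port and file name are assumptions):

```javascript
// Hypothetical external loop: read the generated list of tile paths
// and send one PURGE request per path to Varnish.
// Assumes vcl_recv turns PURGE requests into purge.soft() calls.
const fs = require('fs');
const http = require('http');

const VARNISH_HOST = '127.0.0.1'; // assumed Varnish address
const VARNISH_PORT = 80;

function purgePath(path) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      { host: VARNISH_HOST, port: VARNISH_PORT, method: 'PURGE', path },
      res => { res.resume(); resolve(res.statusCode); }
    );
    req.on('error', reject);
    req.end();
  });
}

async function purgeAll(listFile) {
  const paths = fs.readFileSync(listFile, 'utf8').split('\n').filter(Boolean);
  for (const p of paths) {
    await purgePath(p); // cache misses are harmless; the response depends on your VCL
  }
}

purgeAll('tiles-to-purge.txt').catch(console.error);
```

For 100,000 URLs you would likely want to batch these requests concurrently rather than awaiting them one by one.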

CouchDB view generation performance

How do I avoid slow requests on a frequently updated view in CouchDB when it is not important that the returned results are up to date? What I am talking about is probably caching, and I'm wondering whether there is any out-of-the-box solution that does not involve third-party software like an nginx cache.
What I've tried is setting the compaction settings to 0,
[{db_fragmentation, "0%"}, {view_fragmentation, "0%"}], yet the views sometimes take 30+ seconds to become available to the consumer.
Adding &update=false to the end of the URL seems to do the job.
I am "relaxed" again

ArangoDB: Is it bad if data or index does not fit in RAM anymore?

Dear ArangoDB community,
I am wondering whether it is bad, when using ArangoDB, if the data plus indexes grow too large and no longer fit in RAM. What happens then? Does it ruin system performance horribly?
For TokuMX, a very fascinating fork of MongoDB (TokuMX offers ACID transactions, which I need), they say that's NO problem at all! TokuMX clearly states on its website that it is no big deal if data plus indexes do not fit in RAM.
Also, for MongoDB and TokuMX respectively, we can limit the RAM usage with some commands.
For my web project I would like to decide now which database I am going to use; I don't want to change later. The RAM of my database server is not more than 500 MB right now, and it is used concurrently by the Node.js server. So both sit on one server.
On that server I have one Node.js server and two database instances running. I compare TokuMX and ArangoDB with the top command in Linux to check RAM usage. Both databases have just a tiny collection for testing. top reports ArangoDB at RES 128 MB, while TokuMX shows only 9 MB (!!). RES, I found out, means the physical RAM actually in use, so the difference is already huge. The virtual RAM usage is also enormously different: 5 GB for ArangoDB and just 300 MB for TokuMX. Thanks and many greetings.
ArangoDB is a database plus an API server. It uses V8 to extend the functionality of the database and define new APIs. In its default configuration on my Mac it uses 3 scheduler threads and 5 V8 threads, which gives a resident size of 81M. Setting the scheduler threads to 1 (which is normally enough) reduces this to 80M. Setting the V8 instances to two
./bin/arangod -c etc/relative/arangod.conf testbase3 --scheduler.threads 1 --server.threads 2
reduces this to 34M, and setting it to 1 reduces it to 24M. Therefore the database itself presumably uses less than 14M.
Any mostly-in-memory database (be it MongoDB, ArangoDB or Redis) will show decreased performance if the data and indexes grow too large. It might be OK if your indexes still fit into memory and your working set of data is small enough. But if your working set becomes too large, your server will begin to swap and performance will decrease. This problem exists in any kind of mostly-in-memory database, ArangoDB and MongoDB alike.
One additional question: would it be possible for you to use two different databases on one ArangoDB instance? It would be more memory-efficient to start just one server with two databases in it.
Hello and thanks for your answer. I will use one ArangoDB instance on my backend server together with just the Node instance, as you recommended; no MongoDB there anymore. I am sure I will have more questions in the future. So far I have to say that arangosh is not as easy to handle as the mongo shell; its commands are a little bit cryptic to me. But the ACID transactions and the ability to run JavaScript server-side in production are a very big plus and really cool, and that is actually the reason why I use ArangoDB. Right now I start the ACID transactions from my Node.js server, which sends the action and parameters to ArangoDB. To keep the action block small I created a JavaScript module and put it on ArangoDB in the directory you told me about last time, and that's great: that little self-written JavaScript module does all the ACID transactions. But do you agree that I could probably increase performance even more by writing a Foxx app to do the ACID transactions? Right now I think that every time I call that self-made module it first needs to be loaded before it can perform the transactions, whereas a Foxx application stays in RAM and does not get reloaded on every hit. Do you agree?
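For reference, a hedged sketch of the kind of server-side JavaScript transaction described above, posted directly to ArangoDB's /_api/transaction endpoint (collection names, document keys and values are placeholders, and authentication is omitted; this is not the poster's actual module):

```javascript
// Sketch: submit a server-side JavaScript transaction to ArangoDB
// via POST /_api/transaction. All names and values are placeholders.
const http = require('http');

const payload = JSON.stringify({
  collections: { write: ['accounts'] },
  // The action string runs inside ArangoDB with ACID guarantees.
  action: `function (params) {
    var db = require('internal').db;
    db.accounts.update(params.from, { balance: params.fromBalance });
    db.accounts.update(params.to,   { balance: params.toBalance });
    return true;
  }`,
  params: { from: 'accounts/1', to: 'accounts/2', fromBalance: 90, toBalance: 110 }
});

const req = http.request(
  { host: '127.0.0.1', port: 8529, method: 'POST', path: '/_api/transaction',
    headers: { 'Content-Type': 'application/json' } },
  res => { res.resume(); console.log('status', res.statusCode); }
);
req.end(payload);
```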

Open Trip Planner scalability (or an alternative library)

I am playing around with OpenTripPlanner nowadays. I will only use some parts of the library, such as the basic public transportation functions, without the bike, road and street functionality. I will only provide bus/subway stops, times and route information to the library.
As far as I understand, OTP uses a Graph.obj file which can be built from custom route and street data. The process loads all the Graph.obj data into memory when the application starts.
My concern is: if I have huge route data, then I will probably need to create a huge Graph.obj file from it. The process will then load all that data into memory, and this will eat all my memory.
Question: does OTP scale? Is there any way to provide the source data from a database, or is something already implemented on top of a database like MySQL, PostgreSQL, etc.? And what other open-source alternatives are there with which I could scale my application?
From the data point of view, most open-source routing libraries use the same approach: generation of a custom data structure (e.g. a file for OpenTripPlanner) which is then loaded into memory.
How big is your map?
Anyway here are a couple of alternatives:
http://graphhopper.com (in Java, very simple and lightweight)
http://project-osrm.org (in C++, probably the fastest one)
Take a look also here:
http://wiki.openstreetmap.org/wiki/Routing/online_routers#comparison_matrix
