Why is bcryptjs slower on AWS Lambda than in local docker? - node.js

I have Lambda written in NodeJS. I noticed it takes several seconds to complete. I added logs and found that bcrypt is quite slow.
Packages:
"dependencies": {
"bcryptjs": "^2.4.3",
Source code:
const bcrypt = require('bcryptjs');
console.log("User was found"); // following part takes more than 1 second!
if (bcrypt.compareSync(password, user.password)) {
console.log("Password verified"); //
This is a log from AWS LogWatch:
2020-01-13T20:25:30.951 User was found
2020-01-13T20:25:32.670 Password verified
and
2020-01-13T20:31:20.192 User was found
2020-01-13T20:31:21.550 Password verified
So it takes 1.7 seconds. I ran the same code in docker on my machine
2020-01-13T20:09:48.109 User was found
2020-01-13T20:09:48.211 Password verified
Locally it takes just 120 ms. AWS uses NodeJS 10.x, local docker image is probably 8.x. I do not know how to tell docker to reflect changes in packaged.yaml.
Is this NodeJS regression? Or some issue on AWS configuration?

Encryption performance is typically CPU bound. AWS Lambda CPU is proportional to RAM, so you should choose the largest (3008 MB) and re-test.
When I run this inside a Lambda function handler on a 3008 MB RAM Lambda in us-east-1, the compareSync call consistently takes 90-100ms. With a 128 MB Lambda, it takes a little over 1s.
On a related note, it's helpful to understand that choosing the lowest (128 MB) RAM option, simply because it is cheaper per GB-s, is not always the best thing to do. While the highest RAM option (with proportionally higher CPU and network) is certainly more expensive per GB-s, it also completes Lambda functions a lot quicker. So, for example, you might be able to complete your task in 1/10th of the time at only 1.75x the cost. In a lot of situations, that can be very valuable.
There is a project that can help you tune price/performance for your Lambdas: alexcasalboni/aws-lambda-power-tuning

I have the same issue , this is because you are using bcryptjs library try to use bcrypt is much much faster . Bcryptjs use plain javascipt thats why is too slow in the other hand bcrypt use c++ extensions

Related

Why are mongodb queries to a localhost instance of mongo so much faster than to a cloud instance?

I'm using this code to run the tests outlined in this blog post.
(For posterity, relevant code pasted at the bottom).
What I've found is that if I run these experiments with a local instance of Mongo (in my case, using docker)
docker run -d -p 27017:27017 -v ~/data:/data/db mongo
Then I get pretty good performance, similar results as outlined in the blog post:
finished populating the database with 10000 users
default_query: 277.986ms
query_with_index: 262.886ms
query_with_select: 157.327ms
query_with_select_index: 136.965ms
lean_query: 58.678ms
lean_with_index: 65.777ms
lean_with_select: 23.039ms
lean_select_index: 21.902ms
[nodemon] clean exit - waiting
However, when I switch do using a cloud instance of Mongo, in my case an Atlas sandbox instance, with the following configuration:
CLUSTER TIER
M0 Sandbox (General)
REGION
GCP / Iowa (us-central1)
TYPE
Replica Set - 3 nodes
LINKED STITCH APP
None Linked
(Note that I'm based in Melbourne, Australia).
Then I get much worse performance.
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 8279.730ms
query_with_index: 8791.286ms
query_with_select: 5234.338ms
query_with_select_index: 4933.209ms
lean_query: 13489.728ms
lean_with_index: 10854.134ms
lean_with_select: 4906.428ms
lean_select_index: 4710.345ms
I get that obviously there's going to be some round trip overhead between my computer and the mongo instance, but I would expect that to add 200ms max.
It seems that that round trip time must be being added multiple times, or something completely else that I'm not aware of - can someone explain just what it is that would cause this to blow out?
A good answer might involve doing an explain plan, and explaining that in terms of network latency.
Tests against different Atlas instances - For those suggesting the issue is that I'm using a Sandbox instance of Atlas - here is the results for a M20 and M30 instances:
BACKUPS
Active
CLUSTER TIER
M20 (General)
REGION
GCP / Iowa (us-central1)
TYPE
Replica Set - 3 nodes
LINKED STITCH APP
None Linked
BI CONNECTOR
Disabled
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 9015.309ms
query_with_index: 8779.388ms
query_with_select: 4568.794ms
query_with_select_index: 4696.811ms
lean_query: 7694.718ms
lean_with_index: 7886.828ms
lean_with_select: 3654.518ms
lean_select_index: 5014.867ms
BACKUPS
Active
CLUSTER TIER
M30 (General)
REGION
GCP / Iowa (us-central1)
TYPE
Replica Set - 3 nodes
LINKED STITCH APP
None Linked
BI CONNECTOR
Disabled
adding 10000 users to the database
finished populating the database with 10000 users
default_query: 8268.799ms
query_with_index: 8933.502ms
query_with_select: 4740.234ms
query_with_select_index: 5457.168ms
lean_query: 9296.202ms
lean_with_index: 9111.568ms
lean_with_select: 4385.125ms
lean_select_index: 4812.982ms
These really don't show any significant difference (be aware than any difference may just be network noise).
Tests colocating the Mongo client and the mongo database instance
I created a docker container and ran it on Google's Cloud Run, in the same region (US Central1), the results are:
2019-12-30 11:46:06.814 AEDTfinished populating the database with 10000 users
2019-12-30 11:46:07.885 AEDTdefault_query: 1071.233ms
2019-12-30 11:46:08.917 AEDTquery_with_index: 1031.952ms
2019-12-30 11:46:09.375 AEDTquery_with_select: 457.659ms
2019-12-30 11:46:09.657 AEDTquery_with_select_index: 281.678ms
2019-12-30 11:46:10.281 AEDTlean_query: 623.417ms
2019-12-30 11:46:10.961 AEDTlean_with_index: 680.622ms
2019-12-30 11:46:11.056 AEDTlean_with_select: 94.722ms
2019-12-30 11:46:11.148 AEDTlean_select_index: 91.984ms
So while this doesn't give results as fast as running on my own machine - it does show that colocating the client and the database gives a very large performance improvement.
So the question again is - why is the improvement ~7000ms?
The test code:
(async () => {
try {
await mongoose.connect('mongodb://localhost:27017/perftest', {
useNewUrlParser: true,
useCreateIndex: true
})
await init()
// const query = { age: { $gt: 22 } }
const query = { favoriteFruit: 'potato' }
console.time('default_query')
await User.find(query)
console.timeEnd('default_query')
console.time('query_with_index')
await UserWithIndex.find(query)
console.timeEnd('query_with_index')
console.time('query_with_select')
await User.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
console.timeEnd('query_with_select')
console.time('query_with_select_index')
await UserWithIndex.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
console.timeEnd('query_with_select_index')
console.time('lean_query')
await User.find(query).lean()
console.timeEnd('lean_query')
console.time('lean_with_index')
await UserWithIndex.find(query).lean()
console.timeEnd('lean_with_index')
console.time('lean_with_select')
await User.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
.lean()
console.timeEnd('lean_with_select')
console.time('lean_select_index')
await UserWithIndex.find(query)
.select({ name: 1, _id: 1, age: 1, email: 1 })
.lean()
console.timeEnd('lean_select_index')
process.exit(0)
} catch (err) {
console.error(err)
}
})()
My best guess is that you're dealing with slow network throughput between your local machine and Atlas (something I've experienced myself this week - hence how I found this post!)
Looking at your local query performance:
default_query: 277.986ms
query_with_index: 262.886ms
The query with index isn't noticeably any faster than the one without. For an indexed query to take 262ms in a Node app with a local DB probably means that either:
The index isn't being used properly OR more likely...
You're returning quite a few results in the query. If the query returns say 3,000 results and each result is 1KB, that's 3MB of JSON data that your app needs to handle.
I've got a 150Mbit/s internet connection and yet my throughput to Atlas (M2 shared tier, if that makes a difference) fluctuates between around 1Mbit/s to 6Mbit/s.
On localhost I have a Mongo query that returns 2,400 results for a total of 1.7MB of JSON data. The roundtrip time for that query in my Node app (using console.time() like you did) connected to Mongo on the same local dev machine is ~150ms. But when connecting that local app to Atlas the query takes 2,400ms to 3,400ms to return. When I profiled the query on Atlas it only took 2ms to execute, so the query itself is really fast, it's apparently the data transfer that's slow.
Based on these results, I have a feeling that Atlas perhaps throttles throughput over the public internet (or just doesn't bother optimizing for it in their network) because 99% of apps are colocated in the same network region as their Atlas DB. That's the reason why they ask you to pick not just AWS, Azure, etc but your specific network region when creating a cluster.
UPDATE: I just ran a few Amazon EC2 speed tests for my network region (us-east-1) using a 3rd-party service and the average download speed was 4.5Mbit/s for smaller files (1KB to 128KB) and 41Mbit/s for larger files (256KB to 10MB). So the primary issue may be generally slow throughput on the EC2 instances that Atlas clusters run on rather than any throttling by Atlas, or perhaps a combination of both.
Usually, It takes a little bit of time for a request to propagate over the network. this depends on the connection speed, latency, and distance to the server and so many factors. but the server on your local computer doesn't face above mentioned issues as it is for a cloud environment.
But since you are confident about the max delay due to network propagations is ~200ms.
There may be several other possible reasons also to consider
Usually, sandbox plans are for testing and they have limited resources allocated to them.
They don't use SSD drives to store data and uses cheap storage solutions.
They assume that sandbox plans are usually just for exploring features.
Most of the times those instances are run on shared virtual machines.
Make sure there are no other services running on your computer which consumes a higher data rate eg :( torrent applications )
Cloud services depend on a variety of metrics like System Availability, Response Time, Throughput, Latency and many more...
If the average response time of the user base and the data centers is located in the same region then the average overall response time is about 50ms but if located in the different region the response time significantly increases from 200ms - 400ms which can also depend upon the type of instance you're using and the region which you choose.
Since you're using the Atlas Sandbox cluster you must first select the nearest region to avoid poor performance issues as Atlas Sandbox clusters do have it's own limitations. If you're looking for quick response time and faster performance try to upgrade your instance.
If you are sure that it's not about network issues like latency and bandwidth vs response size, then it's either low edge host (non-SSD, low RAMs) or misconfigured web server/proxy, or there is throttling/filtering happening to your traffic.
To narrow it down more use encrypted (https) connection (it's easy, just install letsencrypt on your server) and try to use VPN to change your network route.
Also you can try running the script directly on the server to measure actual executing performance.
Of course you have to consider that your network delay is for each request to the cloud instance , so if you have a ping time of +30ms , you will take 30ms more for each query (approximately) , moreover if your instance is a sandbox ( free account https://docs.atlas.mongodb.com/tutorial/deploy-free-tier-cluster/ ) you will have a poor and shared CPU/RAM.
This is why your mongo db queries are slow.
Making a system faster in production is one of the design goals
We need to take into the account many variables:
Networking, for example, VPC/subnetting
MongoDB Storage (SSD)
MongoDB Indexes
MongoDB RAM, CPU
Node Web Servers or Cluster
Cluud Tenants
TLS encryption
You may need to discard each and every single possible bottleneck

Slow Firebase Firestore cold starts in Cloud Functions

On a cold start (after deploying or after 3hrs) the function to request a document from Firestore takes an incredible amount of time which is different to when it's used rapidly.
Cold Start:
Function execution took 4593 ms, finished with status code: 200
Rapid fire (me sending using the same function over and over):
Function execution took 437 ms, finished with status code: 200
My code for getting the documents is quite simple:
function getWorkspaceDocument(teamSpaceId) {
return new Promise((resolve, reject) => {
var teamRef = db.instance.collection('teams').doc(teamSpaceId);
teamRef.get().then(doc => {
if (doc.exists) {
resolve(doc.data());
return;
}
else {
reject(new Error("Document cant be found"));
return;
}
}).catch(error => {
reject(new Error("Document cant be found"));
});
});
}
I'm trying to make a Slack bot and the slow returns on Firebase Firestore throw time outs in Slacks API. Is there a way on Firebase to stop cold starts from happening and letting it persist through out?
If the cloud function needs to start a new instance your cold start time seems normal. This is one drawback of a serverless function.
I think there is a problem with your implementation. Could you show more details?
Here is a nice little video about this topic:
https://youtu.be/v3eG9xpzNXM
firebaser here
We actually just released a new preferRest API that should considerably improve the cold start times for Cloud Functions that use Firestore. The documentation for it is not very complete, but you can enable the feature with:
import { initializeFirestore }
from 'firebase-admin/firestore';
const app = initializeApp();
const firestore = initializeFirestore(app,
{ preferRest: true }); // 👈
firestore.collection(...);
With preferRest: true the Firestore Admin SDK uses the REST transport layer by default, and it then only loads and uses the gRPC libraries when it encounters an operation that needs them.
Since the gRPC libraries are quite big and the only operation that requires gRPC is creating a snapshot listener, this should reduce the cold start times for most Cloud Functions implementations significantly.
I haven't had a chance to test this myself yet and there are still some known issues, so YMMV and I'd love to hear specifics on what performance change you see from this.
Also see:
the written release note about this option
the Release Notes video where this is mentioned
Another thing I would suggest checking is the amount of memory allocated to a particular function. Each level selected increases non only the RAM, but the CPU frequency as well (and the costs, be careful and don't forget about the pricing calculator!). There is a direct dependency between the package size of your function and the cold-start (source: https://mikhail.io/serverless/coldstarts/gcp/).
I can see that you are using the Firestore admin package, which is not considered to be lightweight (source: https://github.com/firebase/firebase-admin-node/issues/238). Thus, 128MB configuration might not be enough.
For our project increasing the RAM from 128MB to 512MB decreased the cold boot 10x from 20 seconds to 2.5seconds on average. Be sure not to overlook this in case you have several dependencies (7 in our case).

Node JS memory management on ARM single board

I am using sharp image processing module to resize the image and render it on UI.
app.get('/api/preview-small/:filename',(req,res)=>{
let filename = req.params.filename;
sharp('files/' + filename)
.resize(200, 200, {
fit: sharp.fit.inside,
withoutEnlargement: true
})
.toFormat('jpeg')
.toBuffer()
.then(function(outputBuffer) {
res.writeHead('200',{"Content-Type":"image/jpeg"});
res.write(outputBuffer);
res.end();
});
});
I am running above code on a single board computer Rock64 with 1 GB ram. When I run a Linux htop command and monitor the memory utilization, I could see the memory usage is adding up exponentially from 10% to 60% after every call to the nodejs app and it never comes down.
CPU USAGE
Though it does not give any issue running the application, my only concern is memory usage does not come down, even when the app is not running and I am not sure if this will crash the application eventually if this application runs continuously.
or if I move a similar code snippet to the cloud will it keep occupying memory even when it's not running?
Anyone who is using sharp module facing the similar issue or is this a known issue with node.js. Do we have a way to flush out/clear out the memory or will node do garbage collection?
Any help is appreciated. Thanks
sharp has some memory debugging stuff built in:
http://sharp.dimens.io/en/stable/api-utility/#cache
You can control the libvips cache, and get stats about resource usage.
The node version has a very strong effect on memory behaviour. This has been discussed a lot on the sharp issue tracker, see for example:
https://github.com/lovell/sharp/issues/429
Or perhaps:
https://github.com/lovell/sharp/issues/778

Firebase Dynamic Pages using Cloud Functions with Firestore are slow

We have dynamic pages being served by Firebase Cloud Functions, but the TTFB is very slow on these pages with TTFB of 900ms - 2s, at first we just assumed it to be a cold start issue, but even with consistent traffic it is very slow at TTFB of 700ms - 1.2s.
This is a bit problematic for our project since it is organic traffic dependent and Google Pagespeed would need a server response of less than 200ms.
Anyway, we tried to check what might be causing the issue and we pinpointed it with Firestore, when a Cloud Function accesses Firestore, we noticed there are some delays. This is a basic sample code of how we implement Cloud Function and Firestore:
dynamicPages.get('/ph/test/:id', (req, res) => {
var globalStartTime = Date.now();
var period = [];
db.collection("CollectionTest")
.get()
.then((querySnapshot) => {
period.push(Date.now() - globalStartTime);
console.log('1', period);
return db.collection("CollectionTest")
.get();
})
.then((querySnapshot) => {
period.push(Date.now() - globalStartTime);
console.log('2', period);
res.status(200)
.send('Period: ' + JSON.stringify(period));
return true;
})
.catch((error) => {
console.log(error);
res.end();
return false;
});
});
This is running on Firebase + Cloud Functions + NodeJS
CollectionTest is very small with only 100 documents inside, with each document having the following fields:
directorName: (string)
directorProfileUrl: (string)
duration: (string)
genre: (array)
posterUrl: (string)
rating: (string)
releaseDate: (string)
status: (int)
synopsis: (string)
title: (string)
trailerId: (string)
urlId: (string)
With this test, we would get the following results:
[467,762] 1.52s
[203,315] 1.09s
[203,502] 1.15s
[191,297] 1.00s
[206,319] 1.03s
[161,267] 1.03s
[115,222] 843ms
[192,301] 940ms
[201,308] 945ms
[208,312] 950ms
This data is [Firestore Call 1 Exectution Time, Firestore Call 2 Exectution Time] TTFB
If we check the results of the test, there are signs that the TTFB is getting lower, maybe that is when the Cloud Function has already warmed up? But even so, Firestore is eating up 200-300ms in the Cloud Function based on the results of our second Firestore Call and even if Firestore took lesser time to execute, TTFB would still take up 600-800ms, but that is a different story.
Anyway, can anyone help how we can improve Firestore performance in our Cloud Functions (or if possible, the TTFB performance)? Maybe we are doing something obviously wrong that we don't know about?
I will try to help, but maybe lacks a bit of context about what you load before returning dynamicPages but here some clues:
First of all, the obvious part (I have to point it anyway):
1 - Take care how you measure your TTFB:
Measuring TTFB remotely means you're also measuring the network
latency at the same time which obscures the thing TTFB is actually
measuring: how fast the web server is able to respond to a request.
2 - And from Google Developers documentation about Understanding Resource Timing (here):
[...]. Either:
Bad network conditions between client and server, or
A slowly responding server application
To address a high TTFB, first cut out as much network as possible.
Ideally, host the application locally and see if there is still a big
TTFB. If there is, then the application needs to be optimized for
response speed. This could mean optimizing database queries,
implementing a cache for certain portions of content, or modifying
your web server configuration. There are many reasons a backend can be
slow. You will need to do research into your software and figure out
what is not meeting your performance budget.
If the TTFB is low locally then the networks between your client and
the server are the problem. The network traversal could be hindered by
any number of things. There are a lot of points between clients and
servers and each one has its own connection limitations and could
cause a problem. The simplest method to test reducing this is to put
your application on another host and see if the TTFB improves.
Not so obvious ones:
You can take a look at the official Google documentation regarding Cloud Functions Performance here: https://cloud.google.com/functions/docs/bestpractices/tips
Did you require some files before?
According to this answer from Firebase cloud functions is very slow: Firebase cloud functions is very slow:
Looks like a lot of these problems can be solved using the hidden
variable process.env.FUNCTION_NAME as seen here:
https://github.com/firebase/functions-samples/issues/170#issuecomment-323375462
Are these dynamic pages loaded being accessed by a guest user or a logged user? Because maybe the first request has to sort out the authentication details, so it's known to be slower...
If nothing of this works, I will take a look at the common performance issues like DB connection (here: Optimize Database Performance), improving server configuration, cache all you can and take care of possible redirections in your app...
To end, reading through Internet, there are a lot of threads with your problem (low performance on simple Cloud Functions). Like this one: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/2374 && in S.O: https://stackoverflow.com/search?q=%5Bgoogle-cloud-functions%5D+slow
With comments like:
since when using cloud functions, the penalty is incurred on each http
invocation the overhead is still very high (i.e. 0.8s per HTTP call).
or:
Bear in mind that both Cloud Functions and Cloud Firestore are both in
beta and provide no guarantees for performance. I'm sure if you
compare performance with Realtime Database, you will see better
numbers.
Maybe it is still an issue.
Hope it helps!

Firebase Functions - ERROR, but no Event Message in Console

I have written a function on firebase that downloads an image (base64) from firebase storage and sends that as response to the user:
const functions = require('firebase-functions');
import os from 'os';
import path from 'path';
const storage = require('firebase-admin').storage().bucket();
export default functions.https.onRequest((req, res) => {
const name = req.query.name;
let destination = path.join(os.tmpdir(), 'image-randomNumber');
return storage.file('postPictures/' + name).download({
destination
}).then(() => {
res.set({
'Content-Type': 'image/jpeg'
});
return res.status(200).sendFile(destination);
});
});
My client calls that function multiple times after one another (in series) to load a range of images for display, ca. 20, of an average size of 4KB.
After 10 or so pictures have been loaded (amount varies), all other pictures fail. The reason is that my function does not respond correctly, and the firebase console shows me that my function threw an error:
The above image shows that
A request to the function (called "PostPictureView") suceeds
Afterwards, three requests to the controller fail
In the end, after executing a new request to the "UserLogin"-function, also that fails.
The response given to the client is the default "Error: Could not handle request". After waiting a few seconds, all requests get handled again as they are supposed to be.
My best guesses:
The project is on free tier, maybe google is throttling something? (I did not hit any limits afaik)
Is there a limit of messages the google firebase console can handle?
Could the tmpdir from the functions-app run low? I never delete the temporary files so far, but would expect that either google deletes them automatically, or warns me in a different way that the space is running low.
Does someone know an alternative way to receive the error messages, or has experienced similar issues? (As Firebase Functions is still in Beta, it could also be an error from google)
Btw: Downloading the image from the client (android app, react-native) directly is not possible, because I will use the function to check for access permissions later. The problem is reproducable for me.
In Cloud Functions, the /tmp directory is backed by memory. So, every file you download there is effectively taking up memory on the server instance that ran the function.
Cloud Functions may reuses server instances for repeated calls to the same function. This means your function is downloading another file (to that same instance) with each invocation. Since the names of the files are different each time, you are accumulating files in /tmp that each occupy memory.
At some point, this server instance is going to run out of memory with all these files in /tmp. This is bad.
It's a best practice to always clean up files after you're done with them. Better yet, if you can stream the file content from Cloud Storage to the client, you'll use even less memory (and be billed even less for the memory-hours you use).
After some more research, I've found the solution: The Firebase Console seems to not show all error information.
For detailed information to your functions, and errors that might be omitted in the Firebase Console, check out the website from google cloud functions.
There I saw: The memory (as suggested by #Doug Stevensson) usage never ran over 80MB (limit of 256MB) and never shut the server down. Moreover, there is a DNS resolution limit for the free tier, that my application hit.
The documentation points to a limit of DNS resolutions: 40,000 per 100 seconds. In my case, this limit was never hit - firebase counts a total executions of roundabout 8000 - but it seems there is a lower, undocumented limit for the free tier. After upgrading my account (I started the trial that GCP offers, so actually not paying anything) and linking the project to the billing account, everything works perfectly.

Resources