Performant API to handle many requests - node.js

I am developing an iOS app to show news from a MongoDB database. The app has around 50,000 active users, so it's quite heavy on the server. I am trying to rethink how the API should be built. I have just learned a little about AWS API Gateway, Google Cloud Functions, Firebase, etc.
If I simply need a few functions to fetch a list of news, a list of users, etc., what would be the best way to build this API as of 2017? I have always thought I should simply create a Node.js server with some endpoints. But now it seems it's more performant to create separate endpoints with, for instance, AWS API Gateway, each pointing to an AWS Lambda function.
But what is really the most scalable option?

CloudFront (with caching) --> API Gateway --> Lambda would be a scalable solution. Since you have not chosen DynamoDB, you need to manage MongoDB yourself for storage and availability.
To make it a bit more advanced, you can use Lambda@Edge with CloudFront to make it active/active across more than one region, so if one region goes down, the app will still be available.
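As a rough sketch of the Lambda end of that chain: the function below returns a response in the shape that API Gateway's Lambda proxy integration expects, with a Cache-Control header so CloudFront can cache it. `fetchLatestNews` and the 60-second lifetime are placeholders of mine, not something from the question.

```javascript
// Hypothetical data-access function; a real one would query the news database.
function fetchLatestNews() {
  return [{ id: 1, title: "Example headline" }];
}

// Builds a response object in the Lambda proxy integration format.
function buildNewsResponse(items, maxAgeSeconds) {
  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
      // Lets CloudFront (and clients) reuse the response for maxAgeSeconds.
      "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    },
    body: JSON.stringify(items),
  };
}

// Lambda entry point; in a real module this would be exported as `handler`.
const handler = async () => buildNewsResponse(fetchLatestNews(), 60);
```

With a header like this, repeated requests for the news list inside the cache window can be answered by CloudFront without invoking Lambda at all.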

When you say "Performant", what does that mean in your scenario?
If you need a few functions that extract a list of news, a list of users, etc., it doesn't sound like performance would be an issue.
If you're wondering if the AWS Serverless Stack can handle thousands or millions of requests, the answer is yes, API Gateway + Lambda can handle requests at any scale.
If you only need to query data by its hash keys, DynamoDB would fit very well! It also has an auto scaling feature. But if you need to perform scans or complex queries, that would be a little expensive and sometimes slow. For this scenario you can stream your DynamoDB data to Elasticsearch or a data warehouse solution.
I'm not sure if combining Lambda with MongoDB would be that great. Let's say you choose a hosted MongoDB service like MongoDB Atlas: you'll need to add more complexity to your system, like VPC peering, and also manage/optimize Lambda connections to MongoDB (see "Optimizing AWS Lambda Performance with MongoDB Atlas"). What if you're handling a million requests? I guess a lot of Lambda functions will start running in parallel, so how many connections would be opened to your MongoDB?
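To make the connection concern concrete, here is a minimal sketch of the usual mitigation: create the client once per Lambda container and reuse it across warm invocations. `makeCachedConnector`, `fakeConnect` and `maxOpenConnections` are names I made up for illustration; `fakeConnect` stands in for a real driver call so the snippet runs without a database.

```javascript
// Wraps a connect function so it runs at most once per Lambda container.
function makeCachedConnector(connect) {
  let clientPromise = null; // kept between warm invocations of the same container
  return function getClient() {
    if (clientPromise === null) {
      clientPromise = connect().catch((err) => {
        clientPromise = null; // allow a retry after a failed connect
        throw err;
      });
    }
    return clientPromise;
  };
}

// Stand-in (assumption) for something like `new MongoClient(uri).connect()`.
let connectCalls = 0;
const fakeConnect = async () => {
  connectCalls += 1;
  return { name: "fake-client" };
};

const getClient = makeCachedConnector(fakeConnect);

// Every invocation in the same container shares the one cached client.
const handler = async () => {
  const client = await getClient();
  return client.name;
};

// Rough upper bound on open connections: one pool per concurrent container.
function maxOpenConnections(concurrentContainers, poolSizePerContainer) {
  return concurrentContainers * poolSizePerContainer;
}
```

Even with reuse, the bound still grows with concurrency: 1000 parallel containers with a pool of 5 each could hold up to 5000 connections, which is why the pool size per function is usually kept small.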
Besides the AWS serverless stack, there's Elastic Beanstalk, which can easily scale your system.
All the solutions have pros and cons; you can scale with either API Gateway + Lambda or Elastic Beanstalk (or some other vendor solution). I guess "scalable" is a built-in feature that cloud vendors are offering these days, and "performance" is just one of the topics that should be analyzed when designing your infrastructure.

Related

Microservices on GCP

I am looking to use GCP for a micro-services application. After comparing AWS and GCP I have decided to go with Google because one major requirement for the project is to schedule tasks to run in the future (Cloud Tasks) which AWS does not seem to offer an equivalent of.
I am planning on containerizing my services and deploying to GCP using Cloud Run with a Redis cluster running as well for caching.
I understand that you cannot have multiple Firestore instances running in one project. Does this mean that all of my services will be using the same database?
I was looking to follow a model (possible on AWS) where each service had its own database instance that it reached out to.
Is this pattern possible on GCP?
Firestore indeed is for the moment limited to a single database instance per project. For performance that is usually not a problem, but for isolation such as your use-case, that can indeed be a reason to look elsewhere.
Firebase's original Realtime Database does allow multiple instances per project, and recently added a REST API for provisioning database instances. Since each Google Cloud Project can be toggled to also be a Firebase project, you could consider that.
Does this mean that all of my services will be using the same database?
I don't know all the details of your case. Do you think that you can deploy one "microservice" per project? Not ideal, especially if they are to communicate using Pub/Sub, but it may be an option. In that case every "microservice" can get its own Firestore, if that is a requirement.
I don't think one should consider GCP projects as some kind of "hard boundaries". For me they are just another level of granularity, in addition to folders, etc.
There might be some benefits to a "one microservice - one project" approach as well. For example, less dependent lifecycles, better (more accurate) security, maybe simpler development workflows...

Microservices, how to notify backend when task complete

For example, say I have a main application (backend) and some microservice, e.g. for image cropping.
The user uploads an image by making a request to the backend; the backend uses RabbitMQ to post a new task to the queue; then the image cropping service picks up the task and completes it, and I need to somehow notify the backend.
What are the options for this? Do I need another microservice for such notifications?
so... there are reaaaaaaly many ways to do that.
On the high level, what you want to achieve is to produce an event that 1 or more services can react to. Now depending on what you have available, you can produce the event in a number of different ways.
If you want to be completely platform independent, you can use Apache Kafka. It's a popular service built specifically for what we need -> publishing events and processing them at massive scale. Kafka can be clustered and partitioned, and can have multiple parallel consumers of the same type (like multiple instances of your main backend service) or of different types (3 different microservices that happen to be interested in a specific event). This bad boy just has it all and is famous for that. You can set up a cluster yourself or use a managed one offered by some of the cloud platforms (like AWS for instance), but this might be more expensive and difficult to use compared to some cloud-specific fully-managed solutions.
If you're running your stuff on Google Cloud, you can make it easier and cheaper by using the Pub/Sub service. Pub/Sub is a fully managed service that scales out of the box (welcome to the cloud! you don't need to scale or cluster anything by yourself!).
If you're running on AWS, you can use SNS, or a more recent alternative - EventBridge (kinda like SNS, but booooooy what can it not do?). Yeah... I would recommend EventBridge. It can just do more... with the target filtering rules and payload transformations, it can automatically trigger more things...
Azure... ehm... Event Hubs... but I haven't worked with this one yet... I'm not much of an Azurer... because you know... nobody uses Azure for this kind of stuff...
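Whichever broker you pick, the flow has the same shape: the backend publishes a task, the cropping service publishes a "done" event, and the backend subscribes to that event. Here is a tiny sketch of that shape using Node's built-in EventEmitter as an in-process stand-in for the broker. The event names and the `submitCropTask` helper are made up for illustration; with RabbitMQ, Kafka, Pub/Sub or SNS the emit/on calls would become publish/subscribe calls of that client.

```javascript
const { EventEmitter } = require("events");

// In-process stand-in for a broker topic or queue.
const broker = new EventEmitter();

// Backend side: track task status and react when a completion event arrives.
const taskStatus = new Map();
broker.on("image.cropped", (event) => {
  taskStatus.set(event.taskId, { state: "done", resultUrl: event.resultUrl });
});

// Cropping service side: consume the task, do the work, publish a completion event.
broker.on("image.crop.requested", (task) => {
  const resultUrl = `${task.imageUrl}?cropped=1`; // placeholder for real cropping
  broker.emit("image.cropped", { taskId: task.taskId, resultUrl });
});

// Backend publishes a task and marks it pending until the event comes back.
function submitCropTask(taskId, imageUrl) {
  taskStatus.set(taskId, { state: "pending" });
  broker.emit("image.crop.requested", { taskId, imageUrl });
}

submitCropTask("task-1", "https://example.com/cat.png");
console.log(taskStatus.get("task-1"));
```

So no extra notification microservice is needed: the backend is just one more consumer, listening on a "completed" topic or reply queue.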

How does an api compare to directly querying your database

I am kind of confused about when an API is needed. I recently created a mobile app with Flutter and Cloud Firestore as the database, where I simply queried and wrote to the database when needed. Now I am learning full stack web development, and I recently watched a tutorial where the author built an Express API with GET, POST, and DELETE functionality for a simple item in the database.
Coming from a background where I just accessed the database directly, I am not sure why an API is necessary in this case. Is it so I wouldn't have to rewrite the queries every time? This is a very simple project, so he's definitely not making a 3rd party API for other developers to use. Am I misunderstanding what an API does exactly?
It was really simple, there was one collection in a MongoDB database and he was using postman to read and write to and from the database to check if it works.
An API is the standard way for your front end (web/mobile) to store and get information for your application. Your front end should never access the database directly. Understand the purpose of the front end: it just displays the interface and should do minimal processing. All the application logic should live in your backend (API server), which is exposed to your front end via API calls (GET, POST, etc.). So to store an item in your database, you write the data-storing logic in your backend and expose an API endpoint which, when triggered, performs the storing operation. Your front end uses that API call to trigger the storing process. This way your storage logic, database, and everything else are not exposed; only the API URL is. The front end is meant to be exposed, whereas the backend/database should never be exposed to or used directly from the front end.
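To illustrate, here is a minimal sketch of such a backend layer, written as a plain function so it runs without any framework. The `/items` routes, the in-memory `items` map and the `handleRequest` name are assumptions for the example; the tutorial's actual Express code may look different.

```javascript
// In-memory stand-in for the database; only the backend can touch it.
const items = new Map();
let nextId = 1;

// All validation and storage logic lives here, on the server side.
function handleRequest(method, path, body) {
  if (method === "GET" && path === "/items") {
    return { status: 200, json: [...items.values()] };
  }
  if (method === "POST" && path === "/items") {
    // The client cannot bypass this check, unlike logic shipped in the app.
    if (!body || typeof body.name !== "string" || body.name.trim() === "") {
      return { status: 400, json: { error: "name is required" } };
    }
    const item = { id: nextId++, name: body.name.trim() };
    items.set(item.id, item);
    return { status: 201, json: item };
  }
  const match = /^\/items\/(\d+)$/.exec(path);
  if (method === "DELETE" && match) {
    const existed = items.delete(Number(match[1]));
    return { status: existed ? 204 : 404, json: null };
  }
  return { status: 404, json: { error: "not found" } };
}
```

In Express the same three branches would be registered with `app.get`, `app.post` and `app.delete`; the point is that the front end only ever sees the URLs, never the storage details.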
Maybe for you an API is not necessary. But there are many use cases for an API.
For example:
You don't have to write the business logic for every platform (iOS, Android, web, whatever).
Your app will be lightweight, since some computation is offloaded to the server.
Your app can be reverse engineered to extract secret information (or maybe your secret algorithm); keeping that on the server avoids this.
What if you need to store something in a filesystem that you want to share with others?
Also a good read: Why we should use REST?
In your case, you are using a pre-written SDK which knows how to connect to Firestore, does caching and updates application data when needed, and provides a standard method of reading, writing and deleting data in Firestore (with associated documentation and example data from google).
Therefore, using an API (as described for the mongoDB) is not required and is undesirable.
There are some cases where you might want to have no read or write access to a firestore collection or document, and in this case, you could write a cloud function which your app calls with parameters, that receives the data that you want to write and does some sort of checking or manipulation beyond the capabilities of cloud firestore rules (although these can get pretty sophisticated). See https://firebase.google.com/docs/firestore/security/get-started
Todd (in the video contained in this link) does a few good videos on this subject.
However, this does not really work in the same way as the API you mentioned in your question.
So in the case of using Firestore, you should use the SDK and not re-invent the wheel by creating your own API.
If you want to share photos for example, you can also store them in firebase storage and then provide a URL for other devices to access them without your app being installed.
If you want to write something to firestore which is then sent to all other users then you can use listeners on each app, and the data will be sent to the apps after it arrives at Firestore.
https://firebase.google.com/docs/firestore/query-data/listen gives an overview of this.
One thing to always look at with firebase is the cost of doing anything. Cloud functions cost more than doing a read of a firestore document.
This gives an overview of pricing for different capabilities within the firebase set of capabilities.
https://firebase.google.com/pricing
Another very important factor is coupling. To add to @Dijkstra's answer, an API provides a way to decouple pieces of logic from each other, thus allowing for more application reliability, maintainability, fault tolerance and, if required, scalability.
Thus there is no right or wrong here, and comparing an API call with a direct DB call is not really justified, since fetching the data from the database is the ultimate aim either way, whether you use a REST API or query the database.
The means to achieve the same can differ based on specific requirements. Take fetching water from a well as an example.
You can always climb down the well and fetch a bucket of water if you need 1 bucket per day and you are the only user.
But if there are many users, you would want to install a pulley and wheel that people use to draw water and pour it into their buckets. Yet again, this depends on whether there are 100 users per day or more, as it will not work for more than 100 users.
If an entire community of, say, 1000 users is going to need the water, you would go with a more complex solution: installing a motorized water pump to pump out the water and supply it to the users' homes via a pipeline. This solution has many benefits, like fast supply, ease of use, filtered water, scheduling, etc. But the cost and effort to achieve the solution are higher as well.
All in all, It comes down to the cost-vs-benefit ratio which you and only you can chart out, for different solutions vs the particular problem, as you are the best judge of scale and future user flow.
While doing that, you can ask the following questions about the solution to help decide:
Does the solution satisfy the primary requirement of the problem?
How much time is it going to take to build it?
For the time we spend building the solution, is it going to work at 75% or more of its capacity?
If not, is there a simpler solution that I can use to satisfy the problem and scale as the requirements increase?
HTH.

EC2 vs Elastic Beanstalk vs Lambda

I have a simple API, with a connection to a DB, calls to the FB API, etc.
What is the best way to serve it?
1) I started with EC2.
Good: Cheap enough. I can control everything.
Bad: Long setup process. I need to control everything, set up monitoring tools etc. by myself, and keep a lot in mind.
2) Next I moved the Node.js app to EB and the DB to RDS.
Good: Just commit the code; everything else is handled by the service.
Bad: Load balancer + multiple instances + RDS cost a lot.
3) Lambda: I am thinking about moving to a Lambda + API Gateway setup.
It looks easy to implement, monitor and support.
I have no idea how much it will cost.
I know that there is a lot of configuration involved.
Do you have any suggestions about what would be best for a simple API?
I am also thinking about moving only picture generation to Lambda,
and keeping the simple API (AUTH, GET users, etc.) on EB.
If you are sure that the processing logic does not exceed Lambda's execution time limit (5 minutes when this was written; AWS has since raised it to 15 minutes), then Option 3 is definitely preferable, as you just write functions and deploy them to Lambda, with no other deployment or auto scaling worries.
Of course, this is subject to other factors, like your logic's dependency on third-party libraries and their compatibility with Lambda's underlying image.
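On the "no idea how much it will cost" part: Lambda bills per request plus per GB-second of execution, so a back-of-the-envelope estimate is easy to script. The sketch below takes the prices as parameters; the example values and the `estimateMonthlyLambdaCost` name are my assumptions, so check the current AWS pricing page for your region. It ignores the free tier and API Gateway charges.

```javascript
// Estimates Lambda cost from request count, average duration and memory size.
function estimateMonthlyLambdaCost({
  requests,
  avgDurationMs,
  memoryMb,
  pricePerMillionRequests,
  pricePerGbSecond,
}) {
  const requestCost = (requests / 1e6) * pricePerMillionRequests;
  // GB-seconds = invocations x seconds per invocation x memory in GB.
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * pricePerGbSecond;
  return { requestCost, computeCost, total: requestCost + computeCost };
}

// Example inputs; the two prices are assumed values, not quoted from AWS.
const estimate = estimateMonthlyLambdaCost({
  requests: 3e6,
  avgDurationMs: 200,
  memoryMb: 512,
  pricePerMillionRequests: 0.2,
  pricePerGbSecond: 0.0000166667,
});
console.log(estimate.total.toFixed(2));
```

Running the same numbers against your EB bill (load balancer + instances) gives a quick first comparison before you commit to a migration.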

How to serve node.js service for worldwide customers and fast?

I have a local VPS that hosts and serves my Node.js REST API in my country.
However, soon I will need to open it up to other countries.
That means that remote clients will be calling my services.
Since they are far away, the connection will probably be slow.
How can I avoid this? Maybe I need more servers located in their countries too, but then how could the data be shared through one DB?
I am not looking for a full tutorial on how to do that (it would be nice to have), but I am looking for information about the methodology.
What do you recommend: keep buying servers in remote countries and share the data between them somehow, or use some cloud service like Firebase? How do cloud services work in the first place?
Without going into too much detail for each item, here are some key points that I think you should focus on learning about to solve your problem.
For data storage, look into Firestore (not the JSON database), as Firestore is globally scalable.
For your REST endpoints I would use Google Cloud Functions, but without knowing the nature of your application it's hard to say if it's suitable. The key to reaching global scale is having cacheable endpoints. Then you are leveraging Google's global CDN, which is much faster than hitting the origin server. Note: the Firebase Cloud Functions infrastructure WILL face cold start issues, which may or may not be a problem for you.
Cache invalidation is a little lacking, so you can use longer max-age cache settings but combine them with cache busting and/or the stale-while-revalidate header directive to help with this.
There is some great info here https://www.youtube.com/watch?v=dbV-293m1dQ that covers some of what I have mentioned in more detail.
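As a small sketch of the two caching tricks mentioned above: one helper builds a Cache-Control value with stale-while-revalidate, the other appends a content hash to the URL for cache busting. The helper names and the `?v=` query parameter are my own choices, and whether a given CDN honours stale-while-revalidate or includes query strings in its cache key depends on the CDN and its configuration.

```javascript
const crypto = require("crypto");

// Serve from cache for maxAgeSeconds, then allow a stale copy for up to
// staleSeconds while the cache refetches in the background.
function cacheControl(maxAgeSeconds, staleSeconds) {
  return `public, max-age=${maxAgeSeconds}, stale-while-revalidate=${staleSeconds}`;
}

// Cache busting: a short content hash in the URL, so changed content is
// requested under a new URL instead of waiting for the old entry to expire.
function bustedUrl(path, content) {
  const hash = crypto.createHash("sha256").update(content).digest("hex").slice(0, 8);
  return `${path}?v=${hash}`;
}

console.log(cacheControl(3600, 60));
console.log(bustedUrl("/news", JSON.stringify([{ id: 1 }])));
```

The long max-age keeps most traffic on the CDN edge, while the hashed URL gives you a way to publish changes immediately.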

Resources