We are currently running two Kafka clusters: one inside our corporate network and one in AWS. We need to stream data from the internal Kafka cluster to the cluster in AWS. MirrorMaker is currently our first choice based on simplicity. I've been investigating how to do this in a secure fashion. I would prefer not to place Kafka in a public subnet, as that does not seem like a secure option, but some limitations of Kafka have made this difficult. Namely, Kafka producers and consumers need to be able to reach every broker of the target cluster on the network, which means every Kafka broker in AWS would need a public IP address.
The alternative is to place MirrorMaker in AWS, but then how do we expose our internal servers to the Internet in a secure manner? This seems like it would be a common use case, but I cannot find anything on the web that addresses it. Can someone provide recommendations, or correct me if any of my assumptions are wrong?
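For reference, the classic MirrorMaker invocation is essentially a consumer pointed at the source cluster plus a producer pointed at the target cluster; a rough sketch (hostnames and ports are placeholders):

```properties
# consumer.properties -- reads from the internal (source) cluster
bootstrap.servers=kafka-internal-1:9092,kafka-internal-2:9092
group.id=mirror-maker-group

# producer.properties -- writes to the AWS (target) cluster
bootstrap.servers=kafka-aws-1:9092,kafka-aws-2:9092
```

```bash
# mirror every topic from the source cluster into the target cluster
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist ".*"
```

Note that the producer side still has to reach every broker advertised by the target cluster's metadata, which is exactly the constraint described above.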
I would like to know more about how to connect to an Amazon MSK (Managed Streaming for Apache Kafka) cluster.
You need a Kafka client.
Since you are using Node.js, start with kafkajs or node-rdkafka. I'd recommend kafkajs, since it supports SASL authentication with AWS IAM.
Beyond that, MSK is an implementation detail; the same steps work for any Kafka cluster (subject to whatever other security is in place). However, cloud providers will require that you allow your client's IP address space into the VPC, and you can follow the official getting-started guide to verify that a basic client works.
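As a rough sketch, assuming kafkajs' built-in `aws` SASL mechanism fits your MSK authentication setup (the broker address, topic, group ID, and IAM identity below are placeholders):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "example-client",
  // placeholder MSK bootstrap broker; use the brokers listed for your cluster
  brokers: ["b-1.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com:9098"],
  ssl: true,
  sasl: {
    mechanism: "aws", // kafkajs' built-in AWS IAM mechanism
    authorizationIdentity: "AIDAEXAMPLEUSERID", // placeholder IAM identity
    accessKeyId: process.env.AWS_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY ?? "",
  },
});

async function run(): Promise<void> {
  const consumer = kafka.consumer({ groupId: "example-group" });
  await consumer.connect();
  await consumer.subscribe({ topic: "example-topic", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      // log each message as it arrives
      console.log(topic, partition, message.value?.toString());
    },
  });
}

run().catch(console.error);
```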
I am creating a web service with C# (.NET Core 3.0) that uses MySQL and Redis, but I am not very familiar with Azure, so I need advice on configuring everything.
I had MySQL hosted on AWS, but I am moving it to Azure because I think performance (speed) will be better when the app and the database are in the same data center. Is that right?
But on my MySQL page the host is something like '*.mysql.database.azure.com'. Does that mean every connection goes out of Azure and then comes back in? Don't I get some local IP for the connection? The same question applies to Redis.
Do I need to configure some local network on Azure, and will that affect the app's speed? Also, is MySQL a good choice on Azure, or should I try something else?
I have just been reading about Azure Virtual Networks, but as I understand it, their sole purpose is to isolate resources from the outside network?
You will get better performance if your MySQL instance and your App Service are in the same region (essentially the same data center).
The host name is *.mysql.database.azure.com, but remember the connection is a TCP/IP connection, so the DNS lookup will resolve that address within the same region (the same data center), and the TCP/IP connection will go to an internal IP.
You can use tcpping from your App Service's Kudu console to try this and see the result.
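For example, from the Kudu console (the host name below is a placeholder):

```bash
tcpping mydb.mysql.database.azure.com:3306
```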
The basic rule is to group your app and database in the same region for better performance and lower cost (Microsoft doesn't charge for traffic within the same region).
Azure Virtual Network serves a different purpose. For example, if you have some on-premises database servers and you want to call them from Azure, then a virtual network could be helpful. But for the scenario you described, it is not really needed.
The company I work for has Microsoft Azure support included; if you or your company have a support contract with Microsoft, you can raise questions directly with them and get really quick responses.
I was working on a side project and decided to redesign my skeleton project as microservices. So far I haven't found any open-source project that follows this pattern. After a lot of reading and searching I arrived at this design, but I still have some questions and thoughts.
Here are my questions and thoughts:
How do I make the API gateway smart enough to load balance requests if I have two nodes of the same microservice?
If one of the microservices is down, how should the discovery service know?
Is there any similar implementation? Is my design right?
Should I use Eureka or something similar?
Your design seems OK. We are also building our microservice project using the API gateway approach. All the services, including the gateway service (GW), are containerized (we use Docker) Java applications (Spring Boot or Dropwizard). A similar architecture could be built with Node.js as well. Some topics to mention related to your questions:
Authentication/Authorization: The GW service is the single entry point for clients. All authentication/authorization operations are handled in the GW using JSON Web Tokens (JWT), for which Node.js libraries exist as well. We keep authorization information, such as the user's roles, in the JWT token. Once the token is generated in the GW and returned to the client, the client sends the token in an HTTP header on each request; we then check whether the token grants the required role to call the specific service and whether it has expired. With this approach you don't need to keep track of the user's session on the server side. There is actually no session; the required information is in the JWT token.
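As an illustration of that flow (not our actual code; the secret, claims, and role names are placeholders), issuing and checking such a token with the jsonwebtoken library might look roughly like:

```typescript
import jwt from "jsonwebtoken";

const SECRET = process.env.JWT_SECRET ?? "change-me"; // placeholder secret

// Issued by the gateway after a successful login.
function issueToken(userId: string, roles: string[]): string {
  return jwt.sign({ sub: userId, roles }, SECRET, { expiresIn: "1h" });
}

// Called by the gateway on every request before proxying to a service.
function checkToken(authHeader: string | undefined, requiredRole: string): boolean {
  if (!authHeader?.startsWith("Bearer ")) return false;
  try {
    const payload = jwt.verify(authHeader.slice("Bearer ".length), SECRET);
    if (typeof payload === "string") return false;
    const roles = (payload as { roles?: string[] }).roles;
    return roles?.includes(requiredRole) ?? false;
  } catch {
    return false; // expired or tampered token
  }
}
```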
Service Discovery / Load Balancing: We use Docker and Docker Swarm, the clustering tool bundled with the Docker engine (since Docker 1.12). Our services are Docker containers. The containerized approach makes it easy to deploy, maintain, and scale the services. At the beginning of the project we used HAProxy, Registrator, and Consul together to implement service discovery and load balancing, similar to your drawing. Then we realized we don't need them for service discovery and load balancing as long as we create a Docker network and deploy our services with Docker Swarm. With this approach you can easily create isolated environments for your services, such as dev, beta, and prod, on one or more machines by creating a different network for each environment. Once you create the network and deploy the services, service discovery and load balancing are no longer your concern. Within the same Docker network, each container has the DNS records of the other containers and can communicate with them. With Docker Swarm you can scale services with a single command, and on each request to a service Docker distributes (load balances) the request to one of the service's instances.
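For illustration (image, service, and network names are placeholders), the Swarm workflow described above boils down to a few commands:

```bash
# one-time: turn the engine into a swarm manager
docker swarm init

# create an overlay network per environment, e.g. "dev"
docker network create --driver overlay dev

# deploy a service with three replicas attached to that network
docker service create --name user-service --network dev --replicas 3 myrepo/user-service:latest

# scale it later with a single command
docker service scale user-service=5
```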
Your design is OK.
If your API gateway needs to implement (and that's probably the case) CAS or some other kind of auth (via one of the services, i.e. some kind of user service), and should also track all requests and modify the headers to carry the requester metadata (for internal ACL/scoping usage), then your API gateway should be written in Node, but it should sit behind HAProxy, which will take care of load balancing and HTTPS.
Discovery is in the correct position; if you are looking for one that fits your design, look no further than Consul.
You can use consul-template, or roll your own micro discovery framework for the services and the API gateway, so they share endpoint data on boot.
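For example, a consul-template configuration that re-renders the HAProxy config whenever registered endpoints change could look roughly like this (paths and the reload command are placeholders):

```hcl
# consul-template config: re-render haproxy.cfg from the Consul catalog
template {
  source      = "/etc/consul-template/haproxy.cfg.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command     = "systemctl reload haproxy"
}
```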
ACL/authorization should be implemented per service, and the first request coming from the API gateway should pass through all the authorization middleware.
It's smart to track requests by having the API gateway assign a request ID to each request, so its lifecycle can be traced through the "inner" system.
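A minimal sketch of that idea, assuming an Express-based Node gateway (the header name is just a common convention):

```typescript
import { randomUUID } from "crypto";
import express from "express";

const app = express();

// Attach a request ID to every incoming request and propagate it downstream.
app.use((req: express.Request, res: express.Response, next: express.NextFunction) => {
  const requestId = req.header("X-Request-Id") ?? randomUUID();
  req.headers["x-request-id"] = requestId; // forwarded to internal services
  res.setHeader("X-Request-Id", requestId); // returned to the client for correlation
  next();
});
```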
I would add Redis for messaging, workers, queues, and fast in-memory things like caching and cache invalidation (you can't run a full microservice architecture without one), or go with RabbitMQ if you have many more distributed transactions and a lot of messaging.
Run all of this in containers (Docker) so it will be easier to maintain and assemble.
As for BI, why would you need a service for that? You could have an external ELK stack (Elasticsearch, Logstash, Kibana) and get dashboards, log aggregation, and a huge big-data warehouse all at once.
Our system has 3 main components:
A set of microservices running in AWS that together comprise a webapp.
A very large monolithic application, hosted within our network, which comprises several other webapps and exposes a public API that is consumed by the AWS instances.
A locally hosted (and very large) database.
This all works well in production.
We also have a testing version of the monolith that is inaccessible externally.
I would like to be able to spin up any number of copies of the AWS environment for testing or demo purposes that can access the testing version of the monolith. However, because it's a test system, it needs to remain inaccessible to the public. I know how to achieve this within AWS easily enough (security groups etc.), but how can I secure the monolith so it can be accessed ONLY by any number of dynamically created instances running in AWS (given that their IP addresses are dynamic and therefore cannot be whitelisted)?
The only idea I have right now is to use an access token, but I'm not sure how secure that is.
Edit - My microservices are each running on an EC2 instance.
Assuming you are running your microservices on EC2: if you want API calls from your application servers in AWS to come from a known IP (or set of IPs), this can be accomplished by using a NAT instance or a proxy. That way, even though your application servers are dynamic, the apparent source of the requests is not.
For a NAT, you would run your EC2 instances in a private subnet and configure them to send all of their Internet-bound traffic out through the NAT instance, which has a constant IP. A proxy server, or a fleet of proxy servers, can be used in much the same way, but requires your microservice applications to be configured to use it.
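As a sketch using the managed equivalent of a NAT instance, a NAT gateway (all resource IDs below are placeholders):

```bash
# NAT gateway in a public subnet, using a pre-allocated Elastic IP
aws ec2 create-nat-gateway --subnet-id subnet-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0

# send the private subnet's Internet-bound traffic through it
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0123456789abcdef0
```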
The better approach would be to simply not send this traffic over the public Internet at all.
This can be accomplished by establishing a VPN from your company network to your VPC. Alternatively, you could establish a Direct Connect to bridge the networks.
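If you go the VPN route, the AWS side is a virtual private gateway attached to the VPC, a customer gateway describing your office endpoint, and a VPN connection between the two; roughly (IDs, IP, and ASN below are placeholders):

```bash
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0
aws ec2 create-customer-gateway --type ipsec.1 --public-ip 203.0.113.10 --bgp-asn 65000
aws ec2 create-vpn-connection --type ipsec.1 \
  --customer-gateway-id cgw-0123456789abcdef0 --vpn-gateway-id vgw-0123456789abcdef0
```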
Side note: if your microservices are actually running in AWS Lambda, then this answer does not apply.
I'm trying to figure out how to create a Cassandra cluster in Azure across more than one datacentre.
I am not that interested in the Cassandra topology settings yet, but rather in how to set up Azure endpoints or inter-datacentre communication so that nodes can connect remotely. Do I need to set up an endpoint for node communication inside a single datacentre?
And what about the security of Azure endpoints?
Thank you.
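For context, on the Cassandra side, cross-datacentre communication is governed mainly by the settings below (addresses and names are placeholders); the Azure part is then about making each node's broadcast address and the internode port (7000, or 7001 with SSL) reachable between the two datacentres:

```yaml
# cassandra.yaml (per node)
listen_address: 10.0.1.5          # node's private IP inside its own DC
broadcast_address: 52.0.0.5       # address other DCs use to reach this node
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "52.0.0.5,52.1.0.7" # at least one seed per datacentre
```

```properties
# cassandra-rackdc.properties (per node)
dc=azure-west-europe
rack=rack1
```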