I am using the spark-redshift library provided by Databricks to read data from a Redshift table in Spark. Link: https://github.com/databricks/spark-redshift.
Note: the AWS accounts for the Redshift cluster and the EMR cluster are different in my case.
I am able to connect to Redshift using spark-redshift in Spark LOCAL mode, but the same code fails on EMR with the following exception: java.sql.SQLException: Error setting/closing connection: Connection timed out.
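For context, the read that works locally but times out on EMR looks roughly like the following (a minimal PySpark sketch; the cluster endpoint, table name, and S3 temp bucket are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-read").getOrCreate()

# Placeholder endpoint, credentials, table and tempdir; the spark-redshift JAR
# and the Redshift JDBC driver must be on the classpath.
df = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://example.xxxx.us-east-1.redshift.amazonaws.com:5439/dev?user=master&password=secret")
    .option("dbtable", "public.my_table")
    .option("tempdir", "s3a://my-temp-bucket/spark-redshift/")
    .option("forward_spark_s3_credentials", "true")
    .load()
)
df.show(5)
```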
I have tried adding a Redshift rule to the inbound rules of my EMR cluster's EC2 security group, with Source set to My IP, but it didn't help.
I found the solution to this using VPC peering: http://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html
We connected the Redshift and EMR VPCs using VPC peering and updated the route tables of each VPC to accept traffic from the other VPC's IPv4 CIDR. VPC peering also works across AWS accounts; refer to the link above for more details.
Once this is done, go to the VPC peering connection in both accounts and enable DNS resolution from the peer VPC. To do this, select the VPC peering connection -> open the Actions menu at the top -> choose Edit DNS settings -> select Allow DNS resolution from peer VPC.
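The same steps can also be scripted; here is a minimal boto3 sketch (the VPC, route table, and account IDs as well as the CIDR blocks are placeholders, and in a cross-account peering each account runs the calls for its own side):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request peering from the EMR VPC to the Redshift VPC (placeholder IDs and account).
pcx_id = ec2.create_vpc_peering_connection(
    VpcId="vpc-0emr000000000000",
    PeerVpcId="vpc-0redshift0000000",
    PeerOwnerId="111122223333",  # AWS account that owns the Redshift VPC
)["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accept it (with credentials for the peer account when the accounts differ).
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Route each VPC's traffic destined for the other VPC's CIDR over the peering link.
ec2.create_route(RouteTableId="rtb-0emr000000000000",
                 DestinationCidrBlock="172.31.0.0/16",  # Redshift VPC CIDR (placeholder)
                 VpcPeeringConnectionId=pcx_id)
ec2.create_route(RouteTableId="rtb-0redshift0000000",
                 DestinationCidrBlock="10.0.0.0/16",    # EMR VPC CIDR (placeholder)
                 VpcPeeringConnectionId=pcx_id)

# The "Edit DNS settings" step: allow DNS resolution from the peer VPC on both sides.
ec2.modify_vpc_peering_connection_options(
    VpcPeeringConnectionId=pcx_id,
    RequesterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
    AccepterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
)
```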
I was in a similar situation. Rather than adding a Redshift rule to the inbound rules of the EMR cluster's EC2 security group, add the public IP of the EMR cluster to Redshift's security group; this worked for me. Hope this helps!
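Expressed with boto3, that rule looks something like this (the security group ID and the EMR master's public IP are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allow the EMR master's public IP into the Redshift security group on port 5439.
ec2.authorize_security_group_ingress(
    GroupId="sg-0redshift0000000000",  # Redshift cluster's security group (placeholder)
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,
        "ToPort": 5439,
        "IpRanges": [{"CidrIp": "203.0.113.10/32", "Description": "EMR master public IP"}],
    }],
)
```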
Related
I have an EKS cluster with a Keycloak service that is trying to connect to RDS within the same VPC.
I have also added an inbound rule to the RDS security group which allows PostgreSQL traffic from source eksctl-prod-cluster-ClusterSharedNodeSecurityGroup-XXXXXXXXX.
When the application tries to connect to RDS, I get the following message:
timeout reached before the port went into state "inuse"
I ended up replacing the inbound rule on the RDS security group that used eksctl-prod-cluster-ClusterSharedNodeSecurityGroup-XXXXXXXXX as its source with an inbound rule allowing access from the EKS VPC CIDR range instead.
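In boto3 terms the swap looks roughly like this (the group IDs and CIDR are placeholders standing in for this setup):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

RDS_SG = "sg-0rds0000000000000"        # RDS instance's security group (placeholder)
NODE_SG = "sg-0eksnodes000000000"      # eksctl shared node security group (placeholder)
EKS_VPC_CIDR = "192.168.0.0/16"        # EKS VPC CIDR (placeholder)

# Drop the rule that referenced the shared node security group ...
ec2.revoke_security_group_ingress(
    GroupId=RDS_SG,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": NODE_SG}],
    }],
)

# ... and allow PostgreSQL from the whole EKS VPC CIDR instead.
ec2.authorize_security_group_ingress(
    GroupId=RDS_SG,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
        "IpRanges": [{"CidrIp": EKS_VPC_CIDR, "Description": "EKS VPC"}],
    }],
)
```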
I have AWS EKS nodes accessing RDS, and I have whitelisted the EKS nodes' public IPs in RDS's security group. But this is not a viable solution, because EKS nodes can get replaced and their public IPs change with them.
How can I make these EKS nodes' connections to RDS more stable?
Last year we introduced a new feature to assign security groups to Kubernetes pods directly, to overcome having to assign them at the node level (avoiding the ephemerality problems you call out and creating a more secure environment where only the pod that needs to talk to RDS can do so, rather than the ENTIRE node). You can follow this tutorial to configure this feature or refer to the official documentation.
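As a rough sketch, such a policy can be applied with the Kubernetes Python client, assuming the VPC CNI's SecurityGroupPolicy CRD is installed; the namespace, labels, and security group ID below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

# Attach a dedicated security group (one allowed by the RDS security group)
# to just the Keycloak pods instead of to every node.
policy = {
    "apiVersion": "vpcresources.k8s.aws/v1beta1",
    "kind": "SecurityGroupPolicy",
    "metadata": {"name": "keycloak-rds-access", "namespace": "prod"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "keycloak"}},
        "securityGroups": {"groupIds": ["sg-0123456789abcdef0"]},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="vpcresources.k8s.aws",
    version="v1beta1",
    namespace="prod",
    plural="securitygrouppolicies",
    body=policy,
)
```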
If your EKS cluster is in the same VPC as the RDS instance, you can simply whitelist your VPC's private IP address (CIDR) range in the RDS security group. If they are in different VPCs, connect the two VPCs with VPC peering and whitelist the EKS VPC's IP range in the RDS security group. Don't use public IPs, as that traffic leaves the AWS network; always prefer private connections wherever possible, as they are faster, more reliable, and more secure. If you don't want to whitelist the complete CIDR, you can instead create a NAT gateway for your EKS cluster, route traffic leaving the cluster through that NAT gateway, and whitelist the NAT gateway's IP in the RDS security group.
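For the NAT gateway option, a rough boto3 sketch of the idea (the subnet and route table IDs are placeholders, and the NAT gateway must live in a public subnet of the EKS VPC):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Elastic IP for the NAT gateway; this is the single address to whitelist in RDS.
eip = ec2.allocate_address(Domain="vpc")

nat_id = ec2.create_nat_gateway(
    SubnetId="subnet-0public0000000000",       # public subnet in the EKS VPC (placeholder)
    AllocationId=eip["AllocationId"],
)["NatGateway"]["NatGatewayId"]
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

# Send outbound traffic from the private node subnets through the NAT gateway.
ec2.create_route(
    RouteTableId="rtb-0private0000000000",     # route table of the node subnets (placeholder)
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)
```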
I've created an Aurora MySQL Serverless DB cluster in AWS and I want to connect to it from my computer using MySQL Workbench. I've entered the endpoint as well as the master user and password; however, when I try to connect, it hangs for about a minute and then says it cannot connect (no further info is given).
Trying to ping the endpoint resolves the name, but I don't get any answer.
I've read all the documentation from AWS but I really cannot find how to connect. In the VPC security group I've enabled all inbound and outbound traffic on all ports and protocols. The AWS docs say to enable public access in the DB settings, but I cannot find such an option.
You can't give an Amazon Aurora Serverless V1 DB cluster a public IP address; you can access an Aurora Serverless V1 DB cluster only from within a virtual private cloud (VPC), based on the Amazon VPC service. For Aurora Serverless V2 you can make a cluster public: make sure you have the proper ingress rules set up and enable public access in the database configuration. For more information, see Using Amazon Aurora Serverless.
https://aws.amazon.com/premiumsupport/knowledge-center/aurora-private-public-endpoints/
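For Aurora Serverless V2 (or a provisioned cluster), public accessibility is toggled per DB instance rather than on the cluster itself; a boto3 sketch with a placeholder instance identifier:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Make one of the cluster's DB instances publicly accessible (placeholder name);
# the VPC security group must still allow inbound MySQL (port 3306) from your IP.
rds.modify_db_instance(
    DBInstanceIdentifier="my-aurora-instance-1",
    PubliclyAccessible=True,
    ApplyImmediately=True,
)
```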
I'm trying to create a cluster in Azure Databricks and I am getting the following error message:
Resources were not reachable via SSH. If the problem persists, this usually indicates a network environment misconfiguration. Please check your cloud provider configuration, and make sure that Databricks control plane can reach Spark clusters instances.
I have the following default configuration:
Cluster mode: Standard
Pool: None
Runtime version: 5.5 LTS
Autoscaling enabled
Worker Type: Standard_DS3_v2
Driver Type: Standard_DS3_v2
From Log Analytics I can see that Azure tried to create the virtual machines and then, for no apparent reason (I suppose because they were unreachable), had to delete all of them.
Has anyone faced this issue?
If the issue is temporary, it may be caused by the virtual machine hosting the Spark driver going down or by a networking issue, since Azure Databricks was able to launch the cluster but lost the connection to the instance hosting the Spark driver (referring to this). You could try removing the cluster and creating it again.
If the problem persists, this may happen when you have an Azure Databricks workspace deployed to your own VNet. If the virtual network where the workspace is deployed is already peered or has an ExpressRoute connection to on-premises resources, the Azure Databricks control plane cannot make an SSH connection to the cluster nodes when it attempts to create a cluster. You could add a user-defined route (UDR) to give the Azure Databricks control plane SSH access to the cluster instances.
For detailed UDR instructions, see Step 3: Create user-defined routes and associate them with your Azure Databricks virtual network subnets. For more VNet-related troubleshooting information, see Troubleshooting.
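As a rough illustration, such a UDR could be created with the Azure Python SDK (the subscription, resource group, region, and control-plane address prefix are placeholders to be taken from the UDR documentation, and the resulting route table still has to be associated with both Databricks subnets):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import Route, RouteTable

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Route the Databricks control plane prefix for your region straight to the
# Internet instead of down the peered/ExpressRoute path.
client.route_tables.begin_create_or_update(
    "<resource-group>",
    "databricks-control-plane-udr",
    RouteTable(
        location="westeurope",
        routes=[
            Route(
                name="to-databricks-control-plane",
                address_prefix="<control-plane-cidr>",  # see the UDR docs for your region
                next_hop_type="Internet",
            )
        ],
    ),
).result()
```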
Hope this could help you.
Issue: Instances Unreachable: Resources were not reachable via SSH.
Possible cause: traffic from control plane to workers is blocked. If you are deploying to an existing virtual network connected to your on-premises network, review your setup using the information supplied in Connect your Azure Databricks Workspace to your On-Premises Network.
Reference: Azure Databricks - Troubleshooting
Hope this helps.
I am trying to connect to my Redshift database (located in the N. Virginia region) from a Lambda function (located in the Ireland region). But on trying to establish a connection, I get a timeout error stating:
"errorMessage": "2019-10-20T13:34:04.938Z 5ca40421-08a8-4c97-b730-7babde3278af Task timed out after 60.05 seconds"
I have closely followed the solution provided in AWS Lambda times out connecting to RedShift, but the main issue is that the solution provided there applies to services located in the same VPC (and hence the same region).
On researching further, I came across inter-region VPC peering and followed the guidelines provided in the AWS docs. But even after configuring VPC peering, I am unable to connect to Redshift.
Here are some of the details that I think can be useful for understanding the situation:
Redshift cluster is publicly accessible, running on port 8192, and has a VPC configured (say VPC1)
Lambda function is located in another VPC (say VPC2)
There is a VPC Peering connection between VPC1 and VPC2
CIDR IPv4 blocks of both VPCs are different and have been added to each other's Route tables (VPC1 has 172.31.0.0/16 range and VPC2 has 10.0.0.0/16 range)
IAM Execution role for Lambda function has Full Access of Redshift service
In VPC1, I have a security group (SG1) which has an inbound rule of type: Redshift, protocol: TCP, port: 5439 and source: 10.0.0.0/16
In VPC2, I am using the default security group, which has an outbound rule allowing all traffic to 0.0.0.0/0
In Lambda, I am providing private IP of Redshift (172.31.x.x) as hostname and 5439 as port (not 8192!)
Lambda function is in NodeJS 8.10 and I am using node-redshift package for connecting to Redshift
After all this, I have also tried accessing Redshift through both its public IP and its DNS name (with port 8192)
Kindly help me out in establishing connection between these services.
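One way to narrow this down is a bare TCP reachability check against the Redshift private IP from inside the Lambda VPC; here is a minimal sketch (in Python for brevity rather than the Node.js 8.10 runtime mentioned above; the host is a placeholder for the 172.31.x.x address):

```python
import socket

HOST, PORT = "172.31.0.10", 5439  # placeholder private IP of the Redshift leader node

def handler(event, context):
    # Succeeds only if the peering route and SG1's inbound rule both work.
    # A timeout means the packets never reach Redshift (routing/security group issue),
    # while an immediate "connection refused" would point at the port or listener.
    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        return {"reachable": True, "peer": sock.getpeername()}
```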