Errors trying to resize instance group with aws cli - aws-cli

I have a standing EMR cluster and a daily job I want to run. I was trying to use the AWS CLI to resize the cluster, with the plan of adding this to the crontab so the cluster would grow and then shrink later. (I don't have the ability to use auto-scaling, so that's out.)
I have read the Amazon documentation and the examples they give don't work. I've tried the natural variations, but end up getting nowhere.
According to the documentation, the command is
aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-31JXXXXXXBTO,InstanceCount=4
However, when I try this with my own instance group ID, I get:
Error parsing parameter '--instance-groups': Expected: '<second>', received: '<none>' for input:InstanceGroupId=ig-31JXXXXXXBTO,
I've tried things like removing the instance count, hoping the error output would reveal the expected format...
aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-WCXEP0AXCGJS
which gives the response
An error occurred (ValidationException) when calling the ModifyInstanceGroups operation: Please provide either an instance count or a list of EC2 instance ids to terminate.
I've tried several variations without luck. Any ideas? Thanks.

I ended up submitting a trouble ticket through Amazon.
It turns out the resize command requires that no space appear after the comma. The support engineer has reported this behavior and the unhelpful error message to the developers.
aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-31JXXXXXXBTO,InstanceCount=4
will work, as long as there is no space after the comma. Hopefully they'll either fix that or provide a better error message.
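If you end up driving the grow/shrink from a script rather than raw CLI calls in crontab, the same resize can be done with boto3. This is only a minimal sketch; the instance group ID and target count are placeholders.

import boto3

# Sketch of the same resize via boto3, e.g. from a cron-driven Python script.
emr = boto3.client("emr")

def resize_instance_group(instance_group_id, count):
    # Equivalent to:
    #   aws emr modify-instance-groups --instance-groups InstanceGroupId=<id>,InstanceCount=<count>
    # (and sidesteps the comma/space quirk entirely)
    emr.modify_instance_groups(
        InstanceGroups=[{"InstanceGroupId": instance_group_id, "InstanceCount": count}]
    )

# Grow before the daily job, shrink again afterwards.
resize_instance_group("ig-31JXXXXXXBTO", 4)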

AKS + ACI: ProviderFailed "Duplicate volumes 'kube-api-access-XXXXX' are found"

I'm using AKS with virtual nodes (ACI). Until today everything worked fine.
Today when I deploy my pod again, I get this odd error:
Reason: ProviderFailed
Message: api call to https://management.azure.com/subscriptions/blabla/resourceGroups/blabla/providers/Microsoft.ContainerInstance/containerGroups/blabla?api-version=2021-07-01: got HTTP response status code 400 error code "DuplicateVolumes": Duplicate volumes 'kube-api-access-XXXXX' are found.
Of course, the pod is failing in ACI. If I run it on a regular (non-virtual) node, it works as always.
The image is the same, the code is the same, nothing changed. I also tried a commit from last week (which I know worked) and it fails anyway.
I checked, and there is no volume with the same name.
I had opened a ticket about the Container Group quota, and today they asked me to test since the work would be "completed". It didn't work, and that coincides with the first time I saw this error.
Any ideas?
Thanks in advance!

Databricks error: Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster

I'm working in Databricks 8.1. If I run my PySpark code cell by cell using Shift+Enter, it runs fine, but if I run the entire notebook using the Run All button, I get an internal error. The complete error message is:
Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster.
com.databricks.rpc.RPCResponseTooLarge: rpc response (of 20984709 bytes) exceeds limit of 20971520 bytes
    at com.databricks.rpc.Jetty9Client$$anon$1.onContent(Jetty9Client.scala:370)
    at shaded.v9_4.org.eclipse.jetty.client.api.Response$Listener$Adapter.onContent(Response.java:248)
    at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:135)
    at shaded.v9_4.org.eclipse.jetty.client.ResponseNotifier.notifyContent(ResponseNotifier.java:126)
    at shaded.v9_4.org.eclipse.jetty.client.HttpReceiver.responseContent(HttpReceiver.java:340)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.content(HttpReceiverOverHTTP.java:283)
    at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseContent(HttpParser.java:1762)
    at shaded.v9_4.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1490)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.parse(HttpReceiverOverHTTP.java:172)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.process(HttpReceiverOverHTTP.java:135)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpReceiverOverHTTP.receive(HttpReceiverOverHTTP.java:73)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpChannelOverHTTP.receive(HttpChannelOverHTTP.java:133)
    at shaded.v9_4.org.eclipse.jetty.client.http.HttpConnectionOverHTTP.onFillable(HttpConnectionOverHTTP.java:151)
    at shaded.v9_4.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:426)
    at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:320)
    at shaded.v9_4.org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:158)
    at shaded.v9_4.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at shaded.v9_4.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
    at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at shaded.v9_4.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
    at shaded.v9_4.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
    at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
    at shaded.v9_4.org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:914)
    at java.base/java.lang.Thread.run(Thread.java:834)
I have restarted the cluster several times, but I still get the same issue.
Can you suggest steps to resolve this?

How to resolve the InvalidParameterCombination (Status 400) error when creating a MySQL 5.5.53 read replica?

Today I tried to create a read replica for a MySQL 5.5.53 RDS instance, and it gave me the error below:
Cannot find version 5.5.53 for mysql (Service: AmazonRDS; Status Code:
400; Error Code: InvalidParameterCombination;
Creating the read replica through the console UI did not work, so I tried the AWS CLI instead:
aws rds create-db-instance-read-replica --db-instance-identifier <read_replica_name> --source-db-instance-identifier <master-server-name> --db-instance-class <class-name> --availability-zone <zone> --no-multi-az --auto-minor-version-upgrade --no-publicly-accessible --vpc-security-group-ids <vpc-id>
And it worked.
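If you would rather script this, here is a rough boto3 equivalent of the CLI call above; every identifier below is a placeholder to replace with your own values.

import boto3

rds = boto3.client("rds")

# Rough equivalent of the aws rds create-db-instance-read-replica call above.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="my-read-replica",           # placeholder
    SourceDBInstanceIdentifier="my-master-instance",  # placeholder
    DBInstanceClass="db.m5.large",                    # placeholder
    AvailabilityZone="us-east-1a",                    # placeholder
    MultiAZ=False,
    AutoMinorVersionUpgrade=True,
    PubliclyAccessible=False,
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],     # placeholder
)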
I was getting this error today when trying to load the "Modify" page for one of my RDS instances. I discovered that this happens when I navigate to the instance from the "Resources" tab in a CloudFormation stack, but not when I navigate to the instance from the "Instances" list in the RDS console. (The two paths do result in different URLs but what looks like the same page.)
Thought I'd add this in case it's what was behind your error message, or for someone else who searches and finds this question as I did.

"Error: Key not loaded" in h2o deployed through a K3s cluster, using python3 client

I can confirm the 3-replica h2o cluster inside K3s is correctly deployed, as running h2o.init(ip="x.x.x.x") in the Python3 interpreter works as expected. I followed the instructions noted here: https://www.h2o.ai/blog/running-h2o-cluster-on-a-kubernetes-cluster/
Nevertheless, I had to modify the service.yaml and comment out the line which says clusterIP: None, as K3s was complaining about its inability to set the clusterIP to None. Even so, I can confirm it is working correctly, and I am able to use an external IP to connect to the cluster.
If I try to load the dataset on the h2o cluster inside K3s, following the exact same steps described here http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html, this is the output I get:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Key not loaded: Key<Frame> https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv
Request: POST /3/ParseSetup
data: {'check_header': '0', 'source_frames': '["https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv"]'}
The same error occurs if I use the h2o.upload_file("x.csv") method.
There is a clue about what may be happening here: Key not loaded: Key<Frame> while POSTing source frame through ParseSetup in H2O API call, but I am not using curl, and I cannot find any parameter that could help me overcome this issue: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=import_file#h2o.import_file
I need to use the Python client inside the same K3s cluster for various technical reasons, so I am not able to launch either Flow or Firebug to see what may be happening.
I can confirm it is working correctly when I simply issue a h2o.init(), using the local Java instance.
UPDATE 1:
I have tried in different K3s clusters without success. I changed the service.yaml to a NodePort, and now this is the error traceback:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Job is missing
Request: GET /3/Jobs/$03010a2a016132d4ffffffff$_a2366be93ec99a78d7bc161de8c54d67
UPDATE 2:
I have tried using different service types (NodePort, LoadBalancer, ClusterIP) and none of them work. I have also tried using Minikube, with both the official image and a custom image made by me, without success. I suspect this is something related to either h2o itself or the clustering between pods. I will keep digging and hopefully find the cause.
UPDATE 3:
I also found out that the blog post about running H2O in Docker (https://www.h2o.ai/blog/h2o-docker/) is really outdated, and the Dockerfile on GitHub does not work either (I changed it to uncomment the ENTRYPOINT section, without success): https://github.com/h2oai/h2o-3/blob/master/Dockerfile
Even so, I tried the custom image I built for h2o-k8s and it works seamlessly in pure Docker. I am wondering why it still does not work in K8s...
UPDATE 4:
I have tried modifying the environment variable called H2O_KUBERNETES_SERVICE_DNS without success.
In the meantime, the cluster started to become unavailable; that is, the readiness probes would not complete successfully. No matter what I change now, it does not work.
I spun up a local k3d cluster to see what happened, and surprisingly the readiness probes were not failing there, using v3.30.0.6. Then I started testing with R instead of Python. I am glad I tried, because it may have pinpointed what was wrong: there is a version mismatch between the client and the server. So I updated the image accordingly, to v3.30.0.1.
But now the readiness probe is again not working in my k3d cluster, so I am unable to test it.
It seems it is working now: R client version 3.30.0.1 with server version 3.30.0.1. I also tried Python client version 3.30.0.7 with server version 3.30.0.7 and it started working. Marvelous. The problem was caused by a version mismatch between the client and the server, as the Python client had been updated to 3.30.0.7 while the latest server image for Docker was 3.30.0.6.
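For anyone hitting similar symptoms, a quick sanity check is to compare client and server versions right after connecting. This is a minimal sketch assuming a recent h2o-py release; the IP and port are placeholders.

import h2o

# Connect to the remote H2O cluster (IP/port are placeholders).
h2o.init(ip="x.x.x.x", port=54321)

# A mismatch like the one described above (client 3.30.0.7 vs server 3.30.0.6)
# can surface as confusing errors such as "Key not loaded".
client_version = h2o.__version__
server_version = h2o.cluster().version
print("client:", client_version, "server:", server_version)
if client_version != server_version:
    print("WARNING: client/server version mismatch -- align them before importing data")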

Is it possible to update only part of a Glue Job using AWS CLI?

I am trying to include the update of script_location, and only this parameter, in my CI/CD pipeline. AWS is asking me to include required parameters such as RoleArn. How can I update only the part of the job configuration I want to change?
This is what I am trying to use
aws glue update-job --job-name <job_name> --job-update Command="{ScriptLocation=s3://<s3_path_to_script>}"
This is what happens:
An error occurred (InvalidInputException) when calling the UpdateJob operation: Command name should not be null or empty.
If I add the default command name glueetl, this is what happens:
An error occurred (InvalidInputException) when calling the UpdateJob operation: Role should not be null or empty.
An easy way to update a Glue job or trigger via the CLI is the --cli-input-json option. To get the correct JSON structure, you can run aws glue update-job --generate-cli-skeleton, which returns a complete skeleton you can fill in with your changes.
EX:
{"JobName":"","JobUpdate":{"Description":"","LogUri":"","Role":"","ExecutionProperty":{"MaxConcurrentRuns":0},"Command":{"Name":"","ScriptLocation":"","PythonVersion":""},"DefaultArguments":{"KeyName":""},"NonOverridableArguments":{"KeyName":""},"Connections":{"Connections":[""]},"MaxRetries":0,"AllocatedCapacity":0,"Timeout":0,"MaxCapacity":null,"WorkerType":"G.1X","NumberOfWorkers":0,"SecurityConfiguration":"","NotificationProperty":{"NotifyDelayAfter":0},"GlueVersion":""}}
Here, just fill in the name of the job and change the options you want.
After that, convert the JSON into a single line and pass it to the command wrapped in single quotes:
aws glue update-job --cli-input-json '<one-line-json>'
I hope this helps someone with this problem too.
Ref:
https://docs.aws.amazon.com/cli/latest/reference/glue/update-job.html
https://w3percentagecalculator.com/json-to-one-line-converter/
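If you prefer not to rely on an online converter, a few lines of Python can minify the edited skeleton; the file name here is just a placeholder.

import json

# Read the edited skeleton (placeholder path) and print it as one line,
# ready to paste into --cli-input-json '...'.
with open("job-update.json") as f:
    doc = json.load(f)

print(json.dumps(doc, separators=(",", ":")))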
I don't know whether you've solved this problem, but I managed using this command:
aws glue update-job --job-name <gluejobname> --job-update Role=myRoleNameBB,Command="{Name=<someupdatename>,ScriptLocation=<local_filename.py>}"
You don't need the ARN of the role, just the role name. The example above assumes you have a role named myRoleNameBB with access to AWS Glue.
Note: I used a local file on my laptop. Also, the "Name" in the "Command" part is compulsory.
When I run it, I get this output:
{
"JobName": "<gluejobname>"
}
Based on what I have found, there is no way to update just part of the job using the update-job API.
I ran into the same issue and provided the role to get past this error. The command worked, but the update-job API actually resets other parameters to their defaults, such as the type of application, job language, class, timeout, max capacity, etc.
So if your pre-existing job is a Spark application in Scala, it will fail, because AWS defaults to Python shell and Python as the job language as part of the update-job API. This API provides no way to set the job language to Scala or to set a main class (required in the case of Scala); it does provide a way to set the application type to Spark application, though.
If you do not want to specify the role to the update-job API, one approach is to copy the new script, with the same name, to the same location your pre-existing ETL job uses, and then trigger your ETL with the start-job-run API as part of the CI process.
A second approach is to run your ETL directly and force it to use the latest script in the start-job-run call:
aws glue start-job-run --job-name <job-name> --arguments=scriptLocation="<path to your latest script>"
The only caveat with the second approach is that, when you look in the console, the ETL job will still reference the old script location. The above command just forces this particular run of the job to use the latest script, which you can confirm by looking in the History tab of the Glue ETL console.
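If you are scripting the CI/CD step anyway, another option is a read-modify-write with boto3: fetch the full job definition, change only ScriptLocation, and send the rest back unchanged. This is only a sketch, not a drop-in solution; the job name and script path are placeholders, and depending on your job you may need to strip more read-only or conflicting fields than the ones removed below.

import boto3

glue = boto3.client("glue")
job_name = "my-glue-job"                              # placeholder
new_script = "s3://my-bucket/scripts/etl_latest.py"   # placeholder

# Fetch the current definition so nothing else is lost on update.
job = glue.get_job(JobName=job_name)["Job"]

# get_job returns fields that UpdateJob does not accept; drop them.
# (You may also need to drop MaxCapacity if WorkerType/NumberOfWorkers are set.)
for field in ("Name", "CreatedOn", "LastModifiedOn", "AllocatedCapacity"):
    job.pop(field, None)

# Change only the script location, keep everything else as-is.
job["Command"]["ScriptLocation"] = new_script

glue.update_job(JobName=job_name, JobUpdate=job)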
