"Error: Key not loaded" in h2o deployed through a K3s cluster, using python3 client - python-3.x

I can confirm that the 3-replica H2O cluster inside K3s is correctly deployed, as executing h2o.init(ip="x.x.x.x") in the Python 3 interpreter works as expected. I followed the instructions noted here: https://www.h2o.ai/blog/running-h2o-cluster-on-a-kubernetes-cluster/
Nevertheless, I had to modify the service.yaml and comment out the line that says clusterIP: None, as K3s was complaining about its inability to set the clusterIP to None. Even so, I can confirm it is working correctly, and I am able to use an external IP to connect to the cluster.
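For reference, this is roughly how I connect from the Python 3 client (a minimal sketch; "x.x.x.x" is a placeholder for the external IP and 54321 is assumed to be the default H2O port exposed by the Service):
import h2o

# Connect to the H2O cluster exposed by the K3s Service;
# "x.x.x.x" and port 54321 are placeholders/assumptions for my setup.
h2o.init(ip="x.x.x.x", port=54321)

# Should report all 3 replicas once the pods have clustered.
h2o.cluster().show_status()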
If I try to load the dataset into the H2O cluster inside K3s using the exact same steps as described here http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html, this is the output that I get:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Key not loaded: Key<Frame> https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv
Request: POST /3/ParseSetup
data: {'check_header': '0', 'source_frames': '["https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv"]'}
The same error occurs if I use the h2o.upload_file("x.csv") method.
There is a clue about what may be happening here: Key not loaded: Key<Frame> while POSTing source frame through ParseSetup in H2O API call, but I am not using curl, and I cannot find any parameter that could help me overcome this issue: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/h2o.html?highlight=import_file#h2o.import_file
I need to use the Python client inside the same K3s cluster for various technical reasons, so I am not able to launch either Flow or Firebug to see what may be happening.
I can confirm it works correctly when I simply issue an h2o.init(), using the local Java instance.
UPDATE 1:
I have tried in different K3s clusters without success. I changed the service.yaml to a NodePort, and now this is the error traceback:
>>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
...
h2o.exceptions.H2OResponseError: Server error java.lang.IllegalArgumentException:
Error: Job is missing
Request: GET /3/Jobs/$03010a2a016132d4ffffffff$_a2366be93ec99a78d7bc161de8c54d67
UPDATE 2:
I have tried using different services (NodePort, LoadBalancer, ClusterIP) and none of them work. I have also tried using Minikube with the official image, and with a custom image made by me, without success. I suspect this is related either to H2O itself or to the clustering between pods. I will keep digging and hopefully there will be some gold in it.
UPDATE 3:
I also found out that the blog post about running H2O in Docker is really outdated (https://www.h2o.ai/blog/h2o-docker/), and the Dockerfile present on GitHub does not work either (I changed it to uncomment the ENTRYPOINT section, without success): https://github.com/h2oai/h2o-3/blob/master/Dockerfile
Even so, I tried the custom image I built for h2o-k8s and it works seamlessly in plain Docker. I am wondering why it still does not work in K8s...
UPDATE 4:
I have tried modifying the environment variable called H2O_KUBERNETES_SERVICE_DNS without success.
In the meantime, the cluster became unavailable, that is, the readinessProbes would not complete successfully. No matter what I change now, it does not work.
I spun up a local K3d cluster to see what happened, and surprisingly the readinessProbes were not failing, using v3.30.0.6. I then started testing it with R instead of Python. I am glad I tried, because I may have pinpointed what was wrong: there is a version mismatch between the client and the server. So I updated the image accordingly to v3.30.0.1.
But now the readinessProbe is failing again in my k3d cluster, so I am unable to test it.

It seems it is working now: R client version 3.30.0.1 with server version 3.30.0.1. I also tried Python client version 3.30.0.7 with server version 3.30.0.7, and it started working. Marvelous. The problem was caused by a version mismatch between the client and the server, as the Python client had been updated to 3.30.0.7 while the latest server image for Docker was 3.30.0.6.
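In case it helps someone else, a quick way to spot this kind of mismatch from the Python client (a minimal sketch, reusing the placeholder connection details from above):
import h2o

print(h2o.__version__)              # version of the installed Python client package

h2o.init(ip="x.x.x.x", port=54321)  # placeholder connection details
print(h2o.cluster().version)        # version reported by the H2O server running in K3s

# If they differ, pin the client to the server version, e.g.:
#   pip install h2o==3.30.0.6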

Related

Airflow can't reach logs from webserver due to 403 error

I use Apache Airflow for daily ETL jobs. I installed it in Azure Kubernetes Service using the provided Helm chart. It's been running fine for half a year, but recently I became unable to access the logs in the webserver (this always used to work fine).
I'm getting the following error:
*** Log file does not exist: /opt/airflow/logs/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** Fetching from: http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log
*** !!!! Please make sure that all your Airflow components (e.g. schedulers, webservers and workers) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd) !!!!!
****** See more at https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#secret-key
****** Failed to fetch log file from worker. Client error '403 FORBIDDEN' for url 'http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/dag_id=analytics_etl/run_id=manual__2022-09-26T09:25:50.010763+00:00/task_id=copy_device_table/attempt=18.log'
For more information check: https://httpstatuses.com/403
What have I tried:
I've made sure that the log file exists (I can exec into the airflow-worker-0 pod and read the file on command line in the location specified in the error).
I've rolled back my deployment to an earlier commit from when I know for sure it was still working, but it made no difference.
I was using webserverSecretKeySecretName in the values.yaml configuration. I changed the secret to which that name was pointing (deleted it and created a new one, as described here: https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key) but it didn't work (no difference, same error).
I changed the config to use a webserverSecretKey instead (in plain text), no difference.
My thoughts/observations:
The error states that the log file doesn't exist, but that's not true. It probably just can't access it.
The time is the same in all pods (I double-checked by exec-ing into them and typing date on the command line).
The webserver secret is the same in the worker, the scheduler, and the webserver (I double-checked by exec-ing into them and finding the corresponding env variable).
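(The same check can be scripted; below is a minimal sketch using the kubernetes Python client, where the pod names and namespace are placeholders for my release:)
from kubernetes import client, config
from kubernetes.stream import stream

config.load_kube_config()
v1 = client.CoreV1Api()

# Placeholder pod names/namespace; the real names depend on the Helm release.
for pod in ["airflow-webserver-0", "airflow-scheduler-0", "airflow-worker-0"]:
    out = stream(
        v1.connect_get_namespaced_pod_exec,
        pod,
        "default",
        command=["printenv", "AIRFLOW__WEBSERVER__SECRET_KEY"],
        stderr=True, stdin=False, stdout=True, tty=False,
    )
    print(pod, out.strip())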
Any ideas?
Turns out this was a known bug with the latest release (2.4.0) of the official Airflow Helm chart, reported here:
https://github.com/apache/airflow/discussions/26490
Should be resolved in version 2.4.1 which should be available in the next couple of days.

Issue on Connection with Cordapp to Springboot

I am new to Spring Boot. Currently I am working on the Corda blockchain, and I am facing some issues connecting my CorDapp through Spring Boot.
If anyone has any reference or solution, kindly share it to help me solve this problem.
You don't set the values inside NodeRPCConnection; see example here:
https://github.com/corda/samples/blob/release-V4/cordapp-example/clients/src/main/kotlin/com/example/server/NodeRPCConnection.kt
They are injected when you run one of the runPartyServer gradle tasks:
https://github.com/corda/samples/blob/release-V4/cordapp-example/clients/build.gradle
For example, to start the PartyA webserver, all you have to do is run ./gradlew runPartyAServer.
Obviously, you must first start your node and then run the webserver.

AWS ECS error in prisma container - environment variable PRISMA_CONFIG

I'm new to AWS, and I'm trying to deploy my local web app on AWS using ECR and ECS, but I got stuck when running a cluster: it throws an error about the PRISMA_CONFIG environment variable in the Prisma container.
In my local environment I'm using Docker to build the app with Node.js, Prisma and MongoDB, and it's working fine.
Now on ECS, I created a task definition, and for the Prisma container I tried to copy the YAML config from my local docker-compose.yml file to make it work.
There is a field called "ENVIRONMENT": I've inputted the value in the environment variables, but it's just not working; it throws the error while the cluster is running, and then the task stops.
The YAML spans multiple lines, but the input box supports a single-line string only.
The variable key is PRISMA_CONFIG, and the following are the values that I've already tried:
| port: 4466\n databases:\n default:\n connector: mongo\n uri: mongodb://prisma:prisma#mongo\n
| \nport: 4466 \ndatabases: \ndefault: \nconnector: mongo \nuri: mongodb://prisma:prisma#mongo
|\nport: 4466\n databases:\n default:\n connector: mongo\n uri: mongodb://prisma:prisma#mongo
\nport: 4466\n databases:\n default:\n connector: mongo\n uri: mongodb://prisma:prisma#mongo
port: 4466\n databases:\n default:\n connector: mongo\n uri: mongodb://prisma:prisma#mongo\n
And these are the errors:
Exception in thread "main" java.lang.RuntimeException: Unable to load Prisma config: java.lang.RuntimeException: No valid Prisma config could be loaded.
expected a comment or a line break, but found p(112)
expected chomping or indentation indicators, but found \(92)
I expected that all containers would run without errors, but the actual result is that the container stopped after running for a minute.
Please help with this, or suggest another way to deploy to AWS.
Thank you very much.
I've been looking for a similar solution to load the prisma config without the multiline string.
There are repositories that load the prisma environment variables separately without a prisma config:
Check out this repo for example:
https://github.com/akoenig/prisma-docker-compose/blob/master/.prisma.env
Here akoenig uses the following env variables via an env_file. So I'm assuming you can just pass in these environment variables separately to achieve what Prisma is looking for.
# CONTENTS OF env_file
PORT=4466
SQL_CLIENT_HOST_CLIENT1=database
SQL_CLIENT_HOST_READONLY_CLIENT1=database
SQL_CLIENT_HOST=database
SQL_CLIENT_PORT=3306
SQL_CLIENT_USER=root
SQL_CLIENT_PASSWORD=prisma
SQL_CLIENT_CONNECTION_LIMIT=10
SQL_INTERNAL_HOST=database
SQL_INTERNAL_PORT=3306
SQL_INTERNAL_USER=root
SQL_INTERNAL_PASSWORD=prisma
SQL_INTERNAL_DATABASE=graphcool
CLUSTER_ADDRESS=http://prisma:4466
SQL_INTERNAL_CONNECTION_LIMIT=10
SCHEMA_MANAGER_SECRET=graphcool
SCHEMA_MANAGER_ENDPOINT=http://prisma:4466/cluster/schema
#CLUSTER_PUBLIC_KEY=
BUGSNAG_API_KEY=""
ENABLE_METRICS=0
JAVA_OPTS=-Xmx1G
This is for a MySQL database; you would need to tailor it to suit your values. But in theory you should just be able to pass these variables one by one as individual variables in AWS's GUI.
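Separately, since the errors in the question are YAML parse errors, it may be worth checking locally that whatever string ends up in PRISMA_CONFIG actually parses as YAML before pasting it into the task definition. A minimal sketch with PyYAML, using the config from the question (note that real newlines and indentation are needed, not literal "\n" characters typed into a single-line box):
import yaml  # pip install pyyaml

# The config from the question, written with real newlines and indentation.
candidate = (
    "port: 4466\n"
    "databases:\n"
    "  default:\n"
    "    connector: mongo\n"
    "    uri: mongodb://prisma:prisma#mongo\n"
)

# Raises a yaml.YAMLError (similar in spirit to the messages above) if the string is malformed.
print(yaml.safe_load(candidate))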
I've also asked this question on the Prisma Slack channel and am waiting to see if they have other suggestions: https://prisma.slack.com/archives/CA491RJH0/p1569689413383000
Let me know how it goes.
Not an expert here, but have you set up the environment variable PRISMA_API_MANAGEMENT_SECRET? You would have defined the secret when you configured your Fargate instance.
Have a look at the following article:
https://www.prisma.io/tutorials/deploy-prisma-to-aws-fargate-ct14

Unable to start Kudu master

While starting kudu-master, I am getting the error below and am unable to start the Kudu cluster.
F0706 10:21:33.464331 27576 master_main.cc:71] Check failed: _s.ok() Bad status: Invalid argument: Unable to initialize catalog manager: Failed to initialize sys tables async: on-disk master list (hadoop-master:7051, slave2:7051, slave3:7051) and provided master list (:0) differ. Their symmetric difference is: :0, hadoop-master:7051, slave2:7051, slave3:7051
It is a cluster of 8 nodes, and I have provided 3 masters, as given below, in master.gflagfile on the master nodes.
--master_addresses=hadoop-master,slave2,slave3
TL;DR
If this is a new installation, and working under the assumption that the master IP addresses are correct, I believe the easiest solution is to:
Stop kudu masters
Nuke the <kudu-data-dir>/master directory
Start kudu masters
Explanation
I believe the most common (if not the only) cause of this error (Failed to initialize sys tables async: on-disk master list (hadoop-master:7051, slave2:7051, slave3:7051) and provided master list (:0) differ.) is a Kudu master node being added incorrectly. The error suggests that kudu-master thinks it's running on a single node rather than in a 3-node cluster.
Maybe you did not intend to "add a node", but that's most likely what happened. I'm saying this because I had the same problem; after some googling and debugging, I discovered that during the installation I had started kudu-master before putting the correct IP addresses in master.gflagfile, so kudu-master was spun up thinking it was running on a single node, not a 3-node cluster. Using the steps above to do a clean install of kudu-master again, my problem was solved.

DataStax Enterprise Portfolio Manager Demo Fails

I am trying to experiment with Spark and found this demo provided by DataStax:
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/anaHome/anaPortfolioMgrSpark.html
I was able to run the pricer utility, and data was populated into the created tables without any issues.
The next command is to start the website, which I did:
$ cd website
$ sudo ./start
And then open the url on the browser:
http://localhost:8983/portfolio
The website appears to load but a script error comes up:
fname is undefined
The initial error seems to stem from here:
if (mtype == Thrift.MessageType.EXCEPTION) {
    var x = new Thrift.TApplicationException();
    x.read(this.input);
    this.input.readMessageEnd();
    throw x;
}
The error is occurring in thrift.js.
Any ideas on how to resolve this? I believe this is probably a bug (or limitation) that must be addressed by the DataStax team. I am using DSE on Ubuntu. My best guess is that it happens because I don't use localhost in my cassandra.yaml file and the website gives me no way to specify the IP of my database server. There appears to be a duplicate of the cassandra.yaml file in the website resource directory (maybe it uses that?), but it has the same settings as my primary cassandra.yaml file.
