How to set up a Solana RPC node? - rpc

I want to set up a full Solana node, not a validator or voting node, just to get the blockchain data on my local machine. How could I do it?

If you want a local RPC node, know that the required specs are very high: currently 12 cores, 256 GB RAM, and 1 TB of NVMe SSD space. More info at https://docs.solana.com/running-validator/validator-reqs
If you want to run an RPC node, the only additional command-line argument you must provide is --no-voting, and you don't need the voting arguments. So, for example, you'd run:
solana-keygen new -o identity.json
solana-validator \
--rpc-port 8899 \
--entrypoint entrypoint.devnet.solana.com:8001 \
--limit-ledger-size \
--log ~/solana-validator.log \
--no-voting \
--identity identity.json
Otherwise, you can follow all of the instructions at https://docs.solana.com/running-validator/validator-start
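Once the validator is running and has caught up, a quick sanity check (a minimal sketch, assuming the node is serving locally on --rpc-port 8899 as above) is to hit the JSON-RPC endpoint directly:
curl http://127.0.0.1:8899 -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"getHealth"}'
A caught-up node returns {"jsonrpc":"2.0","result":"ok","id":1}; while it is still syncing you will get a "node is behind"-style error instead.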

Related

How to get basic Spark program running on Kubernetes

I'm trying to get off the ground with Spark and Kubernetes but I'm facing difficulties. I used the helm chart here:
https://github.com/bitnami/charts/tree/main/bitnami/spark
I have 3 workers and they all report running successfully. I'm trying to run the following program remotely:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("spark://<master-ip>:<master-port>").getOrCreate()
df = spark.read.json('people.json')
Here's the part that's not entirely clear. Where should the file people.json actually live? I have it locally where I'm running the python code and I also have it on a PVC that the master and all workers can see at /sparkdata/people.json.
When I run the third line with simply 'people.json', it starts running but errors out with:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
If I run it as '/sparkdata/people.json' then I get
pyspark.sql.utils.AnalysisException: Path does not exist: file:/sparkdata/people.json
I'm not sure where to go from here. To be clear, I want it to read files from the PVC; it's an NFS share that has the data files on it.
Your people.json file needs to be accessible to your driver + executor pods. This can be achieved in multiple ways:
having some kind of network/cloud drive that each pod can access
mounting volumes on your pods, and then uploading the data to those volumes using --files in your spark-submit.
The latter option might be the simpler one to set up. This page discusses in more detail how you could do this, but to get straight to the point: if you add the following arguments to your spark-submit, you should be able to get your people.json onto your driver + executors (you just have to choose sensible values for the $VAR variables in there):
--files people.json \
--conf spark.kubernetes.file.upload.path=$SOURCE_DIR \
--conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
--conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
--conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
--conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
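Since your data is already on a PVC, you can also mount that claim directly instead of uploading with --files; a sketch, assuming the claim is named spark-data (note that for persistentVolumeClaim volumes the option key is options.claimName rather than options.path):
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.sparkdata.options.claimName=spark-data \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.sparkdata.mount.path=/sparkdata \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.sparkdata.options.claimName=spark-data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.sparkdata.mount.path=/sparkdata \
With that in place the executors see the file at /sparkdata/people.json.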
You can always verify the existence of your data by going inside of the pods themselves like so:
kubectl exec -it <driver/executor pod name> bash
(now you should be inside of a bash process in the pod)
cd <mount-path-you-chose>
ls -al
That last ls -al command should show you a people.json file in there (after having done your spark-submit of course).
Hope this helps!

How to run Cardano Wallet?

I have installed cardano-wallet using this documentation. Everything is OK; I just don't know how to run it so I can interact with it via Node.js:
const { WalletServer } = require('cardano-wallet-js');
let walletServer = WalletServer.init('http://127.0.0.1:1337/v2');
async function test() {
    let information = await walletServer.getNetworkInformation();
    console.log(information);
}
test();
Does anyone have an idea?
According to the IOHK documentation, prior to running the wallet server you have to run a node:
cardano-node run \
--topology ~/cardano/config/mainnet-topology.json \
--database-path ~/cardano/db/ \
--socket-path ~/cardano/db/node.socket \
--host-addr 127.0.0.1 \
--port 1337 \
--config ~/cardano/config/mainnet-config.json
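Before starting the wallet, it can help to confirm the node socket is live; a minimal check (assuming cardano-cli was installed alongside cardano-node, and using the socket path from the command above):
export CARDANO_NODE_SOCKET_PATH=~/cardano/db/node.socket
cardano-cli query tip --mainnet
Once the node is synced this prints the current slot and block; until then it will report an earlier tip or fail to connect while the node is still starting.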
And after that, call the serve command with the appropriate flags:
cardano-wallet serve \
--port 8090 \
--mainnet \
--database ~/cardano/wallets/db \
--node-socket $CARDANO_NODE_SOCKET_PATH
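With both processes running, the wallet's REST API listens on the port you passed to serve, so the Node.js snippet from the question should point WalletServer.init at that port (http://127.0.0.1:8090/v2 with the flags above) rather than at the node's port. A quick check from the shell:
curl http://127.0.0.1:8090/v2/network/information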
If you need more details, read my medium post.
You have to run a Cardano node in order to query the blockchain.
Follow this article:
https://developers.cardano.org/docs/get-started/cardano-wallet-js
You first have to download the docker-compose.yml file:
wget https://raw.githubusercontent.com/input-output-hk/cardano-wallet/master/docker-compose.yml
Then run your node on either testnet or mainnet with this command:
NETWORK=testnet docker-compose up
Then you will be able to connect to the blockchain.
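As a quick check that the containers came up (a sketch; the versions of that docker-compose.yml I've seen publish the wallet API on host port 8090, so adjust if your ports: section differs):
docker-compose ps
curl http://localhost:8090/v2/network/information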
ref - https://github.com/tango-crypto/cardano-wallet-js

Dataproc cluster creation fails with free Google Cloud credits

I am using the free Google Cloud credits. I followed the Dataproc tutorial, but when I run the following command I get an error about storage capacity.
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region=${REGION} \
--zone=${ZONE} \
--image-version=1.5 \
--master-machine-type=n1-standard-4 \
--worker-machine-type=n1-standard-4 \
--bucket=${BUCKET_NAME} \
--optional-components=ANACONDA,JUPYTER \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh
Do you have any idea how to fix this? I changed n1-standard-4 to n1-standard-1, but that did not fix it. However, when I remove --image-version=1.5 the command works. Does that create any problem for the rest of the program?
Also, from the web interface, when I click on the JupyterLab link I cannot see a Python 3 icon among the kernels available on my Dataproc cluster; I only have Python 2, and it keeps saying the connection with the server is gone.
Here is a picture of the JupyterLab error:
You are seeing an error about storage capacity because in the 1.5 image version Dataproc uses bigger 1000 GiB disks for master and worker nodes to improve performance. You can reduce the disk size with the --master-boot-disk-size=100GB and --worker-boot-disk-size=100GB flags:
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--region=${REGION} \
--zone=${ZONE} \
--image-version=1.5 \
--master-machine-type=n1-standard-4 \
--master-boot-disk-size=100GB \
--worker-machine-type=n1-standard-4 \
--worker-boot-disk-size=100GB \
--bucket=${BUCKET_NAME} \
--optional-components=ANACONDA,JUPYTER \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh
When you removed the --image-version=1.5 flag, the command used the default 1.3 image version, which does not support Python 3 by default; that's why you are not seeing a Python 3 kernel in JupyterLab.
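If you want to see which quota the original command was hitting, the regional quotas of a free-trial project (including persistent disk) can be inspected with the command below; this is only a diagnostic sketch and assumes gcloud is already configured for your project:
gcloud compute regions describe ${REGION}
The quotas: section of the output lists limits and current usage for metrics such as DISKS_TOTAL_GB and SSD_TOTAL_GB.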

Docker run can't find Google authentication: "oauth2google.DefaultTokenSource: google: could not find default credentials"

Hey there, I am trying to figure out why I keep getting this error when running the docker run command. Here is what I am running:
docker run -p 127.0.0.1:2575:2575 -v ~/.config:/home/.config gcr.io/cloud-healthcare-containers/mllp-adapter /usr/mllp_adapter/mllp_adapter --hl7_v2_project_id=****** --hl7_v2_location_id=us-east1 --hl7_v2_dataset_id=***** --hl7_v2_store_id=***** --export_stats=false --receiver_ip=0.0.0.0
I have tried both Ubuntu and Windows, and I get an error that it failed to connect and that I should see Google's service authentication documentation. I have confirmed the account is active and the keys are exported to the config below:
brandon@ubuntu-VM:~/Downloads$ gcloud auth configure-docker
WARNING: Your config file at [/home/brandon/.docker/config.json] contains these credential helper entries:
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  }
}
I am thinking it's something to do with the -v flag and how it uses the Google authentication. Any help or guidance to fix this would be appreciated. Thank you.
-v ~/.config:/root/.config is used to give the container access to the gcloud credentials.
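If those credentials come from a service-account key file, one way to put it where the mounted container path can see it is sketched below; the key file name and the $SA_EMAIL service-account address are placeholders, not values from the question:
gcloud iam service-accounts keys create ~/.config/mllp-adapter-key.json \
  --iam-account=$SA_EMAIL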
I was facing the same issue for hours, and I decided to check the source code even though I am not a Go developer.
There I figured out that there is a credentials option for setting the credentials file. It's not documented for now.
The docker command should look like this:
docker run \
--network=host \
-v ~/.config:/root/.config \
gcr.io/cloud-healthcare-containers/mllp-adapter \
/usr/mllp_adapter/mllp_adapter \
--hl7_v2_project_id=$PROJECT_ID \
--hl7_v2_location_id=$LOCATION \
--hl7_v2_dataset_id=$DATASET_ID \
--hl7_v2_store_id=$HL7V2_STORE_ID \
--credentials=/root/.config/$GOOGLE_APPLICATION_CREDENTIALS \
--export_stats=false \
--receiver_ip=0.0.0.0 \
--port=2575 \
--api_addr_prefix=https://healthcare.googleapis.com:443/v1 \
--logtostderr
Don't forget to put your credentials file inside your ~/.config folder.
It worked fine here. I hope this helps.
Cheers

PySpark Job fails with workflow template

To follow up on this question, I decided to try the workflow template API.
Here's what it looks like:
gcloud beta dataproc workflow-templates create lifestage-workflow --region europe-west2
gcloud beta dataproc workflow-templates set-managed-cluster lifestage-workflow \
--master-machine-type n1-standard-8 \
--worker-machine-type n1-standard-16 \
--num-workers 6 \
--cluster-name lifestage-workflow-cluster \
--initialization-actions gs://..../init.sh \
--zone europe-west2-b \
--region europe-west2
gcloud beta dataproc workflow-templates add-job pyspark gs://.../main.py \
--step-id prediction \
--region europe-west2 \
--workflow-template lifestage-workflow \
--jars gs://.../custom.jar \
--py-files gs://.../jobs.zip,gs://.../config.ini \
-- --job predict --conf config.ini
The template is correctly created.
The job works when I run it manually from one of my already existing clusters. It also runs when I use an existing cluster instead of asking the workflow to create one.
The thing is I want the cluster to be created before running the job and deleted just after, that's why I'm using a managed cluster.
But with the managed cluster I just can't make it run. I tried to use the same configuration as my existing clusters but it doesn't change anything.
I always get the same error.
Any idea why my job runs perfectly except when it is run from a generated cluster?
The problem came from the version of the managed cluster.
By default the image version was 1.2.31, while my existing cluster was using image 1.2.28. When I changed the config to add --image-version=1.2.28, it worked.
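For reference, pinning the image is just one extra flag on the managed-cluster definition; a sketch reusing the names from the question (check gcloud beta dataproc workflow-templates set-managed-cluster --help if your gcloud version names the flag differently):
gcloud beta dataproc workflow-templates set-managed-cluster lifestage-workflow \
  --image-version 1.2.28 \
  --master-machine-type n1-standard-8 \
  --worker-machine-type n1-standard-16 \
  --num-workers 6 \
  --cluster-name lifestage-workflow-cluster \
  --initialization-actions gs://..../init.sh \
  --zone europe-west2-b \
  --region europe-west2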
Dataproc image 1.2.31 upgraded Spark to 2.2.1, which introduced [SPARK-22472]:
SPARK-22472: added null check for top-level primitive types. Before this release, for datasets having top-level primitive types that have null values, it might return some unexpected results. For example, let's say we have a parquet file with schema <a: Int>, and we read it into Scala Int. If column a has null values, some unexpected value can be returned when a transformation is applied.
This likely added just enough generated code to take classes over the 64k limit.
