How to run Presto discovery service standalone? - presto

How do I run the Presto Discovery Service standalone, so that it is neither a coordinator nor a worker? What are the requirements for an HTTP endpoint to act as a discovery service for a Presto cluster?
I found this thread on the presto-users mailing list, where David Phillips wrote:
If you want to run discovery as a standalone service, separate from
Presto, that is an option. We used to publish instructions for doing
this, but got rid of them years ago, as running discovery inside the
coordinator worked fine (even on large clusters with hundreds of
machines).
Does this still hold?

Yes, you can run a standalone discovery service. The cases for this are rare, and in general I recommend just running it on the coordinator.
On your discovery node:
Download the discovery service tar.gz in a version that is compatible with your Presto nodes (e.g. Presto version 347 is compatible with discovery service 1.29) and untar it to a directory.
Similar to a Presto server setup, create an etc directory under the service root and configure node.properties and jvm.config.
Add config.properties, which for the discovery service is as simple as this:
http-server.http.port=8081
Update these lines in your coordinator and worker config.properties:
discovery-server.enabled=false
discovery.uri=http://discovery.example.com:8081
Restart your services. (The discovery service is started the same way the Presto services are, using bin/launcher.)
Once the coordinator and all workers come up, you can run curl -XGET http://discovery.example.com:8081/v1/service and should see output that contains:
{
"environment": "production",
"services": [
{
"id": "d2b7141e-d83f-4d23-be86-285ff2a9f53d",
"nodeId": "57ac8bd3-c55e-4170-b363-80d10023ece8",
"type": "presto",
"pool": "general",
"location": "/57ac8bd3-c55e-4170-b363-80d10023ece8",
"properties": {
"node_version": "347",
"coordinator": "true",
"http": "http://coord.example.com:8080",
"http-external": "http://coord.example.com:8080",
"connectorIds": "system"
}
},
{
"id": "f0abafae-052a-4758-95c6-d19355043bc6",
"nodeId": "57ac8bd3-c55e-4170-b363-80d10023ece8",
"type": "presto-coordinator",
"pool": "general",
"location": "/57ac8bd3-c55e-4170-b363-80d10023ece8",
"properties": {
"http": "http://coord.example.com:8080",
"http-external": "http://coord.example.com:8080"
}
},
{
"id": "1f5096de-189e-4e25-bac3-adc079981d86",
"nodeId": "8d7e820f-dd01-4227-ad6e-f74b97202647",
"type": "presto",
"pool": "general",
"location": "/8d7e820f-dd01-4227-ad6e-f74b97202647",
"properties": {
"node_version": "347",
"coordinator": "false",
"http": "http://worker1.example.com:8080",
"http-external": "http://worker1.example.com:8080",
"connectorIds": "system"
}
},
....
]
}
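To check the announcements without reading the raw JSON, the response can be reduced to one row per service. This is a minimal sketch that assumes the response has the shape shown above; the function names summarize_services and fetch_services are illustrative and not part of Presto.

```python
import json
from urllib.request import urlopen


def summarize_services(payload):
    """Return (type, coordinator flag, http URI) for each announced service."""
    rows = []
    for service in payload.get("services", []):
        props = service.get("properties", {})
        # "coordinator" is only present on services of type "presto".
        rows.append((service.get("type"), props.get("coordinator"), props.get("http")))
    return rows


def fetch_services(discovery_uri, timeout=10):
    """GET <discovery_uri>/v1/service and decode the JSON body."""
    url = discovery_uri.rstrip("/") + "/v1/service"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, summarize_services(fetch_services("http://discovery.example.com:8081")) should list one "presto" row per node plus a "presto-coordinator" row once all nodes have registered.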


Solution for using native Spark REST API for triggering dotnet for Spark jobs (stand alone cluster)

I looked for quite some time for a way to use the native Spark REST API to trigger a dotnet Spark job and did not find anything. I finally figured out a solution by running spark-submit via the CLI with --master spark://spark:6066 and then comparing the driver launch commands that were executed on the worker node.
In case this helps anybody else, here is the body of an example post command (using Postman) for the native Spark REST API to trigger a dotnet spark application.
Spark REST API endpoint: http://[localhost or dns name or ip address]:6066/v1/submissions/create
{ "action": "CreateSubmissionRequest",
"appArgs": [
"dotnet", "/path/to/your/compiled_dotnet_app.dll", "app arg 1", "app arg 2", "etc..."
],
"appResource": "file:/opt/bitnami/spark/jars/microsoft-spark-3-2_2.12-2.1.0.jar",
"clientSparkVersion": "3.2.1",
"environmentVariables": {
"SPARK_ENV_LOADED": "1"
},
"mainClass": "org.apache.spark.deploy.dotnet.DotnetRunner",
"sparkProperties": {
"spark.driver.supervise": "false",
"spark.app.name": "org.apache.spark.deploy.dotnet.DotnetRunner",
"spark.submit.deployMode": "cluster",
"spark.master": "spark://spark:7077",
"spark.jars":"file:/opt/bitnami/spark/jars/microsoft-spark-3-2_2.12-2.1.0.jar",
"spark.submit.pyFiles":""
}
}
HTH
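The same request can be sent from a script instead of Postman. The following is a rough sketch using only the Python standard library; build_submission and submit are hypothetical helper names, and the field values simply mirror the request body above rather than any documented minimum set.

```python
import json
from urllib.request import Request, urlopen

RUNNER = "org.apache.spark.deploy.dotnet.DotnetRunner"


def build_submission(app_dll, app_args, jar, master, spark_version):
    """Build a CreateSubmissionRequest body shaped like the example above."""
    return {
        "action": "CreateSubmissionRequest",
        "appArgs": ["dotnet", app_dll, *app_args],
        "appResource": jar,
        "clientSparkVersion": spark_version,
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "mainClass": RUNNER,
        "sparkProperties": {
            "spark.driver.supervise": "false",
            "spark.app.name": RUNNER,
            "spark.submit.deployMode": "cluster",
            "spark.master": master,
            "spark.jars": jar,
            "spark.submit.pyFiles": "",
        },
    }


def submit(rest_url, body, timeout=30):
    """POST the body to <rest_url>/v1/submissions/create and return the JSON reply."""
    req = Request(
        rest_url.rstrip("/") + "/v1/submissions/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call would look like submit("http://localhost:6066", build_submission("/path/to/your/compiled_dotnet_app.dll", ["app arg 1"], "file:/opt/bitnami/spark/jars/microsoft-spark-3-2_2.12-2.1.0.jar", "spark://spark:7077", "3.2.1")).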

How do I access npm log files in GKE?

I'm running several Node.js microservices on Google Kubernetes Engine (GKE).
Sometimes these services crash and, according to Cloud Logging, more detailed information is available in a log file. For example, the log entry says:
{
"textPayload": "npm ERR! /root/.npm/_logs/2021-10-27T11_26_28_534Z-debug.log\n",
"insertId": "zoqxk8wvkuofhslm",
"resource": {
"type": "k8s_container",
"labels": {
"pod_name": "client-depl-7f679c6b49-5d9tz",
"container_name": "client",
"namespace_name": "production",
"cluster_name": "cluster-1",
"location": "europe-west3-a",
"project_id": "XXX"
}
},
"timestamp": "2021-10-27T11:26:28.701252670Z",
"severity": "ERROR",
"labels": {
"k8s-pod/app": "client",
"k8s-pod/skaffold_dev/run-id": "b5518659-05d6-4c08-9b55-9d58fdd5807f",
"k8s-pod/pod-template-hash": "7f679c6b49",
"compute.googleapis.com/resource_name": "gke-cluster-1-pool-1-8bfc60b2-ag86",
"k8s-pod/app_kubernetes_io/managed-by": "skaffold"
},
"logName": "projects/xxx-productive/logs/stderr",
"receiveTimestamp": "xxx"
}
Where do I find these logs on Google Cloud Platform?
---------------- Edit 2021.10.28 ---------------------------
I should clarify that I am already using the logs explorer. This is what I see there:
The logs show 7 consecutive error entries about npm failing. The last two entries indicate that there is more information in a log file, "/root/.npm/_logs/2021-10-27T11_26_28_534Z-debug.log".
Does this log file have more information about the failure, or are these 7 error log entries all the information I get?
Thanks
kubectl logs <your_pod>
You can use the GCP Logs Explorer.
Assuming you have already enabled Logging and Monitoring, you can view the logs as follows:
a. Go to the Logs explorer in the Cloud Console.
b. Click Resource. Under ALL_RESOURCE_TYPES, select Kubernetes Container.
c. Under CLUSTER_NAME, select the name of your user cluster.
d. Under NAMESPACE_NAME, select default.
e. Click Add and then click Run Query.
f. Under Query results, you can see log entries from the monitoring-example Deployment. For example:
{
"textPayload": "2020/11/14 01:24:24 Starting to listen on :9090\n",
"insertId": "1oa4vhg3qfxidt",
"resource": {
"type": "k8s_container",
"labels": {
"pod_name": "monitoring-example-7685d96496-xqfsf",
"cluster_name": ...,
"namespace_name": "default",
"project_id": ...,
"location": "us-west1",
"container_name": "prometheus-example-exporter"
}
},
"timestamp": "2020-11-14T01:24:24.358600252Z",
"labels": {
"k8s-pod/pod-template-hash": "7685d96496",
"k8s-pod/app": "monitoring-example"
},
"logName": "projects/.../logs/stdout",
"receiveTimestamp": "2020-11-14T01:24:39.562864735Z"
}
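The resource selections from the steps above correspond to a text filter that can be pasted into the Logs Explorer query box. Here is a small sketch that assembles such a filter from the label names visible in the log entries; build_log_filter is a hypothetical helper, and the example values (cluster-1, production, client) come from the question.

```python
def build_log_filter(cluster, namespace, container=None, min_severity=None):
    """Assemble a Logs Explorer filter for container logs in one namespace."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.cluster_name="{cluster}"',
        f'resource.labels.namespace_name="{namespace}"',
    ]
    if container:
        clauses.append(f'resource.labels.container_name="{container}"')
    if min_severity:
        # Severity names such as ERROR are compared by level, not as strings.
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


print(build_log_filter("cluster-1", "production", "client", "ERROR"))
```

Leaving out the severity clause shows the stdout entries alongside the stderr entries for the same container.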
How about logging into the pod while it is alive:
kubectl exec -it your-pod -- sh
Then wait for it to crash and watch the crash file in real time, while the pod has not been restarted yet :)
How to log in to a pod on GCP:
From the Google Cloud Platform main menu go to Kubernetes Engine -> Workloads
Click on the workload you're interested in:
Find the Managed Pods section and click on the Pod you want to access:
Click on KUBECTL -> Exec -> [name of workload/namespace]
A terminal should appear at the bottom of the browser page, giving you a shell inside the pod. You can look for your log file from there.
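If the container is still running, the debug file named in the log entry can also be read directly rather than browsed for. The sketch below pulls the path out of a textPayload value and builds a kubectl exec command for it; both helper names are illustrative, and it assumes the file lives in the same container that wrote the log line.

```python
import re

# Matches an absolute path ending in "-debug.log", as written by npm.
DEBUG_LOG = re.compile(r"(/\S+-debug\.log)")


def debug_log_path(text_payload):
    """Return the npm debug log path mentioned in a log line, or None."""
    match = DEBUG_LOG.search(text_payload)
    return match.group(1) if match else None


def kubectl_cat_command(pod, namespace, container, path):
    """Build the argument list for reading a file inside a running container."""
    return ["kubectl", "exec", pod, "-n", namespace, "-c", container, "--", "cat", path]


payload = "npm ERR! /root/.npm/_logs/2021-10-27T11_26_28_534Z-debug.log\n"
path = debug_log_path(payload)
print(" ".join(kubectl_cat_command("client-depl-7f679c6b49-5d9tz", "production", "client", path)))
```

The file lives in the container's own filesystem, so it is lost when the container is replaced unless that directory is backed by a volume.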

Azure Function (PowerShell)

After installing azure-functions-core-tools v3 and migrating a project to v3, my PowerShell projects began to fail. I narrowed the issue down to Az modules loaded as dependencies no longer being recognized at runtime. Further testing showed that the managed dependency setting in the function's host.json file is loading the Az modules properly: deleting the data/ManagedDependencies folder via Kudu and restarting the Function App restores the Az modules. So requirements.psd1 is working; PowerShell just cannot find the downloaded modules.
After reverting to v2, I see the same issue in v2. I was able to work around it temporarily by adding the required Az modules to the Modules folder in the Azure Function project. Note: development and deployment are currently done via VS Code.
How are Managed Dependencies referenced by Powershell?
What are the next avenues to pursue to resolve the reference issues?
host.json contents:
{
"version": "2.0",
"managedDependency": {
"enabled": true
}
}
requirements.psd1 contents:
# This file enables modules to be automatically managed by the Functions service.
# See https://aka.ms/functionsmanageddependency for additional information.
#
@{
'Az' = '3.*'
}
Function App Config:
[
{
"name": "APPINSIGHTS_INSTRUMENTATIONKEY",
"value": "32178670-77eb-40aa-afbc-ca17946f0350",
"slotSetting": false
},
{
"name": "AzureWebJobsStorage",
"value": "DefaultEndpointsProtocol=https;AccountName=REDACTED;EndpointSuffix=core.windows.net",
"slotSetting": false
},
{
"name": "FUNCTIONS_EXTENSION_VERSION",
"value": "~2",
"slotSetting": false
},
{
"name": "FUNCTIONS_WORKER_RUNTIME",
"value": "powershell",
"slotSetting": false
},
{
"name": "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING",
"value": "DefaultEndpointsProtocol=REDACTED;EndpointSuffix=core.windows.net",
"slotSetting": false
},
{
"name": "WEBSITE_CONTENTSHARE",
"value": "rightrezmonitor7d0758",
"slotSetting": false
},
{
"name": "WEBSITE_NODE_DEFAULT_VERSION",
"value": "~10",
"slotSetting": false
},
{
"name": "WEBSITE_RUN_FROM_PACKAGE",
"value": "1",
"slotSetting": false
}
]
The data/ManagedDependencies/200103210646931.r directory in Kudu contains the Az folder and the Az.* module folders.
Are you still seeing this issue? Does your function app have a dependency on .NET Core 2.2?
I have a PowerShell function app which uses the Get-AzKeyVaultSecret cmdlet to retrieve secrets from KeyVault. This function app was originally created to run on V2. However, I manually made the change to move it to V3, and everything continued to work as expected.
To answer your questions:
How are Managed Dependencies referenced by Powershell?
A: The managed dependencies path, which points to the storage account, is appended to $env:PSModulePath in the first invocation.
What are the next avenues to pursue to resolve the reference issues?
A: You could try force-reinstalling the function app dependencies. To do so, go to the Portal and select your function app. Go to Overview and stop the function app. After that, select Platform features and go to Kudu.
Once in Kudu, go to Debug console and select PowerShell.
From there, navigate to D:\home\data\ManagedDependencies. Once there, run Remove-Item * -Recurse -Force, e.g.:
cd D:\home\data\ManagedDependencies
Remove-Item * -Recurse -Force
Next, start the function app, and on the first function invocation, the dependencies will be downloaded and the path will be appended to $env:PSModulePath.
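To confirm that the path really was appended, the value of $env:PSModulePath can be logged from the function and inspected. As a rough illustration of that check, the sketch below splits a captured value and keeps the entries that point into a ManagedDependencies folder; the helper name and the wwwroot entry in the sample are hypothetical, and inside the function app itself the same test would be written in PowerShell.

```python
def managed_dependency_entries(ps_module_path, separator=";"):
    """Return the PSModulePath entries that point into ManagedDependencies."""
    entries = [entry.strip() for entry in ps_module_path.split(separator)]
    return [e for e in entries if "manageddependencies" in e.lower()]


# Sample value: the first entry is hypothetical, the second uses the folder
# name reported in the question.
captured = r"D:\home\site\wwwroot\Modules;D:\home\data\ManagedDependencies\200103210646931.r"
print(managed_dependency_entries(captured))
```

An empty result after the first invocation would suggest the modules were downloaded but never added to the module search path.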
If you are still seeing issues after moving your app to V3, please open an issue at https://github.com/Azure/azure-functions-powershell-worker/issues, provide me with your function app name, and I will take a look.
Cheers,
Francisco

How to deploy a Linux Azure Function using the GitHub Docker Registry

I cannot get a deployment of an Azure Function from a private repository to work using the new GitHub package registry for Docker (https://github.com/features/packages).
My linux_fx_version is:
'linux_fx_version': 'DOCKER|{}'.format(self.docker_image_id)
with docker_image_id having the value organisation/project-name/container-name:latest
For the other settings, I am using
{ "name": "DOCKER_REGISTRY_SERVER_PASSWORD", "value": self.docker_password },
{ "name": "DOCKER_REGISTRY_SERVER_USERNAME", "value": self.docker_username },
{ "name": "DOCKER_REGISTRY_SERVER_URL", "value": self.docker_url },
with the docker_url being https://docker.pkg.github.com/, and the password being the token with read:packages
Things look good, and yet I get the following (I am not able to fetch any deployment logs as the runtime is unreachable).
Error:
Azure Functions Runtime is unreachable. Click here for details on storage configuration.
Solution found.
Use https://docker.pkg.github.com/ as the docker URL,
and docker.pkg.github.com/<org>/<project-name>/<container-name>:<version> as the linux_fx_version
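Put together, the working values can be generated in one place so that the registry host is never left out of the image reference. This is a sketch only: the helper names are made up, and keeping the DOCKER| prefix from the question in front of the full image reference is an assumption.

```python
REGISTRY_HOST = "docker.pkg.github.com"


def image_reference(org, project, container, version):
    """Full image reference, including the registry host."""
    return f"{REGISTRY_HOST}/{org}/{project}/{container}:{version}"


def site_config(org, project, container, version, username, token):
    """Return linux_fx_version and the registry app settings for the function app."""
    return {
        # Assumption: the "DOCKER|" prefix from the question is kept.
        "linux_fx_version": "DOCKER|" + image_reference(org, project, container, version),
        "app_settings": [
            {"name": "DOCKER_REGISTRY_SERVER_URL", "value": f"https://{REGISTRY_HOST}/"},
            {"name": "DOCKER_REGISTRY_SERVER_USERNAME", "value": username},
            # Token with the read:packages scope, as in the question.
            {"name": "DOCKER_REGISTRY_SERVER_PASSWORD", "value": token},
        ],
    }
```

The difference from the original attempt is that the image reference starts with docker.pkg.github.com rather than with the organisation name.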

Unable to create a pool with custom images on MS Azure

I'm trying to create a pool of virtual machines built on my custom image. I've successfully created a custom image and added it to my batch account.
But when I try to create a pool, based on this image from the azure portal, I get an error.
There was an error encountered while performing the last resize on the
pool. Please try resizing the pool again. Code: AllocationFailed
Message: Desired number of dedicated nodes could not be allocated
Details: Reason - The source managed disk or snapshot associated with
the virtual machine Image Id was not found.
While creating a pool in the portal I use my image name, as there's no option to set an image id. But the image Id in the json is correct. And I can see the image listed in the portal in the correct batch account.
Here's my pool properties json:
{
"id": "my-pool-0",
"displayName": "my-pool-0",
"lastModified": "2018-12-04T15:54:06.467Z",
"creationTime": "2018-12-04T15:44:18.197Z",
"state": "active",
"stateTransitionTime": "2018-12-04T15:44:18.197Z",
"allocationState": "steady",
"allocationStateTransitionTime": "2018-12-04T16:09:11.667Z",
"vmSize": "standard_a2",
"resizeTimeout": "PT15M",
"currentDedicatedNodes": 0,
"currentLowPriorityNodes": 0,
"targetDedicatedNodes": 1,
"targetLowPriorityNodes": 0,
"enableAutoScale": false,
"autoScaleFormula": null,
"autoScaleEvaluationInterval": null,
"enableInterNodeCommunication": false,
"maxTasksPerNode": 1,
"url": "https://mybatch.westeurope.batch.azure.com/pools/my-pool-0",
"resizeErrors": [
{
"message": "Desired number of dedicated nodes could not be allocated",
"code": "AllocationFailed",
"values": [
{
"name": "Reason",
"value": "The source managed disk or snapshot associated with the virtual machine Image Id was not found."
}
]
}
],
"virtualMachineConfiguration": {
"imageReference": {
"publisher": null,
"offer": null,
"sku": null,
"version": null,
"virtualMachineImageId": "/subscriptions/79b59716-301e-401a-bb8b-22edg5c1he4j/resourceGroups/resource-group-1/providers/Microsoft.Compute/images/my-image"
},
"nodeAgentSKUId": "batch.node.ubuntu 18.04"
},
"applicationLicenses": null
}
It seems like the error text has nothing to do with what is actually wrong. Has anyone encountered this error, or does anyone know a way to troubleshoot it?
UPDATE
Packer json used to create the image (taken from here)
{
"builders": [{
"type": "azure-arm",
"client_id": "ffxcvbd0-c867-429a-bxcv-8ee0acvb6f99",
"client_secret": "cvb54cvb-202d-4wq-bb8b-22cdfbce4f",
"tenant_id": "ae33sdfd-a54c-40af-b20c-80810f0ff5da",
"subscription_id": "096da34-4604-4bcb-85ae-2afsdf22192b",
"managed_image_resource_group_name": "resource-group-1",
"managed_image_name": "my-image",
"os_type": "Linux",
"image_publisher": "Canonical",
"image_offer": "UbuntuServer",
"image_sku": "18.04-LTS",
"azure_tags": {
"dept": "Engineering",
"task": "Image deployment"
},
"location": "West Europe",
"vm_size": "Standard_DS2_v2"
}],
"provisioners": [{
"execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
"inline": [
"export DEBIAN_FRONTEND=noninteractive",
"apt-get update",
"apt-get upgrade -y",
"apt-get -y install nginx",
...
"/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
],
"inline_shebang": "/bin/sh -x",
"type": "shell"
}]
}
I reproduced your issue with the following steps:
Create the managed image through Packer.
Create the Batch pool with the managed image in the same subscription and region.
I then got the same error as you. Next, I ran another test that created the image from a snapshot and then created the Batch pool with that image. This time the pool worked.
In Azure you can prepare a managed image from snapshots of an Azure
VM's OS and data disks, from a generalized Azure VM with managed
disks, or from a generalized on-premises VHD that you upload.
Based on this description, it seems that an image created through Packer cannot be used as the custom image. I'm not sure about this, but the snapshot approach does work. I hope this helps.
Update
Take a look at the document Custom Images with Batch Shipyard. It says:
Note: Currently creating an ARM Image directly with Packer can only be
used with User Subscription Batch accounts. For standard Batch Service
pool allocation mode Batch accounts, Packer will need to create a VHD
first, then you will need to import the VHD to an ARM Image. Please
follow the appropriate path that matches your Batch account pool
allocation mode.
In my test, I followed the steps that Packer uses to create the image. While the source VM exists, the custom image can be used for a Batch pool normally, but it fails once you delete the source VM. So, as the note says, a standard Batch Service account can only use an image created from the VHD file that Packer creates, and the VHD file must exist for the lifetime of the pool.
If you're using a managed image, then your imageReference section should look like this:
"imageReference": {
"id": "/subscriptions/79b59716-301e-401a-bb8b-22edg5c1he4j/resourceGroups/resource-group-1/providers/Microsoft.Compute/images/my-image"
},
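Before creating the pool, it can help to check that the ID actually names a managed image in the subscription and resource group you expect. The following sketch splits an ID of the form shown above into its parts; parse_image_id is a hypothetical helper, and it only validates the shape of the string, not whether the image or its source disk exists.

```python
def parse_image_id(resource_id):
    """Split a managed image resource ID into subscription, resource group and name."""
    parts = resource_id.strip("/").split("/")
    # Expected shape:
    # subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/images/<name>
    if len(parts) != 8:
        raise ValueError(f"unexpected number of segments: {len(parts)}")
    keys = [p.lower() for p in parts[0::2]]
    if keys != ["subscriptions", "resourcegroups", "providers", "images"]:
        raise ValueError(f"unexpected segment names: {parts[0::2]}")
    if parts[5].lower() != "microsoft.compute":
        raise ValueError(f"unexpected provider: {parts[5]}")
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "image_name": parts[7],
    }
```

Comparing the returned subscription and resource group with those of the Batch account is a quick way to rule out a mismatch before looking at the image's source disk.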
