Run a python script from the startup script - python-3.x

An instance is created when a new file is uploaded to storage. The startup script runs a python script that generates a PDF for the new file, uploads the PDF back to storage, and deletes the instance. Since the python script is pretty lengthy, I have stored the startup script and the python script in the same location (Cloud Storage). I have passed the paths as metadata while creating the instance. The input to the python script is the file name of the new file. I checked the logs of the instance, and it's throwing some errors there. Can someone point out what I am doing wrong?
Instance details:
{
  "cpuPlatform": "Intel Haswell",
  "creationTimestamp": "2021-08-02T06:40:36.346-07:00",
  "deletionProtection": false,
  "disks": [
    {
      "autoDelete": true,
      "boot": true,
      "deviceName": "xyz",
      "diskSizeGb": "10",
      "guestOsFeatures": [
        {
          "type": "UEFI_COMPATIBLE"
        },
        {
          "type": "VIRTIO_SCSI_MULTIQUEUE"
        }
      ],
      "index": 0,
      "interface": "SCSI",
      "kind": "compute#attachedDisk",
      "licenses": [
        "projects/debian-cloud/global/licenses/debian-10-buster"
      ],
      "mode": "READ_WRITE",
      "source": "projects/patch-us/zones/us-central1-a/disks/instance-name",
      "type": "PERSISTENT"
    }
  ],
  "fingerprint": "XlZ7biyVpAI=",
  "id": "3984870299667155772",
  "kind": "compute#instance",
  "labelFingerprint": "42WmSpB8rSM=",
  "lastStartTimestamp": "2021-08-02T06:40:46.210-07:00",
  "machineType": "projects/project-name/zones/us-central1-a/machineTypes/e2-medium",
  "metadata": {
    "fingerprint": "f5o3Pxed5VY=",
    "items": [
      {
        "key": "startup-script-url",
        "value": "https://storage.cloud.google.com/project-name.appspot.com/start-up-script/start-script.sh"
      },
      {
        "key": "file_name",
        "value": "123456"
      },
      {
        "key": "python_script_name",
        "value": "https://storage.cloud.google.com/project-name.appspot.com/start-up-script/generate_fd_report.py"
      }
    ],
    "kind": "compute#metadata"
  },
  "name": "instance-name",
  "networkInterfaces": [
    {
      "accessConfigs": [
        {
          "kind": "compute#accessConfig",
          "name": "External NAT",
          "natIP": "35.202.255.222",
          "networkTier": "PREMIUM",
          "type": "ONE_TO_ONE_NAT"
        }
      ],
      "fingerprint": "565TD6a2Y2c=",
      "kind": "compute#networkInterface",
      "name": "nic0",
      "network": "projects/project-name/global/networks/default",
      "networkIP": "10.128.0.29",
      "stackType": "IPV4_ONLY",
      "subnetwork": "projects/project-name/regions/us-central1/subnetworks/default"
    }
  ],
  "scheduling": {
    "automaticRestart": true,
    "onHostMaintenance": "MIGRATE",
    "preemptible": false
  },
  "selfLink": "projects/project-name/zones/us-central1-a/instances/instance-name",
  "serviceAccounts": [
    {
      "email": "project-id-compute@developer.gserviceaccount.com",
      "scopes": [
        "https://www.googleapis.com/auth/cloud-platform"
      ]
    }
  ],
  "shieldedInstanceConfig": {
    "enableIntegrityMonitoring": true,
    "enableSecureBoot": false,
    "enableVtpm": true
  },
  "shieldedInstanceIntegrityPolicy": {
    "updateAutoLearnPolicy": true
  },
  "startRestricted": false,
  "status": "RUNNING",
  "tags": {
    "fingerprint": "42WmSpB8rSM="
  },
  "zone": "projects/project-name/zones/us-central1-a"
}
start-script.sh
#! /bin/bash
ECG_FILE_PATH = $(curl http://metadata/computeMetadata/v1/instance/attributes/file_path -H "Metadata-Flavor: Google")
PYTHON_FILE_PATH = $(curl http://metadata/computeMetadata/v1/instance/attributes/python_script_name -H "Metadata-Flavor: Google")
ECG_FILE_NAME = $(curl http://metadata/computeMetadata/v1/instance/attributes/file_name -H "Metadata-Flavor: Google")
curl -s -o generate_fd.py PYTHON_FILE_PATH
chmod +x generate_fd.py
python3 generate_fd.py ECG_FILE_PATH &
generate_fd_report.py
#!/usr/bin/env python3
def main(file_name):
    print("Hello")

main(file_name)
Logs: (screenshot not included)

To download the script whose URL you've saved as the metadata value, fetch the URL from the metadata server first and pass it to curl:
curl -s -o filename.txt $(curl -s http://metadata/computeMetadata/v1/instance/attributes/filename -H "Metadata-Flavor: Google")
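Putting it together, here is a minimal corrected sketch of the startup script, not a verified fix: it assumes the metadata keys shown in the instance description above, and it fetches the object with gsutil because storage.cloud.google.com links are browser-authenticated and typically won't download with a plain curl. Two shell fixes also matter: bash assignments must not have spaces around =, and variables must be dereferenced with $.
#!/bin/bash
# Hedged sketch. Metadata keys follow the instance description above
# (file_name holds the name of the newly uploaded file).
ECG_FILE_NAME=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/file_name)

# storage.cloud.google.com URLs require browser authentication, so copy
# the object with gsutil using the VM's service account instead of curl.
gsutil cp gs://project-name.appspot.com/start-up-script/generate_fd_report.py ./generate_fd.py

python3 generate_fd.py "$ECG_FILE_NAME"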

How can I get DAG of Spark Sql Query execution plan?

I am doing some analysis on Spark SQL query execution plans. The execution plans that the explain() API prints are not very readable. In the Spark web UI, a DAG graph is created, divided into jobs, stages and tasks, which is much more readable. Is there any way to create that graph from the execution plans, or via any APIs in the code? If not, are there any APIs that can read that graph from the UI?
As far as I can see, this project (https://github.com/AbsaOSS/spline-spark-agent) is able to interpret the execution plan and generate it in a readable way.
This Spark job reads a file, converts it to a CSV file, and writes it to local disk.
A sample output in JSON looks like:
{
  "id": "3861a1a7-ca31-4fab-b0f5-6dbcb53387ca",
  "operations": {
    "write": {
      "outputSource": "file:/output.csv",
      "append": false,
      "id": 0,
      "childIds": [
        1
      ],
      "params": {
        "path": "output.csv"
      },
      "extra": {
        "name": "InsertIntoHadoopFsRelationCommand",
        "destinationType": "csv"
      }
    },
    "reads": [
      {
        "inputSources": [
          "file:/Users/liajiang/Downloads/spark-onboarding-demo-application/src/main/resources/wikidata.csv"
        ],
        "id": 2,
        "schema": [
          "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
          "62c022d9-c506-4e6e-984a-ee0c48f9df11",
          "26f1d7b5-74a4-459c-87f3-46a3df781400",
          "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
          "2e019926-3adf-4ece-8ea7-0e01befd296b"
        ],
        "params": {
          "inferschema": "true",
          "header": "true"
        },
        "extra": {
          "name": "LogicalRelation",
          "sourceType": "csv"
        }
      }
    ],
    "other": [
      {
        "id": 1,
        "childIds": [
          2
        ],
        "params": {
          "name": "`source`"
        },
        "extra": {
          "name": "SubqueryAlias"
        }
      }
    ]
  },
  "systemInfo": {
    "name": "spark",
    "version": "2.4.2"
  },
  "agentInfo": {
    "name": "spline",
    "version": "0.5.5"
  },
  "extraInfo": {
    "appName": "spark-spline-demo-application",
    "dataTypes": [
      {
        "_typeHint": "dt.Simple",
        "id": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a",
        "name": "timestamp",
        "nullable": true
      },
      {
        "_typeHint": "dt.Simple",
        "id": "dbe1d206-3d87-442c-837d-dfa47c88b9c1",
        "name": "string",
        "nullable": true
      },
      {
        "_typeHint": "dt.Simple",
        "id": "0d786d1e-030b-4997-b005-b4603aa247d7",
        "name": "integer",
        "nullable": true
      }
    ],
    "attributes": [
      {
        "id": "6742cfd4-d8b6-4827-89f2-4b2f7e060c57",
        "name": "date",
        "dataTypeId": "f0dede5e-8fe1-4c22-ab24-98f7f44a9a5a"
      },
      {
        "id": "62c022d9-c506-4e6e-984a-ee0c48f9df11",
        "name": "domain_code",
        "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
      },
      {
        "id": "26f1d7b5-74a4-459c-87f3-46a3df781400",
        "name": "page_title",
        "dataTypeId": "dbe1d206-3d87-442c-837d-dfa47c88b9c1"
      },
      {
        "id": "6e4063cf-4fd0-465d-a0ee-0e5c53bd52b0",
        "name": "count_views",
        "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
      },
      {
        "id": "2e019926-3adf-4ece-8ea7-0e01befd296b",
        "name": "total_response_size",
        "dataTypeId": "0d786d1e-030b-4997-b005-b4603aa247d7"
      }
    ]
  }
}
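For completeness, a hedged sketch of how the agent can be attached to a job using its codeless initialization: the listener class and the spark.sql.queryExecutionListeners property come from the spline-spark-agent README, while the bundle coordinates and job file below are illustrative and must be matched to your Spark/Scala and agent versions.
# Attach the Spline agent at submit time; no code changes in the job itself.
spark-submit \
  --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.11:0.5.5 \
  --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
  your_spark_job.py
# Where the lineage JSON is sent (console, HTTP producer, Kafka, ...) is
# configured separately; see the agent README for the exact properties.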

When sending a CLI command to AWS, getting "null" even though I can see the data exists

When sending the CLI command aws ec2 describe-instances --instance-ids, I am getting all the data, but I specifically need the private IPs, and my query for them returns null, even though I can see them in the output.
The CLI command aws ec2 describe-instances --instance-ids i-0b7xxxxxxxxxxx --query Reservations[] --output json returns the following output:
[
  {
    "Groups": [],
    "Instances": [
      {
        "AmiLaunchIndex": 0,
        "ImageId": "ami-1bxxxxxxx",
        "InstanceId": "i-0b7xxxxxxxxx",
        "InstanceType": "r4.2xlarge",
        "KeyName": "QA-xxx-xxxxxyz",
        "LaunchTime": "2019-05-21T06:40:57.000Z",
        "Monitoring": {
          "State": "disabled"
        },
        "Placement": {
          "AvailabilityZone": "eu-west-1c",
          "GroupName": "",
          "Tenancy": "default"
        },
        "PrivateDnsName": "ip-172-xxx-11-211.eu-west-1.compute.internal",
        "PrivateIpAddress": "172.xxx.11.211",
        "ProductCodes": [],
        "PublicDnsName": "",
        "State": {
          "Code": 16,
          "Name": "running"
        },
        "StateTransitionReason": "",
        "SubnetId": "subnet-3362797a",
        "VpcId": "vpc-02a19a65",
        "Architecture": "x86_64",
        "BlockDeviceMappings": [
          {
            "DeviceName": "/dev/sda1",
            "Ebs": {
              "AttachTime": "2019-04-28T11:19:09.000Z",
              "DeleteOnTermination": true,
              "Status": "attached",
              "VolumeId": "vol-02a052466755e023d"
            }
          }
        ],
        "ClientToken": "qa-sip-sc1-1FBXNRII3WO13",
        "EbsOptimized": false,
        "EnaSupport": true,
        "Hypervisor": "xen",
        "IamInstanceProfile": {
          "Arn": "arn:aws:iam::1xxxxxxx14:instance-profile/qa.tester.SBC-HA",
          "Id": "AIPAI2xxxxxRPSC"
        },
        "NetworkInterfaces": [
          {
            "Attachment": {
              "AttachTime": "2019-04-28T11:19:09.000Z",
              "AttachmentId": "eni-attach-05xxxxxa8",
              "DeleteOnTermination": false,
              "DeviceIndex": 0,
              "Status": "attached"
            },
            "Description": "SC1 interface for HA and cluster maintenance",
            "Groups": [
              {
                "GroupName": "qa-sip-EvgenyZ-qa-Auto-network-clusterSecurityGroup-A4xxxxxxxC8",
                "GroupId": "sg-0a2xxxxxxx2a"
              }
            ],
            "Ipv6Addresses": [],
            "MacAddress": "06:xx:xx:xx:xx:xa",
            "NetworkInterfaceId": "eni-xxxxxxxx",
            "OwnerId": "xxxxxxx",
            "PrivateDnsName": "ip-172-xxx-11-211.eu-west-1.compute.internal",
            "PrivateIpAddress": "172.xxx.11.211",
            "PrivateIpAddresses": [
              {
                "Primary": true,
                "PrivateDnsName": "ip-172-xxx-11-211.eu-west-1.compute.internal",
                "PrivateIpAddress": "172.xxx.11.211"
              },
              {
                "Primary": false,
                "PrivateDnsName": "ip-172-xxx-9-204.eu-west-1.compute.internal",
                "PrivateIpAddress": "172.xxx.9.204"
              }
            ],
            "SourceDestCheck": true,
            "Status": "in-use",
            "SubnetId": "subnet-3xxxxa",
            "VpcId": "vpc-xxxxx5"
          }
I want to get the PrivateIpAddresses 172.xxx.9.204 and 172.xxx.11.211.
For this I am using the following CLI command:
aws ec2 describe-instances --instance-ids i-0b722cc96f7a14bfc --query Reservations[].Instances[].PrivateIpAddress[].PrivateIpAddress --output json
and getting null.
Expecting: 172.xxx.9.204 and 172.xxx.11.211.
In the output of the query with --query=Reservations[] the Instances object is inside a list. So you have to index into the list first.
[*].Instances[*].PrivateIpAddress
This will give you:
[
  [
    "172.xxx.11.211"
  ]
]
Similarly,
[*].Instances[*].NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress
Gives you:
[
  [
    [
      [
        "172.xxx.11.211",
        "172.xxx.9.204"
      ]
    ]
  ]
]
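To get a flat list instead of the nested one, use [] rather than [*]: in JMESPath, [*] projects while keeping each list level, whereas [] additionally flattens one level, so chaining [] at every step collapses the nesting. A hedged illustration (the instance ID is a placeholder):
# [] flattens each projection level, yielding a flat list of addresses.
aws ec2 describe-instances --instance-ids i-0b7xxxxxxxxxxx \
  --query "Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress" \
  --output text
# Expected output: 172.xxx.11.211  172.xxx.9.204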
Side Note: AWS CLI uses the JMESPath query language. You can experiment with your queries here: http://jmespath.org/
For me, the following query worked:
aws ec2 describe-instances --instance-ids <id> --query Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress --output json

Error creating a customContent on a confluence addon

Today I was trying to create a Confluence add-on for my company, following the Atlassian documents.
My problem comes when trying to run the Express app after adding a new customContent module to atlassian-connect.json; after running npm start I get the following error.
Failed to register with host https://admin:xxx@xxx.atlassian.net/wiki (200)
{"type":"INSTALL","pingAfter":300,"status":{"done":true,"statusCode":200,"contentType":"application/vnd.atl.plugins.task.install.err+json","subCode":"upm.pluginInstall.error.descriptor.not.from.marketplace","source":"https://1a0adc8f.ngrok.io/atlassian-connect.json","name":"https://1a0adc8f.ngrok.io/atlassian-connect.json"},"links":{"self":"/wiki/rest/plugins/1.0/pending/b88594d3-c3c2-4760-b687-c8d860c0a377","alternate":"/wiki/rest/plugins/1.0/tasks/b88594d3-c3c2-4760-b687-c8d860c0a377"},"timestamp":1502272147602,"userKey":"xxx","id":"xxx"}
Add-on not registered; no compatible hosts detected
This is my atlassian-connect.json file:
{
  "key": "my-add-on",
  "name": "Ping Pong",
  "description": "My very first add-on",
  "vendor": {
    "name": "Angry Nerds",
    "url": "https://www.atlassian.com/angrynerds"
  },
  "baseUrl": "{{localBaseUrl}}",
  "links": {
    "self": "{{localBaseUrl}}/atlassian-connect.json",
    "homepage": "{{localBaseUrl}}/atlassian-connect.json"
  },
  "authentication": {
    "type": "jwt"
  },
  "lifecycle": {
    "installed": "/installed"
  },
  "scopes": [
    "READ"
  ],
  "modules": {
    "generalPages": [
      {
        "key": "hello-world-page-jira",
        "location": "system.top.navigation.bar",
        "name": {
          "value": "Hello World"
        },
        "url": "/hello-world",
        "conditions": [{
          "condition": "user_is_logged_in"
        }]
      },
      {
        "key": "customersViewer",
        "location": "system.header/left",
        "name": {
          "value": "Hello World"
        },
        "url": "/hello-world",
        "conditions": [{
          "condition": "user_is_logged_in"
        }]
      }
    ],
    "customContent": [
      {
        "key": "customer",
        "name": {
          "value": "Customers"
        },
        "uiSupport": {
          "contentViewComponent": {
            "moduleKey": "customersViewer"
          },
          "listViewComponent": {
            "moduleKey": "customerList"
          },
          "icons": {
            "item": {
              "url": "/images/customers.png"
            }
          }
        },
        "apiSupport": {
          "supportedContainerTypes": ["space"]
        }
      }
    ]
  }
}
Does anybody have an idea of what's going on?
The contentViewComponent can't find the generalPage it is referencing in moduleKey.
From the docs:
In the snippet above, the moduleKey “customersViewer” maps to a
generalPage module we have defined in our add-on. This generalPage is
passed the context parameters we specify, and visualizes our content
accordingly.
If you change the generalPage with the key hello-world-page-confluence to customersViewer, you should be able to install and get up and running.

How to get a Private IP of Edge node using ARM template or Ambari API in Azure

How to get a Private IP of Edge node using ARM template and Ambari API?
I am installing the edge node using the following edge node part of the ARM template. I want to get the private IP of the edge node for my custom application. How can I get it using the ARM template, or from Ambari using the edgenodeName?
{
  "name": "[concat(parameters('clusterName'),'/', parameters('edgenodeName'))]",
  "type": "Microsoft.HDInsight/clusters/applications",
  "apiVersion": "2015-03-01-preview",
  "dependsOn": [
    "[concat('Microsoft.HDInsight/clusters/', parameters('clusterName'))]"
  ],
  "properties": {
    "marketPlaceIdentifier": "EmptyEdgeNode",
    "computeProfile": {
      "roles": [{
        "name": "edgenode",
        "targetInstanceCount": 1,
        "hardwareProfile": {
          "vmSize": "[parameters('edgenodeSize')]"
        }
      }]
    },
    "installScriptActions": [],
    "uninstallScriptActions": [],
    "httpsEndpoints": [],
    "applicationType": "CustomApplication"
  }
}
Update 1:
Here is my JSON representation from resources.azure.com:
{
  "id": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.HDInsight/clusters/$clusterName",
  "name": "$clusterName",
  "type": "Microsoft.HDInsight/clusters",
  "location": "Central US",
  "etag": "33908087-88d4-43e6-bad4-7668bb90fa39",
  "tags": null,
  "properties": {
    "clusterVersion": "3.5.1000.0",
    "osType": "Linux",
    "clusterDefinition": {
      "blueprint": "https://blueprints.azurehdinsight.net/spark-3.5.1000.0.9988582.json",
      "kind": "SPARK",
      "componentVersion": {
        "Spark": "1.6"
      }
    },
    "computeProfile": {
      "roles": [
        {
          "name": "headnode",
          "targetInstanceCount": 2,
          "hardwareProfile": {
            "vmSize": "Standard_D12_V2"
          },
          "osProfile": {
            "linuxOperatingSystemProfile": {
              "username": "$userName"
            }
          },
          "virtualNetworkProfile": {
            "id": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName",
            "subnet": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName/subnets/default"
          }
        },
        {
          "name": "workernode",
          "targetInstanceCount": 1,
          "hardwareProfile": {
            "vmSize": "Standard_D12_V2"
          },
          "osProfile": {
            "linuxOperatingSystemProfile": {
              "username": "$userName"
            }
          },
          "virtualNetworkProfile": {
            "id": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName",
            "subnet": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName/subnets/default"
          }
        },
        {
          "name": "zookeepernode",
          "targetInstanceCount": 3,
          "hardwareProfile": {
            "vmSize": "Medium"
          },
          "osProfile": {
            "linuxOperatingSystemProfile": {
              "username": "$userName"
            }
          },
          "virtualNetworkProfile": {
            "id": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName",
            "subnet": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName/subnets/default"
          }
        },
        {
          "name": "edgenode1",
          "targetInstanceCount": 1,
          "hardwareProfile": {
            "vmSize": "Standard_D3_v2"
          },
          "osProfile": {
            "linuxOperatingSystemProfile": {
              "username": "$userName"
            }
          },
          "virtualNetworkProfile": {
            "id": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName",
            "subnet": "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Network/virtualNetworks/$vnetName/subnets/default"
          }
        }
      ]
    },
    "provisioningState": "Succeeded",
    "clusterState": "Running",
    "createdDate": "2017-04-26T07:44:54.4",
    "quotaInfo": {
      "coresUsed": 16
    },
    "connectivityEndpoints": [
      {
        "name": "SSH",
        "protocol": "TCP",
        "location": "$clusterName-ssh.azurehdinsight.net",
        "port": 22
      },
      {
        "name": "HTTPS",
        "protocol": "TCP",
        "location": "$clusterName.azurehdinsight.net",
        "port": 443
      }
    ],
    "tier": "standard"
  }
}
You could use the Ambari API to get the edge node IP. When your template deployment succeeds, you can use the following script to list the edge node IPs.
#!/bin/bash
PASSWORD=$1
CLUSTERNAME=$2

### list all host private IPs
for HOSTNAME in $(curl -u admin:$PASSWORD -sS -G "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/hosts" | jq -r '.items[].Hosts.host_name')
do
  IP=$(curl -u admin:$PASSWORD -sS -G "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/hosts/$HOSTNAME" | jq -r '.Hosts.ip')
  echo "$HOSTNAME <--> $IP" >> host.txt
done
cat host.txt | grep '^ed' | awk -F\> '{print $2}'
In host.txt, you will get all the host IPs, like this:
ed11-******.gx.internal.cloudapp.net <--> 10.4.0.4
ed20-******.gx.internal.cloudapp.net <--> 10.4.0.8
hn0-******.gx.internal.cloudapp.net <--> 10.4.0.18
hn1-******.gx.internal.cloudapp.net <--> 10.4.0.13
wn0-******.gx.internal.cloudapp.net <--> 10.4.0.7
zk1-******.gx.internal.cloudapp.net <--> 10.4.0.12
zk3-******.gx.internal.cloudapp.net <--> 10.4.0.9
zk5-******.gx.internal.cloudapp.net <--> 10.4.0.10
You could execute the script like below:
[root@shui home]# ./deploy.sh <password> <clustername>
10.4.0.4
10.4.0.8
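A possible simplification, a hedged sketch relying on Ambari's partial-response fields parameter and on the fact that HDInsight edge node hostnames start with "ed" (as the listing above shows), collapses the per-host loop into a single request:
# One request for all hosts, then filter edge nodes and print only IPs.
curl -u admin:$PASSWORD -sS -G "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/hosts?fields=Hosts/ip" \
  | jq -r '.items[].Hosts | select(.host_name | startswith("ed")) | .ip'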

Unable to start a ubuntu container in openshift-origin

I am trying to bring up an Ubuntu container in a pod in OpenShift. I have set up my local Docker registry and configured DNS accordingly. Starting the Ubuntu container with just Docker works fine without any issues. When I deploy the pod, I can see that my Docker Ubuntu image is pulled successfully, but it doesn't succeed in starting. It fails with a back-off pulling image error. Is this because my entry point does not have any background process running inside the container?
"openshift.io/container.ubuntu.image.entrypoint": "[\"top\"]",
Snapshot of the events: (screenshot not included)
Deployment config:
{
  "kind": "DeploymentConfig",
  "apiVersion": "v1",
  "metadata": {
    "name": "ubuntu",
    "namespace": "testproject",
    "selfLink": "/oapi/v1/namespaces/testproject/deploymentconfigs/ubuntu",
    "uid": "e7c7b9c6-4dbd-11e6-bd2b-0800277bbed5",
    "resourceVersion": "4340",
    "generation": 6,
    "creationTimestamp": "2016-07-19T14:34:31Z",
    "labels": {
      "app": "ubuntu"
    },
    "annotations": {
      "openshift.io/deployment.cancelled": "4",
      "openshift.io/generated-by": "OpenShiftNewApp"
    }
  },
  "spec": {
    "strategy": {
      "type": "Rolling",
      "rollingParams": {
        "updatePeriodSeconds": 1,
        "intervalSeconds": 1,
        "timeoutSeconds": 600,
        "maxUnavailable": "25%",
        "maxSurge": "25%"
      },
      "resources": {}
    },
    "triggers": [
      {
        "type": "ConfigChange"
      },
      {
        "type": "ImageChange",
        "imageChangeParams": {
          "automatic": true,
          "containerNames": [
            "ubuntu"
          ],
          "from": {
            "kind": "ImageStreamTag",
            "namespace": "testproject",
            "name": "ubuntu:latest"
          },
          "lastTriggeredImage": "ns1.myregistry.com:5000/ubuntu@sha256:6d9a2a1bacdcb2bd65e36b8f1f557e89abf0f5f987ba68104bcfc76103a08b86"
        }
      }
    ],
    "replicas": 1,
    "test": false,
    "selector": {
      "app": "ubuntu",
      "deploymentconfig": "ubuntu"
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "app": "ubuntu",
          "deploymentconfig": "ubuntu"
        },
        "annotations": {
          "openshift.io/container.ubuntu.image.entrypoint": "[\"top\"]",
          "openshift.io/generated-by": "OpenShiftNewApp"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "ubuntu",
            "image": "ns1.myregistry.com:5000/ubuntu@sha256:6d9a2a1bacdcb2bd65e36b8f1f557e89abf0f5f987ba68104bcfc76103a08b86",
            "resources": {},
            "terminationMessagePath": "/dev/termination-log",
            "imagePullPolicy": "Always"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "securityContext": {}
      }
    }
  },
  "status": {
    "latestVersion": 5,
    "details": {
      "causes": [
        {
          "type": "ConfigChange"
        }
      ]
    },
    "observedGeneration": 5
  }
}
The problem was with the HTTP proxy. After solving that, the image pull was successful.
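For anyone hitting the same symptom: a hedged sketch of the usual remedy, giving the Docker daemon on each node the proxy settings through a systemd drop-in (the proxy host below is a placeholder; the local registry from this question goes in NO_PROXY so pulls from it bypass the proxy):
# Docker daemon pulls don't inherit shell proxy variables; set them
# via a systemd drop-in and restart the daemon.
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=ns1.myregistry.com"
EOF
systemctl daemon-reload
systemctl restart docker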
