I have installed the new version of the spark-monitoring library, which is supposed to support Databricks Runtime 11.0 (see here: spark-monitoring-library). I have successfully attached the init script to my cluster. However, when I run jobs on this cluster, I do not see any logs from the Databricks jobs in Log Analytics. Has anyone had the same problem and managed to resolve it?
I'm using Airflow to schedule a Spark job, and the job uses a conf.properties file.
I want to change this file from the Airflow UI, not through the server CLI.
How can I do that?
The Airflow webserver doesn't support editing files in its UI, but it does let you add your own plugins and customize the UI by adding flask_appbuilder views (here is the doc).
You can also use an unofficial open-source plugin to do that (e.g. airflow_code_editor).
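For illustration, here is a minimal sketch of such a plugin (assuming Airflow 2.x; the class, menu, route, and template names are hypothetical placeholders):

# Minimal sketch of an Airflow plugin exposing a custom Flask AppBuilder view.
# Assumes Airflow 2.x; class, route, and template names are hypothetical.
from airflow.plugins_manager import AirflowPlugin
from flask_appbuilder import BaseView, expose


class ConfEditorView(BaseView):
    # A view where you could render an editor for conf.properties.
    default_view = "edit"

    @expose("/")
    def edit(self):
        # Read the file and pass its content to your own template.
        with open("/path/to/conf.properties") as f:
            content = f.read()
        return self.render_template("conf_editor/edit.html", content=content)


class ConfEditorPlugin(AirflowPlugin):
    # Registers the view so it appears in the Airflow UI menu.
    name = "conf_editor_plugin"
    appbuilder_views = [
        {"name": "Conf Editor", "category": "Admin", "view": ConfEditorView()}
    ]

Dropping a file like this into the plugins folder and restarting the webserver would add a "Conf Editor" entry to the UI menu; the editing logic itself (template and save handler) is up to you.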
I am creating an Azure pipeline for the first time in my life (and my first pipeline of any kind), and there are some basic concepts that I don't understand.
First of all, I have trouble understanding how the installation works: if my .yaml file installs Liquibase, will the Liquibase installation run every time the pipeline is triggered (by pushing to GitHub)?
Also, I don't know how to run Liquibase commands from the agent. I see here that they use the Liquibase bat file; I guess you have to download the zip from the Liquibase website and put it on the agent, but how do you do that?
You can set up Liquibase in a couple of different ways:
You can use the Liquibase Docker image in your Azure pipeline (a sketch of a pipeline step using this approach follows below). You can find more information about using the Liquibase Docker image here: https://docs.liquibase.com/workflows/liquibase-community/using-liquibase-and-docker.html
You can install Liquibase on an Azure agent and ensure that all Liquibase jobs run on that specific agent where Liquibase is installed. Liquibase releases can be downloaded from: https://github.com/liquibase/liquibase/releases
The URL you point to shows that Liquibase commands are invoked from the C:\apps\Liquibase directory.
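As a rough sketch of the Docker option, an Azure Pipelines step could look like the following (the changelog path, JDBC URL, database, and variable names are assumptions you would replace with your own):

# Sketch of an Azure Pipelines step running Liquibase via its Docker image.
# Changelog path, JDBC URL, and variable names are placeholders.
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

steps:
- script: |
    docker run --rm \
      -v "$(Build.SourcesDirectory)/db:/liquibase/changelog" \
      liquibase/liquibase \
      --changelog-file=changelog.xml \
      --url="jdbc:postgresql://myhost:5432/mydb" \
      --username="$(DB_USER)" \
      --password="$(DB_PASSWORD)" \
      update
  displayName: 'Run Liquibase update via Docker'

With this approach nothing is installed on the agent itself; the image is pulled on each run of the pipeline, which also answers the question about the installation running on every trigger.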
I have created an AWS Data Pipeline using the EMR template, but it's not installing Spark on the EMR cluster. Do I need to set any special action for that?
I see that some bootstrap action is needed for the Spark installation, but that is not working either.
That install-spark bootstrap action is only for 3.x AMI versions. If you are using a releaseLabel (emr-4.x or beyond), the applications to install are specified in a different way.
When you are creating a pipeline, click "Edit in Architect" at the bottom, or edit your pipeline from the pipelines home page. You can then click on the EmrCluster node and select Applications from the "Add an optional field..." dropdown. That is where you can add Spark.
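If you would rather not go through the Architect UI, the same field can be set programmatically; below is a rough sketch using boto3's put_pipeline_definition (the pipeline ID, object name, release label, and other values are placeholders, and the rest of your pipeline objects would need to be included as well):

# Sketch: setting releaseLabel and applications on the EmrCluster node with
# boto3 instead of the Architect UI. All IDs and values are placeholders.
import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

emr_cluster = {
    "id": "EmrClusterObj",
    "name": "EmrClusterObj",
    "fields": [
        {"key": "type", "stringValue": "EmrCluster"},
        {"key": "releaseLabel", "stringValue": "emr-5.36.0"},
        # "applications" is the optional field the Architect dropdown sets.
        {"key": "applications", "stringValue": "spark"},
        {"key": "terminateAfter", "stringValue": "2 Hours"},
    ],
}

client.put_pipeline_definition(
    pipelineId="df-0123456789EXAMPLE",
    pipelineObjects=[emr_cluster],  # plus the rest of your pipeline objects
)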
Hope you are doing well.
I am new to Spark as well as Microsoft Azure. As per our project requirements, we have developed a PySpark script through the Jupyter notebook installed on our HDInsight cluster. To date we have run the code from Jupyter itself, but now we need to automate the script. I tried to use Azure Data Factory but could not find a way to run the PySpark script from there. I also tried to use Oozie but could not figure out how to use it.
Could you please help me with how to automate/schedule a PySpark script in Azure?
Thanks,
Shamik.
Azure Data Factory doesn't have first-class support for Spark today. We are working to add that integration in the future. Until then, we have published a sample on GitHub that uses the ADF MapReduce activity to submit a jar that invokes spark-submit.
Please take a look here:
https://github.com/Azure/Azure-DataFactory/tree/master/Samples/Spark
I'm trying to set up backup and restore for a Cassandra node using Priam. I want to upload the snapshot to S3 and restore from there.
I found the Priam setup page (Priam setup), but I didn't understand the steps given there.
I cloned the git repository and ran
./gradlew build
I have already set up the ASGs.
Can someone give me complete, step-by-step instructions on how to install Priam and execute backup and restore?
Hopefully you have solved it already. You had to investigate more on how to deploy wars (for example in a Tomcat server, which is basically just copying a war file into place and starting the server service) and how to create ASGs (Auto Scaling groups) in Amazon Web Services (see http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html).
Basically, Priam runs as a webapp in Tomcat, is configured in a *.yaml file, and helps you manage the Cassandra nodes through a REST interface (see: https://github.com/Netflix/Priam/wiki/REST-API).
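Once the war is deployed, backups and restores are driven through that REST interface. As a rough sketch (the base URL and endpoint path below are assumptions based on the REST-API wiki linked above, so verify them against your Priam version):

# Sketch: triggering a Priam snapshot backup over its REST interface.
# The base URL and endpoint path are assumptions; check the Priam REST-API
# wiki for the exact paths in your version.
import requests

PRIAM_BASE = "http://localhost:8080/Priam/REST/v1"

# Ask Priam to take a snapshot; Priam then uploads it to the configured S3 bucket.
resp = requests.get(f"{PRIAM_BASE}/backup/do_snapshot", timeout=60)
resp.raise_for_status()
print(resp.text)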