How to modify the config file of a Spark job in the Airflow UI?

I'm using Airflow to schedule a Spark job that reads a conf.properties file.
I want to change this file from the Airflow UI rather than over the server CLI.
How can I do that?

The Airflow webserver doesn't support editing files in its UI, but it does let you add your own plugins and customize the UI with flask_appbuilder views (here is the doc).
You can also use an unofficial open-source plugin to do that (e.g. airflow_code_editor).
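As a rough illustration of the plugin route, a custom view could read and write the properties file from the webserver. This is only a sketch under assumptions: the file path, view name, and conf_editor.html template are all made up for the example and are not part of the original answer.

```python
# Sketch of an Airflow plugin exposing a Flask-AppBuilder view that edits a
# properties file from the web UI. CONF_PATH, the view name, and the
# conf_editor.html template are assumptions for illustration only.
from pathlib import Path

from airflow.plugins_manager import AirflowPlugin
from flask import request
from flask_appbuilder import BaseView, expose

CONF_PATH = Path("/opt/airflow/jobs/conf.properties")  # assumed location

class ConfEditorView(BaseView):
    default_view = "edit"

    @expose("/", methods=["GET", "POST"])
    def edit(self):
        if request.method == "POST":
            # Overwrite the file with the submitted form contents
            CONF_PATH.write_text(request.form["content"])
        content = CONF_PATH.read_text()
        # conf_editor.html would be a simple textarea form shipped with the plugin
        return self.render_template("conf_editor.html", content=content)

class ConfEditorPlugin(AirflowPlugin):
    name = "conf_editor_plugin"
    appbuilder_views = [
        {"name": "Conf Editor", "category": "Admin", "view": ConfEditorView()}
    ]
```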

Related

Import a CSV file using Databricks CLI in Repos

We are using Databricks to generate ETL scripts. One step requires us to upload small CSVs into a Repos folder. I can do this manually using the import window in the Repos GUI, but I would like to do it programmatically using the Databricks CLI. Is this possible? I have tried using the Workspace API, but that only works for source code files.
Unfortunately it's not possible right now, because there is no API for it that the databricks-cli could use. But you can add and commit the files to the Git repository backing the Repo, and then use databricks repos update to pull them into the workspace.
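A minimal sketch of that workflow, driven from Python; the repo ID, branch, and paths are placeholders, and the exact CLI flags should be checked against your databricks-cli version:

```python
# Commit a CSV to the repo's Git remote, then ask Databricks to refresh the
# Repo so the file appears in the workspace. All identifiers are placeholders.
import subprocess

def push_csv_and_refresh_repo(csv_path: str, repo_id: str, branch: str = "main") -> None:
    subprocess.run(["git", "add", csv_path], check=True)
    subprocess.run(["git", "commit", "-m", f"Add {csv_path}"], check=True)
    subprocess.run(["git", "push", "origin", branch], check=True)
    # Pull the new commit into the Databricks workspace
    subprocess.run(
        ["databricks", "repos", "update", "--repo-id", repo_id, "--branch", branch],
        check=True,
    )
```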

How to use Airflow-API-Plugin?

I want to list and trigger DAGs using this GitHub repo: https://github.com/airflow-plugins/airflow_api_plugin. How and where should I place this plugin in my Airflow folder so that I can call the endpoints?
Is there anything that I need to change in the airflow.cfg file?
The repository you listed has not been updated in a while. Why not just use the experimental REST APIs included in Airflow? You can find them here: https://airflow.apache.org/docs/stable/api.html
Use:
GET /api/experimental/dags/<DAG_ID>/dag_runs
to get a list of DAG runs, and
POST /api/experimental/dags/<DAG_ID>/dag_runs
to trigger a new DAG run.
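For example, something like the following should work against the experimental API; the base URL and DAG id below are placeholders for this sketch:

```python
# List and trigger DAG runs via Airflow's experimental REST API.
import requests

AIRFLOW_URL = "http://localhost:8080"  # placeholder webserver address
DAG_ID = "example_dag"                 # placeholder DAG id

# List existing runs for the DAG
runs = requests.get(f"{AIRFLOW_URL}/api/experimental/dags/{DAG_ID}/dag_runs")
print(runs.json())

# Trigger a new run, optionally passing a conf payload to the DAG
resp = requests.post(
    f"{AIRFLOW_URL}/api/experimental/dags/{DAG_ID}/dag_runs",
    json={"conf": {"key": "value"}},
)
print(resp.json())
```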

AWS DataPipeline EMR cluster with spark

I have created an AWS Data Pipeline using the EMR template, but it's not installing Spark on the EMR cluster. Do I need to set any special action for that?
I see that a bootstrap action is needed for Spark installation, but that is also not working.
The install-spark bootstrap action is only for 3.x AMI versions. If you are using a releaseLabel (emr-4.x or beyond), the applications to install are specified in a different way.
When creating a pipeline, click "Edit in Architect" at the bottom, or edit your pipeline from the pipelines home page. Then click on the EmrCluster node and select Applications from the "Add an optional field..." dropdown. That is where you can add Spark.
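For orientation, the EmrCluster object in the exported pipeline definition then carries the release label and the application list. A rough sketch, shown here as a Python dict; the release label and instance types are placeholders:

```python
# Rough shape of the EmrCluster object once Spark is selected in Architect;
# the release label and instance types here are placeholders.
emr_cluster = {
    "id": "MyEmrCluster",
    "type": "EmrCluster",
    "releaseLabel": "emr-5.30.0",   # 4.x+ clusters use releaseLabel, not amiVersion
    "applications": ["spark"],      # what "Applications" in Architect adds
    "masterInstanceType": "m5.xlarge",
    "coreInstanceType": "m5.xlarge",
    "coreInstanceCount": "2",
}
```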

How to Automate Pyspark script in Microsoft Azure

Hope you are doing well.
I am new to Spark as well as Microsoft Azure. As per our project requirements, we have developed a PySpark script through the Jupyter notebook installed on our HDInsight cluster. So far we have run the code from Jupyter itself, but now we need to automate the script. I tried to use Azure Data Factory but could not find a way to run the PySpark script from there. I also tried Oozie but could not figure out how to use it.
Could you please help me automate/schedule a PySpark script in Azure?
Thanks,
Shamik.
Azure Data Factory today doesn't have first-class support for Spark. We are working to add that integration in the future. Until then, we have published a sample on GitHub that uses the ADF MapReduce Activity to submit a jar that invokes spark-submit.
Please take a look here:
https://github.com/Azure/Azure-DataFactory/tree/master/Samples/Spark
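For a very rough idea of what that looks like, the activity in such a pipeline resembles the dict below. Every name, path, and class here is a placeholder, so follow the linked sample for the actual schema and the launcher jar it ships:

```python
# Very rough sketch of an HDInsight MapReduce activity used to kick off
# spark-submit; all identifiers are placeholders, not the sample's real names.
activity = {
    "name": "SubmitPySparkViaMapReduce",
    "type": "HDInsightMapReduce",
    "linkedServiceName": "HDInsightLinkedService",
    "typeProperties": {
        "className": "com.example.SparkLauncher",             # placeholder launcher class
        "jarFilePath": "mycontainer/jars/spark-launcher.jar",  # placeholder jar path
        "jarLinkedService": "AzureStorageLinkedService",
        "arguments": ["wasb:///scripts/my_pyspark_job.py"],    # script handed to spark-submit
    },
}
```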

How to place an Email-Ext Groovy script on the Jenkins file system

I need to dynamically modify the notification e-mail recipients based on the build, so I'm using a Groovy script. I want this script to be available to all jobs, so I want it to reside on the Jenkins file system and not within each project. It can be either in the recipients fields (using ${SCRIPT,...}) or in the pre-send script. A short (fixed) script that evaluates the master script is also good, as long as it is the same for all projects.
You should try the Config File Provider plugin. It works together with the Credentials configuration of Jenkins.
Go to Manage Jenkins > Managed files
Click Add a new Config
Select Groovy file
Enter the contents of your script
Your script will now be saved centrally on Jenkins, available to master/slave nodes.
In your Job Configuration:
Under Build Environment
Check Provide Configuration file
Select your configured Groovy file from the dropdown
Set Variable with a name, for example: myscript
Now, anywhere in your job, you can use ${myscript} and it will refer to the absolute location of the file on the filesystem (it will be somewhere in the Jenkins directory).
My impression is that you would probably want to switch completely to Jenkins Pipelines, where the entire job is a Groovy file (Jenkinsfile) in the root of the repository.
Email-Ext already supports this, even if it may be lacking some documentation.
https://jenkins.io/doc/pipeline/steps/email-ext/
