We are using Databricks dbx in the following way:
dbx execute for development in the IDE.
Upload the resulting package as a Python wheel to a GCS bucket using dbx deploy <workflow> --assets-only. We don't create a permanent job in Databricks Workflows.
Execute the Python wheel on a Databricks job cluster through the Airflow DatabricksSubmitRunOperator.
I have two questions related to the artifact location. This location is specified in the .dbx/project.json file.
Q1: Each time dbx deploy runs, a new version of the wheel is uploaded to the GCS bucket. Is it possible to skip the versioning (our code is already versioned) and simply overwrite the wheel at the same location each time? The multiple versions make it difficult to pass the wheel's file path to our Airflow DatabricksSubmitRunOperator (a rough sketch of our operator call is below).
Q2: We have separate GCS buckets for dev, test, and prod. The artifact_location is hard-coded in the JSON file. Is there a way to parameterize it per environment? Or what is the recommended pattern in a CI/CD pipeline: deploy the wheel to the DEV bucket using dbx deploy and then copy that wheel to TEST and PROD?
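For context, a rough sketch of how we currently call the operator from our DAG; the bucket name, wheel filename, cluster spec, package name, and entry point below are placeholders rather than our real values:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Placeholder values throughout: bucket, wheel filename, cluster spec,
# package name, and entry point are illustrative only.
ENV = "dev"  # ideally this would pick the dev/test/prod bucket
WHEEL_PATH = f"gs://my-artifacts-{ENV}/my_project/my_project-0.1.0-py3-none-any.whl"

with DAG("my_project_etl", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    run_wheel = DatabricksSubmitRunOperator(
        task_id="run_etl_wheel",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "my_project_etl",
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "node_type_id": "n2-standard-4",
                "num_workers": 2,
            },
            # The wheel path below is the piece that changes with every
            # dbx deploy and is awkward to keep in sync.
            "libraries": [{"whl": WHEEL_PATH}],
            "python_wheel_task": {
                "package_name": "my_project",
                "entry_point": "main",
            },
        },
    )
```

The libraries path is exactly the moving part that Q1 and Q2 are about.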
We are using Databricks to generate ETL scripts. One step requires us to upload small CSVs into a Repos folder. I can do this manually using the import window in the Repos GUI. However, I would like to do this programmatically using the databricks CLI. Is this possible? I have tried using the Workspace API, but this only works for source code files.
Unfortunately it's not possible right now, because there is no API for that which databricks-cli could use. But you can add and commit the files to the Git repository, and then use databricks repos update to pull them into the workspace (a rough sketch is below).
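A rough sketch of that workaround driven from Python, assuming you have a local clone of the repo and it is already checked out in the workspace under a /Repos path (all paths, the branch, and the commit message are placeholders):

```python
import subprocess

# Placeholders: adjust the local clone path, branch, and /Repos path.
LOCAL_CLONE = "/path/to/local/clone"
BRANCH = "main"
REPOS_PATH = "/Repos/someone@example.com/my-etl-repo"

# 1. Commit and push the CSV files from the local clone.
subprocess.run(["git", "add", "data"], cwd=LOCAL_CLONE, check=True)
subprocess.run(["git", "commit", "-m", "Add ETL input CSVs"], cwd=LOCAL_CLONE, check=True)
subprocess.run(["git", "push", "origin", BRANCH], cwd=LOCAL_CLONE, check=True)

# 2. Pull the new commit into the workspace repo via the databricks CLI.
subprocess.run(
    ["databricks", "repos", "update", "--path", REPOS_PATH, "--branch", BRANCH],
    check=True,
)
```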
I am creating an Azure pipeline for the first time in my life (and my first pipeline of any kind), and there are some basic concepts that I don't understand.
First of all, I have trouble understanding how the installation works: if my .yaml file installs Liquibase, will the Liquibase installation run every time the pipeline is triggered (by pushing to GitHub)?
Also, I don't know how to run Liquibase commands from the agent. I see here that they use the Liquibase .bat file; I guess you have to download the zip from the Liquibase website and put it on the agent, but how do you do that?
You can set up Liquibase in a couple of different ways:
You can use the Liquibase Docker image in your Azure pipeline (see the sketch after this answer). You can find more information about using the Liquibase Docker image here: https://docs.liquibase.com/workflows/liquibase-community/using-liquibase-and-docker.html
You can install Liquibase on an Azure agent and ensure that all Liquibase jobs run on that specific agent. Liquibase releases can be downloaded from: https://github.com/liquibase/liquibase/releases
The URL you point to shows that Liquibase commands are invoked from the C:\apps\Liquibase directory.
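For the first option, a rough sketch of what the Docker invocation boils down to, here driven from a small Python script the pipeline could run; the changelog location, JDBC URL, credentials, and flag spelling are assumptions you should adapt to your Liquibase version and database:

```python
import subprocess

# Placeholders: host directory containing the changelog, and the JDBC URL.
CHANGELOG_DIR = "/agent/work/changelog"
JDBC_URL = "jdbc:postgresql://dbhost:5432/appdb"

# Run Liquibase via its Docker image, so nothing has to be pre-installed
# on the agent beyond Docker itself.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{CHANGELOG_DIR}:/liquibase/changelog",
        "liquibase/liquibase",
        "--changelog-file=changelog/db.changelog-master.xml",
        f"--url={JDBC_URL}",
        "--username=app_user",
        "--password=app_password",
        "update",
    ],
    check=True,
)
```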
I am using self-managed GitLab to manage many Java applications. I also use the GitLab package registry to store the artifacts (JAR files), with AWS S3 as the storage backend. My company wants to set up a plan for GitLab backups. I have reviewed the GitLab documentation (https://docs.gitlab.com/ee/raketasks/backup_restore.html), but I don't see any mention of how to back up the packages in the package registry.
I also don't know whether, when I restore GitLab onto a new instance, the new package registry will recognize my packages in S3.
If anyone has experience with this, please advise. Thanks a lot!
Since you are storing your artifacts on S3, I believe they should just be available when you restore from backup. The new instance would still be pointing at the same S3 bucket. You should make sure the S3 retention policies are appropriate for your backup needs.
If you are storing your packages on the local filesystem, the GitLab backup process doesn't currently include those files, though it does include the package metadata. In that case, you'll need to manually copy the packages directory at /var/opt/gitlab/gitlab-rails/shared/packages/ to the new server after restoring the metadata using the normal backup/restore process (a rough sketch of archiving that directory follows this answer).
There is an open ticket for this in the GitLab issue tracker, which is where I found the above workaround.
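If you do end up needing that manual copy, a rough sketch of archiving the packages directory so it can be shipped to the new server (the path is the Omnibus default mentioned above and may differ on your install):

```python
import tarfile
from datetime import datetime, timezone

# Default Omnibus path from the answer above; adjust if your install differs.
PACKAGES_DIR = "/var/opt/gitlab/gitlab-rails/shared/packages/"

# Timestamped archive that can be copied to the new server and extracted
# to the same path after the normal backup/restore of the metadata.
archive_name = f"gitlab-packages-{datetime.now(timezone.utc):%Y%m%d%H%M%S}.tar.gz"
with tarfile.open(archive_name, "w:gz") as tar:
    tar.add(PACKAGES_DIR, arcname="packages")

print(f"Wrote {archive_name}")
```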
I'm very confused about how one of the build tasks currently works.
I have been using Grunt locally in VS Code to minify a JS file, and all seems to be working well. In Azure DevOps, using the same package.json as a build task, the minification takes place, but on the agent VM:
D:\a\1\s\Build\Hello.js
Looking in my repo, this file does not exist. I am assuming that I need to copy the file and upload it to my own repo. Does anyone know how to do this?
A build usually creates a build **artifact** that gets copied to a drop location. You will use the build artifacts inside your release definitions to deploy the binaries / minified or optimized code to an environment.
You probably don't want/need to upload any file back to your repo.
See: What is Azure Pipelines
Is it possible to set up continuous delivery for a simple HTML page in under 1 hour?
Suppose I have a hello-world index.html page hosted by npm serve, a Dockerfile to build the image, and an image.sh script that runs docker build. This is in a GitHub repo.
I want to be able to check in a change to the index.html file and see it on my website immediately.
Can this be done in under 1 hour, on either AWS or Google Cloud? What are the steps?
To answer your question: 1 hour. Is it possible? Yes.
Using only AWS,
Services to be used:
AWS CodePipeline - triggered by GitHub webhooks, it sends the source files to AWS CodeBuild
AWS CodeBuild - takes the source files from CodePipeline, builds your application, and deploys the build to S3, Heroku, Elastic Beanstalk, or any alternative service you desire
The Steps
Create an AWS CodePipeline
Attach your source (GitHub) to your pipeline (each commit will trigger your pipeline, which takes the new commit as the source and builds it in CodeBuild)
Using your custom Docker build environment, CodeBuild reads a buildspec .yml file that specifies the steps of your build process. Use it to build the newly committed source files and deploy your app(s) using the AWS CLI; a rough sketch of the deploy step is below.
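For the deploy step, a rough sketch of what pushing the built page to an S3 static-website bucket could look like with boto3 (the bucket name and file list are placeholders; aws s3 sync from the buildspec would work just as well):

```python
import mimetypes

import boto3

# Placeholder bucket assumed to be configured for static website hosting.
BUCKET = "my-hello-world-site"
FILES = ["index.html"]

s3 = boto3.client("s3")
for path in FILES:
    content_type, _ = mimetypes.guess_type(path)
    # Upload with the right Content-Type so browsers render the page
    # instead of downloading it.
    s3.upload_file(
        path,
        BUCKET,
        path,
        ExtraArgs={"ContentType": content_type or "application/octet-stream"},
    )
print(f"Deployed {len(FILES)} file(s) to s3://{BUCKET}")
```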
Good Luck.
I think I would start by creating a web-enabled script that acts as a GitHub commit hook, probably in Node on an AWS instance, which would then trigger the whole process of tearing down (deleting) the old AWS instance and provisioning a new one with the contents of your repository.
The exact method will be largely dependent on how your whole stack is set up.