How to change the blob container for one specific repo in Databricks - Azure

I have a specific environment cluster in Databricks. In it there are many different repos used by other people. Each repo loads data from one specific blob container, "x", created for this environment. The question is: how can I point one of these repos at another, already created, blob container named "y"?
This is how it works now - blob "x" used by:
repo 1
repo 2
repo 3
How I want it to work - blob "x" used by:
repo 1
repo 3
blob "y" used by:
repo 2

I reproduced the following procedure in my environment with two repos:
blob x -> blob_container
blob y -> blob_container2
repo 1
repo 2
I read the file from blob_container and, in repo 2, loaded it into blob_container2, as sketched below.
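Below is a minimal PySpark sketch of that read-from-x, write-to-y step as it might look inside repo 2. The storage account name, secret scope/key names and file paths are placeholders assumed for illustration; in practice the account key would come from a Databricks secret scope rather than being hard-coded.

# Placeholders: replace the storage account, secret scope/key and paths with your own.
storage_account = "mystorageaccount"
account_key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")

# Let Spark authenticate against the storage account.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    account_key,
)

# Repos 1 and 3 keep reading from container "blob_container" (blob "x").
df = spark.read.csv(
    f"wasbs://blob_container@{storage_account}.blob.core.windows.net/input/data.csv",
    header=True,
)

# Repo 2 writes its output to container "blob_container2" (blob "y") instead.
df.write.mode("overwrite").parquet(
    f"wasbs://blob_container2@{storage_account}.blob.core.windows.net/output/data"
)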

Related

How to use existing GitHub YAML files for Azure DevOps pipelines

I want to create Azure DevOps pipelines, but instead of writing new YAML files, use prepared ones that are already in a GitHub repository.
I have connected GitHub to my Azure DevOps account, but I can't see an option to use the YAML files in that repository.
I only have an option to create a new pipeline YAML and then place it in the repo folder structure.
If I try to set it to the location of the YAML file I want to use, which is already in the repo, I get - of course - an error stating there's already a file there.
My workaround is to create a new YAML file with a different name, copy the content from the existing file, then delete that one and rename the new file to the name of the file I copied from.
Surely there must be a better, easier, more logical and short way.
I would appreciate any help.
Under project settings you should link your GitHub account.
Then you can create a new pipeline and select the GitHub location.
After this step, your available GitHub repositories will appear and you can select your existing .yml file.
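If the UI route is awkward, an alternative is to create the pipeline definition by scripting against the Azure DevOps "Pipelines - Create" REST endpoint and pointing it at the existing YAML file. This is only a rough sketch: the organisation, project, repository, service-connection id and file path are placeholders, and the exact request-body fields should be verified against the REST API reference for your api-version.

import requests

# Placeholders - substitute your own values.
org, project = "my-org", "my-project"
pat = "<personal-access-token>"
github_connection_id = "<github-service-connection-guid>"

url = f"https://dev.azure.com/{org}/{project}/_apis/pipelines?api-version=7.1-preview.1"
body = {
    "name": "pipeline-from-existing-yaml",
    "configuration": {
        "type": "yaml",
        "path": "/azure-pipelines.yml",  # placeholder: the YAML file already in the repo
        "repository": {
            "fullName": "my-github-user/my-repo",  # placeholder GitHub repo
            "type": "gitHub",
            "connection": {"id": github_connection_id},
        },
    },
}

# Azure DevOps accepts a PAT via basic auth with an empty username.
resp = requests.post(url, json=body, auth=("", pat))
resp.raise_for_status()
print(resp.json()["id"])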

Azure DevOps Pipelines - Git repository local directory location

I have created a pipeline in Azure DevOps and have associated a Git repository.
It is cloned to my agent, but I can't control which local directory the repository is cloned into. I am working with a self-hosted agent.
The next task needs to use a specific file in the repository to complete its work.
The last thing that should happen in the pipeline is to push back the changes made in the repository.
I think what you want is System.DefaultWorkingDirectory, the local path on the agent where your source code files are downloaded. For example: c:\agent_work\1\s
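As a small illustration, a script step can locate a file relative to that directory at runtime, because pipeline variables are exposed to scripts as environment variables (System.DefaultWorkingDirectory becomes SYSTEM_DEFAULTWORKINGDIRECTORY). The file name config/settings.json below is just a placeholder for whatever file the next task needs.

import os
from pathlib import Path

# System.DefaultWorkingDirectory is exposed to script steps as the
# SYSTEM_DEFAULTWORKINGDIRECTORY environment variable.
repo_root = Path(os.environ.get("SYSTEM_DEFAULTWORKINGDIRECTORY", "."))

# Placeholder path - point this at the file your next task actually needs.
target_file = repo_root / "config" / "settings.json"
print(f"Using file: {target_file}")
print(target_file.read_text())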

Problem deploying a Bitbucket repo to Azure Storage

I've been trying to deploy my web app on Bitbucket to Azure Storage using Bitbucket Pipelines. I'm having issues with the SOURCE option: I need to copy the entire source code of the current repository, but the SOURCE option seems to require a directory name.
My pipeline script is something like this:
- pipe: microsoft/azure-storage-deploy:2.0.0
  variables:
    SOURCE: './*'
    DESTINATION: 'https://mystorageaccount.blob.core.windows.net/mycontainer'
How can I deploy everything in current repository?
The problem is fixed now.
There is a predefined variable, $BITBUCKET_CLONE_DIR, which holds the directory the current repository is cloned into, so you can set SOURCE: '$BITBUCKET_CLONE_DIR'.
You can find more predefined variables here: https://confluence.atlassian.com/bitbucket/variables-in-pipelines-794502608.html

Docker issue with python script in Azure logic app not connecting to current Azure blob storage

Every day, an Excel file is automatically uploaded to my Azure blob storage account. I have a Python script that reads the Excel file, extracts the necessary information, and saves the output as a new blob in the Azure storage account. I set up a Docker container that runs this Python script. It works correctly when run locally.
I pushed the Docker image to the Azure container registry and tried to set up an Azure logic app that starts a container with this Docker image every day at the same time. It runs; however, it does not seem to see the current contents of my Azure storage account.
For example, I pushed an updated version of the Docker image last night. A new Excel file was added to the Azure storage account this morning and the logic app ran one hour later. The container with the Docker image, however, only found the files that were present in Azure storage account yesterday (so it was missing the most recent file, which is the one I needed analyzed).
I confirmed that the issue is not with the logic app as I added a step in the logic app to list the files in the Azure storage account, and this list included the most recent file.
UPDATE: I have confirmed that I am accessing the correct version of the environment variables. The issue remains: the Docker container seems to access Azure blob storage as it was at the time I most recently pushed the Docker image to the container registry. My current work around is to push the same image to the registry everyday, but this is annoying.
ANOTHER UPDATE: Here is the code to get the most recent blob (an Excel file). The date is always contained in the name of the blob. In theory, it finds the blob with the most recent date:
import os
import re
from datetime import datetime
from io import StringIO

import pandas as pd

# blob_service (a BlockBlobService for the storage account) and backup_csv
# (the name of an existing backup blob) are defined earlier in the script.
blobs = blob_service.list_blobs(container_name=os.environ.get("CONTAINERNAME"))
blobstring = blob_service.get_blob_to_text(os.environ.get("CONTAINERNAME"),
                                           backup_csv).content
current_df = pd.read_csv(StringIO(blobstring))

date_list = []
blob_pattern = re.compile("sales.xls")
for b in blobs:
    if blob_pattern.search(b.name):
        # the date is embedded in the blob name at a fixed position
        dt = datetime.strptime(b.name[14:24], "%Y-%m-%d").date()
        date_list.append(dt)

today = max(date_list)
print(today)
However, the blobs don't seem to update. It returns the most recent blob as of the date that I last pushed the image to the registry.
I also checked print(date.today()) in the same script and this works as expected (it prints the current date).
Figured out that I just needed to take all of the variables in my .env file and add them, with appropriate values, as environment variables in the container group's environment configuration. This https://learn.microsoft.com/en-us/azure/container-instances/container-instances-environment-variables was a helpful resource.
ALSO, the container group needs to be deleted as the last action in the logic app. I had named the wrong container group, so when the logic app ran each day it reused the cached version of the container.
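A small sketch of the pattern this implies, assuming python-dotenv is used for local runs: the script reads its settings from environment variables at runtime, loading the .env file only when it exists locally, so the container picks up whatever values are set on the container group rather than values baked into the image. Apart from CONTAINERNAME, the variable names below are placeholders.

import os

from dotenv import load_dotenv  # python-dotenv; only matters for local runs

# Locally this loads values from .env; inside the container group there is no
# .env file and the values come from the container group's environment variables.
load_dotenv()

container_name = os.environ["CONTAINERNAME"]
account_name = os.environ.get("STORAGE_ACCOUNT_NAME")  # placeholder variable name
account_key = os.environ.get("STORAGE_ACCOUNT_KEY")    # placeholder variable name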

Azure Blob storage azCopy replace container contents

Let's say I have container one with 3 files, and container two with those same 3 files plus 1 extra, for a total of 4.
Is it possible to do a copy/replace with AzCopy so that container two ends up containing only those 3 files?
AzCopy currently doesn't support deleting blobs; it focuses on transferring blobs from source to destination.
You could write your own code to compare the two containers and do the cleanup, along the lines of the sketch below.
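A minimal sketch of that compare-and-clean-up approach using the azure-storage-blob (v12) SDK; the connection string and container names are placeholders.

from azure.storage.blob import BlobServiceClient

conn_str = "<storage-account-connection-string>"  # placeholder
service = BlobServiceClient.from_connection_string(conn_str)

src = service.get_container_client("container-one")  # source: 3 files
dst = service.get_container_client("container-two")  # destination: 4 files

src_names = {b.name for b in src.list_blobs()}

# Delete any destination blob that does not exist in the source.
for blob in dst.list_blobs():
    if blob.name not in src_names:
        dst.delete_blob(blob.name)
        print(f"Deleted {blob.name} from container two")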
