MarkLogic - Forest data folder & Azure Blob - azure

Technical Stack
MarkLogic 9.0
CentOS Linux
Azure Blob
Blobfuse
So that we do not have to worry about data disk size for the MarkLogic forest, we mounted an Azure Blob container on a folder on the Linux machine.
There are a few things I noticed:
Need to create a folder in Linux
Create a folder and point it to the above folder
Then configure Blobfuse, otherwise we get "permission denied" while creating the forest
Use the command below to give permission to all users
chmod 777 -R
Now, when we started importing using MarkLogic Content Pump (MLCP), we got this error:
19/03/15 17:01:19 ERROR mapreduce.ContentWriter: SVC-FILSTAT: File status error: stat64 '/mnt/mycontainer/Forests/forest-01/000043e5': Permission denied
So, if you look at the image below:
First we tried with mycontainer, but as soon as we mapped it to Azure Blob, it no longer showed in green the way azureblob does. We still need to map azureblob to the "azureblob" folder.
It seems I am missing something here. Does it have anything to do with Azure Blob security settings?

In my test, when you mount the Azure Blob container on Linux (for example Ubuntu 18.04, which I'm using) and want to allow other users to use the mount directory, you can add the parameter -o allow_other when you run the blobfuse command.
To allow access to all users, you can mount via the option -o allow_other.
Also, I think you should give other users permission on the directory with the chown command. For more details, see How to mount Blob storage as a file system with blobfuse.
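A minimal sketch of the mount described above, assuming blobfuse v1 syntax as in the Microsoft guide. The mount point /mnt/mycontainer comes from the question; the temp path, the config file path and the helper name build_blobfuse_cmd are assumptions for illustration only. The function just prints the command, so it can be reviewed before being run with sudo.

```shell
#!/bin/sh
# Hypothetical helper: prints a blobfuse (v1) mount command with allow_other.
# Nothing is mounted here; review the output, then run it with sudo.
build_blobfuse_cmd() {
    mount_dir="$1"   # e.g. /mnt/mycontainer (from the question)
    tmp_path="$2"    # assumed local cache directory for blobfuse
    cfg_file="$3"    # assumed path to the blobfuse connection config file
    printf 'blobfuse %s --tmp-path=%s --config-file=%s -o allow_other\n' \
        "$mount_dir" "$tmp_path" "$cfg_file"
}

# Example with assumed paths
build_blobfuse_cmd /mnt/mycontainer /mnt/resource/blobfusetmp /etc/fuse_connection.cfg
```

If the mount is run by a non-root user, FUSE also expects user_allow_other to be enabled in /etc/fuse.conf before allow_other is accepted.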

First, I would like to thank Charles for his efforts and extended help on this issue. Thanks, Charles :). I am sure this will help me sometime, somewhere.
I found a link on how to set up MarkLogic on Azure.
Page 27 gives the steps for configuring MarkLogic for Azure Blob Storage.
In summary:
Create Storage account in Azure
Create Blob container
Go to MarkLogic server (http://localhost:8001)
Go to Security -> Credentials
Provide Storage account and Azure storage key
When creating the MarkLogic forest, specify the container path as the data directory
azure://mycontainer/mydirectory/myfile
And you are done. No Blobfuse, no drive mount, just a configuration in MarkLogic
Awesome!!
It's working like a dream :)

Related

Yaml configuration to mount Azure Blob Container share

How do I configure an Azure Blob Storage container in a YAML file?
- name: scripts-file-share
  azureFile:
    secretName: dev-blobstorage-secret
    shareName: logs
    readOnly: false
The above configures the logs file share in YAML.
But what if I need to mount a blob container? How do I configure it?
Do I need to use azureBlob instead of azureFile?
And what configuration do I need under azureBlob? Please help.
After the responses I got to the above post, and after going through articles online, I see there is no option to mount Azure Blob on AKS for my problem, given the limitations of my environment, other than using azcopy or REST API integration.
So, after a little research and with the help of the articles referenced below, I was able to create a Docker image.
1.) Created the Docker image following the reference article. I also needed support for running a bash script, since I run the azcopy command from a bash file, so I tried copying the azcopy tool to /usr/bin.
2.) Created SAS tokens for Azure File Share & Azure Blob. (Make sure you give required access permissions only)
3.) Created a bash file that runs the below command.
azcopy copy <FileShareSASTokenConnectionUrl> <BlobSASTokenConnectionUrl> --recursive=true
4.) Created a deployment YAML that runs on AKS and added the command to run the bash file to it.
This gave me the ability to copy files from Azure File Share folders to an Azure Blob container.
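The bash file from step 3 could look roughly like the sketch below, assuming AzCopy v10 (the version in the first reference). The function name build_share_to_blob_cmd and the example URLs are hypothetical; the function only prints the command so the quoting can be checked before it is wired into the container.

```shell
#!/bin/sh
# Hypothetical wrapper for step 3: prints the AzCopy v10 command that copies
# an Azure File Share to an Azure Blob container. Both arguments are SAS URLs.
build_share_to_blob_cmd() {
    src_url="$1"   # Azure File Share URL including its SAS token
    dst_url="$2"   # Azure Blob container URL including its SAS token
    if [ -z "$src_url" ] || [ -z "$dst_url" ]; then
        echo "usage: build_share_to_blob_cmd <share-sas-url> <blob-sas-url>" >&2
        return 1
    fi
    # Single-quote both URLs: SAS tokens contain '&', which the shell would
    # otherwise treat as "run in background".
    printf "azcopy copy '%s' '%s' --recursive=true\n" "$src_url" "$dst_url"
}

# Example with placeholder URLs (not real accounts or tokens)
build_share_to_blob_cmd \
    'https://<account>.file.core.windows.net/<share>?<sas-token>' \
    'https://<account>.blob.core.windows.net/<container>?<sas-token>'
```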
References:
1.) https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#obtain-a-static-download-link
2.) https://github.com/Azure/azure-storage-azcopy/issues/423

Unable to mount file share on Linux VM

I have two Linux Azure VMs (Red Hat 7.4) that need to share a common location for processing files. The VMs are located in Australia East.
I also have a Storage Account that's in Australia East and have created a file share in the Storage Account. I generated the commands to connect the VMs to the file share (by clicking on the file share, then choosing "Connect"), but I get this error when I run the final generated command in the VM:
sudo mount -t cifs //<storageaccount>/<fileshare> /mnt/<storageaccount>/<fileshare> -o vers=3.0,credentials=/etc/smbcredentials/<storageaccount>.cred,dir_mode=0777,file_mode=0777,serverino
... I get this message:
mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)
I've run the file diagnostic tool script (https://gallery.technet.microsoft.com/Troubleshooting-tool-for-02184089) in the VM and got this error:
Error: Client is not Azure VM in the region as Storage account, mount will fail
I'm confused as it seems to be saying that the VMs and Storage Account are in different locations, when they aren't.
Thanks in advance for any assistance.
I can reproduce the same issue on Red Hat 7.4. To fix it, make sure that the "Secure transfer required" setting is disabled on the storage account (see the linked page for more info). Alternatively, verify that you entered the correct value for each parameter in the commands. You can refer to these steps to mount the Azure file share:
sudo mount -t cifs $smbPath $mntPath -o vers=3.0,username=$storageAccountName,password=$storageAccountKey,serverino
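As a rough sketch of how the variables in the command above might be filled in: the //<account>.file.core.windows.net/<share> path format matches the mount commands quoted elsewhere on this page, while the helper name build_cifs_mount_cmd, the extra fileShareName variable and the /mnt/... mount point layout are assumptions. The function prints the command instead of running it, so a typo in any parameter is easy to spot.

```shell
#!/bin/sh
# Hypothetical helper: derives smbPath and mntPath from the account and share
# names and prints the full mount command without executing it.
build_cifs_mount_cmd() {
    storageAccountName="$1"
    fileShareName="$2"
    storageAccountKey="$3"
    smbPath="//${storageAccountName}.file.core.windows.net/${fileShareName}"
    mntPath="/mnt/${storageAccountName}/${fileShareName}"   # assumed layout
    printf 'sudo mount -t cifs %s %s -o vers=3.0,username=%s,password=%s,serverino\n' \
        "$smbPath" "$mntPath" "$storageAccountName" "$storageAccountKey"
}

# Example with placeholder values (not a real account or key)
build_cifs_mount_cmd mystorageaccount myshare '<storage-account-key>'
```

Note that the printed line contains the storage key, so avoid running this in a shared terminal or logging its output.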

Connect to Azure File Share from Azure Cloud Shell

I don't know how to connect to an existing Azure File Share from Azure Cloud Shell.
The clouddrive command seems to move my default Cloud Shell storage account, but I don't want to do that. I just want to access my existing Azure File Share storage, which can exist in any Azure region (not just those available for Cloud Shell, which are currently very limited).
When I tried to use clouddrive to mount my existing Azure Files account, I get the following error message:
ERROR: The storage account is not in the valid location. Expect: eastus Actual: canadacentral
I'd prefer not to move my existing Azure File Shares from canadacentral to eastus. Is there a workaround for this?
I'd like to just connect to my existing Azure File Shares through Cloud Shell and run commands in those directories.
Thank you!
Same question asked here:
https://github.com/MicrosoftDocs/azure-docs/issues/42001
https://serverfault.com/questions/992834/connect-to-azure-file-share-from-azure-cloud-shell
Azure Cloud Shell is an interactive, authenticated, browser-accessible shell whose backend runs on Cloud Shell hosts. The Cloud Shell machines are temporary, but your files are persisted through a mounted file share named clouddrive.
By using the advanced option, you can associate existing resources. Also, the associated Azure storage accounts must reside in the same region as the Cloud Shell machine that you're mounting them to. To find your current region you may run env in Bash and locate the variable ACC_LOCATION.
As the documentation states, canadacentral is not an available region for Cloud Shell, so you should mount file storage that is in an available region. To do so, run clouddrive unmount to unmount the current file share, then select the existing file storage in the available region by clicking advanced settings at the initial login.
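To check for the region mismatch before attempting the mount, something like the sketch below may help. ACC_LOCATION is the Cloud Shell variable mentioned above; the function name same_region and its message format are made up for illustration, and the storage account region has to be supplied by you.

```shell
#!/bin/sh
# Hypothetical check: compares the Cloud Shell region with the region of the
# storage account you intend to mount. Returns non-zero on a mismatch.
same_region() {
    shell_region="$1"     # e.g. the value of $ACC_LOCATION in Cloud Shell
    storage_region="$2"   # region of the storage account, supplied by the caller
    if [ "$shell_region" = "$storage_region" ]; then
        echo "ok: both in $shell_region"
    else
        echo "mismatch: Expect: $shell_region Actual: $storage_region"
        return 1
    fi
}

# In Cloud Shell this would be: same_region "$ACC_LOCATION" canadacentral
# The values below reproduce the situation from the question.
same_region eastus canadacentral || true
```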

How to upload a file from azure blob storage to Linux VM created on azure

I have one large file in my Azure Blob storage container. I want to move the file from Blob storage to a Linux VM created on Azure. How can I do that using Data Factory or a PowerShell command?
The easiest way, without any extra tools, is to generate a SAS token for the blob and run curl.
Generate SAS
And then CURL
curl <blob_sas_url> -o output.txt
If you need this automated every time you can generate SAS URL from the script or just use AzCopy.
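A small sketch of scripting this, assuming you already have the blob URL and a SAS token as separate strings. The helper name blob_sas_url, the account name and the token values are placeholders, not real values. The script prints the curl command rather than downloading anything.

```shell
#!/bin/sh
# Hypothetical helper: joins a blob URL and a SAS token into one download URL.
# Accepts the token with or without a leading '?'.
blob_sas_url() {
    blob_url="$1"    # https://<account>.blob.core.windows.net/<container>/<blob>
    sas_token="$2"   # SAS query string
    case "$sas_token" in
        \?*) sas_token=${sas_token#?} ;;   # drop the leading '?' if present
    esac
    printf '%s?%s\n' "$blob_url" "$sas_token"
}

# Placeholder values; the URL is quoted because SAS tokens contain '&'.
url=$(blob_sas_url \
    "https://mystorageaccount.blob.core.windows.net/backup/output.txt" \
    "?sv=<version>&sig=<signature>")
echo "curl \"$url\" -o output.txt"
```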
Please see this blog post, How to copy data to VM from blob storage, which gives you a way to solve the problem with Data Factory:
"To anyone who might get into same problem in future, I solved my problem by using 'copy wizard' present in ADF.
We need to install Data Management Gateway on VM and register it before we use 'copy wizard'.
We need to specify blob storage as source and in destination we need to choose 'File Server Share' option. In 'File Server Share' option we need to specify user credentials which I suppose pipeline uses to login to VM, folder on VM where pipeline will copy the data."
From the Azure Blob Storage documentation, there is another way that can help you: Mount Blob storage as a file system with blobfuse on Linux.
Blobfuse is a virtual file system driver for Azure Blob storage. Blobfuse allows you to access your existing block blob data in your storage account through the Linux file system. Blobfuse uses the virtual directory scheme with the forward-slash '/' as a delimiter.
This guide shows you how to use blobfuse, and mount a Blob storage container on Linux and access data. To learn more about blobfuse, read the details in the blobfuse repository.
If you want to use AzCopy, see the document Transfer data with AzCopy and Blob storage. You can download AzCopy for Linux; it provides commands for uploading and downloading files.
For example, upload file:
azcopy copy "<local-file-path>" "https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<blob-name>"
For PowerShell, you need to use PowerShell Core 6.x and later on all platforms. It works with Windows and Linux virtual machines using Windows PowerShell 5.1 (Windows only) or PowerShell 6 (Windows and Linux).
You can find the PowerShell commands in this document: Quickstart: Upload, download, and list blobs by using Azure PowerShell
Here is another link that talks about Copy Files to Azure VM using PowerShell Remoting 6 (Windows and Linux).
Hope this helps.
You have many options to copy content from the blob store to the disk on the VM:
1. Use AzCopy
2. Use Azure Pipelines - File copy task
3. Use Powershell cmdlets
A lot of content is available on these approaches on SO!
It seems this is not properly documented anywhere, so I am sharing the most basic approach, which is to use the azcopy tool that is available for both Windows and Linux. This approach doesn't need the complexity of creating credentials or tokens.
Download azcopy
It's a simple executable which can be run directly after extraction
Create a managed identity (system-assigned identity) for your virtual machine. Navigate to VM -> Identity -> Turn the Status to 'ON' -> Save
Now the VM can be assigned permission at the following levels:
Storage account
Container (file system)
Resource group
Subscription
For this case, navigate to storage account -> IAM -> Add role assignment -> Select role 'Storage Blob Data Contributor' -> Assign access to 'Virtual machine' -> Select the desired VM -> SAVE
NOTE: If you give access to the VM on IAM properties of a Resource Group, the VM will be able to access all the storage accounts of the RG.
Login to VM and assume the identity (run the command from the same location where the azcopy is located)
For windows : azcopy login --identity
For linux : ./azcopy login --identity
Upload or download the files now:
azcopy cp "source-file" "storageUri/blob-container/" --recursive=true
Example: azcopy cp "C:\test.txt" "https://mystorageaccount.blob.core.windows.net/backup/" --recursive=true
IAM permissions can take a few minutes to propagate. If you change or add permissions or access levels anywhere, run the azcopy login --identity command again to get the updated identity.
More info on Azcopy is available here
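The login and copy steps above could be wrapped roughly as follows on a Linux VM. This is only a sketch: the function name upload_with_identity and the AZCOPY environment variable (used to point at the extracted binary) are assumptions, while the two azcopy invocations are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: logs in with the VM's managed identity, then copies.
# AZCOPY is an assumed variable for the binary location; default is ./azcopy.
upload_with_identity() {
    src="$1"   # local file or folder
    dst="$2"   # e.g. https://mystorageaccount.blob.core.windows.net/backup/
    azcopy_bin="${AZCOPY:-./azcopy}"
    if [ ! -x "$azcopy_bin" ]; then
        echo "azcopy not found at $azcopy_bin" >&2
        return 1
    fi
    # Stop early if the identity login fails (for example, the role
    # assignment has not propagated yet).
    "$azcopy_bin" login --identity || return 1
    "$azcopy_bin" cp "$src" "$dst" --recursive=true
}

# Usage (run on the VM, from the folder where azcopy was extracted):
#   upload_with_identity ./test.txt "https://mystorageaccount.blob.core.windows.net/backup/"
```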

share storage account on different subscription and location

I successfully mounted an Azure File Storage share on a VM that is in the same subscription and location. I then cloned this VM to a new subscription and another location, so my new machine is exactly the same except for subscription and location. When I run the same command to mount the same file storage:
sudo mount -t cifs //MYACCOUNT.file.core.windows.net/MY/FOLDER /MY/LOCAL/FOLDER/ -o vers=3.0,username=USER,password=ACCESSKEY,file_mode=0777,dir_mode=0777
then I get
mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)
I think it has something to do with the different subscription and location, because in the portal, when I click Connect on my file storage, it says:
To connect to this file share, run this command from any Windows virtual machine on the same subscription and location:
So is there any possibility to connect to my file storage from within another subscription and location?
You need to use the SMB 3.0 protocol to connect outside of that Azure region. This page says it's not supported on Linux:
https://azure.microsoft.com/en-us/documentation/articles/storage-how-to-use-files-linux/
Note that since the Linux SMB client doesn’t yet support encryption, mounting a file share from Linux still requires the client to be in the same Azure region as the file share. However, encryption support for Linux is on the roadmap of Linux developers responsible for SMB functionality.
Edit: There is an update on Ubuntu here.

Resources