I ran into an issue with the size of the Log folder on the first node of a Service Fabric cluster.
There doesn't seem to be any upper limit on the disk space it can use, so it will eventually fill up the entire local disk.
The cluster was created via an ARM template, and I set up two storage accounts associated with the cluster. The variable names for the storage accounts are supportLogStorageAccountName and applicationDiagnosticsStorageAccountName.
However, the .etl files are written only to the local disks of the cluster nodes and not to the storage accounts (where I could find .dtr files).
Is there any way to set the destination of the .etl files to external storage, or at least to limit the size of the Log folder? I wonder whether the overallQuotaInMB parameter in the ARM template could be related to that.
You can override the MaxDiskQuotaInMb setting, lowering it from its default of 10240 (MB), to reduce the disk space used by the .etl files. (We use the local logs in cases where Azure Storage isn't available for some reason.)
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-fabric-settings
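For reference, overrides like this go into the fabricSettings array of the Microsoft.ServiceFabric/clusters resource in the ARM template. The Python sketch below only builds that fragment so its shape is clear; it assumes the setting sits in the Diagnostics section, and the 5120 MB value is an arbitrary example, so check the fabric settings page above for the exact section, name, and default for your cluster version.

```python
import json

# Arbitrary example value (5 GB), lower than the 10240 MB default mentioned above.
ETL_QUOTA_MB = 5120


def fabric_settings_with_log_quota(quota_mb: int) -> list:
    """Build the fabricSettings array for a Microsoft.ServiceFabric/clusters resource.

    Assumption: the quota setting lives in the "Diagnostics" section.
    """
    return [
        {
            "name": "Diagnostics",
            "parameters": [
                # fabricSettings values are passed as strings.
                {"name": "MaxDiskQuotaInMb", "value": str(quota_mb)},
            ],
        }
    ]


if __name__ == "__main__":
    # Merge the printed array into properties.fabricSettings of the cluster resource.
    print(json.dumps({"fabricSettings": fabric_settings_with_log_quota(ETL_QUOTA_MB)}, indent=2))
```

The rest of the template stays unchanged; only the cluster resource's properties gain (or extend) the fabricSettings entry.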
In Azure Batch, when creating a pool in the portal, you can add a DataDisk, set its size in GB, and choose between Standard LRS and Premium LRS.
When using PowerShell and/or the .NET libraries, you can also set up a MountConfiguration for a FileShare (as well as Blobs, etc.).
I'm confused about the difference between the two, specifically between a DataDisk and a mounted FileShare.
For my scenario, I want to use the lowest-powered Linux VM possible but need at least 500 GB of storage isolated to each node (no need for sharing across nodes).
I added a DataDisk to my pool since it seemed simpler than mounting a FileShare, but my nodes do not have access to the additional storage. Is there additional configuration needed on the job or task? Does the disk need to be mounted to a drive letter like a FileShare does?
If I add a 500 GB DataDisk to my pool, is it shared across all the running nodes, or does each new node get its own 500 GB partition?
There does not seem to be much documentation on DataDisks for Azure Batch. In fact, searching for the term within the Batch documentation returns 0 results!
• When you add a data disk of a particular size to a Batch pool, it is attached to every node in that pool, existing or newly created. For example, if you add a 500 GB data disk to a pool with 4 nodes, each of the 4 nodes gets its own, separate 500 GB data disk. If the nodes are Linux VMs, the data disk must be initialized from within each VM. To partition and mount the disks, follow the documentation below:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/attach-disk-portal#connect-to-the-linux-vm-to-mount-the-new-disk
By following that documentation, you will be able to mount the data disk on each node from within the VM.
• A newly added data disk cannot be used until you initialize it (partition and format it) from within the VM, so you will need to log in to every node and partition and initialize the disk before it becomes visible as a usable volume.
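To avoid typing the steps by hand on every node, the partition/format/mount sequence from the linked article can be assembled into a single shell command. This is a minimal Python sketch that only builds the command string; the device name /dev/sdc and the mount point /datadrive are assumptions (check the actual device with lsblk on a node), and running the command formats the disk, so it should only be run once on a new, empty disk.

```python
import shlex


def data_disk_mount_command(device: str = "/dev/sdc", mount_point: str = "/datadrive") -> str:
    """Build one bash command that partitions, formats and mounts a new data disk.

    Assumes a SCSI-style device name such as /dev/sdc, whose first partition is
    /dev/sdc1. WARNING: mkfs.xfs erases the disk, so only use this on an empty disk.
    """
    partition = f"{device}1"
    steps = [
        # Create a GPT label and a single XFS partition spanning the whole disk.
        f"sudo parted {device} --script mklabel gpt mkpart xfspart xfs 0% 100%",
        # Ask the kernel to re-read the partition table.
        f"sudo partprobe {device}",
        f"sudo mkfs.xfs {partition}",
        f"sudo mkdir -p {mount_point}",
        f"sudo mount {partition} {mount_point}",
    ]
    # '&&' stops at the first failing step; shlex.quote keeps the script one argument to bash -c.
    return "/bin/bash -c " + shlex.quote(" && ".join(steps))


if __name__ == "__main__":
    print(data_disk_mount_command())
```

Note that this mount does not survive a reboot on its own; the linked article also covers adding an /etc/fstab entry for that.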
Data disks are dedicated storage attached to a single system/VM and are not shared with other resources unless sharing is explicitly enabled. File shares, on the other hand, are network-mounted storage volumes that are available over the network to all the provisioned resources/VMs/systems that mount them. Like data disks, file shares have a fixed size, but that capacity is shared among all the resources accessing the share rather than given to each of them.
The same applies to nodes in an Azure Batch pool.
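The capacity difference can be made concrete with a little arithmetic. This sketch contrasts a per-node data disk with a single file share mounted on every node; the StorageLayout type and the function names are hypothetical, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageLayout:
    per_node_gb: int  # capacity visible to each node
    total_gb: int     # capacity provisioned across the whole pool
    shared: bool      # True if every node sees the same files


def data_disk_layout(nodes: int, disk_gb: int) -> StorageLayout:
    # Every node gets its own, separate disk of the same size.
    return StorageLayout(per_node_gb=disk_gb, total_gb=nodes * disk_gb, shared=False)


def file_share_layout(share_quota_gb: int) -> StorageLayout:
    # One share is mounted everywhere: each node sees the full quota,
    # but all nodes consume space from that same pool.
    return StorageLayout(per_node_gb=share_quota_gb, total_gb=share_quota_gb, shared=True)


if __name__ == "__main__":
    print(data_disk_layout(4, 500))
    print(file_share_layout(500))
```

With 4 nodes and a 500 GB data disk, the pool holds 2000 GB in total, 500 GB private to each node, which matches the isolated-per-node requirement in the question.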
Please find the link below for your reference:
https://learn.microsoft.com/en-us/azure/batch/virtual-file-mount?tabs=linux
I use AKS to deploy the application services. I use the Standard_D4a_v4 VM size, with 4 vCPUs and 16 GiB of memory, for the worker nodes.
The maximum number of data disks for this size is 8. I need to clarify whether only 8 PVCs using the Azure Disk provisioner can be mounted on a worker node, or whether it is possible to mount more PVCs (>8) with the Azure File provisioner.
• Since you are using the Standard_D4a_v4 VM size with 4 vCPUs and 16 GiB of memory, the maximum number of data disks that can be attached is 8: as a general rule, two data disks can be attached per vCPU, up to an absolute maximum of 64 data disks per virtual machine. I also verified this limit in my own environment.
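The rule of thumb above can be written as a one-line calculation. Treat it as a sketch only: the authoritative limit is the "Max data disks" value listed for each size in the VM size documentation linked below, and Azure File volumes are network mounts, so they don't count against it.

```python
ABSOLUTE_MAX_DATA_DISKS = 64  # per-VM ceiling quoted in the answer


def max_data_disks(vcpus: int) -> int:
    """Rule of thumb: two data disks per vCPU, capped at 64 per VM."""
    if vcpus < 1:
        raise ValueError("a VM has at least one vCPU")
    return min(2 * vcpus, ABSOLUTE_MAX_DATA_DISKS)


if __name__ == "__main__":
    # Standard_D4a_v4 has 4 vCPUs.
    print(max_data_disks(4))  # 8
```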
Also, after attaching the maximum number of data disks, I connected an Azure file share as a volume on the same VM and was able to mount and access it successfully. So although you cannot add more data disks to a VM because of these limits, you can still connect file shares and access them as network volumes through the Azure File provisioner.
Please find the links below for more information:
https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/tutorial-use-disks-cli#azure-data-disks
https://learn.microsoft.com/en-us/azure/virtual-machines/dav4-dasv4-series#dav4-series
Not able to set up persistent volume using Azure disk
We are trying to deploy an application on AKS, and the application needs a persistent volume. With an Azure disk, we have noticed that if the node hosting the application's pod is stopped or stops working, a new pod is spun up on another node, but it can no longer access the persistent volume.
According to the documentation, an Azure disk is mapped to a particular node, while a file share is shared across nodes. How can we ensure that an application running on AKS with a persistent volume does not lose access to its data if a pod or node fails?
We are looking for a persistent storage solution so that an application running 3 pods as a replica set can use an Azure Disk persistent volume in AKS.
For an Azure disk to work as a persistent storage volume in AKS, it must be attached to a specific node, so it cannot share files between pods running on different nodes. If you want to share and persist files between pods regardless of which node they run on, Azure File Share is a good option.
In short, if you have multiple nodes and the deployment has 3 replicas, the best way to share and persist data between the pods is to use Azure File Share or NFS.
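At the Kubernetes level, the difference shows up in the claim's access mode: shared data needs ReadWriteMany, which Azure Files supports and Azure Disk does not. The Python below is a sketch that builds a PersistentVolumeClaim as JSON; it assumes the built-in AKS storage classes managed-csi and azurefile-csi exist in your cluster (check with kubectl get storageclass), and the claim name and size are made-up examples.

```python
import json

# Assumed built-in AKS storage classes and the access mode each backend supports.
BACKENDS = {
    "azure-disk": ("managed-csi", "ReadWriteOnce"),    # attached to one node at a time
    "azure-file": ("azurefile-csi", "ReadWriteMany"),  # network share, mountable from any node
}


def pvc_manifest(name: str, backend: str, size_gi: int) -> dict:
    """Build a PersistentVolumeClaim manifest for the chosen backend."""
    storage_class, access_mode = BACKENDS[backend]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pvc_manifest("app-shared-data", "azure-file", 100), indent=2))
```

Because kubectl accepts JSON as well as YAML, the output can be piped into kubectl apply -f -. All 3 replicas of the deployment then reference the same claim name in their pod template and mount the same share, whichever node they land on.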
In Azure, is it possible to have a master VM that writes to a disk that has read-only slave replicas on other VMs?
Our app needs to download ~100 GB of files when scaling out to a new VM. This is loaded slowly from an external provider, but we want to make it available quickly when we scale out to more VMs.
I don't think you can do streaming replication (which I think is what you're asking for) or a read-only slave through an Azure service without implementing it yourself over the network or through a relational database management system.
As of this writing, one disk cannot be connected to multiple Azure VMs (see the FAQ for Managed Disks). One option would be to create a snapshot of the disk and then create a new disk from the snapshot. You could automate this via the Azure Managed Disks API (e.g., an Azure PowerShell script); for a consistent copy, the snapshot should be taken while the source VM isn't running.
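The snapshot route can also be scripted from Python instead of PowerShell. This is a minimal sketch assuming the azure-identity and azure-mgmt-compute packages and an existing managed disk; all resource names are placeholders, and attaching the new disk to the new VM (and mounting it inside the guest) is left out.

```python
def copy_creation_body(location: str, source_resource_id: str) -> dict:
    # "Copy" creates the resource from an existing disk or snapshot.
    return {
        "location": location,
        "creation_data": {"create_option": "Copy", "source_resource_id": source_resource_id},
    }


def clone_disk(subscription_id: str, resource_group: str, location: str,
               source_disk_id: str, snapshot_name: str, new_disk_name: str) -> str:
    """Snapshot an existing managed disk, then create a new disk from that snapshot."""
    # Imported here so copy_creation_body can be used without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    snapshot = client.snapshots.begin_create_or_update(
        resource_group, snapshot_name, copy_creation_body(location, source_disk_id)
    ).result()  # wait for the long-running operation to finish
    disk = client.disks.begin_create_or_update(
        resource_group, new_disk_name, copy_creation_body(location, snapshot.id)
    ).result()
    return disk.id
```

Each call to clone_disk produces an independent copy, so every new VM gets its own disk rather than a live replica of the original.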
If your data is the same and doesn't change for each new VM, you can store it on Azure File Storage (Standard or Premium) and attach the file share to every new VM when it is created. Snapshotting disks would make this quite complex, so Azure File Storage is a good choice in this scenario.
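On a Linux VM, attaching the share at creation time typically comes down to a CIFS mount, which could go into the VM's setup script. The sketch below only builds that mount command; the storage account, share, mount point, and credentials file are placeholders, the credentials file is assumed to contain username=<account> and password=<key> lines, and the read-only ro option is an assumption here since the VMs only read the files.

```python
def azure_files_mount_command(account: str, share: str, mount_point: str,
                              credentials_file: str, read_only: bool = True) -> str:
    """Build a `mount -t cifs` command for an Azure file share over SMB 3.0."""
    unc_path = f"//{account}.file.core.windows.net/{share}"
    options = [
        "vers=3.0",
        # File holding username=<account> / password=<key>, kept off the command line.
        f"credentials={credentials_file}",
        "dir_mode=0755",
        "file_mode=0755",
        "serverino",
    ]
    if read_only:
        options.append("ro")  # the shared data set is not modified by the VMs
    return f"sudo mount -t cifs {unc_path} {mount_point} -o {','.join(options)}"


if __name__ == "__main__":
    print(azure_files_mount_command("mystorage", "data", "/mnt/data",
                                    "/etc/smbcredentials/mystorage.cred"))
```

The mount point must exist first (mkdir -p), and SMB needs outbound port 445 open from the VM.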
When using HDInsight and choosing Azure Storage Blob to store the data that needs to be computed, you still have to choose the number of data nodes when provisioning a new cluster. If your data is being stored on an Azure Storage Blob, what impact does the number of data nodes have? Is the data from the blob actually replicated onto the data nodes?
If you put data on the Azure Blob Store, it stays there, and is read directly from Azure Storage.
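Jobs address that blob data by URI rather than copying it into HDFS first. Below is a small helper (hypothetical, for illustration) that builds the wasbs:// form of such a URI; the container, account, and path names are placeholders.

```python
def wasbs_uri(container: str, account: str, path: str = "") -> str:
    """Return the wasbs:// URI for a path in an Azure Storage blob container."""
    # wasbs is the TLS variant of the wasb scheme.
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    print(wasbs_uri("data", "mystorageaccount", "input/2024/"))
```

On the cluster, a URI like this can be passed to commands such as hadoop fs -ls or used as a job's input path.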
The data nodes in the HDInsight cluster have two purposes. Firstly, they run the actual compute jobs, which read from Azure Storage directly. This is not as crazy as it might sound to an HDFS user, because Azure's consistent underlying fabric keeps the storage nice and close to the compute.
Secondly, the data nodes are running an HDFS filesystem on their local disk. This is generally only used for intermediate and tmp files in HDInsight, since it is transitory (only lasts as long as the cluster).
So, choosing the number of data nodes is essentially choosing how many job running nodes (yarn application containers, or job tracker slots depending on version) you want to be able to handle, and to a lesser extent, choosing how much temp space your jobs need.
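As a rough illustration of that trade-off, the number of containers that can run at once grows linearly with the number of data nodes. This sketch looks at memory only (real YARN scheduling also considers vcores), and the per-node figures are hypothetical.

```python
def concurrent_containers(data_nodes: int, yarn_memory_per_node_gb: int,
                          container_memory_gb: int) -> int:
    """Estimate how many YARN containers fit at once, by memory alone."""
    per_node = yarn_memory_per_node_gb // container_memory_gb
    return data_nodes * per_node


if __name__ == "__main__":
    # Hypothetical: 24 GB of YARN memory per node, 3 GB per container -> 8 per node.
    print(concurrent_containers(4, 24, 3))  # 32
```

Doubling the data nodes doubles that number, while the blob data they read stays where it is.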