Kubernetes Persistent Volume multithread access

I have a service, deployed on Kubernetes, with a single writer thread and multiple reader threads accessing one file.
Now I want to take advantage of Kubernetes persistent storage to save the huge load time between pod restarts, by moving the file (1 writer, multiple readers) to a persistent volume of the local storage type. How would this affect my file lock?
I researched a lot online, and there is very little mention of how multi-threaded access works on a persistent volume. I hope I can get some pointers on whether multi-threaded access would even work on a persistent volume.

by moving the file (1 writer, multiple readers) to a persistent volume of the local storage type. How would this affect my file lock?
In both cases your application interacts with a file on an ordinary filesystem path: a local-type persistent volume is simply a directory on the node mounted into the pod. So there is no logical difference for your application, and in-process locking (one writer thread, many reader threads) behaves exactly as before.
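A minimal sketch of the unchanged pattern, assuming the volume is mounted at the hypothetical path /data inside the container (the path and file name are illustrative):

using System.IO;
using System.Threading;

class SharedFile
{
    // In-process lock: one writer thread, any number of concurrent readers.
    // Nothing here changes when the file moves from ephemeral storage to a PV.
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private const string FilePath = "/data/state.bin"; // assumed PV mount point

    public static void Write(byte[] payload)
    {
        Lock.EnterWriteLock();
        try { File.WriteAllBytes(FilePath, payload); }
        finally { Lock.ExitWriteLock(); }
    }

    public static byte[] Read()
    {
        Lock.EnterReadLock();
        try { return File.ReadAllBytes(FilePath); }
        finally { Lock.ExitReadLock(); }
    }
}

The lock lives in your process, not in the filesystem, so the storage location is irrelevant to it; even OS-level locks (flock/fcntl) keep working, because a local volume is a normal locally mounted filesystem.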

Related

Not able to set up persistent volume using Azure disk
We are trying to deploy an application on AKS, and the application is to use a persistent volume. We have noticed that with Azure Disk, if the node hosting the pod that runs the application container is stopped or fails, another pod is spun up on another node, but it can no longer access the persistent volume.
As per the documentation, an Azure disk is mapped to a particular node, while a file share is shared across nodes. What is the way to ensure that an application running on AKS using a persistent volume does not lose its data if a pod or node fails?
We are looking for a persistent-storage solution so that an application with 3 pods in a replica set can use an Azure disk persistent volume in AKS.
For an Azure disk to work as a persistent volume in AKS, it has to be attached to one particular node, so it cannot share files between pods running on different nodes. If you want to share and persist files between pods regardless of which node they land on, an Azure File share is the right tool.
In short: if you have multiple nodes and the deployment has 3 replicas, the best way to share and persist data between the pods is an Azure File share or NFS, as sketched below.
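A minimal sketch of the claim side, assuming the built-in azurefile storage class in AKS (the claim name and size are made up); with ReadWriteMany all 3 replicas can mount the same share, whichever nodes they run on:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data            # illustrative name
spec:
  accessModes:
    - ReadWriteMany            # shared across pods and nodes
  storageClassName: azurefile  # AKS built-in Azure Files class
  resources:
    requests:
      storage: 5Gi

Each replica then mounts this claim as a volume and sees the same files.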

Does OpenEBS support shared storage?

Does OpenEBS provide any kind of storage class that enables shared storage among containers? Let's say I have 5 containers that need to access the exact same data. I have several containers doing AI training that need access to a shared image database, so it would have to be ReadWriteMany.
No. Currently OpenEBS is working on block storage.

What is Azure Lease Blob?

What is Azure Lease Blob?
Is it just a mechanism which provides an exclusive lock to the blob storage?
What is special about it? Where can I use it? I tried to read Azure documents, but it's not clear to me yet.
What is Azure Lease Blob? Is it just a mechanism which provides an exclusive lock to the blob storage?
Your understanding is correct. When you lease a blob, you acquire an exclusive lock on that blob. As long as the blob is leased, no one other than the lease holder can modify or delete it. You can acquire either a temporary lease, whose duration can be anywhere between 15 and 60 seconds, or an infinite lease.
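As a concrete illustration, here is a sketch using the classic Storage SDK (Microsoft.WindowsAzure.Storage); the connection string, container, and blob names are placeholders:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class LeaseDemo
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var blob = account.CreateCloudBlobClient()
                          .GetContainerReference("mycontainer")
                          .GetBlockBlobReference("myblob.txt");

        // Acquire a 30-second lease; the duration must be between 15 and 60
        // seconds, or null for an infinite lease.
        string leaseId = blob.AcquireLease(TimeSpan.FromSeconds(30), null);

        // While the lease is held, writes must present the lease ID,
        // otherwise they fail with 412 (Precondition Failed).
        blob.UploadText("updated content",
            accessCondition: AccessCondition.GenerateLeaseCondition(leaseId));

        // Release early instead of waiting for the lease to expire.
        blob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
    }
}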
Where can I use it?
Most commonly this functionality is used by Azure Virtual Machines. As you know Azure VMs are backed by Azure Page Blobs. So when a VM is created, an infinite lease is acquired on the blob holding the VHD for that Azure VM so that no other process can modify or delete that blob.
Yet another place where you would want to use lease blob functionality is when you want to implement Leader Election pattern.
For example, consider a scenario where you want to process a file stored in blob storage through a background Worker Role. Further assume that there are multiple instances of that Worker Role running, where each instance wakes up every 30 seconds, checks for blobs in a container, and processes any blob it finds.
Since Azure Worker Roles are stateless in nature, without locking that blob every instance would process it (not something you would want). To avoid this, each instance tries to acquire a lease on the blob. Only one instance will succeed (and is hence elected leader); the other instances will fail to acquire the lease. The leader instance then processes the blob, as in the sketch below.
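A minimal sketch of that try-acquire step, under the same classic-SDK assumption (the helper name is made up):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class LeaderElection
{
    // Returns the lease ID if this instance won the election, otherwise null.
    public static string TryBecomeLeader(CloudBlockBlob blob)
    {
        try
        {
            // Only one caller at a time can hold the lease on a given blob.
            return blob.AcquireLease(TimeSpan.FromSeconds(60), null);
        }
        catch (StorageException e)
            when (e.RequestInformation.HttpStatusCode == 409) // lease already held
        {
            return null; // another instance is the leader
        }
    }
}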
Another example is the Competing Consumers pattern, where you want to distribute work among several consumers. In one of our products, we use this together with the Leader Election pattern: through blob-lease functionality we find a leader (the consumer that was able to acquire the blob lease), and that leader then distributes work amongst the other consumers.
In a microservice architecture, you can also use it as a distributed lock. Imagine many instances/services trying to do the same task on a resource: whichever one acquires the lock does the job; the others have to wait until the lock is released.

Do I need Azure blob storage or just a simple web server on a VM?

I have a VM on Azure which runs my content management system, using nodejs and mongodb.
One of the things the CMS does is a social-sharing function where HTML pages are created and users are given the URL to the page.
I expect a large volume of users (probably 5000 at a given time) to access the HTML pages. I do not want this load to be on the same server as my CMS.
So I was thinking about moving the html pages to another server. My question is do I need to look at Azure blob storage to do this or should I just use another VM and put files there?
The files are very small and minified. I want to keep my costs down, while at the same time the server should auto-scale if I get more than 5000 requests.
The question itself is somewhat subjective/opinion-soliciting. And how you solve this problem is really up to you.
But from an objective perspective:
Blobs themselves are not the same as local file storage. If you're going to store content in them, either your CMS needs to support them natively or you're going to need to build that support into it (if that's even possible). Since they have their own REST API (and related SDKs), you cannot simply do file I/O operations against them. They are, however, accessible via URI (which may be made private or public).
Azure VMs store their disks (VHDs) in page blobs (so, technically speaking, you're already using blob storage). And each VM may have attached disks (1 TB each), also in page blobs, two disks per core (so a dual-core VM supports 4 attached 1 TB disks). Just like your OS disk, these attached disks are durable, in blob storage. A CMS may access an attached disk once it's formatted and given a drive letter (Windows) or mounted (Linux). EDIT - forgot to mention: if you go with the attached-disk approach, you need to consider the fact that these disks are per-VM; that is, they are not shared across multiple VMs (in the event you scale your CMS to multiple instances).
Azure File Service is an SMB share sitting atop Azure Blob Storage. Again: durable storage, and drive-mappable. EDIT: unlike attached disks, Azure File Service SMB shares are accessible across multiple VMs.
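Since blobs are addressable by URI, serving static HTML pages can be as simple as uploading them to a public container; a sketch with the classic SDK (account, container, and file names are illustrative):

using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class PublishPage
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var container = account.CreateCloudBlobClient().GetContainerReference("pages");
        container.CreateIfNotExists();

        // Make blobs in the container publicly readable so users can hit the URL directly.
        container.SetPermissions(new BlobContainerPermissions
        {
            PublicAccess = BlobContainerPublicAccessType.Blob
        });

        var blob = container.GetBlockBlobReference("share-123.html"); // illustrative page name
        blob.Properties.ContentType = "text/html";
        using (var stream = File.OpenRead("share-123.html"))
        {
            blob.UploadFromStream(stream);
        }

        // Hand this URL to users; no VM or web server involved.
        Console.WriteLine(blob.Uri);
    }
}

Blob storage absorbs the read load on its own, so nothing needs to be scaled for those 5000 concurrent page hits.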

Azure WCF accessing disk files

I have a WCF service hosted on Windows Azure as a "cloud service." When the service starts, it needs to populate its memory with data from files on disk so the data is accessed fast (cached, in other words). Right now I'm using a folder like C:\Documents\Filestoprocess, and the WCF service reads the files in that folder into memory. I have about 5,000 small files. How do I do this in Azure? Is there a folder path that I can call from within the WCF service so that it can open each file and load its data? I'm not really looking for complicated blob access over the network using bandwidth. I'm looking for simple disk I/O access to these files from the WCF "cloud service" running on its own public web address.
You should try to use a cloud storage service to store the data, because anything you write to the local file system can be destroyed on a restart or recycle of the service.
You can look into using the Azure Drive service, which is like creating a disk drive. It sits on top of blob storage.
But if you really want to write and read data on the local file system check out this blog post http://blog.codingoutloud.com/2011/06/12/azure-faq-can-i-write-to-the-file-system-on-windows-azure/
It talks about setting up your service definition to allow writing to the local file system.
Depending on the size of your instances you'll get a non-persistent disk where you can store this kind of temporary data. The minimum is 20 GB for an extra-small instance. You shouldn't access the disk directly; instead, you need to use a local resource, which you can configure in your service definition file or in Visual Studio (double-click your Web / Worker Role).
This storage is non-persistent: if you delete your deployment, if you decrease the number of instances, in case of hardware problems, ... you lose all data saved there. If you want to persist your files you should use blob storage instead. But in your case, where you need the files as some kind of caching mechanism, local resources are perfect.
And if your goal is to cache data you might want to take a look at the caching features included in Windows Azure: Caching in Windows Azure
Blob access is not complex. In fact, you could do a single download of a zip file from blob storage to local disk, unzip it, then prime your wcf service from those 5,000 small files.
Check out this msdn page documenting DownloadBlobToFile(). The essential parts:
using Microsoft.WindowsAzure;               // StorageCredentialsAccountAndKey (v1.x SDK)
using Microsoft.WindowsAzure.StorageClient; // CloudBlobClient, CloudBlob

// Connect to the blob endpoint with the account name and key.
CloudBlobClient blobClient =
    new CloudBlobClient(blobEndpoint, new StorageCredentialsAccountAndKey(accountName, accountKey));

// Return a reference to the blob.
CloudBlob blob = blobClient.GetBlobReference("mycontainer/myblob.txt");

// Download the blob to a local file.
blob.DownloadToFile("c:\\mylocalblob.txt");
Now: I don't agree with saving to the root folder on C:. Rather, you should grab some local storage (easily configurable). Once you configure local storage in your role configuration, just ask the role environment for it, and ask for root path:
using Microsoft.WindowsAzure.ServiceRuntime; // RoleEnvironment

var localResource = RoleEnvironment.GetLocalResource("mylocalstorage"); // name as declared in your service definition
var rootPath = localResource.RootPath;
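Putting the two snippets together, a hedged sketch of the cache-priming step (the zip and local-resource names are made up; blobClient is the client from the earlier snippet):

using System.IO;
using System.IO.Compression; // ZipFile, in the System.IO.Compression.FileSystem assembly
using Microsoft.WindowsAzure.ServiceRuntime;

// Download one zip containing the 5,000 small files, then unpack it into
// the role's local storage for plain local-disk I/O afterwards.
var root = RoleEnvironment.GetLocalResource("mylocalstorage").RootPath;
var zipPath = Path.Combine(root, "files.zip");
blobClient.GetBlobReference("mycontainer/files.zip").DownloadToFile(zipPath);
ZipFile.ExtractToDirectory(zipPath, Path.Combine(root, "Filestoprocess"));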
Note: As @KingPancake mentioned, you could use an Azure Drive. However, remember that an Azure Drive can only be writable by one instance. You'd need to make additional snapshots for your other instances. I think it's much simpler for you to go with a simple blob, copy your files down (either as a single zip or as individual files), and go from there.
You mentioned concern with network + bandwidth. You don't pay for bandwidth within the same data center. Also, it's extremely fast: 100 Mbps per core. So even with a Small instance, you'll have your files copied down very quickly, more so when you go to larger instance sizes.
One last thought: the only other ways to gain access to your 5,000 files, without using blob storage or Azure Drives (which are mounted as VHDs in blob storage), would be to either download the files from an external source or bundle them with your Windows Azure package (they'd then show up in your app's folder, under whatever subfolder you put them in). Bundling has two downsides:
Longer time to upload your deployment package due to added size
Inability to change any of the individual files without redeploying the package.
By storing in a blob, you can easily change one (or all) of your small files without redeploying your code - you'd just need to signal it to either re-read from blob storage or restart the instances so they automatically download the new files.
