Object storage system

What is the basic requirement of an object storage system? Is there a minimum set of interfaces that must be exposed by an object storage system? Are all vendors exposing the same interfaces?

Are you willing to develop your own object storage system?
Object storage is a term used for the approach of storing, addressing, manipulating and managing data as discrete storage units called objects.
There are many object storage systems, like Amazon S3, OpenStack Swift, etc.
They all expose the same kind of interface, but the implementations are different.
There is also a standard specification from SNIA called the Cloud Data Management Interface (CDMI), which has been adopted as an ISO standard for cloud storage systems.
For more info, see this how-to guide: Object storage system implementation
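To make the "minimum set of interfaces" point concrete, here is a minimal sketch (in C#, all names purely illustrative and not taken from any particular SDK) of the operations that virtually every object store exposes in some form: put, get, delete and list objects addressed by a flat key inside a container/bucket.

```csharp
// Minimal illustrative interface; S3, Swift and CDMI all expose these
// operations in some vendor-specific form (REST verbs, SDK methods, etc.).
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public interface IObjectStore
{
    Task PutObjectAsync(string container, string key, Stream content);
    Task<Stream> GetObjectAsync(string container, string key);
    Task DeleteObjectAsync(string container, string key);
    Task<IEnumerable<string>> ListObjectsAsync(string container, string prefix = "");
}
```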


How to store temporary data in an Azure multi-instance (scale set) virtual machine?

We developed a server service that (in a few words) supports the communication between two devices. We want to take advantage of the scalability offered by an Azure scale set (multi-instance VM), but we are not sure how to share memory between the instances.
Our service basically stores temporary data on the local virtual machine, and this data is read, modified and sent to the devices connected to this server.
If this data is stored locally on one of the instances, the other instances cannot access it and do not have the same information. Is that correct?
If one of the devices starts making requests to the server, the instance that processes the request will not always be the same, so in the end the data is spread across instances.
So the question might be, how to share memory between Azure instances?
Thanks
Depending on the type of data you want to share and how much latency matters, besides Service Fabric (low latency, but you need to re-architect/re-build bits of your solution) you could look at a shared back-end repository: Redis Cache is ideal as a distributed cache; SQL Azure if you want to use a relational DB to store the data; storage queues/blob storage; or File storage in a storage account (which lets you write to a mounted network drive from both VM instances). DocumentDB is another option, well suited to storing JSON data.
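As a rough sketch of the distributed-cache option: with Azure Redis Cache and the StackExchange.Redis client, any instance in the scale set can read or write the same per-device state. The endpoint, access key and key naming below are placeholders for illustration only.

```csharp
using System;
using StackExchange.Redis;

public class DeviceStateStore
{
    // One multiplexer per process; the endpoint/key are placeholders.
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect(
            "<your-cache>.redis.cache.windows.net:6380,password=<access-key>,ssl=True");

    public void SaveState(string deviceId, string stateJson)
    {
        IDatabase db = Redis.GetDatabase();
        // Any instance in the scale set can read or overwrite this key.
        db.StringSet("device:" + deviceId, stateJson, TimeSpan.FromMinutes(30));
    }

    public string LoadState(string deviceId)
    {
        IDatabase db = Redis.GetDatabase();
        return db.StringGet("device:" + deviceId);
    }
}
```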
You could use Service Fabric and take advantage of Reliable Collections to have your state automagically replicated across all instances.
From https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/:
The classes in the Microsoft.ServiceFabric.Data.Collections namespace provide a set of out-of-the-box collections that automatically make your state highly available. Developers need to program only to the Reliable Collection APIs and let Reliable Collections manage the replicated and local state.
The key difference between Reliable Collections and other high-availability technologies (such as Redis, Azure Table service, and Azure Queue service) is that the state is kept locally in the service instance while also being made highly available.
Reliable Collections can be thought of as the natural evolution of the System.Collections classes: a new set of collections that are designed for the cloud and multi-computer applications without increasing complexity for the developer. As such, Reliable Collections are:
Replicated: State changes are replicated for high availability.
Persisted: Data is persisted to disk for durability against large-scale outages (for example, a datacenter power outage).
Asynchronous: APIs are asynchronous to ensure that threads are not blocked when incurring IO.
Transactional: APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily.
Working with Reliable Collections -
https://azure.microsoft.com/en-us/documentation/articles/service-fabric-work-with-reliable-collections/
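To give an idea of what the Reliable Collections pattern looks like in code, here is a hedged sketch of a stateful service keeping device state in a replicated dictionary. The service and dictionary names are illustrative, not part of any existing solution.

```csharp
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class DeviceStateService : StatefulService
{
    public DeviceStateService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // The dictionary is replicated across the service's replicas.
        var state = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, string>>("deviceState");

        // All writes go through a transaction and are committed to the replica set.
        using (var tx = this.StateManager.CreateTransaction())
        {
            await state.AddOrUpdateAsync(tx, "device-42", "connected", (k, v) => "connected");
            await tx.CommitAsync();
        }
    }
}
```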

Azure data storage encryption?

I am creating an Azure-based application that must be PCI compliant. There is an understanding within my company that to meet this compliance, any personally identifiable information (PII) should be stored encrypted.
I have a number of questions.
Is it true that PCI compliance means encrypting PII within the data store?
What are my options with this on Azure?
I would like to store the data in DocumentDB, as this would be the closest match to the format of the data within the application. Most of the data is document-based JSON. Would this meet the PCI compliance standards?
Does it make a difference if the data store that contains payment and card info is different to that containing the PII?
The question regarding what PCI compliance requires is best directed to your organization's compliance officer. They are the one that will ultimately have to "sign off" on your solution so they control the specifications you're working towards.
As for what your options are, mfanto pointed out the SQL support for the new tiers. There's also Azure Storage, which now has encryption extensions. DocumentDB doesn't have anything yet to my knowledge. And if you're running your own database, Windows VMs have had support for BitLocker drive encryption on data drives for some time now.
While the sample uses local files, it should be noted that Azure Encryption Extensions supports streams as well for all upload/download methods - and nothing is ever written to disk (streams are encrypted/decrypted on the fly).
UploadFromStreamEncrypted(...)
DownloadToStreamEncrypted(...)
https://github.com/stefangordon/azure-encryption-extensions/blob/master/AzureEncryptionExtensionsTests/FunctionalTests.cs#L107
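For illustration, a rough sketch of the stream-based usage described above, based on my reading of the library and its linked tests; the provider type, namespaces and exact parameter order may differ slightly between versions, so verify against the FunctionalTests link.

```csharp
// Assumed namespaces/signatures from the azure-encryption-extensions repo;
// double-check against the linked FunctionalTests before relying on them.
using System.IO;
using AzureEncryptionExtensions;
using AzureEncryptionExtensions.Providers;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class EncryptedUploadSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        CloudBlockBlob blob = account.CreateCloudBlobClient()
            .GetContainerReference("secure")
            .GetBlockBlobReference("document.json");

        // Symmetric (AES) provider; key generated here purely for illustration.
        var provider = new SymmetricBlobCryptoProvider();

        using (var source = new MemoryStream(new byte[] { 1, 2, 3 }))
            blob.UploadFromStreamEncrypted(provider, source);   // encrypted on the fly, nothing hits disk

        using (var target = new MemoryStream())
            blob.DownloadToStreamEncrypted(provider, target);   // decrypted on the fly
    }
}
```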
Cosmos DB (formerly DocumentDB) now supports encryption-at-rest. It is enabled by default in every region. There is no impact to cost or performance SLA. Note: The local emulator does not support encryption-at-rest (though the emulator is only for dev/test purposes).
As far as compliance goes, you'll need to talk with a compliance/legal expert for that.
For more info on Cosmos DB encryption-at-rest, see this post.

About Windows Azure blob storage: the implementation of a project should not depend on the cloud platform

We plan to migrate the existing website to Windows Azure, and I have been told that we need to store files in blob storage.
My question is:
If we want to use blob storage, that means I need to re-write the file storage function (we use the file system for now) and call the blob service API to store files. That seems very strange to me, just because we want to use Windows Azure. What if in the future we want to use Amazon EC2 or another cloud platform? They might have their own way to store files, and then maybe I need to re-write the file storage function again. In my opinion, the implementation of a project should not depend on the cloud platform (or cloud server)! Can anybody correct me? Thanks!
I won't address the commentary about whether an app should have a dependency on a particular cloud environment (or specific ways to deal with that particular issue), as that's subjective and it's a nice debate to have somewhere else. What I will address is the actual storage in Azure, as your info is a bit out-of-date.
One reason to use blob storage directly (and possibly the reason you were told to use blob storage) is that it provides access from multiple instances of your app. Also, blob storage provides 500TB of storage per storage account, and it's triple-replicated within the deployed region (and optionally geo-replicated). With attached storage (either with local disk or blob-backed Azure Disk), the access is specific to a particular instance of your app. Shifting from file system access to blob storage access does require app modification.
If you choose not to modify your app's file I/O operations, then you can also consider the new Azure File Service, which provides SMB access to storage (backed by blob storage). Using File Service, your app would (hopefully) not need to be modified, although you might need to change your root path.
More information on Azure File Service may be found here.
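To show the scale of the app modification mentioned above, here is a hedged sketch of what a file write looks like when moved from the local file system to blob storage, using the classic Microsoft.WindowsAzure.Storage client; container and file names are placeholders.

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class BlobUploadSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        CloudBlobContainer container = account.CreateCloudBlobClient()
            .GetContainerReference("uploads");
        container.CreateIfNotExists();

        // Previously something like: File.Copy(localPath, sharePath);
        CloudBlockBlob blob = container.GetBlockBlobReference("report.pdf");
        using (var fs = System.IO.File.OpenRead(@"D:\uploads\report.pdf"))
            blob.UploadFromStream(fs);
    }
}
```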
Why does it seem strange? You need to store your files somewhere, and the cloud is as good a place as any IF it suits your needs. The obvious advantages are redundancy and geo-replication, sharing files across multiple projects and servers; the list goes on. It's difficult to advise on whether it would be a good idea or not without hearing some specifics.
You could use Windows Azure storage with Amazon in the future if you wanted to (you'd just need to set up the access for it), obviously with a slightly longer delay. Then again, that slight performance drop may be significant and you may end up re-writing it.
Most importantly, swapping over from one cloud provider to another is not trivial, depending on just how much you use it or how much data you've got in it, so I would strongly suggest looking closely at the advantages/disadvantages of each platform before throwing your lot in with either one and then fully learning that platform.
Personally, I went for Azure cloud services + storage etc. even though it was slightly more expensive at the time, because I'm a Microsoft person (not that I didn't do my research). It was annoying in the early days when key features were missing, but it's really matured now and I like the pace at which it's improving.
It's cheap to test, why not try both and see which one suits you? A small price to pay when you have big decisions to make.
Disclaimer: I don't know the current state of Amazon web services.
Nice question. We are in the middle of migrating an old PHP/MySQL/LocalShare ERP application to WebRole/SQLAzure/AzureStorage. We faced the same problem and decision. Let me write some thoughts about the issue:
It is a good option to just be able to switch the storage provider, but is it reasonable? You can always build the abstraction, but do you plan how to do the actual change of storage provider (migration/sync) while in production? What kind of argument would actually drive the transition to another storage provider? How many users and how much data do you have? Do you plan to shard/rebalance the storage in the future? How reliable must the system be during a storage provider switch? Do you want to move the data completely when you switch, or just shard it so that you start using the other provider for new data? Does the cost of developing these (reliable) storage layers plus the cost of developing reliable transitions (or bi-directional syncs) outweigh the price difference between any two storage providers?
Just switching the storage mechanism from Azure Blob to Amazon will incur a heavy latency penalty if your other services are on Azure: when you create storage and services on Azure, you set affinity groups by region so that you minimize network latency.
These are only a few of the questions to answer before doing all the heavy lifting. We have abstracted the file repository (blob) because we planned to move from a local NFS to Blob transparently and gradually, and it answers our needs.
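For what it's worth, a minimal sketch of the kind of file-repository abstraction mentioned above: the application codes against a small interface, and the blob-backed (or NFS-backed, or S3-backed) implementation is chosen at composition time. All names here are illustrative.

```csharp
using System.IO;
using System.Threading.Tasks;

public interface IFileRepository
{
    Task SaveAsync(string path, Stream content);
    Task<Stream> OpenReadAsync(string path);
    Task DeleteAsync(string path);
}

// Local file system implementation used today; an AzureBlobFileRepository
// implementing the same interface can replace it without touching callers.
public class LocalFileRepository : IFileRepository
{
    private readonly string _root;
    public LocalFileRepository(string root) { _root = root; }

    public async Task SaveAsync(string path, Stream content)
    {
        using (var fs = File.Create(Path.Combine(_root, path)))
            await content.CopyToAsync(fs);
    }

    public Task<Stream> OpenReadAsync(string path) =>
        Task.FromResult<Stream>(File.OpenRead(Path.Combine(_root, path)));

    public Task DeleteAsync(string path)
    {
        File.Delete(Path.Combine(_root, path));
        return Task.CompletedTask;
    }
}
```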

Azure addon - accessing WADPerformanceCountersTable?

If I write an Azure addon, can it access the WADPerformanceCountersTable table (of the business application that provisioned this addon)? Especially in terms of security/permissions.
E.g. say I wanted my addon to monitor some performance counters and send an email alert if they pass some thresholds (regardless of whether there are already such commercial products, I'm just interested in the technical capability). What will I have to do? I'm guessing WADPerformanceCountersTable isn't publicly exposed to the entire world, so how can I make it accessible to my addon?
thanks very much
WADPerformanceCountersTable is no different from other Azure tables; it's stored in the storage account defined by Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString in the configuration file. You will need the storage account name/key pair to read from this table.
FYI, here is an article about how to efficiently fetch performance counter data from this table: http://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/
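A hedged sketch of what reading recent rows from that table might look like with the classic table client, using the PartitionKey trick ("0" + UTC ticks) from the linked article. The connection string must belong to the monitored application's diagnostics storage account; everything else is illustrative.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class WadReaderSample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<diagnostics-storage-connection-string>");
        CloudTable table = account.CreateCloudTableClient()
            .GetTableReference("WADPerformanceCountersTable");

        // Fetch only entities written in the last 15 minutes by filtering on PartitionKey,
        // which WAD writes as "0" followed by the UTC tick count.
        string fromPartition = "0" + DateTime.UtcNow.AddMinutes(-15).Ticks;
        var query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey",
                QueryComparisons.GreaterThanOrEqual, fromPartition));

        foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
        {
            Console.WriteLine("{0} = {1}",
                entity.Properties["CounterName"].StringValue,
                entity.Properties["CounterValue"].DoubleValue);
        }
    }
}
```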

Why we need Transient Fault Handling for storage?

I saw some threads saying the storage library already has a retry policy,
so why should we use the Transient Fault Handling block?
Can anyone show me some samples of how to use this Transient Fault Handling block properly for my blob and table storage?
thanks!
For implementation examples, read on in the link you sent, in the Key Scenarios section. If you aren't having problems connecting, then don't implement it. We use it, but it hasn't helped as far as we know. Every fault we've encountered has been a longer-term, Azure-internal network issue that caused faults the TFHB couldn't handle.
One reason (and the reason I use it in my application) is that the Transient Fault Handling Application Block provides retry logic not only for storage (tables, blobs and queues) but also for SQL Azure and Service Bus queues. If your project makes use of these additional resources (namely SQL Azure and Service Bus queues) and you want a single library to handle transient faults, I would recommend using this over the storage client library.
Another reason I would give for using this library is its extensibility. You could probably extend it to handle other error scenarios (not covered by the storage client library retry policies) or use it against other web resources like the Service Management API.
If you're just using blob and table storage, you could very well use the retry policies which come with storage client library.
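As a sample, here is roughly what wrapping a storage call in the Transient Fault Handling Application Block looks like. Note the namespaces (especially for StorageTransientErrorDetectionStrategy) vary between versions of the block, so treat them as assumptions and adjust to the package you install.

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;
// Namespace of the storage detection strategy differs by TFHB version; in some
// versions you may need to supply your own ITransientErrorDetectionStrategy.
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.AzureStorage;

class TfhbSample
{
    static void RetryExample(Action storageCall)
    {
        // 5 retries with exponential back-off between 1 and 30 seconds.
        var strategy = new ExponentialBackoff(5,
            TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2));
        var policy = new RetryPolicy<StorageTransientErrorDetectionStrategy>(strategy);

        // Optional: log each retry attempt.
        policy.Retrying += (s, e) =>
            Console.WriteLine("Retry {0} after {1}: {2}",
                e.CurrentRetryCount, e.Delay, e.LastException.Message);

        // The wrapped delegate is your actual blob/table/queue/SQL call.
        policy.ExecuteAction(storageCall);
    }
}
```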
I don't use the Transient Fault Handling block for blob storage; it may be more applicable to table storage or when transmitting larger chunks of data. Given that I use blob storage containers to archive debug information (in the form of short txt files) on certain areas of the site, it seems a little convoluted. I've never once witnessed any failures writing to storage, and we write tens of thousands of logs a week. Of course, different usage of storage may yield different reliability.
For tables and blobs you do not need to use any external transient retry blocks, AFAIK. The ones implemented within the SDK are fairly robust. If you think you need a special retry policy, the way to do that is to implement your own retry policy inheriting from the Azure Storage IRetryPolicy interface and pass it to your storage requests via the TableRequestOptions.RetryPolicy property.
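For completeness, a hedged sketch of configuring the storage client library's own retry policy per request via TableRequestOptions (the same pattern applies to BlobRequestOptions); table and entity names are placeholders.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using Microsoft.WindowsAzure.Storage.Table;

class SdkRetrySample
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        CloudTable table = account.CreateCloudTableClient().GetTableReference("mytable");

        var options = new TableRequestOptions
        {
            // Built-in exponential back-off: ~3 second delta, 4 attempts.
            // A custom IRetryPolicy implementation could be assigned here instead.
            RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(3), 4)
        };

        table.CreateIfNotExists(options);
        table.Execute(TableOperation.InsertOrReplace(
            new DynamicTableEntity("pk", "rk")), options);
    }
}
```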
