SharePoint - external storage for documents

I am extending some features of our corporate SharePoint, and one of my customer's wishes is the following:
The SharePoint farm has a small space allocation, about 2 GB per department. The department I am working with has hundreds of mid-sized documents (4-10 MB on average: pptx, doc, pdf...) to be stored permanently each month. Currently they store them on a NAS file share. However, they want to keep the documents in SharePoint for accessibility while avoiding the 2 GB limit. Is there a way to integrate external file storage with SharePoint?
Alternatively, is it possible to store the file data in a database (e.g. Access or MS SQL) while avoiding farm-wide installation of frameworks like RBS?

You can use Remote BLOB Storage (RBS) as part of your data storage solution.
Check the article: Manage RBS in SharePoint Server
Before you migrate your BLOBs out of the database, you'll need to choose where they will live. Here are several typical options:
1. File System: You can use a normal file system (perhaps a large disk partition on a file server) to store your BLOBs.
2. SAN/NAS Storage: Storage area network (SAN) and network attached storage (NAS) are usually high-end storage options. They're expensive, but well suited if the business value of your documents and their size can justify the cost. Both SAN and NAS provide data replication, mirroring, and seamless growth into terabytes of data.
3. Cloud Storage: This is useful in at least two situations. First, when you're running SharePoint in the cloud but still want to externalize the BLOBs, the natural choice is to store them in nearby, vendor-provided cloud storage. Second, you may be running SharePoint in your own datacenter but want to store or archive all BLOBs in the cloud because of space limitations or reliability issues in your datacenter; archiving is the most common reason here. If a user is creating or modifying a document destined for cloud storage, make sure your EBS or RBS provider transfers it in the background, as doing it synchronously could degrade performance. You also want to make sure your third-party EBS or RBS provider supports storing BLOBs in cloud storage.
More information is here: Optimize SharePoint with External Storage

RBS is the way to leverage non-SQL Server storage for your BLOBs. Any other approach you take will be a hack/workaround. Only your customer's requirements can tell us whether the limitations of those workarounds are acceptable. For example, you could store the files elsewhere (a network share or cloud share) and have SharePoint Search index them. In that case you lose out on a consistent UI for managing content, and you'd still need the SP hosting team's help to set up the search crawling.
The real answer is to work with your customer to document their business needs and why IT's current offering doesn't meet them, so that IT can give you more space.

Related

Azure Filestore & Backup

I have a physical server that holds customer order documents in XML along with the resulting order PDFs. The locations are accessed via drive mappings from the application server that generates them and from the desktops that need to read them.
These files need to be kept for a number of years for regulatory purposes, and the current file server needs expanding. The data will grow to about 5-8 TB over time, as it needs to be held for approximately 10 years before it can be removed.
I could create a VM in Azure with the appropriate storage and then, I presume, use MARS to create a backup strategy as if this were an on-site server. But to meet the disk sizing I would need a large VM, even though it needs very little processing power; it's just storage.
I would still need to be able to map the server and desktops to the drive where the files are stored.
So I was wondering if anyone could suggest an approach. The data would need to be available for the application to access for up to 18 months; older data could be archived but still needs to be backed up, as retrieval of archived data would be via a manual search.
Thanks in advance.
You can map Azure File storage as a network drive on your desktops and application server, and then take a backup of the files stored in Azure File storage for long-term retention.
Azure Backup for Files is in Limited Preview today. It enables you to take scheduled copies of your files and can be managed from the Recovery Services vault. If you are interested in signing up for this preview, you can drop a mail with your subscription ID(s) to AskAzureBackupTeam@microsoft.com.
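Besides the mapped drive, the same share can also be reached programmatically, which may help with the archived-data / manual-retrieval part of the question. Below is a minimal sketch using the Azure Files SDK for Python; the connection string, share name, and file layout are hypothetical placeholders, not values from the question.

```python
# pip install azure-storage-file-share
from azure.storage.fileshare import ShareClient

# Hypothetical connection string and share name -- replace with your own.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=ordersarchive;AccountKey=...;EndpointSuffix=core.windows.net"
share = ShareClient.from_connection_string(CONN_STR, share_name="orders")

# Walk the root of the share and pull down any PDF, e.g. for an ad-hoc archive retrieval.
for item in share.list_directories_and_files():
    if item["is_directory"]:
        continue
    if item["name"].lower().endswith(".pdf"):
        file_client = share.get_file_client(item["name"])
        with open(item["name"], "wb") as local_file:
            file_client.download_file().readinto(local_file)
```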

Do I need Azure blob storage or just a simple web server on a VM?

I have a VM on Azure which hosts my content management system, built on Node.js and MongoDB.
One of the things the CMS does is provide a social sharing function, where HTML pages are created and users are given the URL to each page.
I expect a large volume of users (probably 5000 at a given time) to access these HTML pages. I do not want this load on the same server as my CMS.
So I was thinking about moving the HTML pages to another server. My question is: do I need to look at Azure blob storage for this, or should I just use another VM and put the files there?
The files are very small and minified. I want to keep my costs down, while at the same time, if I get more than 5000 requests, the server should auto-scale.
The question itself is somewhat subjective/opinion-soliciting, and how you solve this problem is really up to you.
But from an objective perspective:
Blobs themselves are not the same as local file storage. If you're going to store content in them, either your CMS needs to support them natively or you're going to need to build that support into it (if that's even possible). Since they have their own REST API (and related SDKs), you cannot simply do file I/O operations against them. They are, however, accessible via URI (which may be made private or public).
Azure VMs store their disks (VHDs) in page blobs (so, technically speaking, you're already using blob storage). And each VM may have attached disks (1 TB each), also in page blobs, two disks per core (so a dual-core VM supports 4 attached 1 TB disks). Just like your OS disk, these attached disks are durable, in blob storage. A CMS may access an attached disk once it's formatted and given a drive letter (Windows) or mounted (Linux). EDIT - forgot to mention: if you go with the attached-disk approach, you need to consider the fact that these disks are per-VM. That is, they are not shared across multiple VMs (in the event you scale your CMS to multiple instances).
Azure File Service is an SMB share sitting atop Azure blob storage. Again, durable storage, and drive-mappable. EDIT: unlike attached disks, Azure File Service SMB shares are accessible across multiple VMs.
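If you do go the blob route for the shared HTML pages, here is a minimal sketch (using the Azure Blob SDK for Python, with hypothetical account, container, and file names) of publishing a minified page to a public container and getting its URI, which the CMS can then hand out:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient, ContentSettings

# Hypothetical connection string -- replace with your storage account's.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=mycmspages;AccountKey=...;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(CONN_STR)

# Container with public read access for blobs, so pages are reachable by URL.
# (create_container raises if the container already exists.)
container = service.create_container("shared-pages", public_access="blob")

# Upload a minified page with the right MIME type, then print its public URI.
with open("share-42.html", "rb") as page:
    container.upload_blob(
        name="share-42.html",
        data=page,
        overwrite=True,
        content_settings=ContentSettings(content_type="text/html"),
    )
print(container.get_blob_client("share-42.html").url)
```

Served this way, the pages never touch the CMS VM, and blob storage absorbs the request volume without you having to scale a web server.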

Azure Table Storage vs. on-premises NoSQL

I need to choose a database to store large volumes of data. Though my initial requirement is simply to retrieve chunks of data and save them in an Excel file, I am expecting more complex use cases for this data in the future, where the data will be consumed by different applications, especially for analytics, and hence will need aggregated queries.
I am open to using either cloud-based storage or on-premises storage. I am considering Azure Table storage (when there is a need for aggregated data, I can put a wrapper service + cache around Azure Table storage, but I will still end up with NoSQL-type storage) and on-premises MongoDB. Can someone suggest the pros and cons of storing large data in Azure Table storage vs. on-premises MongoDB/Couchbase/RavenDB? The cost factor can be ignored.
I suspect this question may end up getting closed due to its broad nature and potential for gathering more opinions than fact. That said:
This is really an app-specific architecture issue, dealing with latency and bandwidth, as well as the need to maintain on-premises servers and other resources. On-prem you'll have full control of your hardware resources, but if your app does high-volume querying against that database from the cloud, performance will be hampered by latency and bandwidth. Cloud-based storage (whether in MongoDB or any other database) has the advantage of being a neighbor to your app if it's set up in the same data center.
Keep in mind: any persistent database store you run in Azure will need to back its data with Azure Storage, meaning a mounted disk backed by blob storage. You'll need to deal with the 1 TB-per-disk size limit (expandable to 16 TB on an 8-core box via striping), and you'll need to compare this to your storage needs. If you need to go beyond 16 TB, you'll need to either shard, go with Table storage (200 TB per account), or go with on-prem MongoDB. But... MongoDB and Table storage are two different beasts: one is document-based with a focus on query strength, the other a key/value store with very high-speed discrete lookups. Comparing the two on the notion of on-prem vs. cloud is secondary (in my opinion) to comparing functionality as it relates to your app.
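To make the "two different beasts" point concrete, here is a small sketch against Table storage using the Azure Tables SDK for Python; the table, keys, and property names are made up for illustration. Note that the query is a filter on keys/properties, so any aggregation still happens client-side (or in the wrapper service mentioned in the question).

```python
# pip install azure-data-tables
from azure.data.tables import TableServiceClient

# Hypothetical connection string and table name.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=analyticsstore;AccountKey=...;EndpointSuffix=core.windows.net"
service = TableServiceClient.from_connection_string(CONN_STR)
table = service.create_table_if_not_exists("readings")

# Entities are schemaless property bags keyed by PartitionKey + RowKey.
table.create_entity({
    "PartitionKey": "device-001",
    "RowKey": "2015-07-01T12:00:00Z",
    "temperature": 21.5,
})

# Fast keyed lookups, but no server-side aggregates: averaging is done in code.
rows = list(table.query_entities("PartitionKey eq 'device-001'"))
average = sum(r["temperature"] for r in rows) / len(rows)
print(average)
```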

Is this a sensible Azure Blob Storage setup and are there restructuring tools to help me migrate to it?

I think we have gone slightly wrong in the way we have used Azure storage in a SaaS system. We created a storage account per client (security was the prime consideration) and containers per system area, e.g. Vehicle, Work, etc.
Having done further reading, it seems the suggestion is that we should have used one account for all clients. Each client would have a container (so we can create it programmatically), which we then secure. Files would then be structured using a "virtual" folder structure, e.g. a container called "Client A", with files for Jobs (in the Work area of the system) stored like Work/Jobs/{entity id}/blah.pdf. Does this sound sensible?
If so, we now have about 10 accounts that we need to restructure. Are there any tools that will let us easily copy one account's contents into containers in another account? I appreciate we probably can't move the files between accounts (we set them up ages ago, so we can't use the native copy function), so I guess it will be some sort of copy. There are GBs of files across all the accounts.
It may not be such a bad idea to keep different storage accounts per client. The benefits of doing that (to me) are:
Better security as mentioned by you.
You'll be able to achieve better throughput per client, as each client will have their own storage account. If you keep one storage account for all clients and one client starts hitting that account hard, other clients will be impacted.
Better scalability. Each storage account can hold up to 200 TB of data. So if you keep just one storage account and assuming each client consumes 100 GB of data, you'll be able to accommodate only 2000 clients (I hope my math is right :)). With individual storage accounts, you won't be restricted in that sense.
There're some downsides as well. Some of them are:
Management would be a nightmare. Imagine you have 2000 customers: you would end up managing 2000 storage accounts.
You may be limited by Windows Azure. Currently, by default, you get about 10 or 20 storage accounts per subscription, and you would need to contact support to manually raise that limit. They can do that for you, but I would imagine you want this to be a self-service model where you can create as many storage accounts as you want without contacting support.
Now coming to your question about tooling: you could possibly write something of your own which makes use of the Copy Blob functionality. This functionality allows you to copy blob data across storage accounts asynchronously. Basically, this is what you would do (a rough sketch follows the list):
1. Create a blob container for each client in the target storage account.
2. Enumerate all blob containers in the source storage account.
3. For each blob container in the source storage account, enumerate the blobs.
4. Copy each blob asynchronously to the target storage account, into the client's blob container.
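The following is a minimal illustration of those steps with the Azure Blob SDK for Python. The connection strings and container names are hypothetical, it assumes one container per client in the target account, and it assumes the target service can read the source blobs (otherwise append a SAS token to the source URL, as described in the next answer).

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

SOURCE_CONN = "<connection string of the client's old storage account>"   # placeholder
TARGET_CONN = "<connection string of the shared target storage account>"  # placeholder
CLIENT_CONTAINER = "client-a"  # hypothetical per-client container in the target account

source = BlobServiceClient.from_connection_string(SOURCE_CONN)
target = BlobServiceClient.from_connection_string(TARGET_CONN)

# Step 1: create the client's container in the target account.
target_container = target.get_container_client(CLIENT_CONTAINER)
if not target_container.exists():
    target_container.create_container()

# Steps 2-4: enumerate source containers and blobs, then kick off asynchronous
# server-side copies, prefixing each blob with its original container name so the
# old per-area containers become "virtual" folders.
for container_props in source.list_containers():
    source_container = source.get_container_client(container_props.name)
    for blob_props in source_container.list_blobs():
        src_url = source_container.get_blob_client(blob_props.name).url
        dest_name = f"{container_props.name}/{blob_props.name}"
        target_container.get_blob_client(dest_name).start_copy_from_url(src_url)
```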
If you're a PowerShell fan, you can look into Cerebrata's Azure Management Cmdlets (http://www.cerebrata.com/Products/AzureManagementCmdlets) as well, which wrap this functionality. I could have recommended Cerebrata's Azure Management Studio too, but I haven't tried this particular functionality there just yet. [Disclosure: I'm one of the devs on the Cerebrata team.]
Hope this helps.
Adding to Gaurav Mantri's answer...
You can have a shared storage account for customers and use a Shared Access Signature (SAS) to limit access to a particular container or blobs (this also works for tables and queues)...
http://msdn.microsoft.com/en-us/library/windowsazure/hh508996.aspx
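For illustration, generating a read-only, time-limited SAS URL for one client's container with the Azure Blob SDK for Python might look like the sketch below; the account name, key, and container name are placeholders, and the permissions/expiry are just example choices.

```python
# pip install azure-storage-blob
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

ACCOUNT_NAME = "sharedsaasaccount"   # hypothetical shared storage account
ACCOUNT_KEY = "<account access key>"
CONTAINER = "client-a"               # the client's container

# Read/list-only token valid for one hour; hand the resulting URL to the client app
# instead of the account key.
sas_token = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    account_key=ACCOUNT_KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}?{sas_token}")
```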

Azure blobs, what are they for?

I'm reading about Azure blobs and storage, and there are things I don't understand.
First, you can use Azure for just hosting, but when you create a web role... do you need storage for the .dlls and other files (.js and .css)? Or is there a small storage quota in a worker role you can use? How big is it? I don't like the idea of getting charged every time a browser downloads a CSS file, so I guess I can store those things in another kind of storage.
Second, you get charged for transactions and bandwidth, so it doesn't seem like a good idea to provide direct links to the blobs on your websites. Then... what do you do? Download them from your web site code and write them to the client output stream on the fly from ASP.NET? I think I've read that internal traffic/transactions are free, so it looks like a "too good to be true" solution :D
Is the traffic between hosting and storage also free?
Thanks in advance.
First, to answer your main question: blobs are best used for dynamic data files. If you run a YouTube-style site, you would use blobs to store the videos in every compressed state, as well as the thumbnail images generated from those videos. Tables within table storage are best for dynamic data that does not involve files. For example, comments on YouTube videos would likely be best stored as table entities in ATS.
You generally want a storage account for at least two things: publishing your deployments into Azure, and giving your compute nodes somewhere to transfer their diagnostic data, for when you're deployed and need to monitor them.
Even though you publish your deployments THROUGH a storage account, the deployed code lives on your compute nodes. .CSS/.HTML files served by your app are served from your node's local storage space, of which you get plenty (it is NOT a good place for your dynamic data, however).
You pay for traffic/data that crosses the Azure data center boundary, regardless of where it came from. Furthermore, transactions (reads or writes) against your Azure table storage are not free. You also pay for storing the data in the storage account (storing data on the compute nodes themselves is not metered). Data that does not leave the data center is not subject to transfer fees. Now, in reality, the costs are so low that you have to be pushing gigabytes per day to start noticing.
Don't store any dynamic data only on compute instances. That data will get purged whenever you redeploy your app or whenever Azure decides to move your app onto a different node.
Hope this helps
