Cut videos from Azure Blob Storage

I have a web app hosted in Azure; one of its features is the ability to make a few cuts from a video (generate 2 or 3 small clips of 5-10 seconds from a larger video).
The videos are persisted in Azure Blob Storage.
How would you suggest accomplishing this in the Azure environment?
The actual cutting of the videos will be initiated by a WebJob. I'm also concerned about pricing (within the Azure environment), since I'm taking into account the possibility of high traffic.
Any feedback is appreciated.
Thank you.

Assuming you have video-cutting code that operates on files through normal I/O: you'd need to download the video file from blob storage, process it with your code (or whatever library you've employed), and then store the result back in blob storage. You cannot reference a blob directly with standard I/O libraries.
If, however, the videos are stored in Azure File storage (which is an SMB layer on top of blob storage), you will be able to manipulate your video files directly.
WebJobs run within an App Service (just like Web Apps), so you have access to a certain amount of local disk space (depending on the App Service tier). You should have no problem temporarily storing a video file within your web app's disk space for editing operations.
You asked about cost: again, assuming you're running code within a WebJob (App Service), you're just paying for whatever App Service tier you've chosen.
How you actually perform those edit operations is entirely up to you (language, library, etc.).
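As a rough illustration of that download / process / upload loop, here is a minimal sketch using the Azure.Storage.Blobs client and an external tool such as ffmpeg for the actual cutting. The container names, blob names, time range, and the assumption that ffmpeg.exe is deployed alongside the WebJob are placeholders, not part of the original setup.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Azure.Storage.Blobs;

class VideoCutJob
{
    static void Main()
    {
        // Assumed names: adjust the connection string, containers and blob names to your setup.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var source = new BlobClient(connectionString, "videos", "large-video.mp4");
        string localIn = Path.Combine(Path.GetTempPath(), "large-video.mp4");
        string localOut = Path.Combine(Path.GetTempPath(), "clip-1.mp4");

        // 1. Download the blob to the WebJob's local (temporary) disk.
        source.DownloadTo(localIn);

        // 2. Cut a 10-second clip starting at 00:00:30 (assumes ffmpeg.exe is deployed with the WebJob).
        var ffmpeg = Process.Start("ffmpeg", $"-ss 00:00:30 -i \"{localIn}\" -t 10 -c copy \"{localOut}\"");
        ffmpeg.WaitForExit();

        // 3. Upload the resulting clip back to blob storage.
        new BlobClient(connectionString, "clips", "clip-1.mp4").Upload(localOut, overwrite: true);
    }
}
```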

Azure Blob Storage is simply an object store; it does not have the capability you're looking for.
Azure Media Services, however, is the service you should look into. The media served by this service is backed by Azure Blob Storage.
For editing video, may I suggest you take a look at the Video Editor Plugin for Azure Media Player. You can read more about this plugin here: https://azure.microsoft.com/en-in/blog/video-editor-plugin/. You can also try it out here: http://ampdemo.azureedge.net/amp_editor.html.

Related

Moving locally stored documents to Azure

I want to spike whether Azure and the cloud are a good fit for us.
We have a website where users upload documents; it is currently hosted on our own server.
Every document has an equivalent record in a database.
I am using Terraform to create the Azure infrastructure.
What is the best way to migrate the documents from the local file path on the server to Azure?
Should I be using File Storage or Blob Storage? I am confused about the difference.
Is there anything in Terraform that can help with this?
Based on your comments, I would recommend storing them in Blob Storage. This service is suited for storing and serving unstructured data like files and images. There are many other features like redundancy, archiving etc. that you may find useful in your scenario.
File Storage is more suitable for lift-and-shift scenarios where you're moving an on-prem application to the cloud and the application writes data to either a local or network-attached disk.
You may also find this article useful: https://learn.microsoft.com/en-us/azure/storage/common/storage-decide-blobs-files-disks
UPDATE
Regarding uploading files from a local computer to Azure Storage, there are actually many options available:
Use a Storage Explorer like Microsoft's Storage Explorer.
Use AzCopy command-line tool.
Use Azure PowerShell Cmdlets.
Use Azure CLI.
Write your own code using any of the available Storage client libraries or by directly consuming the REST API (see the sketch below).
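For the last option, a minimal sketch of custom migration code using the Azure.Storage.Blobs client might look like the following; the connection-string setting, container name and local upload folder are assumptions you would replace with your own.

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;

class DocumentMigration
{
    static void Main()
    {
        // Assumed values: replace the connection string, container name and local upload folder.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var container = new BlobContainerClient(connectionString, "documents");
        container.CreateIfNotExists();

        string localRoot = @"D:\site\uploads";
        foreach (string path in Directory.EnumerateFiles(localRoot, "*", SearchOption.AllDirectories))
        {
            // Use the path relative to the upload folder as the blob name so the folder structure is preserved.
            string blobName = Path.GetRelativePath(localRoot, path).Replace('\\', '/');
            container.GetBlobClient(blobName).Upload(path, overwrite: true);
        }
    }
}
```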

Azure background services for file processing

We currently have a Windows service to process inbound/outbound files.
For inbound files, we read the data, perform some calculations, and store the results in a database.
For outbound files, we generate data from the database.
We want to migrate to Azure now. I have the following questions:
1) What is the best way to store files in Azure (Blob Storage or a File Share)? We only have ".pdf", ".txt" and ".xlsx" formats, no videos.
2) Which is the better way to process the files: WebJobs, a virtual machine with the Windows service installed, Azure Batch jobs, Azure Kubernetes Service, or Service Fabric?
Can someone please help me with this?
Thanks
How are you receiving the files: API, FTP, or some other way? There are a ton of details needed to really answer this, but here are my thoughts.
Blob Storage would be more cost effective. You only need a file share if you want to be able to map a network drive from a VM.
If processing one file completes in less than 10 minutes, I would look at Azure Functions for that. If you're processing thousands of files per day, Azure Functions would be expensive, so I would look at running the processing on an App Service or on VMs, or moving to Service Fabric.
If you have a web site that's used to upload the files and you're already using Azure App Service, then you could use WebJobs.
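To make the Azure Functions suggestion above concrete, here is a minimal blob-triggered function sketch (in-process model, using the Microsoft.Azure.WebJobs attributes); the "inbound" container name and the processing step are assumptions.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class InboundFileProcessor
{
    // Runs automatically whenever a new blob appears in the "inbound" container.
    [FunctionName("ProcessInboundFile")]
    public static void Run(
        [BlobTrigger("inbound/{name}")] Stream inboundFile,
        string name,
        ILogger log)
    {
        log.LogInformation($"Processing {name} ({inboundFile.Length} bytes)");
        // Read the file, perform the calculations, and write the results to your database here.
    }
}
```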

Best way to download many images into an Azure App Service .NET Core app for processing

I have over 500 large image files that I need to process in my .NET Core app hosted in an Azure App Service. I need to download all of the images and run them through a machine learning categorization function in my code. I currently use blob storage as my mechanism for storing the images, but downloading all those images via the blob REST API is slow. Is there a better architecture in Azure that I should be making use of to greatly increase the performance of processing these images? Perhaps a storage mechanism much faster than blob storage?
Yes, I tried this on my side. Even when the storage account is in the same location as my web app, it takes about 3-6 seconds to download a 30 MB file. (On a VM, it takes less than 1 second.)
My suggestions:
You can zip your pictures into one archive file and download that. It would be faster than downloading them one by one.
You can use the DownloadToFileParallelAsync method to download a file. It would be a little faster (see the sketch after this list).
You can refer to the official tutorial on downloading large amounts of random data from Azure Storage.
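For reference, here is a minimal sketch of DownloadToFileParallelAsync, assuming the legacy Microsoft.Azure.Storage.Blob (v11) client where that method lives; the connection string, container, blob name, parallelism and range size are assumptions.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

class ImageDownloader
{
    static async Task Main()
    {
        // Assumed names: adjust the connection string, container and blob to your setup.
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"));
        CloudBlockBlob blob = account.CreateCloudBlobClient()
                                     .GetContainerReference("images")
                                     .GetBlockBlobReference("large-image.tif");

        // Download the blob with 8 parallel range requests of 16 MB each.
        await blob.DownloadToFileParallelAsync(
            Path.Combine(Path.GetTempPath(), "large-image.tif"),
            FileMode.Create,
            8,                  // parallel I/O count
            16 * 1024 * 1024);  // range size in bytes
    }
}
```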

Upload 650,000 documents to Azure

I can't seem to find any reference to bulk uploading data to Azure.
I have a document store with 650,000 PDF documents that take up about 1.2 TB of disk space.
Uploading those files to Azure via the web will be difficult. Is there a way I can mail a hard drive and have your team upload the files for me?
If not, can you recommend the best way to upload this many documents?
Maybe not the answer you expected, but you could use Amazon's AWS Import/Export (this allows you to mail them an HDD and they'll import it into your S3 account).
To transfer the data to a Windows Azure storage account, you can leverage one of the new features in the 1.7.1 SDK: the StartCopyFromBlob method. This method allows you to copy a file at a specific URL in an asynchronous way (you could use this to copy all files from your S3 account to your Azure storage account).
Read the following blog post for a fully working example: How to Copy a Bucket from Amazon S3 to Windows Azure Blob Storage using "Copy Blob".
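StartCopyFromBlob belongs to that old 1.7.1-era SDK; in the current Azure.Storage.Blobs client the same server-side, asynchronous copy is exposed as StartCopyFromUriAsync. A rough sketch (the connection string, names and source URL are placeholders, and the source must be publicly readable or carry a pre-signed/SAS token):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class S3ToAzureCopy
{
    static async Task Main()
    {
        // Assumed names: replace the connection string, container, blob name and source URL.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var target = new BlobClient(connectionString, "documents", "file-0001.pdf");

        // Server-side copy: Azure pulls the bytes directly from the source URL,
        // so nothing flows through your own machine.
        var sourceUri = new Uri("https://my-bucket.s3.amazonaws.com/file-0001.pdf");
        await target.StartCopyFromUriAsync(sourceUri);
    }
}
```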
While Azure doesn't offer a physical ingestion process today, if you talk nicely to the Azure team they can do this as a one-off. If you like, I can get you a contact on the product team (dave at greenbutton dot com).
Alternatively, there are solutions such as Aspera which provide accelerated data transfers over UDP; it is being beta tested in Azure along with the Azure Media Services offering.
We have some tools that help with this as well (http://www.greenbutton.com) and they leverage Aspera's technology.
As disk shipment is not supported by Windows Azure, your best bet is to use a third-party application (or write your own) that supports parallel uploads. This way you can still upload much faster. Third-party applications like Gladinet and CloudBerry could be used to upload the data, but I am not sure how configurable they are for maximizing parallel uploads to achieve the fastest transfer.
If you decide to write it yourself, here is a starting point: Asynchronous Parallel Block Blob Transfers with Progress Change Notification.
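If you do roll your own, a minimal sketch of a throttled parallel uploader with the current Azure.Storage.Blobs client is shown below (the article above additionally parallelises the blocks within each blob); the connection string, container, local folder and concurrency level are assumptions.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class ParallelUploader
{
    static async Task Main()
    {
        // Assumed values: replace the connection string, container and local folder.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var container = new BlobContainerClient(connectionString, "documents");
        await container.CreateIfNotExistsAsync();

        var throttle = new SemaphoreSlim(16); // at most 16 uploads in flight at once
        var uploads = Directory.EnumerateFiles(@"D:\documents", "*.pdf", SearchOption.AllDirectories)
            .Select(async path =>
            {
                await throttle.WaitAsync();
                try
                {
                    await container.GetBlobClient(Path.GetFileName(path))
                                   .UploadAsync(path, overwrite: true);
                }
                finally
                {
                    throttle.Release();
                }
            })
            .ToList();

        await Task.WhenAll(uploads);
    }
}
```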
I know this is a bit too late for the OP, but in the Azure Management Portal, under Storage, pick your storage instance, then click the Import/Export link at the top. At the bottom of that screen, there is a "Create Import Job" link and icon. Also, if you click the blue help icon on the far right side, it says this:
You can use the Windows Azure Import/Export service to transfer large amounts of file data to Windows Azure Blob storage in situations where uploading over the network is prohibitively expensive or infeasible. You can also use the Import/Export service to transfer large quantities of data resident in Blob storage to your on-premises installations in a timely and cost-effective manner. Use the Windows Azure Import/Export Service to Transfer Data to Blob Storage
To transfer a large set of file data into Blob storage, you can send one or more hard drives containing that data to a Microsoft data center, where your data will be uploaded to your storage account. Similarly, to export data from Blob storage, you can send empty hard drives to a Microsoft data center, where the Blob data from your storage account will be copied to your hard drives and then returned to you. Before you send in a drive that contains data, you'll encrypt the data on the drive; when Microsoft exports your data to send to you, the data will also be encrypted before shipping.
Both the Azure Storage PowerShell cmdlets and AzCopy can bulk upload data to Azure.
For Azure Storage PowerShell, you could use ls -File -Recurse | Set-AzureStorageBlobContent -Container upload.
You can refer to http://msdn.microsoft.com/en-us/library/dn408487.aspx for more details.
For AzCopy, you can refer to this article: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/12/03/azcopy-uploading-downloading-files-for-windows-azure-blobs.aspx

Windows Azure requests

I have an application deployed on Windows Azure; the application has a report feature that works as shown below.
1. The application generates the report as a PDF file and saves it in a certain folder in the application.
2. A PDF viewer in the application takes the URL of the file and displays it.
As you know, in Windows Azure I will have several VMs handled through a load balancer, so I cannot ensure that the request in step 2 will go to the same VM as in step 1, and this causes a problem for me.
Any help is very appreciated.
I know that I can use blob storage, but this is not the problem.
The problem is that after creating the file on a certain VM, I give the PDF viewer the URL of the PDF file as "http://..../file.pdf". This will generate a new request that I cannot control, and I cannot know which VM will serve it, so even if I saved the file in the blob it would not solve my problem.
As in any farm environment, you have to consider saving files in storage that is common to all machines in the farm. In Windows Azure, that common storage is Windows Azure Blob Storage.
You have to make some changes to your application so that it saves the files to blob storage. If these are public files, then you just mark the blob container as public and provide the full URL of the file in blob storage to the PDF viewer.
If your PDF files are private, you have to mark your container as private. The second step is to generate a Shared Access Signature (SAS) URL for the PDF and provide that URL to the PDF viewer.
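A minimal sketch of that second step with the current Azure.Storage.Blobs client is below; the container name, blob name and one-hour expiry are assumptions, and the client must be built from the account connection string (shared key) so it can sign the SAS.

```csharp
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

class SasLinkGenerator
{
    static void Main()
    {
        // Assumed names: replace the connection string, container and blob.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var blob = new BlobClient(connectionString, "reports", "file.pdf");

        // Read-only link that expires in one hour; hand this URL to the PDF viewer.
        Uri sasUrl = blob.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(1));
        Console.WriteLine(sasUrl);
    }
}
```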
Furthermore, while developing you can explore your Azure storage using any of the (free and not-so-free) tools available for Windows Azure Storage. Here are some:
Azure Storage Explorer
Azure Cloud Storage Studio
There are a lot of samples showing how to upload files to Azure Storage. Just search for it with your favorite search engine, or check out these resources:
http://msdn.microsoft.com/en-us/library/windowsazure/ee772820.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/04/11/using-windows-azure-page-blobs-and-how-to-efficiently-upload-and-download-page-blobs.aspx
http://wely-lau.net/2012/01/10/uploading-file-securely-to-windows-azure-blob-storage-with-shared-access-signature-via-rest-api/
The Windows Azure Training Kit has a great lab named "Exploring Windows Azure Storage".
Hope this helps!
UPDATE (following the question update)
The problem is that after creating the file on a certain VM, I give the PDF viewer the URL of the PDF file as "http://..../file.pdf". This will generate a new request that I cannot control, and I cannot know which VM will serve it, so even if I saved the file in the BLOB it will not solve my problem.
Try changing your logic a bit and follow my instructions. When your VM creates the PDF, upload the file to a blob. Then give the full blob URL of your PDF file to the PDF viewer. Thus the request will not go to any VM, but straight to the blob, and the full blob URL will be something like http://youraccount.blob.core.windows.net/public_files/file.pdf
Or am I missing something? From what I understand, your process flow is as follows:
User makes a special request which would cause PDF file generation
File is generated on the server
full URL to the file is sent back to the client so that a client PDF viewer could render it
If this is the flow, then with the suggested changes it will look like the following:
User makes a special request which causes PDF file generation
File is generated on the server
File is uploaded to blob storage
Full URL for the file in blob storage is returned to the client, so that it can be rendered on the client.
What is not clear? Or what is different in your process flow? I do exactly the same for on-the-fly report generation and it works quite well. The only difference is that my app is Silverlight based and I force a file download instead of displaying the report inline.
An alternative approach is not to persist the file at all.
Rather, generate it in memory, set the content type of the response to "application/pdf", and return the binary content of the report. This is particularly easy if you're using ASP.NET MVC, but you can use an HttpHandler instead. It is a technique I regularly use in similar circumstances (though lately with Excel reports rather than PDF).
The usefulness of this approach does depend on how you're generating the PDF, how big it is, and what the load on your application is.
But if the report is to be served just once, persisting it just so that another request can be made by the browser to retrieve it is wasteful (and you have to provide the persistence mechanism).
If the same file is to be served multiple times and it is resource-intensive to create, then it makes sense to persist it.
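As a rough sketch of that ASP.NET MVC variant: only the File(...) overload with a content type is the framework's own API; the controller, route and GeneratePdf helper below are hypothetical.

```csharp
using System.Web.Mvc;

public class ReportsController : Controller
{
    // GET /Reports/Download/42: streams the PDF straight back; nothing is written to disk or blob storage.
    public ActionResult Download(int id)
    {
        byte[] pdfBytes = GeneratePdf(id);                      // hypothetical in-memory report generator
        return File(pdfBytes, "application/pdf", "report.pdf"); // the content type tells the browser it is a PDF
    }

    private byte[] GeneratePdf(int id)
    {
        // Placeholder: build the report with your PDF library of choice and return its bytes.
        throw new System.NotImplementedException();
    }
}
```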
You want to save your PDF to centralized, persistent storage; a VM's hard drive is neither. Azure Blob Storage is likely the simplest and best solution. It is dirt cheap to store and access, and the API for storing and retrieving files is very simple.
There are two things you could consider.
Windows Azure Blob + Queue Storage
Blob Storage is a cost-effective way of storing binary data and sharing that information between instances. You would most likely use a worker role to create the report, which would store the report in Blob Storage and drop a "completed" message on the queue.
Your web role instance could monitor the queue looking for reports that are ready to be displayed.
It would be similar to the concept used in the Windows Azure Guest Book app.
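A minimal sketch of the producer side of that pattern, using the current Azure.Storage.Blobs and Azure.Storage.Queues clients (the container name, queue name and report file are assumptions):

```csharp
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;

class ReportWorker
{
    static void Main()
    {
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");

        // 1. Store the generated report in blob storage.
        var blob = new BlobClient(connectionString, "reports", "report-42.pdf");
        blob.Upload("report-42.pdf", overwrite: true);

        // 2. Drop a "completed" message on the queue; the web role polls this queue for finished reports.
        var queue = new QueueClient(connectionString, "completed-reports");
        queue.CreateIfNotExists();
        queue.SendMessage(blob.Uri.ToString());
    }
}
```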
Windows Azure Caching Service
Similarly (and much more expensively), you could share the binary using the Caching Service. This gives you a common layer between your VMs in which to store things; however, you won't be able to provide a URL to the PDF, so you'd have to download the binary and use either an HttpHandler or change the content type of the response.
This would be much harder to implement, very expensive to run, and is not guaranteed to work in your scenario. I'd still suggest blobs over any other means.
Another option would be to implement a sticky session handler of your own. Take a look at:
http://dunnry.com/blog/2010/10/14/StickyHTTPSessionRoutingInWindowsAzure.aspx
