I'm trying to create an app which uses the Windows Media Services REST API to upload video files and transcode them. Everything works great, but I have run into a situation in which I'm not able to upload a 160 MB video file, and I don't know why. It's extremely painful to debug the upload process in the usual way because uploading a 160 MB file takes ages, so I decided to ask my question here:
I know about the Azure Storage limitation on single file size (which is up to 64 MB). Is there such a limitation for uploading files to Windows Media Services as well? Do I need to send that file in 4 MB chunks?
If so, how can I actually do that using the REST API? I can send a chunked file to a regular storage account, but when it comes to WMS, things are a bit different. Basically, when dealing with WMS, I need to upload my file (or file blocks) to a specific temporary URL, and I'm not sure how to combine that with chunking, setting a block ID, etc. I also can't find any info about this on the internet.
Thanks in advance for any advice!
You didn't say which platform you are using to build your application (I'm guessing it's .NET?).
According to MSDN, a single file (blob) is not limited to 64 MB:
Each block can be a different size, up to a maximum of 4 MB. The maximum size for a block blob is 200 GB, and a block blob can include no more than 50,000 blocks. If you are writing a block blob that is no more than 64 MB in size, you can upload it in its entirety with a single write operation.
This means you can upload files (blobs) of up to 200 GB. If a file is smaller than 64 MB, you can upload it as one big chunk (block). If it's bigger than 64 MB, you will have to split it into smaller blocks (up to 4 MB each) and upload it that way.
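If you're calling the storage REST API directly, the pattern for large files is one Put Block request per chunk, followed by a Put Block List request that commits the blocks in order. Here is a minimal C# sketch of that pattern, assuming sasUrl is a writable SAS/locator URL that already points at the target blob and includes a query string (names, the block size and error handling are simplified):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class BlockUploader
{
    const int BlockSize = 4 * 1024 * 1024; // 4 MB per block

    public static async Task UploadAsync(string sasUrl, string filePath)
    {
        using var http = new HttpClient();
        var blockIds = new List<string>();

        using (var file = File.OpenRead(filePath))
        {
            var buffer = new byte[BlockSize];
            int read, index = 0;
            while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Block IDs must be base64 strings of equal (pre-encoding) length.
                string blockId = Convert.ToBase64String(
                    Encoding.UTF8.GetBytes(index.ToString("d6")));
                blockIds.Add(blockId);

                // Put Block: upload this chunk under its block ID.
                string putBlockUrl =
                    $"{sasUrl}&comp=block&blockid={Uri.EscapeDataString(blockId)}";
                var content = new ByteArrayContent(buffer, 0, read);
                (await http.PutAsync(putBlockUrl, content)).EnsureSuccessStatusCode();
                index++;
            }
        }

        // Put Block List: commit the uploaded blocks in order.
        var xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"utf-8\"?><BlockList>");
        foreach (var id in blockIds) xml.Append($"<Latest>{id}</Latest>");
        xml.Append("</BlockList>");

        var listContent = new StringContent(xml.ToString(), Encoding.UTF8, "text/xml");
        (await http.PutAsync($"{sasUrl}&comp=blocklist", listContent)).EnsureSuccessStatusCode();
    }
}
```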
Full disclosure: I wrote this blog post, which explains how to build an async upload page that can upload files in chunks. It uses the Azure REST API to upload all the file blocks and the Windows Azure Media Services SDK to communicate with the Media Service and create locators (the temporary URLs you mentioned), which are used to upload files.
There is quite a bit of code involved in making this work, so I created a simple demo app (written in JS and .NET) to go with the post. If you're not doing this in .NET, you will still be able to use the JS portion of the code; you'll just need to obtain the upload locators through the Azure REST API as well.
Very simplified, the upload/transcoding workflow goes something like this:
Obtain a temporary upload URL (locator) through the back-end of your application, to keep your Azure credentials secure (a sketch of this step follows the list)
Split the file into smaller chunks and upload them all to Azure Storage on behalf of WAMS (Windows Azure Media Services) through the REST API, keeping track of all block IDs
Submit an XML document containing all block IDs to the REST API
If needed, transcode the videos (I used the WAMS SDK in the back-end of the app to create the encoding jobs); each transcoded video will be a new, separate asset
Publish the assets: get locators (URLs) for accessing the original and/or transcoded videos
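For step 1, here is a rough back-end sketch based on the older WAMS .NET SDK (Microsoft.WindowsAzure.MediaServices.Client); the account credentials, names and policy duration are placeholders, and exact details may differ between SDK versions:

```csharp
using System;
using Microsoft.WindowsAzure.MediaServices.Client;

// Create an asset and a write (SAS) locator the client can upload blocks to.
// The account name/key constructor is from the older SDK versions.
var context = new CloudMediaContext("mediaAccountName", "mediaAccountKey");

IAsset asset = context.Assets.Create("my-video", AssetCreationOptions.None);
IAccessPolicy writePolicy = context.AccessPolicies.Create(
    "upload-policy", TimeSpan.FromHours(2), AccessPermissions.Write);
ILocator uploadLocator = context.Locators.CreateSasLocator(asset, writePolicy);

// The client typically inserts the file name into the path portion of this
// URL (before the SAS query string) and uploads its blocks there.
string uploadUrl = uploadLocator.Path;
```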
I also recommend you read Gaurav Mantri's post about uploading large files to Azure storage. It explains a lot and is a very good read on this topic.
Related
I have created a meeting app with Angular as the front end and ASP.NET Core as the back end. The application is working fine, but sometimes when I try to download the recorded file using the servercallid, I get a Service Unavailable error. Most of the time the file is generated and stored to blob storage, but I do not know why the error occurs sometimes. We did not get any proper exception or reason in the response.
Do we have any specific size or timeframe constraint for generating the recorded file?
As per the documentation, recordings are only available for download for a limited amount of time:
Azure Communication Services provides short term media storage for recordings. Recordings will be available to download for 48 hours. After 48 hours, recordings will no longer be available.
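As for the intermittent Service Unavailable responses: since they appear to be transient, it may be worth retrying the download a few times before treating it as a failure. A hypothetical sketch (the recording URL and the retry policy are assumptions, not part of the ACS API):

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: retry a recording download a few times when the
// service responds with 503 Service Unavailable (transient failure).
static async Task<byte[]> DownloadWithRetryAsync(HttpClient http, string recordingUrl)
{
    for (int attempt = 1; ; attempt++)
    {
        var response = await http.GetAsync(recordingUrl);
        if (response.IsSuccessStatusCode)
            return await response.Content.ReadAsByteArrayAsync();

        if (response.StatusCode == HttpStatusCode.ServiceUnavailable && attempt < 4)
        {
            // Back off before retrying (1 s, 2 s, 4 s).
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
            continue;
        }

        response.EnsureSuccessStatusCode(); // surface the real error
    }
}
```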
There is a JavaScript library where you can get the file upload progress. However, this only shows the progress of the upload to the API: when that progress reaches 100%, the API method is entered. I am using Azure Storage to store the uploaded files, and it takes about the same time again for the method to upload the stream to the storage container. Is there a good way to relay that progress back? I had a look at SignalR and I don't think it's what I'm looking for. Does anyone have any ideas on how to relay this information?
I do use background jobs a lot too, and was considering turning the stream into a byte array and then sending it as a job, but this worries me for large files.
Any pointers would be greatly appreciated
EDIT: The front end is separate from the API.
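For what it's worth, SignalR can still be a reasonable fit here even with a separate front end: the API can push its own storage-upload progress to the caller while it streams the file to blob storage. A rough sketch, assuming the Azure.Storage.Blobs SDK and a hypothetical ProgressHub whose connection ID the client passes along (all names are placeholders, not an established solution):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.SignalR;

public class ProgressHub : Hub { } // clients listen for "uploadProgress" messages

[ApiController]
[Route("api/upload")]
public class UploadController : ControllerBase
{
    private readonly BlobContainerClient _container;
    private readonly IHubContext<ProgressHub> _hub;

    public UploadController(BlobContainerClient container, IHubContext<ProgressHub> hub)
    {
        _container = container;
        _hub = hub;
    }

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile file, [FromQuery] string connectionId)
    {
        var blob = _container.GetBlobClient(file.FileName);
        long total = file.Length;

        var options = new BlobUploadOptions
        {
            // Called by the SDK as bytes reach the storage account; push the
            // percentage to the caller's SignalR connection (fire-and-forget).
            ProgressHandler = new Progress<long>(sent =>
                _hub.Clients.Client(connectionId)
                    .SendAsync("uploadProgress", (int)(sent * 100 / total)))
        };

        using var stream = file.OpenReadStream();
        await blob.UploadAsync(stream, options);
        return Ok();
    }
}
```

The client would connect to the hub, send its connection ID along with the upload request, and listen for the "uploadProgress" messages while the API streams the file on to storage.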
I have a scenario I think could be a fit for Service Fabric. I'm working on-premises.
I read messages from a queue.
Each message contains details of a file that needs to be downloaded.
Files are downloaded and added to a zip archive.
There are 40,000 messages so that's 40,000 downloads and files.
I will add these to 40 zip archives, so that's 1000 files per archive.
Would Service Fabric be a good fit for this workload?
I plan to create a service that takes a message off the queue, downloads the file and saves it somewhere.
I'd then scale that service to have 100 instances.
Once all the files have been downloaded, I'd kick off a different process to add the files to a zip archive.
Bonus points if you can tell me a way to incorporate adding the files to a zip archive as part of the service.
Would Service Fabric be a good fit for this workload?
If the usage will be just downloading and compressing files, I think it would be overkill to set up a cluster and manage it just to sustain a very simple application. There are many alternatives where you don't have to set up an environment just to keep your application running and processing messages from a queue.
I'd then scale that service to have 100 instances.
The number of instances does not mean the download will be faster; you also have to consider the network limit, otherwise you will just end up with servers with idle CPU and memory where the network is the bottleneck.
I plan to create a service that takes a message off the queue, downloads the file and saves it somewhere.
If you want to stick with Service Fabric and the queue approach, I would suggest this answer I gave a while ago:
Simulate 10,000 Azure IoT Hub Device connections from Azure Service Fabric cluster
The information is not exactly what you plan to do, but it might give you direction based on the scale you have and how to process a large number of messages from a queue (IoT Hub messaging is very similar to Service Bus).
For the other questions, I would suggest posting them as a separate topic.
I agree with Diego: using Service Fabric for this would be overkill, and it wouldn't be the best utilization of resources. Moreover, this seems to be more of a disk-intensive problem where you need a lot of storage to download those files and then compress them into a zip. One idea is to use Azure Functions, as the computation seems minimal in your case: download the file to an Azure file share and then upload it to whatever storage you want (blob storage, for example). This way you won't be using many resources, and you can scale the function and the Azure file share according to your needs.
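As a rough illustration of that idea, a queue-triggered function (in-process model) could stream each file straight into blob storage; the queue name, container and binding path below are assumptions:

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical queue-triggered function: each queue message carries a file URL;
// the function downloads it and writes it straight into blob storage.
public static class DownloadFunction
{
    private static readonly HttpClient Http = new HttpClient();

    [FunctionName("DownloadFile")]
    public static async Task Run(
        [QueueTrigger("downloads")] string fileUrl,
        [Blob("archive-input/{rand-guid}", FileAccess.Write)] Stream destination,
        ILogger log)
    {
        log.LogInformation("Downloading {url}", fileUrl);
        using var source = await Http.GetStreamAsync(fileUrl);
        await source.CopyToAsync(destination); // stream directly into the blob
    }
}
```

Building the 40 zip archives could then be a separate, later step (for example another function or a small worker using System.IO.Compression) that runs once all 40,000 messages have been processed.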
I know there are two methods available to upload files to AWS S3 (i.e. PutObject and TransferUtility.Upload). Can someone please explain which one to use?
FYI, I have files ranging from 1kb to 250MB.
Thanks in advance.
Amazon deprecated the S3 Transfer Manager and migrated to the new Transfer Utility. The Transfer Utility is a simple interface for handling the most common uses of S3. It has a single constructor, which requires an instance of AmazonS3Client. Working with it is easy and lets developers perform all operations with less code.
The following are the key advantages of the Transfer Utility over the Transfer Manager:
When uploading large files, the TransferUtility uses multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can increase throughput significantly. The TransferUtility detects whether a file is large and switches into multipart upload mode. A multipart upload gives the benefit of better performance, as the parts can be uploaded simultaneously, and if there is an error, only the individual part has to be retried.
We often upload large files to S3 that take a long time to upload; in those situations we need progress information such as the total number of bytes transferred and the amount of data remaining. To track the current progress of a transfer with the Transfer Manager, developers pass an S3ProgressListener callback to upload or download, which fires periodically as the transfer progresses.
Pausing transfers using the Transfer Manager is not possible with stream-based uploads or downloads. The Transfer Utility, however, provides pause and resume options, and it also has single-file-based methods for uploads and downloads:
transferUtility.upload(MY_BUCKET, OBJECT_KEY, FILE_TO_UPLOAD)
transferUtility.download(MY_BUCKET, OBJECT_KEY, FILE_TO_DOWNLOAD_TO)
The Transfer Manager only requires the INTERNET permission. The Transfer Utility, however, automatically detects the network state and pauses/resumes transfers based on it. Adding pause functionality with the Transfer Utility is easy, since all transfers can be paused and resumed. If a transfer is paused because of a loss of network connectivity, it will automatically be resumed, and there is no action you need to take. Transfers that are automatically paused and waiting for network connectivity are given a corresponding state. Additionally, the Transfer Utility stores all of the transfer metadata in a local SQLite database, so developers do not need to persist anything.
Note: everything else is good, but the Transfer Utility does not support a copy() API. To copy an object, use the AmazonS3Client class's copyObject() method.
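Since the question refers to the .NET SDK, the rough equivalent of copyObject() there is CopyObjectAsync on AmazonS3Client; a minimal sketch with placeholder bucket and key names:

```csharp
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Copy an object directly on the client (no TransferUtility involved).
var s3 = new AmazonS3Client();
await s3.CopyObjectAsync(new CopyObjectRequest
{
    SourceBucket = "source-bucket",
    SourceKey = "videos/original.mp4",
    DestinationBucket = "destination-bucket",
    DestinationKey = "videos/copy.mp4"
});
```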
Based on the Amazon docs, I would stick with TransferUtility.Upload:
Provides a high level utility for managing transfers to and from Amazon S3.
TransferUtility provides a simple API for uploading content to and downloading content from Amazon S3. It makes extensive use of Amazon S3 multipart uploads to achieve enhanced throughput, performance, and reliability.
When uploading large files by specifying file paths instead of a stream, TransferUtility uses multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can increase throughput significantly.
But please be aware of possible concurrency issues and the recommendation to use BeginUpload (the asynchronous version), as discussed in this related post.
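For files in the 1 KB to 250 MB range mentioned in the question, a minimal .NET sketch of the TransferUtility path might look like this (bucket, key, file path and part size are placeholders; the asynchronous UploadAsync overloads are shown, which newer SDK versions expose alongside the older Begin/EndUpload pattern):

```csharp
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

// TransferUtility decides internally whether to use a single PutObject call
// or a multipart upload, based on the file size.
var s3 = new AmazonS3Client();
var transfer = new TransferUtility(s3);

// Simple path-based upload (multipart kicks in automatically for large files).
await transfer.UploadAsync(@"C:\videos\input.mp4", "my-bucket", "videos/input.mp4");

// Or with explicit settings via a request object.
await transfer.UploadAsync(new TransferUtilityUploadRequest
{
    FilePath = @"C:\videos\input.mp4",
    BucketName = "my-bucket",
    Key = "videos/input.mp4",
    PartSize = 16 * 1024 * 1024 // optional: 16 MB parts
});
```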
With speed and cost in mind:
Say I have a few JS and image files shared across multiple websites. These are not huge image files, just a few static files like PNG sprites and common JS files.
I'm kind of lost on the choice:
- Should I keep them in my web package released to Azure?
- Or should I put them in blobs?
The thing I don't know is: if I get a lot of hits on the blob solution, might it cost more than serving the same hits from IIS as part of the package?
Right or wrong?
Edit: I realize that storing JS files in blob storage won't deliver them gzipped?
No need for the blobs that I can see. The database round trip isn't adding value. I'd just put the static content on the web server and let it serve it up. Let the web server handle compressing the bytes on the wire for those cases where the client indicates that they can handle GZIP compression.
Will your JS and image files be modified often? If so, putting them into the service package would mean that every time you want to update those files, you will have to recompile the service package and redeploy your instance. If you find yourself needing to update often, this will become cumbersome. From a speed perspective, you're not going to see much of a difference between serving the files from blobs and serving them from the web role (assuming the files are in fact not huge). Last but not least, from a cost perspective, if you look at the price of blob storage ($0.15 per GB stored per month, $0.01 per 10,000 storage transactions), it's really not much; at those rates, for example, ten million blob requests a month would add only about $10 in transaction charges. Your site would have to have a lot of traffic for the cost to become significant.