I am evaluating Windows Azure for my photo/video management software, which will provide:
1) Uploading of photos/videos, tagging, album creation, etc.
2) Live streaming of content from the server
3) Download of content from the server
CDN and AppFabric Cache will definitely be helpful here. Can anyone let me know if there are built-in components, off-the-shelf components, or Azure-specific design patterns that can speed up development? For example, anything that helps with fast streaming of content would definitely be useful.
Thanks.
As you've noted, the CDN and caching will definitely help you. However, I would mostly look at the CDN. I would use caching for relatively small chunks of data (such as database-driven lists of details, e.g. lists of cities or countries) or for slowly-changing data. I would not put large media content in the AppFabric Cache.
As for leveraging blobs/CDN for streaming, you may want to check this example.
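To make the blob + CDN part concrete, here is a minimal sketch of uploading media to Blob storage and building the URL a CDN endpoint would serve. It assumes the current @azure/storage-blob JavaScript package rather than the SDKs of that era, and the account, container, and CDN hostnames are placeholders:

```typescript
// Minimal sketch: upload a photo/video to Blob storage and build a CDN-style URL.
// Container name and CDN hostname below are placeholder assumptions.
import { BlobServiceClient } from "@azure/storage-blob";

const blobService = BlobServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING!
);

async function uploadMedia(localPath: string, blobName: string): Promise<string> {
  // Create the container with public read access for blobs so the CDN (or an
  // <img>/<video> tag) can fetch them directly without going through the web app.
  const container = blobService.getContainerClient("media");
  await container.createIfNotExists({ access: "blob" });

  const blockBlob = container.getBlockBlobClient(blobName);
  await blockBlob.uploadFile(localPath); // streams the file up in blocks

  // Origin URL would be https://<account>.blob.core.windows.net/media/<blobName>;
  // with a CDN endpoint mapped to the storage account you swap the host:
  return `https://mycdn.azureedge.net/media/${blobName}`;
}

uploadMedia("./holiday.mp4", "videos/holiday.mp4").then(url => console.log(url));
```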
UPDATE
Will you have any photo/video manipulation on the server side, or is whatever people (users or admins) upload served to users as-is?
If not, then there is nothing additional you need.
If, however, you will have some image/video processing on the server side, I suggest splitting your app into a Web Role (for user uploads/downloads/streaming) and a Worker Role (for the processing). You can check out this lab to get an understanding of how to decouple the web role from the worker role and how to submit work items to the worker.
The reason to have a separate worker role for processing is to be able to independently scale either the web or the worker on demand.
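As a rough illustration of that pattern (not the lab's exact code, which targets the .NET role SDKs), the web-facing side can drop a work item on an Azure Storage queue and the worker can poll it. The queue name and message shape here are assumptions, and the sketch uses the current @azure/storage-queue package:

```typescript
// Sketch of decoupling web from worker with an Azure Storage queue.
// Queue name and work-item shape are illustrative assumptions.
import { QueueServiceClient } from "@azure/storage-queue";

const queueService = QueueServiceClient.fromConnectionString(
  process.env.AZURE_STORAGE_CONNECTION_STRING!
);
const queue = queueService.getQueueClient("media-processing");

// Web role: enqueue a work item once the upload has completed.
export async function enqueueProcessing(blobName: string): Promise<void> {
  await queue.createIfNotExists();
  const workItem = JSON.stringify({ blobName, operation: "make-thumbnail" });
  await queue.sendMessage(Buffer.from(workItem).toString("base64"));
}

// Worker role: poll the queue, process the item, then delete the message.
export async function pollOnce(): Promise<void> {
  const { receivedMessageItems } = await queue.receiveMessages({ numberOfMessages: 1 });
  for (const msg of receivedMessageItems) {
    const workItem = JSON.parse(Buffer.from(msg.messageText, "base64").toString());
    // ... resize/transcode the blob referenced by workItem.blobName ...
    await queue.deleteMessage(msg.messageId, msg.popReceipt);
  }
}
```

Either side can then be scaled out on demand without touching the other, which is the point of the split.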
Context
I have a mobile app that lets our users automatically capture the name plate of our products. For this I use the Azure Cognitive Services OCR service.
I am a bit worried that customers might capture pictures of insufficient quality or of the wrong area of the product (where there is no name plate). To analyse whether this is the case, it would be handy to have a copy of the captured pictures so we can learn what went well or what went wrong.
Question
Is it possible to not only process an uploaded picture but also store it in Azure Storage, so that I can analyse it at a later point in time?
What I've tried so far
I configured the Diagnostic settings so that logs and metrics are stored in Azure Storage. As the name suggests, this covers only logs and metrics, not the actual image, so it does not solve my issue.
Remarks
I know that I could implement this manually in the app, but I think it would be better if I had to upload the picture only once.
I'm aware that there are data protection considerations that must be made.
No, you can't enable automatic storage of the uploaded image based only on the OCR operation; you have to implement it yourself.
But to avoid uploading the picture twice, as you said, you could put that logic on the server side: send the image to your own API, and in the API forward it to the OCR service while storing it in parallel.
Based on your question, though, I guess you might not have any server-side component in your app?
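If you do end up adding a server-side piece, here is a rough sketch of that "store and OCR in parallel" idea. The Express route, container name, and the VISION_ENDPOINT/VISION_KEY variables are assumptions on my part, not something from your setup:

```typescript
// Sketch: one upload from the app; the API archives the image and runs OCR in parallel.
// Route, container name, and environment variables are illustrative assumptions.
import express from "express";
import { BlobServiceClient } from "@azure/storage-blob";

const app = express();
app.use(express.raw({ type: "image/*", limit: "10mb" })); // req.body is the raw image Buffer

const containerClient = BlobServiceClient
  .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
  .getContainerClient("captured-nameplates");

app.post("/nameplate", async (req, res) => {
  const image: Buffer = req.body;
  const blobName = `${Date.now()}.jpg`;

  // Kick off both operations at the same time: archive the picture and OCR it.
  const store = containerClient.getBlockBlobClient(blobName).uploadData(image);
  const ocr = fetch(`${process.env.VISION_ENDPOINT}/vision/v3.2/ocr`, {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": process.env.VISION_KEY!,
      "Content-Type": "application/octet-stream",
    },
    body: image,
  }).then(r => r.json());

  const [, ocrResult] = await Promise.all([store, ocr]);
  res.json({ blobName, ocrResult });
});

app.listen(3000);
```

The app then uploads once to this endpoint instead of calling the OCR service directly.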
Background
We are looking at porting a 'monolithic' 3-tier web app to a microservices architecture. The web app displays listings to a consumer (think Craigslist).
The backend consists of a REST API that calls into a SQL DB and returns JSON for a SPA app to build a UI (there's also a mobile app). Data is written to the SQL DB via background services (FTP + worker roles). There are also some pages that allow writes by the user.
Information required:
I'm trying to figure out how (if at all) Azure Service Fabric would be a good fit for a microservices architecture in my scenario. I know the pros/cons of microservices vs. a monolith, but I'm trying to figure out how the various microservice programming models apply to our current architecture.
Questions
Is Azure Service Fabric a good fit for this? If not, are there other recommendations? Currently I'm leaning towards a bunch of OWIN-based .NET web sites, split up by area/service, each hosted on its own machine and tied together by an API gateway.
Which Service Fabric programming model would I go for? Stateless services with their own backing DB? I can't see how the Stateful or Actor model would help here.
If I went with Stateful services/Actors, how would I go about updating data as part of a maintenance/ad-hoc admin request? Traditionally we would simply log in to the DB and update the data, and the API would return the new data - but if it's persisted in-memory/across nodes in a cluster, how would we update it? Would I have to expose all of this via methods on the service? Similarly, how would I import my existing SQL data into a stateful service?
For Stateful services/the Actor model, how can I 'see' the data visually, with an object explorer/UI? Our data is our gold, and I'm concerned about the lack of control/visibility of it in the Reliable Services models.
Basically, is there some documentation on the decision path towards which programming model to go for? I could model a "listing" as an Actor and have millions of those - sure, but I could also have a Stateful service that stores the listing locally, and I could also have a Stateless service that fetches it from the DB. How does one decide which is the best approach for a given use case?
Thanks.
What is it about your current setup that isn't meeting your requirements? What do you hope to gain from a more complex architecture?
Microservices aren't a magic bullet. You mainly get four benefits:
You can scale and distribute pieces of your overall system independently. Service Fabric has very sophisticated tools and advanced capabilities for this.
You can deploy and upgrade pieces of your overall system independently. Service Fabric again has advanced capabilities for this.
You can have a polyglot system - each service can be written in a different language/platform.
You can use conflicting dependencies - each service can have its own set of dependencies, like different framework versions.
All of this comes at a cost and introduces complexity and new ways your system can fail. For example, your fast, compile-time-checked in-proc method calls now become slow (by comparison to an in-proc function call), failure-prone network calls. And this is not specific to Service Fabric, by the way; it's just what happens when you go from in-proc method calls to cross-machine I/O, regardless of the platform you use. The decision path here is a pro/con list specific to your application and your requirements.
To answer your Service Fabric questions specifically:
Which programming model do you go for? Start with stateless services with ASP.NET Core. It's going to be the simplest translation of your current architecture that doesn't require mucking around with your data layer.
Stateful has a lot of great uses, but it's not necessarily a replacement for your RDBMS. A good place to start is hot data that can be stored in simple key-value pairs, is accessed frequently, needs to be low-latency (you get local reads!), and doesn't need to be data-mined. Some examples include user session state, cache data, and a "snapshot" of the most recent items in a data stream (like the most recent stock quote in a stream of stock quotes).
Currently the only way to see or query your data is programmatically, directly against the Reliable Collection APIs. There is no viewer or "management studio" tool. You have to write (and secure) an API in each service that can display and query data.
Finally, the actor model is a very niche model. It serves specific purposes but if you just treat it as a data store it will not work for you. Like in your example, a listing per actor probably wouldn't work because you can't query across that list, or even have multiple users reading the same listing simultaneously.
I would like to gather some metrics about usage for an Electron-based cross-platform desktop app. This would consist of basic information on the user's environment (OS, screen size, etc) as well as the ability to track usage, for example track how many times the app is opened or specific actions within the app.
These metrics should be sent to an analytics server, so they can be viewed in aggregate. Ideally I could host the server-side component myself, but would certainly consider a solution hosted by a third party.
There are various analytics solutions for the web (Google Analytics, Piwik), and for mobile apps, as well as solutions for Node.js server-side apps. Is it feasible to adapt one of these solutions for desktop Electron-based apps? How? Or are there any good analytics solutions specifically designed for use with desktop apps which work with Electron / javascript?
Unlike a typical webpage, the user might be using the app offline, so offline actions should be recorded, queued, and sent later when the user comes online. A desktop app is typically loading pages from the file system, not HTTP, so the solution needs to be able to cope with that.
Unlike a Node.js server-side application, there could be a large number of clients rather than just a single (or a few) server instances. Analytics for a desktop app would be user-centric, whereas a server-side Node.js app might not be.
Ease of setup is also a big factor - an ideal solution would just have a few lines of configuration to gather basic metrics, then could be extended as necessary with custom actions/events.
The easiest thing will be to use Google Analytics or a similar offering.
With most of them, you'll have two major issues to solve compared with hosting on a website:
Electron does not store cookies or state between runs; you have to store this yourself.
Most analytics libraries ignore file: URLs, so they only count hits coming from the internet.
Use an existing library and most of these issues will already be solved for you.
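If you do roll it yourself, here is a minimal sketch of working around both issues by hand: a client id persisted to disk and a simple offline queue, with events posted to Google Analytics' (Universal Analytics) Measurement Protocol over HTTPS rather than from a file: page. The property id, file names, and event fields are placeholder assumptions, and it assumes the Electron main process after the app is ready:

```typescript
// Sketch: manual client id + offline queue for analytics events in an Electron app.
// GA property id, file paths, and event fields are placeholder assumptions.
import { app, net } from "electron";
import * as fs from "fs";
import * as path from "path";
import { randomUUID } from "crypto";

const idFile = path.join(app.getPath("userData"), "client-id");
const queueFile = path.join(app.getPath("userData"), "event-queue.json");

// Electron doesn't persist analytics cookies for us, so keep a client id ourselves.
function clientId(): string {
  if (!fs.existsSync(idFile)) fs.writeFileSync(idFile, randomUUID());
  return fs.readFileSync(idFile, "utf8");
}

function loadQueue(): string[] {
  return fs.existsSync(queueFile) ? JSON.parse(fs.readFileSync(queueFile, "utf8")) : [];
}

export function trackEvent(category: string, action: string): void {
  // Measurement Protocol hit, sent over HTTPS, so file:// pages are not an issue.
  const hit = new URLSearchParams({
    v: "1", tid: "UA-XXXXXXX-1", cid: clientId(), t: "event", ec: category, ea: action,
  }).toString();
  const queue = loadQueue();
  queue.push(hit);
  fs.writeFileSync(queueFile, JSON.stringify(queue));
  flush();
}

export function flush(): void {
  const queue = loadQueue();
  if (queue.length === 0) return;
  fs.writeFileSync(queueFile, JSON.stringify([]));
  for (const hit of queue) {
    const request = net.request({ method: "POST", url: "https://www.google-analytics.com/collect" });
    request.on("error", () => {
      // Offline or unreachable: put the hit back so it is retried on the next flush.
      const pending = loadQueue();
      pending.push(hit);
      fs.writeFileSync(queueFile, JSON.stringify(pending));
    });
    request.write(hit);
    request.end();
  }
}
```

An existing library wraps all of this up for you, which is why it remains the easier route.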
I'm currently in the planning stages of a new project which is composed of a storefront, a highly reactive user dashboard, and the individual products being offered via the storefront being highly interactive mini-apps. We're trying to get away with making the entire platform a SPA and design the entire thing on a Flux architecture with React for the front-end views.
One issue, as with most SPAs, is SEO. I've prototyped an isomorphic solution based on the este.js dev stack. One issue is that our app consumes pretty much all of its data from a RESTful server, which is separate from the web server serving up the SPA. This means that the web server would need to fetch a considerable amount of data from the RESTful server, to isomorphically generate the HTML snapshot.
I've considered having a separate crawler process of my own crawl the entire storefront periodically and isomorphically generate HTML snapshots of the pages that could be served up when the web server encounters a search engine crawler. I'm not sure if this is a good approach though, as it would likely introduce additional maintenance and, frankly, seems a bit fragile. I could just have the web server isomorphically generate the HTML on the fly, but I fear bogging the server down for ordinary users as the server would be pulling considerable data from the REST API...
Is there a better way to handle such a case?
Check out Yahoo's Fetchr, an open-source library that allows you to hit your API isomorphically. It ties into Facebook's Flux architecture, so you need to have Stores, but at the very least you can glean some code and concepts from it. If you're in the planning phase, you might even consider going with Flux or Fluxible.
http://fluxible.io/guides/data-services.html
https://github.com/yahoo/fetchr
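If you'd rather see the underlying idea without committing to fetchr's API, here is a hand-rolled sketch of the same pattern: one data-access function whose transport differs between the server render and the browser. The internal URL, route, and Listing shape are placeholder assumptions, and it assumes a fetch-capable Node runtime:

```typescript
// Hand-rolled sketch of the pattern fetchr implements: one data-access function
// whose transport differs between server render and browser. URLs are placeholders.
type Listing = { id: string; title: string; price: number };

const onServer = typeof window === "undefined";

// On the server, talk straight to the internal REST service (inside your network).
// In the browser, go through the public API path exposed by the web server.
const baseUrl = onServer
  ? process.env.INTERNAL_API_URL ?? "http://rest-internal:8080"
  : "/api";

export async function fetchListing(id: string): Promise<Listing> {
  const res = await fetch(`${baseUrl}/listings/${id}`);
  if (!res.ok) throw new Error(`listing ${id}: HTTP ${res.status}`);
  return res.json();
}

// A React component or Flux action creator calls fetchListing() the same way in
// both environments, so the isomorphic render path stays identical.
```

fetchr packages this same idea as services registered on the server plus an XHR endpoint for the client, so you don't have to maintain the branch yourself.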
I am designing a system that's going to have about 10 million+ users, each with a photo of about 1-2 MB.
We are going to deploy both the database and the web app on Microsoft Azure.
I am wondering how I should store the photos. There are currently two options:
1) Store all photos using SQL Server FILESTREAM
2) Use a file server
I haven't experienced such large-scale BLOB data using FILESTREAM.
Can anybody give me any suggestions? The pros and cons?
And input from anyone with Microsoft Azure experience concerning large photo storage would be really appreciated!
Thx
Ryan.
I vote for neither. Use Windows Azure Blob storage. Simple REST API, $0.15/GB/month. You can even serve the images directly from there, if you make them public (like <img src="http://myaccount.blob.core.windows.net/container/image.jpg" />), meaning you don't have to funnel them through your web app.
A database is almost always a horrible choice for any large-scale binary storage need. Databases are best for relational data; instead, store references in your database to the actual storage location. There are a few factors you should consider:
Cost - SQL Azure costs quite a lot per GB of storage and has small storage limits (50 GB per database), both of which make it a poor choice for binary data. Windows Azure Blob storage is vastly cheaper for serving up binary objects (it has a slightly more complicated pricing system, but is still vastly cheaper per GB).
Throughput - SQL Azure has pretty good throughput, as it can scale well; however, Windows Azure Blob storage has even greater throughput, as it can scale to any number of nodes.
Content Delivery Network - A feature not available to SQL Azure (though a complex, custom wrapper could be created), but one that can easily be set up within minutes to piggyback off your Windows Azure Blob storage and provide effectively limitless bandwidth to your end users, so you never have to worry about your binary objects being a bottleneck in your system. CDN costs are similar to those of Blob storage; you can find all that pricing here: http://www.microsoft.com/windowsazure/pricing/#windows
In other words, no reason not to go with Blob storage. It is simple to use, cost effective, and will scale to any needs.
I can't speak to anything Azure-related, but for my money the biggest advantage of using FILESTREAM is that the data gets backed up as part of the normal SQL Server backup process. The size of the data you are talking about also suggests that FILESTREAM may be a good choice.
I've worked on an SCM system with an RDBMS back end, and one of our big decisions was whether to store the file deltas on the file system or inside the DB itself. Because it was cross-RDBMS we had to cook up a generic, non-FILESTREAM way of doing it, but the ability to do a single-shot backup sold us.
FILESTREAM is a horrible option for storing images. I'm surprised MS ever promoted it.
We're currently using it for the images on our website, mainly the user-generated images and any CMS-related content that admins create. The decision to use FILESTREAM was made before I started. The biggest issue is serving the images up: you'd better have a CDN sitting in front, and if not, plan on your system coming to a screeching halt. Of course, most sites have a CDN, but you don't want to be at the mercy of that service going down, which would mean your system gets overloaded. The amount of stress put on your SQL Server is the main problem here.
In terms of ease of backup, the trade-off is that your DB is much, much larger and, therefore, the backup takes longer - potentially much longer - and the system runs slower during the backup. Not to mention that moving backups around takes longer (e.g., restoring prod data in a dev environment or on local machines for dev purposes). Don't use this as a deciding factor.
Most cloud services have automatic redundancy for any files you store on their system (e.g., AWS's S3 and Azure's Blob storage). If you're on-premises, just make sure you use a shared location for the images and make sure that location is backed up. I think the best option is to set it up so that each image (and other UGC file types too) has an entry in your DB with a path to that file. Going one step further, separate the root path into a config setting and only store the remaining path with the entry. For example, the root path in config might be a base URL, a shared drive or virtual dir, or a blank entry; then your entry might hold "/files/images/image.jpg". This way, if you move your file store, you can just update the root config. I would also suggest creating a FileStoreProvider interface (Singleton) that can be used for managing (saving, deleting, updating) these files. This way, if you switch between AWS, Azure, or on-premises, you can just create a new provider.
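To make that provider idea concrete, here is a minimal sketch (the interface and class names are mine, not an existing API) with a Blob-backed implementation behind it and the root URL coming from config:

```typescript
// Sketch of the FileStoreProvider idea: the DB row keeps only the relative path,
// the configured root + provider decide where the bytes actually live.
// Interface and class names here are illustrative, not an existing API.
import { BlobServiceClient, ContainerClient } from "@azure/storage-blob";

export interface FileStoreProvider {
  save(relativePath: string, content: Buffer): Promise<void>;
  delete(relativePath: string): Promise<void>;
  publicUrl(relativePath: string): string; // root path (config) + relative path (DB row)
}

export class AzureBlobFileStore implements FileStoreProvider {
  private container: ContainerClient;

  constructor(connectionString: string, containerName: string, private rootUrl: string) {
    this.container = BlobServiceClient
      .fromConnectionString(connectionString)
      .getContainerClient(containerName);
  }

  async save(relativePath: string, content: Buffer): Promise<void> {
    await this.container.getBlockBlobClient(relativePath).uploadData(content);
  }

  async delete(relativePath: string): Promise<void> {
    await this.container.getBlockBlobClient(relativePath).deleteIfExists();
  }

  publicUrl(relativePath: string): string {
    // e.g. rootUrl = "https://mycdn.azureedge.net/files" from config, path from the DB row
    return `${this.rootUrl}/${relativePath}`;
  }
}

// Swapping to S3 or an on-premises share means writing another class, not touching callers.
```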
I have a client-server DB; I manage many files (doc, txt, pdf, ...) and all of them go into a FILESTREAM BLOB. Customers have 50+ MB DBs. If you can do the same in Azure, go for it. Having everything in the DB is a wonderful thing. It is considered good policy for Postgres and MySQL as well.