I'm trying to find a good way to send objects between the worker roles of two different Azure applications. They are very simple objects (only about 20 properties on each one), and there could be as many as 10,000 of these objects transferred at a time. I'm thinking Azure Table Storage would do the job, but I'm not sure if there's anything else out there that would do a better job. I thought about serializing them and using blob storage, but I'd like to know what the correct approach would be.
I think Azure Table Storage is an excellent system for that. If you need to manage which worker role processes individual items, you might consider using a Queue to communicate between the worker roles. Microsoft Patterns and Practices created the "CQRS Journey" set of documentation to formalize how you can manage the flow of data through your Azure services with multiple roles.
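To make that a bit more concrete, here is a minimal sketch of the Table Storage plus Queue approach using the classic Microsoft.WindowsAzure.Storage SDK. The entity type, property names, and the table/queue names are purely illustrative assumptions, not something from the question.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity standing in for one of the ~20-property objects.
public class TransferItem : TableEntity
{
    public TransferItem() { }
    public TransferItem(string batchId, string itemId)
    {
        PartitionKey = batchId;   // one partition per batch of transferred objects
        RowKey = itemId;
    }
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class TransferExample
{
    public static void SendBatch(string connectionString, string batchId)
    {
        var account = CloudStorageAccount.Parse(connectionString);

        // 1. Write the items into a shared table (entity group batches are limited
        //    to 100 entities and must share a partition key).
        var table = account.CreateCloudTableClient().GetTableReference("transferitems");
        table.CreateIfNotExists();

        var batch = new TableBatchOperation();
        for (int i = 0; i < 100; i++)
        {
            batch.InsertOrReplace(new TransferItem(batchId, i.ToString()) { Name = "item" + i, Quantity = i });
        }
        table.ExecuteBatch(batch);

        // 2. Tell the other application's worker role which partition is ready to read.
        var queue = account.CreateCloudQueueClient().GetQueueReference("transferbatches");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(batchId));
    }
}
```

The receiving worker role in the other application would poll the queue, read all entities in the named partition, and delete the queue message once it has finished processing.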
Can you guys explain something? A Service Fabric application can be packaged with multiple services and shipped as one unit, but then:
How do you reuse some of these services in another application?
Is there a way a Reliable Dictionary or Reliable Queue can be shared among services deployed on the same cluster?
I tried reading up on Google but couldn't get a clear understanding. Your help would be really appreciated.
... how do you reuse some of these services in another application?
What do you mean by reuse? Sharing the code? You could have a service in Application A talk to a service in Application B instead of deploying a copy of the same service in Application A.
Is there a way a Reliable Dictionary or Reliable Queue can be shared among services deployed on the same cluster?
No, there is not. A Reliable Dictionary or Reliable Queue provides data locality to a service, removing the need for additional network calls. As soon as you need the same data in multiple services, you should consider other storage solutions such as CosmosDB, Blob storage, or another database.
If you are looking for some kind of distributed cache you can take a look at Azure Redis.
It is, however, entirely possible to expose the data of a Reliable Dictionary or Reliable Queue through a service. That service then acts as a data provider / repository. You can expose methods like Add() or Delete() on such a service that result in an update of the Reliable Dictionary or Reliable Queue.
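As a rough illustration of that pattern, here is a minimal sketch of a stateful service that wraps a Reliable Dictionary behind a remoting interface. The interface name, service name, and dictionary name are hypothetical, and error handling and configuration are omitted.

```csharp
using System.Collections.Generic;
using System.Fabric;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Remoting;
using Microsoft.ServiceFabric.Services.Remoting.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical remoting contract that other services can call.
public interface IKeyValueStore : IService
{
    Task AddAsync(string key, string value);
    Task DeleteAsync(string key);
}

// Stateful service that owns the Reliable Dictionary and acts as the data provider.
public class KeyValueStoreService : StatefulService, IKeyValueStore
{
    public KeyValueStoreService(StatefulServiceContext context) : base(context) { }

    public async Task AddAsync(string key, string value)
    {
        var store = await StateManager.GetOrAddAsync<IReliableDictionary<string, string>>("store");
        using (var tx = StateManager.CreateTransaction())
        {
            await store.SetAsync(tx, key, value);   // add or overwrite
            await tx.CommitAsync();
        }
    }

    public async Task DeleteAsync(string key)
    {
        var store = await StateManager.GetOrAddAsync<IReliableDictionary<string, string>>("store");
        using (var tx = StateManager.CreateTransaction())
        {
            await store.TryRemoveAsync(tx, key);
            await tx.CommitAsync();
        }
    }

    // Expose the service over remoting so other services (even in other applications) can reach it.
    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        return new[] { new ServiceReplicaListener(context => this.CreateServiceRemotingListener(context)) };
    }
}
```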
I am relatively new to cloud computing and Azure. I was wondering whether you can have more than one web and worker role in an Azure application. If so, what advantages can I get from using multiple roles, and where do they apply?
Yes, you can have more than one web or worker role in an Azure cloud service. I believe you can have up to 25 different roles per deployment, in any mix of web and worker roles. See the Azure Subscription and Service Limits, Quotas and Constraints page for more information.
The advantage of having the roles within the same cloud service is simply that within that cloud service they can see all the other roles and instances easily (unless you configure them otherwise). They will all be relatively close to each other within a data center, because a cloud service is assigned to a stamp of machines and controlled by a fabric controller assigned to that stamp. You can watch this video by Mark Russinovich, which sheds more light on the inner workings of Azure and talks a bit about stamps. A cloud service is a security boundary as well, so you get some benefits from that encapsulation if you need to do a lot of inter-machine communication that ISN'T going across a queue for some reason.
The disadvantage of batching a whole bunch of roles together is that they are tied pretty closely together at that point. You can certainly scale them separately, and you can do updates that target only a single role at a time. However, if you want to deploy changes to multiple roles you may end up having to do a full deployment to all roles (even those that haven't changed) or do updates to single roles one at a time until all the ones you need updated are, which can take some time. Of course, it could be argued that having them in separate cloud services would still have you doing updates concurrently depending on your architecture and/or dependencies.
My suggestion is to group only roles that REALLY belong together into the same solution. These are roles whose workloads are interrelated. Even then, there's nothing stopping you from separating these into different deployments as well (though you may benefit from the security boundary that being within the same cloud service provides). Think about how each role will be updated, and whether they would generally be updated together or not. There are many factors in deciding how to package roles together.
We have an upcoming project where we'll need to integrate with 3rd parties over a variety of transports to get data from them.
Things like WCF endpoints and Web API REST endpoints are fine.
However, in two scenarios we'll need to either pick up auto-generated emails containing XML from a POP3 account or pull the XML files from an external SFTP account.
I'm about to start prototyping these now, but I'm wondering: are there any standard practices, patterns, or guidelines for dealing with these non-transactional systems in a multi-instance worker role environment? For example:
What happens if two workers connect to the POP3 account at the same time, or to the same SFTP server at the same time?
What happens if one worker deletes a file from the SFTP server while another is mid-download?
Controlling duplication shouldn't be an issue, as we'll be logging everything on the application side to a database, and everything should be uniquely identifiable, so we'll be able to add if-not-exists-create-else-skip logic to the workers. But I'm wondering: is there anything else I should be considering to make this more resilient/idempotent?
Just thinking out loud: since the data is primarily files and emails, one thing you could do, instead of processing them directly in your worker roles, is to save them to blob storage first. Some worker role instances would periodically poll the POP3 server / SFTP site, pull the data from there, and push it into blob storage. Once the blob is written, the same instance can delete the data from the source. With this approach you don't have to worry about duplicate records, because the blob will simply be overwritten (assuming each message/file has a unique identifier and that identifier is used as the blob name).
Once the file is in your blob storage, you can write a message to a Windows Azure queue with details about the blob (such as the blob URL). Then, using the 'Get' semantics of Windows Azure queues, your worker role instances start fetching and processing these messages. Because of the Get semantics, once a message is fetched from the queue it becomes invisible to other callers (the other worker role instances in this case) for the duration of the visibility timeout. This way you can take care of duplicate message processing.
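A bare-bones sketch of that staging pattern with the classic Microsoft.WindowsAzure.Storage SDK might look like the following; the container and queue names are placeholders, and a real implementation would add retries and poison-message handling.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class IngestExample
{
    // Poller side: store the raw payload in blob storage, then enqueue a pointer to it.
    public static void StageIncomingFile(CloudStorageAccount account, string fileId, string xmlContent)
    {
        var container = account.CreateCloudBlobClient().GetContainerReference("incoming-files");
        container.CreateIfNotExists();

        // The blob name is the unique file/message id, so pulling the same file twice
        // just overwrites the existing blob instead of creating a duplicate.
        var blob = container.GetBlockBlobReference(fileId);
        blob.UploadText(xmlContent);

        var queue = account.CreateCloudQueueClient().GetQueueReference("files-to-process");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(blob.Uri.ToString()));
    }

    // Processor side: the message is invisible to other instances while we work on it.
    public static void ProcessNextFile(CloudStorageAccount account)
    {
        var queue = account.CreateCloudQueueClient().GetQueueReference("files-to-process");
        var message = queue.GetMessage(visibilityTimeout: TimeSpan.FromMinutes(5));
        if (message == null) return;   // nothing to do right now

        var blob = new CloudBlockBlob(new Uri(message.AsString), account.Credentials);
        var xml = blob.DownloadText();

        // ... process the XML here (idempotently, in case the message is redelivered) ...

        queue.DeleteMessage(message);  // only delete once processing has succeeded
    }
}
```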
UPDATE
So I'm trying to guard against two competing instances pulling the same file from the SFTP server at the same moment.
For this, I'll pitch my favorite master/slave concept :). Essentially the idea is that each instance tries to acquire a lease on a single blob. The instance that acquires the lease becomes the master and the others become slaves. The master fetches the data from the SFTP server while the slaves wait. I've described this concept in a blog post which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/, though the context of the post is somewhat different.
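Sketching the lease-based election in code (again with the classic storage SDK; the container and blob names are made up, and a production version would also renew and release the lease):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class LeaderElectionExample
{
    // Returns a lease id if this instance won the election, or null if another instance holds the lease.
    public static string TryBecomeMaster(CloudStorageAccount account)
    {
        var container = account.CreateCloudBlobClient().GetContainerReference("leases");
        container.CreateIfNotExists();

        var leaseBlob = container.GetBlockBlobReference("sftp-poller-lease");
        if (!leaseBlob.Exists())
        {
            leaseBlob.UploadText(string.Empty);   // the blob only exists to be leased
        }

        try
        {
            // Blob leases can be 15-60 seconds (or infinite); the holder should renew before expiry.
            return leaseBlob.AcquireLease(TimeSpan.FromSeconds(60), proposedLeaseId: null);
        }
        catch (StorageException)
        {
            // Conflict: somebody else already holds the lease, so this instance acts as a slave.
            return null;
        }
    }
}

// Hypothetical usage in the worker role loop:
// var leaseId = LeaderElectionExample.TryBecomeMaster(account);
// if (leaseId != null) { /* poll the SFTP server */ } else { /* wait and retry later */ }
```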
Have a look at the recently released Cloud Design Patterns guide. You might be able to find the corresponding pattern and sample code for what you need.
I have an ASP.NET MVC 2 Azure application that I am trying to switch from being single tenant to multi-tenant. I have been reviewing many blogs and posts and questions here on Stack Overflow, but am still trying to wrap my head around the specifics of what's right for this particular app.
Currently the application stores some information in a SQL Azure database, as well as some other info in an Azure storage account. I'm considering writing the tenant provisioning code to simply create a new database for each new tenant, along with a new Azure storage account. This brings me to the following question:
How will I go about testing this approach locally? As far as I can tell, the local Azure storage emulator only has one storage account, and I'm not sure whether I can create others locally. Will local testing even be possible?
There are many aspects to consider with multitenancy, one of which is data architecture. You also have billing, performance, security and so forth.
Regarding data architecture, let's first explore SQL storage. You have the following options available to you: add a CustomerID (or other identifier) that your code will use to filter records; use different schema containers for different customers (each customer has its own copy of all the database objects, owned by a dedicated schema in a database); linear sharding (in which each customer has its own database); and Federations (a feature of SQL Azure that offers progressive sharding based on performance and scalability needs). All these options are valid, but they have different implications for performance, scalability, security, maintenance (such as backups), cost, and of course database design. I couldn't tell you which one to choose based on the information you provided; some models are easier to implement than others if you already have a code base. Generally speaking, a linear shard is the simplest model and provides strong customer isolation, but it is perhaps the most expensive of all. A schema-based separation is not too hard, but it requires a good handle on security requirements and can introduce cross-customer performance issues, because this approach is not shared-nothing (for customers on the same database). Finally, Federations requires the use of a customer identifier and has a few limitations; however, this technology gives you more control over performance distribution and long-term scalability (because, like a linear shard, Federations uses a shared-nothing architecture).
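To illustrate the two simplest of those options in code, here is a rough sketch contrasting CustomerID filtering in a shared database with per-tenant (linear shard) connection resolution; the table name, column names, and the tenant lookup are all hypothetical placeholders.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public static class TenantDataAccess
{
    // Option 1: single shared database, every query filtered by CustomerID.
    public static int CountOrdersShared(string sharedConnectionString, int customerId)
    {
        using (var conn = new SqlConnection(sharedConnectionString))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Orders WHERE CustomerID = @customerId", conn))
        {
            cmd.Parameters.AddWithValue("@customerId", customerId);
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }

    // Option 2: linear sharding, one database per tenant, resolved at runtime.
    // In practice this mapping would come from your tenant provisioning store.
    private static readonly Dictionary<int, string> TenantConnectionStrings = new Dictionary<int, string>
    {
        { 1, "<tenant-1-connection-string>" },
        { 2, "<tenant-2-connection-string>" },
    };

    public static SqlConnection OpenTenantConnection(int customerId)
    {
        var conn = new SqlConnection(TenantConnectionStrings[customerId]);
        conn.Open();
        return conn;
    }
}
```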
Regarding storage accounts, using different storage accounts per customer is definitely the way to go. The primary issue you will face if you don't use separate storage accounts is performance limits, such as the maximum number of transactions per second that can be executed against a single storage account. As you point out, however, testing locally may be a problem. Consider this: the local emulator does not offer 100% parity with an Azure storage account (some functions are not supported in the emulator), so I would only use the local emulator for initial development and troubleshooting. Any serious testing, including multitenant testing, should be done using real storage accounts. This is the only way you can fully test an application.
You should consider not creating separate databases, but instead creating different object namespaces within a single SQL database. Each tenant can have their own set of tables.
Depending on how you are using storage, you can create separate storage containers or message queues per client.
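For example, a small sketch of that per-client separation inside a single storage account (the names are illustrative; container and queue names must be lowercase), which also works against the local storage emulator:

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

public static class TenantStorage
{
    // One blob container and one queue per tenant, all inside a single storage account.
    public static void EnsureTenantResources(CloudStorageAccount account, string tenantId)
    {
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("tenant-" + tenantId.ToLowerInvariant());
        container.CreateIfNotExists();

        var queue = account.CreateCloudQueueClient()
                           .GetQueueReference("tenant-" + tenantId.ToLowerInvariant());
        queue.CreateIfNotExists();
    }
}
```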
Given these constraints, you should be able to test locally with the storage emulator and a local SQL instance.
Please let me know if you need further explanation.
According to MSDN, an Azure service can contain any number of worker roles. As I understand it, a worker role can be recycled at any time by the Windows Azure fabric. If that is true, then either:
the worker role should be stateless, OR
the worker role should persist its state to the Windows Azure storage services.
But I want to make a service which contains client data, and I do not want to use the Azure storage services. How can I accomplish this?
The Velocity component of AppFabric (or whatever it is called now) is a distributed cache and can be used in these situations.
Azure's web and worker roles are stateless, which means all their local data is volatile. If you want to maintain state, you need to use some external resource to hold that state, plus logic in your app to handle it. For simplicity you can use an Azure drive, but internally that is again blob storage.
You can write to local storage on the worker role by using the standard file IO APIs - but this will be erased upon instance shutdown.
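As a small sketch of that local-storage option: the resource name below is hypothetical and must match a LocalStorage resource declared in the role's ServiceDefinition.csdef.

```csharp
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class LocalScratchStorage
{
    // Writes to the role instance's local disk; the data is lost if the
    // instance is reimaged, moved, or shut down.
    public static void SaveTempData(string fileName, string contents)
    {
        LocalResource scratch = RoleEnvironment.GetLocalResource("ScratchSpace");
        string path = Path.Combine(scratch.RootPath, fileName);
        File.WriteAllText(path, contents);
    }
}
```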
You could also use SQL Azure, or post your data off to another storage service by HTTP (e.g. Amazon S3, or your own server).
However, this is likely to have performance implications. Depending on how much data you'll be storing, how frequently, and how big it is, you might be better off with Azure Storage!
Why don't you want to use Azure Storage?
If the data could be stored in Azure, you have a good number of choices: the Azure distributed cache, SQL Azure, blob, table, queue, or Azure Drive. It sounds like you need persistence but can't use any of these Azure storage mechanisms. If data security is the problem, could you encrypt or hash the data? Understanding why would be useful.
One alternative might be not to persist at all, by chaining/nesting synchronous web service calls together, thus achieving reliable messaging.
Another might be to use Azure Connect to domain-join your Azure compute resources to your local data centre (if you have one), and use your on-premises storage.