Pulling data asynchronously from third-party web service on Windows Azure Platform

I want to pull a large amount of data, frequently, from different third-party API web services and store it in a staging area (this is what I want to decide right now), from where it will then be moved, piece by piece as required, into my application's database.
I wanted to know whether I can use the Azure platform to achieve the above. How well suited is the Azure platform to this task?
What if the amount of data to be pulled is large and the pull frequency is high, i.e. perhaps half-hourly or hourly for 2,000 different users?
I assume that if this is possible at all, then bandwidth, data storage, server capability, etc. will not be things I have to worry about; they will be Microsoft's concern. And obviously, I should be able to access the data whenever I need it.
If I had to implement this on Windows Server, I know I would use a Windows service to do it. But I don't know how it can be done on the Windows Azure Platform, if it is possible at all.

As Rinat stated, you can use Lokad's solution. If you choose to do it yourself, you can run a timed task in your worker role - maybe spawn a thread that sleeps, waking every 30 minutes to perform its task. It can then reach out to the web services in question (or maybe one thread per web service?) and fetch data. You can store it temporarily in Azure Table Storage, which costs a fraction of what SQL Azure does ($0.15 per GB), and then easily read it out of Table Storage on demand and transfer it to SQL Azure.
Assuming your services, storage and SQL Azure are in the same data center (by setting the affinity appropriately), you'd only pay for bandwidth when pulling data from the web service. There'd be no bandwidth charges to read from Table Storage or insert into SQL Azure.
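If you go the do-it-yourself route, the polling loop itself is simple. Here's a minimal sketch of the pattern in Python (the feed URL, table name and half-hourly interval are illustrative assumptions; in an actual worker role you'd write the equivalent .NET code in Run()): it fetches from a third-party API on a schedule and stages each record in Table Storage for later transfer to SQL Azure.

```python
import json
import time
import uuid

import requests
from azure.data.tables import TableServiceClient

# Hypothetical values - substitute your own feed URL and storage connection string.
FEED_URL = "https://api.example.com/v1/records"
CONNECTION_STRING = "<storage-account-connection-string>"
POLL_INTERVAL_SECONDS = 30 * 60  # half-hourly, per the question

def poll_and_stage():
    table_service = TableServiceClient.from_connection_string(CONNECTION_STRING)
    table = table_service.create_table_if_not_exists("staging")

    while True:
        response = requests.get(FEED_URL, timeout=60)
        response.raise_for_status()

        for record in response.json():
            # Partition by feed so the later per-source transfer to SQL Azure is a cheap range query.
            table.upsert_entity({
                "PartitionKey": "thirdparty-feed",
                "RowKey": str(uuid.uuid4()),
                "Payload": json.dumps(record),
            })

        time.sleep(POLL_INTERVAL_SECONDS)

if __name__ == "__main__":
    poll_and_stage()
```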

In Windows Azure that's usually a Worker Role used to host the cloud processing. To accomplish your tasks you'll either need to implement this messaging/scheduling infrastructure yourself or use something like the Lokad.Cloud or Lokad.CQRS open-source projects for Azure.
We use Lokad.Cloud for distributed BI processing of hundreds of thousands of series, and Lokad.CQRS allows us to reliably retrieve and synchronize millions of products on schedule.
There are samples, docs and community in both projects to get you started.

Related

Is Azure Monitor a good store for custom application performance monitoring

We have legacy applications that currently write various run-time metrics (SQL call run times, API/HTTP request run times, etc.) to a local SQL DB.
Format: (source, event, data, executionduration)
We are moving away from storing those in the local SQL DB and are now publishing the same metrics to Azure Event Hub.
We're looking for a good place to store those metrics for the purpose of monitoring the health of the application. A simple solution would be to store them in some DB and build a custom application to visualize the data in custom ways.
We are also considering using Azure Monitor for this purpose via the Data Collector API (https://learn.microsoft.com/en-us/azure/azure-monitor/platform/data-collector-api).
QUESTION: Are there any issues with Azure Monitor that would prevent us from achieving this type of health monitoring?
Details
each event is small (a few hundred characters)
expecting ~10 million events per day
retention of 1-2 days is enough
the ability to aggregate old events per source, per event is important (to retain historical run-time information)
Thank you
You can build some simple graphs, and with the Log Analytics query language you can do just about any form of data analytics you need.
Here's a pretty good article on Monitor Visualizations.
learn.microsoft.com/en-us/azure/azure-monitor/log-query/charts
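As for getting the data in, the Data Collector API mentioned in the question is just an HTTP POST with an HMAC-SHA256 signature over a few headers. Here's a rough Python sketch of pushing a batch of events into a custom Log Analytics table (the workspace ID, shared key and "AppMetrics" log type are placeholders you'd replace with your own):

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

import requests

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder
SHARED_KEY = "<workspace-primary-key>"         # placeholder
LOG_TYPE = "AppMetrics"                        # records land in the AppMetrics_CL custom table

def build_signature(date: str, content_length: int) -> str:
    # String-to-sign format documented for the Azure Monitor Data Collector API.
    string_to_hash = f"POST\n{content_length}\napplication/json\nx-ms-date:{date}\n/api/logs"
    decoded_key = base64.b64decode(SHARED_KEY)
    digest = hmac.new(decoded_key, string_to_hash.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {WORKSPACE_ID}:{base64.b64encode(digest).decode()}"

def post_metrics(events):
    body = json.dumps(events)
    rfc1123_date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": build_signature(rfc1123_date, len(body)),
        "Log-Type": LOG_TYPE,
        "x-ms-date": rfc1123_date,
    }
    uri = f"https://{WORKSPACE_ID}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    requests.post(uri, data=body, headers=headers, timeout=30).raise_for_status()

# One event in the (source, event, data, executionduration) shape from the question.
post_metrics([{"source": "billing-api", "event": "http_request",
               "data": "/invoices", "executionduration": 182}])
```

Once ingested, the per-source / per-event aggregations you mention are straightforward to express in the Log Analytics query language.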

How to store (and query) the MaxMind GeoIP2 database in Azure?

In an Azure Web App I need to efficiently query the MaxMind GeoIP2 City database (due to the volume of queries and the latency requirements, we cannot use MaxMind's REST API).
I'm wondering what the best approach is for storing the DB (binary MMDB format, accessed via the official .NET API) so that it's easy to update with minimal downtime (we are going to subscribe to monthly updates) and still cost-effective with regard to Azure storage and transactions.
Apparently block blobs are the way to go, but I'm not sure about the monthly updates, and the GeoIP2 API loads the whole DB into memory (I don't know whether this would be a problem for the Web App, whether I'd need a worker to keep it alive, or whether I'd need something else); I also don't know yet how large the file is.
What's the most cost-effective solution that preserves low latency over a huge query volume?
According to the API docs you must have the database available on a file system (the API doesn't know anything about Azure Storage and its REST API). So, regardless of where you permanently store it, you'll need to have it on a disk somewhere.
I have no idea how large the database footprint is, but Web Apps, Cloud Services (web/worker roles) and Virtual Machines (whether Linux or Windows) all have local disks, and you have read/write access to these disks. So you'd need to copy the database binary file (or CSV) to local disk from somewhere. Then, when you initialize the SDK, you'd create a DatabaseReader and point it at your locally downloaded copy of the database file.
You mentioned storing the database in blob storage. There's nothing stopping you from doing so and simply downloading a copy to local disk, and there's nothing stopping you from storing multiple versions in multiple blobs. Note: you may also take advantage of Azure File storage (an SMB share). Which you choose is up to you.
As for the most cost-effective solution: you'll need to do the pricing workup yourself to see what's most effective. You'd also need to evaluate how much RAM is available for the given VM/role instance/Web App size you choose. You mentioned Web Apps in your question: Web App instances scale from 0.5 GB to 14 GB of RAM, depending on the tier you choose (again, you'll need to evaluate this).
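To make the "download to local disk, then open the reader" flow concrete: the answer is about the official .NET DatabaseReader, but here's a rough sketch of the same pattern using the Python geoip2 package and azure-storage-blob (the container, blob and local path names are invented for the example):

```python
import geoip2.database
from azure.storage.blob import BlobClient

# Hypothetical names - substitute your own storage account, container and blob.
CONNECTION_STRING = "<storage-account-connection-string>"
LOCAL_PATH = "/home/site/data/GeoIP2-City.mmdb"  # local disk of the instance

def refresh_local_copy() -> None:
    """Download the current MMDB blob to local disk (at startup and after each monthly update)."""
    blob = BlobClient.from_connection_string(
        CONNECTION_STRING, container_name="geoip", blob_name="GeoIP2-City.mmdb")
    with open(LOCAL_PATH, "wb") as f:
        blob.download_blob().readinto(f)

refresh_local_copy()

# The reader loads/maps the whole file, so create it once and reuse it across requests.
reader = geoip2.database.Reader(LOCAL_PATH)
print(reader.city("8.8.8.8").city.name)
reader.close()
```

Swapping in a new monthly file then amounts to downloading the new blob to a fresh path and re-creating the reader, which keeps the downtime to the reload itself.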

Azure Traffic Manager for Cloud Services - What about storage access?

I have finally got the time to start looking at Azure. It looks good, and scaling seems easy.
Azure SQL, Table Storage and Blob Storage should cover most of my needs: fast access to data, automatic replication and failover to another datacenter.
Should the need arise for an app with fast global access, Traffic Manager is there, and one can route users by "Failover" or "Performance".
The "Performance" routing is very nice for Cloud Services and web roles / worker roles... BUT... what about access to data in SQL Azure / Table Storage / Blob Storage?
I have tried searching the web for what to do about this, but haven't found anything about Traffic Manager that mentions how to access data in such a scenario.
Have I missed anything?
Do people access the storage in the original data center (and, if that fails, use the geo-replication feature)? Is that fast enough? Is internal traffic on the MS network free across datacenters?
This seems like such a simple ...
Take a look at the guidance from Microsoft: Replicating, Distributing, and Synchronizing Data. You could use the Service Bus to keep data centers in sync. This can cover SQL databases, storage, and search indexes like Solr or ElasticSearch. The advantage over solutions like SQL Data Sync is that it's technology-independent and can keep virtually all of your data in sync.
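The heart of that pattern is publishing every data change to a Service Bus topic and letting each datacenter apply it from its own subscription. A minimal Python sketch of the publishing side with the azure-servicebus package (the topic name and message shape are made up for the example):

```python
import json

from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Hypothetical: one topic, with one subscription per datacenter whose worker replays the change locally.
CONNECTION_STRING = "<service-bus-connection-string>"
TOPIC = "data-changes"

def publish_change(entity_type: str, entity_id: str, payload: dict) -> None:
    """Publish a data-change event; each datacenter's consumer updates its local store from it."""
    with ServiceBusClient.from_connection_string(CONNECTION_STRING) as client:
        with client.get_topic_sender(topic_name=TOPIC) as sender:
            sender.send_messages(ServiceBusMessage(json.dumps({
                "action": "upsert",
                "type": entity_type,
                "id": entity_id,
                "data": payload,
            })))

publish_change("customer", "42", {"name": "Contoso", "country": "DK"})
```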
In this episode of Channel 9 they state that Traffic Manager is only for Cloud Services as of now (Jan 2014), but support is coming for Azure Web Sites and other services. I agree that you should be able to ask for a blob using a single global URL and expect the content to be served from the closest datacenter.
There isn't a one-click, easy-to-implement solution for this issue. The way you solve it will depend on where the data lives (i.e. SQL Azure, blob storage, etc.) and your access patterns.
Do you have a small number of data requests that are not on a performance-critical path in your code? Consider just using the main datacenter.
Do you have a large number of read-only requests? Consider replicating the data to another datacenter.
Do you do a large number of reads and only a few writes? Consider duplicating the data among all datacenters: each write goes to all datacenters at the same time (incurring a perf penalty), and all reads go to the local datacenter (fast reads) - see the sketch after this list.
Is your data in SQL Azure? Consider using SQL Data Sync to keep multiple datacenters in sync.
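As a rough illustration of the read-local / write-everywhere option above, here's a sketch using a Table Storage account per region (the region names, connection strings and "profiles" table are invented, and the table is assumed to already exist in each account):

```python
from azure.data.tables import TableClient

# Hypothetical per-region connection strings; writes go to every region, reads stay local.
REGION_CONNECTION_STRINGS = {
    "north-europe": "<north-europe-storage-connection-string>",
    "west-us": "<west-us-storage-connection-string>",
}
LOCAL_REGION = "north-europe"

def _client(region: str) -> TableClient:
    return TableClient.from_connection_string(
        REGION_CONNECTION_STRINGS[region], table_name="profiles")

def write_everywhere(entity: dict) -> None:
    # The perf penalty: every write is replayed against each datacenter.
    for region in REGION_CONNECTION_STRINGS:
        _client(region).upsert_entity(entity)

def read_local(partition_key: str, row_key: str) -> dict:
    # The fast path: reads always hit the caller's own datacenter.
    return _client(LOCAL_REGION).get_entity(partition_key=partition_key, row_key=row_key)

write_everywhere({"PartitionKey": "user-1", "RowKey": "profile", "DisplayName": "Alice"})
print(read_local("user-1", "profile"))
```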

Setting up Azure to Sync Contacts in Custom Program, Tasks and Pricing

We have our own application that stores contacts in an SQL database. What is involved in getting up and running in the cloud so that each user of the application can have his own private list of contacts, synced with both his computer and his phone?
I am trying to get a feel for what Azure might cost in this regard, but I am finding more abstract talk than concrete scenarios.
Let's say there are 1,000 users, and each user has 1,000 contacts that he keeps in his contacts book. No user can see the contacts set up by any other user. Syncing should occur any time the user changes his contact information.
Thanks.
While the Windows Azure Cloud Platform is not intended to compete directly with consumer-oriented services such as Dropbox, it is certainly intended as a platform for building applications that do that. So your particular use case is a good one for Windows Azure: creating a service for keeping contacts in sync, scalable across many users, scalable in the amount of data it holds, and so forth.
Making your solution multi-tenant friendly (per the comment from @BrentDaCodeMonkey) is key to cost-efficiency. Your data needs are 1K users x 1K contacts/user = 1M contacts. If each contact is approximately 1 KB, then we are talking about roughly 1 GB of storage.
Checking the pricing calculator, the at-rest storage cost is $9.99/month for a 1 GB Windows Azure SQL Database instance (then $13.99 if you go up to 2 GB, etc. - refer to the calculator for additional projections and current pricing).
Then you have data transmission (bandwidth) charges. Since the pricing calculator says "The first 5 GB of outbound data transfers per billing month are also free," you probably won't have any bandwidth costs with current users, assuming moderate smarts in the sync.
This does not include the costs of your application. What is your application, how does it run, etc.? Assuming there is a client-side component, that component (typically) cannot be trusted with the database connection, so you'd need a server-side component running as a gatekeeper for the database. (You also usually don't expose the database to all IP addresses - another motivation for channeling data through a server-side component.) This component will also cost money to operate. Those costs are also in the pricing calculator - but if you choose to use a Windows Azure Web Site, it could be free. An excellent approach might be the nifty ASP.NET Web API stack that was recently released: with the Web API you can implement a nice REST API that your client application can access securely, and Windows Azure Web Sites can host Web API endpoints. Check out the "reserved instance" capability too.
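The answer points at ASP.NET Web API for that gatekeeper; purely to make the per-user isolation concrete, here's a rough sketch of such a REST endpoint in Python with Flask (the framework and names are my illustration, not part of the answer). The essential point is that the user identity comes from the authenticated request, never from the URL, so no user can read another user's contacts:

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# In-memory stand-in for the contacts database: {user_id: {contact_id: contact}}.
CONTACTS = {}

def current_user_id():
    """Resolve the caller from the auth header; a real service would validate a proper bearer token."""
    token = request.headers.get("Authorization", "")
    if not token.startswith("Bearer "):
        abort(401)
    return token[len("Bearer "):]  # placeholder: the token doubles as the user id in this sketch

@app.get("/api/contacts")
def list_contacts():
    # Only the authenticated user's own contacts are ever returned.
    return jsonify(list(CONTACTS.get(current_user_id(), {}).values()))

@app.put("/api/contacts/<contact_id>")
def upsert_contact(contact_id):
    user = current_user_id()
    CONTACTS.setdefault(user, {})[contact_id] = request.get_json()
    return "", 204

if __name__ == "__main__":
    app.run()
```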
I would start out with Windows Azure Web Sites and, as the service grew in complexity/sophistication, check out Windows Azure Cloud Services (a more advanced approach to building server-side components).

Storage Transaction Profiler for Windows Azure Web Deploy Accelerator

I've recently begun using the Web Deployment Accelerator for my Windows Azure account. It is providing an immediate return in time saved and is an excellent offering.
However, since "everything" is now stored in Azure Storage rather than on the regular E: drive, I am immediately seeing a cost consequence of using the tool.
In one day I have racked up a mighty 4 cent NZD charge. In order to do that I had to burn through about 80,000 storage transactions, and frankly I can't figure out where they all went.
I uploaded six sites that are very small and wouldn't have more than 300 files each. So I'm wondering:
a. Is there a profiling tool for the Web Deployment Accelerator that will allow me to see where and how 80,000 storage transactions were used for such a small offering? Is it a storage-transaction-intensive tool? Has any cost analysis been carried out on how this tool operates? Has it been optimised with cost in mind?
b. If I'm using this tool, do I pay for two storage transactions per HTTP request to a site? Since the tool now writes the web server logs to Table Storage, that would be one storage transaction to pull the requested resource (img, script, etc.) and another to write the log entry, would it not?
I'm not concerned about the current charges; I'm concerned about the future if I start rolling all my hosted business into the cloud. I mean, I'm now being charged even just to "look" at my data, right? If I list the contents of a storage folder using a tool like Azure Storage Explorer, is that x storage transactions, where x = the number of files in the folder?
Not sure of a third-party profiler tool, but Windows Azure Storage logging and metrics will give you very detailed info regarding both individual accesses and hourly rollups. It's pretty straightforward to enable, and the November 2011 SDK includes support for the API calls required to enable it. See here for an overview of what's offered for metrics and logging.
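The answer refers to the .NET SDK of the time; as an illustration of what enabling looks like, here's a sketch with the current Python SDK (azure-storage-blob), which exposes the same service properties:

```python
from azure.storage.blob import (
    BlobAnalyticsLogging,
    BlobServiceClient,
    Metrics,
    RetentionPolicy,
)

# Placeholder connection string for your storage account.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")

retention = RetentionPolicy(enabled=True, days=7)

service.set_service_properties(
    # Log every read/write/delete so you can see exactly which calls burn transactions.
    analytics_logging=BlobAnalyticsLogging(read=True, write=True, delete=True,
                                           retention_policy=retention),
    # Hourly rollups of request counts, billable transactions, ingress/egress, etc.
    hour_metrics=Metrics(enabled=True, include_apis=True, retention_policy=retention),
)
```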
My team worked with Fullscale180 to build a storage library, Azure Store XRay, to demonstrate how to enable and query storage metrics and logging. Note: This was published before the SDK had logging and metrics support, so it uses the REST API calls instead. But that won't impact you if you try to use the library.
You can also look at another code demo, Cloud Ninja, which calls the XRay library for its metrics display (see here for running demo).
Regarding querying storage for objects in blob containers: that's not a 1:1 transaction:file scenario. You can specify the maximum number of blobs to return when listing items in a container. It's possible that all blobs are returned in one transaction. Of course, if you then grab each blob, each of these will be at least one transaction (depending on blob size). See here for details about listing blobs.
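For instance, a paged listing with the Python SDK looks roughly like this (the container name is made up); each page fetched is one List Blobs transaction, however many blobs it contains:

```python
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-account-connection-string>", container_name="site-content")

# Up to 1,000 names come back per call here, so 300 small files is a single listing transaction.
pages = container.list_blobs(results_per_page=1000).by_page()
for page_number, page in enumerate(pages, start=1):
    names = [blob.name for blob in page]
    print(f"page {page_number}: {len(names)} blobs in one List Blobs transaction")
```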
