What's the recommended ElasticSearch deployment on Windows Azure?

Bearing in mind that the ElasticSearch-Zookeeper plugin doesn't support the v0.90 release:
With unicast discovery, what's your strategy for updating the list of IPs, i.e. when upgrading or scaling up/down?
What client-side connectivity do you use to reach the cluster from a web/worker role? Do you:
a) implement your own round-robin/failover across all nodes in the cluster, or
b) spin up a local (non-data/non-master) Elasticsearch process on the client machine that joins the unicast cluster, so that the application only connects to localhost?
Where do you store your data? In the Azure blob gateway?
Can you share the detailed story of your ElasticSearch experience on Azure, and any particular points/issues to watch out for?
Cheers
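Option (a) amounts to a client-side round-robin with failover. A minimal Python sketch, assuming a hypothetical list of node addresses and a caller-supplied `request` function (for example, one that sends an HTTP request to a node's REST endpoint). Only `ConnectionError` is treated as a node failure here; a real client would probably also treat timeouts as failures.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class RoundRobinFailover:
    """Spread requests across cluster nodes in turn; on failure, try the next node."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._next = 0  # index of the node that should serve the next request

    def call(self, request: Callable[[str], T]) -> T:
        start = self._next
        # Advance the starting node for the following call (round-robin).
        self._next = (start + 1) % len(self._nodes)
        errors: List[Tuple[str, Exception]] = []
        # Try every node once, beginning with this call's turn.
        for offset in range(len(self._nodes)):
            node = self._nodes[(start + offset) % len(self._nodes)]
            try:
                return request(node)
            except ConnectionError as exc:  # assumed failure signal; extend as needed
                errors.append((node, exc))
        raise ConnectionError(f"all nodes failed: {errors}")


if __name__ == "__main__":
    # Hypothetical node addresses; in practice these would be your cluster's IPs.
    client = RoundRobinFailover(["http://10.0.0.4:9200", "http://10.0.0.5:9200"])
    print(client.call(lambda node: f"would query {node}"))
```

One drawback compared with option (b) is that every client has to keep this node list current, which is the same problem as keeping the unicast list up to date.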

Just a note about this: we are about to release an Azure plugin for Elasticsearch. It will allow automatic discovery of your Elasticsearch nodes. I think we will have something public in the next few weeks.
Also, I recommend using local storage. Azure Blob storage will be used in the future for the snapshot (and restore) feature once Elasticsearch 1.0 is out.
Hope this helps
Update: Plugin is now available here: https://github.com/elasticsearch/elasticsearch-cloud-azure

It's in no way certified by ElasticSearch, but I wrote a blog post about my experience running ES on Azure: http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/

Related

How can I run multiple SQL Server containers in Azure and make sure data is replicated across them?

The title pretty much describes what we are trying to accomplish in our organization.
We have a very database-intensive application, and our single SQL Server machine is struggling.
We are reading articles about Azure, Docker and Kubernetes, but we are afraid of trying these technologies.
Our problem is data replication.
How can we achieve scalability here? If we have three different SQL Server instances in three different containers, how does data get replicated across them? (Meaning: a user inserts a new product into a shared library, and another user accessing a different node/container should be able to see that product.)
Maybe we don't need containers at all and Azure provides another way to scale databases?
We really appreciate any help from you guys.
Regards, Cris.
I would advise against trying to run your databases on K8s. Containers on Kubernetes should generally be stateless applications, and were not designed for persistent data storage. Azure provides a Database as a Service offering that can scale appropriately with your needs (Azure Pricing for Cloud SQL).
We once experimented with running our Postgres DB inside of a Kubernetes pod, but I was terrified to change anything. Not worth it, and not what the system was designed for.
If you are really, really committed to this path, you can check out MySQL NDB Cluster (MySQL for distributed environments). It should be adaptable to the Kubernetes paradigm.

Cheapest way to host MongoDB on Azure

We have been developing a RESTful web API using Node and MongoDB. For hosting, we decided to use Azure through BizSpark. We used DocumentDB with its MongoDB protocol support.
The problem now is that DocumentDB is consuming all our credit, causing downtime, and we haven't started making money yet. We are now considering switching from DocumentDB to MongoDB. The question now becomes: what is the cheapest way to host MongoDB on Azure?
So far on our research, we have found two options:
Using a VM (Linux or Windows)
Using a worker role
Please advise if there are other options, and how easy it would be to switch between these options at a later stage.
You can use the Azure pricing calculator to compare estimates for DocumentDB and for a VM with the settings your company needs, and see which one is cheaper.
If you are using BizSpark, remember that you have 5 accounts across which you can distribute your costs to optimize spending.
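Spreading costs across the 5 accounts is essentially a bin-packing problem: each resource lives in one subscription, and each subscription has its own monthly credit. A minimal first-fit-decreasing sketch in Python; the per-service monthly costs and the per-account credit below are hypothetical placeholders (take real figures from the pricing calculator and your own subscriptions).

```python
from typing import Dict, List


def assign_to_accounts(costs: Dict[str, float], accounts: int, credit: float) -> List[List[str]]:
    """First-fit decreasing: put each service in the first account with enough credit left."""
    remaining = [credit] * accounts
    placement: List[List[str]] = [[] for _ in range(accounts)]
    # Place the most expensive services first; they are the hardest to fit.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        for i in range(accounts):
            if cost <= remaining[i]:
                remaining[i] -= cost
                placement[i].append(name)
                break
        else:
            raise ValueError(f"{name} (${cost:.2f}/month) does not fit in any account")
    return placement


if __name__ == "__main__":
    # Hypothetical monthly costs (USD) and a hypothetical credit of 150 per account.
    monthly = {"mongo-vm": 110.0, "api-vm": 70.0, "storage": 20.0, "cdn": 15.0}
    print(assign_to_accounts(monthly, accounts=5, credit=150.0))
```

First-fit decreasing is a simple heuristic, not an optimal packing, but for a handful of services it is usually good enough to see whether everything fits within the credit.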
Personal recommendations (subjective view):
Remember that if you are using the PaaS solution (DocumentDB), you get full functionality out of the box: you don't have to set it up, you can scale it very easily, and you can plug it into very powerful tools like Power BI out of the box.
In the case of the IaaS solution (VMs), you have to install, maintain and create all the connection settings for this to work. If you want to scale, you have to be more dedicated, since you have to scale through more VMs, traffic managers and a more robust architecture. If this is the path you are taking, I would recommend using containers like Docker inside the VM and relying on them to manage this.

ArangoDB managed services

I am researching some database requirements and I really liked ArangoDB. The only issue is that I couldn't find any managed services or managed hosts for ArangoDB.
For example, on Amazon AWS, RDS lets us scale up easily without worrying about clustering and configuration.
Is there any service that can manage this for me, or should I manage this myself?
You may start an ArangoDB cluster on AWS with Mesosphere DC/OS. The cluster is fully managed and can be scaled as you go. It is documented here:
https://docs.arangodb.com/3.2/Manual/Deployment/Mesos.html

Achieving MasterData deduplication on Azure

I am looking at achieving Master Data deduplication based on match percentages in AzureDB. I was looking for something equivalent to Master Data Services / DQS (Data Quality Services) in SQL Server 2012:
https://channel9.msdn.com/posts/SQL11UPD05-REC-06
Broadly, I am looking for control over match rules (exact, close match, etc.), dependency handling, and an audit trail (undo capability, etc.).
I reckon this must be available in the Azure cloud if it is available in SQL Server. Could you please point me to how I can get this done on AzureDB?
Please note: I am NOT looking for data sources like Melissa Data or D&B that are listed on the Azure Marketplace.
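To make "match percentages" with exact/close rules concrete, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. This is not how MDS or DQS work internally; it is a generic string-similarity illustration, the 0.85 "close match" threshold is an arbitrary assumption, and there is no audit trail or undo.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Sequence, Tuple


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(value.lower().split())


def classify(a: str, b: str, close_threshold: float = 0.85) -> Tuple[str, float]:
    """Return ("exact" | "close" | "none", similarity in [0, 1])."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    if score == 1.0:
        return "exact", score
    if score >= close_threshold:
        return "close", score
    return "none", score


def candidate_duplicates(
    records: Sequence[str], close_threshold: float = 0.85
) -> List[Tuple[int, int, str, float]]:
    """Compare every pair of records (O(n^2)) and keep exact or close matches."""
    matches = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        kind, score = classify(a, b, close_threshold)
        if kind != "none":
            matches.append((i, j, kind, round(score, 2)))
    return matches


if __name__ == "__main__":
    names = ["Contoso Ltd", "contoso  ltd", "Contoso Ltd.", "Fabrikam Inc"]
    print(candidate_duplicates(names))
```

Comparing every pair does not scale to large tables; real matching tools typically narrow candidates first (for example, by blocking on a key such as postcode) before scoring.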
Master Data Services is not just a database process: it also centrally involves a website component, which still (as of 2021) requires some Windows server running IIS.
This can be an Azure Virtual Machine (link to documentation) but there is no serverless offering for this at this time.
The database itself can be hosted on an Azure SQL Managed Instance (link to documentation) but not on a standalone Azure SQL DB, as far as I can tell. This is presumably because some of the essential components of MDS sit outside the database, much like other services like SSIS are more than just a database.
Data Quality Services is a similar story: it uses three databases (link to documentation) and seemingly some components outside the databases, so it wouldn't be possible to deploy it in standalone Azure SQL DBs. It may be possible to run it on a Managed Instance; I couldn't find a clear answer to that. And again, there is no fully serverless offering at this time.
Of course, all of this can easily be run via IaaS (Infrastructure as a Service) using an Azure virtual machine running SQL Server.

Is it possible to deploy an application using cassandra database on Windows Azure?

I recently got a trial version of Windows Azure and wanted to know if there is any way I can deploy an application using Cassandra.
Unfortunately, I can't speak specifically to whether Cassandra works in Azure. That's likely a question for that product's development team.
But the challenge you'll face with this, MySQL, or any other role-hosted database is persistence. Azure roles are not persistent in and of themselves, so whatever back-end store Cassandra uses would need to be placed onto something like an Azure Drive (which is persisted to Azure Blob Storage). However, this would limit the scalability of the solution.
Basically, you run Cassandra as a worker role in Azure. Then, you can mount an Azure drive when a worker starts up and unmount when it shuts down.
This provides some insight re: how to use Cassandra on Azure: http://things.smarx.com/#Run Cassandra
Some help w/ Azure drives: http://azurescope.cloudapp.net/CodeSamples/cs/792ce345-256b-4230-a62f-903f79c63a67/
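The mount-on-start / unmount-on-stop ordering can be sketched as a Python context manager. `mount`, `unmount` and `start_cassandra` are hypothetical stand-ins for the real Azure Drive calls and for launching the Cassandra process; the point is only that the drive gets unmounted even if Cassandra exits with an error.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def mounted_drive(mount: Callable[[], str], unmount: Callable[[str], None]) -> Iterator[str]:
    """Mount the persistent drive, hand its path to the caller, and always unmount afterwards."""
    path = mount()  # hypothetical: returns the local path of the mounted drive
    try:
        yield path
    finally:
        unmount(path)  # runs on normal exit and when the body raises


def run_worker(
    mount: Callable[[], str],
    unmount: Callable[[str], None],
    start_cassandra: Callable[[str], None],
) -> None:
    """Worker lifecycle: mount, run Cassandra on the drive until it stops, then unmount."""
    with mounted_drive(mount, unmount) as data_dir:
        start_cassandra(data_dir)  # hypothetical: blocks until the node is told to stop
```

In an actual role the mount and unmount would likely sit in the role's start and stop handlers rather than one function, but the same ordering applies.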
This should not limit your scalability at all. Just spin up another Cassandra instance whenever processing throughput or contiguous storage become an issue.
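Adding nodes works because Cassandra partitions rows over a token ring (consistent hashing), so a new node takes over only part of the key space instead of forcing a full reshuffle. A minimal Python sketch of that idea with virtual nodes; the node names, the 64 virtual nodes per node, and MD5 as the stable hash are assumptions for illustration, not Cassandra's actual partitioner code.

```python
import bisect
import hashlib
from typing import Iterable, List, Sequence, Tuple


def token(key: str) -> int:
    """Stable 64-bit token; MD5 is only a deterministic stand-in hash here."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring where each node owns several virtual-node tokens."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First token clockwise from the key's token, wrapping around the ring.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys: Sequence[str], before: Ring, after: Ring) -> float:
    """Share of keys whose owning node changes between two ring layouts."""
    moved = sum(before.owner(k) != after.owner(k) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"row-{i}" for i in range(10000)]
    three = Ring(["n1", "n2", "n3"])
    four = Ring(["n1", "n2", "n3", "n4"])
    print(f"{moved_fraction(keys, three, four):.2%} of keys moved")
```

Going from 3 to 4 nodes, roughly a quarter of the keys should move, and every key that moves lands on the new node.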
You might want to check out AppHarbor. AppHarbor is a .NET PaaS built on top of Amazon. It gives users the portability and infrastructure of Amazon, and it provides a number of the rich services that Azure offers, such as background tasks and load balancing, plus some that Azure doesn't, like third-party add-ons, dead-simple deployment and more. They already have add-ons for CouchDB, MongoDB and Redis; if Cassandra got high enough on the requested features, I'm sure they could set it up.
