Replication in CouchDB shows high traffic on Cloudant

I have an application that uses CouchDB as its database. The database (let's call it x.cloudant.com) is hosted on cloudant.com.
During development, I changed the account from x.cloudant.com to y.cloudant.com. After that (it may or may not be related to switching to the new account), I ran into these problems:
The traffic on cloudant.com spiked really high (on one day I had over 700k requests). Mostly they were light HTTP requests (GETs, HEADs). The user agent was CouchDB 1.5 or CouchDB 1.6.
Looking in the CouchDB logs, I see a lot of error messages: HTTP 500 responses, replication canceled since the database was shut down, and failures to connect to the database.
My application still sometimes connects to the old account (x.cloudant.com) even though I switched accounts over a week ago. This means that alongside replicating the data on the new account (y.cloudant.com), it also tries to replicate the data on my old account (x.cloudant.com).
My replication settings are the defaults. I want to reduce the amount of traffic on cloudant.com. Has anyone experienced the same issues? How did you solve it?

Related

Why is Azure MySQL database unresponsive at first

I have recently set up an 'Azure Database for MySQL flexible server' using the burstable tier. The database is queried by a React frontend via a node.js API, each of which runs on its own separate Azure app service.
I've noticed that when I come to the app first thing in the morning, there is a delay before database queries complete. The React app is clearly running when I first come to it, which is serving the html front-end with no delays, but queries to the database do not return any data for maybe 15-30 seconds, like it is warming up. After this initial slow performance though, it then runs with no delays.
The database contains about 10 records at the moment, and 5 tables, so it's tiny.
This delay could conceivably be due to some delay with the node.js server, but as the React server is running on the same type of infrastructure (an app service), configured in the same way, and is immediately available when I go to its URL, I don't think this is the issue. I also have no such delays in my dev environment which runs on my local PC.
I therefore suspect there is some delay with the database server, but I'm not sure how to troubleshoot. Before I dive down that rabbit hole though, I was wondering whether a delay when you first start querying a database (after, say, 12 hours of inactivity) is simply a characteristic of the burstable tier on Azure?
There may be more factors affecting this (see comments from people on my original question), but my solution has been to set two global variables which cache data, improving initial load times. The following should be set to ON in the Azure config:
'innodb_buffer_pool_dump_at_shutdown'
'innodb_buffer_pool_load_at_startup'
This is explained further in the following best practices documentation: https://learn.microsoft.com/en-us/azure/mysql/single-server/concept-performance-best-practices in the section marked 'Use InnoDB buffer pool Warmup'

Beanstalkd / Pheanstalk security issue

I have just started using beanstalkd and pheanstalk and I am curious whether the following situation is a security issue (and if not, why not?):
When designing a queue that will contain jobs for an eventual worker script to pick up and perform SQL database queries, I asked a friend what I could do to prevent an online user from connecting to port 11300 of my server and inserting a job into the queue himself, hence causing the job to be executed with malicious code. I was told that I could include a password inside the job being sent.
After some time passed, though, I realized that someone could perform a few simple commands in a terminal to obtain a job from the queue, find the password, and then create jobs with the password included:
telnet thewebsitesipaddress 11300 //creating a telnet connection
list-tubes //finding which tubes are currently being used
use a_tube_found //using one of the tubes found
peek-ready //see whats inside one of the jobs and find the password
What could be done to make sure this does not happen and my queue doesn't get hacked / controlled?
Thanks in advance!
You can avoid those situations by placing beanstalkd behind a firewall or in a private network.
DigitalOcean (for example) offers such a service where you have a private network IP address which can be accessed only from servers of the same location.
We've been using beanstalkd in our company for more than a year, and we haven't had any of those issues yet.
I see, but what if the producer was a page called index.php, where when someone entered it, a job would be sent to the queue. In this situation, wouldn't the server have to be an open network?
The browser has no way to contact the job server; it only accesses the resources /you/ allow it to, that is, the view page. Only the back-end is allowed to access the job server. Also, if you build the web application so that the front-end is separated from the back-end, you're going to have even fewer potential security issues.
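To check that the firewall or private network setup described above actually hides the queue, one option is to try opening a TCP connection to port 11300 from a machine outside the network. The following is a minimal sketch in Python using only the standard library; the function name `port_is_reachable` is made up for illustration, and the check only tells you whether the port accepts connections from wherever the script runs.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False
```

Calling `port_is_reachable("<your server address>", 11300)` from a machine outside the private network should return False once beanstalkd is properly shielded, while the same call from the back-end server should still return True.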

Azure WebSites / App Service Unexplained 502 errors

We have a stateless (with shared Azure Redis Cache) WebApp that we would like to automatically scale via the Azure auto-scale service. When I activate the auto-scale-out, or even when I activate 3 fixed instances for the WebApp, I get the opposite effect: response times increase exponentially or I get Http 502 errors.
This happens whether I use our configured traffic manager url (which worked fine for months with single instances) or the native url (.azurewebsites.net). Could this have something to do with the traffic manager? If so, where can I find info on this combination (I have searched)? And how do I properly leverage auto-scale with traffic-manager failovers/perf? I have tried putting the traffic manager in both failover and performance mode with no evident effect. I can gladly provide links via private channels.
UPDATE: We have reproduced the situation now the "other way around": On the account where we were getting the frequent 5XX errors, we have removed all load balanced servers (only one server per app now) and the problem disappeared. And, on the other account, we started to balance across 3 servers (no traffic manager configured) and soon got the frequent 502 and 503 show stoppers.
Related hypothesis here: https://ask.auth0.com/t/health-checks-response-with-500-http-status/446/8
Possibly the cause? Any takers?
UPDATE
After reverting all WebApps to single instances to rule out any relationship to load balancing, things ran fine for a while. Then the same "502" behavior reappeared across all servers for a period of approx. 15 min on 04.Jan.16 , then disappeared again.
UPDATE
Problem reoccurred for a period of 10 min at 12.55 UTC/GMT on 08.Jan.16 and then disappeared again after a few min. Checking logfiles now for more info.
UPDATE
Problem reoccurred for a period of 90 min at roughly 11.00 UTC/GMT on 19.Jan.16 also on .scm. page. This is the "reference-client" Web App on the account with a Web App named "dummy1015". "502 - Web server received an invalid response while acting as a gateway or proxy server."
I don't think Traffic Manager is the issue here. Since Traffic Manager works at the DNS level, it cannot be the source of the 5XX errors you are seeing. To confirm, I suggest the following:
Check if the increased response times are coming from the DNS lookup or from the web request.
Introduce Traffic Manager whilst keeping your single instance / non-load-balanced set up, and confirm that the problem does not re-appear
This will help confirm if the issue relates to Traffic Manager or some other aspect of the load-balancing.
Regards,
Jonathan Tuliani
Program Manager
Azure Networking - DNS and Traffic Manager
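The first check suggested above, separating DNS lookup time from web request time, can be approximated with a short script. This is a rough sketch in Python using only the standard library, not an official diagnostic tool; the function names are made up for illustration, and the URL you pass in would be your own Traffic Manager or .azurewebsites.net address.

```python
import socket
import time
import urllib.error
import urllib.request


def time_dns_lookup(hostname: str, port: int = 443) -> float:
    """Return the seconds spent resolving hostname to addresses."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port)
    return time.perf_counter() - start


def time_http_request(url: str, timeout: float = 30.0):
    """Return (status_code, seconds) for one GET request to url.

    urlopen resolves the hostname itself, so the measured time includes a
    lookup, although that lookup may be answered from a local resolver
    cache. HTTP error statuses such as 502 are returned, not raised.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.perf_counter() - start
```

If `time_dns_lookup` stays fast while `time_http_request` is slow or returns 502, the delay is on the web request side rather than in DNS resolution.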

High amount of http read timeouts on azure

When we migrated our apps to azure from rackspace, we saw almost 50% of http requests getting read timeouts.
We tried placing the client both inside and outside azure with the same results. The client in this case is also a server btw, so no geographic/browser issues either.
We even tried increasing the size of the box to ensure azure wasn't throttling. But even using D boxes for a single request, the result was the same.
Once we moved our apps out of azure they started functioning properly again.
Each query was done directly on an instance using a public ip, so no load balancer issues either.
Almost 50% of queries ran into this issue. The timeout was set to 15 minutes.
Region was US East 2
Having 50% of HTTP requests timing out is not normal behavior. This is why you need to analyze what is causing those timeouts by validating that the requests are hitting your VM. For this, I would recommend running a packet capture on your server to analyze response times and look for a high number of retransmissions; it is even better if you can take a simultaneous network trace on your client machines so you can do TCP sequence number analysis and compare packets sent vs received.
If you are seeing high latencies in the packet capture or a high number of retransmissions, it requires detailed analysis. I strongly suggest opening a support incident so Microsoft support can help you investigate your issue further.

Is there a limit on the number of sessions for Azure Web SQL Database?

We are using the Azure SQL Database (Web Edition) for an MVC3 ASP.NET/EF5 application.
Is there a limit to the number of sessions that this SQL Database setup supports? I am just wondering whether any delays that we are getting are due to some form of queuing or pooling. Currently we have about 5 concurrent users.
Thanks.
The SQL Azure Web edition database should support a high number of concurrent users - we've had applications running that issue thousands of queries per minute against Web databases.
Throttling
SQL Azure does implement database throttling to maintain performance for all users of the platform. If throttling has been applied to the current operation you'll receive error 40501. The link I've provided also shows you how to determine why throttling is being applied. If you receive this error you can treat it as a transient error and wait before retrying.
It doesn't sound like your connections are being throttled, because you mention only 5 concurrent users and talk about delays, whereas the throttling error would occur pretty quickly.
Transient error handling
If you're getting connection timeouts etc., you need to handle them as transient errors. Transient errors are timeouts or dropped connections, as well as error codes 10054, 10053, 40501 (throttling as described above) and 40197 (usually because an upgrade or failover operation is in progress).
You should ensure you implement retry logic to handle transient errors.
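As an illustration of what such retry logic might look like, here is a minimal sketch. It is written in Python rather than the .NET stack from the question, and it makes several assumptions: `DatabaseError` is a hypothetical stand-in for whatever exception your database driver raises, `run_with_retry` is a made-up helper name, and exponential backoff with jitter is just one reasonable waiting strategy, not something the platform prescribes.

```python
import random
import time

# Error codes listed above as transient.
TRANSIENT_ERROR_CODES = {10053, 10054, 40197, 40501}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQL error code."""

    def __init__(self, code, message=""):
        super().__init__(message or "SQL error %d" % code)
        self.code = code


def run_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (DatabaseError, TimeoutError, ConnectionError) as err:
            if isinstance(err, DatabaseError):
                transient = err.code in TRANSIENT_ERROR_CODES
            else:
                # Timeouts and dropped connections are treated as transient.
                transient = True
            if not transient or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x ... base_delay, plus jitter to avoid retry bursts.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

A query would then be wrapped as `run_with_retry(lambda: execute_query(...))`, where `execute_query` is whatever function your application uses to run the statement. Non-transient errors are re-raised immediately so that real bugs are not hidden behind retries.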
Query performance
If you're executing long running queries you can check which ones are slow by logging into the database management URL:
https://<database-id>.database.windows.net/#$database=<database-name>
Log in and click "Query Performance" - take a look at the longest running queries at the top.
