Ideal timeout period for a DNS lookup

In my Rails app I do an nslookup using Ruby's resolv library. If a site like dgdfgdfgdfg.com is entered, it takes too long to resolve, in some instances around 20 seconds (mostly for non-existent sites), which slows the application down.
So I thought of introducing a timeout for the DNS lookup.
What would be the ideal timeout so that resolution of real sites doesn't fail? Would something like 10 seconds be fine?

There's no IETF-mandated value, although §6.1.3.3 of RFC 1123 suggests a value not less than 5 seconds.
Perl's Net::DNS and the command line dig utility do default to 5 seconds between retries. Some versions of the Microsoft resolver appear to default to 3 seconds.
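As a rough illustration of putting a hard ceiling on a lookup, here is a minimal sketch in Python (not the Ruby resolv library from the question). The function name resolve_with_timeout, the pool size, and the injectable resolver argument are all assumptions made for the example; the 5 second default simply mirrors the RFC 1123 figure.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool (size is an arbitrary choice for this sketch).
# A lookup that overruns keeps running in the background; its result is discarded.
_POOL = ThreadPoolExecutor(max_workers=8)


def resolve_with_timeout(hostname, timeout_s=5.0, resolver=socket.gethostbyname):
    """Return the resolved address, or None on timeout or lookup failure."""
    future = _POOL.submit(resolver, hostname)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # gave up waiting; treat the name as unresolved
    except OSError:
        return None  # resolver reported a failure, e.g. a non-existent domain
```

The caller never waits longer than timeout_s, although the underlying system lookup may still be finishing on a worker thread.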

You can run some tests with your users to find the right number, trading off responsiveness against reliability.
You can also adjust the timeout dynamically depending on network conditions.
For example, for every successful lookup, record how long it took. Then every hour (for example) calculate the average and set the timeout to double that value (remember that the "average" is, roughly speaking, "the middle"). This way, if latency is high at some point, the timeout adjusts itself upward.
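The averaging idea could be sketched like this in Python. The class name AdaptiveTimeout and the floor and ceiling bounds are assumptions added for the example (the bounds keep one odd hour from producing an extreme timeout); only the "double the average" rule comes from the suggestion above.

```python
from statistics import mean


class AdaptiveTimeout:
    """Keeps a timeout equal to factor x the mean of recent successful lookup times."""

    def __init__(self, initial_s=5.0, floor_s=1.0, ceiling_s=10.0, factor=2.0):
        self.timeout_s = initial_s
        self.floor_s = floor_s        # assumed lower bound, not from the answer
        self.ceiling_s = ceiling_s    # assumed upper bound, not from the answer
        self.factor = factor
        self._samples = []

    def record(self, elapsed_s):
        """Call after every successful lookup with the time it took."""
        self._samples.append(elapsed_s)

    def recalculate(self):
        """Call periodically (e.g. hourly); returns the timeout now in effect."""
        if self._samples:
            candidate = self.factor * mean(self._samples)
            self.timeout_s = min(self.ceiling_s, max(self.floor_s, candidate))
            self._samples.clear()
        return self.timeout_s
```

If no lookups succeeded during the period, the previous timeout is kept unchanged.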

Related

Native Transport Requests All time blocked

Trying to understand more about Native-Transport-Requests!
As we know, these are CQL requests, and if the limit is exceeded the result shows up as "All time blocked" NTR.
My question is: how do I monitor these requests in real time and get some kind of report on them?
I see some settings like max_queued_native_transport_requests and native_transport_max_threads. How do these settings affect the "All time blocked" count?
Have a look at CASSANDRA-11363 in JIRA.
Also check this discussion for more info.
The recommendation is to start with the default values and tune from there. The default values are:
max_queued_native_transport_requests=1024
native_transport_max_threads: 128
Monitor your nodes, and if you see an increasing number of blocked Native-Transport-Requests, then you need to increase max_queued_native_transport_requests.
Also, I think it's worth checking these discussions: 1, 2
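For monitoring, one option is to poll nodetool tpstats and watch the Native-Transport-Requests row. The Python sketch below assumes the usual column order (Active, Pending, Completed, Blocked, All time blocked), which may differ between Cassandra versions; the function names and the sample numbers are made up for illustration.

```python
import subprocess

# Made-up sample in the assumed `nodetool tpstats` layout.
SAMPLE = """\
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0         913021         0                 0
Native-Transport-Requests              1         0        4518273         0               312
"""


def parse_ntr_stats(tpstats_text):
    """Extract the Native-Transport-Requests counters, or None if the row is absent."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "Native-Transport-Requests":
            active, pending, completed, blocked, all_time_blocked = map(int, fields[1:6])
            return {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
    return None


def read_tpstats():
    """Run nodetool on the local node and return its raw output."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling parse_ntr_stats(read_tpstats()) on a schedule and comparing all_time_blocked between samples shows whether the count is still growing.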

Instagram rate-limit header with no predictable value

According to the documentation: https://www.instagram.com/developer/limits/
The rate-limit control works under a "time-sliding" window. The question is:
How often does the remaining-calls HTTP header (x-ratelimit-remaining) increase: every few seconds, every few minutes, or once an hour?
The docs say "5000/hr per token for Live apps" (our company app has already gone Live), so I assumed a frequency limiter recalculated each second or minute, but after several days of trying different strategies the value doesn't seem to follow any deducible behaviour.
Possible answers (depending on how it is coded) could be:
(a sliding window, like a frequency limiter)
it gains 1 credit for every 720 ms (3600 s (1 hr) / 5000 calls) that passes without a request, until reaching 5000; otherwise it decays toward 0.
If we make 1 request at exactly that frequency, we should never exhaust the 5000 calls, so we could spend them strategically: dispersed, clustered, or traffic-adapted.
(a limited sink recharging each hour)
starting with 5000 remaining, it decays 1 credit per request, regardless of frequency; 1 hour after that first request it goes back to 5000
it renews to 5000 every hour, counted from when the token was used for the first request.
it decays 1 credit per request and goes back to 5000 at a fixed clock hour, such as 12:00, 13:00, 14:00, 15:00...
I'm using jInstagram 1.1.7.
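To make the two hypotheses concrete, here is a small Python simulation of each. Both classes are illustrative models of the guesses above, not Instagram's actual implementation; the names RefillingBucket and HourlyReset are made up, and HourlyReset models the variant that resets one hour after the first request of the window.

```python
class RefillingBucket:
    """Hypothesis (a): one credit comes back every period/limit seconds (720 ms)."""

    def __init__(self, limit=5000, period_s=3600.0):
        self.limit = limit
        self.refill_interval_s = period_s / limit
        self.remaining = float(limit)
        self.last_t = 0.0

    def request(self, now):
        """Spend one credit at time `now` (seconds); return the remaining count or None."""
        elapsed = now - self.last_t
        refilled = self.remaining + elapsed / self.refill_interval_s
        self.remaining = min(self.limit, refilled)
        self.last_t = now
        if self.remaining < 1:
            return None  # rate limited
        self.remaining -= 1
        return int(self.remaining)


class HourlyReset:
    """Hypothesis (b): the counter jumps back to the limit one period after the
    first request of the current window."""

    def __init__(self, limit=5000, period_s=3600.0):
        self.limit = limit
        self.period_s = period_s
        self.window_start = None
        self.remaining = limit

    def request(self, now):
        if self.window_start is None or now - self.window_start >= self.period_s:
            self.window_start = now
            self.remaining = self.limit
        if self.remaining == 0:
            return None  # rate limited
        self.remaining -= 1
        return self.remaining
```

Replaying the same request timestamps through both models and comparing the output with the observed x-ratelimit-remaining values is one way to see which hypothesis, if either, fits.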
After a lot of testing...
I have some temporary conclusions...
Starting from 5000, if you fetch at a uniform rate (720 ms/req) you will reach about 500 at around minute 50; then Instagram begins to give you credit back in portions smaller than 500. So at minute 60 you'll have about 150 remaining calls, Instagram gives you another portion of credit, generally bringing you back to about 500 on average, and then it goes down again, of course...
If you stop consuming for approximately 30 minutes, you get back to 5000 credits.
Also, the 5000 remaining calls seem to be counted per IP: if you make requests from different IPs with the same credential, each counter acts as if the others did not exist.
Besides that, Instagram often fails to keep a consistent value in the x-ratelimit-remaining HTTP header it returns on every HTTP response.
It looks related to some overwriting, and some kind of race between the servers replicating the last value.
Shame on you, Instagram: I spent a lot of time adapting my cool throttling algorithm to your buggy behaviour, assuming you had good engineering down there!
Please fix these issues so we can play fair with you instead of playing hide and seek and using stealth tricks.

What does BandwidthIn and BandwidthOut graph represent for a service?

I have a service and its bandwidth graph looks like this
What does it represent? I am using Tutum, which shows me these graphs.
Should I worry about it? Please explain! Any help is appreciated.
Bandwidth is the amount of data sent (Out) or received (In) in a period of time. Mbps stands for megabits per second, i.e., how many megabits you sent or received during the past second.
You have probably heard about "xxx Mbps" from your internet provider; in that case it corresponds to the maximum speed you can have, but you are not required to use the whole bandwidth all the time.
It is the same on Tutum: depending on your hosting provider / instance type you will also have a maximum bandwidth in Mbps, but at any given time t you are using YY Mbps out of your XX Mbps maximum.
As the graph rises, it simply means that you are sending/receiving more data, which can mean that you have higher traffic or are doing some kind of networking activity.
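The arithmetic behind an Mbps figure can be sketched in a few lines of Python. The function names and the 100 Mbps link in the usage note are hypothetical, and the sketch assumes the decimal convention of 1 megabit = 1,000,000 bits.

```python
def mbps(bytes_transferred, interval_s):
    """Average rate in megabits per second over the interval."""
    bits = bytes_transferred * 8  # 8 bits per byte
    return bits / interval_s / 1_000_000


def utilisation(used_mbps, max_mbps):
    """Fraction of the available bandwidth in use (0.0 to 1.0)."""
    return used_mbps / max_mbps
```

For example, sending 1,250,000 bytes in one second is mbps(1_250_000, 1) = 10.0 Mbps, which on a hypothetical 100 Mbps link is a utilisation of 0.1.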

Give reads priority over writes in Elasticsearch

I have an EC2 server running Elasticsearch 0.9 with a nginx server for read/write access. My index has about 750k small-medium documents. I have a pretty continuous stream of minimal writes (mainly updates) to the content. The speeds/consistency I receive with search is fine with me, but I have some sporadic timeout issues with multi-get (/_mget).
On some pages in my app, our server will request a multi-get of a dozen to a few thousand documents (this usually takes less than 1-2 seconds). The requests that fail, fail with a 30,000 millisecond timeout from the nginx server. I am assuming this happens because the index was temporarily locked for writing/optimizing purposes. Does anyone have any ideas on what I can do here?
A temporary solution would be to lower the timeout and return a user friendly message saying documents couldn't be retrieved (however they still would have to wait ~10 seconds to see an error message).
Some of my other thoughts were to give read priority over writes. Anytime someone is trying to read a part of the index, don't allow any writes/locks to that section. I don't think this would be scalable and it may not even be possible?
Finally, I was thinking I could have a read-only alias and a write-only alias. I can figure out how to set this up through the documentation, but I am not sure if it will actually work like I expect it to (and I'm not sure how I can reliably test it in a local environment). If I set up aliases like this, would the read-only alias still have moments where the index was locked due to information being written through the write-only alias?
I'm sure someone else has come across this before. What is the typical solution to make sure a user can always read data from the index with higher priority than writes? I would consider increasing our server power, if required. Currently we have 2 m2x-large EC2 instances. One is the primary and the other is the replica, each with 4 shards.
An example dump of cURL info from a failed request (with an error of Operation timed out after 30000 milliseconds with 0 bytes received):
{
  "url":"127.0.0.1:9200\/_mget",
  "content_type":null,
  "http_code":100,
  "header_size":25,
  "request_size":221,
  "filetime":-1,
  "ssl_verify_result":0,
  "redirect_count":0,
  "total_time":30.391506,
  "namelookup_time":7.5e-5,
  "connect_time":0.0593,
  "pretransfer_time":0.059303,
  "size_upload":167002,
  "size_download":0,
  "speed_download":0,
  "speed_upload":5495,
  "download_content_length":-1,
  "upload_content_length":167002,
  "starttransfer_time":0.119166,
  "redirect_time":0,
  "certinfo":[],
  "primary_ip":"127.0.0.1",
  "redirect_url":""
}
After more monitoring using the Paramedic plugin, I noticed that I would get timeouts when my CPU would hit ~80-98% (no obvious spikes in indexing/searching traffic). I finally stumbled across a helpful thread on the Elasticsearch forum. It seems this happens when the index is doing a refresh and large merges are occurring.
Merges can be throttled at the cluster or index level, and I've lowered indices.store.throttle.max_bytes_per_sec from the default 20mb to 5mb. This can be done at runtime with the cluster update settings API.
PUT /_cluster/settings HTTP/1.1
Host: 127.0.0.1:9200

{
  "persistent" : {
    "indices.store.throttle.max_bytes_per_sec" : "5mb"
  }
}
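The same request could be built from a script. This is a minimal Python sketch using only the standard library; the function name build_throttle_request and the Content-Type header are assumptions, and the setting name applies to the old Elasticsearch release discussed here (newer releases may not accept it).

```python
import json
import urllib.request


def build_throttle_request(rate="5mb", base_url="http://127.0.0.1:9200"):
    """Build (but do not send) the cluster-settings update shown above."""
    body = {"persistent": {"indices.store.throttle.max_bytes_per_sec": rate}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# To apply it against a running node:
#   with urllib.request.urlopen(build_throttle_request(), timeout=10) as resp:
#       print(resp.read().decode("utf-8"))
```

Keeping the build step separate from the send step makes it easy to inspect the payload before touching a live cluster.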
So far Paramedic is showing a decrease in CPU usage, from an average of ~5-25% down to an average of ~1-5%. Hopefully this helps me avoid the 90%+ spikes that were locking up my queries before. I'll report back by selecting this answer if I don't have any more problems.
As a side note, I guess I could have opted for more balanced EC2 instances (rather than memory-optimized). I think I'm happy with my current choice, but my next purchase will also take more CPU into account.

Average internet delay

Just wondering, what is the average packet transmission delay between two hosts over the internet (ignoring packet loss and retransmission)?
Now, hang on a second before you write that it's too general and depends on too many factors (location of the two hosts, network workload at a specific time, just to name a few). I'm aware of that.
Yet, that's why I'm asking what the AVERAGE delay might be. There must be some record of that.
Maybe it's appropriate to ask for separate countrywide/continentwide/intercontinental average values, too. Whatever makes sense.
However you ask it, this question is WAAAYYYY too general. Ping times can give you a reasonable approximation, though. My avg to a google host:
round-trip (ms) min/avg/max/med = 20/23/37/21
Yahoo:
round-trip (ms) min/avg/max/med = 19/23/38/23
Baidu (China):
round-trip (ms) min/avg/max/med = 269/272/275/272
Pair (Pittsburgh):
round-trip (ms) min/avg/max/med = 63/66/73/67
Google and Y! are using content-distribution networks, so I am most likely hitting servers very nearby. Baidu is across the world from me. Pair is across the country. These are all from a relatively fast connection.
I'd expect a dialup user to see figures that are approximately 100-200 ms higher (depending on network activity at the time). Similarly, my figures would increase significantly if my network were heavily loaded (it's not at the moment).
Does that help at all?
You may find the discussion of this stuff at this page interesting. The author argues that traffic travels at about half the speed of light (the speed of light being the best you can possibly do for traffic speed, assuming various scientists are right).
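Taking the "half the speed of light" figure at face value gives a lower bound on delay from distance alone. A minimal Python sketch follows; the function names and the 10,000 km example distance are assumptions, and real paths add routing, queuing, and processing time on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km per second


def one_way_delay_ms(distance_km, fraction_of_c=0.5):
    """Propagation delay in milliseconds at the given fraction of light speed."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * fraction_of_c) * 1000


def round_trip_ms(distance_km, fraction_of_c=0.5):
    """Round-trip propagation delay, comparable to a ping time's lower bound."""
    return 2 * one_way_delay_ms(distance_km, fraction_of_c)
```

For a path of 10,000 km this gives roughly 67 ms one way and about 133 ms round trip, before any per-hop overhead.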
