Number of Tor relays

I recently came across https://metrics.torproject.org/relayflags.html and noticed a spike on 13/01/19.
According to the graph, almost half of the relays in the network were offline that day.
Do you have any idea how this is possible?

Related

Websockets severe delay/latency

We're using websockets (specifically uws.js on Node.js) to run a multiplayer quiz. The server is running on an AWS t2.micro in eu-west-2 (availability zone eu-west-2a), but recently we've been seeing some incredibly high latency from some players, though only intermittently.
By latency, what I'm actually measuring is the time from sending a broadcast message (using uws's built-in pub/sub) to the player's device telling the server it has safely received it. The message we're tracking tells the player's device to move on to the next phase of the quiz, so it's pretty critical to the workings of the application. Most of the time, for players in the UK, this is somewhere around 30-60 ms, but every now and then we're seeing delays of up to 17 seconds.
Recently, we had a group on the other side of the world from our server do a quiz, and even though there were only about 10 players, so the server was definitely not overloaded, roughly half of them intermittently hit these very high latency spikes, where it took 12, 17, 22, or even 39(!) seconds for their device to acknowledge the message. Even though this is a slow-paced quiz, that's still an incredible amount of latency, and something that has a real detrimental effect.
My question is, how do I tell what's causing this, and how do I fix it? My guess would be it's something to do with TCP and its in-order delivery, combined with some perhaps dodgy internet connections, as one of the players yesterday seemed to receive nothing for 39 seconds, then three messages all in a row, backed up. To me that suggests packet loss, but I don't know where to even begin when trying to resolve it. I also haven't yet figured out how to reproduce it (I've never seen it happen when I've been playing myself), which makes things even harder.
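As an aside, a minimal sketch of this kind of ack-based latency measurement with uWebSockets.js pub/sub might look like the following; the topic name, JSON message shape, and ack protocol are assumptions for illustration, not the asker's actual code, and a real version would also key pending entries per player:

    const uWS = require('uWebSockets.js');

    const pending = new Map(); // broadcast id -> timestamp it was published

    const app = uWS.App().ws('/*', {
      open: (ws) => ws.subscribe('quiz'),
      message: (ws, raw) => {
        const msg = JSON.parse(Buffer.from(raw).toString());
        if (msg.type === 'ack' && pending.has(msg.id)) {
          // Delivery latency for this client; the intermittent spikes show up here.
          console.log(`ack for ${msg.id} after ${Date.now() - pending.get(msg.id)} ms`);
        }
      }
    });

    function broadcastPhaseChange(id) {
      pending.set(id, Date.now());
      app.publish('quiz', JSON.stringify({ type: 'phase', id }));
    }

    app.listen(9001, (token) => token && broadcastPhaseChange(1));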
TCP routing issues are unlikely to cause extreme delays of 17+ seconds. Are you sure there is no "store and forward" queuing system buffering your messages on a server, or perhaps a cloud pub/sub queue?
Another important check: the t2.micro is just about the cheapest VM you can boot on AWS, with the least-assured network QoS: no throughput and no jitter guarantees on network performance.
You may wish to review:
AWS's EC2 network benchmarking guide, including MTU parameters
Available instance bandwidth for EC2
The t2.micro, for example, has no assured baseline bandwidth.

How to catch/record a "Burst" in HAProxy and/or NodeJS traffic

We have a real-time service that gets binary messages from different sources (internal and external); using a couple of NodeJS instances and one HAProxy instance configured to route TCP traffic, we deliver them to our end users and to the various services that consume the messages. HAProxy is version 1.8.14 and NodeJS is 6.14.3, both hosted on a CentOS 7 machine.
Now we've got a complex problem with "bursts" on the outbound interface of the HAProxy instance. We are not sure whether the bursts are real (e.g. some messages get stuck in Node and then the network gets flooded with them) or whether the problem is some kind of misconfiguration or an indirect effect of some other service (the latter two are more likely, as we sometimes get these bursts around midnight, when we have minimal to zero load).
The issue is annoying right now, but it might become critical, as it floods our outbound traffic and our real-time services experience lag or a small downtime during working hours.
My question is, how can we track and record the nature or content of these messages with minimum overhead? Reading through the HAProxy docs, this looks achievable via a Unix socket, but we are worried about a couple of things:
How much overhead does using this socket add?
Can we track what is going on in the servers through this socket, or does it only give us stats?
Is there a way to "catch/echo" the contents of these messages, or find out some information about them, with minimum overhead?
Please let me know if you have any questions regarding this problem.
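On overhead: the stats socket is cheap to poll, since HAProxy is already maintaining the counters it reports. "show stat" returns CSV counters (including bin/bout byte totals), not message contents, so for actual payloads you would still need a packet capture. A minimal sketch of polling the socket from Node to turn the byte counters into per-interval rates, assuming a "stats socket /var/run/haproxy.sock" line in the global section of haproxy.cfg:

    const net = require('net');

    let lastBout = null;

    function poll() {
      const sock = net.connect('/var/run/haproxy.sock');
      let csv = '';
      sock.on('connect', () => sock.write('show stat\n'));
      sock.on('data', (chunk) => { csv += chunk; });
      sock.on('end', () => {
        // Sum the "bout" (bytes out) column, field 9 of the stats CSV.
        const bout = csv.trim().split('\n').slice(1)
          .reduce((sum, line) => sum + (Number(line.split(',')[9]) || 0), 0);
        if (lastBout !== null) console.log(`bytes out last interval: ${bout - lastBout}`);
        lastBout = bout;
      });
    }

    setInterval(poll, 1000); // a one-second delta makes bursts stand out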

Azure VM outbound HTTP is unreliable

I have set up an Azure VM and installed a monitoring service that reaches out to various endpoints to verify a 200 response. The service cycles through about 8 URL endpoints every 5 minutes or so.
We have run this service from multiple other servers outside of Azure, including virtual machines that are cheap, low end offerings.
While this machine is running on the lowest-end A0 size, it isn't doing anything other than running this service and calling out to the various endpoints.
We are getting intermittent periods where one of the calls in the list fails for stretches of 10-40 minutes, at random times, several times a day.
The site or sites that fail are totally random, and there is no downtime reported from other monitoring locations. We are sure the connection problem is between Azure and the endpoints outside Azure; there is no problem from anywhere outside Azure.
I'm trying to figure out what could be causing this issue. It concerns me because we will soon be adding more services to Azure that make outbound HTTP calls for credit card authorization and other APIs.
Is this a known issue where outbound calls just don't function reliably at periods, or am I missing something in the setup or security settings?
Obviously, if the call makes it out and the response doesn't make it back, that is even worse as credit card charges would end up being pushed and the application would not register the proper response.
Any experience or insight would be greatly appreciated.
Thanks!
I find that very disturbing and hard to believe since, among a lot of other stuff, I run a service like that too... In my case I reach out to several (today, about 70) external addresses over both IPv4 and IPv6. I don't run A0s, and most of my machines are A3s. I'll start an A0 to test it... if anything turns up, I'll be back to report...
I know there are several limitations regarding network traffic, but I don't think you can hit them the way you're reporting...
My suggestion is to report the problem directly to MS via a support ticket... most likely the problem is on the other side...
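One way to narrow this down from inside the VM is to time each outbound call and log the failure mode, so a request that never connected is distinguishable from one that connected but got no response (the scenario worried about above). A minimal sketch, with a placeholder URL standing in for one of the monitored endpoints:

    const https = require('https');

    function check(url) {
      const t0 = Date.now();
      const req = https.get(url, { timeout: 10000 }, (res) => {
        res.resume(); // drain the body; we only care about status and timing
        res.on('end', () =>
          console.log(`${url} -> ${res.statusCode} in ${Date.now() - t0} ms`));
      });
      req.on('timeout', () => req.destroy(new Error('timeout')));
      // The error code hints at the stage: EAI_AGAIN/ECONNREFUSED means the
      // request never got going; ECONNRESET or a timeout after connecting
      // means it may have left but the response never came back.
      req.on('error', (e) =>
        console.log(`${url} failed (${e.message}) after ${Date.now() - t0} ms`));
    }

    check('https://example.com/health');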

What are the most important statistics to look at when deploying a Node.js web-application?

First, a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding up ideas. I started working on web-application programming just over a year ago and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. Now I have a Node.js app that I've been developing for some time and that is running in production on the web. My main confusion stems from the fact that I am very new to the world of web development and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
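On question three in particular, the most Node-specific number to watch is event-loop lag, since a blocked single thread delays every client at once. A minimal, self-contained probe (the 500 ms interval and 50 ms threshold are arbitrary choices):

    // A timer that should fire every 500 ms is late exactly when the event
    // loop is blocked by CPU-bound work or starved by garbage collection.
    let last = Date.now();
    setInterval(() => {
      const now = Date.now();
      const lag = now - last - 500; // how late this tick fired
      if (lag > 50) console.warn(`event loop lag: ${lag} ms`);
      last = now;
    }, 500);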
We have been running Node.js in production for nearly a year, starting from the 0.4 series and currently on 0.8. The web app is Express 2 and 3 based, with Mongo, Redis, and Memcached.
A few facts:
Node cannot handle a large V8 heap; when it grows past 200 MB you will start seeing increased CPU usage.
Node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, as V8 profiling and Valgrind show no leaks in JS space or in the resident heap. Early 0.8 was awful in this respect: RSS could be 1 GB with a 50 MB heap.
Hanging requests are hard to track. We wrote our own middleware to monitor them, especially as our app is long-poll based.
My suggestions:
Use multiple instances per machine, at least one per CPU. Balance with HAProxy, nginx, or similar, with session affinity.
Write middleware to report hung connections, i.e. ones the code never responded to or whose latency was over a threshold.
Restart instances often, at least weekly.
Write a poller that prints out memory stats from the process module once per minute (a sketch follows this list).
Use supervisord and Fabric for easy process management.
Monitor CPU and the reported memory stats, and restart on a threshold.
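A minimal sketch of the once-per-minute memory poller suggested above, using nothing but process.memoryUsage():

    // Print memory stats every minute so heap growth and RSS/heap divergence
    // (the fragmentation symptom described above) are visible in the logs.
    setInterval(() => {
      const m = process.memoryUsage();
      console.log(
        `rss=${(m.rss / 1048576).toFixed(1)}MB ` +
        `heapUsed=${(m.heapUsed / 1048576).toFixed(1)}MB ` +
        `heapTotal=${(m.heapTotal / 1048576).toFixed(1)}MB`
      );
    }, 60 * 1000);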
Whatever the type of web app, Node.js or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN the page load time begins to increase as the number of concurrent users increases. A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website and what the month-to-month growth in visits is, which helps in predicting future visits and therefore the expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 developer tools available in all browsers. Open your website in any browser and press F12 (or Ctrl+Shift+I in Opera), which should open the browser's developer tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it works out of the box. Go to the Net or Network tab and refresh your page. This will show you the number of HTTP requests and the bandwidth usage per page load.
So the formulas to work out daily server load are simple:
HTTP requests per page load × average page loads per user per day × expected number of daily users = total HTTP requests to the server per day
And:
MB transferred per page load × average page loads per user per day × expected number of daily users = total bandwidth required per day
I've always found it easier to calculate these figures on a daily basis and then extrapolate them to weeks and months.
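As a worked example of the two formulas, with made-up numbers:

    // All inputs are hypothetical; read yours from the Network tab and
    // your analytics package as described above.
    const requestsPerPageLoad = 25;
    const mbPerPageLoad = 1.5;
    const pageLoadsPerUserPerDay = 4;
    const dailyUsers = 2000;

    const requestsPerDay = requestsPerPageLoad * pageLoadsPerUserPerDay * dailyUsers;
    const mbPerDay = mbPerPageLoad * pageLoadsPerUserPerDay * dailyUsers;

    console.log(`${requestsPerDay} requests/day, ${mbPerDay} MB/day`);
    // -> 200000 requests/day, 12000 MB/day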
Node.js is single threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart dead workers and to detect unresponsive ones; a sketch follows.
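A minimal sketch of that per-CPU setup with the built-in cluster module: fork one worker per core and replace any worker that dies (the trivial HTTP handler is just a stand-in for the real app):

    const cluster = require('cluster');
    const os = require('os');
    const http = require('http');

    if (cluster.isMaster) { // cluster.isPrimary on Node >= 16
      os.cpus().forEach(() => cluster.fork());
      cluster.on('exit', (worker, code) => {
        console.warn(`worker ${worker.process.pid} died (${code}), restarting`);
        cluster.fork(); // replace dead workers automatically
      });
    } else {
      http.createServer((req, res) => res.end('ok\n')).listen(3000);
    }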
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and its node-nodefly-gcinfo module are great tools for monitoring Node processes.

Collecting high volume DNS stats with libpcap

I am considering writing an application to monitor DNS requests for approximately 200,000 developer and test machines. Libpcap sounds like a good starting point, but before I dive in I was hoping for feedback.
This is what the application needs to do:
Inspect all DNS packets.
Keep aggregate statistics, including:
DNS name.
DNS record type.
Associated IP(s).
Requesting IP.
Count.
If the number of requesting IPs for a DNS name exceeds 10, stop keeping client IPs for it.
The stats would ideally be kept in memory, with disk writes only when a new, "suspicious" DNS exchange occurs, or every few hours to write the new stats out for consumption by other processes; a sketch of the aggregation logic follows.
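A minimal sketch of that aggregation logic, written in JavaScript purely for illustration (the capture loop itself would sit on libpcap in C); the parsed-record field names are assumptions:

    const CLIENT_IP_CAP = 10;
    const stats = new Map(); // "name|type" -> aggregate entry

    function record({ name, type, answers, clientIp }) {
      const key = `${name}|${type}`;
      let e = stats.get(key);
      if (!e) {
        e = { name, type, answers: new Set(), clients: new Set(), count: 0 };
        stats.set(key, e);
      }
      e.count++;
      answers.forEach((ip) => e.answers.add(ip)); // associated IP(s)
      if (e.clients !== null) {
        e.clients.add(clientIp);
        // Past the cap, stop keeping requesting IPs for this name.
        if (e.clients.size > CLIENT_IP_CAP) e.clients = null;
      }
    }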
My questions are:
1. Do any applications exist that can do this? The link will be either 100 Mbit/s or 1 Gbit/s.
2. Performance is the #1 consideration by a large margin. I have experience writing C for other one-off security applications, but I am not an expert. Do any tips come to mind?
3. How much of an effort would this be for a good C developer, in man-hours?
Thanks!
Jason
I suggest you try something like dnscap or even Snort for capturing DNS traffic.
BTW, I think this is more of a superuser.com question than a Stack Overflow one.
