Synthetic performance A/B test - frontend

I have deployed two versions of our single-page web app: master (A) and a branch with changes that could affect load time somehow (B). The change is usually a new front-end feature, a refactoring, a small performance optimization, and so on. The difference is not big, and load time varies far more for other reasons (load on the testing machines, load on the servers, network conditions, etc.). So webpagetest.org, even with 9 runs, varies much more (14-20 s SpeedIndex) than the real difference could plausibly be (0.5 s on average, for example).
Basically, I need one number that tells me whether this feature increases or decreases load time.
Is there a tool that can measure differences this small?
My idea was to deploy WebPageTest on a server with minimal load and run it against both versions at the same time, in random order, so that most of the noise cancels out; then collect a lot of samples (1000+) and compare the average (or median) values.
But before I start working on that, I would like to ask whether some service already solves this problem.
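If you do end up collecting your own samples, the comparison itself is straightforward. Below is a minimal sketch (plain Python with made-up numbers; no WebPageTest API is assumed) that bootstraps a confidence interval for the difference of medians; if the interval excludes zero, branch B most likely changed load time:

    import random
    import statistics

    # SpeedIndex samples (seconds) collected by alternating runs
    # against deployment A (master) and deployment B (branch).
    def median_diff_ci(a_samples, b_samples, n_boot=10_000, alpha=0.05):
        """Bootstrap a confidence interval for median(B) - median(A)."""
        diffs = []
        for _ in range(n_boot):
            a = random.choices(a_samples, k=len(a_samples))
            b = random.choices(b_samples, k=len(b_samples))
            diffs.append(statistics.median(b) - statistics.median(a))
        diffs.sort()
        lo = diffs[int(n_boot * alpha / 2)]
        hi = diffs[int(n_boot * (1 - alpha / 2))]
        return statistics.median(diffs), (lo, hi)

    # Made-up numbers for illustration only.
    a = [14.2, 16.8, 15.1, 19.3, 14.9, 17.0, 15.5]
    b = [15.0, 17.5, 15.8, 19.9, 15.3, 17.6, 16.2]
    est, (lo, hi) = median_diff_ci(a, b)
    print(f"median difference: {est:+.2f}s, 95% CI [{lo:+.2f}, {hi:+.2f}]")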

Related


Space Issues in a filesystem on Linux
Let's call it FILESYSTEM1. Normally, FILESYSTEM1 is only about 40-50% used, but clients sometimes run reports or queries that produce massive files, about 4-5 GB in size, and these instantly fill FILESYSTEM1 up.
We have some cleanup scripts in place, but they never catch this because it happens in a matter of minutes, while the cleanup scripts only remove data that is more than 5-7 days old.
Another set of scripts reports when free space in a filesystem drops below a certain threshold.
We thought of a possible way to detect and act on this proactively:
1. Increase the FILESYSTEM1 filesystem to double its size.
2. Set the threshold in the alert scripts for this filesystem to fire at 50% full.
This should hopefully give us enough time to catch the problem and act before clients report issues due to FILESYSTEM1 being full. Even though this solution works, it does not seem to be the best way to deal with the situation.
Any suggestions / comments / solutions are welcome.
Thanks.
It sounds like what you've found is that simple threshold-based monitoring doesn't work well for the usage patterns you're dealing with. I'd suggest something that pairs high-frequency sampling (say, once a minute) with a monitoring tool that can do some kind of regression on your data to predict when space will run out.
In addition to knowing when you've already run out of space, you also need to know whether you're about to run out of space. Several tools can do this, or you can write your own. One existing tool is Zabbix, which has predictive trigger functions that can be used to alert when file system usage seems likely to cross a threshold within a certain period of time. This may be useful in reacting to rapid changes that, left unchecked, would fill the file system.
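As a rough illustration of the sampling-plus-regression idea (this is not Zabbix itself; the mount point, window size, and alert threshold below are made up), here is a small Python sketch that polls the filesystem once a minute and projects when it will fill:

    import os
    import time

    MOUNT = "/data"   # hypothetical mount point for FILESYSTEM1
    WINDOW = 15       # regress over the last 15 one-minute samples
    samples = []      # list of (timestamp, bytes_used)

    def bytes_used(path):
        st = os.statvfs(path)
        return (st.f_blocks - st.f_bfree) * st.f_frsize

    def minutes_until_full(samples, capacity):
        # Ordinary least-squares slope of used bytes over time.
        n = len(samples)
        ts = [t for t, _ in samples]
        us = [u for _, u in samples]
        t_mean, u_mean = sum(ts) / n, sum(us) / n
        denom = sum((t - t_mean) ** 2 for t in ts)
        if denom == 0:
            return None
        slope = sum((t - t_mean) * (u - u_mean)
                    for t, u in zip(ts, us)) / denom
        if slope <= 0:
            return None  # usage flat or shrinking
        return (capacity - us[-1]) / slope / 60

    while True:
        st = os.statvfs(MOUNT)
        samples.append((time.time(), bytes_used(MOUNT)))
        samples = samples[-WINDOW:]
        if len(samples) >= 3:
            eta = minutes_until_full(samples, st.f_blocks * st.f_frsize)
            if eta is not None and eta < 30:
                print(f"ALERT: {MOUNT} projected full in {eta:.0f} min")
        time.sleep(60)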

Threads in application server and connections in HttpServer

I'm building a system that will use two servers and one load balancer.
The company has more than 60,000 users and expects 10,000 concurrent users, and all transactions will occur within 5 seconds.
I'm not sure how to size the following for each server:
Number of connections in the HttpServer
Number of threads in the application server
I understand that I will find out these numbers once the system is in production, but I need to start with something.
Any suggestions or advice?
This is about capacity planning. I can give some suggestions, as below; however, everything depends on the technical feasibility and business requirements of your system.
Try to find out the maximum capacity you need to support; do the required stress tests to figure this out.
Make sure the system is capable of improving performance horizontally, with clustering etc.
Decide on a predicted capacity requirement (CR); the CR may cover hardware, bandwidth, etc.
Predicted CR = Current CR + 30% * Current CR
Finally, this is about continuous improvement, so keep an eye on the changes.
Check how reliable the system is, and decide on changes to hardware, software, architecture, etc.
Hope this adds some value.
Set up a test server and extrapolate the numbers from there (take some time to do the research and come up with an educated guess).
For example, the number of threads in the application server depends on what kind of HTTP server you use. Some servers can process hundreds of connections with a single thread, while badly programmed or configured servers might end up using one thread per connection.
The performance requirement "all transactions will occur within 5 seconds" needs further detailing. Showing web pages with data (from a database) to a user should not take more than 3 seconds (users get irritated if it takes longer), and ideally should take less than 1 second (what an average user expects). On the other hand, it might be acceptable to take 10 seconds to store data from a complex form (as long as the form is not used too often).
I would be skeptical about the load requirement "they expect 10,000 concurrent users". That would mean 1 in 6 company employees is actively using the web application at the same time. I suspect this is peak usage rather than average usage. The distinction matters for both the performance requirements and the costs: if the system must meet the performance requirements during peak usage, you need more money for better and/or more hardware.
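To make these numbers concrete, here is a hedged back-of-the-envelope calculation using Little's Law (concurrency = arrival rate x time in system), plugging in the figures from the question; everything below is an assumption, not a measurement:

    # All inputs are assumptions taken from the question, not measurements.
    concurrent_users = 10_000   # expected peak concurrency
    max_latency_s = 5.0         # "all transactions will occur within 5 seconds"
    servers = 2                 # two servers behind the load balancer

    # Little's Law: concurrency = arrival rate * time in system,
    # so arrival rate = concurrency / time in system.
    arrival_rate = concurrent_users / max_latency_s   # requests/sec, cluster-wide
    per_server_rate = arrival_rate / servers          # requests/sec each
    per_server_conns = concurrent_users / servers     # open connections each

    print(f"cluster must sustain ~{arrival_rate:.0f} requests/sec")
    print(f"each server: ~{per_server_rate:.0f} req/s, "
          f"~{per_server_conns:.0f} open connections")
    # 5000 open connections per server rules out naive thread-per-connection
    # sizing and argues for an event-driven server or a bounded thread pool.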

Should I go for faster queries or less CPU consumption / faster processing?

I have to choose between querying X amount of data and doing no processing, just sending it to the client, OR querying half of X and doing a little processing before sending it to the client.
Now, in my life as a programmer I have met the storage-vs-speed trade-off quite a bit, but in this case I have to choose between "fast query + processing" and "slow query + no processing".
If it matters, I am using Node.js for the server and MongoDB for the database.
If you care about the details: I am holding non-intersecting map areas and testing whether a new area intersects any of them. All areas are boxes of the same size. If I keep each one as an origin point, it's only one pair of coordinates, but I have to expand the point into an area at query time. If I store each one as an area directly, I don't have to process it anymore, but it's now 4 pairs of coordinates: 4 times the size and, I think, a slower query.
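For what it's worth, here is a quick sketch (plain Python, illustrative names and sizes) of the "store one point, expand to a box at query time" option, showing how cheap the processing half of the trade-off is:

    # Illustrative names/sizes: every map area is an axis-aligned box of
    # the same fixed dimensions, stored only as its origin (min corner).
    AREA_W, AREA_H = 100.0, 100.0

    def to_box(origin):
        """Expand a stored origin point into (min_x, min_y, max_x, max_y)."""
        x, y = origin
        return (x, y, x + AREA_W, y + AREA_H)

    def intersects(a, b):
        """Axis-aligned box overlap: four comparisons, essentially free."""
        return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]

    stored_origins = [(0.0, 0.0), (250.0, 40.0)]   # what the DB would hold
    candidate = to_box((90.0, 50.0))
    print(any(intersects(candidate, to_box(o)) for o in stored_origins))  # True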
There is no right answer to this question; it all depends on your infrastructure. If you are, for example, using Amazon Web Services for this, it depends on the transaction price. If you have your own infrastructure, it depends on the load of the DB and web servers. If they are on the same server, it's a matter of the underlying hardware whether the I/O from the DB starts to limit before the CPU/memory become the bottleneck.
The only way to determine the right answer for your specific situation is to set it up and do a stress test, for example using Load Impact or one of the many other good tools for this. While the system is getting hammered, monitor your load using top and watch the wa (I/O wait) column specifically: if it consistently goes above 50%, you are I/O-limited, and you should shift work off the database and onto the CPU (i.e., store less and compute more).
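If you would rather watch I/O wait programmatically during the stress test, here is a tiny sketch using the third-party psutil library (the iowait field is Linux-only; the 50% threshold is the same rule of thumb as above):

    import psutil  # third-party: pip install psutil

    # Sample the CPU time breakdown once a second; sustained I/O wait
    # means the disk, not the CPU, is the bottleneck.
    while True:
        cpu = psutil.cpu_times_percent(interval=1)
        iowait = getattr(cpu, "iowait", 0.0)  # field exists on Linux only
        if iowait > 50.0:
            print(f"warning: iowait at {iowait:.1f}% - likely I/O-bound")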

Sequencing documents in distributed CouchDB system

I am implementing a distributed audio repository on which registered users can create audio recordings and share them with other registered users. Ideally we would like to run the repository on a few (n < 5) laptops that are intermittently connected to a wireless mesh network, so repository nodes running CouchDB would replicate with each other whenever connectivity is available.

The repository will be situated in a rural village with no power infrastructure, so the laptops are powered by solar panels and car batteries. In the past we have run into problems where laptop batteries drain completely, resetting the computer's clock to 0 -- January 1st, 1970. Internet connectivity is also very patchy and very expensive.

Keeping in mind the intermittent power and connectivity as well as the very inaccurate clocks, I was wondering how best to sequence documents (or recordings) so that they roughly correspond to the chronology in which recordings were created on the different repository nodes, both before and after replication. I would be very grateful for any help on how I could create a replicating by_date, by_sequence, by_chronology (or whatever you want to call it) view that could work within these constrained conditions. It doesn't have to be perfect, just good enough, or better than a naive solution.
Thank you!
Thomas
Well, you could start by including a simple incrementing integer in each new document, by looking at the last number available in the database and taking the next one. This gets you a few things:
Consistent ordering for locally recorded segments in the absence of connectivity
After synchronization with other nodes, new segments will get a higher sequence number than all replicated segments
This means you cannot tell whether sequence number 45 from one node was recorded earlier or later than sequence number 45 from another node, but it does let you piece together a directed acyclic graph, imposing at least a rudimentary ordering on the documents.
You could add a value identifying the node on which the segment was recorded to make this even better.
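A sketch of what that could look like, with plain Python standing in for the CouchDB client (the document shape and the by_chronology key are illustrative, not a fixed schema):

    # Plain-Python stand-in for the CouchDB client; the document shape
    # and the by_chronology key below are illustrative, not a schema.
    def next_seq(docs):
        """One past the highest sequence visible locally, including
        replicated docs, so new local recordings sort after everything
        this node has seen."""
        return max((d["seq"] for d in docs), default=0) + 1

    def make_recording_doc(docs, node_id, audio_ref):
        return {
            "type": "recording",
            "seq": next_seq(docs),
            "node": node_id,    # breaks ties between nodes at equal seq
            "audio": audio_ref,
        }

    # A by_chronology view would emit [doc.seq, doc.node] as its key;
    # CouchDB's key collation then gives the rough global ordering.
    db = []
    db.append(make_recording_doc(db, "laptop-a", "rec-001.ogg"))
    db.append(make_recording_doc(db, "laptop-b", "rec-002.ogg"))
    print(sorted((d["seq"], d["node"]) for d in db))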

Is there any modern review of solutions to the 10000 client/sec problem

(Commonly called the C10K problem)
Is there a more contemporary review of solutions to the C10K problem (last updated 2 Sept 2006), specifically focused on Linux (epoll, signalfd, eventfd, timerfd...) and on libraries like libev or libevent?
Something that discusses all the solved and still unsolved issues on a modern Linux server?
The C10K problem generally assumes you're trying to optimize a single server, but as your referenced article points out, "hardware is no longer the bottleneck". Therefore, the first step is to make sure it isn't easier and cheaper to just throw more hardware into the mix.
If we've got a $500 box serving X clients per second, it's a lot more efficient to buy another $500 box and double our throughput than to let an employee gobble up who knows how many hours and dollars trying to figure out how to squeeze more out of the original box. Of course, that assumes our app is multi-server friendly, that we know how to load balance, and so on.
Coincidentally, just a few days ago, Programming Reddit or maybe Hacker News mentioned this piece:
Thousands of Threads and Blocking IO
In the early days of Java, my C programming friends laughed at me for doing socket IO with blocking threads; at the time, there was no alternative. These days, with plentiful memory and processors it appears to be a viable strategy.
The article is dated 2008, so it brings your horizon forward by a couple of years.
To answer the OP's question: today the equivalent document is not about optimizing a single server for load, but about optimizing your entire online service for load. From that perspective, the number of combinations is so large that what you are looking for is not a document but a live website that collects such architectures and frameworks. Such a website exists, and it's called www.highscalability.com.
Side Note 1:
I'd argue against the belief that throwing more hardware at the problem is a long-term solution:
Perhaps the cost of an engineer who "gets" performance is high compared to the cost of a single server, but what happens when you scale out? Let's say you have 100 servers: a 10 percent improvement in server capacity can save you 10 servers and their monthly cost.
Even if you have just two machines, you still need to handle performance spikes. The difference between a service that degrades gracefully under load and one that breaks down is that someone spent time optimizing for the load scenario.
Side Note 2:
The subject of this post is slightly misleading. The C10K document does not try to solve the problem of 10k clients per second. (The number of clients per second is irrelevant unless you also define a workload, along with sustained throughput under bounded latency; I think Dan Kegel was aware of this when he wrote that doc.) Look at it instead as a compendium of approaches for building concurrent servers, along with micro-benchmarks for the same. Perhaps what has changed between then and now is that at one point you could assume the service was a website serving static pages; today the service might be a NoSQL datastore, a cache, a proxy, or one of hundreds of pieces of network infrastructure software.
You can also take a look at this series of articles:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
He shows a fair amount of performance data and the OS configuration work he had to do in order to support 10K and then 1M connections.
It seems that a system with 30 GB of RAM could handle 1 million connected clients in a sort of social-network-type simulation, using a libevent frontend to an Erlang-based app server.
The libev project has run some benchmarks comparing itself against libevent...
I'd recommend reading Zed Shaw's poll, epoll, science, and superpoll [1]: why epoll isn't always the answer, why it's sometimes even better to go with poll, and how to combine the best of both worlds.
[1] http://sheddingbikes.com/posts/1280829388.html
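For a feel of what the epoll approach looks like in practice, here is a minimal level-triggered echo server (Linux-only, using Python's select.epoll; the port and buffer size are arbitrary):

    import select
    import socket

    # Minimal level-triggered epoll echo server (Linux only).
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))   # arbitrary port
    srv.listen(128)
    srv.setblocking(False)

    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN)
    conns = {}

    while True:
        for fd, event in ep.poll():
            if fd == srv.fileno():             # new client
                conn, _ = srv.accept()
                conn.setblocking(False)
                conns[conn.fileno()] = conn
                ep.register(conn.fileno(), select.EPOLLIN)
            elif event & select.EPOLLIN:       # readable: echo back
                data = conns[fd].recv(4096)
                if data:
                    conns[fd].send(data)
                else:                          # client closed
                    ep.unregister(fd)
                    conns[fd].close()
                    del conns[fd]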
Have a look at the RamCloud project at Stanford: https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud
Their goal is 1,000,000 RPC operations/sec/server. They have numerous benchmarks and commentary on the bottlenecks that are present in a system which would prevent them from reaching their throughput goals.
