Benchmark simulating a "realistic" desktop/server workload - Linux

I'm currently working on energy estimation based on the CPU's performance counters. To be able to choose the best counters, I need a benchmark that simulates a realistic workload.
So, does anybody know a good (free if possible) benchmark suite which simulates usual desktop and/or server workload?
I'm thinking of a suite of isolated benchmarks, e.g.
compiling C code
interpreting JavaScript
some SSL
some IO (disk/network usage)
image conversion
some math problem solving
In short, a good mix of the tasks a computer executes all the time while a user is working :-).
EDIT: Ideally it would be something where very little floating point gets used.
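For illustration, counter readings for a single candidate task can be collected with Linux perf; a minimal sketch, assuming perf is installed and counter access is permitted (gzip here stands in for any of the tasks above):
    import subprocess
    # Count a few hardware events for one task; perf writes its
    # CSV output (-x,) to stderr, so we discard the task's stdout.
    events = "cycles,instructions,cache-misses"
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", events, "--", "gzip", "-c", "/etc/hostname"],
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
    for line in result.stderr.strip().splitlines():
        print(line)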

Phoronix Test Suite is your answer!
It can even read an external watt-o-meter.
And it is one of the best CPU and GPU benchmark suites for Linux.

The best benchmark would probably be to install Apache or some other web server, and set up a script on one or more computers to request pages (over HTTP, HTTPS, and whatever other protocols you will use). You could make the script request from localhost, but it would be more realistic to have an external computer making the requests; that way you could also test network latency.
Once set up, you could use PowerTOP to estimate the watts used during load.
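A minimal request-generator sketch along those lines (Python standard library only; the URL and counts are placeholders):
    import concurrent.futures
    import urllib.request
    URL = "http://localhost/"   # point this at the server under test
    REQUESTS, WORKERS = 1000, 20
    def fetch(_):
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                resp.read()
                return resp.status
        except OSError:          # URLError/HTTPError both derive from OSError
            return None
    with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
        codes = list(pool.map(fetch, range(REQUESTS)))
    print(f"{codes.count(200)}/{REQUESTS} requests returned 200")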

Real Browser based load testing or Browser level user testing

I am currently working with multiple load-testing tools such as JMeter, LoadRunner, and Gatling.
All of the above tools work at the protocol level, except for the TruClient protocol offered by LoadRunner. Now real-browser load testing is gaining ground; it is definitely heavy on resource consumption, and tools such as LoadNinja and Flood.io work on this novel concept.
I have a few queries in this regard:
What will be the scenario where real browser based load testing fits perfectly?
What real browser testing offers which is not possible in protocol based load testing?
I know we can use JMeter to mimic browser behavior for load testing, but is there anything different that real browser testing has to offer?
....this novel concept.....
You're showing your age a bit here. Full-client testing was state of the art in 1996, before companies shifted en masse to protocol-based testing because it is more efficient in terms of resources. (Mercury, HP, Micro Focus) LoadRunner, (Segue, Borland, Micro Focus) Silk, and (Rational, IBM) Robot have retained the ability to use full GUI virtual users (running full clients via functional automation tools) since that time. TruClient is a recent addition which runs a full client but simply does not write the output to the screen, so you get 99% of the benefits and the measurements.
What is the benefit? Well, historically, two-tier client-server clients were thick, with lots of application processing going on. So having a small quantity of GUI virtual users combined with protocol virtual users allowed you to measure the cost/weight of the client. The flows to the server might take two seconds, but with the transform and present in the client it might take an additional 10 seconds. You then knew where the bottleneck was in the user experience.
Well, welcome to the days of future past. The web, once a super-thin presentation layer, has become just as thick as the classical two-tier client-server applications. I might argue thicker, as the modern browser interpreting JavaScript is more of a resource hog than the two-tier compiled apps of years past. It is simply universally available and based upon a common client-server protocol: HTTP.
Now that the web is thick, there is value in understanding the delta between arrival and presentation. You can observe much of this data in the performance tab of Chrome. We also have great W3C in-browser metrics which can provide insight into the cost/weight of local code execution.
Shifting the logic to the client has also made it challenging to reproduce the logic and flow of the JavaScript frameworks when producing the protocol-level dataflows back and forth. Here is where the old client-server interfaces had a distinct advantage: the protocols were highly structured in terms of data representation. So even with a complex thick client it was easy to represent and modify the dataflows at the protocol level (think of a database: rows, columns, ...). HTML/HTTP is very much unstructured. Your developers can send and receive virtually anything as long as the carrier is HTTP, and you can transform it to be used in JavaScript.
To make script creation easier and more time-efficient with complex JavaScript frameworks, the GUI virtual user has come back into vogue. Instead of running a full functional-testing tool driving a browser, where we can have one browser and one copy of the test tool per OS instance, we now have something that scales a bit more efficiently, TruClient, where multiple instances can run per OS instance. There is no getting around the high resource cost of the underlying browser instance, however.
Let me try to answer your questions below:
What will be the scenario where real browser based load testing fits perfectly?
What real browser testing offers which is not possible in protocol based load testing?
Some companies do real-browser-based load testing. However, as you rightly concluded, it is extremely costly to simulate such scenarios. Fintech companies mostly do it when the load is fairly small (say, 100 users) and the application under test is extremely critical; such applications often cannot be tested using standard API load tests because they are mostly legacy applications.
I know we can use JMeter to mimic browser behaviour for load testing, but is there anything different that real browser testing has to offer?
Yes, real browsers execute JavaScript. If the implementation on the front end (website) is poor, you cannot catch those issues using service-level load tests. It makes sense to load test with a browser if you want to see how the JavaScript written by the developers, or other client-side logic, affects page load times.
It is important to understand that performance testing is not limited to APIs alone but covers the entire user experience as well.
Hope this helps.
There are two types of test you need to consider:
Backend performance test: simulating X real users concurrently accessing the web application. The goal is to determine the relationship between an increasing number of virtual users and response time/throughput (requests per second), and to identify the saturation point, the first bottleneck, etc.
Frontend performance test: protocol-based load-testing tools don't actually render the page, so even if the response from the server comes quickly, a bug in client-side JavaScript might make rendering take a long time. You might therefore want to use a real browser (1-2 instances) in order to collect browser performance metrics.
A well-rounded performance test should cover both scenarios: it's better to generate the main load using protocol-based tools and at the same time access the application with a real browser to perform client-side measurements, as in the sketch below.
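A minimal sketch of that combined setup: stdlib threads provide the protocol-level load, while one real browser samples client-side timing (Selenium and a local chromedriver are assumptions here, not something the tools above require):
    import threading
    import urllib.request
    from selenium import webdriver  # assumes selenium + chromedriver installed
    URL = "http://localhost/"       # hypothetical application under test
    def protocol_user(n):
        for _ in range(n):
            urllib.request.urlopen(URL, timeout=10).read()
    threads = [threading.Thread(target=protocol_user, args=(100,)) for _ in range(20)]
    for t in threads:
        t.start()
    driver = webdriver.Chrome()     # the one "real browser" instance
    driver.get(URL)
    load_ms = driver.execute_script(
        "var t = window.performance.timing;"
        "return t.loadEventEnd - t.navigationStart;")
    print("browser-measured page load:", load_ms, "ms")
    driver.quit()
    for t in threads:
        t.join()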

IO performance in Windows and Linux

We want to build a web service that returns images (like Google Maps tiles).
The source data is organized in the ESRI compact cache format; the key part of our service is reading the tiles from the bundles.
I am not sure how to choose the platform: Windows or Linux?
It is said that Linux has better IO read/write performance than Windows.
However, Java is our only choice if we go with Linux, so I want to know if there are any points we should know to improve IO read performance on Linux.
PS:
On the Windows platform, we would build the service on .NET 4 using C# and deploy it with IIS.
On Linux, we would build the service in Java (maybe based on Spring MVC or some other MVC framework) and deploy it with Tomcat.
Update:
We may have the following source compact cache files in different folders:
L1
RxxCxx.bundle
RxxCxx.bundlx
L2
RxxCxx.bundle
RxxCxx.bundlx
And a request from the client may look like this:
http://ourserver/maptile?row=123&col=234&level=1.png
For this request, we go into the folder L1 since the level is 1, then read the RxxCxx.bundlx file first, since this metadata file tells us the position (the offset and length in RxxCxx.bundle) of the data for rendering the image (row=123&col=234). Then we read RxxCxx.bundle at that offset and length, and render the data to an image by writing it to the response with the content type set to "image/png" or something else.
This is the whole procedure for handling a request.
I wonder if there are any documents or existing demos that show how to handle this type of IO reading?
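Not a full demo, but the core seek-and-read step looks like this in Python (the exact .bundlx record layout is ESRI-specific, so the index lookup is assumed to have already produced the offset and length):
    def read_tile(bundle_path, offset, length):
        # Read one tile's bytes from a .bundle file, given the offset
        # and length decoded from the matching .bundlx index.
        with open(bundle_path, "rb") as f:
            f.seek(offset)
            return f.read(length)
    # Hypothetical values standing in for a real .bundlx lookup:
    offset, length = 4096, 12345
    png_bytes = read_tile("L1/RxxCxx.bundle", offset, length)
    # write png_bytes to the HTTP response with Content-Type: image/png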
The only situation where you have to have Windows servers in your environment is when you choose the MS SQL Server DBMS (it is almost Sybase, but way cheaper), in which case have a Windows box for the DB and *nix servers for the middle tier.
There are many situations where Windows can be used. Beginning with the declaration "have to have Windows" reveals an existing bias and is then followed by many groundless statements. But at least you clearly recognized this as the case.
Java is the best technology for millisecond-grade middleware, mainly for the number of mature, standardized open-source technologies available. Everything from coding (Eclipse, NetBeans, IDEA) to manual (Ant, Maven) and automatic (TeamCity, Hudson/Jenkins) builds, testing, and static code analysis is there, is standardized, is open source, and is backed by a community of millions.
I feel it necessary to say that Visual Studio/C# (since the OP mentioned it as an alternative) offers everything you mentioned above, with the exception of being open source. That said, the .NET Framework (or .NET Core) is now open source. Get information here. Based on your above comment, I think I can conclude that the only viable solutions are those available through the open source community.
Quote I once heard that has a lot of truth: "It's only free if your time is worthless."
Also, counting the entirety of the open source community is a bogus argument. You'd have to take one development tool/API and compare its community support with another's. For example, compare the community size/quality for Visual Studio with that of Eclipse, or that of the .NET Framework vs. Java.
By the way, I've experienced no better IntelliSense implementation than Visual Studio's on Windows. When Eclipse does work, you have to rely on the quality of the open source libraries you reference to get anything meaningful. I've found the .NET Framework requires fewer 3rd-party libraries than Java to accomplish the same goal.
Linux is the best server-side platform for performance, stability, ease of maintenance, and quality of the development environment - an extremely powerful command-line-based IDE. You can expect multi-month uptime from a Linux server, but not from Windows.
We have many Windows servers running services processing "big data" that have been up since 5/30/2014 (nearly a year) and several more running without interruption since 2013. The only times we experience uptime problems are when hardware is aging/failing or the application-layer software we wrote contains bugs.
Tomcat/Servlet (or Jetty/Servlet) is a classic industrial combination in many financial institutions where stability is the #1 priority.
IIS is also used: job posting for IIS developer at financial institution
And lastly, the IO performance concern: high-quality user-space non-blocking IO code will be CPU- and hardware-bandwidth-bound, so the OS will not be the determining factor. Though fancy things like interrupt affinity, thread pinning, informed realtime tuning, and kernel bypass are, I believe, easier to do on Linux.
Most of these variables are defined by each OS. It sounds like you have a lot of experience with threads, but I would also posit that the developer can optimize at the application layer just as easily in both environments. Changing thread priority, implementing a custom thread pool, configuring the BIOS, etc. are all available in the Windows world as well. Unless you want to customize the kernel, which Unix/Linux allows - but then you have to support your own custom build of Unix/Linux.
I don't think commercial software should be vilified or avoided in favor of open source as a rule.
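To make the non-blocking IO point concrete, here is a minimal single-threaded sketch with Python's asyncio (purely illustrative; the stacks under discussion are C#/IIS and Java/Tomcat): the server parks each connection instead of blocking a thread on it.
    import asyncio
    TILE = b"\x89PNG..."  # placeholder for bytes read from a bundle
    async def handle(reader, writer):
        await reader.readuntil(b"\r\n\r\n")   # consume the request head
        head = ("HTTP/1.1 200 OK\r\n"
                "Content-Type: image/png\r\n"
                f"Content-Length: {len(TILE)}\r\n\r\n").encode()
        writer.write(head + TILE)
        await writer.drain()
        writer.close()
    async def main():
        server = await asyncio.start_server(handle, "127.0.0.1", 8080)
        async with server:
            await server.serve_forever()
    asyncio.run(main())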

HTTPS on Cocos2d-x

I'm implementing a game app based on cocos2d-x. In order to technically prevent cheating, one of the ideas is to use HTTPS for all client-server communication, which makes it difficult to inspect the data format / game logic and send modified requests to cheat. (I know "prevent" is actually impossible, but increasing the cost of cheating is enough :). My questions are:
In Cocos2d-x, how do I make an HTTPS request? Is it possible?
More generally, what can be done technically to reduce such game hacking? What strategy should one adopt?
For native cross-platform C++ networking you may consider using the Boost C++ libraries. Boost.Asio is the one used for networking.
Boost.Asio link:
http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio.html
Boost.Asio tutorials link: http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio/tutorial.html
Although not officially supported (only due to lack of regression testing on iOS and Android), Boost runs without any problems on iOS and Android (and probably other C++ based mobile platforms as well).
To prevent cheating you usually rely on an external source (which can be your game server), e.g. if your game relies on the time of day you may get the time from an external server. You may use encryption libraries for data transfer on the client and server side.
By using the curl library you can make HTTPS connections.
If you want to technically protect your game, use your own strong encryption technique.
This is a problem we face all the time. If the cheating is limited to the cheater's own instance, the question is academic and can be studied in your spare time.
On the other hand, when your income is impacted or when the cheaters' actions impact other players and degrade the game experience, you should put some effort into testing the game state for inconsistencies, secure the client/server transactions, and deal with cheating in very subtle ways to avoid completely deterring the cheaters' interest.
C++ HTTPS implementations are available with curl and Boost.
Concerning the game data, the simplest things to test for inconsistencies are scores. You can add a few indicators to avoid polluting your leaderboards. You can add special checksums based on the score's components (time spent in game, number of power-ups and score multipliers received, ...); if you can recalculate the score on the server and inconsistencies are discovered, you can deal with them.
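One possible shape for such a checksum, sketched with an HMAC (the secret and the component list here are illustrative; the server recomputes the tag from the replayed state and rejects mismatches):
    import hashlib
    import hmac
    SECRET = b"server-side-secret"   # must not ship readably in the client
    def score_checksum(score, play_seconds, powerups, multipliers):
        # Integrity tag over the score's components.
        payload = f"{score}:{play_seconds}:{powerups}:{multipliers}".encode()
        return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    print(score_checksum(98200, 431, 7, 3))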
Also, you can grab snapshots of the game state and a few commands, encode them, and replay the sequences on the server to check for inconsistencies. Deal with cheaters however you like.
When playing on a server, let the server manage the game state and allow no client-side game-state changes that would impact other players. Check for input consistency, etc.
When using microtransactions, each one should be verified with the vendor's servers before being fully committed to the player's account.
Even if these papers 1, 2 from Valve refer to FPS games, they should give you some pointers on how to deal with state inconsistencies (introduced by communication delays). That should help in avoiding false positives and ruining the experience for non-cheaters.

What are good ways to create real-time stats for high-load webservers?

Say I have a bunch of webservers, each serving hundreds of requests per second, and I want to see real-time stats like:
Request rate over last 5s, 60s, 5 min etc
Number of unique users seen, again per time window
Or in general for a bunch of timestamped events, I want to see real-time derived statistics - what's the best way to go about it?
I've considered having each GET request update a global counter somewhere, then sampling that at various intervals, but at the event rates I'm seeing it's hard to get a distributed counter that's fast enough.
Any ideas welcome!
Added: Servers are Linux running Apache/mod_wsgi, with a Python (Django) stack.
Added: To give a sense of the event rates I want to track stats for, they're coming in at over 10K events/s. Even incrementing a distributed counter at that rate is a challenge.
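A minimal sketch of the counter idea, with per-second buckets so windowed rates stay cheap to read (single process only; aggregating many of these across servers at 10K events/s is the hard part and is not addressed here):
    import threading
    import time
    from collections import deque
    class RateWindow:
        # Per-second buckets in a bounded deque; one instance per worker.
        def __init__(self, horizon=300):
            self.buckets = deque(maxlen=horizon)   # [epoch_second, count]
            self.lock = threading.Lock()
        def hit(self):
            now = int(time.time())
            with self.lock:
                if self.buckets and self.buckets[-1][0] == now:
                    self.buckets[-1][1] += 1
                else:
                    self.buckets.append([now, 1])
        def rate(self, seconds):
            cutoff = int(time.time()) - seconds
            with self.lock:
                return sum(c for t, c in self.buckets if t > cutoff) / seconds
    w = RateWindow()
    for _ in range(1000):
        w.hit()
    print(w.rate(5), "events/s over the last 5 s")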
You might like to help us try out the beta of our agent for application performance monitoring in Python web applications.
http://newrelic.com
It delves more into application performance than just the web server, but since any bottlenecks generally aren't going to be in the web server but in your application, that is going to be more useful anyway.
Disclaimer: I work for New Relic and this is the project I am working on. It is a paid product, but the beta means it is free for now with all features. Later, when that changes, if you didn't want to pay for it, there is still a Lite subscription level which is free and gives you basic web metrics reporting, which still covers some of what you are after. Anyway, right now would be a great opportunity to make use of it to debug your performance while you can.
Virtually all good servers provide this kind of functionality out of the box. For example, Apache has the mod_status module and Glassfish supports JMX. Furthermore, there are many commercial packages for monitoring clusters, such as Hyperic and Zenoss.
What web or application server are you using? It is difficult to provide a solution without that information.
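For the Apache case, a minimal sketch of reading mod_status (assumes the module is enabled and /server-status is reachable from localhost):
    import urllib.request
    with urllib.request.urlopen("http://localhost/server-status?auto") as resp:
        status = dict(line.split(": ", 1)
                      for line in resp.read().decode().splitlines()
                      if ": " in line)
    print(status.get("Total Accesses"), "requests since start")
    print(status.get("ReqPerSec"), "requests/s (lifetime average)")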
Look at using WebSockets; their overhead is much smaller than that of an HTTP request, and they are very well suited to real-time web applications. See http://nodeknockout.com/ for Node-based WebSocket examples.
http://en.wikipedia.org/wiki/WebSocket
You will need to run a daemon if you want to run it on your Apache server.
Also take a look at:
http://kaazing.com/ if you want less hassle but are willing to fork out some cash.
On the Windows side, Performance Monitor is the tool you should investigate.
As Jared O'Connor said, you should specify what kind of web server you want to monitor.

Which programming language suits web critical application development?

According to this page, it seems that Perl, PHP, and Python are 50 times slower than C/C++/Java.
Thus, I think Perl, PHP, and Python could not handle critical applications (such as >100 million users, >xx million requests every second) well. But exceptions exist, e.g. Facebook (it is said Facebook is written entirely in PHP) and Wikipedia. Moreover, I heard Google uses Python extensively.
So why? Is it that faster hardware fills the big speed gap between C/C++/Java and Perl/PHP/Python?
Thanks.
Computational code is the least of my concerns in most heavy-usage web applications.
The bottlenecks in a typical high-availability web application are (not necessarily in this order, but most likely):
Database (IO and CPU)
File IO
Network Bandwidth
Memory on the Application Server
Your Java / C++ / PHP / Python code
Your main concerns to make your application scalable are:
Reduce access to the database (caching, with clustering in mind, smart querying; see the sketch after this list)
Distribute your application (clustering)
Eliminate useless synchronization locks between threads (see commons-pool 1.3)
Create the correct DB indexes, data model, and replication to support many users
Reduce the size of your responses, using incremental updates (AJAX)
Only after all of the above are implemented, optimize your code
Please feel free to add more to the list if I missed something
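A minimal sketch of the caching point above (functools.lru_cache is per-process; a clustered deployment would use something like memcached or Redis instead, and slow_db_lookup is just a stand-in):
    import functools
    import time
    def slow_db_lookup(user_id):
        time.sleep(0.05)                      # stand-in for a real query
        return {"id": user_id, "name": "user%d" % user_id}
    @functools.lru_cache(maxsize=10_000)
    def get_user_profile(user_id):
        return slow_db_lookup(user_id)
    get_user_profile(42)   # hits the "database"
    get_user_profile(42)   # served from the cache, no DB access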
The page you are linking to only tells half the truth. Of course native languages are faster than dynamic ones, but this is only critical for applications with high computing requirements. For most web applications it is not so important: a web request is usually served quickly. It is more important to have an efficient framework that manages resources properly and starts new threads to serve requests quickly. Also, timing behaviour is not the only critical aspect: reliable and error-free applications are probably easier to achieve with dynamic languages.
And no, faster hardware isn't a solution. In fact Google is famous for using a cluster of inexpensive machines.
(such as >100 million user,>xx million request every second)
To achieve that sort of performance, you are going to HAVE to design and implement the web site / application as a scalable multi-tier system with replication across (probably) all tiers. At this point, the fact that one programming language is faster / slower than another probably only affects the number of machines you need in your processor farm. The design of the system architecture is far more significant.
There is no JIT compiler in PHP that compiles the code into machine code.
Another big reason is PHP's dynamic typing. A dynamically typed language is always going to be slower.
Click below and read more:
What makes PHP slower than Java or C#?
C is easily the fastest language out there. It's so fast we write other languages in it. Nobody seriously writes web sites in C. Why? It's very easy to screw up in C in ways that are very difficult to detect, and it does almost nothing to help you. In short, it eats programmers and generates bugs.
Building a robust, fast application is not about picking the fastest language; it's about A) maintainability and B) scalability.
Maintainability means it doesn't have a lot of bugs. It means you can quickly add new features and modify existing ones. You want a language that does as much of the work as possible for you and doesn't get in the way. This is why things like Perl, Python, PHP and Ruby are so popular. They were all written with the programmer's convenience in mind over raw performance or tidiness. C was written for raw performance. Java was written for conceptual tidiness.
Scalability means you can go from 10 users to 10,000 users without rewriting the whole thing. It used to mean writing the tightest code you could manage, but highly optimized code is usually hard-to-maintain code. It usually means doing things for the benefit of the computer, not the human and the business. That sacrifices maintainability, and you have to tell your boss it's going to take 3 months to add a new feature.
Scalability these days is mostly achieved by throwing hardware at it and parallelizing. How many processes and processors and machines can you farm your work out to? If you can achieve that, you can just fire up another cheap cloud computer as you need it. Of course you're going to want to optimize some, but at this scale you get so much more out of implementing a better algorithm than tightening up your code.
For example, I took a sluggish PHP app that was struggling to handle 50 users at a time, switched from Apache with mod_php to lighttpd with load balanced, remote FastCGI processes allowing parallelization with a minimum of code change. Some basic profiling revealed that the PHP framework they used to prototype was dog slow, so it was stripped out. Profiling also suggested a few indexes to make the database queries run faster. End result was a system that could handle thousands of users and more capacity could be added as needed while leaving most of the code implementing the business logic untouched. Took a few weeks, and I don't really know PHP well.
It may be beneficial to reimplement small, sharp pieces in a very fast language, but usually that's already been done for you in the form of an optimized library or tool. For example, your web server. For the complexity and ever-changing needs of business logic the important thing is ease of maintenance and how good your programmers are.
You will find that most of the web is written in PHP, Perl and Python because they are easy to write in, with small, sharp bits written in things like C, Java and exotics like Scala (for example, Twitter). Wikia, for example, is a modified Mediawiki which is written in PHP but it is performant (amongst other reasons) by doing a heroic amount of caching.
Google is using Python for GAE and Windows Azure provides PHP. The LAMP architecture is great for application scalability.
I also think that the programming language is not that important regarding performance. The most important thing is to look at the architecture of your app.
I hope it helps
To serve a web page, you need to:
Receive and parse the request.
Decide what you wish to do with the request.
Read/write persistent data (database, cache, file system)
Output HTML data.
The "speed" of the server side language only applies to steps two and four. Given that most scripts strive to keep step 2 as short as possible, and that most web languages (including PHP) optimize step 4 as much as they can, in any serious web site most of the request processing time will be spent in step 3.
And the time spent on step 3 is independent of the server-side language you use ... unless you implement your own database and distributed cache.
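A toy model of those four steps (the 50 ms sleep stands in for the database round trip; the numbers are illustrative, not measurements):
    import time
    def parse(query):                          # step 1: cheap
        return dict(p.split("=") for p in query.split("&"))
    def query_db(params):                      # step 3: dominated by I/O
        time.sleep(0.050)
        return [("row", tuple(sorted(params.items())))]
    def render(rows):                          # step 4: output HTML
        return "<ul>%s</ul>" % "".join("<li>%s</li>" % (r,) for r in rows)
    t0 = time.perf_counter()
    html = render(query_db(parse("user=42&page=1")))
    print("total %.3f s, almost all of it in the DB call" % (time.perf_counter() - t0))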
For PHP, there are a lot of things you can do to increase performance. For example:
A PHP accelerator
Caching queries
Optimizing queries
Using a profiler to find slower parts and optimizing them
These things would certainly help reduce the gap with lower-level languages. So, to answer your question: there are other things you can do inside the code to optimize it and make it run faster.
I agree with luc. It's the architecture that really matters, not the programming language.
