IO performance in windows and linux

We want to build a web service that returns images (like Google Maps tiles).
The source data is organized in the Esri compact cache format, so the key part of our service is reading the tiles from the bundles.
I am not sure which platform to choose: Windows or Linux?
It is said that Linux has better IO read/write performance than Windows.
However, Java is our only choice if we choose Linux, so I want to know whether there are any points we should be aware of to improve IO read performance on Linux.
PS:
On the Windows platform, we will build the service on .NET 4 using C# and deploy it with IIS.
On Linux, we will build the service using Java (maybe based on Spring MVC or some other MVC framework) and deploy it with Tomcat.
Update:
We may have the following source compact files in different folders:
L1
RxxCxx.bundle
RxxCxx.bundlx
L2
RxxCxx.bundle
RxxCxx.bundlx
And a request from the client may look like this:
http://ourserver/maptile?row=123&col=234&level=1.png
For this request, we will go into the folder L1 since the level is 1, then read the RxxCxx.bundlx file first, since this file is the metadata that will tell us the position (the offset and length in RxxCxx.bundle) of the data for rendering the image (row=123&col=234). Then we will read RxxCxx.bundle according to the offset and length. Finally, we render the data as an image by writing it to the response and setting the content type to "image/png" or something else.
This is the whole procedure for handling a request.
So I wonder whether there are any documents or existing demos that show how to handle this type of IO reading?
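The two-step lookup described in the question (index file first, then a seek into the data file) can be sketched as follows. This is only an illustration in Python, not the real Esri layout: the record format (an 8-byte offset plus a 4-byte length per tile), the 128x128 grid size, and the name read_tile are all assumptions made for the example. The same seek-and-read pattern carries over to Java or C#.

```python
import struct
from pathlib import Path

# Hypothetical index layout (NOT the real Esri bundlx layout): one fixed-size
# record per tile, holding a little-endian 8-byte offset and 4-byte length.
RECORD = struct.Struct("<QI")
TILES_PER_SIDE = 128  # assumed number of rows/columns covered by one bundle


def read_tile(level_dir, bundle_name, row, col):
    """Return the raw tile bytes for (row, col) from one bundle/index pair."""
    level_dir = Path(level_dir)
    # Position of this tile's record inside the index file.
    slot = (row % TILES_PER_SIDE) * TILES_PER_SIDE + (col % TILES_PER_SIDE)

    # Step 1: read only the index record we need, not the whole index.
    with open(level_dir / (bundle_name + ".bundlx"), "rb") as index:
        index.seek(slot * RECORD.size)
        offset, length = RECORD.unpack(index.read(RECORD.size))

    # Step 2: seek to the offset in the data file and read exactly `length` bytes.
    with open(level_dir / (bundle_name + ".bundle"), "rb") as bundle:
        bundle.seek(offset)
        return bundle.read(length)
```

Each request then costs two small positioned reads. Because the index files are small and read constantly, keeping them in memory (or relying on the OS page cache) is likely to matter more than the choice of operating system.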

The only situation where you have to have Windows servers in your environment is when you choose the MS SQL Server DBMS (it is almost Sybase but way cheaper), in which case have a Windows box for the DB and a *nix server for the middle tier.
There are many situations where Windows can be used. Beginning with the declaration "have to have Windows" reveals an existing bias and is then followed by many groundless statements. But at least you clearly recognized this as the case.
Java is the best technology for millisecond-grade middleware, mainly for the amount of mature, standardized open source technologies available. Everything from coding (Eclipse, NetBeans, IDEA) to manual (Ant, Maven) and automatic (TeamCity, Hudson/Jenkins) builds, testing, and static code analysis is there, is standardized, is open source, and is backed by a multimillion-member community.
I feel it necessary to say Visual Studio/C# (because OP mentioned as an alternative) offers everything you mentioned above with the exception of being open source. That said, the .NET Framework (or .NET Core) is now open source. Get information here. Based on your above comment, I think I can conclude that the only viable solutions are available through the open source community.
Quote I once heard that has a lot of truth: "It's only free if your time is worthless."
Also, counting the entirety of the open source community is a bogus argument. You'd have to take one development tool/API and compare the community support with another. For example, compare the community size/quality for Visual Studio with that of Eclipse. Or that of the .NET Framework vs. Java.
By the way, I've experienced no better IntelliSense implementation than with Visual Studio/Windows. When Eclipse does work, you have to rely on the quality of the open source libraries you reference to get anything meaningful. I've found the .NET Framework requires fewer 3rd party libraries than Java to accomplish the same goal.
Linux is the best server side platform for performance, stability, ease of maintenance, quality of the development environment - an extremely powerful command line based IDE. You can expect multimonth uptime from a Linux server, but not from Windows.
We have many Windows servers running services processing "big data" that have been up since 5/30/2014 (nearly a year) and several more running without interruption since 2013. The only times we experience uptime problems are when hardware is aged/failing or the application-layer software we wrote contains bugs.
Tomcat/Servlet (or Jetty/Servlet) is a classic industrial combination in many financial institutions where stability is the #1 priority.
IIS is also used: job posting for IIS developer at financial institution
And lastly, the IO performance concern: high-quality user-space non-blocking IO code will be CPU and hardware bandwidth bound, so the OS will not be the determining factor. Though fancy things like interrupt affinity, thread pinning, informed real-time tuning, and kernel bypass are, I believe, easier to do on Linux.
Most of these variables are defined by each OS. It sounds like you have a lot of experience with threads, but also I would posit the developer can optimize at the application layer just as easily in both environments. Changing thread priority, implementing a custom thread pool, configuring BIOS, etc. are all available in the Windows world as well. Unless you want to customize the kernel which Unix/Linux allows, but then you have to support your own custom build of Unix/Linux.
I don't think commercial software should be vilified or avoided in favor of open source as a rule.

I understand this may sound like a groundless statement, but use *nix unless you have to use Windows. The only situation where you have to have Windows servers in your environment is when you choose the MS SQL Server DBMS (it is almost Sybase but way cheaper), in which case have a Windows box for the DB and a *nix server for the middle tier.
Java is the best technology for millisecond-grade middleware, mainly for the amount of mature, standardized open source technologies available. Everything from coding (Eclipse, NetBeans, IDEA) to manual (Ant, Maven) and automatic (TeamCity, Hudson/Jenkins) builds, testing, and static code analysis is there, is standardized, is open source, and is backed by a multimillion-member community.
Linux is the best server side platform for performance, stability, ease of maintenance, quality of the development environment - an extremely powerful command line based IDE. You can expect multimonth uptime from a Linux server, but not from Windows.
Tomcat/Servlet (or Jetty/Servlet) is a classic industrial combination in many financial institutions where stability is the #1 priority.
And lastly, the IO performance concern: high-quality user-space non-blocking IO code will be CPU and hardware bandwidth bound, so the OS will not be the determining factor. Though fancy things like interrupt affinity, thread pinning, informed real-time tuning, and kernel bypass are, I believe, easier to do on Linux.

Related

WPP tracing for linux

I'm looking for a way to output traces to a log file from my code, which runs on Linux.
I don't want to include the printing information in the binary in every place I deploy it.
In Windows, I simply used WPP to trace without putting the actual trace strings in my binary.
How can this be achieved in Linux?

I'm not very familiar with Linux tools in this area, so maybe there is a better system. However, since nobody else has made any good suggestions, I'll make a suggestion. (Probably not a very good suggestion, but the best I can think of right now.)
In theory, you could continue to use WPP. WPP is simply a template system. It scans the configuration and input files to create data structures. Then it runs a template and fills in the data values it got from the scan, producing the tmh files. You could create a new set of templates that would use Linux APIs instead of Windows APIs, and would record the message strings in a way that works with some other log decoder system.
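To make the underlying idea concrete, here is a minimal sketch (in Python, for brevity) of tracing without shipping the message strings: the deployed code writes only a numeric message id and its arguments, and a separate decoder holding the string table turns the log back into text. This is not WPP and not any existing Linux tool. The names trace, decode, and MESSAGES and the record layout are assumptions made for illustration, and only integer arguments are handled.

```python
import struct

# Hypothetical message table; with a WPP-like tool this would be generated at
# build time and kept out of the deployed binary.
MESSAGES = {
    1: "worker {} opened port {}",
    2: "request took {} ms",
}


def trace(log, msg_id, *args):
    """Append a compact record: message id, argument count, then int arguments."""
    log.write(struct.pack("<HB", msg_id, len(args)))
    for arg in args:
        log.write(struct.pack("<q", arg))


def decode(data, messages):
    """Turn the binary log back into text using the separately shipped table."""
    lines, pos = [], 0
    while pos < len(data):
        msg_id, argc = struct.unpack_from("<HB", data, pos)
        pos += 3
        args = struct.unpack_from("<%dq" % argc, data, pos)
        pos += 8 * argc
        lines.append(messages[msg_id].format(*args))
    return lines
```

In a C or C++ code base the equivalent of MESSAGES would be produced by a preprocessing step, which is the role the answer assigns to a new set of WPP templates.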

I noticed this question only now and would like to add my two cents just in case. Personally, I truly appreciate Windows WPP tracing and consider it probably the best engineering solution for practical development troubleshooting among similar tools.
It so happened that I extended WPP use to Unix-like platforms twice. We wanted to use the strengths of the WPP concept in general and yet use it in multi-platform pieces of code. This was not a port but rather a wrapper around the specific WPP use we had configured on Windows. The first time, we had a web service perform the actual WPP pre-processing on Windows; it may sound a bit insane, but it worked fine and effectively within the local network. A wrapper script that was executed before each compilation sent a web request, got a processed file, and post-processed the generated include file to make it suitable for Unix-like platforms. The second time, we implemented a simplified WPP pre-processor of our own (we found yet another use for it: we could generate the tracing statements differently for production and unit testing, for example). This was a rough solution: you still need some physical tracing framework behind the wrapper on non-Windows platforms (well, the first time we apparently implemented our own lower level).
I do not think the Linux world has a framework comparable to WPP. Once I even thought it could be a great idea to start an open source project to port WPP. I am not sure it would be much in demand, though. I said it is a great engineering solution. But who wants to do dirty engineering work? The open source community prefers abstract, object-oriented and generic solutions, streaming, and less dependence on accompanying tools (WPP requires special management tools and OS support). Ease of code writing is today's choice.
Microsoft may also be at fault (or unwilling) when it comes to WPP's lack of popularity. They kept it as an internal framework that came out just by chance with the Windows DDK because they had to offer some logging/tracing solution for driver developers. Hardly anybody noticed that WPP is well suited to user-space code too. And the WPP pre-processor for C#, for example, has never been exposed to the public at all.
Nevertheless, I still think that porting WPP to Unix/Linux could be a challenging, interesting and maybe even useful attempt. If someone decides to lead it. :)

How do I choose between using a web installer versus a standalone installer

For example, the .NET Framework 4.0 is available in either format. Is there any scenario where internet access is not a consideration (always on, high bandwidth) but the standalone installer option is the better choice?
Also, when utilizing a web installer, are there any specific advantages with respect to:
1) long-term disk space usage?
2) the ability to cleanly uninstall/repair the software?

There are many ways to package and distribute, all optimized for different scenarios. What works well for an online distribution would not work well for an offline distribution or even in-the-middle scenarios.
For example, consider .NET. They have a Web and a Full.
The full is pretty much going to be best for Media based distributions and Enterprise customers where they want to put the framework on a network share. The Web is going to work best for a single user (Home) that wants the shortest possible download.
To understand what "shortest" means, note that the .NET 3.5 SP1 installer is actually a bootstrapper with many packages to account for 2.0, 3.0, 3.5, SPs, hotfixes, 32-bit components, and 64-bit components (x64 and Itanium).
For a home user with .NET 3.0 on an x86 OS there might be very little to download. For an enterprise user, you can get it all from the network share without needing to download bits from MSFT over and over. For a media customer there might not be any internet connection at all.
This is all separate from caching concerns. An installer can choose to cache or not cache installation files regardless of whether they came from media, network, or internet download on demand.
Other installs might not be as layered as .NET and have very little distinction between Web and Full. ( i.e. always all )

If you want to download it quickly, choose web installers, but beware of some, as these may be flagged by antiviruses. For extra security, choose standalone installers.

Stress test local web application

How can I test my local RIA?
I need to do a stress test and graph response time and memory usage as the number of users increases.
Do you know any software for this?

RIA tool support is often dictated by the development platform. For instance, if you have GWT and need JavaScript support in the tool, then you will be pushed to one subset of tools, with Silverlight to another, etc.
Look to your development team, System Requirements Document and architecture documentation for information on the development toolkits used by your rich internet application. Once you have good insight there, into both which toolkit and which version, take a look at the commercial and open source tools out there to see which ones support your interface. There are few things more frustrating than driving a nail with the butt end of a screwdriver, but if your tool and your interface are a poor match you could wind up doing just that.
All of the commercial vendors are offering short term licenses at this point that you should be able to tie directly back to the project budget. Something to keep in mind on the open source front is that the level of effort on the labor front tends to be higher overall because of the efficiencies built into the commercial tools on the development, monitoring integration and analysis fronts.

If you want an open source solution, I can think of Apache JMeter. There are others like Rational Performance Tester or Mercury LoadRunner, but those are not free. You might want to check whether there's a trial version out there.
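For a quick first look before setting up one of these tools, the ramp-up measurement the question asks for can be sketched in a few lines of Python. This is a rough illustration only, not a replacement for JMeter: the names timed and ramp are made up for the example, simulated users are plain threads, and it measures response time only. Memory usage has to be sampled on the server side.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(call):
    """Run one request and return its duration in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def ramp(call, user_counts, requests_per_user=5):
    """Return {users: mean response time} for each simulated user count."""
    results = {}
    for users in user_counts:
        # One worker thread per simulated user.
        with ThreadPoolExecutor(max_workers=users) as pool:
            futures = [pool.submit(timed, call)
                       for _ in range(users * requests_per_user)]
            results[users] = mean(f.result() for f in futures)
    return results
```

Here call would be any zero-argument function that performs one request against the local application, for example a lambda wrapping urllib.request.urlopen on a placeholder address such as http://localhost:8080/. The returned dictionary can then be plotted as users against mean response time.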

memory safety and security - sandboxing arbitrary program?

In some languages (Java, C# without unsafe code, ...) it is (should be) impossible to corrupt memory - no manual memory management, etc. This allows them to restrict resources (access to files, access to net, maximum memory usage, ...) to applications quite easily - e.g. Java applets (Java web start). It's sometimes called sandboxing.
My question is: is it possible with native programs (e.g. written in memory-unsafe language like C, C++; but without having source code)? I don't mean simple bypass-able sandbox, or anti-virus software.
I think about two possibilities:
run application as different OS user, set restrictions for this user. Disadvantage - many users, for every combination of parameters, access rights?
(somehow) limit (OS API) functions, that can be called
I don't know if either of these possibilities allows (at least in theory) full protection, with no possibility of bypass.
Edit: I'm interested more in theory - I don't care that e.g. some OS has some undocumented functions, or how to sandbox any application on given OS. For example, I want to sandbox application and allow only two functions: get char from console, put char to console. How is it possible to do it unbreakably, no possibility of bypassing?
Answers mentioned:
Google Native Client, uses subset of x86 - in development, together with (possible?) PNaCl - portable native client
full VM - obviously overkill, imagine tens of programs...
In other words, could native (unsafe memory access) code be used within restricted environment, e.g. in web browser, with 100% (at least in theory) security?
Edit2: Google Native Client is exactly what I would like - any language, safe or unsafe, run at native speed, sandbox, even in web browser. Everybody use whatever language you want, in web or on desktop.

You might want to read about Google's Native Client which runs x86 code (and ARM code I believe now) in a sandbox.

You pretty much described AppArmor in your original question. There are quite a few good videos explaining it which I highly recommend watching.

Possible? Yes. Difficult? Also yes. OS-dependent? Very yes.
Most modern OSes support various levels of process isolation that can be used to achieve what you want. The simplest approach is to attach a debugger and break on all system calls, then filter these calls in the debugger. This, however, is a large performance hit, and is difficult to make safe in the presence of multiple threads. It is also difficult to implement safely on OSes where the low-level syscall interface is not documented, such as Mac OS or Windows.
The Chrome browser folks have done a lot of work in this field. They've posted design docs for Windows, Linux (in particular the SUID sandbox), and Mac OS X. Their approach is effective but not totally foolproof - there may still be some minor information leaks between the outer OS and the guest application. In addition, some of the OSes require specific modifications to the guest program to be able to communicate out of the sandbox.
If some modification to the hosted application is acceptable, Google's native client is worth a look. This restricts the compiler's code generation choices in such a way that the loader can prove that it doesn't do anything nasty. This obviously doesn't work on arbitrary executables, but it will get you the performance benefits of native code.
Finally, you can always simply run the program in question, plus an entire OS to itself, in an emulator. This approach is basically foolproof, but adds significant overhead.

Yes this is possible IF the hardware provides mechanisms to restrict memory accesses. Desktop processors usually are equipped with an MMU and access levels, so the OS can employ these to deny access to any memory address a thread should not have access to.
Virtual memory is implemented by the very same means: any access to memory currently swapped out to disk is trapped, the memory fetched from disk and then the thread is continued. Virtualization takes it a little farther, because it also traps accesses to hardware registers.
All the OS really needs to do is properly use those features, and it will be impossible for any code to break out of the sandbox. Of course, this is much easier said than done, mostly because the OS takes liberties in favor of performance, because of oversights in what certain OS calls can be used to do, and, last but not least, because of bugs in the implementation.

How Do I Find Out Which Apps Are Using the Most Resources On Web Server (Win2003)

I have a lot of ASP.NET apps / websites on my server (Some custom built and some open source). I was wondering if there was a free (Or built into Win2003) way of finding out which applications use the most resources throughout the day?
Sort of a breakdown of CPU(s) & Memory... As I have a suspicion one of the open source apps is hogging the CPU from time to time?

I think Windows Perfmon is what you are looking for. Usually gives you everything you need in terms of process, overall usage etc etc. Covers just about everything, network, disk, memory and makes some nice graphs too. You can also export the results of the logging to CSV so you can build even nicer charts in Excel if you want.
Oh, I forgot: I am not sure how to do this anymore, as it was a long time ago, but you can also have your application publish counters to Perfmon. I did that in Java, but I am sure that .NET will provide a way to do the same.

Windows 2003 includes a tool called Performance Monitor (perfmon.msc) which can keep track of various resources. It's a fairly powerful and complex tool; if you Google for "windows 2003" perfmon counters, you can find a lot of information on how to get started and use it effectively.

Run your websites in different application pools, then you can isolate easily!
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/67e39bd8-317e-4cf6-b675-6431d4425248.mspx?mfr=true tells how application pools work in IIS6
