In this article about the Stack Overflow website, "StackOverflow Update: 560M Pageviews a Month, 25 Servers, and It's All About Performance", you can read this:
Garbage collection driven programming. SO goes to great lengths to
reduce garbage collection costs, skipping practices like TDD, avoiding
layers of abstraction, and using static methods. While extreme, the
result is highly performing code.
I can see why avoiding layers of abstraction and using static methods would reduce garbage collection, but I don't get why TDD would be harmful for it.
TDD is about automated testing, and in most cases code that can be tested automatically requires some layers of abstraction and interfaces.
Currently I am working with ExpressJs and NodeJs. My question is: if I have a lot of dynamically registered URLs on the server (using app.get("/xyz", page.xyz)), what are the issues associated with it? Will it affect the performance or memory usage of the server?
Regards,
Harikrishnan
Disadvantages:
you're probably repeating a lot of code
manageability of your methods
improper design patterns
larger code base affects performance
easier to performance measure
more manageable in a team setting
easier/more manageable to add features
Advantages:
explicit
potentially less functions utilized (i.e. per call)
more small data passed on the wire
easier to debug
easier to read by you
There shouldn't be an outright need for many endpoints, but that's your decision. It boils down to whether or not the app works and whether you meet your deadlines, and of course you can performance test to see whether areas of your app could improve with design patterns or different data structures. This is a tough question to answer, but I'd suggest you treat the process as a learning opportunity, with the expectation of improving on your next iteration or next app. Good luck!
I'm looking for a good slideshow/pdf/video explaining the differences in approach and thinking from hand-written threading of applications compared to the more abstracted and easier to use message passing and actor models. Does anyone know of existing resources to explain these concepts with good diagrams and visualizations?
It is slightly difficult to make direct comparisons without long, painful digressions, or a largely theoretical discussion. However, the following can be easily read, and I believe that the comparisons will form naturally for anyone familiar with the threading model.
Google's Go language uses message passing among goroutines (lightweight co-routines) as a core part of its concurrency model. There is a lot more information on Go at golang.org, and the following URL provides information about its concurrency model.
http://golang.org/doc/effective_go.html#concurrency
This is a paper written by Edward Lee (Berkeley EECS department chair) called The Problem with Threads. It is a pitch for the actor model, and is a good read. Also note that there are other papers by Edward that deal with the problem with threads (visit his homepage for more papers).
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf
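For a feel of the difference in style before reading those, here is a minimal sketch of message passing in Python, assuming a single worker thread that owns its state and talks to the rest of the program only through queues. The names `worker` and `square_all` are made up for illustration; nothing here comes from Go or from Lee's paper.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers from inbox and send their squares to outbox.

    The worker shares no mutable state with its caller; the two queues
    are the only communication channel.
    """
    while True:
        message = inbox.get()
        if message is None:  # sentinel value: no more work
            break
        outbox.put(message * message)


def square_all(numbers):
    """Send each number to a worker thread and collect the replies."""
    inbox = queue.Queue()
    outbox = queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for number in numbers:
        inbox.put(number)
    inbox.put(None)  # tell the worker to stop
    thread.join()
    # A single worker processes messages in order, so replies are in order too.
    return [outbox.get() for _ in numbers]
```

Note that there is no explicit lock anywhere in the calling code; the queue does the synchronization, which is roughly the mindset that channels and actors encourage.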
In the future, will managed runtimes provide additional protections against subtle data corruption issues?
Managed runtimes such as Java and the .NET CLR reduce or eliminate the possibility of many memory corruption bugs common in native languages like C++. Nonetheless, they are surprisingly not immune from all memory corruption problems. One intuitively expects that a method that validates its input, has no bugs, and robustly handles exceptions will always transform its object from one valid state to another, but this is not the case. (It is more accurate to say that it is not the case using prevailing programming conventions--object implementors need to go out of their way to avoid the problems I describe.)
Consider the following scenarios:
Threading. The caller might share the object with other threads and make concurrent calls on it. If the object does not implement locking, the fields might be corrupted. (Perhaps--unless notified that the object is thread-safe--runtimes should use an interlock on every method call to throw an exception if any method on the same object is executing concurrently on another thread. This would be a protection feature and, just like other well-accepted safety features of managed runtimes, it has some cost.)
Re-entrancy. The method makes a callout to an arbitrary function (such as an event handler) that ultimately calls methods on the object that are not designed to be called at that point. This is even trickier than thread safety and many class libraries do not get this right. (Worse yet, class libraries are known to poorly document what re-entrancy is allowed.)
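To make both scenarios concrete, here is a minimal Python sketch of the kind of interlock I have in mind, written by hand rather than provided by the runtime. The names `guarded`, `ConcurrentEntryError`, and `Account` are hypothetical. It assumes a plain non-reentrant lock per object, so the same check fires for a second thread and for a callback that re-enters on the same thread.

```python
import threading
from functools import wraps


class ConcurrentEntryError(RuntimeError):
    """Raised when a guarded method is entered while the object is busy."""


def guarded(method):
    """Fail fast instead of running two guarded methods at the same time."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        # Non-blocking acquire: a second caller (another thread, or a
        # callback re-entering on this thread) gets an exception.
        if not self._guard.acquire(blocking=False):
            raise ConcurrentEntryError(
                f"{method.__name__} entered while object is busy")
        try:
            return method(self, *args, **kwargs)
        finally:
            self._guard.release()
    return wrapper


class Account:
    def __init__(self, balance=0):
        self._guard = threading.Lock()
        self.balance = balance
        self.on_change = None  # optional callback, invoked after a deposit

    @guarded
    def deposit(self, amount):
        self.balance += amount
        if self.on_change is not None:
            self.on_change(self)  # callout to arbitrary code
```

If an `on_change` handler calls `deposit` again, the inner call raises `ConcurrentEntryError` instead of silently running against a half-updated object.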
For all of these cases, it can be argued that thorough documentation is a solution. However, documentation also can prescribe how to allocate and deallocate memory in unmanaged languages. We know from experience (e.g., with memory allocation) that the difference between documentation and language/runtime enforcement is night and day.
What can we expect from languages and runtimes in the future to protect us from these problems and other subtle problems like them?
I think languages and runtimes will keep moving forward, keep abstracting away issues from the developer, and keep making our lives easier and more productive.
Take your example - threading. There are some great new features on the horizon in the .NET world to simplify the threading model we use daily. STM.NET may eventually make shared state much, much safer to handle, for example. The parallel extensions in .NET 4 make life very easy for threading compared to current technologies.
I think that transactional memory is promising for addressing some of these issues. I'm not sure if this answers your question in some way but this is an interesting topic in any event:
http://en.wikipedia.org/wiki/Software_transactional_memory
There was an episode of Software Engineering Radio on the topic a year or so ago maybe.
First of all, "managed" is a bit of a misnomer: languages like OCaml, Haskell, and SML achieve such protections and safety while being fully compiled. All relevant "management" occurs at compile time through static analysis, which aids optimization and speed.
Anyway, to answer your question: if you look at languages like Erlang and Haskell, state is isolated and immutable by default. With this kind of system, threading and re-entrancy are safe by default, and because you have to go out of your way to break these rules, it is easy to see where unsafe code can arise.
By starting with safe defaults but leaving room for advanced unsafe usage, you get the best of both worlds. It seems reasonable that future systems that are safe by your definition may follow some of these practices as well.
What can we expect in the future?
Nothing. Thread-state and re-entrancy are not problems I see tools/runtimes solving. Instead I think in the future people will move to styles that avoid programming with mutable state to bypass these issues. Languages and libraries can help make these styles of programming more attractive, but the tools are not the solution - changing the way we write code is the solution.
Is there any paradigm that gives you a different mindset or a different take on writing multi-threaded applications? Perhaps something that feels vastly different, like procedural programming compared to functional programming.
Concurrency has many different models for different problems. The Wikipedia page for concurrency lists a few models and there's also a page for concurrency patterns which has some good starting point for different kinds of ways to approach concurrency.
The approach you take is very dependent on the problem at hand. Different models solve various different issues that can arise in concurrent applications, and some build on others.
In class I was taught that concurrency uses mutual exclusion and synchronization together to solve concurrency issues. Some solutions only require one, but with both you should be able to solve any concurrency issue.
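As a rough illustration of the two ingredients working together, here is a minimal Python sketch of a one-slot mailbox. The class name `Mailbox` is made up for this example. The condition variable's internal lock provides mutual exclusion, and `wait`/`notify_all` provide the synchronization (a taker waits until something has been put, and vice versa).

```python
import threading


class Mailbox:
    """A one-slot buffer shared between producer and consumer threads."""

    def __init__(self):
        self._cond = threading.Condition()  # owns the lock (mutual exclusion)
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:
            while self._full:        # synchronization: wait for an empty slot
                self._cond.wait()
            self._item = item
            self._full = True
            self._cond.notify_all()  # wake any waiting taker

    def take(self):
        with self._cond:
            while not self._full:    # synchronization: wait for an item
                self._cond.wait()
            item, self._item = self._item, None
            self._full = False
            self._cond.notify_all()  # wake any waiting producer
            return item
```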
For a vastly different concept you could look at immutability and concurrency. If all data is immutable then the conventional approaches to concurrency aren't even required. This article explores that topic.
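A minimal sketch of that idea in Python, assuming frozen dataclasses stand in for immutable data (the names `Point`, `scale`, and `scale_all` are invented for illustration). Because no thread can modify a `Point` after it is created, the workers need no locks at all.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    """An immutable value: assigning to a field raises an error."""
    x: float
    y: float


def scale(point, factor):
    """Return a new Point; the original is left untouched."""
    return replace(point, x=point.x * factor, y=point.y * factor)


def scale_all(points, factor):
    """Scale every point on a thread pool without any locking."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() returns results in input order
        return list(pool.map(lambda p: scale(p, factor), points))
```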
I don't really understand the question, but if you start doing some coding with CUDA, it will give you a different way of thinking about multi-threaded applications.
It differs from general multi-threading techniques, like semaphores, monitors, etc., because you have thousands of threads running concurrently. So the problem of parallelism in CUDA lies more in partitioning your data and combining the chunks of data later.
Just a small example of a complete rethinking of a common serial problem is the SCAN algorithm. It is as simple as:
Given a SET {a,b,c,d,e}
I want the following set:
{a, a+b, a+b+c, a+b+c+d, a+b+c+d+e}
Where the symbol '+' in this case is any associative binary operator (not only plus; multiplication works too).
How do you do this in parallel? It requires a complete rethink of the problem, which is described in this paper.
Many more implementations of different algorithms in CUDA can be found on the NVIDIA website.
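To give a feel for the rethink, here is a small sketch of one well-known data-parallel formulation (the Hillis-Steele scan), simulated sequentially in Python rather than written as a CUDA kernel. It is not necessarily the scheme used in the linked paper, the function name `inclusive_scan` is made up, and it assumes the operator is associative. Each pass of the `while` loop corresponds to one parallel step in which every element could be updated by its own thread.

```python
from operator import add


def inclusive_scan(values, op=add):
    """Inclusive prefix scan in about log2(n) data-parallel steps."""
    result = list(values)
    n = len(result)
    offset = 1
    while offset < n:
        previous = result  # every "thread" reads from the previous step
        # Element i combines with the element `offset` places to its left.
        # On a GPU, all i would be computed simultaneously.
        result = [
            previous[i] if i < offset else op(previous[i - offset], previous[i])
            for i in range(n)
        ]
        offset *= 2
    return result
```

The serial loop does n - 1 additions one after another; this version does more additions in total but needs only about log2(n) steps when each step runs in parallel.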
Well, a very conservative paradigm shift is from thread-centric concurrency (share everything) towards process-centric concurrency (address-space separation). This way one can avoid unintended data sharing and it's easier to enforce a communication policy between different sub-systems.
This idea is old and was propagated (among others) by the Micro-Kernel OS community to build more reliable operating systems. Interestingly, the Singularity OS prototype by Microsoft Research shows that traditional address spaces are not even required when working with this model.
The relatively new idea I like best is transactional memory: avoid concurrency issues by making sure updates are always atomic.
Have a looksee at OpenMP for an interesting variation.
We have a codebase that is several years old, and all the original developers are long gone. It uses many, many threads, but with no apparent design or common architectural principles. Every developer had his own style of multithreaded programming, so some threads communicate with one another using queues, some lock data with mutexes, some lock with semaphores, some use operating-system IPC mechanisms for intra-process communications. There is no design documentation, and comments are sparse. It's a mess, and it seems that whenever we try to refactor the code or add new functionality, we introduce deadlocks or other problems.
So, does anyone know of any tools or techniques that would help to analyze and document all the interactions between threads? FWIW, the codebase is C++ on Linux, but I'd be interested to hear about tools for other environments.
Update
I appreciate the responses received so far, but I was hoping for something more sophisticated or systematic than advice that is essentially "add log messages, figure out what's going on, and fix it." There are lots of tools out there for analyzing and documenting control-flow in single-threaded programs; is there nothing available for multi-threaded programs?
See also Debugging multithreaded applications
Invest in a copy of Intel's VTune and its thread profiling tools. It will give you both a system and a source level view of the thread behaviour. It's certainly not going to autodocument the thing for you, but should be a real help in at least visualising what is happening in different circumstances.
I think there is a trial version that you can download, so may be worth giving that a go. I've only used the Windows version, but looking at the VTune webpage it also has a Linux version.
As a starting point, I'd be tempted to add tracing log messages at strategic points within your application. This will allow you to analyse how your threads are interacting with no danger that the act of observing the threads will change their behaviour (as could be the case with step-by-step debugging).
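As a rough idea of what such tracing could look like, here is a minimal sketch in Python (used only for brevity; the same wrapper idea applies to a C++ mutex class). The names `TracedLock`, `configure_tracing`, and the logger name `thread-trace` are made up for illustration. Bear in mind that logging itself can shift timings.

```python
import logging
import threading

log = logging.getLogger("thread-trace")


def configure_tracing():
    """Print every trace line with a timestamp and the calling thread's name."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s [%(threadName)s] %(message)s",
    )


class TracedLock:
    """A lock that logs when it is requested, acquired, and released."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def __enter__(self):
        log.debug("waiting for %s", self.name)
        self._lock.acquire()
        log.debug("acquired %s", self.name)
        return self

    def __exit__(self, exc_type, exc, traceback):
        self._lock.release()
        log.debug("released %s", self.name)
        return False  # never swallow exceptions
```

Replacing a plain lock with `TracedLock("some_name")` and using it in a `with` block yields a per-thread timeline of who waited for what, which is often enough to spot a thread that acquired a lock and never released it.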
My experience is with the .NET platform and my favoured logging tool would be log4net since it's free, has extensive configuration options and, if you're sensible in how you implement your logging, it won't noticeably hinder your application's performance. Alternatively, there is .NET's built in Debug (or Trace) class in the System.Diagnostics namespace.
I'd focus on the shared memory locks first (the mutexes and semaphores) as they are most likely to cause issues. Look at which state is being protected by locks and then determine which state is under the protection of several locks. This will give you a sense of potential conflicts. Look at situations where code that holds a lock calls out to methods (don't forget virtual methods). Try to eliminate these calls where possible (by reducing the time the lock is held).
Given the list of mutexes that are held and a rough idea of the state that they protect, assign a locking order (i.e., mutex A should always be taken before mutex B). Try to enforce this in the code.
See if you can combine several locks into one if concurrency won't be adversely affected. For example, if mutex A and B seem like they might have deadlocks and an ordering scheme is not easily done, combine them to one lock initially.
It's not going to be easy, but I'm for simplifying the code at the expense of concurrency to get a handle on the problem.
This is a really hard problem for automated tools. You might want to look into model checking your code. Don't expect magical results: model checkers are very limited in the amount of code and the number of threads they can effectively check.
A tool that might work for you is CHESS (although it is unfortunately Windows-only). BLAST is another fairly powerful tool, but is very difficult to use and may not handle C++. Wikipedia also lists StEAM, which I haven't heard of before, but sounds like it might work for you:
StEAM is a model checker for C++. It detects deadlocks, segmentation faults, out of range variables and non-terminating loops.
Alternatively, it would probably help a lot to try to converge the code towards a small number of well-defined (and, preferably, high-level) synchronization schemes. Mixing locks, semaphores, and monitors in the same code base is asking for trouble.
One thing to keep in mind when using log4net or a similar tool is that it changes the timing of the application and can often hide the underlying race conditions. We had some poorly written code to debug, and introducing logging actually removed race conditions and deadlocks (or greatly reduced their frequency).
In Java, you have choices like FindBugs (for static bytecode analysis) to find certain kinds of inconsistent synchronization, or the many dynamic thread analyzers from companies like Coverity, JProbe, OptimizeIt, etc.
Can't UML help you here?
If you reverse-engineer your codebase into UML, then you should be able to draw class diagrams that show the relationships between your classes. Starting from the classes whose methods are the thread entry points, you could see which thread uses which class. Based on my experience with Rational Rose, this could be achieved using drag-and-drop; if there is no relationship between the added class and the previous ones, then the added class is not directly used by the thread that starts with the method you began the diagram with. This should give you hints about the role of each thread.
This will also show the "data objects" that are shared and the objects that are thread-specific.
If you draw a big class diagram and remove all the "data objects", then you should be able to lay out that diagram as clouds, each cloud being a thread (or a group of threads), unless the coupling and cohesion of the code base are awful.
This will only give you one piece of the puzzle, but it could be helpful; I just hope your codebase is not too muddy or too "procedural", in which case ...