Server side language for cpu/memory intensive process

What's a good server-side language for doing some pretty CPU- and memory-intensive things that plays well with PHP and MySQL? Currently, I have a PHP script which runs some calculations based on a large subset of a fairly large database and then updates that database based on those calculations (1.5 million rows). The current implementation is very slow, taking 1-2 hours depending on other activity on the server. I was hoping to improve this and was wondering what people's opinions are on a good language for this type of task.

Where is the bottleneck? Run some real profiling, and see what exactly is causing the problem. Is it the DB I/O? Is it the CPU? Is the algorithm inefficient? Are you calling slow library methods in a tight inner loop? Could precalculation be used?
You're pretty much asking what vehicle you need to get from point A to point B, and you've offered a truck, car, bicycle, airplane, jet, and helicopter. The answer won't make sense without more context.

The language isn't the issue; your issue is probably where you are doing these calculations. It sounds like you may be better off writing this in SQL, if possible. Is it? What are you doing?
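As a rough sketch of what "writing this in SQL" can mean, assume a hypothetical table items(id, price, qty, total); the real schema is not given in the question. The row-by-row version fetches every row and issues one UPDATE per row from the script, while the set-based version lets the database do the calculation in a single statement. Python's sqlite3 module is used only so the example is self-contained; the same pattern applies to PHP and MySQL, though actual timings depend on the schema and indexes.

```python
import sqlite3

def make_db(n=1000):
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, qty INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO items (id, price, qty) VALUES (?, ?, ?)",
        [(i, 1.5, i % 7) for i in range(1, n + 1)],
    )
    conn.commit()
    return conn

def update_row_by_row(conn):
    # One statement per row: the pattern that tends to be slow from a script.
    rows = conn.execute("SELECT id, price, qty FROM items").fetchall()
    for row_id, price, qty in rows:
        conn.execute("UPDATE items SET total = ? WHERE id = ?", (price * qty, row_id))
    conn.commit()

def update_set_based(conn):
    # A single statement; the database computes the new value itself.
    conn.execute("UPDATE items SET total = price * qty")
    conn.commit()

def totals(conn):
    return conn.execute("SELECT total FROM items ORDER BY id").fetchall()
```

Both functions leave the table in the same state; the difference is how many statements cross the boundary between the script and the database.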

I suspect your bottleneck is not the computation. It can easily take several hours just to update a few million records.
If that's the case, you can write a custom function in C/C++ for MySQL and call it from a stored procedure.
We do this in our database to re-encrypt some sensitive fields during key-rotation. It shrunk key-rotation time from days to hours. However, it's a pain to maintain your own copy of MySQL. We have been looking for alternatives but nothing is close to the performance of this approach.

Node server: code weight and server performance

I would like to know how significant the difference would be between using a 15.5k library just for very simple validations and using my own super-simple 1k validation class, once I have more than 10k users on my system (Node + Mongo running on a super pentium 8 core 32gb ram).
Is it worth caring about this 14.5k of code?
I can't find any clue in my so bleak but always wondering mind.
I'd very much appreciate your opinion.
A nice thing about server development is that you usually have significant RAM available and the code is generally loaded just once at the startup of the server so load time is not part of the user experience.
Because of these, you'd be hard pressed to even measure a meaningful impact between a 1k library and a 15k library. You might care about 15k of memory usage per active user, but you will not care about an extra 15k of code loaded into memory one time and it will not affect your server performance in any way.
So, I'd say you should not worry about skimping on code size (within reason). Instead, pick the tool that best solves your problem and makes your development quickest and most reliable. And, where possible, use what has been built and tested before rather than building your own from scratch. That will leave you more development time to spend on the things that really matter to making your website better, different or great. Or, it will get you to market faster.
For reference, 15k is 0.000045% of your total computer's RAM.
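The figure above can be checked with a couple of lines, assuming "15k" means 15 KB and the machine has the 32 GB of RAM mentioned in the question:

```python
code_size_bytes = 15 * 1024      # 15 KB of library code
ram_bytes = 32 * 1024 ** 3       # 32 GB of RAM, as stated in the question
share_percent = code_size_bytes / ram_bytes * 100
print(f"{share_percent:.6f}%")   # 0.000045%
```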
I agree with @jfriend00. There's almost no impact on memory/performance for the code sizes you describe. You can always benchmark different modules according to your usage profile and choose for yourself. However, I think you should ask yourself some other (similar) questions:
Why is the package I use so 'big'? Maybe there's a much 'smaller' one that does the same job with the same performance. When I say big or small here I mean in terms of functionality. Most of the time you'd want to go with minimum functionality, even if its size might seem big. If you use a validation module that also validates emails but you don't need that, it doesn't mean you shouldn't use it; just know the tradeoffs. It might get updated more frequently because of bugs in the email validation, those changes might cause other bugs in the integer validation that you do use, and you have more code to read if you want to feel safer about your production code (explained below).
Does the package function as I expect? (Read the tests.)
Is the package I use "secured"/"OK for production"? Read the code of the packages you use, make sure there isn't something fishy going on - usually Node packages are not that big because most are minimal (I never used it, but I know https://requiresafe.com/ exists for these types of questions - you might want to check it out). Note that if they are larger in size that might mean you would have to read more code.
Ask these questions (and others if you feel you should) recursively about the package's dependencies.

Does different language = different performance in couchDB lists?

I am writing a list function in CouchDB. I want to know if using a faster language than JavaScript would boost performance (I was thinking Python, just because I know it).
Does anyone know if this is true, and has anyone tested whether it is true?
Generally the different view engines are going to give you the same speed.
Except Erlang, which is much faster.
The reason for this is that Erlang is what CouchDB is written in, and for all other languages the data needs to be converted into standard JSON, sent to the view server, then converted back to the native Erlang format for writing.
But this performance "boost" only happens on view generation, which typically happens out-of-line of a request or only on the changed documents.
As in, real world usage performance difference between view servers is irrelevant most of the time.
Here is the list of all the view server implementations: http://wiki.apache.org/couchdb/View_server
I've never used the Python ones, but if that is where you are comfortable, go for it.
You can use the V8 engine if you want for Couch. A guy from IrisCouch wrote couchjs to do this (I've seen him on Stack Overflow quite a bit too).
https://github.com/iriscouch/couchjs
Also for views, filtered replication, things like that, you can write the functions in Erlang instead of JavaScript. I've done that and seen around a 50% performance increase.
Seems you can write list functions in Erlang: http://tisba.de/2010/11/25/native-list-functions-with-couchdb/

LINQ vs PLINQ: When does overhead outweigh benefits?

I am working on projects that are basically e-commerce applications. Our architect has received instructions from the client to use PLINQ because it is supposedly much more beneficial than LINQ: it works in parallel and uses all the processor cores, resulting in quicker responses. The client's suggestion is PLINQ + Repository if possible.
So I just want to know which one is better to follow in small and medium apps. Is it feasible to use PLINQ + Repository? As far as I have found, PLINQ has more overhead than LINQ if things are not handled properly. Please help me.
It is impossible to answer this question without knowing far more details about your application. PLINQ has overhead to fan out the workload to worker threads and then coordinate the work amongst them. If you are processing hundreds of thousands of entities and have a meaningful amount of work to do for each one, then yes it can benefit. In the end, the only way to really know if PLINQ will benefit you is to profile using a realistic data set.
When a for loop has a small body, it might perform more slowly than the equivalent sequential loop. Slower performance is caused by the overhead involved in partitioning the data and the cost of invoking a delegate on each loop iteration. To address such scenarios, the Partitioner class provides the Partitioner.Create method, which enables you to provide a sequential loop for the delegate body, so that the delegate is invoked only once per partition, instead of once per iteration.
See here.
This applies to PLINQ.
See here for PLINQ.
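The idea in the quoted passage, invoking the worker once per partition rather than once per element, is not specific to .NET. Below is a minimal Python sketch of the same idea, not the .NET Partitioner API itself. The function square is a hypothetical stand-in for a very small loop body, and the chunk size of 1000 is an arbitrary assumption. Because of Python's global interpreter lock this illustrates dispatch granularity only, not a CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Stand-in for a very small loop body.
    return x * x

def chunks(seq, size):
    # Split seq into consecutive slices of at most `size` items.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def process_chunk(chunk):
    # Plain sequential loop inside each task: one dispatch per chunk.
    return [square(x) for x in chunk]

def per_item(data, workers=4):
    # One task per element: dispatch overhead is paid len(data) times.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, data))

def per_chunk(data, workers=4, size=1000):
    # One task per chunk: dispatch overhead is paid roughly len(data)/size times.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(process_chunk, chunks(data, size))
        return [y for part in parts for y in part]
```

Both functions return the same results in the same order; whether the chunked form is actually faster for a given workload is something only a measurement on realistic data can show.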
LOL.
He who pays the piper calls the tune
Generally though, this is an engineering issue. Talk to the architects and clients and work out what metrics they will be using to measure the performance of the deliverables.
Then, using these metrics, find the optimal solution (LINQ, PLINQ or other) and report back your findings.
In the main all technologies are good for something and the size of the app is measured in different ways. So your term 'small' is meaningless.

Advice on starting a large multi-threaded programming project

My company currently runs a third-party simulation program (natural catastrophe risk modeling) that sucks up gigabytes of data off a disk and then crunches for several days to produce results. I will soon be asked to rewrite this as a multi-threaded app so that it runs in hours instead of days. I expect to have about 6 months to complete the conversion and will be working solo.
We have a 24-proc box to run this. I will have access to the source of the original program (written in C++ I think), but at this point I know very little about how it's designed.
I need advice on how to tackle this. I'm an experienced programmer (~ 30 years, currently working in C# 3.5) but have no multi-processor/multi-threaded experience. I'm willing and eager to learn a new language if appropriate. I'm looking for recommendations on languages, learning resources, books, architectural guidelines, etc.
Requirements: Windows OS. A commercial grade compiler with lots of support and good learning resources available. There is no need for a fancy GUI - it will probably run from a config file and put results into a SQL Server database.
Edit: The current app is C++ but I will almost certainly not be using that language for the re-write. I removed the C++ tag that someone added.
Numerical process simulations are typically run over a single discretised problem grid (for example, the surface of the Earth or clouds of gas and dust), which usually rules out simple task farming or concurrency approaches. This is because a grid divided over a set of processors representing an area of physical space is not a set of independent tasks. The grid cells at the edge of each subgrid need to be updated based on the values of grid cells stored on other processors, which are adjacent in logical space.
In high-performance computing, simulations are typically parallelised using either MPI or OpenMP. MPI is a message passing library with bindings for many languages, including C, C++, Fortran, Python, and C#. OpenMP is an API for shared-memory multiprocessing. In general, MPI is more difficult to code than OpenMP, and is much more invasive, but is also much more flexible. OpenMP requires a memory area shared between processors, so is not suited to many architectures. Hybrid schemes are also possible.
This type of programming has its own special challenges. As well as race conditions, deadlocks, livelocks, and all the other joys of concurrent programming, you need to consider the topology of your processor grid - how you choose to split your logical grid across your physical processors. This is important because your parallel speedup is a function of the amount of communication between your processors, which itself is a function of the total edge length of your decomposed grid. As you add more processors, this surface area increases, increasing the amount of communication overhead. Increasing the granularity will eventually become prohibitive.
The other important consideration is the proportion of the code which can be parallelised. Amdahl's law then dictates the maximum theoretically attainable speedup. You should be able to estimate this before you start writing any code.
Both of these facts will conspire to limit the maximum number of processors you can run on. The sweet spot may be considerably lower than you think.
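For reference, Amdahl's law for a parallelisable fraction p of the runtime on n processors gives a speedup of 1 / ((1 - p) + p / n). A small sketch follows; the 95% figure is an assumed value for illustration, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup when `parallel_fraction` of the runtime scales
    perfectly across `processors` and the remainder stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# Assumed example: 95% parallelisable, on the 24-processor box from the question.
print(round(amdahl_speedup(0.95, 24), 2))   # 11.16
```

With 5% of the work serial, the speedup can never exceed 1 / 0.05 = 20 however many processors are added, and this estimate ignores communication overhead entirely.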
I recommend the book High Performance Computing, if you can get hold of it. In particular, the chapter on performance benchmarking and tuning is priceless.
An excellent online overview of parallel computing, which covers the major issues, is this introduction from Lawrence Livermore National Laboratory.
Your biggest problem in a multithreaded project is that too much state is visible across threads - it is too easy to write code that reads / mutates data in an unsafe manner, especially in a multiprocessor environment where issues such as cache coherency, weakly consistent memory etc might come into play.
Debugging race conditions is distinctly unpleasant.
Approach your design as you would if, say, you were considering distributing your work across multiple machines on a network: that is, identify what tasks can happen in parallel, what the inputs to each task are, what the outputs of each task are, and what tasks must complete before a given task can begin. The point of the exercise is to ensure that each place where data becomes visible to another thread, and each place where a new thread is spawned, are carefully considered.
Once such an initial design is complete, there will be a clear division of ownership of data, and clear points at which ownership is taken / transferred; and so you will be in a very good position to take advantage of the possibilities that multithreading offers you - cheaply shared data, cheap synchronisation, lockless shared data structures - safely.
If you can split the workload up into non-dependent chunks of work (i.e., the data set can be processed in bits, there aren't lots of data dependencies), then I'd use a thread pool / task mechanism. Presumably whatever C# has as an equivalent to Java's java.util.concurrent. I'd create work units from the data, and wrap them in a task, and then throw the tasks at the thread pool.
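A minimal sketch of the "wrap work units in tasks and throw them at a pool" approach, written in Python only to keep it self-contained (in .NET the analogue would be the thread pool or tasks). The function simulate and the shape of a work unit are hypothetical stand-ins for the real calculation. For CPU-bound work in Python specifically, a process pool would be the usual choice; the structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(unit):
    # Hypothetical stand-in for the real per-unit calculation.
    unit_id, values = unit
    return unit_id, sum(v * v for v in values)

def run_all(units, workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(simulate, u) for u in units]
        for fut in as_completed(futures):
            unit_id, value = fut.result()   # re-raises any worker exception
            results[unit_id] = value
    return results

# Eight independent work units, each with its own private slice of data.
units = [(i, range(i, i + 10)) for i in range(8)]
print(run_all(units)[0])   # 285
```

Each task reads only its own input and returns its result, so no locks are needed; results are merged in one place by the submitting thread.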
Of course performance might be a necessity here. If you can keep the original processing code kernel as-is, then you can call it from within your C# application.
If the code has lots of data dependencies, it may be a lot harder to break up into threaded tasks, but you might be able to break it up into a pipeline of actions. This means thread 1 passes data to thread 2, which passes data to threads 3 through 8, which pass data onto thread 9, etc.
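The pipeline idea can be sketched with one thread per stage connected by bounded queues. This is an illustration under simplifying assumptions: one thread per stage (so ordering is preserved), hypothetical stage functions supplied by the caller, and no error handling (a stage that raises would stall the pipeline).

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    # Pull items, transform them, pass them on; forward the sentinel and exit.
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # One bounded queue between each pair of neighbouring stages.
    queues = [queue.Queue(maxsize=16) for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        # Feed from a separate thread so the caller can drain the output
        # while the bounded queues apply back-pressure.
        for item in items:
            queues[0].put(item)
        queues[0].put(STOP)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, run_pipeline(range(100), [lambda x: x + 1, lambda x: x * 2]) passes every item through both stages in order. The bounded queues keep a fast stage from running arbitrarily far ahead of a slow one.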
If the code has a lot of floating point mathematics, it might be worth looking at rewriting in OpenCL or CUDA, and running it on GPUs instead of CPUs.
For a 6-month project I'd say it definitely pays off to start by reading a good book about the subject. I would suggest Joe Duffy's Concurrent Programming on Windows. It's the most thorough book I know about the subject and it covers both .NET and native Win32 threading. I had been writing multithreaded programs for 10 years when I discovered this gem and still found things I didn't know in almost every chapter.
Also, "natural catastrophe risk modeling" sounds like a lot of math. Maybe you should have a look at Intel's IPP library: it provides primitives for many common low-level math and signal processing algorithms. It supports multi threading out of the box, which may make your task significantly easier.
There are a lot of techniques that can be used to deal with multithreading if you design the project for it.
The most general and universal is simply "avoid shared state". Whenever possible, copy resources between threads, rather than making them access the same shared copy.
If you're writing the low-level synchronization code yourself, you have to remember to make absolutely no assumptions. Both the compiler and CPU may reorder your code, creating race conditions or deadlocks where none would seem possible when reading the code. The only way to prevent this is with memory barriers. And remember that even the simplest operation may be subject to threading issues. Something as simple as ++i is typically not atomic, and if multiple threads access i, you'll get unpredictable results.
And of course, just because you've assigned a value to a variable, that's no guarantee that the new value will be visible to other threads. The compiler may defer actually writing it out to memory. Again, a memory barrier forces it to "flush" all pending memory I/O.
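To make the ++i point above concrete: an increment is a read, an add and a write, and another thread can interleave between those steps. The sketch below shows the usual fix, guarding the read-modify-write with a lock. It is written in Python for consistency with the other examples; in C# the corresponding tools would be the lock statement or Interlocked.Increment.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is several steps; the lock makes them one unit.
        with self._lock:
            self.value += 1

def hammer(counter, threads=8, per_thread=10000):
    # Increment the same counter from several threads at once.
    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value

print(hammer(Counter()))   # 80000
```

Without the lock the final count is not guaranteed to equal threads * per_thread. Acquiring and releasing a lock also acts as a synchronization point, which is one reason higher-level primitives are preferable to hand-written barriers.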
If I were you, I'd go with a higher level synchronization model than simple locks/mutexes/monitors/critical sections if possible. There are a few CSP libraries available for most languages and platforms, including .NET languages and native C++.
This usually makes race conditions and deadlocks trivial to detect and fix, and allows a ridiculous level of scalability. But there's a certain amount of overhead associated with this paradigm as well, so each thread might get less work done than it would with other techniques. It also requires the entire application to be structured specifically for this paradigm (so it's tricky to retrofit onto existing code, but since you're starting from scratch, it's less of an issue -- but it'll still be unfamiliar to you)
Another approach might be Transactional Memory. This is easier to fit into a traditional program structure, but also has some limitations, and I don't know of many production-quality libraries for it (STM.NET was recently released, and may be worth checking out. Intel has a C++ compiler with STM extensions built into the language as well)
But whichever approach you use, you'll have to think carefully about how to split the work up into independent tasks, and how to avoid cross-talk between threads. Any time two threads access the same variable, you have a potential bug. And any time two threads access the same variable or just another variable near the same address (for example, the next or previous element in an array), data will have to be exchanged between cores, forcing it to be flushed from CPU cache to memory, and then read into the other core's cache. Which can be a major performance hit.
Oh, and if you do write the application in C++, don't underestimate the language. You'll have to learn the language in detail before you'll be able to write robust code, much less robust threaded code.
One thing we've done in this situation that has worked really well for us is to break the work to be done into individual chunks and the actions on each chunk into different processors. Then we have chains of processors and data chunks can work through the chains independently. Each set of processors within the chain can run on multiple threads each and can process more or less data depending on their own performance relative to the other processors in the chain.
Also breaking up both the data and actions into smaller pieces makes the app much more maintainable and testable.
There's plenty of specific bits of individual advice that could be given here, and several people have done so already.
However nobody can tell you exactly how to make this all work for your specific requirements (which you don't even fully know yourself yet), so I'd strongly recommend you read up on HPC (High Performance Computing) for now to get the over-arching concepts clear and have a better idea which direction suits your needs the most.
The model you choose to use will be dictated by the structure of your data. Is your data tightly coupled or loosely coupled? If your simulation data is tightly coupled then you'll want to look at OpenMP or MPI (parallel computing). If your data is loosely coupled then a job pool is probably a better fit... possibly even a distributed computing approach could work.
My advice is get and read an introductory text to get familiar with the various models of concurrency/parallelism. Then look at your application's needs and decide which architecture you're going to need to use. After you know which architecture you need, then you can look at tools to assist you.
A fairly highly rated book which works as an introduction to the topic is "The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Application".
Read about Erlang and the "Actor Model" in particular. If you make all your data immutable, you will have a much easier time parallelizing it.
Most of the other answers offer good advice regarding partitioning the project - look for tasks that can be cleanly executed in parallel with very little data sharing required. Be aware of non-thread safe constructs such as static or global variables, or libraries that are not thread safe. The worst one we've encountered is the TNT library, which doesn't even allow thread-safe reads under some circumstances.
As with all optimisation, concentrate on the bottlenecks first: threading adds a lot of complexity, so you want to avoid it where it isn't necessary.
You'll need a good grasp of the various threading primitives (mutexes, semaphores, critical sections, conditions, etc.) and the situations in which they are useful.
One thing I would add, if you're intending to stay with C++, is that we have had a lot of success using the boost.thread library. It supplies most of the required multi-threading primitives, although it does lack a thread pool (and I would be wary of the unofficial "boost" thread pool one can locate via Google, because it suffers from a number of deadlock issues).
I would consider doing this in .NET 4.0 since it has a lot of new support specifically targeted at making writing concurrent code easier. Its official release date is March 22, 2010, but it will probably RTM before then and you can start with the reasonably stable Beta 2 now.
You can either use C# that you're more familiar with or you can use managed C++.
At a high level, try to break up the program into System.Threading.Tasks.Task objects, which are individual units of work. In addition, I'd minimize use of shared state and consider using Parallel.For (or ForEach) and/or PLINQ where possible.
If you do this, a lot of the heavy lifting will be done for you in a very efficient way. It's the direction that Microsoft is going to increasingly support.
See also: http://msdn.microsoft.com/en-us/library/dd321424%28VS.100%29.aspx
Sorry, I just want to add a pessimistic, or rather realistic, answer here.
You are under time pressure: a 6-month deadline, and you don't even know for sure what language this system is written in, what it does, and how it is organized. If it is not a trivial calculation, then this is a very bad start.
Most importantly: you say you have never done multithreaded programming before. This is where I get 4 alarm clocks ringing at once. Multithreading is difficult and takes a long time to learn if you want to do it right, and you need to do it right if you want to win a huge speed increase. Debugging is extremely nasty even with good tools like the TotalView debugger or Intel's VTune.
Then you say you want to rewrite the app in another language. Well, this isn't so bad, as you have to rewrite it anyway. The chance of turning a single-threaded program into a well-working multithreaded one without a total redesign is almost zero.
But learning multithreading and a new language (how are your C++ skills?) with a timeline of 3 months (you have to write a throwaway prototype, so I cut the timespan into two halves) is extremely challenging.
My advice here is simple and you will not like it: learn multithreading now, because it will be a required skill set in the future, but leave this job to someone who already has experience. Well, unless you don't care about the program being successful and are just looking for 6 months' pay.
If it's possible to have all the threads working on disjoint sets of process data, and have other information stored in the SQL database, you can quite easily do it in C++, and just spawn off new threads to work on their own parts using the Windows API. The SQL server will handle all the hard synchronization magic with its DB transactions! And of course C++ will perform a lot faster than C#.
You should definitely revise C++ for this task, and understand the C++ code, and look for efficiency bugs in the existing code as well as adding the multi-threaded functionality.
You've tagged this question as C++ but mentioned that you're a C# developer currently, so I'm not sure if you'll be tackling this assignment from C++ or C#. Anyway, in case you're going to be using C# or .NET (including C++/CLI): I have the following MSDN article bookmarked and would highly recommend reading through it as part of your prep work.
Calling Synchronous Methods Asynchronously
Whatever technology you're going to write this in, take a look at the must-read book on concurrency, "Concurrent Programming in Java", and for .NET I highly recommend the Retlang library for concurrent apps.
I don't know if it was mentioned yet, but if I were in your shoes, what I would be doing right now (aside from reading every answer posted here) is writing a multiple threaded example application in your favorite (most used) language.
I don't have extensive multithreaded experience. I've played around with it in the past for fun but I think gaining some experience with a throw-away application will suit your future efforts.
I wish you luck in this endeavor and I must admit I wish I had the opportunity to work on something like this...

How to keep track of performance testing

I'm currently doing performance and load testing of a complex many-tier system investigating the effect of different changes, but I'm having problems keeping track of everything:
There are many copies of different assemblies:
  Originally released assemblies
  Officially released hotfixes
  Assemblies that I've built containing further additional fixes
  Assemblies that I've built containing additional diagnostic logging or tracing
There are many database patches, some of the above assemblies depend on certain database patches being applied
Many different logging levels exist, in different tiers (Application logging, Application performance statistics, SQL server profiling)
There are many different scenarios, sometimes it is useful to test only 1 scenario, other times I need to test combinations of different scenarios.
Load may be split across multiple machines or only a single machine
The data present in the database can change, for example some tests might be done with generated data, and then later with data taken from a live system.
There is a massive amount of potential performance data to be collected after each test, for example:
  Many different types of application specific logging
  SQL Profiler traces
  Event logs
  DMVs
  Perfmon counters
The database(s) are several GB in size, so where I would have used backups to revert to a previous state I tend to apply changes to whatever database is present after the last test, causing me to quickly lose track of things.
I collect as much information as I can about each test I do (the scenario tested, which patches are applied, what data is in the database), but I still find myself having to repeat tests because of inconsistent results. For example, I just did a test which I believed to be an exact duplicate of a test I ran a few months ago, but with updated data in the database. I know for a fact that the new data should cause a performance degradation, yet the results show the opposite!
At the same time I find myself spending disproportionate amounts of time recording all these details.
One thing I considered was using scripting to automate the collection of performance data etc., but I wasn't sure this was such a good idea - not only is it time spent developing scripts instead of testing, but bugs in my scripts could cause me to lose track of things even quicker.
I'm after some advice / hints on how better to manage the test environment, in particular how to strike a balance between collecting everything and actually getting some testing done at the risk of missing something important?
Scripting the collection of the test parameters + environment is a very good idea to check out. If you're testing across several days, and the scripting takes a day, it's time well spent. If after a day you see it won't finish soon, reevaluate and possibly stop pursuing this direction.
But you owe it to yourself to try it.
I would tend to agree with @orip: scripting at least part of your workload is likely to save you time. You might consider taking a moment to ask which tasks are the most time-consuming in terms of your labor and how amenable they are to automation. Scripts are especially good at collecting and summarizing data - much better than people, typically. If the performance data requires a lot of interpretation on your part, you may have problems.
An advantage of scripting some of these tasks is that you can then check them in alongside the source / patches / branches, and you may find you benefit from having an organizational structure for your system's complexity rather than struggling to chase it as you do now.
If you can get away with testing only against a few set configurations that will keep the admin simple. It may also make it easier to put one on each of several virtual machines which can be quickly redeployed to give clean baselines.
If you genuinely need the complexity you describe I'd recommend building a simple database to allow you to query the multivariate results you have. Having a column for each of the important factors will allow you to query for questions like "what testing config had the lowest variance in latency?" and "which test database allowed the raising of most bugs?". I use sqlite3 (probably through the Python wrapper or the Firefox plug-in) for this kind of lightweight collection, because it keeps maintenance overhead relatively low and allows you to avoid perturbing the system under test too far, even if you need to run on the same box.
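A minimal sketch of such a results database using Python's sqlite3 module. The table and column names are hypothetical; the real columns would be whichever factors actually vary between runs. SQLite has no built-in variance aggregate, so the query computes the population variance as the mean of squares minus the square of the mean.

```python
import sqlite3

def open_results_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # Hypothetical columns: one per factor that varies, plus one measurement.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS test_runs (
            run_id       INTEGER PRIMARY KEY,
            scenario     TEXT NOT NULL,
            assembly_set TEXT NOT NULL,
            db_patch     TEXT NOT NULL,
            data_source  TEXT NOT NULL,
            latency_ms   REAL NOT NULL
        )""")
    return conn

def record_run(conn, scenario, assembly_set, db_patch, data_source, latency_ms):
    conn.execute(
        "INSERT INTO test_runs (scenario, assembly_set, db_patch, data_source, latency_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        (scenario, assembly_set, db_patch, data_source, latency_ms),
    )
    conn.commit()

def latency_variance_by_config(conn):
    # "Which testing config had the lowest variance in latency?"
    return conn.execute("""
        SELECT assembly_set, db_patch,
               AVG(latency_ms * latency_ms) - AVG(latency_ms) * AVG(latency_ms) AS variance
        FROM test_runs
        GROUP BY assembly_set, db_patch
        ORDER BY variance
    """).fetchall()
```

Calling record_run once at the end of each test, from the same script that launches the test, keeps the record of what was run tied to the results it produced.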
Scripting the tests will make them quicker to execute and permit results to be gathered in an already-ordered way, but it sounds like your system may be too complex to make this easy to do.
