How "fast" are lists? - couchdb

While developing my application I noticed that I was putting more and more complexity into lists, like "joining" related docs, or manipulating the output based on query parameters. As we know, there is a lot of stuff that can be put in lists - stuff that could also be handled by middleware (if you are not developing a couchapp).
Just to be sure, the question is: how far can/should one go with lists?

You should only go to about 12 on the Jason Scale ;)
Very hard to quantify the answer. JS within Couch is as fast as JS outside of Couch, which is slower than native code, faster than some other interpreters, and slower than others. The short answer is that if you like writing code in lists, and it fits in with your development environment, then relax; don't stop until/unless it becomes a problem.

The problem with lists is that they are executed on each request. It may not be a problem for you, but I prefer to avoid using lists, and design the documents and the application to not need lists. That said, nothing stops you from putting some caching mechanism in front of your couch to reduce server load.
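To illustrate the caching idea, here is a minimal sketch in Java using the JDK's built-in HTTP client. It assumes the list endpoint returns an ETag that changes when the underlying view changes (if yours does not, a simple time-based expiry works the same way); the URL and class names are invented for illustration.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ListCache {
        private final HttpClient client = HttpClient.newHttpClient();
        private final Map<String, String> etags = new ConcurrentHashMap<>();   // url -> last ETag seen
        private final Map<String, String> bodies = new ConcurrentHashMap<>();  // url -> cached list output

        public String fetch(String url) throws Exception {
            HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url));
            String etag = etags.get(url);
            if (etag != null) {
                builder.header("If-None-Match", etag);      // ask the server to skip re-rendering if unchanged
            }
            HttpResponse<String> resp = client.send(builder.build(), HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() == 304) {
                return bodies.get(url);                     // not modified: serve the cached rendering
            }
            resp.headers().firstValue("ETag").ifPresent(t -> etags.put(url, t));
            bodies.put(url, resp.body());
            return resp.body();
        }

        public static void main(String[] args) throws Exception {
            ListCache cache = new ListCache();
            // Hypothetical URL - substitute your own database, design doc, list and view names.
            System.out.println(cache.fetch("http://localhost:5984/mydb/_design/app/_list/mylist/myview"));
        }
    }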

Related

Clojure: Create and manage multiple threads

I wrote a program which needs to process a very large dataset and I'm planning to run it with multiple threads in a high-end machine.
I'm a beginner in Clojure and I'm lost in the myriad of tools at my disposal -
agents, futures, core.async (and Quartzite?). I would like to know which one is most suited for this job.
The following describes my situation:
I have a function which transforms some data and stores it in a database.
The argument to the said function is popped from a Redis set.
I want to run the function in several separate threads as long as there is a value in the Redis set.
For simplicity, futures can't be beat. They create a new thread, and return a value from it. However, often you need more fine-grained control than they provide.
The core.async library has nice support for parallelism (via pipeline, see below), and it also provides automatic back-pressure. You have to have a way to control the flow of data such that no one's starving for work, or burdened by too much of it. core.async channels must be bounded, and this helps with this problem. Also, it's a pretty logical model of your problem: taking a value from a source, transforming it (maybe using a transducer?) with some given parallelism, and then putting the result to your database.
You can also go the manual route of using Java's excellent j.u.concurrent library. There are low-level primitives as well as thread management tools for thread pools. All of this is accessible from Clojure.
From a design standpoint, it comes down to whether you are more CPU-bound or I/O-bound. This affects decisions such as whether or not you will perform parallel reads from Redis and writes to your database. If you are CPU-bound and thus your bottleneck is the computation, then it wouldn't make much sense to parallelize your reads from Redis, or your writes to your database, would it? These are the types of things to consider.
You really have two problems to solve: (1) your familiarity with Clojure's/Java's concurrency mechanisms, and (2) your approach to this problem (i.e., how would you approach this problem, irrespective of the language you're using?). Once you solve #2, you will have a much better idea of which of the tools mentioned above to use, and how to use them.
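As a minimal sketch of the j.u.concurrent route mentioned above: a fixed thread pool whose workers keep popping and processing until the source is empty. Here popFromRedis, storeInDatabase and transform are hypothetical placeholders for your Redis SPOP, your database write and your real transformation.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class Worker {
        // Hypothetical placeholders: wire these to your Redis pop and your database client.
        static String popFromRedis() { return null; }            // return null when the set is empty
        static void storeInDatabase(String transformed) { }
        static String transform(String datum) { return datum.toUpperCase(); }  // the real work goes here

        public static void main(String[] args) throws InterruptedException {
            int threads = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.submit(() -> {
                    String datum;
                    while ((datum = popFromRedis()) != null) {   // keep going while the set has values
                        storeInDatabase(transform(datum));
                    }
                });
            }
            pool.shutdown();                                     // no new tasks; workers drain the set
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }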
Sounds like you may have a good embarrassingly parallel problem to solve. In that case, you could start simply by coding up your processing into a top-level function that processes the first datum. Once that's working, wrap it in a map to handle all of the data sequentially (serially, one at a time).
You might want to start tackling the bigger problem with just a few items from your data set. That will make your testing smoother and faster.
After you have the map working, it's time to just add a p (parallel) to your code to make it a pmap. This is a very rewarding way to heat up your machine.
Here is a discussion about the number of threads pmap uses.
The above is the simplest approach. If you need finer control over the concurrency, this concurrency screencast explores the use cases.
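If you are coming at this from the Java side, the map-then-pmap step has a rough analogue in parallel streams. A minimal sketch, where process is a stand-in for your real per-item work:

    import java.util.List;
    import java.util.stream.Collectors;

    public class ParallelMapSketch {
        static String process(String datum) { return datum.trim().toLowerCase(); }  // stand-in for the real work

        public static void main(String[] args) {
            List<String> data = List.of(" A ", " B ", " C ");

            // Sequential first: the equivalent of wrapping the work in a map.
            List<String> serial = data.stream().map(ParallelMapSketch::process).collect(Collectors.toList());

            // Then "add the p": parallelStream() fans the same work out over the common fork-join pool.
            List<String> parallel = data.parallelStream().map(ParallelMapSketch::process).collect(Collectors.toList());

            System.out.println(serial.equals(parallel));  // same results, different scheduling
        }
    }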
It is hard to be precise w/o knowing the details of your problem. There are several choices as you mention:
Plain Java threads & threadpools. If your problem is similar to a pre-existing Java solution, this may be the most straightforward.
Simple Clojure threading with future et al. Kicking off a thread with future and getting the result in a promise is very easy.
Replace map with pmap (parallel map). This can help in simple cases that are primarily map/reduce oriented.
The Claypoole library: Lots of tools to make multithreading simpler and easier. Please see their GitHub project and the Clojure/West talk.

Quantify performance gain when using Java instead of SSJS

When developing XPages applications it seems to have become very popular to mainly use Java methods and beans instead of server-side JavaScript (SSJS). SSJS of course takes longer to execute because the code has to be evaluated at runtime. However, can anyone provide information about the QUANTITATIVE gain in performance when using Java? Are there any benchmarks for how much the execution times differ, for example depending on the length of the SSJS code or the functions used?
You have to use your own benchmarks. The gain might not even be measurable. It is more about capabilities and your development process. Switching from SSJS to Java and expecting an instant increase in performance most likely won't work out.
Unless of course Java allows you to code things differently. So most of the decisions are based on capabilities, not speed. You are most welcome to run some tests and share the insights. What you can expect e.g. opening a document in SSJS vs. Java: the difference should be in the space of a rounding error, since most of the time is needed for the C call below.
SSJS and Java run at almost the same speed after the SSJS has been evaluated, so you have some onramp time and similar speed thereafter.
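If you do run your own tests as suggested above, a crude timing harness is enough to get a first number. A minimal sketch, where operationUnderTest is a placeholder for whichever Java method (or SSJS-equivalent operation) you want to measure:

    public class MicroBench {
        // Placeholder for the operation you want to compare (e.g. the Java version of an SSJS block).
        static void operationUnderTest() {
            Math.sqrt(42.0);
        }

        public static void main(String[] args) {
            int warmup = 10_000, runs = 100_000;
            for (int i = 0; i < warmup; i++) operationUnderTest();   // let the JIT settle first

            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) operationUnderTest();
            long elapsed = System.nanoTime() - start;

            System.out.printf("avg %.1f ns per call%n", (double) elapsed / runs);
        }
    }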
I agree about the performance gain being negligible. I will chime in to say this. Right now I am trying to learn to support an existing XPages application written without using any java, and entirely in SSJS. There is code here, there, and everywhere. It is very hard to follow.
Depending on your environment, you should consider programmer productivity when considering how to build your applications, especially when you know both. Productivity for you, and those coming after you.
Stephan's answer is right on point: though Java as a language IS faster (you'd probably see performance gains proportional to the complexity of the block of code more than the number of operations running), the primary benefit is program structure. My experience has been that using Java extensively makes my code much cleaner, easier to debug, and MUCH easier to understand after coming back to it months later.
One of the nice side effects of this structural change does happen to be performance, but not because of anything inherent to Java: by focusing on classes and getters/setters, it makes it easier to really pay attention to expensive operations and caching. While you CAN cache your data excellently in SSJS using the various scopes, it's easier for your brain - both now and after you've forgotten what you did next year - to think about that sort of thing in Java.
Personally, even if Java executed more slowly than SSJS but the programming models in XPages were the same as they are now, I would still use Java primarily.
You are asking about the pure processing performance - the speed of the computer running the code. And as Stephan stated, Java is going to be a "little" faster because it doesn't need the extra step of string-parsing the code first. OK, in the big picture that's really not a big deal.
I think the real "performance" gain that you get by moving to Java in XPages is cleaner code with more capabilities. Yes, you can put a lot of code in SSJS libraries, and that can work really well. But I assume those are more individual functions that you use over and over rather than true objects that you can keep in memory so they're there when you need them. When you get your core business logic inside Java objects, in my experience the speed of development goes significantly faster. It's not even close.
Take the Domino document object. That's a rather handy object. Imagine if it wasn't an "object" but simply a library of 50 or so functions that you need to first paste into each database. Doesn't seem right. And of course in the Domino API it's not just the domino object. There's like 60 or so different objects!
Typical XPages-with-Java development moves much - not all, but much - of the code away from the .xsp page and into Java classes, which are very similar to custom classes in LotusScript. This not only creates separation from the frontend code - making the .xsp pages easier to work with - but puts the business logic inside Java, which is similar to working with the Domino backend objects. So the backend gets easier to work with, maintain and add onto.
And that's where a big part of the development speed improvements come from.
Getting back to your original question, which is about computer speed: I would suggest that it's much easier to cache frequently used data via Java objects and managed beans than it is with SSJS. Not having to hit the disk as much would be a real speed advantage.
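As a rough illustration of that kind of caching, here is a minimal sketch of a plain Java bean (the class name, the loader, and its registration as a managed bean in faces-config.xml are all assumptions about your application):

    import java.io.Serializable;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Registered as, say, an application- or session-scoped managed bean in faces-config.xml.
    public class DataCacheBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, Object> cache = new ConcurrentHashMap<>();

        // Load once, serve from memory afterwards instead of hitting the disk on every request.
        @SuppressWarnings("unchecked")
        public <T> T getOrLoad(String key, Supplier<T> loader) {
            return (T) cache.computeIfAbsent(key, k -> loader.get());
        }

        public void invalidate(String key) {
            cache.remove(key);
        }
    }

Your Java code (or a wrapper getter) then calls getOrLoad once per key instead of re-reading the same view data on every request.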
I would recommend you to consider performance gain in a wider context.
performance gain in quicker running?
performance gain in typing?
performance gain in not making mistakes because of the editor?
performance gain of using templating in the Java editor?
performance gain in better reusability, eventually to server-wide plugins?
performance gain in being comfortable building your own classes to hold complex objects?
performance gain in easier debugging?
performance gain in being comfortable with Validators, Converters, Phase Listeners, VariableResolvers etc?
performance gain in being comfortable looking at Extension Libraries to investigate or extend?
performance gain of being able to find answers more easily on StackOverflow or Google because you're using a standard language vs a proprietary language?
performance gain in using third party Java code like Apache Commons, Apache POI etc?
To be honest, when you have got that far and understand how much code is run during a page load or partial request, performance gain in runtime of Java vs SSJS is minimal compared to something like using loaded where possible instead of rendered. The gains of Java over SSJS are much wider, and I have not even mentioned the gains in professional development.
My answer is way too long for a Stack Overflow answer, so as promised, here is a link to my blog post about this issue. Basically it has nothing to do with performance, but with Maintainability, Readability and Usability.

Does different language = different performance in couchDB lists?

I am writing a list function in CouchDB. I want to know if using a faster language than JavaScript would boost performance (I was thinking Python, just because I know it).
Does anyone know if this is true, and has anyone tested whether it is true?
Generally the different view engines are going to give you the same speed.
Except Erlang, which is much faster.
The reason for this is that Erlang is what CouchDB is written in; for all other languages the data needs to be converted into standard JSON, sent to the view server, then converted back to the native Erlang format for writing.
BUT, this performance "boost" only happens on view generation, which typically happens out-of-line of a request or only on the changed documents.
In other words, in real-world usage the performance difference between view servers is irrelevant most of the time.
Here is the list of all the view server implementations: http://wiki.apache.org/couchdb/View_server
I've never used the python ones, but if that is where you are comfortable, go for it.
You can use the V8 engine for Couch if you want. A guy from IrisCouch wrote couchjs to do this (I've seen him on Stack Overflow quite a bit too).
https://github.com/iriscouch/couchjs
Also for views, filtered replication, things like that, you can write the functions in Erlang instead of JavaScript. I've done that and seen around a 50% performance increase.
Seems you can write list functions in Erlang: http://tisba.de/2010/11/25/native-list-functions-with-couchdb/

Convert MFC Doc/View to?

My question will be hard to form, but to start:
I have an MFC SDI app that I have worked on for an embarrassingly long time, that never seemed to fit the Doc/View architecture. I.e. there isn't anything useful in the Doc. It is multi-threaded and I need to do more with threading, etc.
I dream about also porting it to Linux X Windows, but I know nothing about that programming environment as yet. Maybe Mac also.
My question is where to go from here?
I think I would like to convert from MFC Doc/View to straight Win API stuff with message loops and window procedures, etc. But the task seems to be huge.
Does the Linux X Windows environment use a similar kind of message loop, window procedure architecture?
Can I go part way? Like convert a little at a time without rendering my program unusable for long periods of work?
Added later:
My program is a file compare program (sounds simple enough). So, stating my confusion in a simple way: normally a document can have multiple views, but in this app I have one view with multiple (two) documents (files). I have a "compare engine" that I first wrote back in the DOS days, which is the heart of the program, and the view is just looking at the output of that routine. Sometimes I think that some of my "view" code could make sense in a "document" class, but I hardly know where to begin to separate it into more classes. I have recently started reading "Programming Windows" 5th Ed. by Charles Petzold (I know that is quite out of date, (C) 1998), hoping to get a better understanding of direct Windows programming.
I get overwhelmed with the proliferation of options like C#, NET, MFC, MVC, Qt, wxWidgets, etc.
I find I am often stuck trying to understand something going on in the MFC framework because something in my code doesn't work as it seems it should, but the problem is that I don't really understand how MFC is handling things in the background. That is why I am trying to learn "straight Windows programming" where my program has all the message passing code that I write. I hope this helps give enough insight into my question so someone can guide me on my way.
X works enough differently that a raw Windows program and a raw X program probably wouldn't be able to share much UI code at all.
If you want portability between the two, chances are pretty good that you want to use something like Qt or wxWidgets. Of the two, wxWidgets is more similar to MFC, so it would probably require less rewriting, but would maintain (more or less) the same "disconnect" you're seeing between what you want and what it provides.
Without knowing more about your application, and why it doesn't fit well with MFC, it's impossible to guess whether Qt would be a better fit or not. An immediate guess would be "probably not".
MFC uses a "document/view" architecture, where Qt uses the original Model-View-Controller architecture. For the most part, MFC's Document class is equivalent basically a Model and a Controller rolled into one -- so if your Document contains nothing useful, in Qt you'd apparently have both a Model and a Controller, neither of which did much that was useful.
That said, I have to raise a question about why your Document currently doesn't do much. The MVC pattern has proven applicable to a wide variety of problems, so while it's possible it can't work well for your problem, it's also possible that it could work well, and you're simply not using it. Without knowing more about what you're doing, it's impossible to even guess at that though.
Edit: Okay, the clarification helps quite a bit. The first thing to realize is that a Document does not necessarily equate to a file. Quite the contrary, a document can perfectly reasonably relate to an arbitrary number of files.
Just for example, consider a web browser. All the data needed to compose the page it's currently displaying would reasonably be part of the same document. Depending on your viewpoint, that's either zero files, or a whole bunch of them (it will start as an arbitrary number of files coming from the server(s), but won't necessarily be stored as files locally at all). Storing any of it as a file locally will be a (more or less) accidental by-product of caching, and mostly unrelated to browsing per se.
In your case, you're presumably reading the two (or three?) files into memory and storing them along with some sort of data structure to hold the result of the comparison. After the comparison is complete, you might or might not discard the contents of the files themselves. I think it's safe to say that the "normal" separation of responsibilities would be for that data and the code that produces that data to be in the Document.
The View should contain only the code to take that result from that data structure, and display it on screen. Nearly the only data you normally want to store in the View would be things related to how the data is presented (e.g., things like a zoom level or current scroll position). Likewise, the code in the view should relate only to displaying the result and reacting to user input, NOT to "creating" the data in the first place.
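Very roughly, that separation looks something like the sketch below. It is written in Java purely for brevity; the names are invented and the "compare engine" is a trivial stand-in, but the same split applies in MFC, wxWidgets or Qt.

    import java.util.ArrayList;
    import java.util.List;

    // "Document": owns the two files' contents and the compare result; knows nothing about display.
    class CompareDocument {
        private final List<String> left, right;
        private final List<String> diff = new ArrayList<>();

        CompareDocument(List<String> left, List<String> right) {
            this.left = left;
            this.right = right;
            runCompare();
        }

        private void runCompare() {                  // stand-in for the real compare engine
            int n = Math.max(left.size(), right.size());
            for (int i = 0; i < n; i++) {
                String l = i < left.size() ? left.get(i) : "";
                String r = i < right.size() ? right.get(i) : "";
                diff.add(l.equals(r) ? "  " + l : "! " + l + " | " + r);
            }
        }

        List<String> diffLines() { return diff; }
    }

    // "View": only presentation state (scroll position, etc.) and rendering of the document's data.
    class CompareView {
        private int scrollPosition = 0;

        void render(CompareDocument doc) {
            doc.diffLines().stream().skip(scrollPosition).limit(40).forEach(System.out::println);
        }
    }

    public class DocViewSketch {
        public static void main(String[] args) {
            CompareDocument doc = new CompareDocument(List.of("a", "b"), List.of("a", "c"));
            new CompareView().render(doc);
        }
    }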
As such, I think your program could be rewritten to use the Document/View pattern more effectively, or could be rewritten to use MVC. That, in turn, means a port to Qt could/would probably work just fine -- provided you're willing to put some time and effort into understanding how it's intended to work and then make what may be fairly substantial changes to your code to work the way it's designed to.
As I commented previously, wxWidgets is more like MFC in this respect -- it uses a Document and View, not a Model, View, and Controller. It's also going to work best if you do some rewriting to separate responsibilities the way it's designed for. The good point is that it's probably a bit easier to do that one step at a time: rewrite the code in MFC, with which you're already familiar, and then port it to wxWidgets -- but given the similarity between the two, that "port" will probably be little more than minor editing -- often just changing some names from C* to wx* is just about enough. To my recollection, the only place I've run into much work was in creating menus -- with MFC they're normally handled via resources, but (at least a few years ago when I used it) wxWidgets normally directly exposed the code that created the menu entries.
Porting to Qt would probably be more work -- you pretty much have to learn a new framework, and substantially reorganize your code at the same time. The good point is that when you're done, the result will probably be somewhat cleaner, though given what you're doing, the difference may be pretty minor. In a Document/View, the View displays data, and reacts to user input. In a Model/View/Controller, the View only displays data, but user input (that modifies the underlying data) goes through the Controller. Since you (presumably) don't expect to modify the underlying data, the only user input involved probably belongs in the view in any case (e.g., things like scrolling). It's barely possible you might have a few things you could put in the Document/Model that would be open to change (e.g., things like the current font or colors the user has selected).

Writing easily modified code

What are some ways in which I can write code that is easily modified?
The one I have learned from experience is that I almost always need to write one to throw away. That way I have developed a sense of the domain knowledge and program structure required before coding the actual application.
The general guidelines are of course:
High cohesion, low coupling
Don't repeat yourself
Recognize design patterns and implement them
Don't see design patterns where they don't exist or aren't necessary
Use a coding standard, stick to it
Comment everything that should be commented; when in doubt: comment
Use unit tests
Write comments and tests before implementation, that way you know exactly what you want to do (see the sketch after this answer)
And when it goes wrong: refactor, refactor, refactor. With good tests you can be sure nothing breaks.
And oh yeah, read this: http://www.pragprog.com/the-pragmatic-programmer - everything (I think) above and more is in it.
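As a tiny illustration of the tests-before-implementation point above, the test pins down the behaviour before the code exists (JUnit 5 here; the Slug class is a made-up example):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class SlugTest {
        // Written first: this defines what "slugify" should mean before any implementation exists.
        @Test
        void lowercasesAndReplacesSpacesWithDashes() {
            assertEquals("hello-world", Slug.of("Hello World"));
        }
    }

    // The implementation then only has to make the test pass.
    class Slug {
        static String of(String title) {
            return title.trim().toLowerCase().replaceAll("\\s+", "-");
        }
    }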
I think your emphasis on modifiability is more important than readability. It is not hard to make something easy to read, but the real test of how well it is understood comes when someone else (or you) has to modify it in response to changing requirements.
What I try to do is assume that modifications will be necessary, and if it is not really clear how to do them, leave explicit directions in the code for how to do them.
I assume that I may have to do some educating of the reader of the code to get him or her to know how to modify the code properly. This requires energy on my part, and it requires energy on the part of the person reading the code.
So while I admire the idea of literate programming that can be easily read and understood, sometimes it is more like math, where the only way to do it is for the reader to buckle down, pay close attention, re-read it a few times, and make sure they understand.
Readability helps a lot: If you do something non-obvious, or you are taking a shortcut, comment. Comments are places where you can go back and refactor if you have time later. Use sensible names for everything, makes it easier to understand what is going on.
Continuous revision will let you move from that first draft to a better one without throwing away (too much) work. Any time you rewrite from scratch you may lose lessons learned. As you code, use refactoring tools to eliminate code representing areas of exploration that are no longer needed, and to make obvious things that were obscure. The first one reduces the amount that you need to maintain; the second reduces the effort per square foot. (Sqft makes about as much sense as lines of code, really.)
Modularize appropriately and enforce encapsulation and separation of logic between your modules. You don't want too many dependencies on any one part of the code or that part becomes inherently harder to understand.
Consider using tried-and-true methods over cutting-edge ones. You give up some functionality for predictability.
Finally, if this is code that people will be using before and after modification, you need(ed) to have an appropriate API insulating your code from theirs. Having a strong API lets you change things behind the scenes without needing to alert all your consumers. I think there's a decent article on Coding Horror about this.
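A minimal sketch of that kind of insulating API in Java (the names are invented for illustration): consumers depend only on the interface, so the implementation behind it can change without alerting them.

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // The published API: this is all consumers are allowed to see.
    interface ReportStore {
        void save(String id, String report);
        Optional<String> find(String id);
    }

    // Today's implementation; it can be swapped for a database-backed one without touching consumers.
    class InMemoryReportStore implements ReportStore {
        private final Map<String, String> reports = new ConcurrentHashMap<>();

        @Override public void save(String id, String report) { reports.put(id, report); }
        @Override public Optional<String> find(String id) { return Optional.ofNullable(reports.get(id)); }
    }

    class Consumer {
        private final ReportStore store;            // depends on the interface only

        Consumer(ReportStore store) { this.store = store; }

        void run() {
            store.save("q1", "all good");
            System.out.println(store.find("q1").orElse("missing"));
        }
    }

    public class ApiSketch {
        public static void main(String[] args) {
            new Consumer(new InMemoryReportStore()).run();
        }
    }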
Hang Your Code Out to D.R.Y.
I learned this early when assigned the task of changing the appearance of a web-interface. The code was in C, which I hated, and was compiled to a CGI executable. And, worse, it was built on a library that was abandoned—no updates, no support, and too many man-hours put into its use to change it. On top of the framework was a disorderly web of code, consisting of various form and element builders, custom string implementations, and various other arcane things (for a non-C programmer to commit suicide with).
For each change I made there were several, sometimes many, exceptions to the output HTML. Each one of these exceptions would have called for a small change or improvement in the form builder, but thanks to the language there's no inheritance, only functions and structs, and instead of putting the hours in, the team wrote these exceptions frequently.
In my inexperience I was forced to change the output of each exception, rather than consolidate the changes in an improved form builder. But, trawling through 15,000 lines of code for several hours after ineffective changes would induce code-burn, and a fogginess that took a night's sleep to cure.
Always run your code through the DRY-er.
The easiest way to modify code is NOT to write code. If you are unsure, write pseudocode, not just for the algorithm but for how your code should be structured.
Designing while writing code never works...for me :-)
Here is my current experience: I'm working (in Java) with a kind of database schema that might often change (fields added/removed, data types modified). My strategy is to parse this schema and generate the code with Apache Velocity. The generated BaseClass is never modified by the programmer. Instead, a MyClass extends BaseClass is created, and the logical components of this class (e.g. toString()!) are implemented using the 'getters' and 'setters' of the superclass.
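A sketch of that pattern (field names are invented; in the real setup BaseClass is regenerated from the schema by the Velocity template, while MyClass is written once by hand):

    // Generated from the schema (e.g. by an Apache Velocity template) - never edited by hand.
    class BaseClass {
        private String name;
        private int quantity;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }

    // Hand-written logic lives in the subclass, expressed only through the generated getters/setters,
    // so a schema change just means regenerating BaseClass.
    class MyClass extends BaseClass {
        @Override
        public String toString() {
            return getName() + " x" + getQuantity();
        }
    }

    public class GeneratedCodeSketch {
        public static void main(String[] args) {
            MyClass item = new MyClass();
            item.setName("widget");
            item.setQuantity(3);
            System.out.println(item);   // prints "widget x3"
        }
    }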

Resources