I am looking for an efficient algorithm for searching a chess minimax game tree with alpha-beta pruning on a distributed architecture. The algorithms I have found (PVS, YBWC, DTS) are all quite old (1990 at the latest). I assume there have been substantial advancements since then. What is the current standard in this field?
Also, please point me to an idiot's explanation of DTS, as I can't understand it from the research papers I have read.
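For concreteness, here is a minimal serial negamax alpha-beta sketch (plain Python; TREE and SCORES are hypothetical toy stand-ins for a real move generator and evaluation function), annotated with where YBWC-style schemes parallelize:

    # Minimal serial negamax alpha-beta over a toy tree. TREE and SCORES
    # stand in for a move generator and an evaluation function; leaf
    # scores are from the perspective of the side to move.
    TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
    SCORES = {"a1": 3, "a2": 5, "b1": -6, "b2": 9}

    def alphabeta(node, alpha, beta):
        kids = TREE.get(node, [])
        if not kids:
            return SCORES[node]
        for i, child in enumerate(kids):
            # YBWC parallelizes exactly here: the first child (the
            # "eldest brother") is searched serially to establish a
            # bound; only then may the "younger brothers" (i > 0) be
            # searched in parallel.
            score = -alphabeta(child, -beta, -alpha)
            if score >= beta:
                return beta               # fail-hard beta cutoff
            alpha = max(alpha, score)
        return alpha

    print(alphabeta("root", float("-inf"), float("inf")))  # 3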
I am looking for good material on how to insert into and search in a B+ tree. I already googled, but the material I found was really poorly explained. Is there any other good source online? Any suggestions for data structure books that might be useful? Any video lectures from any university? UC Berkeley and MIT did not have any.
I would try walking through the B-tree slides found at the following page; I found them very helpful when learning the material. The slides are from the January 25th lecture.
http://www.cs.washington.edu/education/courses/cse332/12wi/calendar/lecturelist.html
For B-trees: Introduction to Algorithms (CLRS) or Algorithms (Sedgewick & Wayne).
For B+ trees: the books above introduce them only briefly, mostly by stating how the approaches differ. For better insight, you might have to go to DBMS books instead, such as Database Management Systems (Raghu Ramakrishnan, Johannes Gehrke) or Fundamentals of Database Systems (Elmasri & Navathe).
Most algorithms and complexities are similar for both, with slight differences, so understanding one aids understanding the other.
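To make that concrete, here is a minimal search sketch (plain Python; the node layout with sorted keys and len(children) == len(keys) + 1 is an assumption, CLRS-style). The same descent works for a B+ tree, except that a B+ tree keeps values only in the leaves and therefore always descends to a leaf:

    from bisect import bisect_left
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        keys: list                                    # sorted keys
        children: list = field(default_factory=list)  # empty => leaf

    def search(node, key):
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i        # found in this node (B-tree behavior)
        if not node.children:
            return None           # leaf reached, key absent
        return search(node.children[i], key)

    # Toy two-level tree: root keys [10, 20] with three children.
    root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25])])
    print(search(root, 15))       # found in the middle child
    print(search(root, 7))        # None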
I have been looking at MapReduce and reading through various papers about it and applications of it, but, to me, it seems that MapReduce is only suitable for a very narrow class of scenarios that ultimately result in word-counting.
If you look at the original paper, Google's employees provide "various" potential use cases, like "distributed grep", "distributed sort", "reverse web-link graph", "term-vector per host", etc.
But if you look closer, all those problems boil down to simply "counting words" - that is, counting the number of occurrences of something in a chunk of data, then aggregating/filtering and sorting that list of occurrences.
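To make the claim concrete, here is that pattern as a minimal sketch - plain Python simulating the two phases, not actual Hadoop code:

    from collections import defaultdict

    def map_phase(chunk):
        # Emit (word, 1) for every word in a chunk of input.
        for word in chunk.split():
            yield word, 1

    def reduce_phase(pairs):
        # Aggregate the counts per key: the "counting" step that most
        # published examples arguably reduce to.
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    chunks = ["a rose is a rose", "is a rose"]
    pairs = [kv for c in chunks for kv in map_phase(c)]
    print(reduce_phase(pairs))  # {'a': 3, 'rose': 3, 'is': 2}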
There also are some cases where MapReduce has been used for genetic algorithms or relational databases, but they don't use the "vanilla" MapReduce published by Google. Instead they introduce further steps along the Map-Reduce chain, like Map-Reduce-Merge, etc.
Do you know of any other (documented?) scenarios where "vanilla" MapReduce has been used to perform more than mere word-counting? (Maybe for ray-tracing, video-transcoding, cryptography, etc. - in short anything "computation heavy" that is parallelizable)
Atbrox has been maintaining a list of MapReduce/Hadoop algorithms in academic papers. Here is the link. All of these could be applied for practical purposes.
MapReduce is good for problems that can be considered embarrassingly parallel. There are a lot of problems that MapReduce is very bad at, such as those that require lots of all-to-all communication between nodes, e.g. fast Fourier transforms and signal correlation.
There are projects using MapReduce for parallel computations in statistics. For instance, Revolution Analytics has started an RHadoop project for use with R. Hadoop is also used in computational biology and in other fields with large datasets that can be analyzed by many discrete jobs.
I am the author of one of the packages in RHadoop, and I wrote several of the examples distributed with the source and used in the tutorial: logistic regression, linear least squares, matrix multiplication, etc. There is also a paper I would like to recommend, http://www.mendeley.com/research/sorting-searching-simulation-mapreduce-framework/,
which seems to strongly support the equivalence of MapReduce with classic parallel programming models such as PRAM and BSP. I often write MapReduce algorithms as ports of PRAM algorithms; see for instance blog.piccolboni.info/2011/04/map-reduce-algorithm-for-connected.html. So I think the scope of MapReduce is clearly more than "embarrassingly parallel", but not infinite. I have myself experienced some limitations, for instance in speeding up some MCMC simulations; of course it could have been me not using the right approach.
My rule of thumb is the following: if the problem can be solved in parallel in O(log(N)) time on O(N) processors, then it is a good candidate for MapReduce, with O(log(N)) jobs and constant time spent in each job. Other people, and the paper I mentioned, seem to focus more on the O(1)-jobs case. When you go beyond O(log(N)) time, the case for MR seems to get a little weaker, but some limitations may be inherent in the current implementation (high job overhead) rather than fundamental. It's a pretty fascinating time to be working on charting the MR territory.
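To illustrate that rule of thumb, here is a toy, pure-Python simulation (no Hadoop) of summing N values in O(log N) rounds, with each simulated reducer doing constant work per round:

    def mr_round(values):
        # Map: key each value by index // 2; reduce: sum each pair.
        groups = {}
        for i, v in enumerate(values):
            groups.setdefault(i // 2, []).append(v)
        return [sum(g) for g in groups.values()]

    values = list(range(16))
    rounds = 0
    while len(values) > 1:
        values = mr_round(values)
        rounds += 1
    print(values[0], "in", rounds, "rounds")  # 120 in 4 rounds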
I'm looking for a historical overview of computer graphics developments, a timeline of such things as
bump mapping
bloom
stencil buffer shadows
volumetric fog
subsurface scattering
radiosity
etc.; the more inclusive the better,
according to when they were invented and when they became practical for mainstream real-time use.
Hopefully this research would include an analysis of how much the end result has been improved by the invention of new techniques, versus better algorithms for old techniques, versus simply applying old techniques more extensively on ever-improving hardware.
Got any links for this sort of thing? Thanks.
Have you checked the Wikipedia article on Rendering? It has a list of influential articles on the subject, and their years of publication.
I want to extend an existing clustering algorithm to cope with very large data sets and have redesigned it in such a way that it is now computable with partitions of data, which opens the door to parallel processing. I have been looking at Hadoop and Pig and I figured that a good practical place to start was to compute basic stats on my data, i.e. arithmetic mean and variance.
I've been googling for a while, but maybe I'm not using the right keywords: I haven't really found a good primer for doing this sort of calculation, so I thought I would ask here.
Can anyone point me to some good examples of how to calculate mean and variance using Hadoop, and/or provide some sample code?
Thanks
Pig Latin has an associated library of reusable code called PiggyBank that has numerous handy functions. Unfortunately it didn't have variance last time I checked, but maybe that has changed. If nothing else, it might provide examples to get you started on your own implementation.
I should note that variance is difficult to implement in a numerically stable way over huge data sets, so take care!
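For the record, a common stable approach (due to Welford and to Chan et al.) keeps per-partition (count, mean, M2) triples and merges them pairwise. This is a plain-Python sketch of the idea, not Hadoop or Pig code:

    def partial_stats(xs):
        # Welford's online update over one partition of the data.
        n, mean, m2 = 0, 0.0, 0.0
        for x in xs:
            n += 1
            d = x - mean
            mean += d / n
            m2 += d * (x - mean)
        return n, mean, m2

    def merge(a, b):
        # Combine two (count, mean, M2) triples (Chan et al.); avoids
        # the catastrophic cancellation of the naive sum-of-squares.
        (na, ma, m2a), (nb, mb, m2b) = a, b
        n = na + nb
        delta = mb - ma
        mean = ma + delta * nb / n
        m2 = m2a + m2b + delta * delta * na * nb / n
        return n, mean, m2

    n, mean, m2 = merge(partial_stats([1, 2, 3]), partial_stats([4, 5]))
    print(mean, m2 / n)  # population mean 3.0, variance 2.0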
You might double-check whether your clustering code can drop into Cascading. It's quite trivial to add new functions, do joins, etc. with your existing Java libraries.
http://www.cascading.org/
And if you are into Clojure, you might watch these github projects:
http://github.com/clj-sys
They are layering new algorithms implemented in Clojure over Cascading (which in turn is layered over Hadoop MapReduce).
Does anyone know of a good reference for canonical CS problems?
I'm thinking of things like "the sorting problem", "the bin packing problem", "the traveling salesman problem" and so on.
edit: websites preferred
You can probably find the best in an algorithms textbook like Introduction to Algorithms. Though I've never read that particular book, it's quite renowned for being thorough and would probably contain most of the problems you're likely to encounter.
"Computers and Intractability: A guide to the theory of NP-Completeness" by Garey and Johnson is a great reference for this sort of thing, although the "solved" problems (in P) are obviously not given much attention in the book.
I'm not aware of any good on-line resources, but Karp's seminal paper Reducibility among Combinatorial Problems (1972) on reductions and complexity is probably the "canonical" reference for Hard Problems.
Have you looked at Wikipedia's Category:Computational problems and Category:NP-complete problems pages? They're probably not complete, but they look like good starting points. Wikipedia seems to do pretty well in CS topics.
I don't think you'll find the answers to all those problems in only one book. I've never seen any decent, comprehensive website on algorithms, so I'd recommend you stick to books. That said, you can always get some introductory material from the canonical algorithm texts; there are three I usually recommend: CLRS, Manber, and Aho, Hopcroft & Ullman (the last is a bit out of date in some key topics, but it's so formal and well-written that it's a must-read). All of them contain important combinatorial problems that are, in some sense, canonical problems in computer science.
After learning some fundamentals of graph theory you'll be able to move on to network flows and linear programming. These comprise a set of techniques that will ultimately solve most problems you'll encounter (linear programming with the variables restricted to integer values is NP-hard). Network flows deals with problems defined on graphs with weighted/capacitated edges, and has very interesting applications in fields that seemingly have no relationship to graph theory whatsoever. THE textbook on this is Ahuja, Magnanti and Orlin's. Linear programming is a kind of superset of network flows, and deals with optimizing a linear function of variables subject to a system of linear constraints. A book that emphasizes the relationship to network flows is Bazaraa's. From there you can move on to integer programming, a very valuable tool that provides natural techniques for modelling problems like bin packing, task scheduling, the knapsack problem, and so on. A good reference would be L. Wolsey's book.
You definitely want to look at NIST's Dictionary of Algorithms and Data Structures. It's got the traveling salesman problem, the Byzantine generals problem, the dining philosophers' problem, the knapsack problem (= your "bin packing problem", I think), the cutting stock problem, the eight queens problem, the knight's tour problem, the busy beaver problem, the halting problem, etc. etc.
It doesn't have the firing squad synchronization problem (I'm surprised about that omission) or the Jeep problem (more logistics than computer science).
Interestingly enough there's a blog on codinghorror.com which talks about some of these in puzzle form. (I can't remember whether I've read Smullyan's book cited in the blog, but he is a good compiler of puzzles & philosophical musings. Martin Gardner and Douglas Hofstadter and H.E. Dudeney are others.)
Also maybe check out the Stony Brook Algorithm Repository.
(Or look up "combinatorial problems" on Google, or search for "problem" on Wolfram MathWorld, or look at Hilbert's problems; but in all these links many of the entries are more pure mathematics than computer science.)
@rcreswick - those sound like good references, but they fall a bit shy of what I'm thinking of. (However, for all I know, they may be the best there is.)
I'm not going to mark anything as accepted, in the hope that people might find a better reference.
Meanwhile, I'm going to list a few problems here; feel free to add more.
The sorting problem: find an ordering of a collection that is monotonic with respect to a given comparison.
The bin packing problem: partition a set of items into the minimum number of subsets such that each subset stays within some size limit (see the first-fit sketch below).
The traveling salesman problem: find a Hamiltonian cycle of minimum total weight in a weighted graph.
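As an illustration of how such problem statements turn into code, here is a sketch of the classic first-fit heuristic for bin packing; it is only an approximation, since optimal bin packing is NP-hard:

    def first_fit(items, capacity):
        # Place each item into the first bin with room, opening a new
        # bin when none fits. A simple heuristic, not an optimal solver.
        bins = []
        for item in items:
            for b in bins:
                if sum(b) + item <= capacity:
                    b.append(item)
                    break
            else:
                bins.append([item])
        return bins

    print(first_fit([4, 8, 1, 4, 2, 1], capacity=10))
    # [[4, 1, 4, 1], [8, 2]]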