Lucene 4.6 concurrent flushing

I have read http://blog.trifork.com/2011/04/01/gimme-all-resources-you-have-i-can-use-them/, which mentions concurrent flushing. However, when I looked into the API of versions 4.5.1 and 4.6.1, I found no such function, and I cannot find any sample code either. The class DocumentsWriterPerThread does not appear in 4.5.1-4.6.1.
Can anyone please provide some info on this issue? It would be great if some sample code were provided as well, to get me started.
thanks

DocumentsWriterPerThread certainly is out there in Lucene 4.5, though to the best of my knowledge, it's not really something most users would be expected to monkey with.
As for how to use concurrent flushing: you already are using it. The change went out with Lucene 4.0; see LUCENE-3023.
If you are not seeing the expected speed improvement (it is not clear what problem you are observing), keep in mind what Michael McCandless says in his article on the topic:
Remember this change only helps you if you have concurrent hardware, you use enough threads for indexing and there's no other bottleneck (for example, in the content source that provides the documents)
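To make this concrete, here is a minimal sketch of the kind of indexing setup that benefits from concurrent flushing. Note that it never touches DocumentsWriterPerThread directly; you just share one IndexWriter across several threads. The path, thread count and field name below are made-up placeholders:

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ConcurrentIndexing {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/tmp/index"));
            IndexWriterConfig cfg = new IndexWriterConfig(
                    Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
            final IndexWriter writer = new IndexWriter(dir, cfg);

            // IndexWriter is thread-safe. With DocumentsWriterPerThread (in since 4.0),
            // each indexing thread fills its own in-memory segment, so one thread's
            // flush no longer blocks the others.
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int i = 0; i < 8; i++) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            for (int d = 0; d < 100000; d++) {
                                Document doc = new Document();
                                doc.add(new TextField("body", "document text here", Field.Store.NO));
                                writer.addDocument(doc);
                            }
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            writer.close();
        }
    }

As the quote says, this only pays off if the machine has spare cores and whatever produces the documents can keep all the worker threads fed.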

Related

Pyperplan with action costs support or alternative wanted

I'm using pyperplan in my project; however, I'm very limited in my choice of planning domains, as pyperplan does not support PDDL v2.
Do you know of any pyperplan fork that has this functionality? Basic +1 action costs would be enough; they are my only problem right now.
Alternatively, do you know of any pyperplan-ish alternative that supports more modern versions of PDDL?
I'm planning to implement the functionality on my own. I ran through the code, and it shouldn't be THAT hard; the codebase looks pretty well set up for it.
I went through all the forks listed on GitHub, but none of them has this feature. I've also tried searching for such a fork myself, and for any related articles, but nothing of value turned up.
I will be really grateful for any tips!

Managing version conflicts between Kafka-Spark-Scala-Cassandra?

I have recently been working on a project that involves integrating Kafka, Spark and Cassandra. One of the key things I noticed when trying to get the whole thing set up is that there are a lot of version constraints that need to be matched very carefully in order to get these technologies to work together.
In addition, it was important to take note of the Scala version used with Spark when writing your own Spark jobs.
A slight change in the version of one of the above technologies breaks the complete flow and requires redoing the matching from scratch.
The task was not very straightforward (at least for me, and I guess it's the same for everyone), and I am wondering how companies that have these technologies working in sync actually manage this.
As I see it, keeping these tools working together without breakage, as new releases and bug fixes are rolled out, is an important problem.
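For concreteness, this is roughly the kind of pinned, matched dependency set I mean, expressed as a Maven POM fragment. The version numbers are illustrative assumptions only; the real work is checking each connector's compatibility matrix:

    <properties>
        <scala.binary.version>2.11</scala.binary.version>
        <spark.version>2.4.8</spark.version>
    </properties>
    <dependencies>
        <!-- All Spark modules must share one Spark version AND one Scala binary version -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- The Cassandra connector is released against specific Spark versions -->
        <dependency>
            <groupId>com.datastax.spark</groupId>
            <artifactId>spark-cassandra-connector_${scala.binary.version}</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>

The _2.11 suffix is the Scala binary version and has to agree across every artifact, which is exactly the Spark/Scala coupling I mentioned above.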
Can someone who has experience with regards to this enlighten me as to how companies actually manage/maintain these conflicts?
Or is it an overstatement to say it's an actual problem?
Thanks in advance

If you had one wish for SubSonic what would it be?

I know this question seems subjective, but it's really pretty simple. As a long-term user of, and part-time contributor to, SubSonic, I'm interested in what the community thinks would be the single best way to improve it.
So what's your opinion, how would you make SubSonic even better? What one thing would make you more likely to use/recommend/evangelise/stop complaining about it?
As I said, I know this is a bit subjective and may get closed, but as SO is the main support forum for SubSonic, I think this could be a useful way to solicit opinions and/or contributions.
To keep this from turning into a general discussion, here are the rules:
No omnibus wishes
No duplicate wishes
Up-vote those you agree with rather than re-posting them
Ability to run in MediumTrust out of the box
In all honesty, the biggest thing that's lacking is solid documentation and how-tos.
It's got better, but I think it needs a lot more.
Ability to automatically map collections of other objects, like Fluent NHibernate does.
When SubSonic throws an exception that isn't clear, I'd like to be able to use Google or some other mechanism to discover more information about how to keep my development effort moving forward. Right now it's too easy to get into a situation where you have to go spelunking into the SubSonic source code since SubSonic doesn't seem to be very proactive when the user goes off the "happy path".
This critique is hardly specific to SubSonic; many (most?) software products suffer from the same problem. I have not really had this problem with NHibernate, though, which is SubSonic's clearest competitor.
Faster and higher quality releases
Binary types for SimpleRepository (Images)
Left Outer Joins
Support more database-independent code generation...
What I mean by this is that it is truly a real pain if your application needs to talk to different databases (e.g. SQL Server and Oracle) and you want to have only one set of generated DAL objects. I would love the option of specifying that any SQL sent to the DB be as engine-neutral as possible, since right now, if you generate your objects targeting SQL Server, all queries take the form:
SELECT ... FROM [schema].[table_name] ...
Sadly, this does not work in Oracle, so basically you're out of luck there.
Perhaps this isn't a huge concern for most of you, but I'm currently writing a commercial app that touts, as one of its main features, the ability to run on various database engines just by changing its configuration. I chose SubSonic because I thought it could handle the job pretty easily, but I'm honestly having second thoughts now because of all the hoops I may have to jump through just to get this to work correctly in different environments.
Support MS Access, Postgres and Firebird databases :)

Fast, scalable hash lookup database? (Berkeley'ish)

I use and love Berkeley DB, but it seems to bog down once you get near a million or so entries, especially on inserts. I've tried memcachedb, which works, but it's not being maintained, so I'm worried about using it in production. Does anyone have any other similar solutions? Basically, I want to be able to do key lookups on a large (possibly distributed) dataset (40+ million entries).
Note: Anything NOT in Java is a bonus. :-) It seems most things today are going the Java route.
Have you tried Project Voldemort?
I would suggest you have a look at:
Metabrew key-value store blog post
It's a big list of key-value stores with a little bit of discussion of each. If you still have doubts, you could join the so-called NoSQL Google group and ask for help there.
Redis is insanely fast and actively developed. It is written in C (no Java) and compiles out of the box on POSIX OSes (no dependencies).
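If it helps, doing key lookups against Redis from Java is about as simple as it gets. Here is a minimal sketch assuming the Jedis client; the host, port and key are placeholders, and Redis itself stays pure C, this is only the client side:

    import redis.clients.jedis.Jedis;

    public class RedisLookup {
        public static void main(String[] args) {
            // Connect to a Redis server (swap in your real host/port).
            Jedis jedis = new Jedis("localhost", 6379);
            jedis.set("user:42", "some payload");
            String value = jedis.get("user:42");  // O(1) key lookup
            System.out.println(value);
            jedis.close();
        }
    }

The same lookup from the shell is just "GET user:42" in redis-cli, with no Java involved.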
Did you try the hash backend? That should be faster for inserts and key lookups.
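For reference, switching Berkeley DB to the hash access method is a one-line configuration change. A sketch using the Java binding (the same idea applies in C by passing DB_HASH to DB->open); the file name and key are made-up placeholders:

    import com.sleepycat.db.Database;
    import com.sleepycat.db.DatabaseConfig;
    import com.sleepycat.db.DatabaseEntry;
    import com.sleepycat.db.DatabaseType;
    import com.sleepycat.db.LockMode;
    import com.sleepycat.db.OperationStatus;

    public class HashLookup {
        public static void main(String[] args) throws Exception {
            DatabaseConfig cfg = new DatabaseConfig();
            cfg.setAllowCreate(true);
            cfg.setType(DatabaseType.HASH);  // hash access method instead of the default btree
            Database db = new Database("data.db", null, cfg);

            db.put(null, new DatabaseEntry("user:42".getBytes("UTF-8")),
                         new DatabaseEntry("some payload".getBytes("UTF-8")));

            DatabaseEntry out = new DatabaseEntry();
            if (db.get(null, new DatabaseEntry("user:42".getBytes("UTF-8")),
                       out, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                System.out.println(new String(out.getData(), "UTF-8"));
            }
            db.close();
        }
    }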

Does anyone know where decent documentation describing the Lucene index format IN DETAIL can be found on the web?

I am mainly curious as to the inner workings of the engine itself. I couldn't find anything about the index format itself (i.e., in detail, as though you were going to build your own compatible implementation) and how it works. I have poked through the code, but it's a little large to swallow for something that must be described somewhere, given that there are so many compatible ports to other languages around. Can anyone provide a decent link?
Have you seen this: http://lucene.apache.org/java/2_4_0/fileformats.html? It's the most detailed I've found.
Although Lucene in Action does stop short of the detail in that link, I found it a useful companion to keep a handle on the big picture concepts while understanding the nitty gritty.
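If you want to poke at a real index while reading that page, a small dump program goes a long way. Here is a minimal sketch against the 2.x-era API that the linked file-formats document describes; the index path is a placeholder:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    public class DumpTerms {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
            TermEnum terms = reader.terms();  // walks the term dictionary (the .tis/.tii files)
            while (terms.next()) {
                Term t = terms.term();
                System.out.println(t.field() + ":" + t.text()
                        + " docFreq=" + terms.docFreq());
            }
            terms.close();
            reader.close();
        }
    }

Lucene also ships org.apache.lucene.index.CheckIndex, whose main() can be run from the command line to print per-segment details.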
