Developing a custom partitioner in Cassandra

I would like to develop a custom partitioning strategy in Cassandra for research purposes. I couldn't find a decent tutorial in that regard. As far as I understand, I am supposed to provide an implementation of the IPartitioner interface. I am not quite sure whether I know the expected semantics of all its methods, or whether I need to implement other classes too.
Is this process documented somewhere? Is there any forum or existing discussion that I can read through? Or an informative blog post?
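For orientation while better documentation is tracked down: the core job a partitioner performs is mapping a partition key (raw bytes) to a token with a total order, which determines where a row lands on the ring. The sketch below is a hypothetical, self-contained Java illustration of that idea only -- it is not the real org.apache.cassandra.dht.IPartitioner interface, whose exact methods (getToken, getMinimumToken, getTokenFactory, decorateKey, describeOwnership, ...) vary between Cassandra versions -- so the source of a built-in partitioner such as Murmur3Partitioner or ByteOrderedPartitioner is the best concrete template.

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;

    // Hypothetical sketch only: shows the essential key -> token mapping a partitioner
    // provides. It does NOT implement Cassandra's IPartitioner, whose method set is
    // version-specific; copy a built-in partitioner from the Cassandra source instead.
    public class ToyPartitionerSketch {

        /** A token is just a comparable position on the ring. */
        public static final class ToyToken implements Comparable<ToyToken> {
            final long value;
            ToyToken(long value) { this.value = value; }
            @Override public int compareTo(ToyToken other) { return Long.compare(value, other.value); }
            @Override public String toString() { return Long.toString(value); }
        }

        /** Core responsibility: deterministically map partition-key bytes to a token. */
        public ToyToken getToken(ByteBuffer key) {
            CRC32 crc = new CRC32();          // stand-in hash; Cassandra's default partitioner uses Murmur3
            crc.update(key.duplicate());
            return new ToyToken(crc.getValue());
        }

        /** Order-preserving partitioners answer true here; hash-based ones false. */
        public boolean preservesOrder() { return false; }

        public static void main(String[] args) {
            ToyPartitionerSketch p = new ToyPartitionerSketch();
            System.out.println(p.getToken(ByteBuffer.wrap("user:42".getBytes())));
        }
    }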

Related

Spring-Statemachine configuration

I am not sure this is the correct place to ask this question, but I could not find a forum link on the spring-statemachine project page (http://projects.spring.io/spring-statemachine/) for asking the developers questions, so I hope this is the correct way to do it.
A while ago I wrote a blog post,
Extremely Ajaxified Web Application
about extremely asynchronous web applications, how to use Spring Webflow for them, and the limitations of Spring Webflow for this sort of project.
In my proposed solution I made extensive use of state machine principles. At the time I wrote that post there wasn't an out-of-the-box state machine I could use, so I implemented my own version.
Now, to my surprise, I just saw the Spring Statemachine project, and I am considering converting my sample application to use the Spring Statemachine framework.
I have two questions to ask before really starting to invest effort in the subject. I checked the Spring Statemachine samples, and it seems that the configuration of the state machine is done via Java code.
Is this the only possible method? In my sample application I read a UML model (XMI) and, via Eclipse M2T, generate a Spring configuration file to start up the application.
For practical purposes I find it too complex for the end user to configure a complex state machine via Java code. For this reason I used a graphical user interface to create a UML model of the state machine and converted it to a Spring application context, so it is easier for the end user to understand.
You can see this here.
UML Model (unfortunately Stack Overflow is not letting me post extra link so please use "#sm_model" at the end of the previous link)
Eclipse M2T UML Model conversion (see above "#m2t")
So is it possible to provide an XML file and configure Spring Statemachine with it? I can naturally create Java classes via M2T, but I have a feeling the end result will be nearly unreadable for the end user in complex projects.
The second question I would like to ask: can Spring Statemachine support "nested state machines"? I found a hint about this possibility on the project web site, but in the existing sample projects I could not find a concrete implementation.
The biggest hurdle for using a state machine in a practical web application is "state explosion", and the best way to deal with it is nested state machines.
Does Spring Statemachine support this concept?
You can find more details about what I mean by "state explosion" and "nested state machines" in the following links.
State Explosion (see above "#stateexplosion")
Nested Statemachines (see above "#nested_sm")
I hope this was the correct place to ask these questions.
Thx for the answers
Sorry for the late answer; I just noticed this tagged question.
We don't yet have any functionality to define machine config outside of the annotation-based config model (JavaConfig). XML config is on our roadmap and you can track its status in ticket https://github.com/spring-projects/spring-statemachine/issues/78.
For your second question: yes, we support nested states and even orthogonal regions. The showcase example demonstrates relatively complex scenarios of how nested states and transitions between them can be used: http://docs.spring.io/spring-statemachine/docs/1.0.0.RC1/reference/htmlsingle/#statemachine-examples-showcase.
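To make the annotation-based (JavaConfig) model and the nested-states support mentioned above concrete, here is a minimal, hedged sketch written against the 1.x-era API the answer links to (package and method names may differ in later releases, and the States/Events enums are invented for illustration):

    import org.springframework.context.annotation.Configuration;
    import org.springframework.statemachine.config.EnableStateMachine;
    import org.springframework.statemachine.config.EnumStateMachineConfigurerAdapter;
    import org.springframework.statemachine.config.builders.StateMachineStateConfigurer;
    import org.springframework.statemachine.config.builders.StateMachineTransitionConfigurer;

    enum States { PARENT, CHILD1, CHILD2, DONE }
    enum Events { NEXT, FINISH }

    @Configuration
    @EnableStateMachine
    public class NestedStateMachineConfig extends EnumStateMachineConfigurerAdapter<States, Events> {

        @Override
        public void configure(StateMachineStateConfigurer<States, Events> states) throws Exception {
            states
                .withStates()
                    .initial(States.PARENT)
                    .state(States.DONE)
                    .and()
                .withStates()
                    .parent(States.PARENT)        // CHILD1/CHILD2 are substates nested inside PARENT
                    .initial(States.CHILD1)
                    .state(States.CHILD2);
        }

        @Override
        public void configure(StateMachineTransitionConfigurer<States, Events> transitions) throws Exception {
            transitions
                .withExternal().source(States.CHILD1).target(States.CHILD2).event(Events.NEXT)
                    .and()
                .withExternal().source(States.PARENT).target(States.DONE).event(Events.FINISH);
        }
    }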

Providing documentation with Node/JS REST APIs

I'm looking to build a REST API using Node and Express and I'd like to provide documentation with it. I don't want to craft this by hand, and it appears that there are solutions available in the form of Swagger, RAML and API Blueprint/Apiary.
What I'd really like is to have the documentation auto-generated from the API code, as is possible in .NET land with Swashbuckle or the Microsoft-provided solution, but those are made possible by strong typing and reflection.
For the JS world it seems like the correct option is to use the Swagger/RAML/API Blueprint markup to define the API and then generate the documentation and scaffold the server from that. The former seems straightforward but I'm less sure about the latter. What I've seen of the server code generation for all of these options seems very limited. There needs to be some way to separate the auto-generated code from the manual code so that the definition can be updated easily, and I've seen no sign of or discussion about that. It doesn't seem like an insurmountable problem (I'm much more familiar with .NET than JS so I could easily be missing something), and there is mention of this issue and solutions being worked on in a previous Stack Overflow question from over a year ago.
Can anyone tell me if I'm missing/misunderstanding anything and if any solution for the above problem exists?
The initial version of swagger-node-express did just this--you would define some metadata for the routes, models, etc., and the documentation would be auto-generated from it. Given how dynamic javascript is, this became a bit cumbersome for many to use, as it required you to keep the metadata up to date against the models in a somewhat decoupled manner.
Fast forward, and the latest swagger-node project takes an alternative approach which can be considered in line with "generating documentation from code" in a sense. This project (and swagger-inflector for Java, and connexion for Python) takes the approach that the swagger specification is the DSL for the API, and the routing logic is handled by what is defined in the swagger document. From there, you simply implement the controllers.
If you treat the swagger specification "like code" then this is a very efficient way to go--the documentation can literally never be out of date, since it is used to construct all routes, validate all input variables, and connect the API to your business layer.
While true code generation, such as what is available from the swagger-codegen project, can be extremely effective, it does require some clever integration with your code after you initially construct the server. That consideration is completely removed from the workflow with the three projects above.
I hope this is helpful!
My experience with APIs and dynamic languages is that the emphasis is on verification instead of code generation.
For example, if using a compiled language, I generate artifacts from the API spec and use those to enforce correctness. Round-tripping is supported via the generation of interfaces instead of concrete classes.
With a dynamic language, the spec is used at test time to guarantee both that the entire defined API is covered by tests and that the responses conform to the spec (I tend not to validate requests because of Postel's law, but that is possible too).
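To illustrate the compiled-language half of this (the "interfaces instead of concrete classes" round-trip), here is a hypothetical Java sketch; the PetApi/Pet names are made up, and the premise is a generator that emits interface-only server stubs (swagger-codegen has options along these lines), so that regenerating from a changed spec turns any drift into a compile error rather than a runtime surprise:

    // --- regenerated from the API spec on every change (do not edit by hand) ---
    interface PetApi {
        Pet getPetById(long petId);
    }

    // --- assumed generated model class ---
    class Pet {
        final long id;
        final String name;
        Pet(long id, String name) { this.id = id; this.name = name; }
    }

    // --- hand-written implementation; the compiler flags drift after regeneration ---
    class PetApiImpl implements PetApi {
        @Override
        public Pet getPetById(long petId) {
            return new Pet(petId, "example");  // real data access would go here
        }
    }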

datastax driver vs spring-data-cassandra

Hey, I am new to Cassandra and I am familiar with Spring's JdbcTemplate.
Can anyone please explain the difference between the two? Also, can you suggest which one is better to use?
Thanks.
spring-data-cassandra uses DataStax's java-driver, so the decision to be made is really whether or not you need the functionality of spring-data.
Some features from spring-data that may be useful for you (documented here):
Spring XML configuration for configuring your Cluster instance (especially useful if you are already using Spring).
An object mapping component.
The java-driver also has a mapping component of its own that is worth exploring.
In my opinion if you are already using spring, it is worth looking into spring-data-cassandra. Otherwise, it would be good to start off with just the datastax java-driver.
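For a feel of what the plain java-driver looks like on its own, here is a minimal, hedged sketch against the 2.x/3.x-era API (the contact point, keyspace and table are invented); spring-data-cassandra layers its templates, repositories and object mapping on top of exactly this kind of Session:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class PlainDriverExample {
        public static void main(String[] args) {
            // Cluster and Session are Closeable, so try-with-resources cleans them up.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")        // assumed local node
                    .build();
                 Session session = cluster.connect("demo_keyspace")) {

                ResultSet rs = session.execute("SELECT id, name FROM users WHERE id = 42");
                Row row = rs.one();
                if (row != null) {
                    System.out.println(row.getLong("id") + " " + row.getString("name"));
                }
            }
        }
    }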

Cassandra vs Riak [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am looking for an eventually consistent data store and it looks like it may be coming down to Riak or Cassandra. Does anyone have experience with, or a view on, these?
As you probably know, they are both architecturally strongly influenced by Dynamo (eventually consistent, no single points of failure, etc). Both also go beyond Dynamo in providing a "richer than pure K/V" data model -- in Cassandra's case, providing a Bigtable-like ColumnFamily model; in Riak's, a document-oriented one. I have seen sane people choose both.
I believe points that favor Cassandra include
speed
support for clusters spanning multiple data centers
big names using it (digg, twitter, facebook, webex, ... -- http://n2.nabble.com/Cassandra-users-survey-tp4040068p4040393.html)
Points that favor Riak include
map/reduce support out of the box
/Cassandra dev, fwiw
Riak is used by
Mozilla Foundation
Ask.com sponsored listings
Comcast
Citigroup
Bet365
I think they both pass the test of credible reference customers/users.
Cassandra seems more mature, and is currently doing better in benchmarks. Riak seems easier to add a node to as your cluster grows.
For completeness: A good (probably biased) comparison between the two can be found at http://docs.basho.com/riak/1.3.2/references/appendices/comparisons/Riak-Compared-to-Cassandra/
Use and download are different. Best to get references.
Perhaps a private conversation could be had where Riak references at these companies could be shared? I am not sure how to get that with Cassandra, but there is a community of companies that support Cassandra, which seems like a good place to start. As these probably have community participants in Cassandra development, it may be a really reasonable place to start.
I would like to hear about recent and large Riak deployments where customers are happy.
I would also like to see the roadmap for each product. Cassandra is a bit easier to track (http://wiki.apache.org/cassandra/) than Riak in my view, as Cassandra's wiki discusses limitations and things that are probably going to change going forward, but neither outlines futures well. I could understand that of an open source community ... perhaps ... but I cannot for a product for which I must pay.
I would also suggest researching Cloudant, which has what appears to be a very nice layering of capabilities. It also looks like it is bringing to bear capabilities from elsewhere in Apache land. CouchDB is the Apache platform on which Cloudant is based. BUT the indexing with Lucene seems to be but the tip of the iceberg when it comes to where Cloudant could go. Creating and managing an index is a very systematic process, a kind of data pipeline, that could be scripted using other Apache community assets. AND capabilities like NLP could also be added through Lucene indirectly, or maybe directly into what is persisted.
It would be nice to see a proposed Cloudant roadmap, especially since the team could mine the riches of the Apache community and integrate them into Cloudant. Such a roadmap probably exists, as there is an operational component to the Cloudant revenue model that will require it, if for no other reason.
Another area of interest ... Cloudant's pricing model ... it is clear their revenue model is not based on software, but on service. That is quite attractive, and it seems consistent with the ecosystem surrounding Cassandra too. I don't know if the Basho folks have won over enough of the nosql community as yet ... I don't see any such buzz around their web site or product.
I like this Cloudant web page (https://cloudant.com/the-data-layer/). I was surprised to see the embedded Erlang capability ... I did not know CouchDB was written in Erlang, as this seems unusual to me in the Apache community (my ignorance); CouchDB appears to be older than other nosql products I know (now) to be written in Erlang. Whatever their strategy, they at least count Amazon EC2 and Microsoft Azure as hosting partners, indicating an appreciation of both Microsoft and non-Microsoft worlds - all very important if one properly recognizes the middleware value potential (beyond cache or hash table applications) that these types of data stores could have.
Finally, while I don't know the board well, Andy Palmer's guidance looks like it will be valuable. He can bring some guidance vis-a-vis structured data (through VoltDB) to a world that rightly or wrongly may be unfairly branded as KVP hash tables of unstructured data. The need for structure and ecosystem surrounding nosql "databases" is being recognized ... witness Google's efforts with Spanner ... KVP/little structure/need for search-ability motivated Google's investment in the Spanner space. While we all may not need something like Spanner, we probably do need an improving and robust "enterprise" management and interoperability capability in these nosql databases to make it reasonable to incorporate them into modern cloud architectures. The needed structure can come from ease of interoperability and functional richness. It can also come from new capabilities that support conversion of unstructured data to structured data (e.g. indexes, use of NLP to create structured and parsed renderings of things inside of a KVP blob, and plenty of other things that, if put into a roadmap and published, could entice and grow a user base). Cloudant looks like it has a good chance of success ... I will take a closer look at it ...
And look what I found about CouchDB ...
CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that makes web app development a breeze. It even comes with an easy to use web administration console. You guessed it, served up directly out of CouchDB! We care a lot about distributed scaling. CouchDB is highly available and partition tolerant, but is also eventually consistent. And we care a lot about your data. CouchDB has a fault-tolerant storage engine that puts the safety of your data first.

Using java classes in Grails

I have a Java/Spring/Hibernate application, complete with domain classes which are basically Hibernate POJOs.
There is a piece of functionality that I think can be written well in Grails.
I wish to reuse the domain classes that I have created in the main Java app.
What is the best way to do so?
Should I write new domain classes extending the Java classes? That sounds tacky.
Or can I 'generate' controllers off the Java domain classes?
What are the best practices around reusing Java domain objects in Grails/Groovy?
I am sure there must be others writing some pieces in Grails/Groovy.
If you know about a tutorial which talks about such an integration, that would be awesome!
PS: I am quite a newbie in Grails/Groovy so I may be missing the obvious. Thanks!
Knowing just how well Groovy and Grails excel at integrating with existing Java code, I think I might be a bit more optimistic than Michael about your options.
First thing is that you're already using Spring and Hibernate, and since your domain classes are already POJOs they should be easy to integrate with. Any Spring beans you might have can be specified in an XML file as usual (in grails-app/conf/spring/resources.xml) or much more simply using the Spring bean builder feature of Grails. They can then be accessed by name in any controller, view, service, etc. and worked with as usual.
Here are the options, as I see them, for integrating your domain classes and database schema:
Bypass GORM and load/save your domain objects exactly as you're already doing.
Grails doesn't force you to use GORM, so this should be quite straightforward: create a .jar of your Java code (if you haven't already) and drop it into the Grails app's lib directory. If your Java project is Mavenized, it's even easier: Grails 1.1 works with Maven, so you can create a pom.xml for your Grails app and add your Java project as a dependency as you would in any other (Java) project.
Either way you'll be able to import your classes (and any supporting classes) and proceed as usual. Because of Groovy's tight integration with Java, you'll be able to create objects, load them from the database, modify them, save them, validate them etc. exactly as you would in your Java project. You won't get all the conveniences of GORM this way, but you would have the advantage of working with your objects in a way that already makes sense to you (except maybe with a bit less code thanks to Groovy). You could always try this option first to get something working, then consider one of the other options later if it seems to make sense at that time.
One tip if you do try this option: abstract the actual persistence code into a Grails service (StorageService perhaps) and have your controllers call methods on it rather than handling persistence directly. This way you could replace that service with something else down the road if needed, and as long as you maintain the same interface your controllers won't be affected.
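As a hypothetical sketch of that tip (all class names invented; Customer and CustomerDao stand in for your existing Hibernate POJO and Spring/Hibernate DAO), the facade can even live on the Java side inside the existing jar, with Grails controllers -- or a thin Grails StorageService delegating to it -- depending only on the interface:

    class Customer {                         // stand-in for an existing Hibernate POJO
        Long id;
        String name;
    }

    interface CustomerDao {                  // stand-in for an existing Spring/Hibernate DAO
        Customer get(Long id);
        void saveOrUpdate(Customer c);
    }

    interface CustomerStorage {              // the only thing controllers should depend on
        Customer findById(Long id);
        void save(Customer c);
    }

    class HibernateCustomerStorage implements CustomerStorage {
        private final CustomerDao dao;
        HibernateCustomerStorage(CustomerDao dao) { this.dao = dao; }
        @Override public Customer findById(Long id) { return dao.get(id); }
        @Override public void save(Customer c) { dao.saveOrUpdate(c); }
    }

Swapping Hibernate for something else later would then only mean providing another CustomerStorage implementation.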
Create new Grails domain classes as subclasses of your existing Java classes.
This could be pretty straightforward if your classes are already written as proper beans, i.e. with getter/setter methods for all their properties. Grails will see these inherited properties as it would if they were written in the simpler Groovy style. You'll be able to specify how to validate each property, using either simple validation checks (not null, not blank, etc.) or with closures that do more complicated things, perhaps calling existing methods in their POJO superclasses.
You'll almost certainly need to tweak the mappings via the GORM mapping DSL to fit the realities of your existing database schema. Relationships would be where it might get tricky. For example, you might have some other solution where GORM expects a join table, though there may even be a way to work around differences such as these. I'd suggest learning as much as you can about GORM and its mapping DSL and then experiment with a few of your classes to see if this is a viable option.
Have Grails use your existing POJOs and Hibernate mappings directly.
I haven't tried this myself, but according to Grails's Hibernate Integration page this is supposed to be possible: "Grails also allows you to write your domain model in Java or re-use an existing domain model that has been mapped using Hibernate. All you have to do is place the necessary 'hibernate.cfg.xml' file and corresponding mappings files in the '%PROJECT_HOME%/grails-app/conf/hibernate' directory. You will still be able to call all of the dynamic persistent and query methods allowed in GORM!"
Googling "gorm legacy" turns up a number of helpful discussions and examples, for example this blog post by Glen Smith (co-author of the soon-to-be-released Grails in Action) where he shows a Hibernate mapping file used to integrate with "the legacy DB from Hell". Grails in Action has a chapter titled "Advanced GORM Kungfu" which promises a detailed discussion of this topic. I have a pre-release PDF of the book, and while I haven't gotten to that chapter yet, what I've read so far is very good, and the book covers many topics that aren't adequately discussed in other Grails books.
Sorry I can't offer any personal experience on this last option, but it does sound doable (and quite promising). Whichever option you choose, let us know how it turns out!
Do you really want/need to use Grails rather than just Groovy?
Grails really isn't something you can use to add a part to an existing web app. The whole "convention over configuration" approach means that you pretty much have to play by Grails' rules, otherwise there is no point in using it. And one of those rules is that domain objects are Groovy classes that are heavily "enhanced" by the Grails runtime.
It might be possible to have them extend existing Java classes, but I wouldn't bet on it - and all the Spring and Hibernate parts of your existing app would have to be discarded, or at least you'd have to spend a lot of effort to make them work in Grails. You'll be fighting the framework rather than profiting from it.
IMO you have two options:
Rewrite your app from scratch in Grails while reusing as much of the existing code as possible.
Keep your app as it is and add new stuff in Groovy, without using Grails.
The latter is probably better in your situation. Grails is meant to create new web apps very quickly, that's where it shines. Adding stuff to an existing app just isn't what it was made for.
EDIT:
Concerning the clarification in the comments: if you're planning to write basically a data entry/maintenance frontend for data used by another app and have the DB as the only communication channel between them, that might actually work quite well with Grails; it can certainly be configured to use an existing DB schema rather than creating its own from the domain classes (though the latter is less work).
This post provides some suggestions for using Grails to wrap existing Java classes in a web framework.
