CouchDB simple find

I have a CouchDB database, and I want to find one item, as in MongoDB, with something like db.find({user : "John"}).
What is the easiest way to do it?

If you have your queries predefined, you can use views to query your database (a small example is sketched below).
There is also the ability to use temporary views for ad-hoc searches, but they are not recommended for production use because the index is not saved.
If you need something more along the lines of full-text search, check out couchdb-lucene.
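For illustration only (not from the original answer), a permanent view keyed on a user field could be defined in a design document like this; the names _design/users and by_name are placeholders:

{
  "_id": "_design/users",
  "views": {
    "by_name": {
      "map": "function (doc) { if (doc.user) { emit(doc.user, null); } }"
    }
  }
}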

What is your programming language of choice?
CouchDB's API is HTTP based.
Basically, you could set up a view that uses the username as its key and query it via an HTTP request (sketched below), or with the help of a "driver" for your specific language.
Views are defined as map/reduce functions; an easy introduction can be found on the official wiki, for example.
The CouchDB Guide is also a good place to start.
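For example, assuming a view by_name in a design document _design/users whose map function emits doc.user as the key (the database and view names are placeholders, not from the answer), the HTTP query for one user could be:

# key= matches the emitted key exactly; include_docs=true returns the full documents
curl 'http://localhost:5984/mydb/_design/users/_view/by_name?key="John"&include_docs=true'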

I prefer using Elasticsearch.
It has a CouchDB _river for integration: it listens to CouchDB's _changes feed, then fetches and indexes documents.
That way you get the awesome power of Elasticsearch (powered by Lucene), with its RESTful interface and clustering ability.
You get a good separation of "searching" vs. your core documents, which means you can index and search across different document stores.
Admittedly you don't get a nice, small, all-in-one package, but for the flexibility my use cases need, it wins hands down.
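For reference, registering the CouchDB river looks roughly like this, assuming the elasticsearch-river-couchdb plugin is installed; host, port, and the my_db names are placeholders:

curl -XPUT 'http://localhost:9200/_river/my_db/_meta' -d '{
  "type" : "couchdb",
  "couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "db"   : "my_db"
  },
  "index" : {
    "index" : "my_db",
    "type"  : "my_db"
  }
}'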

I have a new project to do this: http://github.com/iriscouch/query_couchdb
(Hopefully I can add an intro and documentation today.)
The idea is to copy the Google App Engine Python API.
new Query("User")
  .filter("name =", "John")
  .order('-age')
  .get(function(er, view) {
    if(er)
      throw(er);
    console.log("Got " + view.rows.length + " rows!");
    for(var a = 0; a < view.rows.length; a++) {
      var row = view.rows[a];
      console.log("Row " + a + " = " + JSON.stringify(row));
    }
  });
Unfortunately it is missing unit tests and examples, but I am already using this in production.

There is an initiative to implement a Mongo-like find with the query syntax offered by MongoDB. Cloudant announced the initiative and started contributing through Mango, a MongoDB-inspired query language interface for Apache CouchDB.
The Cloudant project should allow queries like find({user : "John"}), find({user : {$in : ["Doe", "Smith"]}}), or find({"age" : {"$gt" : 21}}) for age > 21.
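With Mango (the _find endpoint in CouchDB 2.x), the first of those queries could be sent roughly like this; the database name mydb is a placeholder:

curl -X POST 'http://localhost:5984/mydb/_find' \
  -H 'Content-Type: application/json' \
  -d '{ "selector" : { "user" : "John" } }'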
A similar alternative, pouchdb-find, is also being developed for PouchDB.
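A minimal pouchdb-find sketch, assuming the pouchdb and pouchdb-find packages and a database named mydb (all placeholders):

var PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-find'));

var db = new PouchDB('mydb');

// Build an index on the user field, then query it Mongo-style.
db.createIndex({ index: { fields: ['user'] } })
  .then(function () {
    return db.find({ selector: { user: 'John' } });
  })
  .then(function (result) {
    console.log(result.docs);
  });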

Related

Sitecore 8.1 : Steps for converting the Lucene Search to Solr

We just upgraded our 7.2 installation to 8.1, which uses the Lucene search provider. The website relies heavily on Lucene for search and for indexing the articles so that they can be displayed as a list.
We already have a SOLR instance set up, and we need to get the Lucene setup converted to SOLR. I would appreciate direction on the following:
How do we convert the custom computed Lucene indexes and fields to Solr?
Apart from configuration of the cores and endpoints, are there any code changes etc. that we need to be careful of?
How does the index rebuild event work with SOLR? Do the CDs all try to build at once, in sequence, or does only one trigger the build?
UPDATE:
I switched to SOLR. I can rebuild all the cores, and web_index shows 11K documents. However, the page doesn't return any results. Below is the code snippet; I'd appreciate help on what I'm doing wrong. This was working fine with Lucene:
public IEnumerable<Article> GetArticles(Sitecore.Data.ID categoryId)
{
    List<Article> articles = null;
    var home = _sitecoreService.GetItem<Sitecore.Data.Items.Item>(System.Guid.Parse(ItemIds.PageIds.Home));
    var index = ContentSearchManager.GetIndex(new SitecoreIndexableItem(home));
    using (var context = index.CreateSearchContext(SearchSecurityOptions.DisableSecurityCheck))
    {
        var query = context.GetQueryable<ArticleSearchResultItem>().Filter(item => item.Category == categoryId);
        var results = query.GetResults();
        articles = new List<Article>();
        foreach (var hit in results.Hits)
        {
            var article = _sitecoreService.GetItem<Article>(new Sitecore.Data.ID(hit.Document.Id).ToGuid());
            if (article != null)
            {
                if (article.ArticlePage != null && !article.ArticlePage.HideInNavigation)
                {
                    articles.Add(article);
                }
            }
        }
    }
    return articles;
}
The actual code for the computed field would probably not change. You would need to test that to make sure, but because Sitecore abstracts away the Lucene and SOLR code, as long as you are just making use of the Sitecore API it should work.
You will need to change the config. In the Lucene index you add the computed fields in the defaultLuceneIndexConfiguration section; this will need to change to the defaultSolrIndexConfiguration section.
Again, as long as you are making use of the Sitecore API exclusively and not using Lucene.net or Solr.net directly, most code should work fine. Some gotchas that I have found:
Lucene is not case sensitive; SOLR is. So some queries that worked fine on Lucene may not anymore because of case sensitivity.
Be careful of queries that do not set a .Take() limit on them. Sitecore does have a default value for the max rows returned for a query, but on SOLR that can have a much bigger impact on query time than it does for Lucene because of the network round trips.
Another thing to think about with SOLR is the number of searches that take place. With Lucene, there is little impact in making many small calls to the index, as its local and on disk so very fast. With SOLR, those calls turn into Network traffic, so a lot of micro calls to the index can have a big performance impact.
As mentioned by mikaelnet: SOLR uses dynamic fields in the index. So each field has a suffix based on the field type. This shouldn't be a problem in most cases. The Sitecore API will automatically append the suffix to any IndexField attributes you have. But on occasion, it can get that mapping wrong and you may have to code around that.
The index rebuild is set by your configuration. There are a few index update strategies that you can set:
manual: The index is only updated manually.
sync: The index is updated when items are modified, created or deleted. This should be the default for the master index on the content authoring server.
onPublishEndAsync: This updates the index after a publish job has been completed.
In a multi-server setup, for example one content authoring server and two content delivery servers, you should set up the content authoring server or a dedicated indexing server to perform the index updates. The delivery servers should have the update strategies set to manual for all indexes. This stops the indexes being built multiple times by each server.
There are some good articles out there about setting up SOLR with Sitecore. For reference:
* http://www.sequence.co.uk/blog/sitecore-8-and-solr/
That should give you an idea of the differences.

How to do "Not Equals" in couchdb?

Folks, I was wondering what the best way is to model documents and/or map functions that allow "Not Equals" queries.
For example, my documents are:
1. { name : 'George' }
2. { name : 'Carlin' }
I want to trigger a query that returns every document where name is not equal to 'John'.
Note: I don't have all possible names beforehand, so the parameter in the query can be any random text, like 'John' in my example.
In short: there is no easy solution.
You have four options:
sending a multi range query
filter the view response with a server-side list function
using a CouchDB plugin
use the mango query language
sending a multi range query
You can request the view with two ranges defined by startkey and endkey. You have to choose the ranges so that the key John is not requested.
Unfortunately, to get both ranges in a single request you have to find the commit that exists somewhere and compile your CouchDB with it; it's not included in the official source. Sending the two ranges as separate requests works out of the box, as sketched below.
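As a sketch, the two ranges sent as ordinary requests; the database and view names are assumptions:

# Range 1: every key strictly below "John"
curl 'http://localhost:5984/mydb/_design/people/_view/by_name?endkey="John"&inclusive_end=false'

# Range 2: every key strictly above "John" ("John\u0000" is the next
# possible string, so longer names like "Johnson" are still included)
curl 'http://localhost:5984/mydb/_design/people/_view/by_name?startkey="John\u0000"'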
filter the view response with a server-side list function
It's not recommended, but you can use a list function and drop the rows with the key John from your response, much like you would filter a JavaScript array. A minimal sketch follows.
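Here the exclude query parameter and the design document path are assumptions for illustration:

// Called as /mydb/_design/people/_list/not_equals/by_name?exclude=John
function (head, req) {
  var row, first = true;
  start({ headers: { 'Content-Type': 'application/json' } });
  send('{"rows":[');
  while ((row = getRow())) {
    // Drop every row whose key equals the excluded value.
    if (row.key !== req.query.exclude) {
      send((first ? '' : ',') + JSON.stringify(row));
      first = false;
    }
  }
  send(']}');
}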
using a CouchDB plugin
Create an additional index with e.g. couchdb-lucene. The lucene server has such query capabilities.
use the "mango" query language
It's included in the CouchDB 2.0 developer preview. It's not ready for production yet, but it will definitely be included in the stable release.
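With Mango, "Not Equals" becomes a one-line selector; a sketch against a hypothetical mydb:

curl -X POST 'http://localhost:5984/mydb/_find' \
  -H 'Content-Type: application/json' \
  -d '{ "selector" : { "name" : { "$ne" : "John" } } }'

Note that a $ne selector generally cannot be satisfied from an index alone, so it is applied as a filter over candidate rows.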

Retrieve analyzed tokens from ElasticSearch documents

Trying to access the analyzed/tokenized text in my ElasticSearch documents.
I know you can use the Analyze API to analyze arbitrary text according to your analysis modules, so I could copy and paste data from my documents into the Analyze API to see how it was tokenized.
This seems unnecessarily time consuming, though. Is there any way to instruct ElasticSearch to return the tokenized text in search results? I've looked through the docs and haven't found anything.
This question is a little old, but I think an additional answer is necessary.
With ElasticSearch 1.0.0 the Term Vector API was added, which gives you direct access to the tokens ElasticSearch stores under the hood on a per-document basis. The API docs are not very clear on this (it's only mentioned in the example), but in order to use the API you first have to indicate in your mapping definition that you want to store term vectors, with the term_vector property on each field.
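As a sketch for ElasticSearch 1.x; the index name tweets, type tweet, and field text are placeholders:

# 1) Enable term vectors on the field in the mapping.
curl -XPUT 'http://localhost:9200/tweets' -d '{
  "mappings" : {
    "tweet" : {
      "properties" : {
        "text" : { "type" : "string", "term_vector" : "with_positions_offsets" }
      }
    }
  }
}'

# 2) Fetch the stored tokens (with positions and offsets) for document 1.
curl 'http://localhost:9200/tweets/tweet/1/_termvector?fields=text'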
Have a look at this other answer: elasticsearch - Return the tokens of a field. Unfortunately it requires reanalyzing the content of your field on the fly, using the script provided.
It should be possible to write a plugin to expose this feature. The idea would be to add two endpoints:
one to read the Lucene TermsEnum, like the Solr TermsComponent does, which is useful for making auto-suggestions too. Note that it wouldn't be per document, just every term in the index with its term frequency and document frequency (potentially expensive with a lot of unique terms)
one to read the term vectors, if enabled, like the Solr TermVectorComponent does. This would be per document, but it requires storing the term vectors (you can configure it in your mapping) and also allows retrieving positions and offsets if enabled
You may want to use scripting; however, your server needs to have scripting enabled.
curl 'http://localhost:9200/your_index/your_type/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "script_fields" : {
    "terms" : {
      "script" : "doc[field].values",
      "params" : {
        "field" : "field_x.field_y"
      }
    }
  }
}'
The default setting for allowing scripts depends on the Elasticsearch version, so please check the official documentation.

Query and/or Search for SharePoint Document ID

We have a SharePoint 2010 environment with Document IDs enabled.
Given (part of) a Doc ID, we want to programmatically retrieve the document(s) matching that ID. The problem seems to be that this column is rather special, in that it might need special handling.
Using an SPSiteDataQuery, fetching the _dlc_DocId field as part of the ViewFields works fine. However, including it as part of the where clause never results in any documents being fetched.
Using the Search API has gotten us nowhere at all.
Has anyone pulled this off, or any suggestions on how to tackle this problem?
[Update] Turns out we were fooled by subtle errors in the XML and bad debugging misinterpretations. This stuff just works fine.
I don't normally contribute to these sorts of things because cleverer people than I always get there before me, but as this is an old one with no proper answer I think I'll add my thoughts for those who find this page.
I was struggling with this, but after a little digging around and learning a bit of CAML, I got it working.
I am using the SharePoint Client Object Model against SharePoint 2010 and Office365 beta.
Start off your query by looking at the all list items query:
Microsoft.SharePoint.Client.CamlQuery.CreateAllItemsQuery().ViewXml
"<View Scope=\"RecursiveAll\">\r\n <Query>\r\n </Query>\r\n</View>"
Stick a Where child inside the Query element, then add in
<Eq><FieldRef Name="_dlc_DocId" /><Value Type="Text">MDXC2KE55ASN-3-80</Value></Eq>
replacing MDXC2KE55ASN-3-80 with the doc ID you are looking for.
Also don't forget you might want to make use of these too:
<ViewFields><FieldRef Name="_dlc_DocId" /></ViewFields>
<RowLimit>1</RowLimit>
Then use List.GetItems() method to bring back the ListItemCollection.
Just in case nobody comes up with a slick solution from the depths of the SharePoint infrastructure:
What would Google do?
Slice it, dice it, and dump it in a reverse index.
Solr and Lucene offer supreme tools for this. The idea is to cut the DocIds into small pieces and add the location of the document to the bucket for each piece.
Say we have "A real nice document" with ID ABCD123. You would add it to the buckets
ABCD, BCD1, CD12, D123
When searching for a partial ID (plus other data like dates, types, ...), you (well, the search engine) create the union of the buckets and apply the additional constraints.
To make this happen you need to write a spider for the SharePoint server and a routine which builds a record of the data elements to be indexed.
Put a nice REST interface in front of it (actually SOLR already has one), integrate it into the main SharePoint server, and nobody needs to know there is something else running behind it.
These products can also incrementally update the indexes, so they can be kept up to date.
You could use the following to get the Document ID:
SPFile file = MethodToUploadFileToServer(web, filepath);
SPListItem item = file.Item;
string DocID = item.Properties["_dlc_DocId"].ToString();

Search strategies in ORMs

I am looking for information on handling search in different ORMs.
Currently I am redeveloping an old application in PHP, and one of the requirements is: make everything, or almost everything, searchable, so the user just types "punkrock live" and the app finds video clips, music tracks, reviews, upcoming events, or even user comments labeled that way.
In an environment where everything is searchable, the ORM needs to support this feature in two ways:
providing some indexing API on the "O" side of the ORM
providing means for bulk database retrieval on the "R" side
The ideal solution would return ready-made objects based on the searched string.
Do you know any good end-to-end solutions that do the job, not necessarily in PHP?
If you have dealt with a similar problem, it would be nice to hear what your experience is. Something more than "use Lucene" or "the semantic web is the way" one-liners, though ;-)
I have recently integrated the Compass search engine into a Java EE 5 application. It is based on Lucene Java and supports different ORM frameworks as well as other types of models like XML or no real model at all ;)
In the case of an object model managed by an ORM framework, you can annotate your classes with special annotations (e.g. @Searchable), register your classes, and let Compass index them on application startup and listen to changes to the model automatically.
When it comes to searching, you have the power of Lucene at hand. Compass then gives you instances of your model objects as search result.
It's not PHP, but you said it didn't have to be PHP necessarily ;) Don't know if this helps, though...
In a Propel 1.3 schema.xml file, you can specify that you'd like all your models to extend a "BaseModel" class that YOU create.
In that BaseModel, you're going to re-define the save() method to be something like this:
public function save(PropelPDO $con = null)
{
    if ($this->getIsSearchable())
    {
        // update your search index here. Lucene, Sphinx, or otherwise
    }
    return parent::save($con);
}
That takes care of keeping everything indexed. As for searching, I'd suggest creating a Search class with a few methods.
class Search
{
    protected $_searchableTypes = array('music', 'video', 'blog');

    public function findAll($search_term)
    {
        $results = array();
        foreach ($this->_searchableTypes as $type)
        {
            $results[] = $this->findType($type, $search_term);
        }
        return $results;
    }
}
