solr-api accessing document field is taking a lot of time - search

I am trying to access a field in custom request handler. I am accessing it like this for each document:
Document doc;
doc = reader.document(id);
DocFields = doc.getValues("state");
There are around 600,000 documents in the solr. For a query running on all the docs, it is taking more than 65 seconds.
I have also tried SolrIndexSearcher.doc method, but it is also taking around 60 seconds.
Removing the above lines of code bring down the qtime to milliseconds. But, I need to access that field for my algo.
Is there a more optimised way to do this?

It seems that you querying one document at a time which is slow.
If you need to query all documents try to query *:*(instead of asking for a specific id) and then iterate over the results.

Related

Cloudant - apply a view/mapReduce to a geospatial query

HI I'm new to cloudant (and couch and asking questions on stackoverflow so I hope I manage to be vaguely clear about what I'm asking ) and I'm trying to do probably the second most basic geo task but am hitting a dead end.
I've got a database of docs which are geojson objects, I've created an index so I can query for intersections etc but it seems the only options I have in the url is the format=legacy (gives me the ids) and the format=geojson and the include_docs parameter - what I'd like to do is give back a particular view of the result set - I'm not interested in the geometry of the object (which is a big lump of data and it's likely that a number of other properties may be in the document that I'd rather filter out)
is there a correct way to do this in a single api call or do I need to fetch the doc ids (legacy format) and then issue a second query to bring back my chosen 'view' for each document id given in the result of format=legacy response
Thanks

Alfresco webscript (js) and pagination

I have a question about the good way to use pagination with Alfresco.
I know the documentation (https://wiki.alfresco.com/wiki/4.0_JavaScript_API#Search_API)
and I use with success the query part.
I mean by that that I use the parameters maxItems and skipCount and they work the way I want.
This is an example of a query that I am doing :
var paging =
{
maxItems: 100,
skipCount: 0
};
var def =
{
query: "cm:name:test*"
page: paging
};
var results = search.query(def);
The problem is that, if I get the number of results I want (100 for example), I don't know how to get the maxResults of my query (I mean the total amount of result that Alfresco can give me with this query).
And I need this to :
know if there are more results
know how many pages of results are lasting
I'm using a workaround for the first need : I'm doing a query for (maxItems+1), and showing only maxItems. If I have maxItems+1, I know that there are more results. But this doesn't give me the total amount of result.
Do you have any idea ?
With the javascript search object you can't know if there are more items. This javascript object is backed by the class org.alfresco.repo.jscript.Search.java. As you can see the query method only returns the query results without any extra information. Compare it with org.alfresco.repo.links.LinkServiceImpl which gives you results wrapped in PagingResults.
So, as javacript search object doesn't provide hasMoreItems info, you need to perform some workaround, for instance first query without limits to know the total, and then apply pagination as desired.
You can find how many objects have been found by your query simply calling
results.length
paying attention to the fact that usually queries have a configured maximum result set of 1000 entries to save resources.
You can change this value by editing the <alfresco>/tomcat/webapps/alfresco/WEB_INF/classes/alfresco/repository.properties file.
So, but is an alternative to your solution, you can launch a query with no constraints and obtain the real value or the max results configured.
Then you can use this value to devise how many pages are available basing you calculation on the number of results for page.
Then dinamically pass the number of the current page to the builder of your query def and the results variable will contain the corresponding chunk of data.
In this SO post you can find more information about pagination.

Creating a MongoDB 'find' query to search between a specific number of matching documents

I am using the MongoDB NodeJS driver and need to search the database for a specific set of posts. However if the post count gets too big I want to search between perhaps the 50th - 100th set of matching posts with the query and return the value to the client. However would searching for so many documents and returning them serve as a performance issue? If so what would be the proper query term?
Sample using skip/limit:
//1st part
db.dummy.find().limit(10)
//2nd part
db.dummy.find().skip(10).limit(10)
Take a look at the mongodb documentation: http://docs.mongodb.org/manual/reference/method/db.collection.find/
limit: http://docs.mongodb.org/manual/reference/method/cursor.limit/#cursor.limit
skip:
http://docs.mongodb.org/manual/reference/method/cursor.skip/#cursor.skip
You can use Skip and limit
skip the first 49 objects , then limit it by 50
then you will get what you want !
I hope this will help you ,

Search for documents by key using Domino Data Service

Domino Data Service is a good thing but is it possible to search for documents by key.
I didnt find anything in the api and the url parameters about it.
I tried the above and the requests usually fail on the server timeout after 30 seconds. Calls to /api/data/documents won't serve the purpose with parameters like sortcolumn or keysexactmatch, therefore calls to
/api/data/collections should be used for these.
Also, I don't think that arguments like sortcolumn would work on a document collection, because there isn't a column to be sorted in the first place, columns are in the views and not in documents, so view collection should be queried instead. That also mimics the behavior of getDocumentByKey method, which can't be called against document, but against the view. So instead:
http://HOSTNAME/DATABASE.nsf/api/data/documents?search=QUERY&searchmaxdocs=N
I would call
http://HOSTNAME/DATABASE.nsf/api/data/collections/name/viewname?search=QUERY&searchmaxdocs=N
and instead of
http://HOSTNAME/DATABASE.nsf/api/data/documents?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
I would call:
http://HOSTNAME/DATABASE.nsf/api/data/collections/name/viewname?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
where 'viewname' is the name of the view that is searched.
That is much faster, which comes in handy when working with larger databases.
You would do something like the following:
GET http://HOSTNAME/DATABASE.nsf/api/data/documents?search=QUERY&searchmaxdocs=N
N would be the total number of documents to return and QUERY would be your search phrase. The QUERY would be the same as doing a full text search.
For column lookups it should be something like this:
GET http://HOSTNAME/DATABASE.nsf/api/data/documents?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
COLUMN would be the column name. ROWVALUE would be the key you are looking for.
There are further options for this. More details here.
http://infolib.lotus.com/resources/domino/8.5.3/doc/designer_up1/en_us/DominoDataService.html#migratingtowebsphereportalversion7.0

How to get Post with Comments Count in single query with CouchDB?

How to get Post with Comments Count in single query with CouchDB?
I can use map-reduce to build standalone view [{key: post_id, value: comments_count}] but then I had to hit DB twice - one query to get the post, another to get comments_count.
There's also another way (Rails does this) - count comments manually, on the application server and save it in comment_count attribute of the post. But then we need to update the whole post document every time a new comment added or deleted.
It seems to me that CouchDB is not tuned for such a way, unlike RDBMS when we can update only the comment_count attribute in CouchDB we are forced to update the whole post document.
Maybe there's another way to do it?
Thanks.
The view's return json includes the document count as 'total_rows', so you don't need to compute anything yourself, just emit all the documents you want counted.
{"total_rows":3,"offset":0,"rows":[
{"id":...,"key":...,value:doc1},
{"id":...,"key":...,value:doc2},
{"id":...,"key":...,value:doc3}]
}

Resources