I'm using CouchDB with a home-grown C# interface library. The most common method of accessing CouchDB with this library results in a temp view being created. I've optimized the library so that it uses keys when possible. My question is this: are temp views cached by CouchDB? It seems that the first time I run one of these temp views it runs a bit slowly. After that, similar queries that use the same view code seem to execute way faster.
So does CouchDB cache views? And if so, how long do they stay cached? If I'm hitting the database at a fairly constant rate, is there much use in switching to static views?
Temp views are not for production, only for testing. As your database grows they will only get slower. You should figure out what views you need and go from there.
If you really need dynamic queries you should look into couchdb-lucene. While designed for full-text search I've had some success using it for general queries.
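For illustration, here is a minimal sketch of promoting an ad-hoc query to a permanent view by saving a design document over CouchDB's HTTP API. The database name mydb, design document name queries, view name by_key, and the map function are all placeholders, not from the question:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SaveViewDemo
{
    static async Task Main()
    {
        var http = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

        // The same map function you would have sent to _temp_view, saved permanently.
        var designDoc = @"{
            ""language"": ""javascript"",
            ""views"": {
                ""by_key"": { ""map"": ""function(doc) { emit(doc.key, null); }"" }
            }
        }";

        var response = await http.PutAsync(
            "mydb/_design/queries",                                   // placeholder names
            new StringContent(designDoc, Encoding.UTF8, "application/json"));
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

Querying it afterwards is an ordinary GET against mydb/_design/queries/_view/by_key, which hits the persistent index instead of rebuilding a temp view on every request.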
I have 4 databases, each with more than 200,000 documents. A viewPanel that shows all documents of a database does not load correctly; after a short wait it fails with an error. If the view does not hold a lot of documents, no error is given.
I could not find a solution for this situation :(
I added this line to the Application Properties, but it did not solve my problem:
xsp.domino.view.navigator=ByNoteId
Regards
Cumhur Ata
There are a number of performance "sins" you can commit on Domino. Unfortunately Domino is too forgiving and somehow still works even if you do them. The typical sins:
Using @Yesterday, @Today, @Now, or @Tomorrow in a view selection formula or a sorted column in a view. I wrote an article about your options to mitigate that
Having code that does a view.refresh before opening a page
Using reader fields and accessing a view that is not categorized by that reader field. This hits only users who can see just a few documents. Check this article for possible remedies
Not having a fast temp location for view rebuilds. Typical errors are: not enough disk I/O or having your transaction log on the same channel as your databases. Make sure you have a high performance server
For Windows servers: not taking care of disk fragmentation - includes links to performance troubleshooting
Not using ODS51/52 with compression for data and design enabled. Takes a simple command to fix
That's what you can check, off the top of my head. Loading 200k documents into a panel in one go doesn't look like a good UX approach; paginate it eventually
I am trying to add Lucene.Net to my project, where searching involves more complicated data, but the underlying tables are modified frequently (new rows are inserted, and fields used in the Lucene index are updated).
Is Lucene.Net a good fit for searching here?
How can I find the modified fields and update the corresponding entries in the existing Lucene index? And when the index contains documents whose rows have been deleted from the table, how can I remove them from the index?
What I do right now while the page loads:
I remove index entries that no longer have a matching table row (matched on a unique field).
I insert an index entry if one does not exist; otherwise I update every index entry matching the table's unique field.
Because this removing/inserting/updating runs while the page loads, the page takes longer than normal to load.
How can I proceed with it?
Lucene is absolutely suited for this type of feature. It is completely thread-safe... IF you use it the right way.
Solution pointers
Create a single IndexWriter and keep it in a globally accessible singleton (either a global static variable or via dependency injection). IWs are completely thread-safe. NEVER open multiple IWs on the same folder. (See the sketch after these pointers.)
Perform all updates/deletes via this singleton. (I had one project doing 100's of ops/second with no issues, even on slightly crappy hardware).
Depending on the frequency of change and the latency acceptable to the app, you could:
Send an update/delete to the index every time you update the DB
Keep a "transaction log" or queue (probably in the same DB) of changed rows and deletions (which are are to track otherwise). Then update the index by consuming the log/queue.
To search, create your IndexSearcher with searcher = new IndexSearcher(writer.GetReader()). This is part of the NRT (near real time) pattern. NEVER create a separate IndexReader on an index folder that is also open by an IW.
Depending on your pattern of usage you may wish to introduce a period of "latency" between changes happening and those changes being "visible" to the searches...
Instances of IS are also threadsafe. So you can also keep an instance of an IS through which all your searches go. Then recreate it periodically (eg with a timer) then swap it using Interlocked.Exchange.
I previously created a small framework to isolate this from the app and make it reusable.
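A minimal sketch of those pointers, against the Lucene.Net 3.0.3 API (the version the IndexSearcher(writer.GetReader()) call above matches). The index path, field names, and the SearchEngine naming are my own placeholders, not from the question, and a production version would also dispose old searchers once in-flight queries finish:

```csharp
using System.IO;
using System.Threading;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class SearchEngine
{
    // One IndexWriter per index folder for the whole process; IndexWriter is thread-safe.
    private static readonly IndexWriter Writer = new IndexWriter(
        FSDirectory.Open(new DirectoryInfo(@"C:\indexes\app")),   // hypothetical index path
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);

    // NRT searcher built from the writer; swapped periodically, never built from
    // a separate IndexReader on the same folder.
    private static IndexSearcher _searcher = new IndexSearcher(Writer.GetReader());

    // Upsert: replaces any existing document with the same id, or adds a new one.
    public static void Upsert(string id, string text)
    {
        var doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
        Writer.UpdateDocument(new Term("id", id), doc);
    }

    public static void Delete(string id)
    {
        Writer.DeleteDocuments(new Term("id", id));
    }

    // Swap in a fresh near-real-time searcher; call this from a timer.
    public static void Refresh()
    {
        Interlocked.Exchange(ref _searcher, new IndexSearcher(Writer.GetReader()));
    }

    public static TopDocs Search(Query query, int n)
    {
        return _searcher.Search(query, n);
    }
}
```

All writes go through the single Writer, all searches through the current _searcher, and a timer calling Refresh() controls how much latency there is between a change and it becoming visible.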
Caveat
Having said that... Hosting this inside IIS does raise some problems. IIS will occasionally restart your app. It will also (by default) start the new instance before stopping the existing one, then swap them (so you don't see the startup time of the new one).
So, for a short time there will be two instances of the writer (which is bad!)
You can tell IIS to disable "overlapping" or increase the time between restarts. But this will cause other side-effects.
So, you are actually better off creating a separate service to host your Lucene bits. A simple self-hosted WebAPI Windows service is ideal and pretty simple. This also gives you better control over where the index folder goes and the ability to host it on a different machine (which isolates the disk IO load). And it means the service can be accessed from other parts of your system, tested separately, etc.
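As a rough sketch of that separate service, assuming the Microsoft.AspNet.WebApi.OwinSelfHost package; the port, route, and controller names are placeholders:

```csharp
using System;
using System.Web.Http;
using Microsoft.Owin.Hosting;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        var config = new HttpConfiguration();
        config.MapHttpAttributeRoutes();
        app.UseWebApi(config);
    }
}

public class SearchController : ApiController
{
    // GET http://localhost:9000/search?q=foo
    [HttpGet, Route("search")]
    public IHttpActionResult Get(string q)
    {
        // Delegate to the Lucene singleton sketched above (stubbed here).
        return Ok(new[] { "stub result for " + q });
    }
}

class Program
{
    static void Main()
    {
        // Run this inside a Windows service (or as a console app while developing).
        using (WebApp.Start<Startup>("http://localhost:9000"))
        {
            Console.WriteLine("Search service listening on http://localhost:9000");
            Console.ReadLine();
        }
    }
}
```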
Why is this "better" than one of the other services suggested?
It's a matter of choice. I am a huge fan of ElasticSearch. It solves a lot of problems around scale and resilience. It also uses the latest version of Java Lucene which is far, far ahead of lucene.net in terms of capability and performance. (The same goes for the other two).
BUT, ES and Solr are Java (which may or may not be an issue for you). AzureSearch is hosted in Azure which again may or may not be an issue.
All three will require climbing a learning curve and will require infrastructure support or external third party SaaS commitment.
If you keep the service in-house and in C#, it stays simple, and you have control over its capabilities; the shape of the API can be tuned to your needs.
No "right" answer. You'll have to make choices based on your situation.
You should preferably index according to some schedule (periodically). The easiest approach is to keep the date of the last index run, then query for all changes since then: index new records, update changed ones, and remove deleted ones. To keep track of removed entries in the database you will need a log of deleted records, with the date each was removed. You can then query by that date to find what needs to be removed from Lucene.
Now simply run that job every 2 minutes or so.
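A rough sketch of such a job. The ModifiedAt column, the DeletedLog table, and the Db helpers are all hypothetical, and SearchEngine stands for whatever wraps your IndexWriter (such as the singleton sketched in the other answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Stubs standing in for your real data access (hypothetical schema: rows carry
// a ModifiedAt timestamp; deletions are recorded in a DeletedLog(Id, DeletedAt)
// table instead of disappearing silently).
public static class Db
{
    public static IEnumerable<(string Id, string Text)> GetRowsModifiedSince(DateTime since)
        => Enumerable.Empty<(string, string)>();

    public static IEnumerable<string> GetDeletedIdsSince(DateTime since)
        => Enumerable.Empty<string>();
}

public class IndexSyncJob
{
    private DateTime _lastIndexedAt = DateTime.MinValue;
    private readonly Timer _timer;

    public IndexSyncJob()
    {
        // Run every 2 minutes, as suggested above.
        _timer = new Timer(_ => Sync(), null, TimeSpan.Zero, TimeSpan.FromMinutes(2));
    }

    private void Sync()
    {
        var runStartedAt = DateTime.UtcNow;

        // 1. Add or update index entries for rows changed since the last run.
        foreach (var row in Db.GetRowsModifiedSince(_lastIndexedAt))
            SearchEngine.Upsert(row.Id, row.Text);

        // 2. Remove index entries for rows recorded in the deletion log.
        foreach (var id in Db.GetDeletedIdsSince(_lastIndexedAt))
            SearchEngine.Delete(id);

        _lastIndexedAt = runStartedAt;
    }
}
```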
That said, Lucene.Net is not really suited for web applications; you should consider using ElasticSearch, Solr, or AzureSearch - basically, a server that can handle load and multithreading better.
I'm playing around with map and reduce through temporary views, but at 1,000,000+ documents it is a bit slow. Rather than creating a separate dataset for testing, is it possible to use only a subset of the data in the temporary view?
A map-reduce view is more like "CREATE INDEX" than it is like "SELECT * FROM".
In other words, when you do a map-reduce view, CouchDB will crunch through every document.
However, for testing, one thing you can do is make a normal view (not temporary). Just develop your work in a temporary design document, _design/my_experiments.
Save your map-reduce view code and then query the view with the ?stale=update_after option. You will probably get no results, but stale=update_after tells CouchDB to begin processing the view in the background. Now try your query again; you will see the results that have been processed so far. Try a third time and you will see even more data reflected.
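For example, a minimal sketch over the HTTP API (the database name mydb and view name my_view are placeholders; _design/my_experiments is from the suggestion above):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class StaleQueryDemo
{
    static async Task Main()
    {
        var http = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };
        var url = "mydb/_design/my_experiments/_view/my_view?stale=update_after";

        // First call: likely empty, but it kicks off view processing in the background.
        Console.WriteLine(await http.GetStringAsync(url));

        // Later calls return progressively more of the processed documents.
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine(await http.GetStringAsync(url));
    }
}
```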
Roughly speaking, views process documents in the same order that a _changes query returns them to you: the first update is processed first, then the rest in order, with the most recent change processed last.
I have a Couch database called restaurants.couch, and once a week I run Compact Database; the file size goes from 20MB to 11MB.
I see that the older versions are gone - which I don't need.
However, I notice that the .restaurants_design folder is over 100MB and contains 30+ .view files.
I executed Compact Views and Clean Up Views, and it reduced it to 8MB and to just 1 .view file. My app also ran a little faster.
My questions :
1) Why does .restaurants_design get so large? What's the reason? It's just a view.
2) What's the benefit of Compact Views and Clean Up Views (besides file size reduction)?
3) What are the side effects of Compact Views and Clean Up Views ? Will I ever regret doing this if I don't need versioning ?
4) How often should Compact Database, Compact Views, and Clean Up Views be performed if the app does not really need versioning, just a simple NoSQL database with some views?
CouchDB persists view query results on disk as indexes to speed up subsequent queries. So if you have lots of views, and users query them with different parameters, those indexes accumulate on disk.
View indexes on disk are named after the MD5 hash of their view definition. When you change a view, the old index files remain on disk. Clean Up Views removes all outdated view index files (files named after the MD5 hash of view definitions that no longer exist). Compact Views, on the other hand, rewrites the current view index files to reclaim wasted space, much like Compact Database does for the database file.
The side effect is that the next query against a view whose index was removed will take longer, since CouchDB will need to rebuild it. Compact Views and Clean Up Views do not affect versioning, but Compact Database does.
The frequency really depends on how your application works and the limitations of your deployment setup. However, if you really want to disable versioning, look into the _revs_limit setting, which lets you set a limit on the number of revisions kept for all documents.
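For reference, all three operations are plain HTTP POSTs, so they are easy to script on whatever schedule fits. A minimal sketch (restaurants is the database from the question, but the design document name menus is a placeholder):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class CouchMaintenance
{
    static async Task Main()
    {
        var http = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };
        // CouchDB requires a JSON content type on these POSTs.
        Func<StringContent> json = () => new StringContent("", Encoding.UTF8, "application/json");

        await http.PostAsync("restaurants/_compact", json());        // Compact Database
        await http.PostAsync("restaurants/_compact/menus", json());  // Compact Views of _design/menus
        await http.PostAsync("restaurants/_view_cleanup", json());   // Clean Up Views
    }
}
```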
My CouchDB view indexes are being created slower than I would like. Writing the documents is not such a problem but the users can edit them offline and then bulk update, which seems to slow things right down.
This answer helped, but I was wondering whether it is better to separate the various views into different design documents (Eg. 1) or to store them all in one (Eg. 2).
Eg. 1
_design/posts/_view/id
_design/comments/_view/id
_design/tags/_view/id
Eg.2
_design/webresources/_view/_id?key="posts"
_design/webresources/_view/_id?key="comments"
_design/webresources/_view/_id?key="tags"
*This example is just for illustration purposes. I am only concerned with the time it takes to build the indexes.
You will gain better performance if you read often. CouchDB views are updated and built at read time, so you can read the view every time a document updates to keep it hot*.
Or maybe listen to the changes feed and keep track of updated documents; once they reach a certain threshold value, read the view.
Another option is to use the stale parameter.
If stale=ok is set, CouchDB will not refresh the view even if it is stale; the benefit is improved query latency. If stale=update_after is set, CouchDB will update the view after the stale result is returned.
Every design document is a separate Erlang process, so separating your views across different design documents will cause them to be built concurrently. However, each view will still be built in a blocking manner: two views in different design documents can start updating at the same time, but the time it takes to update each individual view is the same as if they were in the same design document.
*You don't necessarily have to care about the result. The goal here is to trick CouchDB into updating the view, so you can fire off the request in a separate async process and be done with it.
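A minimal sketch of that fire-and-forget warm-up (the database name mydb is a placeholder; the view path follows Eg. 1 above):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class ViewWarmer
{
    static readonly HttpClient Http =
        new HttpClient { BaseAddress = new Uri("http://localhost:5984/") };

    // Call after a document update (or after every N updates). The response is
    // ignored; the request only exists to make CouchDB refresh the index.
    public static void WarmUp()
    {
        _ = Task.Run(() => Http.GetAsync("mydb/_design/posts/_view/id?limit=1"));
    }
}
```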