When I have several documents with a field whose value is selected from a small group of choices, is there any tool that prevents a wrong value from being introduced?
Thank you
Hugo
Depending on what exactly you're trying to accomplish, validation functions sound like what you need. All the validation functions (1 per design document, as many design documents per database as you need) are run before each write. If any validation function throws an error, the write is refused.
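For example, a minimal sketch of such a validation function (CouchDB's validate_doc_update), assuming a hypothetical status field whose value must come from a fixed list:

function (newDoc, oldDoc, userCtx) {
    // hypothetical field and choices; adjust to your schema
    var allowed = ["open", "closed", "pending"];
    if (newDoc.status && allowed.indexOf(newDoc.status) === -1) {
        throw({ forbidden: "status must be one of: " + allowed.join(", ") });
    }
}

Any write whose status is not in the allowed list is rejected with a forbidden error.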
We're building a database of cars and their properties, to be stored in DynamoDB.
Creating a cars table and filling it with objects that have properties like brand, model, year, etc. is easy.
But we also want a few other features in the admin interface:
Suggestions when typing
When creating a car, it should suggest brand and model from existing cars as the user types in the field.
Should we then maintain a list of brands and models in another table, and make a query to that table, when the user types?
Or is it good enough to query the "rich" table of car definitions, and get all values for brand, all model values where brand has a certain value, etc? My first thought is that it would be a heavy operation and we'd want a separate index of cars and models. But I'm not a NoSQL expert...
Best matches
When enrolling a new car in our system, we want to use an existing defined car as a reference if possible.
So when the user has typed in a brand, model, year, etc., we want to show a few options of the best matches - we can accept that the year etc. is different, but want the best matches first.
What is the best way to do matches like this on data in a NoSQL database? Any links to tools, concepts etc. will be appreciated :)
Thanks in advance
In DynamoDB (and NoSQL in general), the fewer tables you create, the better your architecture (this is one of the main reasons we use NoSQL). So there is no need for a new table; just add a new attribute and fill it with the searchable data you want. Keep in mind that querying in DynamoDB is case sensitive, and you can only use the begins_with or contains functions to query the data.
The cons are:
You will use a lot of read capacity units.
You have to manage capital letters (e.g. keep a lowercased copy of the attribute).
You have to build the searchable attribute on each creation.
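As a sketch of what such a prefix query might look like with the AWS SDK for JavaScript (the table name, key names, and the lowercased model_lc attribute are assumptions):

var AWS = require('aws-sdk');
var dc = new AWS.DynamoDB.DocumentClient();
dc.query({
    TableName: 'cars',
    // brand is assumed to be the partition key; model_lc is a lowercased
    // copy of the model, kept as the sort key so begins_with matches
    // regardless of how the user capitalizes the input
    KeyConditionExpression: 'brand = :b and begins_with(model_lc, :p)',
    ExpressionAttributeValues: { ':b': 'toyota', ':p': 'cor' } // user typed "Cor"
}, function (err, data) {
    if (err) console.error(err);
    else console.log(data.Items); // candidate models for the suggestion list
});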
The solution I suggest is AWS CloudSearch, which gives you an out-of-the-box suggester. You will have better results and a better user experience, and indexing in CloudSearch is automatic each time you add a new item. Be aware of the pricing, however; they will give you 30 days for free.
I have a view in my XPages application that contains a lot of elements. From this view I need to build a table with custom rows (I can't just display the view; I need to build the rows to display myself, because I need to compute data from other databases, things that you can't do directly in a view).
To do so, I know I can use a Data View, Data Table, or Repeat control (other ideas, maybe?). I certainly can't bring all the data to the client; it's way too much.
I am looking for a solution that allows paging (easy to do with the pager component) but, more importantly, sorting on header click. To be clear, I need sorting across all the entries of the view, not only the page currently displayed on the client.
What would be the most efficient way to do this? I really have a lot of data to compute, so I need the fastest approach.
(I can create several views with different sorting criteria if needed).
Any repeating control can have pagers applied to it. Also, View Panels can include data not in the current view - just set the columnName property to blank and compute the value property. Bear in mind you will not be able to sort on those columns, though - they're not columns, they're values computed at display time.
Any computed data is only computed for the entries currently shown. So if you have 5000 entries in the view but are only displaying 30 at a time, the computed data will only be computed for the current 30.
If your users need to be able to sort on all columns and you have a lot of data, they basically have to accept that their requirements mean all that data needs computing when they enter the view... and whenever it's updated, by themselves or any other user. That's never going to be quick, and the requirements are the issue there, not the architecture. An RDBMS may be a better back end, if that's a requirement, as long as the data doesn't span multiple tables. Otherwise a graph database structure may be a better alternative.
The bigger question is why the users need to sort on any column. Do the users really want to sort on the fifth column and then scroll down to entries beginning with an "H"? Do they want to sort on the fourth column and scroll down to entries for May 2014? In the Notes Client, that's a traditional approach, because it's easier than filtering. But usually users know what they're looking for - they don't want entries beginning with "H", they want entries where the department is HR. If that's the case, sorting on all columns and paging is not the most efficient method, either from a database design or from a usability point of view.
To keep the processing fast and lightweight, I use JSON with jQuery DataTables.
Depending on the data size and usage, the JSON can be generated on the fly or on a scheduled basis and saved in Lotus Notes documents or applicationScope variables.
$.each(data, function(i, item) {
    dataTable.row.add([item.something1, item.something2, item.something3]); // one row per JSON entry
});
dataTable.draw(); // redraw once after all rows are added
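For context, here is a minimal sketch of where data and dataTable might come from, assuming a hypothetical XAgent URL that serves the JSON and an empty placeholder table in the XPage:

var dataTable = $('#rowsTable').DataTable(); // <table id="rowsTable"> placeholder in the XPage
$.getJSON('/app.nsf/rowsJson.xsp', function (data) {
    // run the $.each loop above here, then call dataTable.draw()
});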
You can compute a viewColumn but if you have a lot going on I wouldn't go that route.
This is where Java in XPages SHINES!
Build a Java object to represent your row. In Java, use back-end logic to get all the data you need. Let's say you have a report of sales orders for a company, and sales orders pull data from different places. Your company object would have a method like:
public List<SalesOrder> getOrders() { ... }
So in the repeat you call company.getOrders(), and it returns all the rows that you worked out and populated in Java. Your "rowData" collection name in the repeat can then access all the data you want. Just build it into a table.
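The repeat's value binding is then just one line of server-side JavaScript; a sketch, assuming company is exposed as a managed bean:

// SSJS value of the repeat control; "rowData" is the repeat's collection name
return company.getOrders();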
But now the sorting... We've been using jQuery DataTables to do just this. It's all client side: your repeat comes down, and then DataTables kicks in and makes everything sortable. No need to rely on views. It works great.
Since it's all client side, it supports paging and works pretty decently. If you're pumping out LOTS of records - 6,000+ - then you might want to look at outputting the data as JSON and taking advantage of some server caching. We're starting to use it with some really big output - LOTS of rows - and it's working well so far. Hopefully I'll have some examples on NotesIn9.com in the near future.
I want to implement the autocomplete feature provided by various e-commerce stores. The functionality is pretty simple: when you type some characters, it starts showing relevant suggestions.
I implemented it using Solr (django-haystack), using the autocomplete method provided in haystack.query.SearchQuerySet. Basically, I get a list of results sorted by score and show the top n results as suggestions.
The Solr document contains $product_name, $category_name and other fields, so the results I generate look like a list of "$product_name in $category_name".
The problem arises when I change a category name: I then have to update all the products belonging to that category to reflect the change in the autocomplete (i.e. update all Solr documents for products of this category).
Another way to do this is to put just the id of the category with the product in the Solr document. In that case, I have to look up the category name each time, and this is not efficient.
Is there any other efficient way to do this?
Since you are changing the underlying data, the change has to be propagated to Solr.
There are different approaches to do this:
Update the database and reindex. Pros: simple enough. Cons: indexing time can be large.
Update the database and Solr in tandem. Pros: quick, almost instantaneous updates. Cons: can lead to data inconsistency (if one update fails).
Update the database and schedule a delta-import in Solr. This is a middle ground between the two above.
I would recommend the third approach, but it requires some upfront schema design. Read more about delta-import in the context of the DataImportHandler.
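Once the DataImportHandler is configured, the scheduled delta-import is just an HTTP call; a minimal Node.js sketch, assuming a local Solr with a hypothetical core named products and the handler registered at the conventional /dataimport path:

var http = require('http');
http.get('http://localhost:8983/solr/products/dataimport?command=delta-import',
    function (res) {
        console.log('delta-import triggered, HTTP ' + res.statusCode);
    });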
Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have a first and last field to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so can you please provide an example of usage. If not, is there a proposed method of transforming the data aside from looping through the returned object?
It is possible to do just about anything server-side with MongoDB. The reason you will usually hear "no" is that you sacrifice too much speed for it to make sense under ordinary circumstances. One of the main forces behind PyMongo, Mike Dirolf of 10gen, has a good blog post on using server-side JavaScript with PyMongo here: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example stores a JavaScript function that returns the sum of two fields, but you could easily modify it to return the first letter of your user name field. The gist would be something like:
// stored in db.system.js; callable from server-side contexts
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
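To try it from the shell, you can load the stored functions and call it directly; a quick sketch:

db.loadServerScripts(); // pulls functions from db.system.js into the shell
first_letter("Smith");  // "S"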
Understand first, though, that MongoDB is made to be really good at retrieving your data, not at processing it. The recommendation (see, for example, 50 Tips and Tricks for MongoDB Developers by Kristina Chodorow, O'Reilly) is to do what Andrew tersely alluded to above: make a first-letter field and return that instead. Any processing can be done more efficiently in the application.
But if you feel that even querying for the full name before returning fullname[0] from your 'view' is too much of a security risk, you don't need to do everything the fastest possible way. I'd avoided map-reduce in MongoDB for a while because of all the public concerns about speed. Then I ran my first map-reduce and twiddled my thumbs for 0.1 seconds as it processed 80,000 10k documents. I realize that in the scheme of things that's tiny, but it illustrates that just because a massive website can't afford a performance hit on some server-side processing doesn't mean it would matter to you. In my case, I imagine it would take me longer to migrate to Hadoop than to just eat that 0.1 seconds every now and then. Good luck with your site.
The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested, and store it as an extra field on the object. Mongo doesn't provide server-side transformations (usually, and where it does, you usually don't want to use them); the answer is usually to not treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats that you're going to be using.
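A quick mongo-shell sketch of that pre-baking approach (the collection and field names are just an example):

var first = "Joe", last = "Smith";
db.users.insert({
    first: first,
    last: last,
    displayShort: first + " " + last.charAt(0) // "Joe S", precomputed at write time
});
db.users.find({}, { displayShort: 1 }); // reads need no transformation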
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.
I'm developing a search engine that takes the semantics of data into account, unlike the usual keyword-based index. I managed to develop a reasonable index for the search using metadata-extraction methods and RDF, but I have difficulty applying such methods to the search query itself, since the query is much shorter than the actual data. Any idea how to perform successful tagging of a search query using similar methods, natural language processing, etc.?
Thank You!
Yes, the sample size of a typical query is too small for semantic analysis to be of any value.
One approach might be to constrain or expand your query using drop-down menus for things like "Named Entities" or "Subject Verb Object" tuples.
Another approach would be to expand simple keywords using rules created from your metadata so that, for example, a query for 'car' might be expanded to the tuple pattern
(*,[drive,operate,sell],[car,automobile,vehicle])
before submission.
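A rough sketch of that rule-based expansion (the rule table here is a made-up example):

// expand a keyword into a tuple pattern using metadata-derived rules
var rules = {
    car: { verbs: ['drive', 'operate', 'sell'], nouns: ['car', 'automobile', 'vehicle'] }
};
function expand(keyword) {
    var r = rules[keyword];
    return r ? ['*', r.verbs, r.nouns] : keyword;
}
expand('car'); // [*, [drive, operate, sell], [car, automobile, vehicle]]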
Finally, you might try expanding the query with a non-semantically valuable prefix and/or suffix to get the query size large enough to trigger OpenCalais' recognizer.
Something like 'The user has specified the following terms in her query: one, two, three.'.
And once the results are returned, filter out all results that match only the added prefix/suffix.
Just a few quick thoughts.
You need to build a semantic tree based on combinations of keywords.
For example, automobile --> vehicle --> car captures the technical aspect of a car, while travel --> hire/rent --> vehicle --> car relates to travel and renting a car.
In this case MongoDB will help you a lot.