How to append index level information to documents when returning search results - node.js

Relatively simple question -- I want to append index-level information onto each document when returning those documents. I do not want to copy that information into each document (that makes it harder to adjust the information if it changes). I've found out that you can use the _meta tag to add information at the index level, but now I want it appended onto each document when returning results from a search query.
My specific use case is: I have indices that store posts per user (structured as posts-USER_ID). I'm performing a search across all posts in all user indices (search index: posts-*), and I want to return user information with each result from its index (that user information being a JSON object with fields like username, userColor, displayName).
I see that fields like _index and _type are index-level and returned with each document automatically. I essentially want to return a custom field as well. As said above, I've been able to successfully attach this user information to _meta for an index, but I can't figure out how to append it to the documents returned from that index (for my search results from that multi-index query).
The reason I want this is that I need user information alongside post information in search results (to display various things: username, displayName, coloring posts in the userColor). Ideally I'd prefer not to perform another query for each search result to retrieve user information (for each document result, querying the user that created that post -- that seems expensive). I also would not like to copy that user information into each document in an index (i.e. under a posts-USERID index, adding a creator field with user information). That seems insanely repetitive (as the indices are already partitioned per user), and when the user updates their information it is very expensive (I would have to iterate through each document in "their" index and change their information).
What do I do / help!
(linked question in the elastic discussion page: https://discuss.elastic.co/t/how-to-append-index-level-information-to-documents-when-returning-search-results/262923)
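One client-side sketch (an assumption on my part, not something confirmed in the linked thread): since _meta already holds the user info per index, the mappings can be fetched once for the unique indices appearing in a result set and merged into each hit, rather than querying once per document. This assumes the v7 @elastic/elasticsearch client and that each posts-USER_ID index stores { username, userColor, displayName } in its mapping _meta.

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function searchPostsWithUserInfo(query) {
  const { body } = await client.search({ index: 'posts-*', body: { query } });
  const hits = body.hits.hits;

  // One mappings request covers every index that contributed a hit.
  const indices = [...new Set(hits.map(h => h._index))];
  const { body: mappings } = await client.indices.getMapping({ index: indices });

  // Attach the index-level _meta (the user info) to each returned document.
  return hits.map(h => ({ ...h, user: mappings[h._index].mappings._meta }));
}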

Related

How to check for duplication before creating a new document in CouchDB/Cloudant?

We want to check if a document already exists in the database with the same fields and values as a new object we are trying to save, to prevent duplicated items.
Note: this question is not about updating documents or about duplicated document IDs; we only check the data to prevent saving a new document with the same data as an existing one.
Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.
The idea so far is:
1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't hardcode the selectors because we have many types of documents.)
2) Query the DB for any documents matching that selector, to see if any document already exists that matches those criteria.
However I wonder about the performance of this approach since many of the selector fields will not be indexed.
I'd also much rather follow best practices than create something out of the blue, but I haven't been able to find any known solutions for this specific scenario.
If you happen to know of any, please share.
Option 1 - Define a meaningful ID for your documents
The ID could be a logical composition or a computed hash from the values that should be unique.
If you want to check whether a document ID already exists, you can use the HEAD method:
HEAD /db/docId
which returns 200 OK if the docId exists in the database.
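A minimal sketch of that check (Node 18+ global fetch; the database URL is a placeholder):

async function docExists(docId) {
  const res = await fetch(`http://localhost:5984/db/${docId}`, { method: 'HEAD' });
  return res.status === 200; // 404 means the ID is free
}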
If you would like to check whether you have the same content in the new document and in the previous one, you may use a Validate Document Update Function, which allows you to compare both documents:
function(newDoc, oldDoc, userCtx, secObj) {
  // e.g. reject the write when the content is unchanged (a naive check; it
  // assumes consistent key order and a 'data' sub-object holding the content):
  // if (oldDoc && JSON.stringify(newDoc.data) === JSON.stringify(oldDoc.data))
  //   throw({forbidden: 'document content already exists'});
}
Option 2 - Use content hash computed outside CouchDB
Before creating or updating a document, a hash should be computed using the values of the attributes that should be unique.
The hash is included in the document in a new attribute, e.g. "key_hash".
Create a Mango index on the "key_hash" attribute.
When a new doc is to be inserted, compute the hash and search for documents with the same hash value using a Mango expression before inserting the doc.
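A sketch of this flow (Node 18+ fetch against the standard Mango endpoints; the "key_hash" attribute and the database URL are assumptions):

const crypto = require('crypto');
const DB = 'http://localhost:5984/mydb';

// One-time setup: a Mango index on the hash attribute (POST /db/_index).
async function createHashIndex() {
  await fetch(`${DB}/_index`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ index: { fields: ['key_hash'] } }),
  });
}

async function insertIfUnique(doc, uniqueFields) {
  // Hash only the values that must be unique, in a fixed field order.
  const material = uniqueFields.map(f => String(doc[f])).join('\u0000');
  const key_hash = crypto.createHash('sha256').update(material).digest('hex');

  // POST /db/_find: is there already a document carrying this hash?
  const res = await fetch(`${DB}/_find`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ selector: { key_hash }, limit: 1 }),
  });
  const { docs } = await res.json();
  if (docs.length > 0) return null; // duplicate content, skip the insert

  // Note: this check-then-insert is not atomic; the hash-as-_id approach
  // discussed below closes that race.
  return fetch(DB, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...doc, key_hash }),
  });
}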
Option 3 - Compute hash in a View
Define a view which emits the computed hash for each document as its key.
CouchDB's JavaScript support does not include hashing functions, so this could be difficult to include in a design document.
Use Erlang to define the map function, where you have access to Erlang's hashing support.
Before creating a new document, you should query the view using the hash, which you need to compute beforehand.
One solution would be to take Juanjo's and Alexis's comments one step further:
Select the keys you wish to keep unique.
Put the values in a string and generate a hash.
Set the document's _id to that hash.
PUT the document on the database.
Check the return for failure.
If another document already exists in the database with the same _id value, the PUT request will fail.
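A sketch of those steps (Node 18+ fetch; the field choice and database URL are placeholders). The PUT is atomic: CouchDB answers 409 Conflict for a second document with the same _id, so no prior read is needed:

const crypto = require('crypto');

async function putUnique(doc, uniqueFields) {
  // Hash the unique values in a fixed field order and use the hash as the _id.
  const material = uniqueFields.map(f => String(doc[f])).join('\u0000');
  const id = crypto.createHash('sha256').update(material).digest('hex');

  const res = await fetch(`http://localhost:5984/mydb/${id}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  });
  if (res.status === 409) return null; // same unique values already stored
  return res.json();
}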

Get IDs of nodes via the edge collection only

I am writing an application that stores external data in ArangoDB for further processing inside the application. Let's assume I am talking about Photos in Photosets here.
Due to the nature of the APIs used, I need to fetch Photosets before I can load Photos. In the Photosets API reply, there is a list of Photo IDs that I later use to fetch the Photos. So I created an edge collection called photosInSets and store the edges between Photosets and Photos, although the Photos are not there yet.
Later on, I need to get a list of all needed Photos to load them via the API. All IDs are numeric. At the moment, I use the following AQL query to fetch the IDs of all required Photos:
FOR edge IN photosInSets
  RETURN DISTINCT TO_NUMBER(
    SUBSTITUTE(edge._from, "photos/", "")
  )
However... this does not look like a nice solution. I'd like to (at least) get rid of the string operation to remove the collection name. What's the nice way to do that?
One way to do this is with a join from the photosInSets edge collection back to the photos collection.
Try a query that looks like this:
FOR e IN photosInSets
  LET item = (FOR v IN photos FILTER e._from == v._id RETURN v._key)
  RETURN item
This joins the _from reference in photosInSets with the _id back in the photos collection, then pulls the _key from photos, which won't have the collection name as part of it.
Have a look at a photo item and you'll see there are _id, _key and _rev as system attributes. It's fine to use the _key value if you want a string; it's not necessary to implement your own unique id unless there is a burning reason why you can't expose _key.
With a little manipulation, you could even return an array of objects stating which photo._key is a member of which photoSet; you'll just have to use two LET statements and return both results, one looking at the Photo and one looking at the photoSet, as in the sketch below.
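A sketch of that via arangojs (v7); the photoSets collection name is an assumption, since the question only names the photos and photosInSets collections:

const { Database, aql } = require('arangojs');
const db = new Database({ url: 'http://localhost:8529' });

async function photoSetMembership() {
  const cursor = await db.query(aql`
    FOR e IN photosInSets
      LET photoKey = FIRST(FOR v IN photos FILTER v._id == e._from RETURN v._key)
      LET setKey = FIRST(FOR s IN photoSets FILTER s._id == e._to RETURN s._key)
      RETURN { photo: photoKey, set: setKey }
  `);
  return cursor.all(); // e.g. [{ photo: '123', set: '456' }, ...]
}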
I'm not official ArangoDB support, but I'm interested if they have another way of doing this.

How to find particular json document from couchdb

How do I find particular JSON document details in CouchDB?
For example: the database employee_mgmt contains 50 JSON documents, and I want to find a particular employee's JSON document (find by employee ID).
CouchDB does not in itself provide you with collections/buckets, hence all your documents are peers. It's up to you to provide meta-data, e.g. by having a property $doctype with a value representing what kind of document it is. This is useful if you are writing maps and e.g. want to create a view (secondary index) returning something applicable only to employees.
Note: if you just want to query by _id, you don't need the above. Just do a simple GET with a URI like: http://host:port/databasename/documentid
More information: http://docs.couchdb.org/en/1.6.1/api/document/common.html#get--db-docid
If you want to get a batch of documents matching many _ids, use the built-in index _all_docs: http://docs.couchdb.org/en/1.6.1/api/database/bulk-api.html#post--db-_all_docs
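Both lookups sketched with Node 18+ fetch, run inside an async context (host, port and IDs are placeholders):

const DB = 'http://localhost:5984/employee_mgmt';

// Single document: GET /db/docid
const employee = await (await fetch(`${DB}/emp-001`)).json();

// Batch by ID: POST /db/_all_docs; include_docs=true returns the bodies too.
const res = await fetch(`${DB}/_all_docs?include_docs=true`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ keys: ['emp-001', 'emp-002', 'emp-003'] }),
});
const { rows } = await res.json();
const employees = rows.map(r => r.doc);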

Implement Search Everything using Solr

How does a "search everything" kind of application index and keep track of data in its search indexes?
Recently I have been working with Apache Solr, which is producing amazing results for search. But that was for one particular product catalog section being searched. As Solr stores its data as documents, we indexed the searchable fields as documents in Solr. I'm not sure how it can be used to build a "search everything" kind of search. And how should I index data into Solr?
By "search everything" I mean: searching different modules for information like Customers, Services, Accounts, Orders, Catalog, Support Tickets, etc. The search then returns combined results from a single search form, and the user doesn't need to go into a different form to search each module.
Do I need to build different indexes for each such data model, or store them in Solr as single documents? What is the best strategy to implement this?
You can store all that data in a single index with each document having an extra field that stores its type (Customer, Order, etc.). For the within-module search, just restrict the search query to documents of that type. For the Search All functionality, use copyField to copy all the relevant fields in each document type into one big field, and search with the document type field unconstrained.
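A sketch of the query side (Node 18+ fetch); the core name and the field names doc_type and all_text are assumptions. The schema would declare a string field doc_type plus copyField directives from every searchable field into all_text:

const SOLR = 'http://localhost:8983/solr/everything/select';
const q = encodeURIComponent('all_text:acme');

// Within-module search: constrain by document type with a filter query.
const customers = await (await fetch(`${SOLR}?q=${q}&fq=doc_type:Customer`)).json();

// Search All: the same query with no fq, so every document type can match.
const everything = await (await fetch(`${SOLR}?q=${q}`)).json();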

Solr Merging Results of 2 Cores Into Only Those Results That Have A Matching Field

I am trying to figure out how I can accomplish the following, and none of the answers I have found so far seem to fit:
I have a fairly static and large set of resources I need to have indexed and searchable. Solr seems to be a perfect fit for that. In addition I need to have the ability for my users to add resources from the main data set to a 'Favourites' folder (which can include a few more tags added by them). The Favourites needs to be searchable in the same manner as the main data set, across all the same fields plus the additional ones.
My first thought was to have two separate schemas
- the first for the main data set and its metadata
- the second for the Favourites folder with all of the metadata from the main set copied over and then adding the additional fields.
Then I thought that would probably waste quite a bit of space (the number of users is much larger than the number of main resources).
So then I thought I could have the main data set with its metadata (Core0), same as above, with the resourceId as the unique identifier. Then there would be a second one (Core1) for the Favourites folder, with the unique id being the resourceId, userId, grade, and folder all concatenated. The resourceId would also be a separate field. In addition, I would create another schema/core (Core3) with all the fields from the other two and have a request handler defined on it that searches across the other 2 cores and returns the results through this core.
This third core would have searches run against it where the results are expected to be returned only for a single user. For example, a user searches their Favourites folder for all the items with Foo. The result is only those items the user has added to their Favourites with Foo somewhere in their main data set metadata. I guess the request handler from Core3 would break the search up into a search for all documents with Foo in Core0 and a search across Core1 for userId and folder, then match up the resourceIds from both and eliminate those not in both. Or run a search on Core1 with the userId and folder, and then, having gotten that result set back, extract all the resourceIds and append an AND onto the search query to Core0, like: AND (resourceId = 1232232312 OR resourceId = 838388383 OR resourceId = 8637626491).
Could this be made to work? Or is there some simpler mechanism in Solr to resolve the merging of 2 searches across 2 cores, returning only the results that match on a (not necessarily unique) field in both?
Thanks.
The problem looks like a database join of 2 tables with resourceId as the foreign key.
Ignore this post if what I understood is wrong.
First, I would probably do it with a single core, with a field userId (indexed, but not stored), and reindex a document every time a new user favourites it by appending their user id (delimited by something the analyzer ignores).
So searching gets easier (userId:"kaka's id" will fetch all my favourites).
I think it takes some work to do this, and also, as the number of users who can favourite a document increases, the userId field gets really long.
So in that case, I would move on to my next idea, which is similar to yours: have a second core with (userId, resourceId). Write a wrapper which first searches this core for all the favourites, then searches the other core for all those resources in a where condition. But again, if a user favourites many resources, the query might exceed the GET method's size limit.
If neither seems to work, it's time to think of something more scalable, which leaves us the same space-wasting option.
Am I missing something?
