Siren "more like this" query - search

I am using the newest SIREn distribution for Solr to index and search my data (http://siren.solutions/siren/downloads/).
Is there a simple way to search for similar documents in my indexed data? Something similar to Solr's MoreLikeThis query (https://cwiki.apache.org/confluence/display/solr/MoreLikeThis).
My goal is to find documents that have a JSON structure similar to the one I am interested in.
best,
Bernd

If I remember correctly, SIREn stores the RDF representation of each resource in a dedicated field of the Solr document. I don't think the default MLT component that comes with Solr works for your scenario.
That is, enabling that component will produce some kind of result, but I don't believe it will meet your JSON "similarity" requirement.
On top of that, I suggest you post your request to the SIREn mailing list [1]; I'm sure the dev team will point you in the right direction.
[1] https://groups.google.com/forum/m/#!forum/siren-user

Related

How to create a partial search in Meteor that only returns permitted data

I'm trying to create a search feature in Meteor 1.8.1 that does the following:
returns partial matches, e.g. "fish" will find "fish", "fishcake" and "dogfish"
has server-side control of which documents are returned, so search results don't include documents that are not published to the user
is reasonably efficient
returns a limited number of results
This seems like it should be a common requirement, but I'm failing to find any solution.
MongoDB full-text search only matches whole words, so it will only find "fish".
Easy Search doesn't support server-side permissions, as far as I can tell.
I could try a regex solution, but I think it would be expensive?
Thank you for any solutions!
Edit: From discussion it seems that Easy Search does support server-side filtering using a selector, and this would be the best solution. However, I can't get a selector working from the examples and documentation. For clarity, I've created a new question for that issue.
The documentation explicitly states that for advanced use cases you may want to use Elasticsearch, and it offers a pluggable extension to ease the burden of integration:
https://matteodem.github.io/meteor-easy-search/docs/recipes/#advanced-search
You might wish that a search for cafe returns documents with the text café in them (special character). Or that your search string is split up by whitespace and those terms used to search across multiple fields.
You should consider using a search engine like ElasticSearch for your search if you have these usecases. ElasticSearch allows you to configure precisely how your fields are being searched. One way you can do that is by analyzing your data, so that searching itself is as fast as possible.
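For the regex route raised in the question, here is a minimal sketch (in Python, purely for illustration) of the selector a server-side search could build. The field name "name" and the permission field "ownerId" are hypothetical, not taken from the question; the same selector shape would be passed to a MongoDB find together with a limit.

```python
import re


def build_partial_match_selector(term, field="name", allowed_owner_ids=None):
    """Build a MongoDB-style selector for a case-insensitive substring match.

    `field` and the "ownerId" permission field are hypothetical names.
    """
    # Escape regex metacharacters so the user's input is matched literally.
    selector = {field: {"$regex": re.escape(term), "$options": "i"}}
    # Restrict results on the server to documents the user may see.
    if allowed_owner_ids is not None:
        selector["ownerId"] = {"$in": list(allowed_owner_ids)}
    return selector


selector = build_partial_match_selector("fish", allowed_owner_ids=["u1"])
options = {"limit": 10}  # cap the number of results returned

# The same pattern, checked locally against the examples from the question.
pattern = re.compile(selector["name"]["$regex"], re.IGNORECASE)
matches = [w for w in ["fish", "fishcake", "dogfish", "cat"] if pattern.search(w)]
```

Note that an unanchored, case-insensitive regex generally cannot use an index efficiently, so this is likely only reasonable for modest collections; that cost is the trade-off against running a dedicated search engine.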

Way to dump the relations from Freebase?

I have gone through the Google API for Freebase, but I am still confused.
Is there a simple way to dump the relations from Freebase?
I want to dump all entity-name pairs with a specific relation (e.g. marry_with, ...), and I also want the Chinese entity names.
Should I
write MQL to query all entities satisfying the condition? (But the MQL service is going to be retired soon.)
or dump all of Freebase and parse it?
or is there another API capable of doing this?
or would another KB (YAGO, DBpedia, Wikidata) make this easier?
Which way is easiest to get working?
Please point me in the right direction. Thanks.
Freebase was retired and Wikidata is the recommended alternative.
You can use the Wikidata Query API to get entities with a specific property.
For instance, the query http://wdq.wmflabs.org/api?q=CLAIM[26] retrieves the IDs of all items having the property spouse (P26).
You can combine this with the Wikidata API, for instance to get labels and aliases in English for the first three items returned by the previous query:
http://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q23|Q24|Q42&languages=en&props=labels|aliases
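As a rough sketch, the wbgetentities URL above can be built programmatically, here also asking for Chinese labels via the "zh" language code since the question wants Chinese entity names. The function name is made up for illustration, and the item IDs are just the three from the example URL.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_wbgetentities_url(item_ids, languages=("en", "zh")):
    """Return a wbgetentities URL requesting labels and aliases.

    Multiple values are joined with "|", as in the example URL above.
    """
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "languages": "|".join(languages),
        "props": "labels|aliases",
        "format": "json",
    }
    # urlencode percent-encodes "|" as %7C, which the API accepts.
    return WIKIDATA_API + "?" + urlencode(params)


url = build_wbgetentities_url(["Q23", "Q24", "Q42"])
```

The IDs returned by the property query would be fed into item_ids in batches rather than all at once.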

Data relationships as a context for search in Marklogic

I am using MarkLogic's search functionality to create a search page. Right now, I'm running an XQuery to get search results through search:search. As a bare-bones example, see this code:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
search:search('test',
<options xmlns='http://marklogic.com/appservices/search'></options>)
This search searches all content in the database, which is fine in many cases. In other cases, I search based on collections with cts:collection-query. The collections serve as great contexts for my searches.
Now, I would like to limit my search results based on a relationship of data in a "main" document. This "main" document has all the relationships in an object model. If that object model has a reference to a document, I want that document included in the search. Essentially, the "main"/model document is the context of the search.
I was trying to brainstorm the best way to do this. Here's what I've come up with so far, but I was hoping someone more familiar with MarkLogic (I've only been working with it for 6 months) could point me in a good direction:
1. Add all documents referenced in the model document to a unique collection. Then query search based on that collection. However, the collections would have to be updated as the model changed.
2. Load the model document into my code and get a list of all the references and add them to a query by cts:document-query (or the like).
3. Restructure my concept of a "model" somehow in my XML documents.
Thanks for any input or suggestions.
I would start with (2) and see if the performance is good enough. That will depend on your use-case, but I expect it should be fine for thousands or even hundreds of thousands of references.
Be sure to use a single-term cts:document-query($list-of-references). That will be faster than cts:or-query(for $ref in $list-of-references return cts:document-query($ref)), because the index lookup can be a single pass instead of N separate lookups.
All of these ideas would work fine. Deciding which to use depends on the particulars of your application, such as how often the main document changes (and whether you control it) and how hard it is to remodel your XML.
Another thing to consider is that you can set a trigger on document updates that performs the collection changes automatically.
-David Lee

Solr - Enriching the TermsComponent answer

I'm using Solr 3.5.0 (with WebSphere Commerce). While performing a search, Commerce uses the suggestion tool to suggest (auto-complete) search terms based on the letters already typed in the search box.
Currently WebSphere Commerce is using Solr's TermsComponent. But one of my new requirements is to be able to enrich the list of suggested terms.
Do you know if there is any way to do that, e.g. by creating a plain-text dictionary or using another Solr component?
Thanks for reading,
and for your help.
Regards,
Dekx.
I think a plain-text dictionary probably wouldn't be a usable data source (even if you could use it, searching linearly through a plain-text file would probably be too slow). If you create an index from your dictionary, you could probably incorporate it in the TermsComponent as a shard (see the TermsComponent documentation, under the heading "Distributed Search Support").
I don't believe TermsComponent supports searching multiple fields, so you'll want to make sure the same field name is used for the terms in the dictionary that you want to use (that is, if you are looking at the "name" field in the index, then create a "name" field in your indexed dictionary as well, rather than a "dictionaryentry" field).
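To make the shard idea concrete, here is a hedged sketch of how such a terms request might be assembled. The host names, core paths, and the "/terms" handler path are assumptions for illustration; terms.fl, terms.prefix, terms.limit, shards, and shards.qt are the parameters described in the TermsComponent documentation.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, prefix, limit=10, shards=None):
    """Build a TermsComponent request URL, optionally spanning several shards.

    `base_url` and the shard addresses are hypothetical; adapt them to your setup.
    """
    params = [
        ("terms", "true"),
        ("terms.fl", field),        # the same field name must exist in every shard
        ("terms.prefix", prefix),   # letters typed so far in the search box
        ("terms.limit", str(limit)),
    ]
    if shards:
        # Comma-separated list: the main index plus the dictionary index.
        params.append(("shards", ",".join(shards)))
        # Request handler used on each shard (assumed to be "/terms" here).
        params.append(("shards.qt", "/terms"))
    return base_url + "?" + urlencode(params)


url = build_terms_url(
    "http://localhost:8983/solr/terms",
    field="name",
    prefix="fi",
    shards=["localhost:8983/solr/catalog", "localhost:8983/solr/dictionary"],
)
```

Both indexes expose the same "name" field here, which is why the dictionary index needs to reuse that field name.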
To my mind, though, I fail to see what value this would add. Generally, the component is intended to look at the terms available in the index for that field. "Enriching" it with more data would just provide suggestions that the search won't actually be able to find. Of course, I don't know much about your search implementation, but in most cases that would be my concern.

CouchDB - Queries with params

I'm new to CouchDB and I know my mindset is probably still too much in the relational DB sphere, but here goes:
It appears that querying on Couch is all done via Views. I read that temporary views are very inefficient and should be avoided in production.
So my question really is: how would one do effective querying with parameters (as the views do not accept them)? For example, if I were to use Couch to power a blog site, would I have to create a new view for each post, equivalent to 'select post from posts where id=1'?
I understand that I can use Lucene alongside the querying to perform a full-text search on the results, but this is only really useful for textual content, not numbers.
I'm happy creating a boatload of static views, as they can be created very simply on the fly. My worry is that this is not how Couch was supposed to be used and I'm missing something. Feel free to enlighten me.
Cheers, Chris.
Views do accept URL parameters, key being the one you are looking for. You can even limit how many rows you get and sort as well.
Your views can be indexed by arbitrary JSON keys. This means you can create a view that emits rows like [username, docid] => doc. Then you can query this view with http://url/to/view?key=["username","docid"] (the key is passed as JSON).
You could create a view that emits [username, type, date] => doc. Now you can get all documents of a certain type within a certain date range (using the startkey and endkey URL parameters).
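The key and range lookups described above could be sketched like this. The database, design document, and view names in the URL are invented for illustration; key, startkey, endkey, and limit are the view query parameters, and their values are JSON-encoded before being put in the query string.

```python
import json
from urllib.parse import urlencode


def build_view_url(view_url, key=None, startkey=None, endkey=None, limit=None):
    """Build a CouchDB view query URL with JSON-encoded key parameters."""
    params = {}
    for name, value in (("key", key), ("startkey", startkey), ("endkey", endkey)):
        if value is not None:
            # Keys must be valid JSON, e.g. ["alice","doc1"].
            params[name] = json.dumps(value, separators=(",", ":"))
    if limit is not None:
        params["limit"] = str(limit)
    return view_url + "?" + urlencode(params)


# Hypothetical view location.
VIEW = "http://localhost:5984/blog/_design/posts/_view/by_user_type_date"

# Exact lookup on a composite key.
exact_url = build_view_url(VIEW, key=["alice", "doc1"])

# Range lookup: all of alice's posts dated within 2010, at most 10 rows.
range_url = build_view_url(
    VIEW,
    startkey=["alice", "post", "2010-01-01"],
    endkey=["alice", "post", "2010-12-31"],
    limit=10,
)
```

One view therefore serves any user, type, and date range, so there is no need for a separate view per post.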
Your example of the blog is one that CouchDB is particularly well suited for. In fact, I believe it's an example in the upcoming CouchDB book from O'Reilly.
That said, some kinds of queries are not easily handled by CouchDB alone. couchdb-lucene can help here. Don't assume that it's only good for full-text search. I've been using it to run general complex queries against the database to good effect.

Resources