Way to dump the relations from Freebase? - nlp

I have read through the Google API documentation for Freebase, but I am still confused.
Is there a simple way to dump the relations from Freebase?
I want to dump all entity-name pairs with a specific relation (e.g. marry_with, ...), and I also want the Chinese entity names.
Should I
write MQL to query all entities satisfying the condition? (But the MQL service is going to be retired soon.)
or dump all of Freebase and parse it?
or is there another API capable of doing this?
or is another KB (YAGO, DBpedia, Wikidata) easier to use for this?
Which way is easiest to get working?
Please point me in the right direction. Thanks.

Freebase was retired and Wikidata is the recommended alternative.
You can use the Wikidata Query API to get entities with a specific property.
For instance, the query http://wdq.wmflabs.org/api?q=CLAIM[26] retrieves the IDs of all items having the property spouse (P26).
You can combine this with the Wikidata API, for instance to get labels and aliases in English for the first three items returned by the previous query:
http://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q23|Q24|Q42&languages=en&props=labels|aliases
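Since the question also asks for Chinese names, here is a minimal sketch of building the same wbgetentities request with more than one language. The `zh` language code and the `format=json` parameter are my additions, not part of the URL above, and the helper name `build_labels_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in the answer above
WIKIDATA_API = "http://www.wikidata.org/w/api.php"


def build_labels_url(item_ids, languages=("en", "zh")):
    """Build a wbgetentities URL requesting labels and aliases for the given items."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "languages": "|".join(languages),  # "zh" is an assumption for Chinese labels
        "props": "labels|aliases",
        "format": "json",  # assumption: ask for a JSON response
    }
    # safe="|" keeps the pipe separators readable instead of percent-encoding them
    return WIKIDATA_API + "?" + urlencode(params, safe="|")


url = build_labels_url(["Q23", "Q24", "Q42"])
print(url)
```

The item IDs would come from the first query; the three used here are just the ones from the example URL.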

Related

CloudSearch, How to find documents where a key is not defined?

Rather simple question here. Using CloudSearch, how do I find an object that does NOT have a certain key/property defined?
E.g. I have been storing Car objects all along without indexing their price. Now I have begun indexing Car objects with their msrp... how do I find the Car objects stored without any indexed price?
(and price:null)
(and price:undefined)
and other similar 'falsy' statements and their stringified permutations all do not work.
I am using the AWS SDK in Node.js.
TIA!
Niko
The option that will work without any reindexing is a range search like
(NOT (range field=price [0,}))
which matches cars with a price that is not between 0 and infinity (i.e. ones with no price). See this answer for a discussion of other options.
Side note: I get the impression that you may be using CloudSearch to store your data. If so, I would consider using a datastore (which is designed to store data) rather than CloudSearch (which is a search engine). For one, it'll make this sort of query much easier.
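The range query from this answer can be generated for any numeric field. The sketch below only builds the query string; the helper name `missing_field_query` is hypothetical, and it assumes the field holds non-negative numbers, as price does here.

```python
def missing_field_query(field, lower_bound=0):
    """Return a structured query matching documents with no value in a numeric field.

    Assumes every real value of the field is >= lower_bound.
    """
    # [lower_bound,} is an open-ended range: lower bound inclusive, no upper bound.
    # The doubled braces in the format string produce a single literal "}".
    return "(NOT (range field={} [{},}}))".format(field, lower_bound)


query = missing_field_query("price")
print(query)
```

A query like this has to be sent with the structured query parser selected rather than the default simple one.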

Designing indices to have paging with filters and random page jump Elasticsearch

I would like an expert opinion on my use case and on how I plan to use indices, to see whether there is a problem with my approach or a better way to achieve it. Since I am new to ES, your opinions would really help me. We store data in CouchDB, in a separate database for each type of data.
I have a database that serves as a link between two databases. For example, database A has 'floor' data, database B links floors to items, and there is a separate database for each item type that a floor can have (e.g., card reader, camera, etc.).
We need to search for items that are linked to a floor and get them with filtering and paging. (Right now my links database holds only IDs and types, but I also plan to store the name for each type in the links DB so that I can filter while paging.)
To get filtering and paging, I plan to have one index per DB. Given a floor, I would query the index of the links DB for its linked items of a given type that match a 'search filter', which gives me one page of items. I would then use the IDs from that result to fetch the full objects from the index of that item type's DB.
Please let me know if there is a better approach, e.g. whether I can create one index for my floor, links and item databases, and whether that is possible through the Logstash CouchDB plugin.
Many thanks.
Your setup does not sound wrong, but there are alternatives. You can use nested objects or parent-child relationships for an easier setup. Both approaches have their advantages. It all depends on the type of queries that you would like to run, and on the number of items that are related.
I would start by reading the following section of the Definitive Guide, which should give you a good start.
https://www.elastic.co/guide/en/elasticsearch/guide/current/modeling-your-data.html?q=model
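For the setup described in the question (one index over the links database), a page of filtered links could be requested with a search body along these lines. This is only a sketch: the field names `floor_id`, `item_type` and `name` are assumptions about how the link documents might be indexed, and the function name is hypothetical.

```python
def links_page_query(floor_id, item_type, name_filter=None, page=1, page_size=10):
    """Build a search body that returns one page of links for a floor and item type."""
    filters = [
        {"term": {"floor_id": floor_id}},    # hypothetical field name
        {"term": {"item_type": item_type}},  # hypothetical field name
    ]
    if name_filter:
        # Optional text filter on the name stored with each link (hypothetical field)
        filters.append({"match": {"name": name_filter}})
    return {
        "from": (page - 1) * page_size,  # offset of the first hit on the requested page
        "size": page_size,
        "query": {"bool": {"filter": filters}},
    }


body = links_page_query("floor-1", "camera", name_filter="lobby", page=3, page_size=20)
print(body)
```

Jumping to an arbitrary page only changes the `from` offset. The IDs in the returned hits would then be used to fetch the full item objects, as the question describes.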

PouchDB structure

I am new to the NoSQL concept, so when I started to learn PouchDB, I found this conversion chart. My confusion is: how does PouchDB handle it if, let's say, I have multiple tables? Does it mean that I need to create multiple databases? From my understanding, a PouchDB database can store a lot of documents, but does a document mean a row in SQL, or have I misunderstood?
The answer to this question seems to be surprisingly under-documented. While @llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section When not to use map/reduce, Nolan explains that for simpler applications, the key is to abuse _ids and leverage the power of allDocs().
In other words, if you had two separate types (say artists and albums), you could prefix the _id of each doc with its type to obtain an easily searchable data set. For example, _ids of the form 'artist_name' and 'album_title' would allow you to easily retrieve artists in name order.
Laying out the data this way results in better performance, because no extra indexes are required, and in less code. Clearly, however, if your data requirements are more complex, then views are the way to go.
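To illustrate why prefixed _ids make this easy, here is a small Python emulation of a range scan over sorted ids. It is not PouchDB code. The `all_docs` function only mimics the startkey/endkey idea on an in-memory dict, the sample documents are invented, and plain Python string ordering stands in for the database's own collation.

```python
from bisect import bisect_left, bisect_right

# High Unicode character commonly appended to a prefix to form the upper bound
HIGH = "\ufff0"

# Invented sample data: one "database" holding two document types
docs = {
    "artist_bowie": {"name": "David Bowie"},
    "album_low": {"title": "Low"},
    "artist_abba": {"name": "ABBA"},
    "album_arrival": {"title": "Arrival"},
}


def all_docs(store, startkey, endkey):
    """Return the docs whose id lies in [startkey, endkey], in id order."""
    ids = sorted(store)
    lo = bisect_left(ids, startkey)
    hi = bisect_right(ids, endkey)
    return [store[doc_id] for doc_id in ids[lo:hi]]


artists = all_docs(docs, "artist_", "artist_" + HIGH)
print([doc["name"] for doc in artists])
```

Because the ids are already kept in sorted order, selecting one type is just a range over the primary index, with no secondary index to build.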
... does it mean that i need to create multiple databases?
No.
... a document mean a row in sql or am i misunderstood?
That's right. An SQL table defines column headers (name and type); these correspond to the JSON property names of the doc.
So, all docs (rows) with the same properties (a so-called "schema") are the equivalent of your SQL table. You can have as many different schemata in one database as you want (visit json-schema.org for some inspiration).
How to request them separately? Create CouchDB views! You can get all or some "rows" of your tabular data (docs with the same schema) with one request, as you know it from SQL.
To make such views easy to write, CouchDB docs very commonly carry a type property. The table name you know from SQL can become the type, e.g. doc.type: "animal".
Your view names might then be animalByName or animalByWeight, depending on your needs.
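As a rough sketch of what such views could look like, the snippet below assembles a design document as JSON, with the JavaScript map functions stored as strings. The design document name `_design/animals` and the `name` and `weight` properties are assumptions for illustration; only the view names and doc.type come from the text above.

```python
import json


def type_view(doc_type, field):
    """Return a view whose JavaScript map function indexes docs of one type by one field."""
    map_js = (
        "function (doc) {\n"
        "  if (doc.type === %s) {\n"
        "    emit(doc[%s], null);\n"
        "  }\n"
        "}"
    ) % (json.dumps(doc_type), json.dumps(field))  # json.dumps yields valid JS string literals
    return {"map": map_js}


design_doc = {
    "_id": "_design/animals",  # hypothetical design document name
    "views": {
        "animalByName": type_view("animal", "name"),
        "animalByWeight": type_view("animal", "weight"),
    },
}

print(json.dumps(design_doc, indent=2))
```

The views emit null as the value, so the matching docs themselves would be requested together with the rows when querying.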
Sometimes a multiple-database plan is a good option, such as a database per user or even a database per user feature. Take a look at this conversation on the CouchDB mailing list.

Siren "more like this" query

I am using the newest SIREn distribution for Solr to index my data and search it. (http://siren.solutions/siren/downloads/)
Is there a simple way to search for similar documents in my indexed data? Something similar to the MoreLikeThis query of Solr (https://cwiki.apache.org/confluence/display/solr/MoreLikeThis).
My goal is to find documents that have a JSON structure similar to the one I am interested in.
best,
Bernd
If I remember correctly, SIREn stores the RDF representation of each resource within a dedicated field of the Solr document. I don't think the default MLT component that comes with Solr works for your scenario.
I mean, enabling that component will produce some kind of result, but I don't believe that it will meet your JSON "similarity" requirement.
On top of that, I suggest you post your request on the SIREn mailing list [1]; I'm sure the dev team will point you in the right direction.
[1] https://groups.google.com/forum/m/#!forum/siren-user

CouchDB - Queries with params

I'm new to CouchDB and I know my mindset is probably still too much in the relational DB sphere, but here goes:
It appears that querying on Couch is all done via Views. I read that temporary views are very inefficient and should be avoided in production.
So my question really is: how would one do effective querying with parameters (as the views do not accept them)? For example, if I were to use Couch to power a blog site, would I have to create a new view for each post, equivalent to 'select post from posts where id=1'?
I understand that I can use Lucene alongside the querying to perform a full-text search on the results, but this is only really useful for textual content, not numbers.
I'm happy creating a boatload of static views, as they can be created very simply on the fly. My worry is that this is not how Couch was supposed to be used and that I'm missing something. Feel free to enlighten me.
Cheers, Chris.
Views do accept URL parameters, key being the one you are looking for. You can also limit how many rows you get and sort them.
Your views can be indexed by arbitrary JSON keys. This means you can create a view that emits rows like [username, docid] => doc. Then you can query this view with http://url/to/view?key=[username, docid], where the key is JSON-encoded.
You could create a view that emits [username, type, date] => doc. Now you can get all of a user's documents of a certain type between two dates (using the startkey and endkey URL parameters).
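Here is a minimal sketch of building such a view URL, with each parameter JSON-encoded and then URL-escaped. The base URL, the view name and the sample key values are placeholders invented for the example; only the parameter names key, startkey, endkey and limit are actual CouchDB view options.

```python
import json
from urllib.parse import urlencode


def view_url(base, **params):
    """Append view query parameters to base, JSON-encoding every value."""
    # Keys such as key, startkey and endkey must be valid JSON, so strings get quoted
    encoded = {name: json.dumps(value) for name, value in params.items()}
    return base + "?" + urlencode(encoded)


# Placeholder view location, not a real server
BASE = "http://localhost:5984/blog/_design/posts/_view/by_user_type_date"

# All posts by one user within a date range, at most 10 rows
url = view_url(
    BASE,
    startkey=["chris", "post", "2009-01-01"],
    endkey=["chris", "post", "2009-12-31"],
    limit=10,
)
print(url)
```

An exact lookup, like the 'where id=1' case in the question, works the same way with a single `key=...` parameter instead of a range.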
Your example of the blog is one that CouchDB is particularly well suited for. In fact, I believe it's an example in the upcoming CouchDB book from O'Reilly.
That said, some kinds of queries are not easily handled by CouchDB alone. couchdb-lucene can help here. Don't assume that it's only good for full-text search. I've been using it to run general complex queries against the database to good effect.
