I'm learning to use the CrossRef REST API, and want to search for all DOIs affiliated with a given university. How do I create a multi-word search query that uses AND, not AND/OR?
There are over 100 million DOIs on CrossRef, but only 8217730 have author affiliation metadata.
https://api.crossref.org/works?sample=10&filter=has-affiliation:true
Now if I want to search for affiliations with "University of Southern Mississippi," I could use mississippi+university+southern, but this returns results matching the three words with AND/OR.
I want only affiliations with all three words.
https://api.crossref.org/works?sample=10&query.affiliation=mississippi+university+southern
This returns every record whose affiliation contains the word university or southern or mississippi.
CrossRef field query instructions are here
https://github.com/CrossRef/rest-api-doc#field-queries
and a github comment about the topic is here
https://github.com/CrossRef/rest-api-doc/commit/a4d047e0d1556e80aaab0f4b5aae420da2a99ea2 and here https://github.com/CrossRef/rest-api-doc/issues/170
The problem is the sample parameter in your URI. The sample parameter gives you random results.
Being able to select random results is useful for both testing and sampling. You can use the sample parameter to retrieve random results.
API description for Sample
So if you want to query for all results of '', use the URI without the sample part, like:
https://api.crossref.org/works?query.affiliation=mississippi+university+southern
The CrossRef API doesn’t support Boolean operators. The query.affiliation parameter returns multiple results with scores. In the resulting JSON, the result with the highest score has the key: value pair ‘chosen: true’.
I believe that they have recently transitioned from Solr to Elasticsearch - https://www.crossref.org/blog/behind-the-scenes-improvements-to-the-rest-api/ - which supports Boolean search - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html - but I’m not sure how to use it on CrossRef API endpoints, nor whether you even can.
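Since the API itself may not offer an AND operator, one workaround is to filter the returned records on the client. Below is a minimal Python sketch, assuming each item in the response carries an author list whose entries have an affiliation list of {"name": ...} objects. The helper names (affiliation_has_all_words, filter_items) and the sample records are hypothetical, and the HTTP request itself is left out.

```python
import re


def affiliation_has_all_words(affiliation, words):
    """Return True if every word occurs as a whole token in the affiliation."""
    tokens = set(re.findall(r"\w+", affiliation.lower()))
    return all(word.lower() in tokens for word in words)


def filter_items(items, words):
    """Keep items with at least one author affiliation containing all words."""
    kept = []
    for item in items:
        names = [
            aff.get("name", "")
            for author in item.get("author", [])
            for aff in author.get("affiliation", [])
        ]
        if any(affiliation_has_all_words(name, words) for name in names):
            kept.append(item)
    return kept


# Hypothetical records shaped like the items of a works response (assumption).
sample_items = [
    {"DOI": "10.0000/example.1",
     "author": [{"affiliation": [{"name": "University of Southern Mississippi"}]}]},
    {"DOI": "10.0000/example.2",
     "author": [{"affiliation": [{"name": "University of Southern California"}]}]},
]

matches = filter_items(sample_items, ["mississippi", "university", "southern"])
print([item["DOI"] for item in matches])  # ['10.0000/example.1']
```

The API still does the coarse OR search; the filter only discards records that lack one of the words, so it has to be applied to every page of results you fetch.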
We have an index set up in Azure Cognitive Search that has two string fields (hash1 & hash2) containing separate hashes. We would like to query the index for documents where the two hashes within a document aren't equal.
I tried applying the filter: $filter=hash1 ne hash2, expecting the query to return all documents with mismatched hashes. Instead, I was greeted with the following error message:
"Invalid expression: Comparison must be between a field, range variable or function call and a literal value.\r\nParameter name: $filter"
From what I can gather, there seems to be some kind of limitation preventing comparisons between fields. Would it be possible to perform this type of query in Azure Cognitive Search using a different technique?
I would use content enrichment in this case. Even if comparing two hashes with a query was supported, it would be inefficient compared to pre-calculating the value using a content enrichment technique.
1. Introduce a new boolean property called something like HasEqualHashes
2. Populate that property with an appropriate boolean value
3. Use a $filter to filter your content as you wish
search=whatever&$filter=HasEqualHashes
Note that two different scenarios determine how you can enrich your content.
CONTENT SUBMITTED VIA SDK
When you use the SDK to submit content, you can enrich your items any way you want using regular code. Populating your HasEqualHashes property is a trivial one-liner in C#.
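The answer mentions C#, but the same pre-calculation can be sketched in Python before the documents are uploaded. This is only an illustration: add_has_equal_hashes is a hypothetical helper, the sample documents are made up, and the actual upload call is omitted.

```python
def add_has_equal_hashes(doc):
    """Return a copy of the document with a precomputed HasEqualHashes flag."""
    enriched = dict(doc)
    enriched["HasEqualHashes"] = doc.get("hash1") == doc.get("hash2")
    return enriched


# Made-up documents; only the hash1/hash2 field names come from the question.
docs = [
    {"id": "1", "hash1": "abc", "hash2": "abc"},
    {"id": "2", "hash1": "abc", "hash2": "xyz"},
]

enriched_docs = [add_has_equal_hashes(d) for d in docs]
mismatched_ids = [d["id"] for d in enriched_docs if not d["HasEqualHashes"]]
print(mismatched_ids)  # ['2']
```

Once the flag is indexed as a filterable boolean field, a filter such as $filter=HasEqualHashes eq false should return the mismatched documents the question asks for.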
CONTENT SUBMITTED USING BUILT-IN INDEXERS
If you use one of the built-in indexers, you have to learn and understand the concept of skillsets.
https://learn.microsoft.com/en-us/azure/search/cognitive-search-working-with-skillsets
Here are two search examples.
The first search, for "yael0079",
returns an object whose email field is yael0079#gmail.com as the top-scoring result.
The second search, for "yael0079#gmail.com",
returns the same object somewhere far below.
Now, I know the '#' character is treated as a space, but still, I would expect the same object to get the highest score.
In the second case, since the # sign is considered punctuation, your query becomes yael0079 OR gmail.com. The term gmail.com also matches in other fields of the returned documents, which adds to their overall relevance score. To learn more about query processing and scoring in Azure Search, please read: How full text search works in Azure Search.
I have read that there is the Search API. But it seems like this API does not exist for Node.JS.
How can I partially match strings for querying entities without knowing the full name of the attribute?
For example I want to select all users that start with a G. How can I accomplish this?
Thank you for your help!
While you cannot do a "true" partial string matching (i.e. contains) with Datastore, you can do a "begins with" query as described in this post:
Basically, create a composite inequality filter like this:
SELECT * FROM USER WHERE USERNAME >= 'G' AND USERNAME < 'G\ufffd'.
Here, \ufffd is the last valid character in Unicode's Basic Multilingual Plane, so it works as an upper bound for ordinary text.
This would return all entities with their usernames starting with 'G'. You can use the same technique for matching multiple characters (e.g. >= 'JA' and < 'JA\ufffd').
Note that the string values/indexes in the Datastore are case sensitive, so you need an indexed property with all characters in either lower case or upper case so you can perform the search accordingly.
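To see how the range bounds behave, here is a small Python sketch that emulates the composite inequality filter on an in-memory list. It does not call Datastore at all: prefix_bounds and starts_with are hypothetical helpers, and plain Python string comparison stands in for the Datastore ordering (an assumption that holds for ordinary text).

```python
def prefix_bounds(prefix):
    """Return the (lower, upper) bounds for a 'begins with' range filter."""
    return prefix, prefix + "\ufffd"


def starts_with(values, prefix):
    """Emulate WHERE value >= lower AND value < upper on an in-memory list."""
    lower, upper = prefix_bounds(prefix)
    return [v for v in values if lower <= v < upper]


usernames = ["George", "Gina", "Harold", "gary", "Frank"]
print(starts_with(usernames, "G"))   # ['George', 'Gina']
print(starts_with(usernames, "Gi"))  # ['Gina']
```

Note that "gary" is not returned for the prefix "G", which illustrates the case sensitivity mentioned above.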
You can also mimic a word search like this -
Let's say you have a property named name that stores the following values:
John Doe
John Smith
James Mike Murphy
To do a word search (find entities with word smith or james and murphy) - create another property (e.g. nameIndex) and store the words from name as an array property (note that all words are converted to lower case).
["john","doe"]
["john", "smith"]
["james", "mike", "murphy"]
Now you can do a word search using the nameIndex property -
SELECT * FROM Entity WHERE nameIndex = 'smith'
SELECT * FROM Entity WHERE nameIndex = 'james' AND nameIndex='murphy'
Again, note that the nameIndex property needs to store the data in a fixed case (lower or upper) and your query parameters should use that case. Also, OR queries are not supported unless the client library you are using supports them (typically done by running multiple queries).
This approach won't work if your property has more than 1500 bytes of data (the limit for indexed properties).
Again, the proposed solutions are not a replacement for full text search engines, but rather a couple of tricks you can do with Datastore alone that may satisfy simple requirements.
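The word-index trick can be sketched in Python as follows. This only emulates the idea on in-memory dictionaries rather than real Datastore entities; build_name_index and matches_all are hypothetical helper names.

```python
def build_name_index(name):
    """Split a name into lower-cased words for the nameIndex array property."""
    return name.lower().split()


def matches_all(name_index, words):
    """Emulate nameIndex = w1 AND nameIndex = w2 ... on an array property."""
    return all(word.lower() in name_index for word in words)


entities = [
    {"name": n, "nameIndex": build_name_index(n)}
    for n in ["John Doe", "John Smith", "James Mike Murphy"]
]

print([e["name"] for e in entities if matches_all(e["nameIndex"], ["smith"])])
# ['John Smith']
print([e["name"] for e in entities if matches_all(e["nameIndex"], ["james", "murphy"])])
# ['James Mike Murphy']
```

In a real application the nameIndex list would be computed when the entity is saved, so that the equality filters shown above can use it.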
You can't perform partial match searches on the Datastore entities (let alone without knowing the name of the property/attribute). See Appengine Search API vs Datastore
And the Search API is, indeed, not available in the flexible environment (that includes Node.JS). A potential alternative is indicated in the Search section of Migrating Services from the Standard Environment to the Flexible Environment:
The Search service is currently unavailable outside of the standard
environment. You can host any full-text search database such as
ElasticSearch on Google Compute Engine and access it from both
the standard and flexible environments.
UPDATE:
Node.JS is currently available in the standard environment as well, see:
Now, you can deploy your Node.js app to App Engine standard environment
Google App Engine Node.js Standard Environment Documentation
Folks, I was wondering what the best way is to model documents and/or map functions so that I can run "Not Equals" queries.
For example, my documents are:
1. { name : 'George' }
2. { name : 'Carlin' }
I want to run a query that returns every document where name does not equal 'John'.
Note: I don't have all possible names beforehand, so the parameter in the query can be any arbitrary text, like 'John' in my example.
In short: there is no easy solution.
You have four options:
sending a multi range query
filter the view response with a server-side list function
using a CouchDB plugin
use the mango query language
sending a multi range query
You can request the view with two ranges defined by startkey and endkey. You have to choose the ranges so that the key John is not included.
Unfortunately, you have to find the commit that exists somewhere and compile your CouchDB with it. It's not included in the official source.
filter the view response with a server-side list function
It's not recommended, but you can use a list function and skip the row with the key John in your response, much as you would when filtering a JavaScript array.
using a CouchDB plugin
Create an additional index with e.g. couchdb-lucene. The lucene server has such query capabilities.
use the "mango" query language
It's included in the CouchDB 2.0 developer preview. It's not ready for production, but it will definitely be included in the stable release.
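For the Mango option, a "not equals" condition is expressed with the $ne operator inside a selector. The Python sketch below only builds the JSON body; not_equals_query is a hypothetical helper, and sending the body (in CouchDB 2.x, as a POST to the database's _find endpoint) is left out.

```python
import json


def not_equals_query(field, value):
    """Build a Mango request body selecting documents where field != value."""
    return {"selector": {field: {"$ne": value}}}


body = not_equals_query("name", "John")
print(json.dumps(body))  # {"selector": {"name": {"$ne": "John"}}}
```

Because the value is passed in at call time, this fits the requirement that the excluded name is not known in advance.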
I know for a fact that there are at least 5-6 POIs within the 50 mile radius in this area. However, I don't get any results for this query.
https://api.foursquare.com/v2/venues/suggestCompletion?ll=-44.67,167.92&query=milford&radius=50000
I see results when I try the search API (it doesn't use query as mentioned in the documentation):
https://api.foursquare.com/v2/venues/search?ll=-44.67,167.92&intent=checkin&query=milford&radius=50000
No results with intent match on the search query.
I really like the suggestcompletion API (compact). Any suggestion/input would be great.
Thanks!
The suggestcompletion endpoint is used to suggest venues whose names start with the provided query. The endpoint is used to provide autocomplete results for search input fields. It is not used as a general purpose venue search - you should use the /venues/search endpoint for this purpose.
It looks like you have left out the API version parameter. You need to specify it by adding this to your request:
&v=20150826
suggestCompletion is included in the newer API version released on 20150826, which differs from the default version that does not include the suggestCompletion feature.
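As a rough illustration, the request URL from the question with the version parameter added can be assembled like this in Python. suggest_completion_url is a hypothetical helper; it only builds the URL, and the credentials a real request typically needs are left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/suggestCompletion"


def suggest_completion_url(ll, query, version, **extra):
    """Build a suggestCompletion URL that always carries the v parameter."""
    params = {"ll": ll, "query": query, "v": version}
    params.update(extra)
    # safe="," keeps the comma in the ll value readable instead of %2C.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = suggest_completion_url("-44.67,167.92", "milford", "20150826", radius=50000)
print(url)
# https://api.foursquare.com/v2/venues/suggestCompletion?ll=-44.67,167.92&query=milford&v=20150826&radius=50000
```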