Neo4j quick lookup of string within property of many nodes - search

I'm using neo 2.2.2 and i'm currently using Regex search to find a string in the name property over 600k nodes.
Each node is structured with a minimum of the following two properties.
{
name: 'some string of text',
sid: 12345
}
I've created an index on name and another index on sid. Lookups on sid are very fast. Searches [using regex] are very slow. Currently I'm searching for a string with a * before and after.
What can be done with neo to make searching for string within a property very fast?
If doing something special within neo is not ideal, I could theoretically standup some supplementary algorithm/service separate from Neo4j that searches for a string value within the name property, and then gives me the sid, which then is used to look up the node within neo.
Help me do fast string search with neo4j, please. :)

You can use legacy fulltext indexing to speed up your search. This blog shows you how.

In general Regexes are very expensive. From my point of view, you should find another solution for that.
Could you please tell us more about your use case and why you want to use Regex?
One solution for that you already suggest. Store SID and Name in another format (or database), which has better performance for Regex searching than Neo4j.
Or do some analysis of name property content and base on that create representation of the content as a graph.
e.g.
* Node for a count of letters in name property
* Node for starting letter
* Split name property to multiple properties
* etc...

Related

finding organization and industry/sector from string in dbpedia

I am generating a short list of 10 to 20 strings which I want to lookup on dbpedia to see if they have an organization tag and if so return the industry/sector tag. I have been looking at the SPARQLwrapper queries on their website but am having trouble constructing one that returns organization and sector/industry for my string. Is there a way to do this?
If I use the code below I get a list of industry types I think rather than the industry of the company.
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
SELECT ?industry WHERE
{ <http://dbpedia.org/resource/IBM> a ?industry}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
Instead of looking at queries which are meant to help you understand the querying tool, you should start by looking at the data which is being queried. For instance, just click http://dbpedia.org/resource/IBM, and look at the properties (the left hand column) to see its rdf:type values (of which there are MANY)!
Note that IBM is not described as a ?industry. IBM is described as a <http://dbpedia.org/resource/Public_company> (among other things). On the other hand, IBM is also described as having three values for <http://dbpedia.org/ontology/industry> --
<http://dbpedia.org/resource/Cloud_computing>
<http://dbpedia.org/resource/Information_technology>
<http://dbpedia.org/resource/Cognitive_computing>
I don't know whether these are what you're actually looking for or not, but hopefully what I've done above will start you down the right path to whatever you do want to get out of DBpedia.

Returning accented as well as normal result set via azure search filters

Does anyone know how to ensure we can return normal result as well as accented result set via the azure search filter. For e.g the below filter query in Azure search returns a name called unicorn when i check for record with name unicorn.
var result= searchServiceClient.Documents.SearchAsync<myDto>("*",new SearchParameters
{
SearchFields = new List<string> {"Name"},
Filter = "Name eq 'unicorn'"
});
This is all good but what i want is i want to write a filter such that it returns record named unicorn as well as record named únicorn (please note the first accented character) provided that both record exist.
This can be achieved when searching for such name via the search query using language or Standard ASCII folding search analyzer as mentioned in this link. What i am struggling to find out is how can we implement the same with azure filters?
Please let me know if anyone has got any solutions around this.
Filters are applied on the non-analyzed representation of the data, so I don’t think there’s any way to do any type of linguistic analysis on filters. One way to work around this is to manually create a field which only do lowercasing + asciifloding (no tokenization) and then search lucene queries that look like this:
"normal search query terms" AND customFilterColumn:"filtérValuèWithÄccents"
Basically the document would both need to match the search terms in any field AND also match the filter term in the “customFilterColumn”. This may not be sufficient for your needs, but at least you understand the art of the possible.
Using filters it won't work unless you specify in advance all the possibilities:
for example:
$filter=name eq 'unicorn' or name eq 'únicorn'
You'd better work with a different analyzer that will change accents to it's root form. As another possibility, you can try fuzzy search:
search=unicorn~&highlight=Name

MarkLogic Text Search: Return Results Based on Matches Inside Attributes

I am using Marklogic's search:search() functions to handle searching within my application, and I have a use case where users need to be able to perform a text search that returns matches from an attribute on my document.
For example, using this document:
<document attr="foo attribute value">Some child content</document>
I want users to be able to perform a text search (not using constraints) for "foo", and to return my document based on the match within the attribute #attr. Is there some way to configure the query options to allow this?
Typing in attr:"foo" is not a workable solution, so using attribute range constraints won't help, and users still need to be able to search for other child content not in the attribute node. I'm thinking perhaps there is a way to add a cts:query OR'd into the search via the options, that allows this attribute to be searched?
Open to any and all other solutions.
Thanks!
Edit:
Some additional information, to help clarify:
I need to be able to find matches within the attribute, and elsewhere within the content. Using the example above, searches for "foo", "child content", or "foo child content" should all return my document as a result. This means that any query options that are AND'd with the search (like <additional-query>, which is intended to help constrain your search and not expand it) won't work. What I'm looking for is (likely) an additional query option that will be OR'd with the original search, so as to allow searching by child node content, attribute content, or a mix of the two.
In other words, I'd like MarkLogic to treat any attribute node content exactly the same as element text nodes, as far as searching is concerned.
Thanks!!
You could accomplish this search with a serialized element-attribute-word cts query in the additional-query options for the search API. The element attribute word query will use the universal index to match individual tokens within attributes.
In MarkLogic 9 You may be able to use the following to perform your search:
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
search:search("",
<options xmlns="http://marklogic.com/appservices/search">
<additional-query>
<cts:element-attribute-word-query xmlns:cts="http://marklogic.com/cts">
<cts:element>document</cts:element>
<cts:attribute>attr</cts:attribute>
<cts:text>foo</cts:text>
</cts:element-attribute-word-query>
</additional-query>
</options>
)
MarkLogic has ways to parse query text and map a value to an attribute word or value query.
First, you can use cts:parse():
http://docs.marklogic.com/guide/search-dev/cts_query#id_71878
http://docs.marklogic.com/cts.parse
Second, you can use search:search() with constraints defined in an XML payload:
http://docs.marklogic.com/guide/search-dev/query-options#id_39116
http://docs.marklogic.com/guide/search-dev/appendixa#id_36346
I'd look into using the <default> option of <term>. For details see http://docs.marklogic.com/guide/search-dev/appendixa#id_31590
Alternatively, consider doing query expansion. The idea behind that is that a end user send a search string. You parse it using search:parse of cts:parse (as suggested by Erik), and instead of submitting that query as-is to MarkLogic, you process the cts:query tree, to look for terms you want to adjust, or expand. Typically used to automatically blend in synonyms, related terms, or translations, but could be used to copy individual terms, and automatically add queries on attributes for those.
HTH!

Google Datastore partial string matching

I have read that there is the Search API. But it seems like this API does not exist for Node.JS.
How can I partially match strings for querying entities without knowing the full name of the attribute?
For example I want to select all users that start with a G. How can I accomplish this?
Thank you for your help!
While you cannot do a "true" partial string matching (i.e. contains) with Datastore, you can do a "begins with" query as described in this post:
Basically, create a composite inequality filter like this:
SELECT * FROM USER WHERE USERNAME >= 'G' AND USERNAME < 'G\ufffd'.
Here, \ufffd is the last valid unicode character.
This would return all entities with their usernames starting with 'G'. You can use the same technique for matching multiple characters (e.g. >= 'JA' and < 'JA\ufffd').
Note that the string values/indexes in the Datastore are case sensitive, so you need an indexed property with all characters in either lower case or upper case so you can perform the search accordingly.
You can also mimic a word search like this -
Let's say you have a property named name that stores the following values:
John Doe
John Smith
James Mike Murphy
To do a word search (find entities with word smith or james and murphy) - create another property (e.g. nameIndex) and store the words from name as an array property (note that all words are converted to lower case).
["john","doe"]
["john", "smith"]
["james", "mike" "murphy"]
Now you can do a word search using the nameIndex property -
SELECT * FROM Entity WHERE nameIndex = 'smith'
SELECT * FROM Entity WHERE nameIndex = 'james' AND nameIndex='murphy'
Again, note that the nameIndex need to store the data in a fixed case (lower or upper) and your query parameters should use that case. Also, OR queries not supported unless the client library you are using supports it (typically done by running multiple queries).
This approach won't work if your property has more than 1500 bytes of data (limit for indexed properties)
Again, the proposed solutions are not replacement for full text search engines, rather a couple of tricks you could do with Datastore alone and may satisfy simple requirements.
You can't perform partial match searches on the Datastore entities (let alone without knowing the name of the property/attribute). See Appengine Search API vs Datastore
And the Search API is, indeed, not available in the flexible environment (that includes Node.JS). A potential alternative is indicated the Search section in Migrating Services from the Standard Environment to the Flexible Environment:
The Search service is currently unavailable outside of the standard
environment. You can host any full-text search database such as
ElasticSearch on Google Compute Engine and access it from both
the standard and flexible environments.
UPDATE:
Node.JS is currently available in the standard environment as well, see:
Now, you can deploy your Node.js app to App Engine standard environment
Google App Engine Node.js Standard Environment Documentation

How to implement faceted search suggestion with number of relevant items in Solr?

Hi
I have a very specific need in my company for the system's search engine, and I can't seem to find a solution.
We have a SOLR index of items, all of them have the same fields, with one of the fields being "Type", (And ofcourse, "Title", "Text", and so on).
What I need is: I get an Item Type and a Query String, and I need to return a list of search suggestion with each also saying how meny items of the correct type will that suggested string return.
Something like, if the original string is "goo" I'll get
Goo 10
Google 52
Goolag 2
and so on.
now, How do I do it?
I don't want to re-query SOLR for each different suggestion, but if there is no other way, I just might.
Thanks in advance
you can try edge n-gram tokenization
http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
You can try facets. Take a look at my more detailed description ('Autocompletion').
This was implemented at http://jetwick.com with Solr ... now using ElasticSearch but the Solr sources are still available and the idea is also the identical https://github.com/karussell/Jetwick
The SpellCheckComponent of Solr (that gives the suggestions) have extended results that can give the frequency of every suggestion in the index - http://wiki.apache.org/solr/SpellCheckComponent#Extended_Results.
However, the .Net component SolrNet, does not currently seem to support the extendedResults option: "All of the SpellCheckComponent parameters are supported, except for the extendedResults option" - http://code.google.com/p/solrnet/wiki/SpellChecking.
This is implemented using a facet field query with a Prefix set. You can test this using the xml handler like this:
http://localhost:8983/solr/select/?rows=0&facet=true&facet.field=type&f.type.prefix=goo

Resources