Query String search with spaces not working on a specific field

Searching "follow back" in Twitter User's description field that I have indexed already with following mapping.
Note: Only Highlight some of mapping.
1.
'analysis' => array(
    'analyzer' => array(
        'myanalyzer' => array(
            'type' => 'standard',
            'stopwords' => '_none_',
        ),
    ),
)
2.
$mapping->setParam('index_analyzer', 'myanalyzer');
$mapping->setParam('search_analyzer', 'myanalyzer');
3.
'description' => array('type' => 'string', "index" => "not_analyzed"),
4.
// search something
$queryString = new \Elastica\Query\QueryString();
$queryString->setDefaultOperator('AND');
// $queryString->setFields(array('user.description'));
$queryString->setQuery('follow back');
When I search with setFields commented out, I get lots of results, like:
IF YOU FOLLOW ILL FOLLOW BACK! :) 100% follow back! :)
Follow me i follow back :) instagram:juliemar25 i follow back
But after uncommenting setFields with the default operator set to AND, it shows no results.
And after uncommenting setFields with the default operator set to OR, it shows only results that have "follow" in the description, nothing else.
Q1: Why does searching with whitespace not work with setFields, while it works against _all?
While using a Match query:
$matchQuery = new \Elastica\Query\Match();
$matchfield = "user.description";
$queryToMatch = "follow back";
$matchQuery->setFieldQuery($matchfield, $queryToMatch);
It also shows only the two results whose description is exactly "follow back". But after changing the match field to _all, it shows lots of results that contain "follow back" in the description field.
Q2: Why is this happening? How can I search for space-separated words?

This is because you have set the "description" field to not_analyzed, as per the mapping above.
As a result, the description field's content is indexed as-is, as a single token, and a match occurs only when the search phrase exactly equals the whole 'description' value, which in this case is "follow back".
Removing "index" => "not_analyzed" should fix it.

Related

Is it possible to find documents using incomplete words with ArangoSearch?

For example, suppose my document contains the attribute "description" with the value "Quick brown fox". Could ArangoSearch take the input "Quic" and find the document whose description is "Quick brown fox"?
As far as I know, ArangoSearch can only find matches if the token/word is complete. Is this true?
Here's some query code to show what I'm talking about. If the bind variable, #searchInputValue, takes the value "Quic", it won't find the document, but if it takes the value "Quick", it does find the document.
FOR document IN v_test
  SEARCH ANALYZER(
    (
      document.description IN TOKENS('#searchInputValue', 'text_en')
    ),
    'text_en'
  )
  RETURN document
You can use the FULLTEXT function of AQL:
https://docs.arangodb.com/3.0/AQL/Functions/Fulltext.html
However, you can't write the prefix syntax directly in AQL when using bind parameters. You have to format the searchInputValue so that you pass:
Quic,+prefix:Quic
So you can write your query as:
FOR res IN FULLTEXT(v_test, "description", #searchInputValue)
  RETURN res
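Alternatively, a minimal sketch that builds the prefix query inside AQL itself, assuming the raw input arrives as the bind parameter @searchInputValue and that v_test has a fulltext index on description (FULLTEXT requires one):

FOR res IN FULLTEXT(v_test, "description",
    // builds e.g. "Quic,+prefix:Quic" from the raw input
    CONCAT(@searchInputValue, ",+prefix:", @searchInputValue))
  RETURN res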

How can I search for special characters in Solr

I'm using Solr 6.6.2.
I need to search for special characters and highlight them in Solr, but it does not work.
My data:
[
  {
    "id": "test1",
    "title": "test1# title C# ",
    "dynamic_s": 5
  },
  {
    "id": "test2",
    "title": "test2 title C#",
    "dynamic_s": 10
  },
  {
    "id": "test3",
    "title": "test3 title",
    "dynamic_s": 0
  }
]
When I search for "C#", the response is just "test1# title C# ", and only the word "C" is highlighted; the "#" is neither searched for nor highlighted.
How can I make search and highlighting work for special characters?
The StandardTokenizer splits tokens on special characters, meaning that # will split the content into separate tokens; the first token will be C, and that's what's being highlighted. You'll probably get the exact same result if you just search for C.
The tokenization process will make your tokens end up being test2, title, C.
A field type with a WhitespaceTokenizer, which only splits on whitespace, will probably be a better choice for this exact use case, but it's impossible to say whether that'll be a good match for your regular search behavior (i.e. if you actually want 'C' to match 'C-99' etc., splitting on those characters can be necessary). But you can use a dedicated field for highlighting, and that field's analysis chain will be used to determine what to highlight. You can ask for both the original field and the more specific field to be highlighted, and then use the best result in your frontend application.
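A minimal sketch of such a field type in schema.xml; the type and field names (text_ws_lower, title_exact) are illustrative, and the copyField leaves the original title field intact for regular searching:

<!-- splits only on whitespace, so a token like "C#" survives intact -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_exact" type="text_ws_lower" indexed="true" stored="true"/>
<copyField source="title" dest="title_exact"/>

Requesting highlighting with hl.fl=title,title_exact then returns snippets for both fields, so the frontend can pick the better one.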

Logstash filter syntax

I've recently begun learning Logstash, and the syntax is confusing me.
E.g., for match I have seen various forms:
match => [ "%{[date]}" , "YYYY-MM-dd HH:mm:ss" ]
match => { "message" => "%{COMBINEDAPACHELOG}" }
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
What does each of these keys ("%{[date]}", "message", "timestamp") mean? And where can I find proper documentation that explains all the keywords and syntax?
Please help and provide links if possible.
The grok{} filter has a match parameter that takes a field and a pattern. It applies the pattern to the field, trying to extract new fields from it. Your second example is from grok, so it will try to apply the COMBINEDAPACHELOG pattern against the text in the "message" field.
The doc for grok{} is here, and there are detailed blogs, too.
The other two examples look like they're from the date{} filter, which does a similar thing: it takes a field containing a string that represents a date, applies the given pattern to that field, and (by default) replaces the value in the @timestamp field.
The doc for date{} is here and examples here.
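To see how the two fit together, a minimal sketch of a pipeline filter block, assuming an Apache access log arrives in the default "message" field:

filter {
  grok {
    # extract clientip, verb, request, response, and a "timestamp"
    # string from the raw line
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # parse that extracted string and write it to @timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}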

Elasticsearch: How to get the length of a string field (before analysis)?

My index has a string field containing a variable-length random id. Obviously it shouldn't be analyzed.
But I didn't know much about Elasticsearch, especially when I created the index.
Today I tried hard to filter documents based on the length of the id, and finally came up with these Groovy scripts:
doc['myfield'].values.size()
or
doc['myfield'].value.size()
Both return mysterious numbers; I think that's because the field got analyzed.
If that's really the case, is there any way to get the original length, or to fix the problem without rebuilding the whole index?
Use _source instead of doc. That accesses the source of the document, meaning the text as it was originally sent for indexing:
_source['myfield'].value.size()
If possible, try to re-index the documents to:
- use doc[field] on a not-analyzed version of that field
- even better, find out the size of the field before you index the document and consider adding its size as a regular field in the document itself (see the sketch after this list)
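A minimal sketch of that last option; the index, type, and field names (myindex, mytype, myfield_length) are illustrative, and the filtered query syntax matches the Groovy-era (pre-2.x) Elasticsearch this question implies:

PUT /myindex/mytype/1
{
  "myfield": "f81d4fae7dec",
  "myfield_length": 12
}

POST /myindex/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": { "myfield_length": { "gte": 10, "lte": 16 } }
      }
    }
  }
}

This filters on a plain numeric field, which is exact and much cheaper than running a script per document.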
Elasticsearch stores a string in tokenized form in the data structure (the field data cache) that scripts have access to.
So assuming that your field is not not_analyzed, doc['field'].values will look like this:
"In america" => [ "in", "america" ]
Hence what you get from doc['field'].values is an array, not a string.
The story doesn't change even if you have a single token, or have the field as not_analyzed:
"america" => [ "america" ]
Now, to see the size of the first token, you can use the following request:
{
  "script_fields": {
    "test1": {
      "script": "doc['field'].values[0].size()"
    }
  }
}
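For context, a full search body that this fragment would slot into (the index name is illustrative):

POST /myindex/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "test1": {
      "script": "doc['field'].values[0].size()"
    }
  }
}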

Foursquare returns "Invalid mention" when adding a check-in

I'm trying to add a check-in with a mention, but it fails returning:
'meta' =>
array (
    'code' => 400,
    'errorType' => 'param_error',
    'errorDetail' => 'Invalid mention: (10,27,2147775)',
)
These are the parameters I used:
array (
    'venueId' => '4d9c6d228efaa14376464cb7',
    'shout' => ' — with Rihards Ščeredins',
    'll' => '56.9262,24.02096',
    'mentions' => '10,27,2147775',
)
I used the same syntax in the shout message as Foursquare does: " — with " followed by the mentioned user's name.
Why does it fail to add the check-in? Does the mentioned friend's name in the shout have to match the user's Foursquare name? Because currently it does not.
Update:
I thought it could be related to multi-byte character lengths, but after playing around, both of these fail with the same error.
Using mb_strlen and mb_strpos:
'shout' => ' — with Rihards Ščeredins' with indices '8,25,2147775'
Using strlen and strpos:
'shout' => ' — with Rihards Ščeredins' with indices '10,29,2147775'
Any idea what else I could try?
The first two numbers of the comma-delimited string are the indices in the string where the mention goes. Your string has 25 characters, yet you index all the way out to 27. Your mentions param should probably be something like "8,25,2147775".
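Following that reasoning, a minimal PHP sketch of computing the offsets in characters rather than bytes; the user id 2147775 is taken from the question, and mb_strpos/mb_strlen count the multi-byte characters '—' and 'Š' as one character each:

$name  = 'Rihards Ščeredins';
$shout = ' — with ' . $name;
// character offsets of where the mention starts and ends in the shout
$start = mb_strpos($shout, $name, 0, 'UTF-8');          // 8
$end   = $start + mb_strlen($name, 'UTF-8');            // 25
$mentions = sprintf('%d,%d,%d', $start, $end, 2147775); // "8,25,2147775"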
