Solr Arabic search not working unless words are adjacent

So far my Solr deployment for my Arabic data set is working great. The stemming and normalization are fantastic.
The problem now is that Arabic search does not work UNLESS the words form a contiguous phrase. For example, the following phrase:
اسْمُهُ دَاوُدُ بْنُ أَبِي (roughly "his name is Dawud ibn Abi")
works just fine and gives me the desired data. However, if I search for:
اسْمُهُ دَاوُدُ أَبِي
then I get 0 results. Notice that the second query is merely the first one with one word (بْنُ, "ibn") removed.
I should be able to get results even if the words don't appear next to each other in the text itself.
Any ideas would be much appreciated. My schema is as follows:
<fieldType name="text_general_arabic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="arabic_stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="arabic_stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

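Try using the dismax (or edismax) query parser instead of Solr's standard boolean query. It supports phrase searching and cross-field searching; right now you are running a phrase query, so the words only match when they are adjacent.
Example: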
q=أحمد+فنان+مجتهد&wt=json&indent=true&defType=edismax&qf=title_ar+title_en+title&mm=70%25&stopwords=true&lowercaseOperators=true
As you can see in the response, I get a match even though it is not an exact match. In the request above, mm ("minimum should match") is set to 70%, which means that 70% of the query terms must match. For more information, see https://wiki.apache.org/solr/ExtendedDisMax
The result of the above query is:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"lowercaseOperators": "true",
"mm": "70%",
"indent": "true",
"uf": "title_ar title_en title",
"q": "أحمد فنان مجتهد",
"qf": "title_ar title_en title",
"_": "1393151025195",
"stopwords": "true",
"wt": "json",
"defType": "edismax"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "1",
"title": [
"ahmad is popular artist"
],
"title_en": [
"ahmad is popular artist"
],
"title_ar": [
"أحمد فنان مشهور"
],
"version": 1460824159992938500
}
]
}
}
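A rough way to see why this document matches: with three query terms and mm=70%, Solr computes 3 × 70% = 2.1 and rounds it down to 2, so any two of the three terms are enough. The JavaScript sketch below models only that counting rule. It assumes the terms have already been analyzed (normalization, stemming and stop-word removal are ignored), and the helper names requiredMatches and satisfiesMm are hypothetical, not Solr APIs.

```javascript
// Simplified model of edismax's "mm" rule for a plain positive percentage.
// Assumption: the arrays below hold already-analyzed tokens.

// For a positive percentage, the computed clause count is rounded down.
function requiredMatches(numClauses, percent) {
  return Math.floor((numClauses * percent) / 100);
}

// True if enough query terms occur among the document's terms.
function satisfiesMm(queryTerms, docTerms, percent) {
  const docSet = new Set(docTerms);
  const matched = queryTerms.filter((t) => docSet.has(t)).length;
  // A query made only of optional clauses still needs at least one match.
  const needed = Math.max(1, requiredMatches(queryTerms.length, percent));
  return matched >= needed;
}

const query = ["أحمد", "فنان", "مجتهد"];
const doc = ["أحمد", "فنان", "مشهور"];

console.log(requiredMatches(query.length, 70)); // 2 (3 * 70 / 100 = 2.1, rounded down)
console.log(satisfiesMm(query, doc, 70));       // true: 2 of 3 terms match
console.log(satisfiesMm(query, doc, 100));      // false: all 3 terms would be required
```

Raising mm toward 100% makes the query stricter; lowering it lets documents with fewer of the words through.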

Related

Sort Solr documents based on a substring from a multivalued field

I'm not sure whether this is achievable.
I have the following documents in the index:
{
"name": "nissan",
"type": "product",
"features":["build_100",
"stability_80"]
}
{
"name": "toyota",
"type": "product",
"features":["stability_100",
"design_30"]
}
{
"name": "Audi",
"type": "product",
"features":["build_70",
"design_100"]
}
When I search the features field for "design", I get docs 2 and 3 back. Is there a way to sort/rank the documents by the number after the "_", so that in this case I get doc 3 first and then doc 2?
If this can be achieved by changing the document structure then that is also fine with me.
Index them as independent fields and make sure docValues are enabled on them (enabled by default in recent versions of Solr).
<dynamicField name="feature_*" type="int" indexed="true" stored="true" docValues="true"/>
You then index each feature as a separate field:
"feature_design": 100,
"feature_build": 70,
and so on. Sorting by the field can then be done the same way you'd sort on any other field (sort=feature_design desc).
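A minimal sketch of the restructuring step on the client side, assuming every feature arrives as a "<name>_<number>" string and the schema uses a feature_* dynamic field as above. toFeatureFields is a hypothetical helper, not part of Solr.

```javascript
// Hypothetical helper: turns {"features": ["design_100", ...]} into
// separate integer fields such as "feature_design": 100.
function toFeatureFields(doc) {
  const out = { ...doc };
  delete out.features;
  for (const entry of doc.features ?? []) {
    // Greedy (.+) keeps any earlier underscores in the feature name.
    const m = /^(.+)_(\d+)$/.exec(entry);
    if (!m) continue; // skip values that are not "<name>_<number>"
    out[`feature_${m[1]}`] = Number(m[2]);
  }
  return out;
}

const audi = { name: "Audi", type: "product", features: ["build_70", "design_100"] };
console.log(JSON.stringify(toFeatureFields(audi)));
// {"name":"Audi","type":"product","feature_build":70,"feature_design":100}
```

The converted documents can then be indexed as usual and sorted with a request parameter such as sort=feature_design desc.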

Solr MultiPhraseQuery Not Returning Correct Results

I am having trouble creating a Solr search for substrings. For example, when a user searches for "Alfa Romeo Land Car", I want to only match complete brands (only "Alfa Romeo", not "Land Rover"). The way I am trying to do this is by creating shingles from my query and then trying to do an exact match against my "car brands" Solr core.
So if a user searches for "A B C", I would like to get the shingles [A, AB, ABC, B, BC, C].
But with the Solr configuration below, when I search for "A B C" (using EDisMax or the standard query parser) Solr returns nothing, while if I search for "ABC" I get the matching result "ABC".
Here is my schema.xml file:
<field name="id" type="tint" indexed="true" stored="true" required="true"/>
<field name="name" type="text_exact" indexed="true" stored="true" required="true"/>
<field name="seoAlias" type="string" indexed="true" stored="true" required="true"/>
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0" generateWordParts="0" catenateAll="1" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0" generateWordParts="1" catenateAll="0" />
    <filter class="solr.ShingleFilterFactory" outputUnigrams="true" outputUnigramsIfNoShingles="true" tokenSeparator="" maxShingleSize="5"/>
  </analyzer>
</fieldType>
Here are the documents in my Solr core:
"response": {
"numFound": 7,
"start": 0,
"docs": [
{
"id": 1,
"name": "A B C D",
"seoAlias": "abce",
"_version_": 1524585748644233200
},
{
"id": 2,
"name": "A B C",
"seoAlias": "abce",
"_version_": 1524586301229105200
},
{
"id": 3,
"name": "B C D",
"seoAlias": "abce",
"_version_": 1524586311147585500
},
{
"id": 4,
"name": "A B",
"seoAlias": "abce",
"_version_": 1524586322261442600
},
{
"id": 5,
"name": "B C",
"seoAlias": "abce",
"_version_": 1524586329997836300
},
{
"id": 6,
"name": "C D",
"seoAlias": "abce",
"_version_": 1524586338173583400
},
{
"id": 7,
"name": "B",
"seoAlias": "abce",
"_version_": 1524652609127841800
}
]
},
In the Solr admin webpage, if I go to "Schema Browser", then select the field in question, and press "Load Term Info" I can see the following indexed terms:
6 / 6 top terms (each with frequency 1):
ABC
ABCD
BC
BCD
CD
AB
When I search for "A B C", I want the shingles [ABC AB BC A B C],
but the debug query shows:
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"debug": {
"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "MatchAllDocsQuery(*:*)",
"parsedquery_toString": "*:*",
"explain": {},
"QParser": "LuceneQParser",
"filter_queries": [
"name:\"A B C\""
],
"parsed_filter_queries": [
"MultiPhraseQuery(name:\"(A AB ABC) (B BC) C\")"
],
I think the problem may be related to MultiPhraseQuery. It appears to create the correct shingles, but Solr does not seem to search with these strings. Does anybody know what I'm missing?
Thanks a lot in advance
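The parsed filter query can be read position by position: position 0 accepts A, AB or ABC, position 1 accepts B or BC, and position 2 accepts C. Assuming each document is indexed as the single catenated token shown in the Schema Browser (e.g. "ABC" at one position), a three-position phrase has nothing to line up against. The JavaScript sketch below is a simplified model of that matching (no slop, one token per indexed position); shinglesByPosition and multiPhraseMatches are hypothetical helpers, not Lucene code.

```javascript
// Groups unigrams and shingles by start position (tokenSeparator=""),
// which mirrors the structure "(A AB ABC) (B BC) C" of the parsed query.
function shinglesByPosition(tokens, maxShingleSize) {
  return tokens.map((_, start) => {
    const group = [];
    for (let size = 1; size <= maxShingleSize && start + size <= tokens.length; size++) {
      group.push(tokens.slice(start, start + size).join(""));
    }
    return group;
  });
}

// Matches when, at some start offset, document position i holds ANY of
// the terms listed for query position i (no slop allowed).
function multiPhraseMatches(docTokens, positions) {
  for (let start = 0; start + positions.length <= docTokens.length; start++) {
    if (positions.every((terms, i) => terms.includes(docTokens[start + i]))) {
      return true;
    }
  }
  return false;
}

const queryPositions = shinglesByPosition(["A", "B", "C"], 5);
console.log(JSON.stringify(queryPositions)); // [["A","AB","ABC"],["B","BC"],["C"]]

// Document id 2 ("A B C") is indexed as the single token "ABC" (catenateAll=1).
console.log(multiPhraseMatches(["ABC"], queryPositions));         // false
// The same query would match a field indexed token by token.
console.log(multiPhraseMatches(["A", "B", "C"], queryPositions)); // true
```

In this model the shingles are generated correctly; the mismatch is between a multi-position phrase on the query side and single-position values on the index side.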

Two special characters separated by a space are not found by Solr (e.g. ! !)

I have this document in the index:
http://localhost:8983/solr/koolcha/get?id=547deb3649dbae548b0f0100
{
"doc": {
"status": "xxxxxx",
"updated": "2014-12-05T09:47:27Z",
"ns": "foo3.bags",
"created": "2014-12-02T16:39:18Z",
"_ts": 6.2177735253447e+18,
"label": "_DSC0571.tif",
"project": "xxxxx",
"assignee": "xxxxx",
"folderid": "! !",
"_version_": 1.5180111153642e+18,
"_id": "547deb3649dbae548b0f0100",
"bagid": "xxxxx"
}
}
When I try to search for it by folderid:
http://localhost:8983/solr/koolcha/select?q=folderid:\!%20\!
Solr does not find anything:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">folderid:\! \!</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
If I use some other value it works, even with special characters: e.g. '!!' works.
Only combinations of special characters separated by spaces return nothing.
Is this a bug in Solr, or am I doing something wrong?
Besides escaping, I think you have to quote the query value; otherwise the query parser splits on the space and applies only the first \! to folderid:
http://localhost:8983/solr/koolcha/select?q=folderid:"\! \!"
And use the Lucene Query parser (which you probably already do).
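When the request is built from code rather than typed into a browser, the quotes, colon and space also need URL encoding. A small JavaScript sketch, assuming the host and core name from the question; phraseQuery and selectUrl are hypothetical helpers.

```javascript
// Wraps the value in double quotes so the query parser keeps "! !" together.
// Inside a quoted phrase, only the quote and backslash need escaping.
function phraseQuery(field, value) {
  const escaped = value.replace(/[\\"]/g, (c) => "\\" + c);
  return `${field}:"${escaped}"`;
}

// Adds q to the select URL; URLSearchParams performs the percent-encoding.
function selectUrl(base, q) {
  const url = new URL(base);
  url.searchParams.set("q", q);
  return url.toString();
}

const url = selectUrl("http://localhost:8983/solr/koolcha/select", phraseQuery("folderid", "! !"));
console.log(url);
// http://localhost:8983/solr/koolcha/select?q=folderid%3A%22%21+%21%22
```

URLSearchParams encodes the space as +, which Solr decodes back to a space when it reads the query string.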

Conditionally omit elements in a Groovy closure

I am using a Groovy library called ws-lite for web service testing. It takes a closure, generates XML from it, and sends that XML to a web service endpoint.
See below for a simple example of what this closure looks like:
def bookXml = {
    books {
        book(available: "20", id: "1") {
            title("Don Xijote")
            author(id: "1", "Manuel De Cervantes")
        }
        book(available: "14", id: "2") {
            title("Catcher in the Rye")
            author(id: "2", "JD Salinger")
        }
        book(available: "13", id: "3") {
            title("Alice in Wonderland")
            author(id: "3", "Lewis Carroll")
        }
    }
}
This generates the following XML in the request:
<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book available="20" id="1">
        <title>Don Xijote</title>
        <author id="1">Manuel De Cervantes</author>
    </book>
    <book available="14" id="2">
        <title>Catcher in the Rye</title>
        <author id="2">JD Salinger</author>
    </book>
    <book available="13" id="3">
        <title>Alice in Wonderland</title>
        <author id="3">Lewis Carroll</author>
    </book>
</books>
In order to make my clients more flexible, I normally pass the data structure from my test to the client as a map:
def bookMap = [
    books: [[
        id       : "1",
        available: "20",
        title    : "Don Xijote",
        author   : [
            id  : "1",
            name: "Manuel De Cervantes"
        ]
    ], [
        id       : "2",
        available: "14",
        title    : "Catcher in the Rye",
        author   : [
            id  : "2",
            name: "JD Salinger"
        ]
    ], [
        id       : "3",
        available: "13",
        title    : "Alice in Wonderland",
        author   : [
            id  : "3",
            name: "Lewis Carroll"
        ]
    ]]
]
This is what the client looks like now:
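def bookXml = {
    books {
        // books is already the list of book maps (it has no "book" key).
        // Use a named parameter: inside the nested builder closure,
        // the implicit `it` no longer refers to the current book map.
        bookMap.books.each { b ->
            book(available: b.available, id: b.id) {
                title(b.title)
                author(id: b.author.id, b.author.name)
            }
        }
    }
}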
In the bookXml closure, is there a way to omit a tag if the corresponding value in my data structure is null?
For example, if the title of the first book is null in the map, the closure should not create a title tag for that book.
I know how to do this for Groovy collections, using collectEntries for maps and collect for lists, but I don't know much about transforming closures.
Can you please share some insight with me?
Thanks.
I do not have much knowledge of builders, but it seems that the question is about how to ignore keys with null values in a map.
This can be achieved by using the each() method with a two-arg closure. The two arguments passed to the closure in this case will be each entry's key and value.
To demonstrate:
def book = [
    id       : "1",
    available: "20",
    title    : null
]
book.each { key, value ->
    // Groovy truth: null (and empty) values are skipped
    if (value) {
        println "$key->$value"
    }
}
I highly doubt you can do what you want in a simple way. If you are not into ASTs, then a closure is not a data structure which you can manipulate easily.
IMO, you should make your input map consistent before passing it to bookXml. Other than that, stick to @diveshpremdeep's answer.
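The "make the input consistent first" step is language-neutral. As a JavaScript sketch (not Groovy, and not part of ws-lite), a hypothetical pruneNulls helper can recursively drop null or undefined values before the data reaches the builder. Unlike Groovy truth in the `if (value)` example, it keeps empty strings.

```javascript
// Hypothetical helper: recursively removes null/undefined values from
// nested objects and arrays, leaving all other values untouched.
function pruneNulls(value) {
  if (Array.isArray(value)) {
    return value.filter((v) => v != null).map(pruneNulls);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      if (v != null) out[key] = pruneNulls(v);
    }
    return out;
  }
  return value;
}

const books = [
  { id: "1", available: "20", title: null, author: { id: "1", name: "Manuel De Cervantes" } },
  { id: "2", available: "14", title: "Catcher in the Rye", author: { id: "2", name: null } },
];
console.log(JSON.stringify(pruneNulls(books)));
// [{"id":"1","available":"20","author":{"id":"1","name":"Manuel De Cervantes"}},{"id":"2","available":"14","title":"Catcher in the Rye","author":{"id":"2"}}]
```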

Trying to implement Fusion Charts in xPages; chart not loading

I am attempting to load FusionCharts into an XPage, following this tutorial: http://docs.fusioncharts.com/tutorial-getting-started-your-first-charts-building-your-first-chart.html
The chart does not load and keeps showing its loading message. I also get a few error messages in the JavaScript console (shown after the code below).
Here is the code I am using for the page (the URL has been changed from the actual one):
<xp:this.resources>
    <xp:script src="https://URL/Field/fplan.nsf/fusioncharts/js/fusioncharts.js"
        clientSide="true">
    </xp:script>
    <xp:script
        src="https://URL/Field/fplan.nsf/fusioncharts/js/themes/fusioncharts.theme.fint.js"
        clientSide="true">
    </xp:script>
</xp:this.resources>
<xp:scriptBlock type="text/javascript">
    <xp:this.value><![CDATA[
FusionCharts.ready(function(){
    var revenueChart = new FusionCharts({
        "type": "column2d",
        "renderAt": "chartContainer",
        "width": "500",
        "height": "300",
        "dataFormat": "json",
        "dataSource": {
            "chart": {
                "caption": "Monthly revenue for last year",
                "subCaption": "Harry's SuperMart",
                "xAxisName": "Month",
                "yAxisName": "Revenues (In USD)",
                "theme": "fint"
            },
            "data": [
                { "label": "Jan", "value": "420000" },
                { "label": "Feb", "value": "810000" },
                { "label": "Mar", "value": "720000" },
                { "label": "Apr", "value": "550000" },
                { "label": "May", "value": "910000" },
                { "label": "Jun", "value": "510000" },
                { "label": "Jul", "value": "680000" },
                { "label": "Aug", "value": "620000" },
                { "label": "Sep", "value": "610000" },
                { "label": "Oct", "value": "490000" },
                { "label": "Nov", "value": "900000" },
                { "label": "Dec", "value": "730000" }
            ]
        }
    });
    revenueChart.render();
})]]></xp:this.value>
</xp:scriptBlock>
<div id="chartContainer">FusionCharts XT will load here!</div>
Uncaught TypeError: Cannot set property 'desc' of undefined  fusioncharts.js:436
    (anonymous function)  fusioncharts.js:436
    v.core  fusioncharts.js:20
    v.registrars.module  fusioncharts.js:19
    v.extend.register  fusioncharts.js:22
    (anonymous function)  fusioncharts.js:236
Uncaught TypeError: Cannot read property 'fn' of undefined  fusioncharts.js:437
    (anonymous function)  fusioncharts.js:437
    v.core  fusioncharts.js:20
    v.registrars.module  fusioncharts.js:19
    v.extend.register  fusioncharts.js:22
    (anonymous function)  fusioncharts.js:437
Uncaught TypeError: undefined is not a function  fusioncharts.js:129
    (anonymous function)  fusioncharts.js:129
    c  fusioncharts.js:32
    H  fusioncharts.js:32
    b.triggerEvent  fusioncharts.js:36
    d.raiseEvent  fusioncharts.js:36
    d.extend.render  fusioncharts.js:70
    (anonymous function)  chart.xsp:88
    (anonymous function)
If I drop an HTML file into the WebContent folder of the NSF, the chart loads, so at least I know the FusionCharts files are sound.
Any assistance in how to implement this would be appreciated.
I guess I would start by moving the chart initialization code into a client side script library and adding that as a resource to see if that makes a difference.
Try loading the script in the following way:
<xp:this.resources>
    <xp:headTag tagName="script">
        <xp:this.attributes>
            <xp:parameter name="type" value="text/javascript" />
            <xp:parameter name="src"
                value="https://URL/Field/fplan.nsf/fusioncharts/js/fusioncharts.js" />
        </xp:this.attributes>
    </xp:headTag>
    <xp:headTag tagName="script">
        <xp:this.attributes>
            <xp:parameter name="type" value="text/javascript" />
            <xp:parameter name="src"
                value="https://URL/Field/fplan.nsf/fusioncharts/js/themes/fusioncharts.theme.fint.js" />
        </xp:this.attributes>
    </xp:headTag>
</xp:this.resources>
