I'm using Scrapy to scrape this kind of product page. I want to extract the <li> items between <b>Indication</b> and <b>Contre-indications</b>, and then those between <b>Contre-indications</b> and the next <b></b>, whose keyword is not predictable.
Here is the source code of the requested page.
<article class="col-md-10 col-md-push-1">
<p><b>Caractéristiques des croquettes pour chat Royal Canin Veterinary Diet - Urinary S/O LP 34 :</b>
</p><ul>
<li>struvite.</li>
<li>la vessie.</li>
<li>d'oxalate de calcium.
</li>
<li>maintien de la muqueuse vésicale </li></ul><p></p>
<p><b>Remarques :</b>
</p><ul>
<li> Urinary S/O Feline</li>
<li>chez le chat âgé, rénal avant la prescription de l'Urinary S/O Feline</li></ul><p></p>
<p><b>Indications :</b>
</p><ul>
<li>dissolution des calculs urinaires de struvite</li>
<li>gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment</li></ul><p></p>
<p><b>Contre-indications :</b>
</p><ul>
<li>insuffisance rénale chronique, acidose métabolique</li>
<li>traitement avec des médicaments acidifiant l'urine</li>
<li>lactation, gestation, croissance</li></ul><p></p>
<p><b>Durée du traitement :</b> 5 à 12 semaines sont nécessaires pour obtenir la dissolution des calculs de struvites.<br>
P</p>
</article>
First approach: use a regex and parse the page as free text. I didn't manage to get anything useful with this pattern: (<b>[Ii]ndication[s]{0,1}.*?</b>)([\n\r]*.*)(<b>Contre-[Ii]ndication[s]{0,1}.*?</b>). It worked in an online tester, but .re() in Python didn't find any match. Okay, let's move on.
Second approach: I tried to extract the data with Scrapy XPath selectors:
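A likely reason the pattern finds nothing in Python is that `.` does not match newline characters unless the re.DOTALL flag (or the inline `(?s)` prefix) is set, and in this page the <li> items sit on several lines after the <b> tag. A minimal sketch with the standard re module on a trimmed copy of the markup; the names html, section_pattern and items are illustrative. Since Selector.re() in Scrapy takes a pattern string, starting the pattern with `(?s)` would be the way to get the same behaviour there.

```python
import re

# Trimmed copy of the Indications / Contre-indications part of the page.
html = """<p><b>Indications :</b>
</p><ul>
<li>dissolution des calculs urinaires de struvite</li>
<li>gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment</li></ul><p></p>
<p><b>Contre-indications :</b>
</p><ul>
<li>insuffisance rénale chronique, acidose métabolique</li></ul><p></p>"""

# re.DOTALL lets "." (and therefore ".*?") run across the newlines between tags.
section_pattern = re.compile(
    r"<b>[Ii]ndications?[^<]*</b>(.*?)<b>Contre-[Ii]ndications?[^<]*</b>",
    re.DOTALL,
)

match = section_pattern.search(html)
# Pull the <li> texts out of the captured block and trim stray whitespace.
items = (
    [li.strip() for li in re.findall(r"<li>(.*?)</li>", match.group(1), re.DOTALL)]
    if match
    else []
)
print(items)
```

Regexes over HTML stay brittle (attributes, entities or a different tag order break them), which is why an XPath-based approach is usually the more robust route.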
l.add_xpath('contre_indication','//*[@id="description-panel"]/div/article/b[starts-with(text(),"Contre-indications")]/following-sibling::ul/li/text()')
l.add_xpath('contre_indication','//*[@id="description-panel"]/div/article/p/b[starts-with(text(),"Contre-indications")]/following-sibling::ul/li/text()')
l.add_xpath('indication','//*[@id="description-panel"]/div/article/b[starts-with(text(),"Indication")]/following-sibling::ul/li/text()')
l.add_xpath('indication','//*[@id="description-panel"]/div/article/p/b[starts-with(text(),"Indication")]/following-sibling::ul/li/text()')
Sometimes the keyword's <b> is a direct child of the article (/b) and sometimes it is wrapped in a paragraph (/p/b), which is why there are two XPaths for each field.
At best, this gives me all the text inside the <li> elements, with no distinction between Indications and Contre-indications.
The expected output would be:
Indication : ["dissolution des calculs urinaires de struvite","gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment"]
Contre-indication : ["insuffisance rénale chronique, acidose métabolique"..."lactation, gestation, croissance"]
I'd really like to know a working approach for this kind of problem.
Kind regards
You can accomplish this with XPath selectors:
'//p[contains(b/text(),"Contre-indications")]/following-sibling::ul[1]/li/text()'
Explaining the XPath:
//p - select all paragraph nodes
[contains(b/text(),"Contre-indications")] - keep only the paragraphs whose child <b> text contains "Contre-indications"
/following-sibling::ul[1] - select the first <ul> that follows that paragraph as a sibling; the [1] is what stops it from also picking up the lists that belong to later headings
/li/text() - select the text of that list's <li> children
If you run it in the Scrapy shell:
$ scrapy shell
> body = ...
> from parsel import Selector
> sel = Selector(text=body)
> sel.xpath('//p[contains(b/text(),"Indication")]/following-sibling::ul[1]/li/text()').extract()
['dissolution des calculs urinaires de struvite', 'gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment']
> sel.xpath('//p[contains(b/text(),"Contre-indications")]/following-sibling::ul[1]/li/text()').extract()
['insuffisance rénale chronique, acidose métabolique', "traitement avec des médicaments acidifiant l'urine", 'lactation, gestation, croissance']
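If the heading keywords are not known in advance, another option is to walk the markup once and file every <li> under the most recent <b> heading, whether that <b> sits directly in the article or inside a <p>. A minimal sketch using only Python's standard html.parser; SectionParser and extract_sections are hypothetical names, and the sketch assumes every <b> in the article is a section heading and that no <b> appears inside a list item. In a Scrapy callback you could pass it the article's HTML, e.g. response.xpath('//*[@id="description-panel"]/div/article').get().

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Hypothetical helper: files each <li> text under the most recent <b> heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading text -> list of <li> texts
        self._heading = None  # heading currently in effect
        self._in_b = False
        self._in_li = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "li"):
            self._in_b = tag == "b"
            self._in_li = tag == "li"
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "b" and self._in_b:
            self._in_b = False
            # "Indications :" -> "Indications"; each <b> starts a new section.
            self._heading = self._text().rstrip(" :")
            self.sections.setdefault(self._heading, [])
        elif tag == "li" and self._in_li:
            self._in_li = False
            if self._heading is not None:
                self.sections[self._heading].append(self._text())

    def handle_data(self, data):
        if self._in_b or self._in_li:
            self._buf.append(data)

    def _text(self):
        # Collapse newlines and repeated spaces, then trim both ends.
        return " ".join("".join(self._buf).split())


def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


sample = """<article>
<p><b>Indications :</b>
</p><ul>
<li>dissolution des calculs urinaires de struvite</li>
<li>gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment</li></ul><p></p>
<p><b>Contre-indications :</b>
</p><ul>
<li>insuffisance rénale chronique, acidose métabolique</li>
<li>traitement avec des médicaments acidifiant l'urine</li>
<li>lactation, gestation, croissance</li></ul><p></p>
<p><b>Durée du traitement :</b> 5 à 12 semaines sont nécessaires.</p>
</article>"""

sections = extract_sections(sample)
print(sections["Indications"])
print(sections["Contre-indications"])
```

Because it keys on the <b> elements themselves rather than on a fixed path, the /b versus /p/b difference from the question does not matter here; headings without a list (such as "Durée du traitement") simply end up with an empty list.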
In our hybris schema, we are using LowerCaseFilterFactory. Also, the name/description fields are of type "text". Hence, Solr treats 'ConsTRUCTION' and 'construction' the same way.
However, if I search for toysChildren (two different keywords combined), I get many results, whereas toyschildren returns no results.
So I'd like to decode the following query to understand what is causing this.
solrQuery-toysChildren:
q=_query_:"\{\!multiMaxScore\+tie%3D0.0\}\(\(code_text\:toysChildren\^90.0\)\
+OR\+\(keywords_text_en_mv\:toysChildren\^100.0\)\
+OR\+\(name_text_en\:toysChildren\)\)\
+OR\+\(\(keywords_text_en_mv\:toysChildren\~\^10.0\)\)\
+OR\+\(\(keywords_text_en_mv\:toysChildren\*\^50.0\)\
+OR\+\(name_text_en\:toysChildren\*\^45.0\)\)\
+OR\+\(\(keywords_text_en_mv\:\"toysChildren\"\^100.0\)\
+OR\+\(name_text_en\:\"toysChildren\"\~0.0\^90.0\)\)"
&sort=score+desc&start=0&rows=100&facet.field=gender_string_mv
&facet.field=price_gbp_string
&facet.field=categoryPath_string_mv
&facet.field=allCategories_string_mv
&facet.field=excludeFromGiftFinder_boolean
&facet.field=productVisible_boolean
&facet.field=category_string_mv
&facet.field=brand_string_mv
&facet.field={!ex%3Dfk8}productType_string
&facet.field=age_string_mv
&facet=true
&fq=productVisible_boolean:true
&fq={!tag%3Dfk8}productType_string:(BUNDLE+OR+REGULAR+OR+ESD)
&fq=(catalogId:"coreProductCatalog"+AND+catalogVersion:"Online")
&facet.sort=count
&facet.mincount=1
&facet.limit=50
&spellcheck=true
&spellcheck.q=toysChildren&spellcheck.dictionary=en
&spellcheck.collate=true
Note: the above is the query built in DefaultFacetSearchStrategy, as an object of type SolrQuery.
Query response is:
{responseHeader={status=0,QTime=18,params={facet.field=[gender_string_mv, price_gbp_string, categoryPath_string_mv, allCategories_string_mv, excludeFromGiftFinder_boolean, productVisible_boolean, category_string_mv, brand_string_mv, {!ex=fk8}productType_string, age_string_mv],spellcheck.dictionary=en,start=0,sort=score desc,fq=[productVisible_boolean:true, {!tag=fk8}productType_string:(BUNDLE OR REGULAR OR ESD), (catalogId:"coreProductCatalog" AND catalogVersion:"Online")],rows=100,version=2,q=_query_:"\{\!multiMaxScore\ tie=0.0\}\(\(code_text\:toysChildren\^90.0\)\ OR\ \(keywords_text_en_mv\:toysChildren\^100.0\)\ OR\ \(name_text_en\:toysChildren\)\)\ OR\ \(\(keywords_text_en_mv\:toysChildren\~\^10.0\)\)\ OR\ \(\(keywords_text_en_mv\:toysChildren\*\^50.0\)\ OR\ \(name_text_en\:toysChildren\*\^45.0\)\)\ OR\ \(\(keywords_text_en_mv\:\"toysChildren\"\^100.0\)\ OR\ \(name_text_en\:\"toysChildren\"\~0.0\^90.0\)\)",facet.limit=50,spellcheck.q=toysChildren,spellcheck=true,facet.mincount=1,facet=true,wt=javabin,facet.sort=count,spellcheck.collate=true}},response={numFound=3,start=0,docs=[SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100310, pk=8796107702273, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM06010425/SM060104/SM0601, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM06010425, SM0601, SM060104], category_string_mv=[SM06010425, SM0601, SM060104], rating_double=5.0, totalReviews_int=1, productType_string=REGULAR, excludeFromGiftFinder_boolean=true, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100310.jpg","220":"https://image.smythstoys.com/picture/tablet/100310.jpg","180":"https://image.smythstoys.com/picture/mobile/100310.jpg"}, gender_string_mv=[Female], autosuggest_en=[Sylvanian Families, Toys, Fashion & Dolls, Sylvanian Children's Bedroom Furniture], spellcheck_en=[Sylvanian Families, Toys, Fashion & Dolls, With 2 beech-style beds which can be stacked on top of each to 
make, Sylvanian Children's Bedroom Furniture], categoryName_text_en_mv=[Sylvanian Families, Toys, Fashion & Dolls], productVisible_boolean=true, url_en_string=/toys/fashion-and-dolls/sylvanian-families/sylvanian-children-s-bedroom-furniture/p/100310, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100310.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100310.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100310.jpg}, priceValue_gbp_double=11.99, categoryNamePath_string_mv=[Toys, Toys > Fashion & Dolls, Toys > Fashion & Dolls > Sylvanian Families], categoryMetaTitle_string=SM06010425_Sylvanian Families: Awesome deals only at Smyths Toys UK, categoryMetaDescription_string=Sylvanian Families! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100310, description_text_en=With 2 beech-style beds which can be stacked on top of each to make, name_text_en=Sylvanian Children's Bedroom Furniture, name_sortable_en_sortabletext=Sylvanian Children's Bedroom Furniture, brand_string_mv=[Sylvanian], age_string_mv=[6 - 8 Years, 3 - 5 Years], categoryPath_string_mv=[/SM0601/SM060104, /SM0601, /SM0601/SM060104/SM06010425], customCategoryPath_string_mv=[/curl/toys/c/SM0601, /curl/toys/c/SM0601/curl/toys/fashion-and-dolls/c/SM060104, /curl/toys/c/SM0601/curl/toys/fashion-and-dolls/c/SM060104/curl/toys/fashion-and-dolls/sylvanian-families/c/SM06010425], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100310.jpg, _version_=1577946952144781312}, SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100471, pk=8796108128257, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM06010326/SM060103/SM0601, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM06010326, SM0601, SM060103], category_string_mv=[SM06010326, SM0601, SM060103], rating_double=4.3, 
totalReviews_int=4, productType_string=REGULAR, excludeFromGiftFinder_boolean=false, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100471.jpg","220":"https://image.smythstoys.com/picture/tablet/100471.jpg","180":"https://image.smythstoys.com/picture/mobile/100471.jpg"}, gender_string_mv=[Male], autosuggest_en=[Vtech Infant, Toys, Pre-School & Electronic Learning, Toy Story Mr. Potato Head], spellcheck_en=[Vtech Infant, Toys, Pre-School & Electronic Learning, Includes lots of accessories and a special compartment for, Toy Story Mr. Potato Head], categoryName_text_en_mv=[Vtech Infant, Toys, Pre-School & Electronic Learning], productVisible_boolean=true, url_en_string=/toys/pre-school-and-electronic-learning/vtech-infant/toy-story-mr-potato-head/p/100471, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100471.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100471.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100471.jpg}, priceValue_gbp_double=9.99, categoryNamePath_string_mv=[Toys, Toys > Pre-School & Electronic Learning, Toys > Pre-School & Electronic Learning > Vtech Infant], categoryMetaTitle_string=SM06010326_Vtech Infant: Awesome deals only at Smyths Toys UK, categoryMetaDescription_string=Vtech Infant! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100471, description_text_en=Includes lots of accessories and a special compartment for, name_text_en=Toy Story Mr. Potato Head, name_sortable_en_sortabletext=Toy Story Mr. 
Potato Head, brand_string_mv=[Toy Story], age_string_mv=[9 - 11 Years, 6 - 8 Years, 3 - 5 Years], categoryPath_string_mv=[/SM0601/SM060103/SM06010326, /SM0601/SM060103, /SM0601], customCategoryPath_string_mv=[/curl/toys/c/SM0601, /curl/toys/c/SM0601/curl/toys/pre-school-and-electronic-learning/c/SM060103, /curl/toys/c/SM0601/curl/toys/pre-school-and-electronic-learning/c/SM060103/curl/toys/pre-school-and-electronic-learning/vtech-infant/c/SM06010326], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100471.jpg, _version_=1577946952157364224}, SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100838, pk=8796111962113, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM060307/SM0603, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM060307, SM0603], category_string_mv=[SM060307, SM0603], rating_double=0.0, productType_string=REGULAR, excludeFromGiftFinder_boolean=false, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100838.jpg","220":"https://image.smythstoys.com/picture/tablet/100838.jpg","180":"https://image.smythstoys.com/picture/mobile/100838.jpg"}, gender_string_mv=[Female], autosuggest_en=[Sports Equipment, Outdoor, 8oz Childrens Boxing Gloves], spellcheck_en=[Sports Equipment, Outdoor, 8oz childrens boxing gloves., 8oz Childrens Boxing Gloves], categoryName_text_en_mv=[Sports Equipment, Outdoor], productVisible_boolean=true, url_en_string=/outdoor/sports-equipment/8oz-childrens-boxing-gloves/p/100838, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100838.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100838.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100838.jpg}, priceValue_gbp_double=4.99, categoryNamePath_string_mv=[Outdoor, Outdoor > Sports Equipment], categoryMetaTitle_string=SM060307_Sports Equipment: Awesome deals only at 
Smyths Toys UK, categoryMetaDescription_string=Sports Equipment! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100838, description_text_en=8oz childrens boxing gloves., name_text_en=8oz Childrens Boxing Gloves, name_sortable_en_sortabletext=8oz Childrens Boxing Gloves, age_string_mv=[9 - 11 Years, 6 - 8 Years], categoryPath_string_mv=[/SM0603/SM060307, /SM0603], customCategoryPath_string_mv=[/curl/outdoor/c/SM0603, /curl/outdoor/c/SM0603/curl/outdoor/sports-equipment/c/SM060307], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100838.jpg, _version_=1577946952254881793}]},facet_counts={facet_queries={},facet_fields={gender_string_mv={Female=2,Male=1},price_gbp_string={£0 - £19.99=3},categoryPath_string_mv={/SM0601=2,/SM0601/SM060103=1,/SM0601/SM060103/SM06010326=1,/SM0601/SM060104=1,/SM0601/SM060104/SM06010425=1,/SM0603=1,/SM0603/SM060307=1},allCategories_string_mv={SM0601=2,SM060103=1,SM06010326=1,SM060104=1,SM06010425=1,SM0603=1,SM060307=1},excludeFromGiftFinder_boolean={false=2,true=1},productVisible_boolean={true=3},category_string_mv={SM0601=2,SM060103=1,SM06010326=1,SM060104=1,SM06010425=1,SM0603=1,SM060307=1},brand_string_mv={Sylvanian=1,Toy Story=1},productType_string={REGULAR=3},age_string_mv={6 - 8 Years=3,3 - 5 Years=2,9 - 11 Years=2}},facet_ranges={},facet_intervals={},facet_heatmaps={}},spellcheck={suggestions={},collations={}}}
schema.xml Snippet 1:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true" />
<filter class="solr.ManagedStopFilterFactory" managed="en" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.ManagedSynonymFilterFactory" managed="en" />
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
</fieldType>
schema.xml Snippet 2:
<field name="text" type="textgen" indexed="true" stored="false" />
schema.xml Snippet 3:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
schema.xml Snippet 4:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
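Most likely you get different results because of solr.WordDelimiterFilterFactory with the setting splitOnCaseChange="1", which breaks toysChildren into toys and Children. With preserveOriginal="1" and catenateWords="1" the joined form is kept as well, so after LowerCaseFilterFactory the query terms are toyschildren, toys and children, and toys and children can match documents that contain those words. In the case of toyschildren there is no case change, so you only get the single token toyschildren, and that is exactly what makes the difference in your query results.
You have several options, depending on the behaviour you expect from your system. You could turn off this setting (splitOnCaseChange="0") or remove solr.WordDelimiterFilterFactory from the field type completely. Since these field types use a single <analyzer> for both indexing and querying, you will need to reindex after changing it.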
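To see why the two queries behave so differently, here is a rough Python approximation of the relevant part of the text_en chain for a single camelCase word. It is a simplified model for illustration only, not Solr's implementation: it ignores stop words, synonyms, ASCII folding, stemming and token positions, and the token order is arbitrary. The set indexed_terms is hypothetical. To see the real tokens, the Analysis screen in the Solr Admin UI shows what each filter emits for a given field type.

```python
import re


def approximate_query_tokens(term):
    """Rough approximation (NOT Solr code) of WordDelimiterFilterFactory with
    splitOnCaseChange=1, generateWordParts=1, catenateWords=1,
    preserveOriginal=1, followed by LowerCaseFilterFactory."""
    # splitOnCaseChange: break at every lower-case -> upper-case transition.
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", term)
    tokens = [term]                    # preserveOriginal=1 keeps the input token
    if len(parts) > 1:
        tokens += parts                # generateWordParts=1
        tokens.append("".join(parts))  # catenateWords=1
    # LowerCaseFilterFactory, then drop duplicates while keeping order.
    return list(dict.fromkeys(token.lower() for token in tokens))


# Hypothetical set of indexed terms, only to show which query tokens can match.
indexed_terms = {"toys", "children", "bedroom", "furniture"}

for query in ("toysChildren", "toyschildren"):
    tokens = approximate_query_tokens(query)
    print(query, tokens, sorted(indexed_terms.intersection(tokens)))
```

Under this model toysChildren yields toyschildren, toys and children, two of which overlap the hypothetical index, while toyschildren yields only toyschildren and overlaps nothing.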
I'm searching for the term compte-courant. I have a thesaurus file with an entry for this term whose value is "comptes-courants". When I run the search, it returns documents that contain "comptes" or "compte", and for those results no <search:highlight> is present. How can I get only the documents that contain compte-courant or comptes-courants? The search options and the result are below.
Search options
<options xmlns="http://marklogic.com/appservices/search">
<debug>false</debug>
<search-option>score-logtfidf</search-option>
<search-option>unfiltered</search-option>
<term>
<term-option>case-insensitive</term-option>
<term-option>diacritic-insensitive</term-option>
<term-option>punctuation-insensitive</term-option>
</term>
<quality-weight>5.0</quality-weight>
<return-constraints>false</return-constraints>
<return-facets>true</return-facets>
<return-qtext>true</return-qtext>
<return-query>false</return-query>
<return-results>true</return-results>
<return-metrics>true</return-metrics>
<return-similar>false</return-similar>
<transform-results apply="src-snippet" ns="/src-snippet"
at="/src-snippet.xqy">
<per-match-tokens>30</per-match-tokens>
<max-matches>4</max-matches>
<max-snippet-chars>200</max-snippet-chars>
<preferred-elements />
</transform-results>
<additional-query>
<cts:and-query xmlns:cts="http://marklogic.com/cts">
<cts:directory-query depth="infinity">
<cts:uri>/DOCS/</cts:uri>
</cts:directory-query>
<cts:word-query>
<cts:text xml:lang="en">compte-courant</cts:text>
<cts:text xml:lang="en">comptes-courants</cts:text>
</cts:word-query>
</cts:and-query>
</additional-query>
<sort-order type="xs:date" direction="descending">
<element ns="" name="sortabledate" />
<annotation>Sort by Date</annotation>
</sort-order>
<sort-order direction="descending">
<score />
</sort-order>
<grammar>
<starter strength="40" apply="grouping" delimiter=")">(</starter>
<starter strength="10" apply="prefix" element="cts:not-query">-</starter>
<joiner strength="30" apply="infix" element="cts:and-query"
tokenize="word">AND</joiner>
<joiner strength="20" apply="infix" element="cts:or-query"
tokenize="word">OR</joiner>
</grammar>
</options>
search:search("", $searchOptions, "1", "4") returns the following result.
Result
<search:response snippet-format="src-snippet" total="640"
start="1" page-length="10" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="" xmlns:search="http://marklogic.com/appservices/search">
<search:result index="1" uri="/DOCS/JK_KAJD-10194_MAR03.xml"
path="fn:doc(&quot;/DOCS/JK_KAJD-10194_MAR03.xml&quot;)" score="61440"
confidence="0.363169" fitness="0.534633">
<search:snippet>
<src-term />
<tool-tip>
<search:match
path="fn:doc(&quot;/DOCS/JK_KAJD-10194_MAR03.xml&quot;)/CASEDOC">Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
2016 - n° C-156/15 JK_KAJD-10194_MAR03
</search:match>
</tool-tip>
<title>Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
2016 - n° C-156/15
</title>
</search:snippet>
</search:result>
<search:result index="2" uri="/DOCS/JP_KID-630822_MAR03.xml"
path="fn:doc(&quot;/DOCS/JP_KID-630822_MAR03.xml&quot;)" score="61440"
confidence="0.363169" fitness="0.534633">
<search:snippet>
<src-term />
<tool-tip>
<search:match
path="fn:doc(&quot;/DOCS/JP_KID-630822_MAR03.xml&quot;)/CASEDOC">Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
JP_KID-630822_MAR03
</search:match>
</tool-tip>
<title>Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
</title>
</search:snippet>
</search:result>
<search:result index="3" uri="/DOCS/JP_KICA-0031257_MAR03.xml"
path="fn:doc(&quot;/DOCS/JP_KICA-0031257_MAR03.xml&quot;)" score="98304"
confidence="0.459376" fitness="0.676263">
<search:snippet>
<src-term>compte-courant</src-term>
<tool-tip>
<search:match
path="fn:doc(&quot;/DOCS/JP_KICA-0031257_MAR03.xml&quot;)/CASEDOC/*:body/*:content/*:judgments/*:judgment/*:judgmentbody/*:considerations[2]/p[13]/text">
ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
redevances sur un
<search:highlight>compte-courant</search:highlight>
personnel valait mise à disposition des redevances de la location
gérance à M. X....
</search:match>
</tool-tip>
<title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
14-21.758
</title>
</search:snippet>
</search:result>
<search:result index="4" uri="/DOCS/JP_KASS-0007470_MAR03.xml"
path="fn:doc(&quot;/DOCS/JP_KASS-0007470_MAR03.xml&quot;)" score="98304"
confidence="0.459376" fitness="0.676263">
<search:snippet>
<src-term>compte-courant</src-term>
<tool-tip>
<search:match>
ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
redevances sur un
<search:highlight>compte-courant</search:highlight>
personnel valait mise à disposition des redevances de la location
gérance à M. X....
</search:match>
</tool-tip>
<title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
14-21.755
</title>
</search:snippet>
</search:result>
<search:qtext />
<search:metrics>
<search:query-resolution-time>PT0S</search:query-resolution-time>
<search:facet-resolution-time>PT0S</search:facet-resolution-time>
<search:snippet-resolution-time>PT0.672S
</search:snippet-resolution-time>
<search:total-time>PT0.672S</search:total-time>
</search:metrics>
</search:response>
I think the issue you are running into is that the default search grammar treats the hyphen as "not": the grammar in your options (like the default one) defines - as a prefix starter for cts:not-query, so the search looks for "compte" or "comptes" without "courants". It might be a different issue, but I know we ran into this on a recent project. Try adding a different search grammar to your options. You can also wrap your search string in quotes so it is treated literally, but then you lose the "fuzzy" matching.
Here's a grammar that should work for this, if the root cause is what I think it is. Unlike the one in your options, it defines no - starter, so a hyphen is not parsed as a negation:
<grammar xmlns="http://marklogic.com/appservices/search">
<quotation>"</quotation>
<implicit>
<cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
</implicit>
<starter strength="30" apply="grouping" delimiter=")">(</starter>
<joiner strength="10" apply="infix" element="cts:or-query"
tokenize="word">OR</joiner>
<joiner strength="20" apply="infix" element="cts:and-query"
tokenize="word">AND</joiner>
<joiner strength="30" apply="infix" element="cts:near-query"
tokenize="word">NEAR</joiner>
<joiner strength="30" apply="near2" consume="2"
element="cts:near-query">NEAR/</joiner>
<joiner strength="32" apply="boost" element="cts:boost-query"
tokenize="word">BOOST</joiner>
<joiner strength="35" apply="not-in" element="cts:not-in-query"
tokenize="word">NOT_IN</joiner>
<joiner strength="50" apply="constraint">:</joiner>
<joiner strength="50" apply="constraint" compare="LT"
tokenize="word">LT</joiner>
<joiner strength="50" apply="constraint" compare="LE"
tokenize="word">LE</joiner>
<joiner strength="50" apply="constraint" compare="GT"
tokenize="word">GT</joiner>
<joiner strength="50" apply="constraint" compare="GE"
tokenize="word">GE</joiner>
<joiner strength="50" apply="constraint" compare="NE"
tokenize="word">NE</joiner>
</grammar>