How can I exclude the Path property from being indexed?

How can I exclude the Path property from being indexed? - sharepoint

I have tried to exclude the Path property from being indexed from
Shared Services->Search Administration->managed Properties->Crawled Properties.
I have un-checked the check box that says "Include values for this property in the search index". for all three: "Basic:11(Text), Basic:9(Text), Web:2(Text)" mappings.
I have reset the crawl contents, and done a full crawl.
Yet, when searching, the search service still return items where only the path matched my search text.
Is this the correct way of excluding the Path property?
Update:
The reason for excluding the Path is that when searching with IsDocument:1 in order to just search documents, and the keyword "orange" for example, SharePoint will return hits where the path contains the word "orange" even if the documents in that path doesn't contain the keyword in question.

I do not really understand why you need to disable this? However, you cannot exclude the Path property as it is an integral part of Search and without it, the search results would not be linked. Keywords in the Path property (URL) is also an important factor in the relevance calculation.

Related

How do you construct an Azure Search query to return a wildcard search based solely on a specific field?

If I may have missed this in some other area of SO please redirect me but I don't think this is a duped question.
I am using Azure Search with an index field called Title which is searchable and filterable using a Standard Lucerne Analyzer.
When using the built-in explorer, if I want to return all Job Titles that are explicitly named Full Stack Developer I can achieve it this way:
$filter=Title eq 'Full Stack Developer'&$count=true
But if I want to retrieve all the Job Titles using a wildcard to return all records having Full Stack in the name this way:
$filter=Title eq 'Full Stack*'&$count=true
The first 20 or so records returned are spot on, but after that point I get a mix of records that have absolutely nothing in common with the Title I specified in the query. My initial assumption was that perhaps Azure was including my specified Title performing an inclusive keyword search on the text as well.
Though I found a few instances where that hypothesis seemed to prove out, so many more of the records returned invalidated that altogether.
Maybe I don't understand fully the mechanics under the hood of Azure Search and so though my query appears to be valid; my expectation of the result is way off.
So how should my query look to perform a wildcard resulting to guarantee the words specified in the search to be included in the Titles returned, if this should be possible? And what would be the correct syntax to condition the return to accommodate for OR operators to be inclusive?

Azure Cognitive Search allows you to perform wildcard searches limited to specific fields only. To do so, you will need to specify the name of the fields in which you want to perform the search in searchFields parameter.
Your search URL in this case would be something like:
https://accountname.search.windows.net/indexes/indexname/docs?api-version=2020-06-30&searchFields=Title&search=Full Stack*
From the link here:

Smart search results behaviour of compound index of multiple page types

Can someone confirm the behaviour of the Smart search results webpart when using a Smart search filter on a particular field, documentation here, when the index, and the expected results, are compound of multiple page types?
In my scenario I have 2 page types, one is always a child of the other, my hypothetical scenario would be a Folder and File types as an example.
I've configured the index with Pages type and Standard analyzer to include all Folder and File types under the path /MyOS/% on the tree.
The search page, includes the Smart search results webpart and a Smart search filter, a checkbox for the File's field FileIsHidden.
What I'm trying to ascertain is the possibility for the results to include all folders that have a hidden field, as well as the files?
Client has a v8.2 license and now has a requirement similar to this scenario.
Thanks so much for any help in advance.

Firstly what i would do is download the latest version of LUKE, it's a lucene inspector that allows you to run queries, inspect the data, etc.
https://code.google.com/archive/p/luke/downloads
Your search indexes are in the App_Data/Modules/SmartSearch/[SearchName], now i am not sure if LUKE can query 2 indexes as the same time, however you can run hte same query against both and see if it's filtering out results one way or another.
If you are trying to query where a field must be a value, and the other page type does not have the field, it probably is filtered out. What you need to do is use the lucene syntax to say "(classname = 'cms.file' and fileonlyproperty = '' OR classname <> 'cms.file')" so to say.
You'll have to test, but say the class name is cms.file and cms.folder, and the property is FileIsHidden, i think the syntax would be:
+((FieldIsHidden:(true) and classname:('cms.file')) OR (NOT classname:('cms.file'))
But you'll have to test that.

MarkLogic Text Search: Return Results Based on Matches Inside Attributes

I am using Marklogic's search:search() functions to handle searching within my application, and I have a use case where users need to be able to perform a text search that returns matches from an attribute on my document.
For example, using this document:
<document attr="foo attribute value">Some child content</document>
I want users to be able to perform a text search (not using constraints) for "foo", and to return my document based on the match within the attribute #attr. Is there some way to configure the query options to allow this?
Typing in attr:"foo" is not a workable solution, so using attribute range constraints won't help, and users still need to be able to search for other child content not in the attribute node. I'm thinking perhaps there is a way to add a cts:query OR'd into the search via the options, that allows this attribute to be searched?
Open to any and all other solutions.
Thanks!
Edit:
Some additional information, to help clarify:
I need to be able to find matches within the attribute, and elsewhere within the content. Using the example above, searches for "foo", "child content", or "foo child content" should all return my document as a result. This means that any query options that are AND'd with the search (like <additional-query>, which is intended to help constrain your search and not expand it) won't work. What I'm looking for is (likely) an additional query option that will be OR'd with the original search, so as to allow searching by child node content, attribute content, or a mix of the two.
In other words, I'd like MarkLogic to treat any attribute node content exactly the same as element text nodes, as far as searching is concerned.
Thanks!!

You could accomplish this search with a serialized element-attribute-word cts query in the additional-query options for the search API. The element attribute word query will use the universal index to match individual tokens within attributes.
In MarkLogic 9 You may be able to use the following to perform your search:
import module namespace search = "http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
search:search("",
<options xmlns="http://marklogic.com/appservices/search">
<additional-query>
<cts:element-attribute-word-query xmlns:cts="http://marklogic.com/cts">
<cts:element>document</cts:element>
<cts:attribute>attr</cts:attribute>
<cts:text>foo</cts:text>
</cts:element-attribute-word-query>
</additional-query>
</options>
)

MarkLogic has ways to parse query text and map a value to an attribute word or value query.
First, you can use cts:parse():
http://docs.marklogic.com/guide/search-dev/cts_query#id_71878
http://docs.marklogic.com/cts.parse
Second, you can use search:search() with constraints defined in an XML payload:
http://docs.marklogic.com/guide/search-dev/query-options#id_39116
http://docs.marklogic.com/guide/search-dev/appendixa#id_36346

I'd look into using the <default> option of <term>. For details see http://docs.marklogic.com/guide/search-dev/appendixa#id_31590
Alternatively, consider doing query expansion. The idea behind that is that a end user send a search string. You parse it using search:parse of cts:parse (as suggested by Erik), and instead of submitting that query as-is to MarkLogic, you process the cts:query tree, to look for terms you want to adjust, or expand. Typically used to automatically blend in synonyms, related terms, or translations, but could be used to copy individual terms, and automatically add queries on attributes for those.
HTH!

Solr title search failing

I am indexing the title field for few products in Solr.
But when I am searching, I am not getting those titles in response.
For eg. I am storing following as title : Baboons Typing Tshirt
But when I am searching following I am not getting any result !!!
1)title:Baboons
2)title:(Baboons Typing Tshirt)
3)title:(Baboons*)
On the otherhand, if I am searching like this, I am getting lot of results
1)title:(Tshirt)
I have indexed many titles containing word Tshirt but I want to search a specific title which is failing..!!
I dont know whether Solr is ignoring first words, or it is doing something random.
My Question is basically: If I have a search title with lots of words, I will like to match it with the title which contains maximum common terms.
How to do it?
Thanks

Solr works like that by itself. You don't have to change anything.
You have to be careful how you set up your fields in schema.xml, i.e. how analysis is done.
You can use Solr's admin > Analysis interface to see how exactly your title field (when indexing) and query (when searching) is processed (tokenized, transformed).
Remember, match, in order to occur, requires identical word (case and everything) on both sides (index & query).
To open your index and see how Solr has actually indexed your data, use Luke.

How to find related items by tags in Lucene.NET

My indexed documents have a field containing a pipe-delimited set of ids:
a845497737704e8ab439dd410e7f1328|
0a2d7192f75148cca89b6df58fcf2e54|
204fce58c936434598f7bd7eccf11771
(ignore line breaks)
This field represents a list of tags. The list may contain 0 to n tag Ids.
When users of my site view a particular document, I want to display a list of related documents.
This list of related document must be determined by tags:
Only documents with at least one matching tag should appear in the "related documents" list.
Document with the most matching tags should appear at the top of the "related documents" list.
I was thinking of using a WildcardQuery for this but queries starting with '*' are not allowed.
Any suggestions?

Setting aside for a minute the possible uses of Lucene for this task (which I am not overly familiar with) - consider checking out the LinkDatabase.
Sitecore will, behind the scenes, track all your references to and from items. And since your multiple tags are indeed (I assume) selected from a meta hierarchy of tags represented as Sitecore Items somewhere - the LinkDatabase would be able to tell you all items referencing it.
In some sort of pseudo code mockup, this would then become
for each ID in tags
get all documents referencing this tag
for each document found
if master-list contains document; increase usage-count
else; add document to master list
sort master-list by usage-count descending
Forgive me that I am not more precise, but am unavailable to mock up a fully working example right at this stage.
You can find an article about the LinkDatabase here http://larsnielsen.blogspirit.com/tag/XSLT. Be aware that if you're tagging documents using a TreeListEx field, there is a known flaw in earlier versions of Sitecore. Documented here: http://www.cassidy.dk/blog/sitecore/2008/12/treelistex-not-registering-links-in.html

Your pipe-delimited set of ids should really have been separated into individual fields when the documents were indexed. This way, you could simply do a query for the desired tag, sorting by relevance descending.

You can have the same field multiple times in a document. In this case, you would add multiple "tag" fields at index time by splitting on |. Then, when you search, you just have to search on the "tag" field.

Try this query on the tag field.
+(tag1 OR tag2 OR ... tagN)
where tag1, .. tagN are the tags of a document.
This query will return documents with at least one tag match. The scoring automatically will take care to bring up the documents with highest number of matches as the final score is sum of individual scores.
Also, you need to realizes that if you want to find documents similar to tags of Doc1, you will find Doc1 coming at the top of the search results. So, handle this case accordingly.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How can I exclude the Path property from being indexed? - sharepoint

I do not really understand why you need to disable this? However, you cannot exclude the Path property as it is an integral part of Search and without it, the search results would not be linked. Keywords in the Path property (URL) is also an important factor in the relevance calculation.

Related

How do you construct an Azure Search query to return a wildcard search based solely on a specific field?

Smart search results behaviour of compound index of multiple page types

MarkLogic Text Search: Return Results Based on Matches Inside Attributes

Solr title search failing

How to find related items by tags in Lucene.NET

Categories

Resources