We have files in a SharePoint Online library that were migrated from an old network drive. Unfortunately, all the files have names like "ReportOnTheMexicoArea20010101.doc". A user looking for that exact file might search for "Mexico", but that will not return this file, because "Mexico" is only a run of characters in the middle of a longer string. Is there a custom query or trick for searching for characters in the middle of a file name?
Thanks in advance
SharePoint does not support suffix (leading-wildcard) search; it supports only prefix matching. When you use words in a free-text KQL query, Search in SharePoint returns results based on exact matches of your words with the terms stored in the full-text index. You can search on just the beginning of a word by adding the wildcard operator (*) to enable prefix matching. In prefix matching, Search in SharePoint matches results with terms that contain the word followed by zero or more characters.
For example: Report*
I would suggest expanding your approach to tagging your documents: use SharePoint metadata to tag them rather than relying on the file name alone.
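A minimal sketch of the metadata approach, assuming you can run a one-off script over the migrated files: split each CamelCase file name into separate words and store them in a metadata column (a hypothetical "Keywords" column), so that a search for "Mexico" can match a whole indexed word. The class and method names are illustrative only, and writing the value back to SharePoint is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameKeywords {
    // One token = a capitalised word, a lowercase run, an all-caps run (acronym) or a run of digits
    private static final Pattern TOKEN =
            Pattern.compile("[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])|\\d+");

    /** Splits a CamelCase file name such as "ReportOnTheMexicoArea20010101.doc" into words, dropping the extension. */
    public static List<String> splitFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        List<String> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(base);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The joined words are what you might write into the (hypothetical) "Keywords" column
        System.out.println(String.join(" ", splitFileName("ReportOnTheMexicoArea20010101.doc")));
    }
}
```

The acronym alternative keeps runs such as "PDF" in "PDFReport" together instead of splitting them into single letters.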
Related
I recently came across the following problem: when applying a topic model to a set of parsed PDF files, I discovered that the content of the reference lists unfortunately also counts towards the model, i.e. words within the references appear in the tokenized list of words.
Is there any known "best-practice" to solve this problem?
I thought about a search strategy where the Python code automatically removes all content after the last mention of "references" or "bibliography". If I went by the first, or an arbitrary, mention of "references" or "bibliography" in the full text, the parser might not capture the true full content.
The input PDFs all come from different journals and thus have different page structures.
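A minimal sketch of the "last mention" idea, shown in Java (the logic ports directly to Python). It assumes the PDF-to-text step leaves the heading on its own line, and it only treats a line consisting of "References" or "Bibliography" (optionally numbered, e.g. "7. References") as a heading, so in-text phrases such as "see the references" are not used as cut points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReferenceTrimmer {
    // A line that is only a reference heading, e.g. "References", "BIBLIOGRAPHY" or "7. References:"
    private static final Pattern HEADING = Pattern.compile(
            "^[ \\t]*(?:\\d+\\.?[ \\t]*)?(?:references|bibliography)[ \\t]*:?[ \\t]*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns everything before the LAST reference heading, or the unchanged text if there is none. */
    public static String stripReferences(String fullText) {
        Matcher m = HEADING.matcher(fullText);
        int lastStart = -1;
        while (m.find()) {
            lastStart = m.start(); // keep overwriting so we end up with the last match
        }
        return lastStart < 0 ? fullText : fullText.substring(0, lastStart);
    }

    public static void main(String[] args) {
        String text = "Intro that cites references [1].\nBody.\nReferences\n[1] Smith, J. (2001).\n";
        System.out.print(stripReferences(text));
    }
}
```

Using the last heading rather than the first protects against a table of contents or an early "References" line cutting off the body, at the cost of mis-cutting papers that have an appendix after the bibliography.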
The syntax is what makes a bibliography entry distinct from a regular sentence.
Test for the pattern that corresponds to whichever reference style (or styles) you are trying to remove.
For example: a date, an unquoted string, a quoted string, and page numbers in a certain format.
I'd spend some time searching for a tool that already recognizes bibliographies before doing this yourself, as the pattern will be different for each style (MLA, etc.).
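To make the pattern-testing idea concrete, here is a heuristic sketch that flags a line as bibliography-like when it contains a publication year plus a page range, a leading "[n]" marker, or "et al.". These cues are assumptions that fit common numbered and author-year styles; this is not a parser for MLA or any other specific style, and ordinary sentences containing years and number ranges will sometimes be misclassified. The density helper can be used to check whether the text after a candidate heading consists mostly of reference entries.

```java
import java.util.regex.Pattern;

public class ReferenceLineDetector {
    // A plausible publication year, e.g. 1998 or 2015
    private static final Pattern YEAR = Pattern.compile("\\b(?:19|20)\\d{2}\\b");
    // A page range such as "1-10" or "233 - 250" (hyphen or en dash)
    private static final Pattern PAGES = Pattern.compile("\\b\\d{1,5}\\s*[-\u2013]\\s*\\d{1,5}\\b");
    // A numbered-entry marker such as "[12]" at the start of the line
    private static final Pattern NUMBERED = Pattern.compile("^\\s*\\[\\d+\\]");
    // "et al." is common in author lists
    private static final Pattern ET_AL = Pattern.compile("\\bet al\\.");

    /** Heuristic: a year plus at least one other bibliography cue. */
    public static boolean looksLikeReference(String line) {
        if (!YEAR.matcher(line).find()) {
            return false;
        }
        return PAGES.matcher(line).find()
                || NUMBERED.matcher(line).find()
                || ET_AL.matcher(line).find();
    }

    /** Fraction of non-blank lines that look like bibliography entries (0.0 if there are none). */
    public static double referenceDensity(String text) {
        int total = 0;
        int hits = 0;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            total++;
            if (looksLikeReference(line)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeReference("[1] Smith, J. (2001). Title. Journal, 3, 1-10."));
    }
}
```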
A couple of additional features to consider for detecting the start of the reference section:
Check whether the mention of "references" or "bibliography" is on the last pages as opposed to earlier pages.
Run entity recognition on a window of words (~50?) after the mention; if a high proportion of those tokens are entities, that suggests journal names, author names, etc.
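The two features above could be combined roughly as follows. This sketch accepts a "references"/"bibliography" mention only if it falls in the last part of the text (feature 1) and the following words look entity-like (feature 2). Instead of real entity recognition it uses a crude proxy, the share of following words that start with a capital letter or a digit; an NER library would be the proper replacement. The thresholds and window size (the answer suggests ~50 words) are illustrative assumptions, not tuned values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReferenceSectionCheck {
    // Any mention of the heading words, case-insensitive
    private static final Pattern MENTION = Pattern.compile("(?i)\\b(?:references|bibliography)\\b");

    /** Feature 1: does the offset fall within the last tailFraction of the text (e.g. 0.3 = last 30%)? */
    public static boolean inTail(int offset, int length, double tailFraction) {
        return offset >= length * (1.0 - tailFraction);
    }

    /**
     * Feature 2, crude proxy for entity recognition (NOT real NER): share of the next
     * `window` words that start with an upper-case letter or a digit.
     */
    public static double capitalisedShare(String text, int from, int window) {
        String[] words = text.substring(from).trim().split("\\s+");
        int count = Math.min(window, words.length);
        int hits = 0;
        for (int i = 0; i < count; i++) {
            String w = words[i].replaceAll("^\\W+", ""); // drop leading punctuation such as "[" or "("
            if (!w.isEmpty() && (Character.isUpperCase(w.charAt(0)) || Character.isDigit(w.charAt(0)))) {
                hits++;
            }
        }
        return count == 0 ? 0.0 : (double) hits / count;
    }

    /** Offset of the first mention that passes both checks, or -1 if none does. */
    public static int findReferenceStart(String text, double tailFraction, int window, double minShare) {
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            if (inTail(m.start(), text.length(), tailFraction)
                    && capitalisedShare(text, m.end(), window) >= minShare) {
                return m.start();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "intro text that mentions references in passing. ".repeat(5)
                + "References Smith J 2001 Title Journal 12 Doe A 2015 Another Title";
        System.out.println(findReferenceStart(text, 0.3, 10, 0.6));
    }
}
```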
We are a listings/business directory company that uses Apache Solr 4.7.2. When we do a search for "suits" in "Melbourne", our top two results are hotels that contain the word "suites" and the rest of the results are tailors, clothing retailers, etc., as expected. How do I prevent Solr from including hotels/suites in a search for "suits"?
This is due to stemming. There are two ways to handle it:
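Disable stemming completely by removing the stemming filter from schema.xml.
Use KeywordMarkerFilter if you only want to exclude specific keywords from being stemmed. In this particular case you would create a protwords.txt file with two lines, "suits" and "suites" (plus any other keyword you want to protect from stemming), and reference it with <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> placed before the stemming filter in the field's analyzer. Either way, reindex afterwards so the indexed terms reflect the new analysis chain.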
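To see why protecting the words helps, here is a toy illustration. toyStem is a deliberately simplistic suffix stripper (NOT the stemmer Solr actually uses), but, like the stemming behind the behaviour in the question, it reduces both "suits" and "suites" to "suit". analyze mimics the idea behind KeywordMarkerFilter: words in the protected set bypass the stemmer, so they stay distinct in the index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ProtectedStemmingDemo {
    /** Toy suffix stripper for illustration only; real Solr stemmers are far more elaborate. */
    public static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1); // "suits" -> "suit", "suites" -> "suite"
        }
        if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1); // "suite" -> "suit"
        }
        return w;
    }

    /** Mimics the idea behind KeywordMarkerFilter: protected words skip the stemmer entirely. */
    public static String analyze(String word, Set<String> protectedWords) {
        String lower = word.toLowerCase(Locale.ROOT);
        return protectedWords.contains(lower) ? lower : toyStem(lower);
    }

    public static void main(String[] args) {
        Set<String> protwords = new HashSet<>(Arrays.asList("suits", "suites"));
        System.out.println(toyStem("suits") + " / " + toyStem("suites"));                       // conflated
        System.out.println(analyze("suits", protwords) + " / " + analyze("suites", protwords)); // kept apart
    }
}
```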
I know SharePoint 2010 supports wildcard searches in the following format "testsearch*", but not as "*testsearch".
I have files in a document library with the following file names:
208-12-60-LI-NK-SE-002
208-12-60-LI-NK-SE-003
208-12-60-LI-NK-SE-004
208-12-60-LI-NK-SE-005
209-12-60-LI-HK-SE-002
209-12-60-LI-HK-SE-003
209-12-60-LI-HK-SE-004
209-12-60-LI-HK-SE-005
In any case, wildcard searches return results if the search strings "HK-SE-002*" or "HK-SE*" are used. Search results are also returned if the (*) is omitted, as in "HK-SE-002" and "HK-SE".
However, as soon as more is added to the search string, as in "LI-HK-SE-002*", the wildcard search fails to return any results. SharePoint also fails to return any results if the (*) is omitted from that search string, as in "LI-HK-SE-002".
I have tested this scenario on a LIVE environment as well as on a DEV VM.
Is this a limitation of the SharePoint search functionality?
Or is there additional configuration that needs to be done?
Any suggestions would be welcome.
Thanks,
I am trying to make an application that finds all the copied code in a project.
But my question is purely about Google search.
I searched for the keyword "public void bubbleSort(int[] arr){"
and this was the result.
In the first page of search results, only the last url makes a perfect match with my keyword.
Can I give Google some search keywords so that it gives more importance to pages that exactly match my search keyword?
Although the plus sign (+) is no longer an available Google search operator, you can use quotes, or, after running the query, select Search Tools and then Verbatim under the All Results drop-down.
You can also search the Google code archives, https://code.google.com/ or try some of the other code search engines around the Internet.
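If the copy-detection application should run such exact-phrase searches itself, one option is to build the search URL with the phrase already wrapped in quotes, as suggested above. This is a sketch assuming the standard https://www.google.com/search?q=... URL form; it does not apply the Verbatim setting from Search Tools, and the class name is illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExactPhraseSearch {
    /** Wraps the phrase in double quotes (exact-phrase search) and URL-encodes it into a query string. */
    public static String buildSearchUrl(String phrase) {
        String quoted = "\"" + phrase + "\"";
        // URLEncoder.encode(String, Charset) requires Java 10+; spaces become '+', symbols become %XX
        return "https://www.google.com/search?q=" + URLEncoder.encode(quoted, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("public void bubbleSort(int[] arr){"));
    }
}
```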
+"public void bubbleSort(int[] arr){"
The plus sign meant "include this term no matter what" (Google has since removed this operator, as noted above); the quotes turn the loosely coupled words into a single phrase.
For a full list of Google search operators, see https://support.google.com/websearch/answer/136861?hl=en
I am using the Lucene search engine but it only seems to find matches that occur at the beginning of terms.
For example:
Searching for "one" would match "onematch" or "one day a time" but not "loneranger".
The Lucene docs say it doesn't support wildcards at the front of a search string, so I am not sure whether Lucene searches for matches inside terms at all or can only match terms that start with the search term.
Is this a problem with how I have created my index, how I am building my search query or just a limitation of Lucene?
I found some info in another post here on Stack Overflow: "[LUCENE.NET] Leading wildcard throws an error".
You can enable leading wildcards on your QueryParser (setAllowLeadingWildcard(true) in Java Lucene; in Lucene.NET this is SetAllowLeadingWildcard or the AllowLeadingWildcard property, depending on the version). This has an obvious, large performance cost, but it lets users find matches within a term.
Lucene will find a document if the search term appears anywhere within it as a whole term, but by default it doesn't allow wildcard queries where the wildcard is at the front of the search term, because they perform horribly. If that is functionality you care about, you will either have to change a configuration flag (see the link above), do some low-level Lucene hacking, find a third-party library that has already done that hacking, or use a different search implementation (for small enough datasets, the built-in search in many RDBMS engines is sufficient).
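To see why a leading wildcard is so much more expensive than a trailing one, consider this toy term dictionary kept in sorted order. It illustrates the principle only and is not how Lucene actually stores or searches its terms: a prefix query can jump to the first term at or after the prefix and stop at the first term that no longer matches, while an infix (leading-wildcard) query has no such shortcut and must test every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TermDictionaryDemo {
    // Toy stand-in for a search index's term dictionary: unique terms kept in sorted order
    private final TreeSet<String> terms = new TreeSet<>();

    public void add(String term) {
        terms.add(term);
    }

    /** Prefix query ("one*"): start at the first term >= prefix and stop at the first non-match. */
    public List<String> prefix(String p) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p, true)) {
            if (!t.startsWith(p)) {
                break; // sorted order: no later term can start with p either
            }
            out.add(t);
        }
        return out;
    }

    /** Infix query ("*one*"): sorted order gives no shortcut, so every term has to be tested. */
    public List<String> infix(String s) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.contains(s)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TermDictionaryDemo dict = new TermDictionaryDemo();
        for (String t : new String[] {"onematch", "one", "day", "a", "time", "loneranger"}) {
            dict.add(t);
        }
        System.out.println("one*  -> " + dict.prefix("one"));
        System.out.println("*one* -> " + dict.infix("one"));
    }
}
```

With millions of terms, the prefix walk touches only the matching block, while the infix scan grows with the size of the whole dictionary.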
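Your query should be:
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
Query query = new WildcardQuery(new Term("contents", "*one*"));
where contents is the name of the field you are searching.
"one" should be enclosed in asterisks on both sides, with no spaces. Note that a WildcardQuery built this way does not pass through an analyzer, so the pattern must match the terms as they are stored in the index (e.g. lowercase if your analyzer lowercases).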