How can I crawl but not index web pages in OpenSearchServer?

I'm using OpenSearchServer to provide search functionality on a web site. I want to crawl all pages on the site for links to follow but I want to exclude some pages from the index. I can't work out how to do this.
Specifically the website includes a shop that has its own product search and I am keeping this search for products and categories. The product pages have URLs like http://www.thesite/p/123 so I don't want to include any page like this in the search results. However some product pages reference background info pages and I want these to be included in the search index.
The problem I have is that the filter has no effect on the results - it doesn't filter out the /p/ and /c/ results. If I change the filter by unticking the negative box I get no results so it seems to be either the contents of the field or the filter criteria that is causing the problem.
I've tried adding a negative filter to the default query called search in the Query > Filter tab on the index with url:"http://www.thesite/p/*"
but it seems that wildcards are not supported for query filters although they are supported for Crawler > Exclusion list filters.
I've tried adding a new field called urlField in Schema > Fields and populating it using an analyzer configured using the Whitespace Tokenizer and a regular expression (http://www.thesite/(c|p)/). When I use the Test button it seems to generate two tokens for my test URL http://www.thesite/p/123:
http://www.thesite/p/
p
I'd hoped to be able to use the first one in a Query > Filter to exclude all the shop results and optionally be able to use the p (for product) or c (for category) if I need to search the product pages sometime in the future.
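As a rough illustration of the two tokens described above, the following Python sketch applies the same regular expression to a URL and returns the full match plus the captured group. It only mimics what the Test button reported; it is not how the OpenSearchServer analyzer is implemented, and the function name shop_tokens is made up for this example.

```python
import re

# Pattern from the question; "www.thesite" is the asker's placeholder host.
SHOP_URL = re.compile(r"http://www\.thesite/(c|p)/")

def shop_tokens(url: str) -> list[str]:
    """Return [shop prefix, section letter] for shop URLs, or [] otherwise."""
    match = SHOP_URL.match(url)
    if match is None:
        return []
    # group(0) is the whole matched prefix, group(1) is "c" or "p".
    return [match.group(0), match.group(1)]

print(shop_tokens("http://www.thesite/p/123"))   # ['http://www.thesite/p/', 'p']
print(shop_tokens("http://www.thesite/about"))   # []
```

If the real field were populated this way, a page that is not under /p/ or /c/ would carry no tokens at all, which is worth keeping in mind when testing the filter with the negative box unticked.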
The urlShop field in the schema is set up as follows:
Indexed: yes
Stored: no (because I don't need the field back, just want to be able to filter on it)
TermVector: No
Analyzer: urlShop
Copy of: url
I've added urlFilter:"http://www.thesite/p/" to Query > Filters with the negative box ticked.
This seems to have no effect on the results when I use the default renderer.
To see whether it affects the returned results, I unticked the negative box in the query filter and got no results in the default renderer. This leads me to believe that the urlShop field is not being populated, but I'm not sure how to check this directly.
I would like to know whether there is an easier way to do this but if my approach makes sense in the context of OpenSearchServer please can you help me identify what's wrong?
The website is running under IIS and OpenSearchServer will be configured on the same server running in Tomcat.

Finally figured this out...
Go to query and hit edit for your configured query. Then go to the filters tab. Add a query filter like this:
urlExact:"http://myurltoexclude*"
Check the "negative" box. Click add.
Now make sure to click the tiny "save" button on the right-hand side. This is the part I missed. The URLs are still crawled and stored in the DB, but at least they aren't returned in results.
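The effect of a negative wildcard filter can be sketched outside OpenSearchServer. The Python below is only an analogy under the assumption that the trailing * behaves like a shell-style wildcard: every URL stays in the crawled set, and the filter just hides matching ones from the results. The function name and sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase

def apply_negative_filter(urls, pattern):
    """Keep only the URLs that do NOT match the wildcard pattern."""
    return [url for url in urls if not fnmatchcase(url, pattern)]

# Hypothetical crawl results; both remain in the index either way.
crawled = [
    "http://myurltoexclude/p/123",
    "http://www.thesite/background-info",
]

# Only the second URL is returned as a search result.
print(apply_negative_filter(crawled, "http://myurltoexclude*"))
```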

Related

Is there a way to send a filter to a Sharepoint List and view the results in a web part?

I've been tasked with setting up an advanced filter for a customer facing portal - as part of this, I have filters set up for various tags such as Priority, Site, Application, Reference. I've come up with a way to generate a URL that goes to the list and sets the filters to the selections given, by giving an html navigation command using a JavaScript function to splice together the URL.
This works fine for use within my team, however customers shouldn't have full access to our lists, and we have a customer-facing portal that currently has a limited view of these lists displayed to them, with much fewer details and none of our internal notes. These limited views are done through a web part that displays the Soaped in view - this is where I'm wanting to have the filtered list show up, as it's a more user-friendly way to search the list for information that the customer would want.
You can add a Query String (URL) Filter Web Part to the page, configure it for the parameter in the URL, and configure a connection so that the List Web Part receives its filter value from the Query String Filter.
Add QSF webpart:
Configure the WP:
Establish Connection (go to the WP Edit mode first)
Test the URL param: ...sharepoint.com/sites/dev/SitePages/TestPage.aspx?Disp=1
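For reference, the test URL above is just the page address with one query string parameter appended. A minimal Python sketch of building such a URL follows; the question did this in JavaScript, so this is a language swap for illustration. The host tenant.example is a hypothetical placeholder, and with_query_param is a made-up helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_query_param(page_url: str, name: str, value: str) -> str:
    """Return page_url with the query parameter name set to value."""
    parts = urlsplit(page_url)
    query = dict(parse_qsl(parts.query))   # keep any existing parameters
    query[name] = value                    # add or overwrite the filter parameter
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical host; "Disp" is the parameter name used in the answer.
page = "https://tenant.example/sites/dev/SitePages/TestPage.aspx"
print(with_query_param(page, "Disp", "1"))
```

Using urlencode rather than string concatenation means values containing spaces or ampersands are escaped correctly.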

Kentico - WHERE condition for custom Page Types page

I have a custom page type for employees, and one of the fields is Location. I want to show/filter only employees in "San Jose" or "San Francisco" and used this WHERE condition below but it didn't work. Apparently, I missed something very basic. Could you help?
Location LIKE '%San%';
I did another test where, instead of a page type, I used a custom table with the exact same field names, and I was able to filter using the same statement. On a related note, I'm new to Kentico and exploring which is more suitable for creating/maintaining a list of about 100 employees, Page Types or Custom Tables, with the ability to filter by department, location, etc. I'd appreciate your input here as well. Best!
If you're adding the WHERE condition into a standard Kentico repeater or other data source, the syntax looks right except you do not need the semicolon ";".
You'll also want to double-check the field name. If you are limiting your query to certain columns (which is best practice, especially for larger data sets), be sure the field you are filtering on is among the selected columns.
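To see what the condition itself selects, here is a small self-contained Python sketch that runs it against an in-memory SQLite table. This is only an approximation: Kentico runs on SQL Server, and the table, column values, and helper name below are invented for the demo. It also shows that LIKE '%San%' matches more than the two intended cities, so an IN list may be closer to the stated goal.

```python
import sqlite3

def filter_employees(where_clause: str) -> list[str]:
    """Run a WHERE condition (no trailing semicolon) against sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Name TEXT, Location TEXT)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?)",
        [("Ana", "San Jose"), ("Ben", "San Francisco"),
         ("Cy", "Santa Clara"), ("Di", "Austin")],
    )
    # Interpolating the clause is acceptable only in a throwaway demo like this.
    sql = f"SELECT Name FROM Employee WHERE {where_clause} ORDER BY Name"
    names = [name for (name,) in conn.execute(sql).fetchall()]
    conn.close()
    return names

print(filter_employees("Location LIKE '%San%'"))                       # also matches Santa Clara
print(filter_employees("Location IN ('San Jose', 'San Francisco')"))   # only the two cities
```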
Regarding the management of your employee list, either method you've described will work. In that scenario it typically depends on who will be editing the content, and how frequently. It is more editor-friendly, in my opinion, to add those documents into the content tree. This also gives you quicker control over the order, and keeps it similar to how other content on the site is maintained. I also like to set up folders or other parent page types as categories if needed, so the documents can be dragged and dropped between them and it sets up a visual taxonomy that isn't possible if it's all stored in a table. Storing items in the tree also allows for workflow and versioning, as well as more granular control over permissions/access, if this is important to you.
It's awesome that you are thinking about how to best store your data in advance. There are many factors to consider, such as the overall number of records, the number of columns, and whether you need workflow, versioning, preview, etc.
The best source of information regarding this would be this article which summarizes all options you have and gives clear explanations of which to use in which scenario.
And to your original question: what component are you using to display the data? Is it the repeater? If so, can you make sure the Page types property is set to match the page type you are displaying? If the Page types property is not configured, Kentico will not load any custom fields because it doesn't know which table to load the data from.
Additionally, make sure to either include the "Location" field in the Columns property or leave the Columns property blank (not recommended, because Kentico then loads all columns, which is around 200 when you count everything from CMS_Document, CMS_Tree, etc.).
Below is the approach I use to debug whenever I add a repeater and run into a problem.
First, I fetch all the columns instead of a limited set. Fetching all columns makes sure I don't have a problem retrieving the data.
If a particular column is missing, I double-check the column name.
I verify this by firing up SQL Server Management Studio and querying the page type table or custom table.
If access to SQL Server is not available (generally in Azure-hosted solutions with restrictive access to the DB), I enable SQL debug in the settings and check whether the query the repeater generates is correct.

Does Drupal provide an interchanged/partial word search facility?

Does Drupal provide interchanged/partial word search? For example, if I'm searching for "search term", it should also return results that say "term search".
Actually there's a quick way to set this up with the Search API and the Search API Database Search modules. No need for Solr.
Enable both the modules and go to the Search API configuration page
(admin/config/search/search_api)
Add a server and give it a recognizable name (such as "MySQL"), make sure it's enabled
Choose the following service class: Database service (normally the only option. If you don't see it, make sure that the "Search API Database Search" module is enabled).
Selecting this service class will provide you the option Search on parts of a word (make sure you enable it). Now hit "Create server". Next, go back to the Search API configuration page, and add an index. Again give it a recognizable name, such as "Contentindex", and make sure it's enabled.
Select the server you recently created (in this example "MySQL"). Datasource options: check the content types you want to index (the description of this field may be a bit confusing, but make no mistake: you should check the content types you want to index, not the ones you want to leave out).
You could check "Index items immediately", but that depends on the size of your content. This option is alright for smaller websites. In the other case: let cron take care of it. Hit "Create index".
Next, you want to enable the fields (in the fields tab) for that index, e.g. content type, title, status and author, and hit "Save changes" (we'll take care of the related fields later).
Then you want to configure the filters (in the filter tab). Here you can enable the bundle filter if you want to fine-tune the content types once again (you probably won't need to).
Down below you'll find the highlight filter, to highlight the search word in the results.
Now return to the fields tab and unfold the "Add related fields" tab. Add all the fields whose content you want indexed, and hit Save. Finally, go to the "view" tab of the index and index all the items now.
The last step is to create search results page and search block in Views, which only takes a few minutes.
Make sure the submodule Search Views is enabled and create a view page which will show "Contentindex" (select the name you gave to your index configuration instead of "content", which is selected by default).
Add all the fields (make sure your format is "fields") that every search result should consist of (title [link to content], body, …).
You'll notice that these fields look like this Indexed Node: Title
Add an exposed filter: Search: Fulltext search; and select the following in its settings: Contains any of these words, and use as: Search keys.
You're done!
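To make the matching behaviour concrete, here is a minimal Python sketch of what "Contains any of these words" combined with "Search on parts of a word" amounts to. It illustrates the semantics only and is not how the Search API modules are implemented; the function name is hypothetical.

```python
def matches_any_word(search_keys: str, text: str) -> bool:
    """True if any search word occurs anywhere in text, even inside a longer word."""
    haystack = text.lower()
    # Word order is irrelevant because each word is checked independently.
    return any(word in haystack for word in search_keys.lower().split())

print(matches_any_word("search term", "A page about term search"))  # True
print(matches_any_word("sear", "Searching made easy"))              # True (partial word)
print(matches_any_word("search term", "Nothing relevant here"))     # False
```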
There's an excellent video tutorial on this partial search method without Solr, which also incorporates faceted search (which is not necessary for partial search, but in case you're interested…). If you're only looking for a partial search option, you can watch the video until 5:08 and after that skip to 6:45.
Reference:
https://www.drupal.org/node/84631

Get listitems from Current Site

I am trying to create a Page Layout that should have a lookup field. The lookup field should always be populated with a list's items.
This list will exist in all subsites, so wherever I create this page, the lookup should be populated with list items from the current site.
I tried using a site column lookup field, but it always points to the list under the top site and not the current site.
Any suggestion on how to make it work or better alternative? Thanks!
Let me know if I can provide more info.
The most straightforward solution I can think of is using a cross-site lookup column and creating a separate field for each subsite. However, you will need to create and use a different Page Layout for each subsite.
You can use http://sp2010filteredlookup.codeplex.com/ for cross site lookups.
Solution 1 - Use http://sp2010filteredlookup.codeplex.com/
Use filtered lookup solution. So let's say you have your custom Page Layout and custom Page Content Type.
Every time you create a new subsite, you should remember to go to the Pages list settings and edit the Page Content Type by adding a cross-site lookup (with the same field "internal name").
So you still have one Page Layout (and one Content Type). But for each Pages library instance, the Content Type contains different fields (with the same Internal Name). This will allow you to run CAML queries and do other things as needed without any problems.
Solution 2 - develop custom sharepoint field type.
In edit mode, the control will render a dropdown list and populate it with data from the list instance on the current subsite. In the field settings you can store a relative list URL.
Solution 3 - hidden text field / js snippet solution
The Page Content Type can contain a hidden text field (it can hold the selected field value in JSON format, for example). Develop a JS snippet that handles all the logic (rendering in edit/view mode, saving, etc.) and put it on the Page Layout (aspx).
I would suggest using solution #1 or #2.

Filter a Content Query Web Part using the values of another web part on the same page

I have a Content Query Web Part on my page that rolls up all the contacts lists from all the sub-webs in my site collection. This works fine. I'd like to be able to dynamically filter the contacts rollup by having the user click on a list of letters of the alphabet at the top of the page: click A and see the contacts that start with A, etc.
I'm plopping various filter web parts on the page, but don't see how to filter the results of the CQWP. The connections menu is not much help.
You can't use the OOB filter webparts or CQWP like that.
What you can do is extend the CQWP and add some functionality to it - take a look at Enhanced Content Query Web Part over at codeplex.com for inspiration.
Then send the clicked letter to the query string and have your extended CQWP read the filter value from the query string. This would perform really well.
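The query string idea can be sketched independently of SharePoint. The Python below reads a letter from the page URL and keeps only the contacts that start with it; an extended CQWP would do the equivalent in its own query. The parameter name letter, the sample URL, and the function name are all assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit

def contacts_for_url(url: str, contacts: list[str], param: str = "letter") -> list[str]:
    """Filter contacts by the first letter given in the URL's query string."""
    query = parse_qs(urlsplit(url).query)
    letter = query.get(param, [""])[0][:1].upper()
    if not letter:
        return list(contacts)  # no filter supplied: show everything
    return [name for name in contacts if name.upper().startswith(letter)]

contacts = ["Adams", "Baker", "anderson", "Clark"]
print(contacts_for_url("https://intranet.example/contacts.aspx?letter=a", contacts))
```

Because the filter lives in the URL, each letter link is an ordinary hyperlink and the filtered view can be bookmarked.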
Generally, this sort of thing requires some development. Here's one source:
http://www.andrewconnell.com/blog/archive/2008/02/18/Subclassing-the-Content-Query-Web-Part-Adding-Dynamic-Filtering.aspx
I would not recommend using a CQWP. Instead, try using a Data View Web Part; it allows you to filter the content using query string parameters without coding.

Resources