Add Site and Page Description to SharePoint Search Index

As part of a SharePoint solution, I have added functionality that lets users create new web sites and publishing pages programmatically via a button click. I need to ensure that the Description field of the newly created sites and pages is indexed by SharePoint Search. What is the best way to do this?
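For reference, a minimal sketch of the kind of provisioning code involved, using the server object model (the URL, titles, and template below are placeholders, not my actual values):

```csharp
// Sketch only: provisioning a web with its Description set
// (MOSS 2007 / SP2010 server object model). All values are placeholders.
using System;
using Microsoft.SharePoint;

class ProvisionExample
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server/sites/demo"))
        using (SPWeb newWeb = site.AllWebs.Add(
            "newsite",                    // URL name
            "New Site",                   // title
            "Description text to index",  // the Description field in question
            1033,                         // LCID (English)
            "STS#0",                      // team site template
            false,                        // no unique permissions
            false))                       // don't convert if it already exists
        {
            // The description is stored with the web; the question is how to
            // make sure the next scheduled crawl picks it up.
        }
    }
}
```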
Please note, I am NOT interested in starting a new crawl. I just want to ensure that whenever the next scheduled crawl occurs, the contents of these fields will be searchable.
Thanks, MagicAndi

I'm guessing you mean: how can you ensure the site is indexed immediately?
Generally, crawls are scheduled, which means your new site will only be added to the search index after the next crawl completes. So if your incremental crawl runs every hour, you may have to wait up to an hour for the site to appear in the search index.
However, given that your new sites are being added programmatically, you could also programmatically start an incremental crawl if it is vital for them to start appearing in search results immediately. There are details on how to do this in this article.
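For illustration, a minimal sketch of kicking off an incremental crawl from code, using the MOSS 2007 / SP2010 administration object model (the content source name below is the default one and may differ in your farm):

```csharp
// Sketch: starting an incremental crawl from code. SearchContext is the
// MOSS 2007 API (deprecated but still present in SP2010).
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

class CrawlStarter
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        {
            SearchContext context = SearchContext.GetContext(site);
            Content searchContent = new Content(context);
            ContentSource source =
                searchContent.ContentSources["Local Office SharePoint Server sites"];

            // Processes only the changes since the last crawl.
            source.StartIncrementalCrawl();
        }
    }
}
```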
Update:
The site title and description should be indexed automatically by the next crawl. If this isn't happening, then you don't have a Content Source that covers that site, so you need to create or update one to cover the new sites and make sure it has a crawl schedule. If the new sites are created in separate site collections, consider putting them on a Managed Path.
In our SharePoint system we have a terabyte of data across 100,000 site collections, with probably 20 new site collections added every day. We only have one content source, pointed at the root of the site, and everything gets indexed automatically.
It sounds like you're missing a content source or a crawl schedule.
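If one really is missing, it can also be created from code. A rough sketch against the MOSS 2007 / SP2010 administration object model (the source name, start address, and schedule values are placeholders):

```csharp
// Sketch: creating a SharePoint content source with a daily incremental
// crawl schedule. Names, URLs, and schedule values are placeholders.
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

class ContentSourceSetup
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        {
            SearchContext context = SearchContext.GetContext(site);
            Content searchContent = new Content(context);

            ContentSource source = searchContent.ContentSources.Create(
                typeof(SharePointContentSource), "New Sites");
            source.StartAddresses.Add(new Uri("http://server/sites"));

            // Incremental crawl once a day, starting at 01:00.
            DailySchedule schedule = new DailySchedule(context);
            schedule.DaysInterval = 1;
            schedule.StartHour = 1;
            source.IncrementalCrawlSchedule = schedule;
            source.Update();
        }
    }
}
```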

It turns out that the site description is included in the crawl by default. I tested the default search properties by creating a new site and assigning a unique text string to its description. After the next incremental crawl, I was able to find the unique string via the default SharePoint search.
I have not yet tested if the page description is included in the search scope by default, but I'm prepared to guess that it is. I will update my answer as soon as I get a chance to test this.

Related

SharePoint 2013 in place search issue

I am doing an in place search in a document library and results are being returned just fine. However, some users are reporting seeing the following message:
Some files might be hidden. Include these in your search
I also receive this message at random times for different searches. There seem to be reports of this on various boards with no real answer. Anyone know why this message would occur?
This may relate to the Search Service settings > Content Sources in Central Admin. Ensure that the start addresses are correct and that the crawl uses the same URL entry as you do. Normally you want to point the URL to the default zone.
When done, reset the index and start a new full crawl.

SharePoint 2010 Search Default Content Access Account

I have a search set up on my Intranet. I have not allowed certain libraries and lists to be crawled (this helps eliminate the need for so many crawl rules). However...I do need some crawl rules in place, which I added. I ran the Full Crawl and the "excluded" items from the crawl rules still showed up.
I believe this is because my administration account has full control, but I don't know how to fix it.
I went in to add another account to the service (Manage Service Applications under Central Admin - Administration tab) and the only option it gives me to select is "full control".
Under the main site accounts (Manage Web Application link on Central Admin) the user I added says full read.
Now then, on the main Search Service page there is an account called "Default Content Access Account". I changed that to the account that has Full Read under the Manage Web Applications policy. I then cleared the index and ran the crawl fresh. The crawl rules are still ignored. Does anyone have any thoughts on this? I am very perplexed.
Well, I was never able to fully solve the issue. I did go into each list and library and, under advanced settings, selected 'No' for allowing search to crawl it. Though this solution will only go so far.
I still have the issue in my document libraries of the /Form/* content showing up in a search (which only shows up if you search for an item that also appears on the master page).
Anyway, I can live with this half fixed as it is.
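For what it's worth, the per-list setting described above can also be flipped in bulk from code; a sketch using the server object model (the site URL is a placeholder, and SPList.NoCrawl is the property behind that advanced setting):

```csharp
// Sketch: turning off "allow items to appear in search results" for every
// list in a web, instead of visiting each advanced-settings page by hand.
using Microsoft.SharePoint;

class ExcludeListsFromCrawl
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        using (SPWeb web = site.OpenWeb())
        {
            foreach (SPList list in web.Lists)
            {
                if (!list.NoCrawl)
                {
                    list.NoCrawl = true; // equivalent of Advanced Settings > 'No'
                    list.Update();
                }
            }
        }
    }
}
```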

Adding Item to SharePoint Search index manually

I am looking for a way to add a document to the search index using the API, as and when the document gets added to a document library.
I can add an event handler and write code to call the API. I need to know if the API supports such an interface. Any sample would be really helpful.
Thanks.
I think that SharePoint (2007 and 2010) has passive indexing, meaning it is out of your control beyond scheduling the indexing service to run at a certain frequency. That being the case, there are occasions when your search index will be out of sync, such as when you first delete an item. However, I believe you can programmatically prime the index service.
It is also possible to have SharePoint index non-SharePoint content, such as a UNC path, via Central Admin.
As others have mentioned, it isn't quite possible to do what you want. However, you can decrease the latency between when you add content and when it gets indexed. The process looks like this:
Create a new search content source that includes the data that needs to be rapidly searchable
Add only the sites you care about rapid search for to this content source
Schedule this content source's incremental crawl to happen very often. Consider programmatically watching the crawl status so that you can restart the crawl after it has completed (see the sketch after this list)
Tune your search database's I/O and its indexes so that search crawling happens as fast as possible.
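A sketch of that watch-and-restart loop, using the MOSS 2007 / SP2010 administration object model (the content source name and polling interval are assumptions):

```csharp
// Sketch: restart the incremental crawl of a dedicated "rapid" content
// source as soon as the previous one finishes.
using System;
using System.Threading;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

class CrawlWatcher
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        {
            SearchContext context = SearchContext.GetContext(site);
            Content searchContent = new Content(context);
            ContentSource source = searchContent.ContentSources["Rapid Search"];

            while (true)
            {
                // Kick off a new incremental crawl whenever the source is idle.
                if (source.CrawlStatus == CrawlStatus.Idle)
                {
                    source.StartIncrementalCrawl();
                }
                Thread.Sleep(TimeSpan.FromSeconds(30));
            }
        }
    }
}
```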

Microsoft SharePoint Search - Ignore sections of the page

I am using Microsoft SharePoint Search (MOSS) to search all pages on a website.
My problem is that when you search for a word that appears in the header, footer, menu or tag cloud section of the website, that word will appear on every page, so the search server will bring you a list of results for that search term: every page on the website.
Ideally I want to tell the search server to ignore certain HTML sections in its search index.
This website seems to describe my problem, and one suggestion there is: "why not hide those sections of your website if the user agent is the search server?"
The problem with that approach is that most of the sections I would hide contain links to other pages (menus and tag clouds), so the crawler would hit a dead end and wouldn't crawl very far.
Anyone got any suggestions on how to solve this problem?
I'm not sure if I'm reading this correctly. You DON'T want Search to include parts of your site in the index, but you DO want it to go into those sections and follow any links in them?
I think the best way is indeed to exclude those sections based on user agent (i.e. add them to a user control and, if the user agent is MS Search, don't render the section).
Seeing as these sections would be the same on every page, it's okay to exclude them when the search crawler comes by.
Just create ONE page (i.e. a sitemap :-D) that does include all the links a normal user would see in the footer / header / etc. The crawler can then use that page to follow links deeper into your site. This would be a performance boost as well, seeing as the crawler only encounters the links once instead of on every page.
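For the user-agent check itself, a sketch of a container control (the class name is made up; the MOSS crawler identifies itself with a user agent containing "MS Search", e.g. "MS Search 6.0 Robot"):

```csharp
// Sketch: a container control that renders its children for normal users
// but hides them from the SharePoint crawler.
using System;
using System.Web.UI;

public class HideFromCrawler : Control
{
    protected override void Render(HtmlTextWriter writer)
    {
        string userAgent = Context.Request.UserAgent ?? string.Empty;

        // Render the header/footer/menu markup only for non-crawler requests.
        if (userAgent.IndexOf("MS Search", StringComparison.OrdinalIgnoreCase) < 0)
        {
            base.Render(writer);
        }
    }
}
```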

SharePoint search of external RSS feeds

I want my SharePoint site to allow a user to search content in a known collection of RSS feeds. Conceptually, I figure there are a few ways to do this:
Crawl the feeds at their source (yikes!)
Pull the full articles into my SharePoint site, then let my crawler crawl them
Make use of an existing index (like Google)
Search the full articles, on demand, using something like a Google utility (my preference)
So, can I somehow, from my SharePoint site, allow a user to search the full articles from a couple dozen named RSS feeds?
Thanks,
Cary
I don't see why there is a problem with crawling the feeds at their source; that seems reasonable.
It is fairly easy to create a content source pointed at the feed and to select the correct indexing schedule, for example as sketched below. If that does not work, you can try a more complicated approach.
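A sketch of such a content source in code: a Web content source pointed at the feed, kept to a shallow crawl depth (the feed URL and source name are placeholders; MOSS 2007 / SP2010 administration object model):

```csharp
// Sketch: a Web content source that crawls an external RSS feed without
// wandering deep into the target site.
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

class FeedContentSource
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))
        {
            SearchContext context = SearchContext.GetContext(site);
            Content searchContent = new Content(context);

            WebContentSource feeds = (WebContentSource)searchContent.ContentSources
                .Create(typeof(WebContentSource), "External RSS Feeds");
            feeds.StartAddresses.Add(new Uri("http://example.com/feed.xml"));

            // Stay close to the feed itself rather than crawling the whole site.
            feeds.MaxPageEnumerationDepth = 1;
            feeds.Update();
        }
    }
}
```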
Be aware that copying the content of another website to host on your own could have copyright implications (not to mention the risk that any inflammatory content would appear to be published on your own site).
--update--
Try reading the target site's robots.txt (if it even has one) to see if it specifies a desired crawl frequency. Otherwise, it depends on the depth of the site you would be crawling.
If you are crawling just the RSS feed XML, I suspect you could do that every hour without annoying anyone. Otherwise, if you reach into each article, you may want to limit that. It really depends a lot on any relationship you have with the target site and the type of site you are hitting.
Check out this article for a little more info on how SharePoint deals with robots.txt.
(P.S. The target site did not put the articles on the web so that no one would read them.)
The out-of-the-box crawler will respect robots.txt, and there are provisions for crawler impact rules that will lessen the chance that SharePoint will perform a beat-down on the external site.
