Script SharePoint crawl rules

I want to exclude certain pages from MOSS indexing, such as a confirmation page that sits in the Pages library at the root of my site: http://server/Pages/ConfirmSignup.aspx
I can do this manually by going to Search Administration / Search Result Removal and adding the URL to the "URLs to remove" box.
Because I have dev, staging, UAT and production environments, I want to script this. The only relevant command I could find is in Gary Lapointe's stsadm extensions, but it adds an exclusion to a search scope, which does not seem to work for individual files, only folders. Since there are other files in my /Pages library, I can't use it.
How do I add search result removal URLs programmatically?
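For reference, this is roughly the shape I'd expect a scripted version to take against the search administration object model. It's an untested sketch: the shared services provider name is a placeholder, and it creates an exclusion crawl rule rather than a literal Search Result Removal entry (my understanding is that the removal UI boils down to a crawl rule plus purging the URL from the index, but I haven't confirmed that).

```csharp
using System;
using Microsoft.Office.Server;
using Microsoft.Office.Server.Search.Administration;

class ExcludeConfirmationPage
{
    static void Main()
    {
        // "SharedServices1" is a placeholder; substitute your SSP's actual name.
        ServerContext serverContext = ServerContext.GetContext("SharedServices1");
        SearchContext searchContext = SearchContext.GetContext(serverContext);
        Content content = new Content(searchContext);

        // An exclusion rule keeps the page out of the index from the next crawl on.
        content.CrawlRules.Create(
            CrawlRuleType.ExclusionRule,
            "http://server/Pages/ConfirmSignup.aspx");
    }
}
```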

The SPList object has a NoCrawl property. Setting this to true will ensure no items in the list will be indexed or appear in search results.
Unfortunately this doesn't go down to the SPListItem level, so you would need something like a separate 'Admin' site whose Pages library holds pages such as your confirmation page, with that library excluded from indexing.
The advantage of this solution is its level of control. Crawl rules can be very complex, or impossible to define correctly in the search configuration; this option avoids those issues.
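A minimal sketch of flipping that flag from code (the URL and list name are assumptions):

```csharp
using Microsoft.SharePoint;

class ExcludeListFromCrawl
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))   // placeholder URL
        using (SPWeb web = site.OpenWeb())
        {
            SPList pages = web.Lists["Pages"];
            pages.NoCrawl = true;   // the whole list is skipped on the next crawl
            pages.Update();
        }
    }
}
```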

Related

Is there a way to exclude a subsite in Content Query Web Part in SharePoint 2013/O365?

I'm using a Content Query web part in SharePoint 2013 in O365. I want to include all items from the root site and all subsites except one particular subsite.
In the CQWP, I can only choose to show items from the entire site collection, from a site and all its subsites, or from a single list.
Is there a way to do this?
My answer is "no": there is no such setting in the tool pane of this web part, which indicates it is not supported by design.
The Content Search Web Part (aka CSWP), however, lets you write more complex queries with KQL to search for exactly the data you want, including excluding a site or a library. Give it a try.
A sample query text set in the CSWP can exclude the results from a specific site.
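A hedged reconstruction of such a query (the URLs are placeholders): KQL's `-` operator negates a restriction, so the second clause drops everything under the unwanted subsite.

```
path:"https://contoso.sharepoint.com/" -path:"https://contoso.sharepoint.com/subsiteA"
```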

SharePoint 2010 - Questions regarding basic concepts

I am beginning SharePoint development and have some quick questions concerning basic terms.
How do I find out whether a particular site is a site collection or just a site, purely by the URL? Is there a PowerShell command to do this?
I was creating some sites in SharePoint. Some sites were appended with /sites/sitename, whereas others sat directly under the base URL of SharePoint. What is the difference between the two? And how do I recreate the ones under the /sites node? For some reason, I can't find the option to create under the /sites node again. Please explain this concept, as the MSDN tutorials are very confusing for beginners like me; they are good once you get the hang of the basics.
Please provide an analogy to help understand web application, site collection, site, web site, etc.
Is there a way to use NEWFORM.aspx for a document library instead of UPLOAD.aspx?
The Site collection is at the root level of your Web application.
So http://abc.com/ => Site collection
Using PowerShell, open the SharePoint Management Shell and run Get-SPSite to list all site collections.
The /sites part is called a managed path.
Managed paths can be defined in Central Administration for every web application.
The option to select /sites is only offered when you create a second site collection under the web application (the first one takes / by default).
Have a look at the TechNet article on managed paths.
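If you need the same check from compiled code rather than PowerShell, one trick is that the SPSite constructor accepts any URL inside a site collection and resolves to that collection's root. A sketch (the URL is a placeholder and is assumed to exist):

```csharp
using System;
using Microsoft.SharePoint;

class IsSiteCollection
{
    static void Main()
    {
        string url = "http://abc.com/sites/sitename";   // placeholder URL

        using (SPSite site = new SPSite(url))   // resolves to the containing site collection
        {
            bool isRoot = string.Equals(
                site.Url.TrimEnd('/'), url.TrimEnd('/'),
                StringComparison.OrdinalIgnoreCase);

            Console.WriteLine(isRoot
                ? url + " is a site collection root"
                : url + " is a web inside " + site.Url);
        }
    }
}
```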
A document library is for uploading files, not for storing user-submitted data; for that you need to create a list.
1) A Document Set is used in cases where multiple documents share the same properties; it's like putting all these documents in a folder and then giving that folder attributes which are in turn applied to each document inside it.
In your case, if all the files have the same values for the 8 fields, then a document set is the correct way to go.
2) If there is additional metadata associated with the files, it can be added either to the content type (e.g. the document or document set content type) or to the columns of the library itself; you don't need to create a separate list to hold that data. Adding it to the content type ensures consistency across all the document libraries within that site collection, while adding columns to the library affects only that library, as sketched below.
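As an illustration of point 2, a hedged sketch of attaching an existing site column to a content type so the change flows to every library using that content type (the URL, column and content type names are placeholders):

```csharp
using Microsoft.SharePoint;

class AddColumnToContentType
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))   // placeholder URL
        using (SPWeb web = site.OpenWeb())
        {
            SPField column = web.Fields["Project Code"];           // placeholder site column
            SPContentType docType = web.ContentTypes["Document"];  // or your Document Set type

            docType.FieldLinks.Add(new SPFieldLink(column));
            docType.Update(true);   // true = push the change down to child content types
        }
    }
}
```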

What are the differences between Crawler impact rules and crawl rules in SharePoint 2010 search?

A crawler impact rule defines the rate at which the search service requests documents from a Web site during crawling. The rate can be defined either as the number of documents requested simultaneously or as the delay between requests. In the absence of a crawler impact rule, the number of documents requested ranges from 5 through 16, depending on the hardware resources.
You can use crawler impact rules to modify loads placed on sites when you crawl them.
Crawl rules provide you with the ability to set the behavior of the Enterprise Search index engine when you want to crawl content from a particular path. By using these rules, you can:
Prevent content within a particular path from being crawled.
For example, in a scenario in which a content source points to a URL such as http://www.microsoft.com/, but you want to prevent content in the "downloads" subdirectory http://www.microsoft.com/downloads/ from being crawled, you would set up a rule for that URL with the behavior set to exclude content from the subdirectory.
Indicate that a particular path that would otherwise be excluded from the crawl should be crawled.
Using the previous scenario, if the downloads directory contained a subdirectory called "content" that should be included in the crawl, you would create a crawl rule for http://www.microsoft.com/downloads/content with the behavior set to include it, as sketched below.
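A hedged sketch of scripting that include/exclude pair through the search administration object model (SearchContext is the MOSS-era entry point and still works in SharePoint 2010, though it is marked obsolete there). Crawl rules are evaluated in order with the first match winning, so the assumption here is that creating the more specific inclusion rule first keeps it above the broader exclusion:

```csharp
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

class ScriptCrawlRules
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))   // any site in the farm
        {
            Content content = new Content(SearchContext.GetContext(site));

            // First match wins, so the specific inclusion sits above the exclusion.
            content.CrawlRules.Create(CrawlRuleType.InclusionRule,
                "http://www.microsoft.com/downloads/content/*");
            content.CrawlRules.Create(CrawlRuleType.ExclusionRule,
                "http://www.microsoft.com/downloads/*");
        }
    }
}
```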

SharePoint - Obtaining all files from a web

I have a requirement wherein I have to obtain all the files of a web recursively (i.e. traversing the folders and subfolders) and display them to the user through the SharePoint object model.
This has to be security trimmed, in the sense that if the user doesn't have sufficient privileges to view or open a file, that file shouldn't be taken into account. Is it possible to obtain all the files without looping through every document library, folder and subfolder?
Also, I don't want default document libraries such as the Web Part gallery, Master Page gallery, etc. to be listed. Any insights on how to achieve this?
The Content Query Web Part can get you most of the way there. Out of the box you could set it up to show all files (based on a content type or content type category) from a site collection. You could even filter out system files, although getting the filters right might be a little tricky.
If that doesn't get you far enough, you could write a web part that extends the Microsoft.SharePoint.Publishing.WebControls.ContentByQueryWebPart class and override the filters by setting FilterField1, FilterType1, FilterOperator1, FilterValue1, and so on, as sketched below.
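A rough sketch of that override approach; the filter field, field type and value here are assumptions to illustrate the shape, not a drop-in configuration:

```csharp
using System;
using Microsoft.SharePoint.Publishing.WebControls;

public class AllFilesQueryWebPart : ContentByQueryWebPart
{
    protected override void OnLoad(EventArgs e)
    {
        // Hypothetical filter: keep only items whose content type is "Document".
        FilterField1 = "ContentType";                   // field to filter on (assumption)
        FilterType1 = "Text";                           // that field's type (assumption)
        FilterOperator1 = FilterFieldQueryOperator.Eq;
        FilterValue1 = "Document";

        base.OnLoad(e);
    }
}
```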
The security trimming should happen for you through the default behavior of the ContentByQueryWebPart. The Web Part and Master Page galleries will get filtered out by your content type settings, so you shouldn't have to worry about those.
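If the web part route doesn't suit, the cross-list query API underneath it, SPSiteDataQuery, can pull files from every document library in a web and its subwebs in a single call. A sketch (the URL is a placeholder): the query runs as the current user, so results are security trimmed, and restricting it to ServerTemplate 101 (generic document libraries) should keep the built-in galleries out, since they use different template IDs.

```csharp
using System;
using System.Data;
using Microsoft.SharePoint;

class ListAllFiles
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))   // placeholder URL
        using (SPWeb web = site.OpenWeb())
        {
            SPSiteDataQuery query = new SPSiteDataQuery();
            query.Lists = "<Lists ServerTemplate=\"101\" />";   // generic document libraries only
            query.Webs = "<Webs Scope=\"Recursive\" />";        // this web and all subwebs
            query.ViewFields = "<FieldRef Name=\"FileLeafRef\" /><FieldRef Name=\"FileRef\" />";
            query.RowLimit = 1000;

            DataTable files = web.GetSiteData(query);   // security trimmed for the current user
            foreach (DataRow row in files.Rows)
                Console.WriteLine(row["FileRef"]);      // file URL (may carry an "ID;#" prefix)
        }
    }
}
```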

Add Site and Page Description to SharePoint Search Index

As part of a SharePoint solution, the functionality for users to create new web sites and publishing pages (programmatically) via a button click has been added. I need to ensure that the Description field for the newly created sites and pages is indexed by SharePoint Search. What is the best way to do this?
Please note, I am NOT interested in starting a new crawl. I just want to ensure that whenever the next scheduled crawl occurs, the contents of these fields will be searchable.
Thanks, MagicAndi
I'm guessing you mean how can you ensure the site is indexed immediately?
Generally, crawls are scheduled which means your new site will only be added to the search index after the next crawl is done. So if your incremental crawl happens every hour you may have to wait up to an hour for it to appear in the search index.
However, given that your new sites are being added programmatically, you could also programmatically start an incremental crawl if it is vital for them to appear in search results immediately. There are details on how to do this in this article.
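For illustration, a sketch of what that might look like against the search administration object model; the site URL is a placeholder and the content source name below is the MOSS default, which may differ in your farm:

```csharp
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

class KickOffIncrementalCrawl
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://server"))   // placeholder URL
        {
            Content content = new Content(SearchContext.GetContext(site));

            foreach (ContentSource source in content.ContentSources)
            {
                // Default MOSS content source name; adjust for your farm.
                if (source.Name == "Local Office SharePoint Server sites" &&
                    source.CrawlStatus == CrawlStatus.Idle)
                {
                    source.StartIncrementalCrawl();
                }
            }
        }
    }
}
```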
Update:
The site title and description should be indexed automatically by the next crawl. If this isn't happening, you don't have a content source that covers that site, so you need to create or update one to cover the new sites and make sure it has a crawl schedule. If the new sites are created in separate site collections, consider putting them on a managed path.
In our SharePoint system we have a terabyte of data across 100,000 site collections, with probably 20 new site collections added every day. We have only one content source, pointing to the root of the site, and everything gets indexed automatically.
It sounds like you're missing a content source or a crawl schedule.
It turns out that the site description is included in the crawl by default. I tested the search default properties by creating a new site and assigning a unique text string to the description. After the next incremental crawl, I was able to search and find the unique string via the default SharePoint search.
I have not yet tested if the page description is included in the search scope by default, but I'm prepared to guess that it is. I will update my answer as soon as I get a chance to test this.
