[UPDATE] With help from Kentico, I determined the cause of the problem: the site could not be accessed from within the web server itself. Once I corrected that, the page crawler indexed the content.
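In case it helps anyone else, a quick way to confirm this symptom is to check whether the crawler's start URL actually responds from the web server itself. A minimal console sketch (the URL is a placeholder for whatever domain you put in the Crawler settings):

    using System;
    using System.Net;

    // Minimal sketch: run this on the web server itself to verify the site
    // responds to the same URL the crawler is configured to use.
    class CrawlerUrlCheck
    {
        static void Main()
        {
            // Placeholder URL - use the domain from your Crawler settings.
            var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/");
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Status: " + response.StatusCode);
            }
        }
    }

If this fails on the server (DNS, bindings, loopback restrictions), the crawler cannot fetch page content even though the rebuild appears to run.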
[ORIGINAL POST]
Problem with Kentico's Smart Search page crawler indexing.
The Smart Search page crawler indexing for our production site is not working. Automatic or manual rebuilds, it doesn't matter - the rebuild runs, but we get no search results because nothing is being indexed. However, it works exactly as expected on our development site. Smart Search, the content, and all Kentico settings are configured identically in both sites. The web.config file is the same except for the database connection string.
When I run the page crawler in production, all the pages are crawled but none of the keywords are indexed. I verified this using the Luke tool. The index files are generated in the App_Data folder.
Other information:
Site is not running on Azure.
Event logs do not show any errors after rebuilding index.
We are using a Global Administrator account in the Crawler settings.
The domain is provided in the Crawler settings.
I tried deleting the index files and rebuilding them, but there was no difference.
Analyzer type is Simple with Stemming.
No batch size is set.
Robots.txt is the same in both dev and prod.
Is there any configuration in IIS or something that might be preventing the page crawler indexer from working? If it means anything, the dev site is not on the same machine or the same network as the production site.
Did you try changing the domain to localhost:80 in the Crawler settings?
Did you create the index directly on the production site? If not and you synced it from dev, try configuring a new index on the production box. Also, check that Settings --> System --> Search --> Enable Smart Search Indexing is enabled on the production site.
I have a big dilemma and I need help.
Basically, we have a Sitecore web app that is our main web service. Currently my app works alongside the main app via static .html pages (it behaves as an SPA: JavaScript calls the backend for the HTML content it needs).
But the database I work with keeps growing, and to make certain elements reachable by URL I need to create 70,000+ static files. These static files are also needed for Google indexing, so we can advertise our products. Whenever new metadata is needed or a new item is added, I have to run a separate program that regenerates all the static files from a txt file listing every item. We also have two backup servers hosting our Sitecore web app, so that's 70k+ files for 9 languages across 3 web servers. It takes a day to recreate everything...
That's why I decided to build a clean MVC SPA application, and it works great. But...
I can't add my MVC application (or anything except .html files) to the current Sitecore main app.
The question is: how can this be done without losing Google indexing and without changing the main domain?
For example we have now:
www.ourdomain.com/foldername/mystaticfile.html
What I want:
www.ourdomain.com/mynewmvcapplication
Sitecore has a setting called IgnoreUrlPrefixes. You can add /mynewmvcapplication to this setting, and Sitecore will then ignore that path as well as anything under it. Here is a good article which shows you how to update this setting without editing Sitecore's config files directly.
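For illustration, a patch include file along these lines should work (the file name is arbitrary, and you need to copy the existing pipe-delimited default value from web.config before appending your prefix):

    <!-- App_Config/Include/zIgnoreMvcApp.config - hypothetical patch file -->
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <settings>
          <setting name="IgnoreUrlPrefixes">
            <!-- Keep the existing defaults and append |/mynewmvcapplication -->
            <patch:attribute name="value">[existing default prefixes]|/mynewmvcapplication</patch:attribute>
          </setting>
        </settings>
      </sitecore>
    </configuration>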
Take a look at the Sitecore Redirect Manager on the Sitecore Marketplace. It has the capability to create your custom URLs while keeping your search engine rating.
https://marketplace.sitecore.net/en/Modules/Sitecore_Redirect_Manager.aspx
Otherwise, you can check out a custom LinkProvider and a custom item resolver. This will need more coding than the previous option. A Google search with those keywords brings back many results.
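To give an idea of the shape of the LinkProvider route, here is a minimal sketch (the content folder path and URL prefix are assumptions for illustration, not your actual names):

    using Sitecore.Data.Items;
    using Sitecore.Links;

    // Hypothetical sketch: send items that live under a particular content
    // folder to the new MVC application's URL space, and fall back to the
    // default behavior for everything else.
    public class MvcAwareLinkProvider : LinkProvider
    {
        public override string GetItemUrl(Item item, UrlOptions options)
        {
            // "foldername" and "/mynewmvcapplication" are placeholders.
            if (item.Paths.FullPath.StartsWith("/sitecore/content/foldername"))
            {
                return "/mynewmvcapplication/" + item.Name;
            }
            return base.GetItemUrl(item, options);
        }
    }

You would register a class like this under the linkManager providers section in web.config, ideally via a patch file as in the previous answer.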
Best wishes.
I have site-level search in my SharePoint site and it's not working. I have done a full crawl of the database and it's still not working. I even configured the search services again. Still nothing, whereas item-level search is working.
You have to perform the following steps:
- Make sure the search account has "full read" permissions on the web applications.
- Enter your content sources and verify they are being crawled.
- Check the crawl error log and confirm there are no "Top Level Errors".
- Make sure the search managed account has "SPSearchDBAdmin" rights in all 4 of the search databases.
- Perform an "index reset" and then a full crawl (see the sketch after this list).
- Do not disable the loopback check; according to my research, this should not be done on production servers.
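If you'd rather kick the full crawl off programmatically while you watch the crawl log, here is a rough sketch against the search administration object model (the site URL is a placeholder, and this assumes it runs on a farm server under an account with search admin rights):

    using System;
    using Microsoft.Office.Server.Search.Administration;
    using Microsoft.SharePoint;

    // Hedged sketch: start a full crawl of every content source.
    class StartFullCrawls
    {
        static void Main()
        {
            using (SPSite site = new SPSite("http://yoursharepointsite")) // placeholder URL
            {
                SearchContext context = SearchContext.GetContext(site);
                Content content = new Content(context);
                foreach (ContentSource source in content.ContentSources)
                {
                    Console.WriteLine("Starting full crawl of: " + source.Name);
                    source.StartFullCrawl();
                }
            }
        }
    }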
I've got an Umbraco site which I deploy to an Azure web app service. The data is in an Azure SQL database. I have been able to deploy this successfully, and I can verify that all the data I expect is present in the content view.
However, I have added content to various pages in rich text editors, and on my local site I can see this content rendered on the site. But on my deployed site, the rich text editor content is only visible in the content view, not on the site itself. I've tried publishing each item, but nothing appears.
What else can I try?
Umbraco needs some additional configuration to behave properly on Azure. This especially affects the indexes and the XML cache file.
Please check the following blog post by one of the Umbraco HQ core developers, Sebastiaan Janssen: https://cultiv.nl/blog/making-sure-your-umbraco-site-performs-on-azure/. Go through it step by step to ensure your app is properly configured.
Going further, you may also need to ensure proper configuration for load balancing, which you can find here: https://our.umbraco.org/documentation/getting-started/setup/server-setup/load-balancing/flexible
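As one example of the kind of setting involved (from that post, if I recall it correctly - do verify against the article for your Umbraco version), Umbraco can be told to keep its content XML cache file in local temp storage, which matters on Azure:

    <!-- web.config appSettings; key name as I recall it from the linked post -->
    <add key="umbracoContentXMLUseLocalTemp" value="true" />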
I found the answer after much experimenting.
I had not manually included Views/Partials/Grid/fanoe.cshtml in my project (and thus was not deploying it). This file contains the grid rendering markup, and I guess I was using some default template which uses this file rather than the other grid templates in the same folder.
For some reason, search in my SharePoint site does not work.
I have set up the SSP, the scopes, the crawls - everything - but it still does not work.
Can someone explain to me how to set up the search? Maybe I did something wrong in the process.
It's not the simplest thing in the world to set up, as it comprises a number of components.
You need to check each one to determine where your problem is.
Start from the crawl, and work your way forward to the search results produced on the page.
So check the following:
Check that servers have been set up to index pages. (You can see this under Services on Server in the Central Administration pages.)
Make sure they're all running correctly, not stuck in a half-started state.
Check the crawl log in your SSP to see if it is indexing anything.
Index different types of content - file shares, websites, and SharePoint itself - and check each one.
(Note: you need a special plugin to index PDFs.)
Check that your index is copied to the front-end server where it is used.
If it's not, it may be because this hasn't been configured (check Services on Server again).
Then check your site collection setup, and ensure you have a search site configured.
Ensure the site collection search details are configured to use the search site.
Finally check the user doing the searching actually has access to the content being indexed.
Doing all of that should give you some idea of where the problem is.
In addition to Bravax's answer, it's worth checking that you are not getting stung by the local loopback check.
I had a similar problem and ended up using Search Server Express, which is free (see my answer at this link: sharepoint 2010 foundation search not working).
I have installed Search Server Express 2010 on top of SPF and it works great. It has additional features and works well with SharePoint Foundation. Here is a link for upgrade and configuration: http://www.mssharepointtips.com/tip.asp?id=1086
You need to add the website to a content source and then run a full crawl to index the data.
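A rough sketch of doing that through the search administration object model (the URLs and the content source name are placeholders, and this assumes it runs on a farm server with admin rights):

    using System;
    using Microsoft.Office.Server.Search.Administration;
    using Microsoft.SharePoint;

    // Hedged sketch: create a web content source, add the site's address,
    // and start a full crawl of it.
    class AddSiteAndCrawl
    {
        static void Main()
        {
            using (SPSite site = new SPSite("http://yourportal")) // placeholder URL
            {
                SearchContext context = SearchContext.GetContext(site);
                Content content = new Content(context);
                WebContentSource source = (WebContentSource)content.ContentSources
                    .Create(typeof(WebContentSource), "My website"); // placeholder name
                source.StartAddresses.Add(new Uri("http://www.yoursite.com")); // placeholder
                source.Update();
                source.StartFullCrawl();
            }
        }
    }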
I'm trying to get crawling to work on two separate farms I have, but I can't get it to work on either one. They both have two WFEs, with an additional WFE configured as an Index server. There is one more server dedicated to Query, and two clustered SQL 2005 back-end servers for the database. I have unsuccessfully tried solutions from at least 50 different websites found through a search engine. I have configured (extended) my Web App to use http://servername:12345 as the default zone and http://abc.companyname.com as the custom and intranet zones. When I enter each of those into the content source and then try to run a crawl, I get a couple of errors in the crawl log:
http://servername:12345 returns:
"Could not connect to the server. Please make sure the site is accessible."
http://abc.companyname.com returns:
"Deleted by the gatherer. (The start address or content source that contained this item was deleted and hence this item was deleted.)"
However, I can click both URLs and the page is accessible.
Any ideas?
More info:
I wiped the slate clean, so to speak, and ran another crawl to provide an updated sample.
My content sources are as such:
http://servername:33333
http://sharepoint.portal.fake.com
sps3://servername:33333
My current crawl log errors are:
sps3://servername:33333
Error in PortalCrawl Web Service.
http://servername:33333/mysites
Content for this URL is excluded by the server because a no-index attribute.
http://servername:33333/mysites
Crawled
sts3://servername:33333/contentdbid={62a647a...
Crawled
sts3://servername:33333
Crawled
http://servername:33333
Crawled
http://sharepoint.portal.fake.com
The Crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly.
I double-checked for typos above and I don't see any, so this should be an accurate reflection.
One thing to remember is that crawling SharePoint sites is different from crawling file shares or non-SharePoint websites.
A few other quick pointers:
The sps3: protocol is for crawling user profiles for People Search. You can disregard anything the crawler says about it until you're ready for user profiles.
Your crawl account is supposed to have access to your entire farm. If you see permissions errors, find the KB article that tells you how to reset your crawl account (it's a specific stsadm.exe command). If you're trying to crawl another farm's content, then you'll have to work something else out to grant your crawl account access. I think this is your biggest issue presently.
The crawler (running from the index server) will attempt to visit the public URL. I've had inter-server communication issues before; make sure all three servers can ping each other, and make sure the index server can reach the public URL (open IE on the index server and check it out). If you have problems, it's time to dirty up your index server's hosts file. This is something SharePoint does for you anyway, so don't feel too bad doing it. If you've set up anything aside from Integrated Windows Authentication, you'll have to work harder to get your crawler working.
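For example, a hosts entry on the index server might look like this (the IP is a placeholder for whichever web server should answer for the public URL):

    # C:\Windows\System32\drivers\etc\hosts on the index server
    10.0.0.21    sharepoint.portal.fake.com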
Anyway, there's been a lot of back and forth in the responses, so I'm just shotgunning a bunch of suggestions out there, maybe one of them is on target.
I'm a little confused about your farm topology. A machine installed as just a WFE cannot be an indexer. A machine installed as "complete" can be an indexer, a query server, and/or a WFE...
Also, rather than changing the default content access account, you may want to add a crawl rule instead (once everything is up and running).
Can you see if anything helpful is in the %commonprogramfiles%/microsoft shared/web server extensions/12/logs on your indexer?
The log file may be a bit verbose; you can search for "started" or "full", which will usually get you to the line in the log where your crawl started.
Also, on your SQL machine, you may be able to get more information from the MSScrawlurlhistory table.
Can you create a content source for http://www.cnn.com and start a full crawl? Do you get the same error(s)?
Also, we may want to take this offline; let me know if you want to do that.
I'm not sure if there is a way to send private messages via stackoverflow though.
Most of your issues sound like they are related to Kerberos. If you don't have the Infrastructure Update applied, SharePoint will not be able to use Kerberos auth to websites on non-default ports (anything other than 80/443). That's also why, I would bet, you cannot access Central Admin from server 5 when it's hosted on server 4. If you don't have the SPNs set up correctly, CA will only be accessible from the machine it is installed on. If you had installed SharePoint using port 80 as the default URL, you'd be able to do the local SharePoint sites crawl without any hitches; by design, that crawl uses the default URL to access the SharePoint sites. Check out http://codefrob.spaces.live.com/blog/cns!7C69E7B2271B08F6!363.entry for a little more detail on how to get Kerberos and SharePoint to work well together.
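For reference, registering the SPNs for a site on a non-default port usually looks something like the following (the domain, account, and server names are placeholders for your app pool identity and hosts):

    setspn -A HTTP/servername:12345 DOMAIN\apppoolaccount
    setspn -A HTTP/servername.company.com:12345 DOMAIN\apppoolaccount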
In the Services on Server section check the properties for the search crawl account to make sure it is set up, and that it has permissions to access those sites.
Thanks for the new input!
So I came back from my weekend intending to go through your pointers, try each one, and report back with the results. A funny thing happened, though.
I went to my Indexer (servername5) and tried to connect to Central Admin and the main portal from Internet Explorer. Neither worked. So I went into IIS on the Indexer to try to browse to the main portal from within IIS. That didn't work either, and I received an error telling me that something else was using that port. Then I saw my old website from the previous build, and I deleted it from IIS along with the corresponding Application Pool. Then I started the App Pool for the website from the new build and browsed to it. Success. Then I browsed to the website from the browser on my own PC. Success again. Then I ran a crawl by the full URL, not the server name, like so:
http://sharepoint.portal.fake.com
Success again. It crawled the entire portal including the subsites just like I wanted. The "Items in index" populated quickly and I could tell I was rolling.
I still cannot access the Central Admin site hosted on servername4 from servername5. I'm not sure why not but I don't know that it matters much at this point.
Where does this leave me? What was the fix?
I'm still not sure. Maybe it was the rebuild. Maybe as soon as I rebuilt the server farm I had everything I needed to get it to work, but it just wouldn't work because of the previous website still sitting in IIS. (It's funny how sloppy a SharePoint uninstall can be. Manual deletion of content databases, websites, and application pools seems necessary, and that probably shouldn't be the case.)
In any event, it's working now on my "test" farm so the key is to get it working on the production farm. I'm hopeful that it won't be so difficult after this experience.
Thanks for the help from everyone!