Lucene.NET and external sites - search

We have built a web site which employs Lucene.NET for search. We recently integrated another web site so that, from a user's perspective, both websites appear to be just one site (we share the master pages, etc.).
The problem is that the two web sites are hosted in different locations, so when Lucene.NET crawls the first web site, it does not pick up the content of the second one. We want to extract the content from the second web site and put it in the same index file that is built for the first site.
How can I get Lucene.NET to crawl an external site too?
Thanks

If you have file-system access to the second system, then you can just index it by providing the path. If not, you will need to write a crawler: you can start with something basic using HttpWebRequest, or get fancier by using tools that recursively crawl a site by following its links.
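Below is a minimal sketch of such a basic crawler. It is written in Python for brevity rather than C#. The `fetch` callable stands in for whatever HTTP client you use (HttpWebRequest in .NET). The crawl follows `<a href>` links breadth-first, stays on the start URL's host, and stops after `max_pages`. The names `LinkExtractor`, `crawl` and `max_pages` are illustrative only and are not part of Lucene.NET.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments so each page is seen once.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 500) -> Dict[str, str]:
    """Breadth-first crawl that stays on the start URL's host.

    `fetch` returns the HTML for a URL, or None if it could not be retrieved.
    Returns a mapping of URL -> HTML, ready to hand to an indexer.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Only follow http(s) links on the same host that have not been queued yet.
            parsed = urlparse(link)
            if parsed.scheme in ("http", "https") and parsed.netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In Python, a real `fetch` could wrap `urllib.request.urlopen`. Each returned page would still need to be turned into a Lucene.NET document and added with the same IndexWriter used for the first site. That step is not shown here.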

Related

Sitecore + MVC application: proper redirection without losing Google indexing (sitecoredomain.com/mvcapp)

I have a big dilemma and I need help.
Basically, we have a Sitecore web app, which is our main web service. Currently my app works with the main app via static .html pages (it works as an SPA: JS calls the backend for the needed HTML content).
But the database I work with keeps growing, and to make certain elements accessible by URL I need to create 70,000+ static files. These static files are also needed for Google indexing, so we can advertise our products. Whenever new metadata is needed or a new item is added, I have to run a separate program that regenerates all the static files from a .txt file listing every item. We also have 2 backup servers hosting our Sitecore site. So that is 70k+ files for 9 languages on 3 web servers, and it takes a day to recreate everything...
That's why I decided to build a clean MVC SPA application, and it works great. But...
I can't add my MVC application, or anything except .html files, to the current Sitecore main app.
So the question is: how can this be done without losing Google indexing and without changing the main domain?
For example, we now have:
www.ourdomain.com/foldername/mystaticfile.html
What I want:
www.ourdomain.com/mynewmvcapplication
Sitecore has a setting called IgnoreUrlPrefixes. You can add mynewmvcapplication to this setting; Sitecore will then ignore that path and anything under it. Here is a good article which shows how to update this setting without editing Sitecore's config files.
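Here is a rough illustration of what the setting does. In Sitecore.config its value is a pipe-separated list of path prefixes. A request whose path starts with one of them is left to ASP.NET instead of being resolved by Sitecore. The Python below is a simplified model under that assumption, not Sitecore's own code. Both the case-insensitive comparison and the sample value are assumptions.

```python
from typing import List


def parse_prefixes(setting_value: str) -> List[str]:
    """Split a pipe-separated prefix list, dropping blank entries and whitespace."""
    return [p.strip() for p in setting_value.split("|") if p.strip()]


def is_ignored(request_path: str, setting_value: str) -> bool:
    """Return True if the request path starts with any configured prefix.

    Simplified model: a case-insensitive startswith check (an assumption,
    not verified against Sitecore's implementation).
    """
    path = request_path.lower()
    return any(path.startswith(prefix.lower()) for prefix in parse_prefixes(setting_value))
```

In this model, a bare prefix also matches longer paths such as `/mynewmvcapplication-old`, so a distinctive folder name avoids accidental overlaps.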
Take a look at the Sitecore Redirect Manager module on the Sitecore Marketplace. It can create custom URLs while keeping your search engine ranking.
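Whichever tool creates them, the redirects that preserve indexing are permanent ones (HTTP 301), which tell search engines that the old URL has moved for good. Below is a minimal sketch of the mapping. It assumes that the old pages all live under `/foldername/` and that each maps to a route of the same name under `/mynewmvcapplication/`. Both paths come from the question's example, and the mapping rule itself is hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical URL scheme: old static pages under /foldername/,
# new MVC routes under /mynewmvcapplication/ with the same slug.
OLD_PAGE = re.compile(r"^/foldername/(?P<slug>[^/]+)\.html$", re.IGNORECASE)


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) for an old static URL, or None if no redirect applies."""
    match = OLD_PAGE.match(path)
    if match is None:
        return None
    # 301 = Moved Permanently, so crawlers update their index to the new URL.
    return 301, "/mynewmvcapplication/" + match.group("slug")
```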
https://marketplace.sitecore.net/en/Modules/Sitecore_Redirect_Manager.aspx
Otherwise, you can look at a custom Link Provider and a custom Item Resolver. This needs more coding than the previous option. A Google search with those keywords brings back many results.
Best wishes.

Google search engine inside my Drupal site without showing the content worldwide?

I have a site that I'm developing, and I want to put content and links there (some of them are my private home links and stuff). Drupal search is good, but it's not accurate, and it also shows me only things from my site... I want to search everything, including my website, but not let Google publish it worldwide... Is that possible?
It's not possible to have Google index your site without making it accessible to the web. However, you can set up a local Solr server and use the Apache Solr Search Integration module to index your site, which is much better than the normal Drupal search. That would still only provide search for your own site, though. I'm not sure what you mean by "only shows me things from my site."
If you are not able to host a Solr server, an alternative is Acquia's hosted Solr search. If the pricing is right for you, it's a quick and easy way to get Solr search on your site.
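For reference, a Solr server is queried over HTTP through its `/select` handler, and the Drupal module builds these requests for you. The sketch below only assembles such a URL. The core name `drupal` is a hypothetical placeholder, and 8983 is Solr's default port.

```python
from urllib.parse import urlencode


def solr_select_url(query: str,
                    base_url: str = "http://localhost:8983/solr",
                    core: str = "drupal",  # hypothetical core name
                    rows: int = 10) -> str:
    """Build a Solr /select URL that asks for up to `rows` results as JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"
```

If Solr is kept bound to localhost or behind a firewall, the index itself is not reachable from the web, which fits the goal of keeping the content private.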

Sharepoint Site using PHP code

I realise that SharePoint is ASP.NET-based, but I have a PHP application that a user wants me to include in a SharePoint site.
So is it possible to use PHP code inside SharePoint?
danit, when you say 'include' in the site, do you mean it should be part of the site chrome (inside the same menus, etc.), or would simply linking to it suffice?
I ask because you can run PHP code under an IIS site, but that would only keep the pages separate inside the same virtual host. If you need to actually join the functionality, you can:
Fake it with an iFrame (a Page Viewer Web Part pointing to the PHP site, for example).
Use some interop, such as web services.
It really comes down to what you want to do, not whether it is possible.
This is also possible by creating a web part page and adding a Page Viewer Web Part. You can link to the page, making it visible within the SharePoint site, but it won't offer any interaction with SharePoint itself.
SharePoint treats the page as a shared document, however, so you can restrict access to it. I have done this to offer access to 'view only' content such as reports, but you can also use it for custom PHP apps that rely on a database other than the one(s) SharePoint is using.

WSS Search - Content inside webparts

How would you go about having WSS search index content that's inside a webpart/pulled from an external source and presented in a SPGridView?
You probably already know this, but if you go to Site Settings -> Search Visibility you will see a section stating:
This site contains fine-grained permissions. Specify the sites ASPX page indexing behavior: ...
If you choose "Always index all ASPX pages on this site" it should index the content in your web part, but only as the crawler sees it, so security trimming would not apply. It is basically a web crawl and not a SharePoint content crawl.
I know you said WSS, but in MOSS you might be able to take this one step further if the above did not work out and use a web site search on your SharePoint site. I have done plenty of web site searches (it does not work perfectly), but have not tried to explicitly do a web site search on a SharePoint site so I'm not certain this will work.
Lars (who co-wrote "Inside the Index and Search Engines: MOSS 2007") is pretty active on StackOverflow so maybe he'll chime in.
Also in MOSS: make the data you are displaying available elsewhere as well, for instance as a web service or a page in a different site. In MOSS you can then add Federated Search locations to be included in the search results.

How to provision a custom page without using the _layouts directory?

I need to provision a custom aspx page which does some work and then redirects to another page. Using a _layouts page, AKA an application page won't work since I only want this page accessible to one site collection.
I looked at using pattern #4 from the blog post Application Development on MOSS 2007 and WSS V3. It feels pretty hacky: it asks you to drop the DLL into the bin of the site collection and upload the file through SP Designer.
I'd rather have this page be a feature that gets included in my site definition or stapled to an existing site definition. I imagine I could use a feature receiver to deploy the files to the Pages list. One of the comments on the blog post says as much:
"Pardon my ignorance on this maybe I am missing the point completely but wouldnt it be easier to deploy your custom pages by programmatically adding them to the pages splist? I basically, had a simple .aspx page with a user control. I deployed it via this method."
How can a custom page be provisioned without using a _layouts page?
I guess another option is to keep using a _layouts page, but make sure the referrer is correct.
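Below is a sketch of that referrer check, written in Python rather than C# for brevity. The host and site-collection path are hypothetical. The browser supplies the Referer header, and it can be missing or forged. The check therefore only guards against accidental navigation and does not stop deliberate access.

```python
from typing import Optional
from urllib.parse import urlparse


def referrer_is_allowed(referer: Optional[str], allowed_host: str, allowed_path_prefix: str) -> bool:
    """Check that the Referer header points into the intended site collection."""
    if not referer:
        return False  # header missing: browsers and proxies may omit it
    parsed = urlparse(referer)
    path = parsed.path.lower()
    prefix = allowed_path_prefix.lower().rstrip("/")
    same_host = (parsed.hostname or "").lower() == allowed_host.lower()
    # Match the prefix exactly or as a whole path segment, so /sites/teamA does not match /sites/teamAB.
    return same_host and (path == prefix or path.startswith(prefix + "/"))
```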
Besides application pages, you also have the possibility to create site pages.
Site pages are similar in nature to application pages, but they reside in a site, not in _LAYOUTS.
The welcome page (default.aspx) is an example of such a page.
As with application pages, it is possible to use code-behind in these pages (check out AC's article on this subject, Using ASP.NET 2.0 Code Behind Files in SharePoint v3 Sites).
You can read this article regarding the subtle differences between application pages and site pages: SharePoint Application and Site Pages - Part 1 of 2
You deploy this custom page using a feature, where you specify the file as ghostable (see the example in AC's article).
