I am using Nutch and Solr on Ubuntu. I would like to use PHP to query the database (or some other method) to return an array of links from indexed pages that point to a particular URL or domain. Please point me in the right direction.
I used this tutorial to set up the spider: http://nlp.solutions.asia/?p=180
I would also like to note that my preference is for a PHP-based option, or an API interface to the Nutch or Solr application via PHP cURL or the command-line interface.
Thanks
I noticed that inlinks are stored inside the database, so executing the following query gets the indexed inbound links for the creativecommons.org domain, once the inlinks are parsed out of each returned row:
SELECT *
FROM webpage
WHERE inlinks LIKE '%creativecommons.org%'
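On the PHP side, a minimal sketch using PDO might look like the following (this assumes a Nutch 2.x webpage table in a MySQL-backed store, as in the query above; the DSN, credentials and the regex used to pull URLs out of the inlinks column are assumptions to adapt to your setup):

<?php
// Sketch: find rows whose inlinks mention a given domain, then extract
// the matching inlink URLs from each row.
$pdo = new PDO('mysql:host=localhost;dbname=nutch', 'user', 'pass');

$domain = 'creativecommons.org';
$stmt = $pdo->prepare("SELECT inlinks FROM webpage WHERE inlinks LIKE :pattern");
$stmt->execute(array(':pattern' => '%' . $domain . '%'));

$links = array();
foreach ($stmt as $row) {
    // The inlinks column is a serialized blob; grab anything that looks
    // like a URL containing the domain we are interested in.
    if (preg_match_all('#https?://[^\s"\']*' . preg_quote($domain, '#') . '[^\s"\']*#', $row['inlinks'], $matches)) {
        $links = array_merge($links, $matches[0]);
    }
}

print_r(array_unique($links));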
I'm working with a client to improve their site search results through the Magento search functionality. We have set up redirects for the top searched terms. My question is: how am I able to track conversions/revenue for these terms now that I no longer have search query parameters on the URL?
The client wants to be able to see the effect these search changes have on conversion rate/revenue, but I can't seem to figure out how to set this up in GA, and Magento doesn't seem to have a report that provides this data. Any help is appreciated.
Magento 1.7.0.2
You can try two different approaches:
Either get the search term from the URL rewrite table using your current URL.
Or use session variables: set the search term in the session and read it back at the required place (see the sketch below).
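A rough sketch of both ideas, in Magento 1.x terms (loadByRequestPath and the core session setters/getters are standard Magento 1 calls, but the surrounding context, the 'q' parameter handling and the assumption that your rewrite's target path still carries the original query are mine):

<?php
// Approach 1: look the current request path up in the URL rewrite table
// and recover the search term from the rewrite's target path.
$requestPath = trim(Mage::app()->getRequest()->getPathInfo(), '/');
$rewrite = Mage::getModel('core/url_rewrite')->loadByRequestPath($requestPath);
if ($rewrite->getId()) {
    // Assumption: the target path still looks like "catalogsearch/result/?q=term".
    parse_str((string) parse_url($rewrite->getTargetPath(), PHP_URL_QUERY), $params);
    $searchTerm = isset($params['q']) ? $params['q'] : null;
}

// Approach 2: store the term in the session at search/redirect time ...
$query = Mage::app()->getRequest()->getParam('q'); // Magento's default search parameter
Mage::getSingleton('core/session')->setLastSearchTerm($query);

// ... and read it back wherever you need it (e.g. to push it into GA).
$searchTerm = Mage::getSingleton('core/session')->getLastSearchTerm();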
Hope this will help you!
Is there a way to create / process friendly URLs in Liferay like this?
http://myserver.com/JonDoe
... where John Doe is the name of a client whose data should be displayed.
A little more detail:
I am not talking about getting rid of the "web" or "group" part of friendly URLs; I am talking about having a friendly URL right after the first "/".
We want to create URLs in the form http://server/ClientName, where ClientName resolves to the name of a client. This is an issue since normally Liferay would expect a friendly URL after the first "/", so we need to intercept that somehow.
The process should be like this (pseudo code):
1) Inspect the value after the first "/".
2) If the value after "/" is the name of a client, send the user to the client display page and display the client's information.
3) If there is no client with the given name, interpret it as a friendly URL and fall back to normal Liferay behaviour.
Is there a way to do this in Liferay?
Sounds like you want to get rid of the /web/ or /group/ parts of the URLs? This is possible with proper configuration of the virtual host: you'll map the site to the domain name, and then you have total freedom to name the pages, even hierarchically (e.g. /JonDoe/home).
So far this is simple configuration. If you want /JonDoe to point to a different site than /JoeShmoe (e.g. to just get rid of /web/ or /group/), you'll have to dig deeper and write quite a few customization plugins that change the name resolution (and the generation of URLs).
If you want to have one URL for a page, you can just set the friendly URL for that page (see Olaf's remark about virtual hosts as well).
If you want to have a limited set of URLs for one page, you can create a page of type "Link to Page" for each URL and select the original page. To identify the current URL when rendering your portlet, you can use PortalUtil.getCurrentURL(renderRequest).
If you want to have many URLs for one page you could use a FriendlyUrlMapper, which allows URLs like http://myserver.com/page/-/myPortlet/JonDoe.
If you want to have many "root" URLs (i mean without the /page/-/myPortlet part), you will have to create an Liferay EXT plugin, extend com.liferay.portal.util.PortalImpl and overwrite getPortletFriendlyURLMapperLayoutQueryStringComposite. I've done the same by implementing a strategy that checks if a page exists for a specific given URL and otherwise uses the URL as parameter for a FriendlyURLMapper.
I've configured my IIS (ASP.NET site) to use URL Rewrite.
In particular, this is my rule (a dynamic one): any URL in the format number/string is redirected to a special .aspx page.
So any URL that starts with mysite/id/Name is redirected to showprof.aspx?id=id&title=Name. This works perfectly.
My question is about search engines. I don't have any "fixed" page that contains links like mysite/id/Name that a spider can scan, so I'm trying to figure out how search engines could index my dynamic pages. Should I create a sitemap.xml? If yes, in which way? Or should I create a "hidden" page that contains every link to all my dynamic content, like mysite/id1/Name1, mysite/id2/Name2, and so on?
thank you
A starting point is definitely a sitemap.xml. You could try, for example, the IIS SEO Toolkit and see if it is able to index any of your pages: http://www.iis.net/downloads/microsoft/search-engine-optimization-toolkit
It also has functionality to generate a sitemap.xml, although I'm guessing in your case you probably have some dynamic content, so a better approach would be to have a "handler" that generates it dynamically on demand (maybe cache it for performance reasons).
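For reference, the sitemap format itself is tiny; whatever generates it dynamically only has to emit something like this (the URLs below are placeholders following the question's id/Name pattern):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mysite/1/Name1</loc>
    <lastmod>2013-06-01</lastmod>
  </url>
  <url>
    <loc>http://mysite/2/Name2</loc>
  </url>
</urlset>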
I would also recommend having some pages that are actually accessible through normal links. For example, your home page could link to a "site map" page (not sitemap.xml) where you render the set of links that you want indexed (at least the ones that are most important to you); that will make them easy to discover.
I am developing a site on CodeIgniter 2.0.2. It's a site where companies/users can sign up and create their own profile page, have their own custom URL (like http://facebook.com/demouser), have their own feedback system, and display their services.
That said, I have been successful in displaying the profile page in the following format:
http://mainwebsite.com/company/profile/samplecompany
This displays the home page for the company samplecompany, where company is the controller and profile is the method.
Now I have a few questions.
I guess it is possible to get http://mainwebsite.com/samplecompany using .htaccess and a default controller. If anybody can help with the .htaccess rule, that would be awesome. I am already using .htaccess to remove index.php from CI but could not get this working.
There will be a few other pages for a given user/company, such as feedback, contact us, services, etc. So the implementation links that come to my mind are of the form:
http://mainwebsite.com/company/profile/samplecompany/feedback or http://mainwebsite.com/samplecompany/feedback
http://mainwebsite.com/company/profile/samplecompany/services or http://mainwebsite.com/samplecompany/services
http://mainwebsite.com/company/profile/samplecompany/contactus or http://mainwebsite.com/samplecompany/contactus
where samplecompany is the dynamic part.
Is it possible to create site links in that format?
I understand that, using an A record for a given domain, I can point a domain, say http://www.samplecompany.com, to http://mainwebsite.com/company/profile/samplecompany, so that typing http://www.samplecompany.com takes the user to http://mainwebsite.com/company/profile/samplecompany. If this is successfully implemented, will
http://www.samplecompany.com/feedback
http://www.samplecompany.com/services
http://www.samplecompany.com/contactus
work correctly?
You can accomplish this using routes. For example, in your /config/routes.php file, put this:
$route['samplecompany'] = "company/profile/samplecompany";
$route['samplecompany/(:any)'] = "company/profile/samplecompany/$1";
The first rule tells CodeIgniter that when someone accesses http://mainwebsite.com/samplecompany that it should process it as if the URL were "company/profile/samplecompany". The second rule captures anything that comes in the URI string after "samplecompany" and appends it onto the end.
However, if you have multiple companies (not just samplecompany), you're probably going to want to extend CI's router to support this, unless you want to manually edit the config file each time a new company is added.
OK, you're definitely going to want to handle dynamic company names (as per your comment). This is a little trickier. I can't give you the full code, but I can point you in the right direction.
You'll want to extend CI's router and, on each incoming request, query the DB for the list of company names. If the URI segment matches a company name, serve it up using your company/profile method. If it does not, ignore it and let CI handle it normally. Check out this post on this topic for more information: forum link.
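As a rough, simplified alternative to a full router extension (a sketch under the assumption that every first URI segment not claimed by an earlier route is a company slug; company_model and get_by_slug are hypothetical names):

// application/config/routes.php
// NOTE: routes for your other controllers must be declared ABOVE these
// catch-alls, otherwise every first segment is treated as a company slug.
$route['([a-zA-Z0-9_-]+)'] = "company/profile/$1";
$route['([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)'] = "company/profile/$1/$2";

// application/controllers/company.php
class Company extends CI_Controller {

    // Handles /samplecompany as well as /samplecompany/feedback etc.
    public function profile($slug, $page = 'index')
    {
        $this->load->model('company_model');            // hypothetical model
        $company = $this->company_model->get_by_slug($slug);

        if ($company === NULL)
        {
            show_404();                                 // unknown slug: 404
            return;
        }

        // Render e.g. views/company/index.php or views/company/feedback.php
        $this->load->view('company/' . $page, array('company' => $company));
    }
}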
Here's a great guide on how to achieve what you need: Codeigniter Vanity URL's.
I have been trying to create a new Google Custom Search engine, but when I try some queries, the search engine is not giving me the expected search results. On some queries it works fine, but on other queries it says "no result".
I tried adding the URL of the website that I wanted to search, but there are certain pages and keywords that do not come up in the search results when I search for the keywords of those pages.
I tried adding both the main page URL and the URL of the sub-page that I want to search, but nothing is working.
There are some sub-pages of the main URL that do appear in the search results.
This happened to me too. It is because the URL you specify to Google has to match the server address where your site is hosted. For example, I made a site with Google Custom Search (mainstreetbd.com), and when I tested it on my server, the Google search returned no results. But when I ran it on the specified URL, it worked fine.
Some websites instruct search engines how to index their pages in a file called robots.txt.
For example:
https://stackoverflow.com/robots.txt
(If a site has one, it should be at http://URL/robots.txt, directly after the domain name.)
If the robots.txt for the site you are trying to search excludes some parts of its site from being indexed, it could be the source of your problem.
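For example, a robots.txt like this (a made-up illustration) tells all crawlers, including Google's, to stay away from anything under /private/ or /drafts/, so those pages would never show up in a custom search engine:

User-agent: *
Disallow: /private/
Disallow: /drafts/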