Nutch: changing seed.txt doesn't work

I am using Nutch 1.7 and trying to crawl domain1.com with:
bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 5
But after I edit urls/seed.txt, deleting http://domain1.com/ and adding http://domain2.com/, and re-run the command above, the crawl still fetches domain1.com, not domain2.com.
Does anyone know why?

I found the solution: I also needed to change the regex-urlfilter.txt file, which still only allowed the old domain.
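To see why the URL filter matters, here is a minimal Python sketch of how a first-match-wins filter file behaves. It is a simplified model, not Nutch's actual code, and the rule lines are hypothetical (the original regex-urlfilter.txt was not posted); it assumes the file had been restricted to domain1.com as the Nutch tutorial suggests.

```python
import re

def accepts(rules_text, url):
    """Return True if the first matching +/- rule accepts the URL."""
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is dropped

# Hypothetical filter left over from the domain1.com crawl
OLD_RULES = r"""
+^http://([a-z0-9]*\.)*domain1\.com/
-.
"""
# The same filter after editing it for the new seed
NEW_RULES = OLD_RULES.replace("domain1", "domain2")

print(accepts(OLD_RULES, "http://domain2.com/"))  # False
print(accepts(NEW_RULES, "http://domain2.com/"))  # True
```

With the old rules the new seed is rejected before it is ever fetched, which would match the behaviour described in the question.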

Related

Htaccess 301 redirect only redirects the domain, not the page

I want to redirect page www.domain1.nl/page-old to www.domain2.nl/page-new.
When www.domain1.nl/page-old is visited it redirects to: www.domain2.nl/page-old.
So the redirect sort of works, but only half. The domain is redirected, the page is not.
Domain1 only has a .htaccess file in the root.
Domain2 is a WordPress website.
We use Apache 2.4 + PHP 7.3 + Cloudflare (no rules set).
We tried several .htaccess directives: Redirect 301, Redirect, RedirectMatch, RewriteRule. All gave the same result.
In the .htaccess file of domain1.com:
Redirect 301 /page-old/ https://www.domain2.com/page-new/
Expected result would be:
www.domain1.nl/page-old >> www.domain2.nl/page-new
Actual result:
www.domain1.nl/page-old >> www.domain2.nl/page-old
Posting my answer from the comments section here so that readers can find it easily later.
You need to use this rule instead for precise matching:
RedirectMatch 301 ^/page-old/ https://www.domain2.com/page-new/
Redirect is a non-regex directive that uses prefix (starts-with) matching.
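The difference between the two directives can be sketched in Python. This is a simplified model of mod_alias, not Apache's implementation; the function names redirect and redirect_match are made up for illustration. It assumes the documented behaviour that Redirect appends whatever follows the matched prefix to the target, while RedirectMatch without backreferences returns the target unchanged.

```python
import re

TARGET = "https://www.domain2.com/page-new/"

def redirect(prefix, target, path):
    """Model of Redirect: prefix match, remainder of the path is appended."""
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    # The prefix must end on a path-segment boundary
    if rest and not prefix.endswith("/") and not rest.startswith("/"):
        return None
    return target + rest

def redirect_match(pattern, target, path):
    """Model of RedirectMatch with no $N backreferences in the target."""
    return target if re.search(pattern, path) else None

print(redirect("/page-old/", TARGET, "/page-old/extra"))
# https://www.domain2.com/page-new/extra
print(redirect_match(r"^/page-old/", TARGET, "/page-old/extra"))
# https://www.domain2.com/page-new/
print(redirect("/page-old/", TARGET, "/page-old"))  # None
```

Note that in this model a request for /page-old without the trailing slash matches neither rule as written, so a pattern such as ^/page-old/?$ may be worth trying if that form of the URL is in use.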

htaccess redirect using non-English characters

htaccess works without problems when I'm using English characters,
but when I use non-English characters, it can't redirect and shows:
Not Found
The requested URL
This is my sample code. I have tested several variants but none of them works:
Redirect 301 "/مقاله-انواع-دسته-بندی-برج-های-خنک-کننده" /destination
Redirect 301 "/%D9%85%D9%82%D8%A7%D9%84%D9%87-%D8%A7%D9%86%D9%88%D8%A7%D8%B9-%D8%AF%D8%B3%D8%AA%D9%87-%D8%A8%D9%86%D8%AF%DB%8C-%D8%A8%D8%B1%D8%AC-%D9%87%D8%A7%DB%8C-%D8%AE%D9%86%DA%A9-%DA%A9%D9%86%D9%86%D8%AF%D9%87" /destination
Add the following line in your Apache configuration file:
AddDefaultCharset UTF-8
then restart your Apache server. After this, you may use your htaccess file which contains redirects with UTF-8 characters, and they should now be recognized. You should make sure that you save your htaccess file in a format which supports UTF-8.
You can use this rule in your site root .htaccess:
RewriteEngine On
RewriteRule ^مقاله-انواع-دسته-بندی-برج-های-خنک-کننده/?$ /destination [L,B,R=301]
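As background for why the pattern is written in plain UTF-8 rather than in percent-encoded form: browsers send non-ASCII paths percent-encoded, and mod_rewrite matches its pattern against the decoded URL path. A small Python sketch of that round trip, using only the first word of the slug (مقاله) written as Unicode escapes:

```python
from urllib.parse import quote, unquote

# First word of the slug, U+0645 U+0642 U+0627 U+0644 U+0647
word = "\u0645\u0642\u0627\u0644\u0647"

# What the browser puts on the wire
encoded = quote("/" + word)
print(encoded)  # /%D9%85%D9%82%D8%A7%D9%84%D9%87

# What the rule is compared against after decoding
decoded = unquote(encoded)
print(decoded == "/" + word)  # True
```

This only illustrates the encoding; it assumes the .htaccess file itself is saved as UTF-8 so that the bytes of the pattern agree with the decoded request.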

301 htaccess redirect not working (for a single page)

Putting this at the top of my htaccess file doesn't do anything:
Redirect 301 /taxonomy/term/6%207%208%209 http://mysite.com/taxonomy/term/all
Neither does this:
Redirect 301 http://mysite.com/taxonomy/term/6%207%208%209 http://mysite.com/taxonomy/term/all
I'm using a CMS that ships its own htaccess file, so could it be that my rules are being overridden? I thought that putting the code at the top of the file would solve this. Thanks
Escapes get decoded before the request goes through mod_alias, so the %20's get turned back into spaces. You need to write the path with literal spaces and put it in quotes:
Redirect 301 "/taxonomy/term/6 7 8 9" http://mysite.com/taxonomy/term/all
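The decoding step can be reproduced in Python as a quick check. This sketch only mimics the comparison with a prefix test; the variable names are illustrative and it is not how mod_alias is implemented.

```python
from urllib.parse import unquote

requested = "/taxonomy/term/6%207%208%209"   # path as sent by the browser
decoded = unquote(requested)                 # path as seen by the rule
print(decoded)  # /taxonomy/term/6 7 8 9

rule_with_escapes = "/taxonomy/term/6%207%208%209"
rule_with_spaces = "/taxonomy/term/6 7 8 9"

print(decoded.startswith(rule_with_escapes))  # False
print(decoded.startswith(rule_with_spaces))   # True
```

The quotes in the directive are needed only because the path now contains spaces, which would otherwise be read as argument separators.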

Redirect 301 cgi-bin/ to new URL

It's my first question here :)
I have a problem redirecting URLs:
I have old URLs like www.domain.com/cgi-bin/category.cgi?type=...
and I am trying to redirect them to www.domain.com in the htaccess,
but I still get a 404 error...
This is my rule:
RewriteRule ^cgi-bin/(.*)$ http://www.domain.com [R=301,L]
I checked whether there is anything about cgi-bin in the conf, but found nothing.
I did a test with "cgi-bin2" and it works...
So what can I do?
I don't know where your problem comes from, but why don't you try writing a Perl script that redirects to your base domain URL?
(This can work if, for example, only a few CGI files were previously in use.)
In your example it seems you want to redirect "category.cgi".
So, in this case, create a "category.cgi" file in your "cgi-bin" folder and put this code inside it:
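#!/usr/bin/perl
#
# category.cgi - permanently redirect every request to the site root
use strict;
use warnings;
# Target of the redirect; replace with your own base URL
my $URL = "http://www.yourdomain.com/";
# CGI response: Status and Location headers, then a blank line
print "Status: 301 Moved Permanently\nLocation: $URL\n\n";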
Hope it helps!

.htaccess redirect loop! All subdirectories to root

I'm trying to redirect all subdirectories to the root of my website via .htaccess!
The code below works fine if I try to access a subdirectory ... but doesn't let me display the index page in the root because it starts a redirect loop!
RedirectMatch temp ^/.*$ http://localhost/
How can I solve this?
If you want all subdirectories to redirect to the home page, you would do something like this:
RedirectMatch temp ^/.+/.*$ http://localhost/
This will match any URI with two slashes in it, with at least one character between them.
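The two patterns can be compared with Python's re module, whose syntax agrees with Apache's for these simple expressions. The sample paths are made up for illustration.

```python
import re

LOOPING = re.compile(r"^/.*$")    # pattern from the question
FIXED = re.compile(r"^/.+/.*$")   # pattern from the answer

for path in ["/", "/index.html", "/sub/", "/sub/page.html"]:
    print(path, bool(LOOPING.search(path)), bool(FIXED.search(path)))
# / True False
# /index.html True False
# /sub/ True True
# /sub/page.html True True
```

The original pattern also matches "/", the redirect target itself, which is what causes the loop. The fixed pattern leaves "/" and files in the root alone. It also does not match "/sub" without a trailing slash, since that path contains only one slash.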
