I have set up my .htaccess this way:
SetEnvIfNoCase User-Agent .*google.* search_robot
SetEnvIfNoCase User-Agent .*yahoo.* search_robot
SetEnvIfNoCase User-Agent .*bot.* search_robot
SetEnvIfNoCase User-Agent .*ask.* search_robot
Order Deny,Allow
Deny from All
Allow from env=search_robot
I have this bot showing up:
IPv4 address: 198.143.187.122
Reverse DNS: blexn3.webmeup.com
RIR: ARIN
Country: United States
RBL Status: Clear
Threat: No threats detected
Is this bot used by Google, or am I missing something?
No, BLEXBot is not Google. It belongs to a company called WebMeUp. You can find information about them here.
If you look up the IP from the log you will see it's not Google.
IP Address 198.143.187.122
Host blexn3.webmeup.com
Location US, United States
City Chicago, IL 60661
Organization SingleHop
ISP SingleHop
Google IPs will list Google as the organisation.
Google uses its own bots; they are custom built. You can read up about them here, including a definitive list of their user-agent strings, which may be useful to you.
To block it, follow the instructions here.
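Note, by the way, that your own rule SetEnvIfNoCase User-Agent .*bot.* search_robot also matches BLEXBot, since its user-agent string contains "bot". A minimal sketch that only whitelists the crawlers you actually care about (the user-agent tokens below are the commonly documented ones; adjust as needed) could look like this:
# Whitelist only specific crawlers instead of the broad .*bot.* pattern
SetEnvIfNoCase User-Agent "googlebot" search_robot
SetEnvIfNoCase User-Agent "bingbot" search_robot
SetEnvIfNoCase User-Agent "yahoo! slurp" search_robot
Order Deny,Allow
Deny from All
Allow from env=search_robot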
I know that's not the kind of question that is commonly asked here, but I don't know where else to ask.
I want to set up DynDNS with my FritzBox 6600 Cable, but I always get a 500 - notfqdn (not fully qualified domain name) error. (I know the FritzBox is not the best solution available... but this is what I have to work with.)
According to this guide I set up the DynDNS config in the FritzBox and used the username and password from the configured DynDNS and the update URL domains.google.com/nic/update with the DynDNS domain. The config in the FritzBox looks like the following:
Update-URL: domains.google.com/nic/update
Domainname (Domain-Name): something.my-domain.de
Benutzername (Username): my_username
Kennwort (Password): my_password
I don't know what the problem is. Some testing with other configurations shows that a random username and password give the same 500 error.
Does anybody know what the FritzBox's request looks like and how the parameters are parsed?
According to the FRITZ!Box help page, you can use pre-defined placeholders inside the Update URL, which will be filled in with the corresponding information.
So in the case of Google Domains the API expects a request URL in the following form:
https://username:password@domains.google.com/nic/update?hostname=subdomain.yourdomain.com&myip=1.2.3.4
In your FRITZ!Box DynDNS configuration you have to replace the variables in the Google Domains API URL with the corresponding placeholders from the FRITZ!Box documentation, which will look something like this:
https://<username>:<pass>@domains.google.com/nic/update?hostname=<domain>&myip=<ipaddr>
Note that the URL might be different depending on your FRITZ!Box type.
For further information check out the Google Domains help page Learn about Dynamic DNS
(especially the section "Use the API to update your Dynamic DNS record") and the help page of your FRITZ!Box, which may be accessed using the question mark icon in the top right of the DynDNS configuration page. (Help page for FRITZ!Box 7590)
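For illustration, with the values from your configuration filled in (the IP address below is only a made-up example), the request the FRITZ!Box ends up sending should look roughly like this:
https://my_username:my_password@domains.google.com/nic/update?hostname=something.my-domain.de&myip=198.51.100.7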
You need to change the URL to the following syntax:
https://username:password@domains.google.com/nic/update?hostname=subdomain.yourdomain.com
Use the user-defined DynDNS provider.
Update-URL: https://domains.google.com/nic/update?hostname=mydyndns.yourdomain.com
Domainname: mydyndns.yourdomain.com
Username: generated username from Google Domains
Password: generated password from Google Domains
I am building an application in PHP and I want to make the admin login secure. For that, I need to make the admin login functionality accessible from only one IP address.
How can this be done? Your help will be appreciated.
In your .htaccess file:
order deny,allow
deny from all
allow from your ip
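If you only want to lock down the admin login rather than the whole site, a minimal sketch scoped to a single script could be (the filename admin.php and the IP address are placeholders; substitute your own):
# Only one address may reach the admin login script; everyone else is denied
<Files "admin.php">
order deny,allow
deny from all
allow from 203.0.113.10
</Files>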
My website is often down because a spider is accessing too many resources. This is what the hosting provider told me. They told me to ban these IP addresses:
46.229.164.98
46.229.164.100
46.229.164.101
But I have no idea how to do this.
I've googled a bit and I've now added these lines to .htaccess in the root:
# allow all except those indicated here
<Files *>
order allow,deny
allow from all
deny from 46.229.164.98
deny from 46.229.164.100
deny from 46.229.164.101
</Files>
Is this 100% correct? What could I do?
Please help me; I really have no idea what I should do.
Based on these:
https://www.projecthoneypot.org/ip_46.229.164.98
https://www.projecthoneypot.org/ip_46.229.164.100
https://www.projecthoneypot.org/ip_46.229.164.101
it looks like the bot is http://www.semrush.com/bot.html.
If that's actually the robot, on their page they say:
To remove our bot from crawling your site simply insert the following lines to your
"robots.txt" file:
User-agent: SemrushBot
Disallow: /
Of course that does not guarantee that the bot will obey the rules. You can block it in several ways; .htaccess is one, just as you did.
You can also do this little trick: deny ANY IP address that has "SemrushBot" in its user agent string:
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
SetEnvIfNoCase User-Agent "^SemrushBot" bad_user
SetEnvIfNoCase User-Agent "^WhateverElseBadUserAgentHere" bad_user
Deny from env=bad_user
This way you will block other IPs that the bot may use.
See more on blocking by user agent string: https://stackoverflow.com/a/7372572/953684
I should add that if your site is brought down by a spider, it usually means you have a badly written script or a very weak server.
Edit:
This line
SetEnvIfNoCase User-Agent "^SemrushBot" bad_user
tries to match a User-Agent that begins with the string SemrushBot (the caret ^ means "beginning with"). If you want to match SemrushBot ANYWHERE in the User-Agent string, simply remove the caret, so it becomes:
SetEnvIfNoCase User-Agent "SemrushBot" bad_user
The above matches if the User-Agent contains the string SemrushBot anywhere (yes, there is no need for .*).
You are doing the right thing, BUT
you have to write that code in the .htaccess file, not in the robots.txt file.
For disallowing a search engine from crawling your site, the code in robots.txt should look like this:
User-agent: Google
Disallow: /
It will disallow Google from crawling your site.
I would prefer the .htaccess method, by the way.
I have a link structure that is managed by an Apache server. Some pages can be reached with different URLs, but with a PHP redirect the user will see just the canonical URL for the page.
Canonical URL: www.example.it/490/persons/jan-antone-vian
Generic URL that calls the same page: www.example.it/490/
My question is whether it is correct, for SEO, to insert the URL with just the ID (www.example.it/490/) in some links (it is much easier to manage the links that way).
First of all, it's certain that a well-constructed URL (a URL with keywords) is better for SEO.
What you can do, and what you seem to have done:
keep the www.example.it/490/ URL for sharing,
add a 301 redirect to your www.example.it/490/persons/jan-antone-vian when arriving at the first URL (www.example.it/490/). The SEO juice should be transferred to the full, constructed URL.
Google says that ONE (and only one) URL has to be related to ONE (and only one) piece of content, so you cannot let both URLs display the same content. (You could be penalised for duplicate content...)
Note that a 301 is said to transfer the FULL SEO JUICE, but it is common practice not to chain many 301 redirections in cascade.
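A minimal .htaccess sketch of that 301 redirect, assuming the full slug is known up front (in practice the ID-to-slug lookup usually happens in the application, e.g. your PHP redirect):
RewriteEngine On
# Permanently redirect the short ID-only URL to the canonical slug URL
RewriteRule ^490/?$ /490/persons/jan-antone-vian [R=301,L]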
I'm going to block all US IPs using .htaccess this way:
<Limit GET HEAD POST>
order deny,allow
deny from 3.0.0.0/8
deny from 4.0.0.0/25
deny from 4.0.0.128/26
deny from 4.0.0.192/28
deny from 4.0.0.208/29
....
allow from all
</Limit>
Will Google be able to access and index my website after blocking all US IPs?
EDIT: Sorry for the ambiguity, but I DO want Google to index my website.
Although Google has its servers spread across the whole world, it would be quite hard to say where the search engine's bots mostly originate from. What I suggest would be to block the IP ranges but add an exclusion clause that matches against the User-Agent for search bots like:
SetEnvIfNoCase User-Agent (googlebot|bingbot|yahoo!\sslurp) is_search_bot
<Directory /docroot>
Order Deny,Allow
Deny from 3.0.0.0/8
Deny from 4.0.0.0/25
Deny from 4.0.0.128/26
Deny from 4.0.0.192/28
Deny from 4.0.0.208/29
Allow from env=is_search_bot
</Directory>
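If you are on Apache 2.4, a roughly equivalent sketch using the newer Require syntax (assuming mod_setenvif and mod_authz_core are enabled) would be:
SetEnvIfNoCase User-Agent (googlebot|bingbot|yahoo!\sslurp) is_search_bot
<Directory /docroot>
<RequireAny>
# requests that matched a search-bot User-Agent are always let through
Require env is_search_bot
# everyone else is allowed unless they come from one of the blocked ranges
<RequireAll>
Require all granted
Require not ip 3.0.0.0/8
Require not ip 4.0.0.0/25
</RequireAll>
</RequireAny>
</Directory>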
I don't think so, but if you really don't want Google to index it then use a robots.txt file so it doesn't index it. The robots.txt would be:
User-agent: googlebot
Disallow: /directory/
If it's just a matter of blocking US IPs and that's it, then you're probably fine, as Google has data centers in many different locations, not just the United States. This means Google will still probably index it.
Although Google has many data centers, all of its bots are in the US, so no, Google will not be able to scan your website if you block US IPs.
If you can't access your domain's root directory, just use this meta tag to block Googlebot from indexing specific page(s):
<meta name="googlebot" content="noindex">
If your site was already indexed by Google's crawler, follow the guide Remove your own content from Google search results.
Access: https://www.google.com/webmasters/
All the information you need is there.
Here, Google explains how you can block Googlebot from indexing your site:
https://support.google.com/webmasters/answer/93708
About your question, I think that if you block all US IP addresses, Google's crawlers in other countries can still access and index your site, and will then sync with Google US.