How to ensure a web page is visited by a human, not by a bot? - bots

How can I ensure that web pages are visited by humans, and not by a bot program?
Is there some technique for this?
Thanks

if (isset($_SERVER['HTTP_USER_AGENT']) &&
    strstr(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot"))
{
    // Googlebot is visiting you
}
This is a PHP example of detecting whether the visitor is Googlebot by inspecting the User-Agent header.

You can either check the User-Agent in the HTTP headers, or look for bot-like activity, such as a very high frequency of hits over a wide range of pages coming from a single IP address (though you might see that with a proxy server too). You can also look for hits on robots.txt and assume that other visits within the same session were also from a robot.
In reality there is no sure-fire way of doing it, as sophisticated robot writers can pretend to be browsers.
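As a rough illustration of the robots.txt heuristic, here is a sketch written as Node/Express middleware (an assumed stack; the idea ports to anything). Flagging by IP address is crude and will misfire behind shared proxies:

const express = require('express');
const app = express();

// IPs that have requested robots.txt. In-memory storage is for
// illustration only; a real deployment would want expiry.
const botIps = new Set();

app.use((req, res, next) => {
  if (req.path === '/robots.txt') {
    botIps.add(req.ip);
  }
  // Downstream handlers can check this flag, e.g. to exclude the
  // visit from analytics.
  req.isProbablyBot = botIps.has(req.ip);
  next();
});

app.listen(3000);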

Time can be a good measure of whether a visit came from a human or a bot.
Set a timeout or delay on the JavaScript that tracks the visit so that it executes after 1 or 2 seconds. Most humans will stay on a page at least that long (even if they don't like it), whereas a bot can scan the page and move on within that time.
Just a thought.
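A minimal sketch of that idea in plain JavaScript, assuming a hypothetical /track endpoint on your own server:

// Fire the tracking beacon only after the visitor has been on the
// page for 2 seconds; most bots will have fetched the HTML and moved
// on long before this timer runs.
setTimeout(function () {
  navigator.sendBeacon('/track', JSON.stringify({
    page: window.location.pathname,
    referrer: document.referrer
  }));
}, 2000);

As a bonus, simple bots that don't execute JavaScript at all will never fire the beacon, which is part of the point.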

Related

Dialogflow fulfillments time out after 5 seconds. How can we overcome this limit?

Imagine the following scenario:
Our client starts a conversation with our bot regarding a problem with one of their websites.
We go and check the website in our backend through our fulfillment web service, and if there is no obvious problem, we then want to generate a screenshot of the website and present it to our client.
The screenshot could take more than 5 seconds to capture, as many websites take longer than 5 seconds to fully load.
As a result, the response times out.
We can't force our clients to redesign their websites to load in under 5 seconds just so that our chatbot can handle their request.
I assume there are many other real-world examples in which fulfillments can take more than 5 seconds.
Example:
Client: My website www.example.com doesn't load.
Bot: I just checked the website and it loads fine for me. Here is a screenshot of your website for you to check:
{Image goes here}
The short answer is - you can't.
There are some tricks that work sometimes, but they won't work in all scenarios, and aren't good practices anyway.
Your best bet would be to respond to the user with a message saying you're running some checks and diagnostics, and ask them to request an update in a moment (if possible, provide a Suggestion Chip prompting them to ask for it). After sending that message, launch a background task to run the checks and take the screenshot. When they ask for an update, you can report what you have (latency figures, the screenshot, etc.), or say that things are being slow and you're still checking.
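A rough sketch of that pattern as an Express webhook handler; the request and response fields are those of the Dialogflow v2 webhook format, while takeScreenshot and saveResult are hypothetical helpers standing in for your real diagnostics:

const express = require('express');
const app = express();
app.use(express.json()); // Dialogflow posts JSON

// Placeholders for illustration - swap in real implementations.
const takeScreenshot = async (url) => { /* e.g. drive a headless browser */ };
const saveResult = async (session, data) => { /* e.g. write to a db */ };

app.post('/webhook', (req, res) => {
  // Reply immediately so Dialogflow never hits its 5-second timeout.
  res.json({
    fulfillmentText: "I'm running some checks on your site now - ask me for an update in a moment."
  });

  // Continue the slow work after the response has gone out, storing
  // the result keyed by session so a follow-up intent can report it.
  const session = req.body.session;
  const url = req.body.queryResult.parameters.url;
  takeScreenshot(url)
    .then(shot => saveResult(session, { shot }))
    .catch(err => saveResult(session, { error: err.message }));
});

app.listen(3000);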

How to implement logic based on external redirects?

I'm building a website for a client (real estate), and on the website are links to a different website (adverts for properties). My client routinely activates and deactivates these adverts when he rents out a certain property.
The hrefs on my links look something like this:
<a href="https://domain.xx/estate/idxx/des-crip-tion-xx-xx-x-xx/">. If the advert is indeed active, it just takes them to the advert. If it is not active, however, the website in question redirects the user to https://domain.xx/estate-for-rent/city/, effectively sending the users to my client's competition.
I wish to implement some logic where, before handing the users over to the other website, the server checks whether the link gets redirected to https://domain.xx/estate-for-rent/city/ (or some similar check), and if so, uses preventDefault, or something like it, and notifies the user that the advert is not available instead of sending them to the other website.
I wonder if I can use the fact that the resulting URL in the user's browser window (after they've been directed to the other website) matches the URL in my href only when the advert is active. Can I somehow get the server to try to access the URL in my href, see where it gets redirected, and then do something based on that? On the back-end I'm running NodeJS with Express, by the way, and if it matters, I'm relying heavily on EJS for templating. Thanks in advance for any help!
This sounds more like a problem you could solve on the client as opposed to the server. For example, at a high level, here's how I would do it (a sketch follows below):
1. Handle the click event for each link (really simple to do a catch-all with jQuery).
2. Fire off a HEAD request via AJAX to the destination URL (this would be much more efficient than a GET, but depends on the external service supporting this verb).
3. Use the status code to determine what to do next (e.g. 2xx: allow the redirect; 3xx: pop a message and block).
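One caveat: a cross-origin HEAD request fired from the browser will usually be blocked by CORS, so in practice the check may have to live on the server, as the asker suggested. Here is a rough sketch of that variant, assuming Node 18+ (for the global fetch) and Express; the /check-advert route name is made up for illustration:

const express = require('express');
const app = express();

app.get('/check-advert', async (req, res) => {
  const target = req.query.url; // the advert URL taken from the href
  try {
    // Follow any redirects and compare the final URL with the
    // original; a mismatch means the advert was redirected away,
    // i.e. it is no longer active.
    const response = await fetch(target, { method: 'HEAD', redirect: 'follow' });
    res.json({ active: response.ok && response.url === target });
  } catch (err) {
    res.status(502).json({ active: false, error: err.message });
  }
});

app.listen(3000);

The front end can then call this route on click and only follow the link when active is true.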

Twilio SMS with links - links being clicked automatically?

I have an app that sends SMS's out to a bunch of people. Those messages contain links. They don't use any link shortener or other service; they link back to my site. The links themselves are randomized strings, which are stored in my db and associated with an action. (Click the "yes" or "no" link and the db tracks what you chose.) For ALL users, this works perfectly. With one user (and it's always the same user), his "vote" comes in as soon as the cron job that triggers this event runs. This happens without him clicking, or sometimes even seeing, the message.
So, the question: has anyone ever seen or heard of a cell provider, messaging app, or similar that "clicks" links as part of some process before showing the content to the user? I can't see ANYTHING in the code that would single him out, so I'm thinking it has to be something in between the message going out and him acting on it. Especially because the timestamp is always within seconds of the cron job running.
An SMS can sometimes pass through multiple carriers before reaching its destination, so a provider may be "handling" (for example, pre-fetching) the content in this case.
The best thing to do for any future cases like this would be to write to support for further investigation.

How do you prevent crawling from your web site?

I am running a website on IIS with more than 1000 paginated page links, and I want to prevent others from crawling/stealing these pages by running a crawler script to fetch the info page by page.
Is there any way to tell whether a request comes from a real user or is being run by a script? Or maybe there are some filters for this at a higher level, before the request even reaches the application?
You can't prevent automated crawling.
You can make it harder to crawl your content automatically, but if you allow users to see the content, it can be automated (automating browser navigation is not hard, and computers generally don't mind waiting a long time between requests).
One option is to require each "user" (authenticated or not) to observe some minimal delay between requests (say 1-5 seconds). That makes generic crawling useless, since it would require some "user id" in each request plus a delay between requests, and one would have to write custom crawling code for your site, which is clearly more time-intensive.
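A bare-bones sketch of that per-user delay, written here as Express middleware for illustration (the asker is on IIS, but the principle carries over); keying on IP and the 2-second window are arbitrary assumptions:

const express = require('express');
const app = express();

// Timestamp of the last request per "user". In-memory for
// illustration; a real setup would want expiry or an external store.
const lastHit = new Map();

app.use((req, res, next) => {
  const id = req.ip; // crude user id; a signed cookie would be better
  const now = Date.now();
  if (now - (lastHit.get(id) || 0) < 2000) {
    // Too soon after the previous request: refuse instead of serving
    // content, which makes naive page-by-page crawling very slow.
    return res.status(429).send('Too many requests - slow down.');
  }
  lastHit.set(id, now);
  next();
});

app.listen(3000);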
Note that writing a special "crawler" for your site may be considered a "noble" challenge and significantly increase the incentive to create one (see, for instance, the "how to make Google Maps available offline" questions).

How does Gmail check Gtalk statuses in real time?

How is it that if I have thousands of contacts (let's suppose) all around the world, and one of them changes their status to away or becomes idle, the change shows up immediately in my browser?
It isn't really instant; there is a small delay. Basically, when you load the Gmail page in your browser you also download a JavaScript file that refreshes the content dynamically via AJAX. Similarly, if one of your contacts changes their Gtalk status and you're using the Gtalk client in Gmail, that change will be reflected the next time the page you're viewing asks the server for updates. It's just constantly checking with the server for changes (the event-oriented paradigm isn't really prevalent on the web).
I'm not sure of the exact mechanism Gmail uses, but a fairly dumb way would be to have the page poll the servers (via XMLHttpRequest, aka an AJAX call) every X seconds for any change in contact statuses since N seconds ago, then apply those changes.
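For instance, a naive polling loop might look like this (using the modern fetch API in place of raw XMLHttpRequest; the /statuses endpoint and updateContactUi function are hypothetical):

// Ask the server every 5 seconds for status changes since the last
// check, then apply them to the page.
function updateContactUi(change) { /* update the DOM for this contact */ }

let lastCheck = Date.now();

setInterval(async () => {
  const res = await fetch('/statuses?since=' + lastCheck);
  const changes = await res.json(); // e.g. [{ contact: 'bob', status: 'away' }]
  lastCheck = Date.now();
  changes.forEach(change => updateContactUi(change));
}, 5000);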
Google's chat system is based on the XMPP protocol, and the Gmail chat block is just another XMPP client (similar to Gtalk, Pidgin, or Psi on the desktop). XMPP runs in the browser using the BOSH extension. I'm sure Google has adapted it to work in their own way, but the underlying idea is still the same.
In short, when one of your contacts updates their status, it is pushed to the Google chat XMPP servers, which in turn push that information to your Gmail chat client.
