Creating a log file of specific web traffic statistics

I have a website, hosted on a shared server.
Through CPANEL, I am provided with a few traffic analysis logs and tools.
None seem to provide what I'm looking for.
For each day, I'd like to see a log file with a list of unique visitors.
Under each unique visitor (identified by IP address), I'd like to see the following information:
geographic location (based on IP address)
information to help determine whether the visitor was a bot or a human
the page URLs they requested (including the exact time of each request)
An explanation of my application:
I run a forum on my site. I'd like a better understanding of who is visiting, when they visit, and how
they navigate through my forum pages (topics, posts, etc.).
I would appreciate some direction on how to develop this (a script is probably best).

I would (and do) use Google Analytics, as it gives you exactly what you are asking for and a whole lot more (like being able to watch live what is happening). It requires you to add some JavaScript code to the application (and for many platforms today, plugins are available that do this for you).
If no plugin is available, see https://support.google.com/analytics/answer/1008080?hl=en
This approach will typically be a lot easier than writing your own log analyser and running it on a shared cPanel server.
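That said, if you do want to roll your own, the raw Apache access logs that cPanel exposes (under "Raw Access Logs") already contain everything you listed. A minimal sketch in PHP might look like this. It assumes the standard "combined" log format, guesses bot-vs-human from the User-Agent string, and uses the PECL geoip extension for location only if your host happens to provide it; the file name access_log is a placeholder:

    <?php
    // Minimal raw-access-log analyser (sketch). Assumes the standard
    // Apache "combined" log format; adjust the regex if yours differs.
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"/';
    $visitors = [];

    foreach (file('access_log') ?: [] as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue;
        }
        list(, $ip, $time, $url, $agent) = $m;
        $visitors[$ip]['requests'][] = "$time  $url";
        // Crude heuristic: well-behaved crawlers identify themselves in the UA.
        $visitors[$ip]['bot'] = (bool) preg_match('/bot|crawl|spider|slurp/i', $agent);
        // Geolocation via the PECL geoip extension, if the host provides it.
        if (function_exists('geoip_country_name_by_name')
                && ($country = @geoip_country_name_by_name($ip))) {
            $visitors[$ip]['country'] = $country;
        }
    }

    foreach ($visitors as $ip => $v) {
        printf("%s (%s, %s)\n",
            $ip,
            $v['country'] ?? 'unknown location',
            $v['bot'] ? 'likely bot' : 'likely human');
        foreach ($v['requests'] as $r) {
            echo "  $r\n";
        }
    }

Run it from cron once a day against the rotated log and redirect the output to a dated file, and you have roughly the report you described.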

Related

How to block users accessing a site from outside the UK?

I've searched the web and been unable to find a solution. I have an Umbraco site hosted with IIS on a Windows server. Any ideas on an approach to block users accessing the site from outside the UK? An .htaccess approach would be too slow... thank you in advance!
That's quite hard to do accurately, as you could have someone based in the UK using a European network provider, which means they might appear to come from, say, Holland instead of the UK. It's also possible for people to spoof their location fairly easily if they really want to get at your site.
As Lex Li mentions there are plenty of commercial databases and tools for looking up a user's location, but the accuracy of these varies considerably, not to mention the fact that some of them only support IPv4. Any of these options are going to be slow though, as you'll have to check on every request. You also have to make sure you keep the databases up to date.
Another option would be to proxy your site through something like CloudFront or CloudFlare which both support blocking traffic by country.
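For illustration, the per-request check amounts to something like the sketch below. It's written in PHP against the PECL geoip extension purely to show the concept; on your IIS/Umbraco stack you would do the equivalent in an ASP.NET module, or better, at the CDN edge as suggested above:

    <?php
    // Country gate (concept sketch only). Blocks any request whose IP
    // does not resolve to 'GB', the ISO country code covering the UK.
    if (function_exists('geoip_country_code_by_name')) {
        $country = @geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);
        if ($country !== false && $country !== 'GB') {
            http_response_code(403);
            exit('This site is only available in the UK.');
        }
    }

Note that lookups that miss (VPNs, new ranges, IPv6 gaps) have to fail one way or the other, which is exactly the accuracy trade-off described above.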

Need some ideas on how one can spam a website, crawl it, and waste its resources

I am working on a startup which basically serves websites. Sorry, I can't reveal many details about the startup.
I need some ideas on how spammers and crawler devs think when attacking a website, and, if possible, ways to prevent such attacks too.
We have come up with some basic ideas like:
1. Include a small JS file in the sites that would send an ACK to our servers once all the assets are loaded. Some crawlers/bots only come to websites and download specific stuff like images or articles; in such cases, our JS won't be triggered. When we study our logs, which will have a record of the resources requested by a particular IP and whether our JS was triggered, we can whitelist or blacklist IPs based on that study (see the sketch after this list).
2. Like email services do, we will load a 1x1 px image on the client side via an API call. In simple words, we won't add the "img" tag directly in our HTML, but rather some JS that calls an API on our server, which returns the image to the client.
3. We also have a method to detect good bots, like Google's crawler, which indexes our pages, so we can differentiate between good bots and bad bots that just waste our resources.
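A rough sketch of ideas 1 and 2 combined, as a single PHP endpoint (the file name pixel.php, the log file, and the query parameter are all hypothetical):

    <?php
    // pixel.php - hypothetical tracking endpoint for ideas 1 and 2.
    // The page would request it from JS after the load event, e.g.:
    //   window.addEventListener('load', function () {
    //       new Image().src = '/pixel.php?p=' + encodeURIComponent(location.pathname);
    //   });
    // Clients that execute JS and load all assets will hit this endpoint;
    // most dumb scrapers will not, which is the signal to correlate
    // against the raw request logs in Elasticsearch.
    $entry = sprintf("%s %s %s %s\n",
        date('c'),
        $_SERVER['REMOTE_ADDR'],
        $_GET['p'] ?? '-',
        $_SERVER['HTTP_USER_AGENT'] ?? '-');
    file_put_contents(__DIR__ . '/ack.log', $entry, FILE_APPEND | LOCK_EX);

    // Answer with a transparent 1x1 GIF so the call looks like a normal asset.
    header('Content-Type: image/gif');
    header('Cache-Control: no-store');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');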
We are at a very basic stage. In fact, all our code does right now is log the IPs, and the assets requested by each IP, in Elasticsearch.
So we need ideas on how people spam/crawl websites via crawlers/bots/etc., so we can come up with a solution. If possible, please also mention the pros and cons of your ideas, and ways to defend against them.
Thanks in advance. If you share your ideas, you'll be helping a startup which will be doing a lot of good stuff.

Google Chrome extension to send email about sites blocked via DNS

I want to build a Google Chrome extension whose function will be to forward the addresses of illicit websites by email to parents, with the site addresses prohibited using DNS Nawala or something similar. With this extension, I hope to prevent the expected negative impact of internet use.
What are the steps I should take in building this extension?
Thank you.
This is a very broad "how do I create my entire project" question, but I'll try to give you some broad advice:
An extension alone will not be enough for this. You're going to need a web service as well. You'll likely need to divide the project into two parts:
A Chrome extension that monitors the websites a person visits. You can do this by using the Tab API: simply look at each site the user visits, and if it matches any of the illicit sites on a blacklist, take an action, probably by making an API call to the web service mentioned below.
You're almost certainly going to need a web service developed with a server-side language like PHP or Java, or something similar. This web service would take care of sending the emails to parents. If we're just talking about sending an email to one parent, then this service could be quite simple: the extension would tell the web service to send an email when an illicit site is visited, and that's about it. If you're talking about a commercial project, then this service would probably need to be a full-fledged website that allows parents to sign up for these emails.
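For example, a minimal version of that web service in PHP could be as small as the sketch below. The endpoint name, parameter, and parent address are placeholders; a real service would look the address up per user and authenticate the caller:

    <?php
    // notify.php - hypothetical endpoint the extension POSTs to when a
    // blacklisted site is visited.
    $site = $_POST['site'] ?? '';
    if ($site === '' || !filter_var($site, FILTER_VALIDATE_URL)) {
        http_response_code(400);
        exit('Missing or invalid site URL.');
    }
    $parent = 'parent@example.com'; // would come from a signup database
    mail($parent,
         'Blocked site visited',
         "The monitored browser visited $site at " . date('c') . '.');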
Again, this is a very broad question, and generally speaking Stack Overflow is more for asking specific programming questions. But hopefully this will get you moving in the right direction at least, so you can come back and ask more specific questions. :-)

I need to speed up my site and reduce the number of file calls

My web host is asking me to speed up my site and reduce the number of file calls.
OK, let me explain a little. My website is used 95% of the time as a bridge between my database (on the same hosting) and my Android applications (I have around 30 that need information from my DB). The information only goes one way (for now): the app calls a JSON string like this one on the site:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this webpage, shown in a WebView as a welcome message:
http://www.guiasitio.com/movilapp/test.php
This page has some images and jQuery, so I think these are the things using a lot of resources. They have told me to use some code to create a cache of those files in the person's browser to save memory (that is all Greek to me, since I don't understand it). Can someone give me an idea and point me to a tutorial on how to get this done? Can the WebView in an Android app keep a cache of these files?
All your help is highly appreciated. Thanks.
Using a CDN, or content delivery network, would be an easy solution if it worked well for you. Essentially you are off-loading the work of storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your current server, it will speed up your site, because files will be served from the location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my opinion the prize for the easiest service to set up goes to CloudFlare... they offer a free plan; simply fill in the details, change the DNS settings on your domain to point to CloudFlare, and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%.
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious about is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare also has a simple setting where you can specify the cache behaviour, so that's another detail on your list covered.
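As for the browser-caching code your host mentioned, it comes down to sending cache headers. A minimal sketch for a PHP page like your test.php might look like this (one day is an arbitrary choice, and an Android WebView honours these headers too when its cache is enabled):

    <?php
    // Put this before any output in the page. Browsers (and Android
    // WebViews with caching enabled) will reuse the response for a day
    // instead of re-requesting it.
    header('Cache-Control: public, max-age=86400');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 86400) . ' GMT');
    // Static assets (images, jQuery) are usually handled once in
    // .htaccess with mod_expires instead of per-file PHP headers.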
Good luck!

How should I wall off the dev and/or beta sites from the public and search engine bots?

I need dev and beta sites hosted on the same server as the production environment (let's let that fly for practical reasons).
To keep things simple, I can accept the same protections in place on both dev and beta -- basically don't let it get spidered, and put something short of user names and passwords in place to prevent everyone and their brother from gaining access (again, there's a need to be practical). I realize that many people would want different permissions on dev than on beta, but that's not part of the requirements here.
Using a robots.txt file is a given, but then the question: should the additional host(s) (aka "subdomain(s)") be submitted to Google Webmaster Tools as an added preventive measure against inadvertent spidering? It should go without saying, but there will be no links into the dev/beta sites directly, so you'd have to type in the address perfectly (with no augmentation by URL Rewrite or other assistance).
How could access be restricted to just our team? IP addresses won't work because of the various methods of internet access (meetings at lunch spots with wifi, etc.).
Perhaps have dev/beta and production INCLUDE a small file (or call a component) that looks for a URL variable to be set (on the dev/beta sites) or does not look for the URL variable (on the production site). This way you could leave a different INCLUDE or component (named the same) on the respective sites, and the source would otherwise not require a change when it's moved from development to production.
I really want to avoid full-on user authentication at any level (app level or web server), and I realize that leaves things pretty open, but the goal is really just to prevent inadvertent browsing of pre-production sites.
Usually I see web-server-based authentication with a single shared username and password for all users; this should be easy to set up. An interesting trick might be to check for a cookie instead, and then just have a well-hidden page that sets that cookie. You can remove that page once everyone has visited it, or implement authentication just for that file, or allow access to it only from the office and require people working from home to use the VPN or visit the office if they clear their cookies.
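That cookie check also fits the INCLUDE idea from the question: ship the same small include on every page, with an empty variant on production. A sketch in PHP, with a placeholder cookie name and value (the hidden page would call setcookie() with the same pair):

    <?php
    // gate.php - include at the top of every dev/beta page. On production,
    // ship an empty file of the same name so the source needs no changes.
    if (($_COOKIE['team_access'] ?? '') !== 'some-long-random-value') {
        http_response_code(403);
        exit('Not available.');
    }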
I have absolutely no idea if this is the "proper" way to go about doing it, but for us we place all Dev and Beta sites on very high port numbers that crawlers/spiders/indexers never go to (in fact, I don't know of any off the top of my head that go beyond port 80 unless they're following a direct link).
We then have a reference index page listing all of the sites with links to their respective port numbers, with only that page being password-protected. For sites involving real money transactions or other sensitive data, we display a short red bar on top of the website explaining that it is just a demo server, on the very rare chance that someone would directly go to a Dev URL and Port #.
The index page is also on a non-standard (!= 80) port. But even if a crawler were to reach it, it wouldn't get past the password input and would never find the direct links to all the other ports.
That way your developers can access the pages with direct URLs and Ports, and they have a password-protected index for backup should they forget.
