Need some ideas on how one can spam some website, crawl some website and waste it's resources - web

I am working on a startup which basically serves website. Sorry, I can't reveal much details about the startup.
I need some ideas on how spammers and cralwer devs think on attacking some website. And if possible, then a way to prevent such attacks too.
We have come up with some basic ideas like:
1. Include a small JS file in the sites that would send an ACK on our servers ones all the assets are loaded. Like some crawlers/bots only come to websites and download specific stuff like images or articles. In such cases, our JS won't be triggered. And when we study our logs, which will have a record of resources requested by the particular IP and if out JS was triggered or not. We can then whitelist or blacklist IP's based on the study.
2. Like email services do, we will load a 1x1 px image on the client side via an API call. In simple words, we won't add the "img" tag directly in out HTML, but rather a JS that calls an API on our server that returns the image to the client.
3. We also have a method to detect Good bots like that of google which indexes our pages. So we can differentiate between good bots and bad bots that just waste our resources.
We are at a very basic level. Infact, all our code does right now is logs the IP's and assets requested by that IP in elasticsearch.
And so we need ideas on how people spam/crawl websites via cralwers/bots/etc. So we can come up with some solution. And if possible, please also mention the pros and cons and ways to defend against your ideas too.
Thanks in advance. If you share your ideas, you'll be helping a startup which will be doing a lot of good stuff.

Related

How to block users accessing site outside of UK?

Searched the web and unable to find a solution. I have an umbraco site using IIS to host on a Windows server. Any ideas on approach to block users accessing site outside the UK? Htaccess approach would be too slow.... thank you in advance!
That's quite hard to do accurately, as you could have someone based in the UK using a European network provider, which means that they might appear to come from say Holland instead of the UK. It's also possible for people to spoof their location fairly easily if they really want to get at your site.
As Lex Li mentions there are plenty of commercial databases and tools for looking up a user's location, but the accuracy of these varies considerably, not to mention the fact that some of them only support IPv4. Any of these options are going to be slow though, as you'll have to check on every request. You also have to make sure you keep the databases up to date.
Another option would be to proxy your site through something like CloudFront or CloudFlare which both support blocking traffic by country.

Block traffic from referral spam bots in Azure Web App with DNN

I am sure many of you have found fake referral traffic in your google analytics reports/views. This makes it difficult for low to medium traffic sites to have accurate data for marketing. I am wondering what others are doing to exclude this traffic from their analytics reports.
If you go to your analytics account and go to acquisition -> all traffic -> referrals you will see sites like floating-share-buttons.com. These are the sites I want to filter out. Which you can do by setting up a custom filter for the view as described at the bottom of this page. I have done this and it works.
I would rather block these bots from hitting the site all together. Just a note: my sites are running as web apps in azure.
I am not sure if setting up url rewrite rules described here will work in azure apps or if this will mess with the existing url rewrite functions of the Content Management System I am using (DotNetNuke DNN platform 7).
I am really just looking to hear what others have done to block bots rather than than setting up filters in the analytics view's settings.
Thanks
PS
for those who are interested, this is the current filter list I am using:
webmonetizer\.net|trafficmonetizer\.org|success-seo\.com|event-tracking\.com|Get-Free-Traffic-Now\.com|buttons-for-website\.com|4webmasters\.org|floating-share-buttons\.com|free-social-buttons\.com|e-buyeasy\.com
With regards to this issue, there are a number of things that you can do. You are going the route that I see most commonly used and that is to block the information using the filters in Google Analytics.
You can go the route of an IIS Filter as well, just like you have linked. DNN's Friendly URL's will not necessarily be impacted by this as they are processed BEFORE DNN gets the request. There is a marginal performance impact by having two things process re-writes, but nothing to be concerned about until incredibly high user volume.
This is also a great collection of options.
First you need to know that there are mainly 2 types of spam affecting GA right now, Ghost and Crawlers.
The first(ghosts) never interacts with your page, so any server-side solutions like the HTTP rules or htaccess file won't have any effect and will only fill your config files with.
The crawlers as the name imply do access your website and can be blocked this way, but there are only a few of them compared with the ghost. To give you an Idea there are around 8 active crawlers while there are more than 100 ghosts and each week increasing.
This is because the ghost method is easier to implement for the spammers.
From your expression, only success-seo is a crawler. The rest should be filtered. Now there is a better way to get rid of all ghosts with just one filter based in your valid hostnames instead of creating of updating one every week.
You can find more information about the ghost spam and the solution here
https://stackoverflow.com/a/28354319/3197362
https://moz.com/ugc/stop-ghost-spam-in-google-analytics-with-one-filter
Hope it helps.

Using another server to store files: Good or bad idea?

I am thinking of using another "less" important server to store files that our clients want to upload and handling the data validation, copying, insertion, etc at that end.
I would display the whole upload thingy through iframe on our website and using HTML,PHP,SQL as syntax-languages for the thingy?
Now I would like to ask your opinions is this is a good or bad idea.
I´m figuring out that the pros and cons are:
**Pros:
The other server is "less" valuable, meaning if something malicious could be uploaded there it would not be the end of the world
Since the other server has less events/users/functionality/data it would help to lessen the stress of our main website server
If the less important server goes down the other functionality on main server would still be functioning
Firewall prevents outside traffic (at least to a certain point)
The users need to be logged through the main website
**Cons:
It does not have any CMS+plugins, so it might be more vunerable
It might generate more malicious traffic towards it.
Makes the upkeep of the main website that much more complicated for future developers
Generally I´m not found of the idea that users get to uploading files, but it is not up to me.
Thanks for your input. I´m looking forward to hearing your opinions.
Servers have file quotas and bandwidths defined/allocated for them.
If you transfer your "less" used files to another server ,it will help your main server to improve its performance.
And also there wont be much maintenance headaches with the main server if all files are uploaded there.
Conclusion : It is a good idea.
Well, I guess most importantly, you will need a single sign-on (SSO) solution in place between the two web applications. I assume you don't want user A be able to read or delete files from user B.
SSO between 2 servers is a lot more complicated than for a single web application. Unless this site is only deployed in an intranet with a Active Directory domain controller in which case you can use Kerberos.
I'm not sure it's worth it just for the advantages you name.

google chrome extention send email site block of dns

I want to build an extension on Google Chrome which functions will be forwarding address illicit websites that email to parents, which prohibited it site address using DNS Nawala or something similar, with the extension prevents the expected negative impact of the use of the internet.
What are the steps that I did in building this extension ?
Thank you.
This is a very broad "how do I create my entire project" question, but I'll try to give you some broad advice:
An extension alone will not be enough for this. You're going to need a web service as well. You'll likely need to divide the project into two parts:
A Chrome extension that monitors the websites a person visits. You can do this by using the Tab API. Simply look at each site the user has visited and if they visit any of the illicit sites on a blacklist, take an action, probably by making an API call to the web service mentioned below.
You're almost certainly going to need to use a web service developed with a scripting language like PHP or Java, or something similar. This web service would take care of sending the emails to parents. If we're just talking about sending an email to one parent than this service could be quite simple. The extension would tell the web service to send an email when an illicit site is visited, and that's about it. If you're talking about a commercial project then this service would probably need to be a full fledged website that allows parents to sign up for these emails.
Again, this is a very broad question, and generally speaking Stackoverflow is more for asking specific programming questions. But hopefully this will get you moving in the right direction at least, so you can come back and ask more specific questions. :-)

I need to speed up my site and reduce the number of files calls

My webhost is aking me to speed up my site and reduce the number of files calls.
Ok let me explain a little, my website is use in 95% as a bridge between my database (in the same hosting) and my Android applications (I have around 30 that need information from my db), the information only goes one way (as now) the app calls a json string like this the one in the site:
http://www.guiasitio.com/mantenimiento/applinks/prlinks.php
and this webpage to show in a web view as welcome message:
http://www.guiasitio.com/movilapp/test.php
this page has some images and jquery so I think this are the ones having a lot of memory usage, they have told me to use some code to create a cache of those files in the person browser to save memory (that is a little Chinese to me since I don't understand it) can some one give me an idea and send me to a tutorial on how to get this done?. Can the webview in a Android app keep caches of this files?
All your help his highly appreciated. Thanks
Using a CDN or content delivery network would be an easy solution if it worked well for you. Essentially you are off-loading the work or storing and serving static files (mainly images and CSS files) to another server. In addition to reducing the load on your your current server, it will speed up your site because files will be served from a location closest to each site visitor.
There are many good CDN choices. Amazon CloudFront is one popular option, though in my optinion the prize for the easiest service to setup is CloudFlare ... they offer a free plan, simply fill in the details, change the DNS settings on your domain to point to CloudFlare and you will be up and running.
With some fine-tuning, you can expect to reduce the requests on your server by up to 80%
I use both Amazon and CloudFlare, with good results. I have found that the main thing to be cautious of is to carefully check all the scripts on your site and make sure they are working as expected. CloudFlare has a simple setting where you can specify the cache settings as well, so there's another detail on your list covered.
Good luck!

Resources