Trackback spam protection. How to escape link blast

Lately some folks have got into the habit of creating trackback links pointing to my website from porn-related sources in order to de-index my site. They succeeded to a certain extent, but I managed to spot them through GA and am now blocking their websites through .htaccess. The procedure is painful, so I decided to ask you, good fellows, whether you know how to block trackback links as they appear, before they become a problem.
I know WP has some protection against trackback spam, but I am not familiar with the mechanism.

Spam is a well-known problem with Trackback and Pingback.
Possible measures against the spam:
Fetch the source and check if it really links to you
Pipe the request through a spam-analyzing service like Akismet
Pipe the source site content through a spam-analyzing service
Have a whitelist of people you know and trust, and block the rest. This isn't that nice for people you don't know and that send valid pingbacks.
More info: http://indiewebcamp.com/spam
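The first measure, fetching the source and checking that it really links to you, could look roughly like this. It is a minimal Python sketch, not what WordPress or any plugin actually does: the function name source_links_to is made up for illustration, and downloading the source page (for example with urllib.request) is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and ignore the #fragment part.
                absolute = urljoin(self.base_url, value)
                self.links.add(urldefrag(absolute).url)


def source_links_to(source_html, source_url, target_url):
    """Return True if the source page contains a real link to target_url.

    Hypothetical helper: a plain-text mention of the URL does not count,
    only an actual <a> element does.
    """
    collector = _LinkCollector(source_url)
    collector.feed(source_html)
    return urldefrag(target_url).url in collector.links
```

A trackback whose source page fails this check can be dropped before it is ever stored or displayed.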

Related

Creating a honeypot for nodejs / hapi.js

I have a hapijs application, and while checking some logs I found entries from automated site scanners and hits on /admin.php and similar paths.
I found the article "How to Block Automated Scanners from Scanning your Site" and thought it was great.
I am looking for guidance on what the best strategy would be to create honey pots for a hapijs / nodejs app to identify suspicious requests, log them, and possibly ban the IPs temporarily.
Do you have any general or specific (to node and hapi) recommendations on how to implement this?
My thoughts include:
Create the honeypot route with a non-obvious name
Add a robots.txt to disallow search engines on that route
Create the content of the route (see the article and discussions for some of the recommendations)
Write to a special log or tag the log entries for easy tracking and later analysis
Possibly add some logic so that if an IP address hits the honeypot route more than a certain threshold (say, 5 times), the IP is banned for X hours or permanently
A few questions I have:
How can you ban an IP address using hapi.js?
Are there any other recommendations to identify automated scanners?
Do you have specific suggestions for implementing a honeypot?
Thanks!

Let me start by saying that this idea sounds really cool, but I'm not sure it is very practical.
First, the chance of blocking legit bots/users is small but it still exists.
Even if you ignore honest mistakes, the potential for abuse and denial of service is quite big. Once I know you're blocking users who enter this route, I can try to make legit users touch it (with an iframe / img / redirect) and get them banned from the site.
Then, its effectiveness is small. Sure, you're going to stop all automated bots that scan your site (I'm sure the first thing they do is check the Disallow info, and this is the first thing you do in a pentest). But only unsophisticated attacks are going to be blocked, because anyone actively targeting you will blacklist the endpoint and get a different IP.
So I'm not saying you shouldn't do it, but I am saying you should think about whether the pros outweigh the cons here.
How to actually get it done is quite simple. It seems like you're looking for a very specific case of rate limiting. I wouldn't do it directly in your hapi app, since you want the ban to be shared between instances and you probably want it to persist across restarts (you can do it from your app, but it's too much logic for something that is already solved).
The article you mentioned actually suggests using fail2ban, which is a great solution for rate limiting. You'll need to make sure your app logs to a file it can read, and write a filter and jail conf specifically for your app, but it should work with hapi with no issues.
Specifically for hapi, I maintain an npm module for rate limiting called ralphi. It has a hapi plugin, but unless you need proper rate limiting (which you should have for logins, sessions and other tokens), fail2ban might be a better option in this case.
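To make the "ban after N honeypot hits" logic concrete, here is a minimal in-memory sketch in Python (not hapi, and not how fail2ban or ralphi work internally). The class name, defaults and injectable clock are assumptions for illustration. Because the state lives in one process, it has exactly the weaknesses described above: bans are not shared between instances and do not survive a restart.

```python
import time


class HoneypotBans:
    """Hypothetical tracker: ban an IP after `threshold` hits within `window` seconds."""

    def __init__(self, threshold=5, window=3600.0, ban_seconds=6 * 3600.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window = window
        self.ban_seconds = ban_seconds
        self.clock = clock            # injectable so the logic can be tested
        self._hits = {}               # ip -> timestamps of recent honeypot hits
        self._banned_until = {}       # ip -> time at which the ban expires

    def record_hit(self, ip):
        """Record one honeypot hit; return True if the IP is banned afterwards."""
        now = self.clock()
        # Keep only the hits that are still inside the sliding window.
        recent = [t for t in self._hits.get(ip, []) if now - t < self.window]
        recent.append(now)
        self._hits[ip] = recent
        if len(recent) >= self.threshold:
            self._banned_until[ip] = now + self.ban_seconds
            self._hits.pop(ip, None)
        return self.is_banned(ip)

    def is_banned(self, ip):
        """Return True while the ban for this IP has not expired."""
        expires = self._banned_until.get(ip)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self._banned_until[ip]
            return False
        return True
```

The honeypot route handler would call record_hit with the client address, and a check early in request handling would call is_banned and reject the request if it returns True.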
In general, honeypots are not hard to implement, but as with any security-related solution you should consider who your potential attacker is and what you are trying to protect.
Also, in general, honeypots are mostly used to notify about an existing or imminent breach. Though they can also be used to trigger a lockdown, your main take from them is visibility: learning about a breach once it has happened but before the attacker has had too much time to abuse the system (you don't want to discover the breach two months later, when your site has been defaced and all valuable data has already been taken).
A few ideas for honeypots:
Have an 'admin' user with a relatively average password (random 8 chars) but no privileges at all. When this user successfully logs in, notify the real admin.
Notice that you're not locking the attacker out on the first login attempt, even if you know he is doing something wrong (he will get a different IP and use another account). But if he actually managed to log in, maybe there's an error in your login logic? Maybe password reset is broken? Maybe rate limiting isn't working? So much more info to follow through.
Now that you know you have a semi-competent attacker, maybe try to see what he is trying to do; maybe you'll learn who he is or what his end goal is (highly valuable, since he is probably going to try again).
Find sensitive places you don't want users to play with and plant some canary tokens in them. This can be just a file that sits with all your other uploads on the system, it can be AWS creds on your dev machine, or it can be a link in your admin panel that says "technical documentation". The idea is that regular users should not care about or have any access to these files, but attackers will find them too tempting to ignore. The moment they touch one, you know this area has been compromised and you need to start blocking and investigating.
Just remember, before implementing any security measure, to think about who you expect is going to attack you. Honeypots are probably one of the last security measures you should consider, and there are a lot more common and basic security issues that need to be addressed first (there are endless lists of node.js security best practices, and the OWASP Top 10 is the de facto standard for general web app security).

Make your site anti-bot?

I remember a site that closed due to misuse, and I wonder if bots had a part in it. If a bot is POSTing something to my site, what are ways I can combat it? I was thinking of setting some cookies and having them changed via JavaScript, with a timestamp and a signature (so yesterday's cookies can't be used today or next week).
I'm sure most people/bots would just use another site instead of enabling JS in their bot.
What else can I do? I'm thinking of a daily POST limit and a honeypot for generic bots that just randomly post spam.
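The "timestamp and sign" part of that idea can be sketched on the server side as follows. This is only an illustration in Python under stated assumptions: SECRET_KEY, issue_token and verify_token are hypothetical names, and the JavaScript step that modifies the cookie in the browser is not covered here.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical; load a real secret from configuration


def issue_token(now=None):
    """Return '<unix time>.<HMAC-SHA256 of that time>' for use as a cookie value."""
    issued_at = int(time.time() if now is None else now)
    payload = str(issued_at).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"


def verify_token(token, max_age=3600, now=None):
    """Accept the token only if the signature matches and it is not too old."""
    try:
        issued_text, signature = token.split(".", 1)
        issued_at = int(issued_text)
    except ValueError:
        return False
    expected = hmac.new(SECRET_KEY, issued_text.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison, so the check does not leak how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued_at <= max_age
```

Because the timestamp is covered by the signature, a client cannot extend the lifetime of an old cookie by editing the number in front of the dot.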

If you want to get fancy, you can combine a honeypot with IP bans. Anyone who posts to your honeypot gets their IP stuck in /etc/hosts.deny or similar for the next N days.

The most popular method to prevent abuse by bots currently is CAPTCHA. It tends to work pretty well for most bots, since computers can't read very well yet. A slight downside is that some people (myself included) don't like having to constantly prove they're not bots. But it's one of the very few common ways of preventing abuse that's not trivial to defeat, if implemented properly.
There are CAPTCHA plugins for most blog, wiki and e-commerce frameworks.

You could also look into akismet:
http://akismet.com/faq/
It offers spam detection services.

I want to use security through obscurity for the admin interface of a simple website. Can it be a problem?

For the sake of simplicity I want to use admin links like this for a site:
http://sitename.com/somegibberish.php?othergibberish=...
So the actual URL and the parameter would be some completely random string which only I would know.
I know security through obscurity is generally a bad idea, but is it a realistic threat someone can find out the URL? Don't take the employees of the hosting company and eavesdroppers on the line into account, because it is a toy site, not something important and the hosting company doesn't give me secure FTP anyway, so I'm only concerned about normal visitors.
Is there a way for someone to find this URL? It wouldn't be anywhere on the web, so Google won't know about it either. I hope, at least. :)
Any other hole in my scheme which I don't see?

Well, if you could guarantee only you would ever know it, it would work. Unfortunately, even ignoring malicious men in the middle, there are many ways it can leak out...
It will appear in the access logs of your provider, which might end up on Google (and are certainly read by the hosting admins)
It's in your browsing history. Plugins, extensions etc. have access to this, and often upload it elsewhere (e.g. StumbleUpon).
Any proxy servers along the line see it clearly
It could turn up as a Referer to another site

some completely random string
which only I would know.
Sounds like a password to me. :-)
If you're going to have to remember a secret string I would suggest doing usernames and passwords "properly" as HTTP servers will have been written to not leak password information; the same is not true of URLs.
This may only be a toy site but why not practice setting up security properly as it won't matter if you get it wrong. So hopefully, if you do have a site which you need to secure in future you'll have already made all your mistakes.

I know security through obscurity is
generally a very bad idea,
Fixed it for you.
The danger here is that you might get in the habit of "oh, it worked for Toy such-and-such site, so I won't bother implementing real security on this other site."
You would do a disservice to yourself (and any clients/users of your system) if you ignored Kerckhoffs's principle.
That being said, rolling your own security system is a bad idea. Smarter people have already created security libraries in the other major languages, and even smarter people have reviewed and tweaked those libraries. Use them.

It could appear on the web via a "Referer leak". Say your page links to my page at http://entrian.com/, and I publish my web server referer logs on the web. There'll be an entry saying that http://entrian.com/ was accessed from http://sitename.com/somegibberish.php?othergibberish=...

As long as the "login-URL" is never posted anywhere, there shouldn't be any way for search engines to find it. And if it's just a small, personal toy site with no personal or really important content, I see this as a fast and decent-working solution regarding security compared to implementing some form of proper login/authorization system.
If the site gets a big number of users and lots of content, or simply becomes more than a "toy site", I'd advise you to do it the proper way.

I don't know what your toy admin page would display, but keep in mind that when loading external images or linking to somewhere else, your referrer is going to publicize your URL.

If you change http into https, then at least the url will not be visible to anyone sniffing on the network.

(The caveat here is that a very obscure login system can leave interesting traces to be found in network captures (MITM), somewhere on the site/target where they enable privilege elevation, or on the system you use to log in if that one is no longer secure. Some prefer an admin login that looks no different from a standard user login to avoid that.)
You could require that some action be taken # of times and with some number of seconds of delays between the times. After this action,delay,action,delay,action pattern was noticed, the admin interface would become available for login. And the urls used in the interface could be randomized each time with a single use url generated after that pattern. Further, you could only expose this interface through some tunnel and only for a minute on a port encoded by the delays.
If you could do all that in a manner that didn't stand out in the logs, that'd be "clever" but you could also open up new holes by writing all that code and it goes against "keep it simple stupid".

How would you attack a domain to look for "unknown" resources? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Given a domain, is it possible for an attacker to discover one or many of the pages/resources that exist under that domain? And what could an attacker do/use to discover resources in a domain?
I have never seen the issue addressed in any security material (because it's a solved problem?) so I'm interested in ideas, theories, best guesses, in addition to practices; anything an attacker could use in a "black box" manner to discover resources.
Some of the things that I've come up with are:
Google -- if google can find it, an attacker can.
A brute force dictionary attack -- Iterate common words and word combinations (Login, Error, Index, Default, etc.) As well, the dictionary could be narrowed if the resource extension was known (xml, asp, html, php.) which is fairly discoverable.
Monitor traffic via a Sniffer -- Watch for a listing of pages that users go to. This assumes some type of network access, in which case URL discovery is likely small peanuts given the fact the attacker has network access.
Edit: Obviously directory listings permissions are turned off.

The list on this is pretty long; there are a lot of techniques that can be used to do this; note that some of these are highly illegal:
See what Google, archive.org, and other web crawlers have indexed for the site.
Crawl through public documents on the site (including PDF, JavaScript, and Word documents) looking for private links.
Scan the site from different IP addresses to see if any location-based filtering is being done.
Compromise a computer on the site owner's network and scan from there.
Attack an exploit in the site's web server software and look at the data directly.
Go dumpster diving for auth credentials and log into the website using a password on a post-it (this happens way more often than you might think).
Look at common files (like robots.txt) to see if they 'protect' sensitive information.
Try common URLs (/secret, /corp, etc.) to see if they give a 401 (unauthorized), a 302 redirect (e.g. to a login page), or a 404 (page not found).
Get a low-level job at the company in question and attack from the inside; or, use that as an opportunity to steal credentials from legitimate users via keyboard sniffers, etc.
Steal a salesperson's or executive's laptop -- many don't use filesystem encryption.
Set up a coffee/hot dog stand offering a free WiFi hotspot near the company, proxy the traffic, and use that to get credentials.
Look at the company's public wiki for passwords.
And so on... you're much better off attacking the human side of the security problem than trying to come in over the network, unless you find some obvious exploits right off the bat. Office workers are much less likely to report a vulnerability, and are often incredibly sloppy in their security habits -- passwords get put into wikis and written down on post-it notes stuck to the monitor, road warriors don't encrypt their laptop hard drives, and so on.

The most typical attack vector would be trying to find well-known applications, for example /webstats/ or /phpMyAdmin/, or looking for typical files that an inexperienced user might have left in a production environment (e.g. phpinfo.php). And the most dangerous: text editor backup files. Many text editors leave a copy of the original file with '~' appended or prepended. So imagine you have whatever.php~ or whatever.aspx~. As these are not executed, an attacker might get access to the source code.
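On the defensive side, you can check your own document root for such leftovers before an attacker does. A minimal Python sketch, assuming you have filesystem access to the deployed site; the function name is made up, and apart from '~' the suffix list is an assumption you should extend for the editors you actually use.

```python
import os

# '~' comes from the description above; the other suffixes are assumed examples.
BACKUP_SUFFIXES = ("~", ".bak", ".swp", ".orig")


def find_backup_files(root):
    """Return sorted paths under `root` that look like editor backup files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Covers both 'whatever.php~' (appended) and '~whatever.php' (prepended).
            if name.endswith(BACKUP_SUFFIXES) or name.startswith("~"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Running this as part of a deployment step and failing the deployment when the list is non-empty keeps such files from reaching production in the first place.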

Brute forcing (use something like OWASP DirBuster, which ships with a great dictionary; it also parses responses, so it can map the application quite quickly and then find resources even in quite deeply structured apps)
Yahoo, Google and other search engines as you stated
Robots.txt
sitemap.xml (quite common nowadays, and got lots of stuff in it)
Web Stats applications (if any installed in the server and public accessible such as /webstats/ )
Brute forcing for files and directories is generally referred to as "Forced Browsing"; knowing the term might help your Google searches.

The path to resource files like CSS, JavaScript, images, video, audio, etc can also reveal directories if they are used in public pages. CSS and JavaScript could contain telling URLs in their code as well.
If you use a CMS, some CMS's put a meta tag into the head of each page that indicates the page was generated by the CMS. If your CMS is insecure, it could be an attack vector.
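To see how much the resource paths in a public page give away, you can list the directories referenced by its src and href attributes. A rough Python sketch using only the standard library; exposed_directories is a hypothetical name, and the sketch looks at HTML attributes only, not at URLs embedded inside CSS or JavaScript code.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class _ResourceCollector(HTMLParser):
    """Collects the value of every src/href attribute in a document."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def exposed_directories(page_html):
    """Return the sorted set of directory paths referenced by the page."""
    collector = _ResourceCollector()
    collector.feed(page_html)
    directories = set()
    for url in collector.urls:
        directory = posixpath.dirname(urlparse(url).path)
        # Skip bare file names and the site root, which reveal nothing.
        if directory and directory != "/":
            directories.add(directory)
    return sorted(directories)
```

Anything that shows up in this list should be treated as known to an attacker.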

It is usually a good idea to set your defenses up in a way that assumes an attacker can list all the files served unless protected by HTTP AUTH (aspx auth isn't strong enough for this purpose).
EDIT: more generally, you are supposed to assume the attacker can identify all publicly accessible persistent resources. If the resource doesn't have an auth check, assume an attacker can read it.

The "robots.txt" file can give you (if it exists, of course) some information about what files/directories are there (Example).
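As an illustration of how little effort this takes, here is a small Python sketch that pulls the Disallow entries out of a robots.txt body. It is a simplified reading of the format (it ignores which User-agent group a rule belongs to), and the function name is made up.

```python
def disallowed_paths(robots_txt):
    """Return the distinct Disallow values from a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            # An empty Disallow means "nothing is disallowed", so skip it.
            if value and value not in paths:
                paths.append(value)
    return paths
```

Every path this returns for your own site is effectively published, so robots.txt should never be the only thing standing between a visitor and a sensitive resource.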

Can you get the whole machine? Use common / well-known scanners & exploits.
Try social engineering. You'll be surprised how effective it is.
Brute-force sessions (JSessionid etc.), maybe with a fuzzer.
Try commonly used path signatures (/admin/ /adm/ .... in the domain)
Look for data inputs that get processed further, for XSS / SQL injection / vulnerability testing
Exploit known weak applications within the domain
Use phishing hacks (XSS/XSRF/HTML-META >> IFrame) to forward the user to your fake page (and the domain name stays).
Blackbox reengineering - What programming language is used? Are there bugs in the VM/interpreter version? Try service fingerprinting. How would you write a page like the one you want to attack? What are the security issues the developer of the page may have missed?
a) Try to think like a dumb developer ;)
b) Hope that the developer of the domain is dumb.

Are you talking about ethical hacking?
You can download the site with tools like SurfOffline and get a pretty good idea of the folders, architecture, etc.
Best Regards!

When attaching a new box onto "teh interwebs", I always run (ze)nmap. (I know the site looks sinister - that's a sign of quality in this context I guess...)
It's pretty much push-button and gives you a detailed explanation of how vulnerable the target (read:"your server") is.

If you use mod_rewrite on your server, you could do something like this:
All requests that do not fit the expected patterns can be redirected to a special page. There the IP (or whatever) is tracked. Once you have a certain number of "attacks", you can ban this user / IP. The most efficient way would be to automatically add a special rewrite condition to your mod_rewrite configuration.

A really good first step is to try a DNS zone transfer against their DNS servers. Many are misconfigured, and will give you the complete list of hosts.
The fierce domain scanner does just that:
http://ha.ckers.org/fierce/
It also guesses common host names from a dictionary, as well as, upon finding a live host, checking numerically close IP addresses.

To protect a site against attacks, call upper management in for a security meeting and tell them never to use their work password anywhere else. Most suits will carelessly use the same password everywhere: work, home, pr0n sites, gambling, public forums, Wikipedia. They are simply unaware that not all sites take care not to look at their users' passwords (especially when the sites offer "free" stuff).

Is it possible for a 3rd party to reliably discern your CMS?

I don't know much about poking at servers, etc, but in light of the (relatively) recent Wordpress security issues, I'm wondering if it's possible to obscure which CMS you might be using to the outside world.
Obviously you can rename the default login page, error messages, the favicon (I see the joomla one everywhere) and use a non-default template, but the sorts of things I'm wondering about are watching redirects somehow and things like that. Do most CMS leave traces?
This is not to replace other forms of security, but more of a curious question.
Thanks for any insight!

Yes, many CMS leave traces like the forming of identifiers and hierarchy of elements that are a plain giveaway.
This is however not the point. The point is that there are only a few very popular CMSs. It is not necessary to determine which one you use. It will suffice to methodically try attack techniques for the 5 to 10 biggest CMSs against your site to get a pretty good probability of success.

In the general case, security by obscurity doesn't work. If you rely on the fact that someone doesn't know something, this means you're vulnerable to certain attacks since you blind yourself to them.
Therefore, it is dangerous to follow this path. Choose a successful CMS and then install all the available security patches right away. By using a famous CMS, you make sure that you'll get security fixes quickly. Your biggest enemy is time; attackers can find thousands of vulnerable sites with Google and attack them simultaneously using bot nets. This is a completely automated process today. Trying to hide what software you're using won't stop the bots from hacking your site, since they don't check which vulnerability they might expect; they just try the top 10 of the currently most successful exploits.
[EDIT] Bot nets with 10'000 bots are not uncommon today. As long as installing security patches is so hard, people won't protect their computers, and that means criminals will have lots of resources to attack with. On top of that, there are sites which sell exploits as ready-to-use plugins for bots (or sell bots, or rent out whole bot nets).
So as long as the vulnerability is still there, camouflaging your site won't help.

A lot of CMSs have ids, class names and structure patterns that can identify them (WordPress, for example). URLs have specific patterns too. All it takes is someone experienced with the platform, or just some browsing, to identify which CMS a site is using.
IMHO, you can try to change all this structure in your CMS, but if you are going to all this effort, I think you should just create your own CMS.
It's more important to keep everything on your platform up to date and follow some security measures than to try to change everything that could reveal the CMS you're using.

Since this question is tagged "wordpress", you can hide your WordPress version by putting this in your theme's functions.php file:
    // Stop WordPress from printing its generator (version) meta tag in <head>.
    add_action('init', 'removeWPVersionInfo');
    function removeWPVersionInfo() {
        remove_action('wp_head', 'wp_generator');
    }
But you're still going to have the usual paths, e.g. wp-content/themes/ and wp-content/plugins/, in the page source, unless you figure out a way to rewrite those with .htaccess.
