Is HTML Email Obfuscation safe enough to stop bots? - security

I know that most javascript email obfuscation solutions stop bots dead in their tracks - but sometimes it's hard to use/insert javascript in places.
To that end I was wondering if anyone knew if the bots were smart enough to translate HTML entities in HEX and DEC into valid email strings?
For example, let's say I have a function that randomly converts the string characters into one of three forms - is this enough?
function hide_email($email)
{
    $s = '';
    foreach (str_split($email) as $l)
    {
        switch (rand(1, 3))
        {
            case 1: $s .= '&#' . ord($l) . ';'; break;
            case 2: $s .= '&#x' . dechex(ord($l)) . ';'; break;
            case 3: $s .= $l;
        }
    }
    return $s;
}
which turns first.last@email.com into something like:
&#102;i&#x72;st&#46;&#x6c;a&#115;t&#x40;e&#109;ail&#46;c&#x6f;m
I would assume that the bot creators would have already added a regex pattern for something like this...

I would not think this particularly safe. Were I writing code to interpret HTML, decoding entities to their corresponding characters would be among the first bits of code to go in.
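For instance, here is a minimal sketch (plain PHP, reusing the hide_email() function from the question) of how trivially a harvester could undo that encoding:

<?php
// html_entity_decode() converts both decimal (&#102;) and hex (&#x66;)
// character references back to plain characters in one call.
$obfuscated = hide_email('first.last@email.com');
$plain = html_entity_decode($obfuscated, ENT_QUOTES, 'UTF-8');
echo $plain; // first.last@email.com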
As a further defense, I would suggest judicious use of tags (such as the <span> tag), perhaps even nested. That takes more effort to decode and still does not require Javascript.
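As a rough illustration of that suggestion (the chunk size and helper name are my own, not from the answer), you could split the address into pieces and wrap each piece in a <span> on top of the entity encoding:

<?php
// Illustrative sketch only: wrap entity-encoded chunks of the address in
// <span> elements, so a harvester has to strip markup as well as decode
// entities. Reuses hide_email() from the question; chunk size is arbitrary.
function hide_email_with_spans($email)
{
    $out = '';
    foreach (str_split($email, 3) as $chunk) {
        $out .= '<span>' . hide_email($chunk) . '</span>';
    }
    return $out;
}

echo hide_email_with_spans('first.last@email.com');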

I wouldn't be shocked if a bot used a client that did an HtmlDecode before returning the results.

There was an interesting article I read a while ago about a guy who posted a web page with nine different methods of obfuscation, and waited a year to see how much spam each e-mail address got.
Here's a link to the article: Nine Ways to Obfuscate E-mail Addresses Compared. Some of the pictures in the sidebar may not be safe for work, if your work frowns on girls in bikinis.

Related

Removing ids from url [duplicate]

Hey guys! Working on a new Cake app and wondering if there is any way for me to remove the ID-in-URL routing from Cake. Perhaps by passing the ID in POST somehow? Having the ID passed in as a URL param just seems really shoddy and unsafe. Thanks!
"Shoddy"? It's standard practice and a perfectly fine solution to have ids in the URL. Look at the URL of your question:
http://stackoverflow.com/questions/4638262/removing-id-from-cakephp-url
                                   ^^^^^^^
                                     id
Also, there's absolutely nothing unsafe about showing an id in a URL. It's just a number that doesn't mean anything. If a user can do something "bad" only by knowing this id, your app is broken and insecure, not the id-passing mechanism.
Trying to work around this scheme means working around a fundamental principle of the HTTP protocol, and opens up a whole new can of worms.
Some people prefer using slugs instead of primary key ids. This is the removing-id-from-cakephp-url part of the URL from this page. Take a look at the SluggableBehavior.
However, slugs can change. Hence, having the primary key in your URL is useful if you want to have a permalink. StackOverflow does both so that it can support both permalinking from other sites, as well as for SEO reasons. :)
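A rough sketch of that id-plus-slug pattern (plain PHP, not CakePHP routing; the URL format and function names are illustrative):

<?php
// Build a permalink that carries both the numeric id and a human-readable
// slug, then recover the id when the URL comes back in. Only the id is
// authoritative; the slug can change without breaking old links.
function permalink($id, $title)
{
    $slug = strtolower(trim(preg_replace('/[^a-z0-9]+/i', '-', $title), '-'));
    return "/questions/{$id}/{$slug}";
}

function id_from_permalink($path)
{
    return preg_match('#^/questions/(\d+)/#', $path, $m) ? (int) $m[1] : null;
}

echo permalink(4638262, 'Removing id from CakePHP URL');
// /questions/4638262/removing-id-from-cakephp-url
echo id_from_permalink('/questions/4638262/any-new-slug'); // 4638262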
Regarding security issues, I guess the other answers have already pointed out that there are other ways to make your application secure.
Why do you care? URLs are optimized for SEO reasons, and an ID won't matter if it isn't too long. If it is, consider using a shorter one made up of numbers and letters instead; it will be as difficult to guess as a long one that uses digits only.
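For example, a quick sketch of shortening a numeric id by re-encoding it in base 36 (standard base_convert(); the specific id is just for illustration):

<?php
// A long decimal id becomes a shorter mixed letters-and-digits token,
// and converts back losslessly.
$id    = 4638262;
$short = base_convert((string) $id, 10, 36);   // "2rewm"
$back  = (int) base_convert($short, 36, 10);   // 4638262
echo $short . ' -> ' . $back;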
If you are not using GET and you do not supply the params in the URL, your users won't be able to copy-paste the location.

Disable client-side Scripting (like Greasemonkey) on a web site?

I would like to know if there is a way to prevent the use of userscripts on my browser game site?
Many use Greasemonkey to have advantages over other players, and I would like a way to disable these scripts.
I found this old article, "How To Disable Greasemonkey On Your Web Site", but it's from 2005 and doesn't seem to work.
Combating a smart scripter is tough. They have the upper hand since their script can touch the page before your server does, and can block or replace just about anything. See this answer to a very similar question.
Your smartest, and most cost-effective countermeasure is to sanction the users who are "gaming" the game. Attack the burglar, not the lock-pick.
If you insist in a tech war with your users, nothing you do will block everybody, but you can make them work for it.
Here are some things you can do make life harder for scripters:
Frequently change the structure of the page, especially element ID's and CSS class names. If you can, periodically insert or remove elements, so that the key <div> is not always the 3rd one in the second <table>, for example.
Every time you make a change, monitor your logs for users who get a sudden decrease in performance or usage -- for however many hours or days it takes them to adapt their scripts.
Likewise, frequently change your javascript filenames, and change the names of any variables or functions that the scripter may use.
Write your click and keyboard event-handlers to only work for trusted events, for browsers that support it.
You can put key text, including countdown timers, in images with unpredictable names, making it hard for the script to detect key events. Needing to do OCR ramps up the skill level required of a Greasemonkey scriptwriter considerably. (At least for now.) A rough server-side sketch of this idea appears after this answer.
If you move the key game action into Flash, it becomes an order of magnitude harder to script for. They may even have to reverse engineer your flash and replace it with one that has scriptable hooks. Switching to Flash will annoy and drive off users (like me), though.
See that answer for more but, again, the best and most cost-effective approach is to sanction the offending user(s). Be sure that your Terms of Service specifically forbids what they are doing, though.
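As a sketch of the "key text in images" idea above (standard PHP GD functions; the file name, session key, and layout are illustrative):

<?php
// timer_image.php -- render the remaining round time as a PNG, so a
// userscript cannot simply read the number out of the DOM and must do OCR.
session_start();
$secondsLeft = max(0, ($_SESSION['round_ends'] ?? time()) - time());

$img = imagecreatetruecolor(120, 30);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 0, 0, 0);
imagefilledrectangle($img, 0, 0, 119, 29, $bg);
imagestring($img, 5, 10, 8, sprintf('%02d:%02d', intdiv($secondsLeft, 60), $secondsLeft % 60), $fg);

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);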
As an addition to Brock Adams' own answer, here are a couple of methods for finding a possible scripter.
A timed function that checks the DOM tree and searches for added elements that are not your code's creation, or looks for missing elements.
Primarily finds a scripter who alters the UI yet hasn't read or understood the game's js-source.
Client-side.
The "missing element" search may get false positives from people who use something like AdBlock Plus. Not really a false positive, though, if the aim is to weed them out, too.
Inspect cookie content and look for hints of user-added content.
If the scripter has to transfer information from one page or session to another and knows no other method, he may attempt to use cookies for this.
Inspect query/hash in URL for content not added by your code.
It's possible to try to transfer information to other pages by altering links.
Hash-content (# in URL) is accessible only client-side.
Inspect session/localStorage.
Client-side.
Disable access through anonymizing services, like anonym.to.
Circumventable, but it makes life harder for people using unwanted online tools.
Allow access to the game page only if the referer is correct; otherwise redirect to the login page.
Another method to limit access to game pages from outside sources.
If you want to be a pain, kill the active session when redirecting.
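A minimal sketch of that referer check (plain PHP; the host name and login path are placeholders, and the Referer header is trivially spoofable, so treat this as a speed bump rather than real security):

<?php
// Serve the game page only when the request appears to come from our own
// site; everything else gets its session killed and is sent to the login page.
$referer = $_SERVER['HTTP_REFERER'] ?? '';
$host    = parse_url($referer, PHP_URL_HOST);

if ($host !== 'game.example.com') {
    session_start();
    session_destroy();
    header('Location: /login.php');
    exit;
}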
NOTE: All client-side functions can be circumvented by a scripter who understands the code.
NOTE: Using these requires some wisdom and good planning. Done wrong, the client-side checks can bring users' browsers to their knees or DDoS your own server, and with too much automation you may end up banning at least half of the userbase after an update to your own code.
Here's one of my scripts. It could definitely still use some work, but the framework is there (though you may need to wrap everything in a big function to make variables private)
var secureElements, secureTags, secureTagLoop, secureLoop;
var secureReporter = 0, secureAnalyzationFunction = 0;

// If the removed element's markup declares a named function, try to
// undefine that function so the foreign script stops working.
function analyze(secureAnalyzation) {
    if (secureAnalyzation.indexOf("function ") != -1) {
        secureAnalyzationFunction = secureAnalyzation.substring(secureAnalyzation.indexOf("function ") + 9, secureAnalyzation.indexOf("()"));
        secureAnalyzationFunction = secureAnalyzationFunction + "=undefined;";
        eval(secureAnalyzationFunction);
    }
}

// Scan suspicious tag types and remove any element not marked "verified"
// by our own code.
function secure() {
    var secureTags = ["script", "link", "meta", "canvas"];
    for (secureTagLoop = 0; secureTagLoop != secureTags.length; secureTagLoop++) {
        secureElements = document.getElementsByTagName(secureTags[secureTagLoop]);
        for (secureLoop = 0; secureLoop != secureElements.length; secureLoop++) {
            if (secureElements[secureLoop].outerHTML.indexOf("verified") == -1) {
                analyze(secureElements[secureLoop].outerHTML);
                secureElements[secureLoop].parentElement.removeChild(secureElements[secureLoop]);
                // getElementsByTagName returns a live collection, so step back
                // one index after removing an element.
                secureLoop--;
                secureReporter++;
                console.log("Deleted " + secureReporter + " foreign elements.");
            }
        }
    }
}

window.onload = function() {
    secure();
    setInterval(secure, 1500);
};

Should a honeypot captcha be more complicated than 'display: none;'?

I'm needing to implement some form of captcha support for comments on my blog. I would really prefer a mostly passive approach, as in, no ReCaptcha. I'm thinking about doing a combination of honeypot and this. I don't exactly plan for my site to be specifically targeted by any spammers, but I want to definitely stop all the drive-by spam attacks.
So on to my question: With spam bots advancing in technology all the time, should I use something more complicated for hiding the hidden field than display: none? If so, then what would you suggest?
Unless spam is a serious problem on your blog, I'd just go for doing display: none.
You could also try the classic "What is 2 + 2" / "What color is the sky?" style questions.
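A tiny sketch of that question check (plain PHP; the question, expected answer, and field name are all just examples):

<?php
// Store the expected answer in the session when rendering the form, then
// verify it when the comment is submitted.
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $given = strtolower(trim($_POST['human_check'] ?? ''));
    if ($given !== ($_SESSION['expected_answer'] ?? null)) {
        exit('Please answer the verification question.');
    }
    // ... accept the comment ...
} else {
    $_SESSION['expected_answer'] = 'blue';
    echo '<label>What color is the sky? <input name="human_check"></label>';
}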
for human verification:
I use a php function to generate a random number string, and echo it into a text box. Then require the user to enter it into a blank box. I use jQuery .validate to make sure the two are equal to each other.
for bot detection:
I use a hidden input, and with my jQuery .validate script I make a custom rule so that if the hidden input's value isn't blank it returns an error. I also have this in my server-side PHP validation. Works pretty well.
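A minimal sketch of that honeypot (the field name and the off-screen CSS are my own choices; moving the field off-screen is one way to be less obvious than display: none):

<!-- Humans never see or fill this field; naive bots usually do. -->
<div style="position: absolute; left: -9999px;" aria-hidden="true">
    <label for="website">Leave this field empty</label>
    <input type="text" id="website" name="website" value="" tabindex="-1" autocomplete="off">
</div>

<?php
// Server-side check: any value in the honeypot field marks the post as spam.
if (!empty($_POST['website'])) {
    http_response_code(400);
    exit('Spam detected.');
}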

Eliminate < > as accepted characters in a wordpress password?

Is it possible to eliminate these characters from a wordpress password? I have heard that it can open up scripts this way, that hackers can use to get in. Thank you.
Simple answer:
Your friend has misinformed you. Restricting these characters in a wordpress password is not something you need to worry about. But as they say "There is no smoke without fire".
More background information:
In your own web-application code, you should always be especially careful whenever you take any data from a user (whether from a form, a cookie, or a URL) or from another external computer system or application. The reason for this is that you want to avoid the values being interpreted as code and not just used as data.
The issue that has led your friend to worry about the <> characters is called Cross-Site Scripting and is a kind of attack that malicious users can perform to "inject" html or javascript content into your pages. If you accept information from the user that contains these html mark-up characters and re-display it on the same, or another page, then you can cause their html or javascript content to become part of your page. Any javascript content will run with access to the same data as the user that views the page.
Whenever outside data is read, it should always be
validated: i.e. checked that it looks like the kind of thing you are expecting, and rejected if it does not.
and encoded: i.e. when this data is displayed back to the user or sent to another part of the system, it is converted to be safe. The type of conversion always depends on how and where the data is being used.
Please note that the angle-bracket characters are not the only thing to worry about. Please also note that it is well proven that disallowing certain characters (also called "blacklisting") is never the best way to secure code. It is always safer to state what is allowed (also called "whitelisting").
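A small sketch of validate-then-encode in PHP (the field name and whitelist pattern are illustrative; htmlspecialchars() is the standard encoding call for HTML output):

<?php
// Validation (whitelisting): accept only what you expect, e.g. a display
// name of letters, digits, spaces and a few safe punctuation characters.
function valid_display_name($name)
{
    return preg_match('/^[\p{L}\p{N} ._-]{1,50}$/u', $name) === 1;
}

// Encoding: whatever the value contains, htmlspecialchars() keeps it from
// being interpreted as markup when echoed into a page.
$comment = $_POST['comment'] ?? '';
echo '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

Passwords, by contrast, are hashed and never echoed back into a page, which is why banning < and > in them buys you nothing.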

How to get a description of a URL

I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up, for example, if you Googled the link. For example, Googling http://stackoverflow.com shows the description as
A language-independent collaboratively
edited question and answer site for
programmers. Questions and answers
displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URL's meta-descriptions, however most of them are lacking a meta-description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when they are lacking the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google use proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how google does this:
Webmasters/Site owners Help
Adding a URL to google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. Some sources are better than others.
For "audiotuts.com" google has a worse description than AboutUs.com.
Google:
Nov 18th in General by Joel Falconer · Recently, an AUDIOTUTS reader asked me about creative process. While this is a topic that can’t be made into a ...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for musicians, producers and audio junkies! It is the sister site of the popular PSDTUTS, VECTORTUTS and NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for Meta Description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of more than n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
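A rough sketch of that kind of fallback (get_meta_tags() and DOMDocument are standard PHP; the sentence heuristic and word limits here are arbitrary stand-ins for the tuning described above):

<?php
// Prefer the page's meta description; otherwise fall back to the first
// paragraph that looks like a couple of real sentences.
function describe_url($url, $maxWords = 40)
{
    // get_meta_tags() parses <meta> tags from the URL, keys lower-cased.
    $meta = @get_meta_tags($url);
    if (!empty($meta['description'])) {
        return $meta['description'];
    }

    $html = @file_get_contents($url);
    if ($html === false) {
        return null;
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
        // Heuristic: at least two sentence-like runs of 20+ characters.
        if (preg_match_all('/[^.!?]{20,}[.!?]/', $text) >= 2) {
            return implode(' ', array_slice(explode(' ', $text), 0, $maxWords));
        }
    }
    return null;
}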
