Do bots' user agents always have "http" in it?

Is it safe to assume that all bots' user agents always have URLs in the user agent strings?
I do, of course, compare the user agent against a list of bots, but the idea here is to do a preliminary check before checking it against a long list.
Perhaps if I could reword my question better, is there any valid non-bot, non-crawler, non-spider or any non-filthy creature that has a URL in the user agent?

Is it safe to assume that all bots' user agents always have URLs in the user agent strings?
Nope. Check out this bot list, it has plenty of bots that don't sport a URL.
Perhaps if I could reword my question better, is there any valid non-bot, non-crawler, non-spider or any non-filthy creature that has a URL in the user agent?
I can't think of a browser that has a URL in the agent string, but this is definitely a dangerous assumption to work with. Remember that for example, Internet Explorer Add-Ons can add their signatures to the browser's user agent string. You can't guarantee there won't be a URL in it.

There are no assumptions you can make about the user agent string. RFC 1945, section 10.15 (User-Agent) specifies the format, and section 3.7 (Product Tokens) specifies how product tokens should be formatted. As you can see from these two, the user agent string can be pretty much anything the HTTP agent wants it to be.
Note: strictly speaking, using a URL in the product token can be treated as a violation of that RFC, since the / should be treated as a separator between the product identifier and the product version.
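To illustrate why the URL check can only be a hint, here is a minimal Python sketch. The token list and the `classify` function are hypothetical names made up for this example; a real bot list would be far longer. It shows that a listed bot may carry no URL, and that a URL alone proves nothing.

```python
import re

# Hypothetical, deliberately tiny list; a real bot list would be much longer.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "wget", "curl")

URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def classify(user_agent: str) -> str:
    """Classify a user agent as 'listed-bot', 'url-hint' or 'unknown'."""
    lowered = user_agent.lower()
    # The list lookup is the authoritative check.
    if any(token in lowered for token in KNOWN_BOT_TOKENS):
        return "listed-bot"
    # A URL is only a hint: add-ons or unusual clients may include one too.
    if URL_PATTERN.search(user_agent):
        return "url-hint"
    return "unknown"


print(classify("Wget/1.21"))
print(classify("SomeAgent/1.0 (+http://example.com/info)"))
```

Note that `Wget/1.21` is caught only by the list, so skipping the list whenever no URL is present would miss it.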

Related

Campaign tracking on Cross-Domain without redirect (no linker)

Let's say we have the following URL:
https://www.sale.com/?utm_source=CDTest3Newsletter&utm_medium=CDTest3Email&utm_campaign=CDTest3FallSale&utm_id=CDT3ID
A user clicks the link, browses the site, and then closes the session.
An hour later he/she navigates to www.purchase.com and a conversion occurs. Is there a way to track the conversion and relate it to utm_id=CDT3ID?
In summary, the conversion happens on the second domain and we want to relate it to the first domain's marketing campaign.
Please note that I know how to enable the linker when redirecting from the origin domain to the target domain.

You have to realize that this kind of behavior is not standard. Therefore, it will require non-standard solutions.
Having said that, your real problem is not the attribution. In the described scenario, you are likely to lose the user completely. Purchase.com will have no idea that this client is supposed to have the same id as on the previous site. The linker adds an explicit _ga query param to the url for the ga library on the purchase.com to know to use that as a user id and not to generate a new one.
If you're not able to reliably pass the client id to the checkout TLD through front-end, you have to use your backend to match the user by the BE auth/session token. Same exact logic applies if you want to pass the attribution data. You just keep it on the backend, bound to the user session token and throw it to the user's cookie on checkout, then grab it with GTM and populate it however you like. Or you can as well just conduct a BE redirect, appending both the _ga and the UTM query params to the url.
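A rough Python sketch of the backend idea described above, under stated assumptions: the in-memory dict stands in for whatever session storage the backend really uses, both function names are made up for this example, and the `_ga` value is treated as an opaque string that the linker would normally produce.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# In-memory stand-in for a backend store keyed by the session token (assumption).
_attribution_by_session = {}


def remember_attribution(session_token: str, landing_url: str) -> None:
    """Store the utm_* parameters of the landing URL for this session."""
    query = dict(parse_qsl(urlsplit(landing_url).query))
    utm = {key: value for key, value in query.items() if key.startswith("utm_")}
    if utm:
        _attribution_by_session[session_token] = utm


def checkout_redirect_url(session_token: str, checkout_url: str, ga_value: str) -> str:
    """Build a redirect URL carrying _ga plus the stored UTM parameters."""
    parts = urlsplit(checkout_url)
    params = dict(parse_qsl(parts.query))
    params["_ga"] = ga_value  # opaque value, assumed to come from the linker
    params.update(_attribution_by_session.get(session_token, {}))
    return urlunsplit(parts._replace(query=urlencode(params)))


remember_attribution(
    "session-123",
    "https://www.sale.com/?utm_source=CDTest3Newsletter&utm_id=CDT3ID",
)
print(checkout_redirect_url("session-123", "https://www.purchase.com/checkout", "abc"))
```

The same stored dict could instead be written to a cookie on checkout and read by GTM, as the answer suggests; the redirect variant is shown only because it is the shorter one to sketch.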
There are a few more options if you're not using GA for your actual analysis, that is, if you're able to match users and calculate attribution on your own, either through ETL or through persistent derived tables/SQL. Basically, this applies if you export your GA data to third-party storage like Snowflake, Azure or BQ and then use a BI tool on top of that. But at this point those options should be pretty apparent from the issue and possible solution described above.

Security concerns regarding username / password vs secret URL

I have a simple site with a sign-up form. Currently the user can complement their registration with (non-critical, "low security") information not available at the time of the sign-up, through a personal (secret) URL.
I.e., once they click submit, they get a message like:
Thanks for signing up. You can complement your registration by adding information through this personal URL:
http://www.example.com/extra_info/cwm8iue2gi
Now, my client asks me to extend the application to allow users to change their registration completely, including more sensitive information such as billing address etc.
My question: Are there any security issues with having a secret URL instead of a full username / password system?
The only concern I can come up with is that URLs are stored in the browser history. This doesn't worry me much though. Am I missing something?
It's not the end of the world if someone changes some other user's registration info. (It would just involve some extra manual labor.) I will not go to the trouble of setting up https for this application.

This approach is not appropriate for sensitive information because the token is part of the HTTP request URL. Over plain HTTP it is not encrypted and shows up in many places such as proxy and other server logs. Even with HTTPS, which does encrypt the URL in transit, the token still ends up in browser history, server logs and Referer headers, so it's not an appropriate way to pass it.
BTW, another problem with this scheme is if you send the URL to the user via email. That opens up several more avenues for attack.
A better scheme would require some small secret that is not in the email. But it can be challenging to decide what that secret should be. Usually the answer is: password.

Another potential problem lies with the users themselves. Most folks realize that a password is something they should try to protect. However, how many users are likely to recognize that they ought to be making some sort of effort to protect your secret URL?

The problem here is that although it is hard to guess the URL for any specific user, given enough users it becomes relatively easy to guess a correct url for SOME user.
This would be a classic example of a birthday attack.
ETA: Missed the part about the size of the secret, so this doesn't really apply in your case, but will leave the answer here since it might apply in the more general case.
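For a feel of the numbers, here is a small Python sketch of the multi-target guessing arithmetic. It assumes tokens are drawn uniformly from lowercase letters and digits with the same length as the example token cwm8iue2gi; the question does not actually say how the tokens are generated, so both are assumptions.

```python
import math
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols (assumption)
TOKEN_LENGTH = 10  # length of the example token "cwm8iue2gi"
TOKEN_SPACE = len(ALPHABET) ** TOKEN_LENGTH


def hit_probability(users: int, guesses: int) -> float:
    """Chance that at least one of `guesses` random URLs belongs to some user."""
    per_guess = users / TOKEN_SPACE
    # 1 - (1 - p)^guesses, computed in a numerically stable way.
    return -math.expm1(guesses * math.log1p(-per_guess))


print(TOKEN_SPACE)
print(hit_probability(users=1_000_000, guesses=1_000_000))
```

The per-guess chance grows linearly with the number of users, which is the point the answer makes; with a space of this size it still stays small unless both the user count and the guess count are very large.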

can complement their registration with (non-critical, "low security") information
It's hard to imagine what user-supplied information really is "low-security"; even if you are only asking for a username and a password from your customers, you are potentially violating a duty of care to them, because a large proportion of users will use the same username/password on multiple sites. Any information about your users, and potentially a lot of information about transactions, can be used by a third party to compromise the identity of that user.
Any information about the user should be supplied in an encrypted format (e.g. via https). And you should take appropriate measures to protect the data you store (e.g. hashing passwords).
Your idea of using a secret URL means that only you, the user, anyone on the same network as the user, anyone in the vicinity of a user on wifi, anyone connected to any network between you and the user, or anyone who has access to the user's hardware will know the URL. Of course that's not considering the possibility of someone trying a brute force attack against the URLs.
C.

The secret URL means nothing if you're not using SSL. If you're still having the end-user transmit their identifying information across the Internet in the clear, then it doesn't matter how you're letting them in: They are still exposed.

The "secret URL" is often referred to as security by obscurity. The issue is that it is super simple to write a script that will attempt various combinations of letters, symbols, and numbers to brute force hack this scheme.
So if any sensitive information is stored you should definitely use at least a username and password to secure it.

What attacks can be directed on a registration page

I have a website registration page, and I'm trying to compile a list of what I need to do to protect it. If you know of an attack, please name it, and briefly describe it preferably with a brief description of its solution. All helpful answers/comments receive an up vote.
Here's what I have in mind so far: (and adding what others are suggesting. Phew, adding other input turned out to be lots of work, but please keep them coming, I'll continue adding here)
SQL injections: from user input data. Solution: prepared statements.
[AviD] "Stored Procedures also provide additional benefits (above prepared statements), such as the ability of least privilege on the DB"
Good point, please explain. I thought stored procedures were THE SAME as prepared statements. What I mean are those statements where you bindParam the variables. Are they different?
Not hashing the password before entering into db. Solution: hash passwords.
[AviD] "re Hashing, the password needs a salt (random value added to the password before hashing), to prevent Rainbow Table attacks and same-password attacks."
"the salt used should be different for each user."
Good point, I have a question about this: I know salt should be random but also unique. How do we establish the unique part to counter against the same-password attack? I've been reading on this, but didn't get a clear answer on it yet.
[Inshallah] "if you use a long salt, like 16 chars for SHA-256 ($5$) then you don't really need to verify its uniqueness"
[Inshallah] "Actually, I think it doesn't really matter whether or not there are some conflicts. The salt is only for prevention of table lookups, so even a 2 char salt will be a (small) gain, even if there are conflicts. We are not talking about a cryptographic nonce here that absolutely mustn't repeat. But I'm not a cryptanalyst"
Good point, but does anyone have disclaimers on this point?
DoS attacks?! (I'm guessing this applies to registration forms too)
[Pascal Thivent] "Use HTTPS when submitting sensitive data like a password." "for man-in-the-middle attacks, provided that adequate cipher suites are used"
What are the "adequate cipher suites" being referred to here?
[Koosha] "Use HTTPS or encrypt passwords before submission with MD5 and JavaScript on the client side."
I don't agree with MD5 and don't like encrypting on the client side; it makes no sense at all to me. But other input is welcome.
[Dan Atkinson] Exclude certain usernames to prevent clashes with existing pages that have the same name (see original post for full answer and explanation)
[Koosha] "limit allowed characters for username. For example alphabet and numbers, dash(-) and dot(.)"
Please explain exactly why?
[Stu42] "Use Captcha so that a bot cannot automatically create multiple accounts"
[AviD] "There are better solutions than captcha, but for a low-value site it can be good enough."
@AviD, could you please mention an example?
[rasputin] "use e-mail verification"
[Andrew and epochwolf] xss attacks
Although I don't agree with Andrew and epochwolf to simply filter < and > or to convert < to &lt; and > to &gt;. Most opinions suggest a library like HTMLpurifier. Any input on this?
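As a small illustration of the prepared-statement item at the top of this list, here is a Python sketch using the standard sqlite3 module. The table layout and the `register_user` function are made up for this example; the point is only that the values are bound as parameters and never spliced into the SQL text.

```python
import sqlite3


def register_user(conn: sqlite3.Connection, username: str, email: str) -> None:
    """Insert a user with bound parameters instead of string concatenation."""
    conn.execute(
        "INSERT INTO users (username, email) VALUES (?, ?)",
        (username, email),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")

# A hostile-looking value is stored verbatim as data, not executed as SQL.
register_user(conn, "alice'); DROP TABLE users; --", "a@example.com")
print(conn.execute("SELECT username FROM users").fetchall())
```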

Use HTTPS, i.e. a combination of HTTP and SSL to provide encryption and secure identification of the server when submitting sensitive data like a password. The main idea of HTTPS is to create a secure channel over an insecure network. This ensures reasonable protection from eavesdroppers and man-in-the-middle attacks, provided that adequate cipher suites are used and that the server certificate is verified and trusted.

Use recaptcha or asirra to avoid automatic submission. That should stop the bots and script kiddies.
To stop hordes of humans from submitting spam (via Mechanical Turk or anything like that), log each attempt in memcached and, as soon as you reach a maximum number of submissions from the same IP in a given period of time, block that IP for a few minutes (or hours, days, whatever...).
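The per-IP throttling described above could look roughly like this in Python. This is only a sketch: a plain in-process dict stands in for memcached, and the class name, limits and time values are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class SubmissionLimiter:
    """Blocks an IP for a while once it exceeds a number of attempts per window."""

    def __init__(self, max_attempts=5, window_seconds=600.0, block_seconds=300.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps of recent attempts
        self._blocked_until = {}             # ip -> time at which the block ends

    def allow(self, ip, now=None):
        """Record one attempt and report whether it should be accepted."""
        now = time.monotonic() if now is None else now
        if self._blocked_until.get(ip, 0.0) > now:
            return False
        attempts = self._attempts[ip]
        # Forget attempts that have fallen out of the sliding window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        attempts.append(now)
        if len(attempts) > self.max_attempts:
            self._blocked_until[ip] = now + self.block_seconds
            attempts.clear()
            return False
        return True


limiter = SubmissionLimiter(max_attempts=3, window_seconds=60, block_seconds=120)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
```

With memcached, the deque would typically be replaced by a counter key with an expiry, which also lets several web servers share the same limits.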

You should use e-mail verification.
In addition to Koosha's answer:
If you allow usernames that include characters such as "#&?/" and create user pages like site.com/user?me&you/, it may cause serious problems in browsers. Think about how that looks in the browser's address bar.

I guess you should use a salt when hashing the passwords.
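A minimal Python sketch of salted password hashing, using only the standard library. PBKDF2-HMAC-SHA256 is my choice for the example, not something named in this thread, and the iteration count is an assumption that should be tuned for the actual hardware. Each user gets a fresh random 16-byte salt, so two users with the same password end up with different hashes.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The salt is stored next to the digest in the user record; it does not need to be secret, only different per user.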

Use Captcha so that a bot cannot automatically create multiple accounts

If the routes on your website are set in a particular way (ie, going by the username, rather than their id), then having a username like 'admin' could cause problems. You should probably have an exclude list of possible usernames.
This caused problems in the past with MySpace, and people having usernames like login, and then decorating their page with a phishing form.
Edit:
As has been mentioned in the comments by AviD and Peter Boughton, it is also a way of misleading users. Let's say that a user has the username 'admin'. Then, in their user information page (assuming that they each get one that is available to all, like SO), they have some link in their about section that says something like:
For more information, visit our dev blog at mysite.cn/loginpage
Someone maybe sees, 'mysite' in the url, but doesn't really look at the TLD, which would be China (sorry China!), rather than the .com TLD your site is hosted on. So they click through, assuming it's alright (they came from the admin user page after all), and this site looks identical to yours but has a login page. So you 're-enter' your details, but nothing happens. Or it redirects you elsewhere.
This is often the tactic of bank scammers who wish to target customers, inviting them to go to their website to 're-enter a banking password'.
This is just one more form of a type of attack known as 'Social Engineering'.
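An exclude list like the one suggested above can be very small. Here is a Python sketch; the entries in `RESERVED_USERNAMES` are examples I picked, and a real list would mirror the site's actual routes and staff-sounding names.

```python
# Example entries only; a real list should mirror the site's routes (assumption).
RESERVED_USERNAMES = frozenset(
    {"admin", "administrator", "login", "logout", "register", "root", "support"}
)


def is_reserved(username: str) -> bool:
    """Case-insensitive check against the exclude list."""
    return username.strip().lower() in RESERVED_USERNAMES


print(is_reserved("Admin"))
print(is_reserved("kariem"))
```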

Filter the user's data by removing '<' and '>', i.e. HTML tags. If someone can view a user's profile, XSS attacks through that data are possible.
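As an alternative to stripping characters, the value can be escaped when it is written into the page. A minimal Python sketch using the standard library's html.escape; the function name is made up for this example, and it only covers values placed in HTML text or quoted attributes, not rich HTML input.

```python
import html


def render_profile_field(value: str) -> str:
    """Escape a user-supplied value before placing it into an HTML page."""
    return html.escape(value, quote=True)


print(render_profile_field('<script>alert("x")</script>'))
```

This prints the markup with `<`, `>` and `"` replaced by `&lt;`, `&gt;` and `&quot;`, so the browser displays it as text instead of running it.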

Use HTTPS
Use Captcha.
Limit the allowed characters for usernames on the server side, for example to letters and numbers, dash (-) and dot (.).
PS. Client-side encryption is not a secure approach, but if you can't use HTTPS, client-side encryption is better than nothing.
Limiting characters is a simple way to protect your software from injections (SQL/XSS).
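A server-side whitelist like the one described above might look like this in Python. The length bounds are an assumption added for the example. Note that this complements prepared statements and output escaping; it does not replace them.

```python
import re

# Letters, digits, dash and dot only; the 3 to 30 length bound is an assumption.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9.-]{3,30}")


def is_valid_username(username: str) -> bool:
    """Accept only usernames made entirely of whitelisted characters."""
    return USERNAME_PATTERN.fullmatch(username) is not None


print(is_valid_username("john.doe-42"))
print(is_valid_username("me&you/"))
```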

Are there any published frameworks or standards for passwords and website membership?

I am currently working on a project in which we are creating a large public website for my organization. This site is going to allow our clients to register and log in to obtain sensitive personal information.
From experience I know some of the commonly used basics, like requiring a complex password and requiring an email address for a password reset.
Basically what I'm looking for is some sort of well documented recommendation or standards(like NIST or ISO) for these kinds of requirements.
I need to present this to a higher level director who is insisting on us:
not requiring the users to have an email address
asking us to allow the users to have our site display the password back to the user just by verifying a Name, Birthday and SSN
emailing the password in plain text as opposed to emailing a temp password and having them come to our site to reset the PW.
requiring we assign simple system generated username like first initial, first 3 characters of the last name with a 4 digit randomly generated number. (as opposed to the user picking any name they want)
If I can present some type of industry standard on why these are such risks it would really help.

Ok, let me answer the suggestions of your Pointy-Haired director (I understood you know he is just wrong, don't take it personally), I just can't resist:
not requiring the users to have an email address
Welcome to fake accounts.
asking us to allow the users to have our site display the password back to the user just by verifying a Name, Birthday and SSN
In my country and culture, privacy is a real concern so you'll never get my SSN and I won't register to any site asking this. BTW, if this is an information that can be found on the web (I've heard it's the case in the US), this doesn't seem really secure. Why not a security question to add some personal entropy?
emailing the password in plain text as opposed to emailing a temp password and having them come to our site to reset the PW.
LOL! First, how would you do this if you don't have the user's email address (and didn't verify it during registration)? Then, being able to send a password back means that you aren't going to store hashes of salted passwords. Bad idea. Is your director planning to store clear passwords (in the worst case) or to use symmetric encryption (in the best case)? In the latter case, I'd like to know where he's planning to store the symmetric encryption key. Maybe on a post-it note under his keyboard. Not sure it's worth mentioning that email is not secure.
requiring we assign simple system generated username like first initial, first 3 characters of the last name with a 4 digit randomly generated number (as opposed to the user picking any name they want)
Having a system suggesting available usernames is ok (especially when it's hard to find an available one) but I don't like it when they don't allow me to choose a username. Having said that, I don't consider forcing a username a major threat.
So, in other words, I really wouldn't trust a site with such practices and wouldn't give it any sensitive information. Actually, I wouldn't give any information at all (i.e. not register) but I'm not the average user.
I know this is not a direct answer to the question but, seriously, when will people with absolutely no clue about something start to let people with a better understanding do their work? This is so ridiculous.
Now, some suggestions to answer the question:
The Definitive Guide To Website Authentication (beta)
Best practices for web login / authentication?
The OWASP's Guide to Authentication
The Weak Password Recovery Validation from the WASC Threat Classification

OWASP is specifically designed to contain standards for security, although it has a lot of articles that are too specific for what you want. You might want to try their development guide or ask the same question on their forum.

The W3C has a security group with a load of bumf. It may contain something that you want. The WASC also has a lot of info and looks authoritative.

What are best practices for activation/registration/password-reset links in emails with nonce

Applications send out emails to verify user accounts or reset a password. I believe the following is the way it should be and I am asking for references and implementations.
If an application has to send out a link in an email to verify the user's address, according to my view, the link and the application's processing of the link should have the following characteristics:
The link contains a nonce in the request URI (http://host/path?nonce).
On following the link (GET), the user is presented a form, optionally with the nonce.
User confirms the input (POST).
The server receives the request and
checks input parameters,
performs the change,
and invalidates the nonce.
This should be correct per HTTP RFC on Safe and Idempotent Methods.
The problem is that this process involves one additional page or user action (item 3), which is considered superfluous (if not useless) by a lot of people. I had problems presenting this approach to peers and customers, so I am asking for input on this from a broader technical group. The only argument I had against skipping the POST step was a possible pre-loading of the link from the browser.
Are there references on this subject that might better explain the idea and convince even a non-technical person (best practices from journals, blogs, ...)?
Are there reference sites (preferably popular and with many users) that implement this approach?
If not, are there documented reasons or equivalent alternatives?
Thank you,
Kariem
Details spared
I have kept the main part short, but to reduce too much discussion around the details which I had intentionally left out, I will add a few assumptions:
The content of the email is not part of this discussion. The user knows that she has to click the link to perform the action. If the user does not react, nothing will happen, which is also known.
We do not have to indicate why we are mailing the user, nor the communication policy. We assume that the user expects to receive the email.
The nonce has an expiration timestamp and is directly associated with the recipients email address to reduce duplicates.
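The GET/POST flow and the expiring, address-bound nonce described above can be sketched in Python as follows. The `NonceStore` class and its method names are made up for this example, and an in-memory dict stands in for the real storage. `peek` is what a GET handler would call (it changes nothing), and `consume` is what the POST handler would call (it invalidates the nonce).

```python
import secrets
import time


class NonceStore:
    """One-time nonces bound to an email address, with an expiry time."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl_seconds = ttl_seconds
        self._pending = {}  # nonce -> (email, expires_at)

    def issue(self, email, now=None):
        """Create a nonce to embed in the emailed link."""
        now = time.time() if now is None else now
        nonce = secrets.token_urlsafe(32)
        self._pending[nonce] = (email, now + self.ttl_seconds)
        return nonce

    def peek(self, nonce, now=None):
        """GET: return the bound email if the nonce is valid; change nothing."""
        now = time.time() if now is None else now
        entry = self._pending.get(nonce)
        if entry is None or entry[1] < now:
            return None
        return entry[0]

    def consume(self, nonce, now=None):
        """POST: validate the nonce and invalidate it in the same step."""
        email = self.peek(nonce, now)
        if email is not None:
            del self._pending[nonce]
        return email


store = NonceStore(ttl_seconds=600)
nonce = store.issue("user@example.com")
print(store.peek(nonce))     # safe to repeat, e.g. if the link is pre-loaded
print(store.consume(nonce))  # performs the change once
print(store.consume(nonce))  # a second attempt finds nothing
```

Because `peek` has no side effects, a pre-loading browser or mail scanner that fetches the link does not use up the nonce.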
Notes
With OpenID and the like, normal web applications are relieved from implementing standard user account management (password, email ...), but still some customers want 'their own users'
Strangely enough I haven't found a satisfying question nor answer here yet. What I have found so far:
Answer by Don in HTTP POST with URL query parameters — good idea or not?
Question from Thomas -- When do you use POST and when do you use GET?

This question is very similar to Implementing secure, unique “single-use” activation URLs in ASP.NET (C#).
My answer there is close to your scheme, with a few issues pointed out - such as short period of validity, handling double signups, etc.
Your use of a cryptographic nonce is also important; many tend to skip over that, e.g. "let's just use a GUID"...
One new point that you do raise, and this is important here, is w.r.t. the idempotency of GET.
Whilst I agree with your general intent, it's clear that idempotency is in direct contradiction to one-time links, which are a necessity in some situations such as this.
I would have liked to posit that this doesn't really violate the idempotency of the GET, but unfortunately it does... On the other hand, the RFC says GET SHOULD be idempotent; it's not a MUST. So I would say forgo it in this case, and stick to the one-time auto-invalidated links.
If you really want to aim for strict RFC compliance, and not get into non-idempotent GETs, you can have the GET page auto-submit the POST. It's kind of a loophole around that bit of the RFC, but legit, and you don't require the user to double opt in, and you're not bugging him...
You don't really have to worry about preloading (are you talking about CSRF, or browser optimizers?)... CSRF is useless because of the nonce, and optimizers usually won't process the JavaScript (used to auto-submit) on the preloaded page.

About password reset:
The practice of doing this by sending an email to the user's registered email address is, while very common, not good security. Doing this fully outsources your application security to the user's email provider. It does not matter how long the passwords you require are or how clever your password hashing is. I will be able to get into your site by reading the email sent out to the user, given that I have access to the email account or am able to read the unencrypted email anywhere on its way to the user (think: evil sysadmins).
This might or might not be important depending on the security requirements of the site in question, but I, as a user of the site, would at least want to be able to disable such a password reset function since I consider it unsafe.
I found this white paper that discusses the topic.
The short version of how to do it in a secure way:
Require hard facts about the account
username.
email address.
10 digit account number or other information like social security number.
Require that the user answers at least three predefined questions (predefined by you, don't let the user create his own questions) that can not be trivial. Like "What's your favorite vacation spot", not "What's your favorite color".
Optionally: Send a confirmation code to a predefined email address or cell number (SMS) that the user has to input.
Allow the user to input a new password.

I generally agree with you with some modification suggested below.
User registers at your site providing an email.
Verification email is sent to the users account with two links:
a) One link with the GUID to verify the registration
b) One link with the GUID to reject the verification
When they visit the verification url from their email they are automatically verified and the verification guid is marked as such in your system.
When they visit the rejection url from their email they are automatically removed from the queue of possible verifications but, more importantly, you can tell the user that you are sorry for the email registration and give them further options such as removing their email from your system. This will stop any customer-service-type complaints about "someone entering my email in your system"...blah blah blah.
Yes, you should assume that when they click the verification link that they are verified. Making them click a second button in a page is a bit much and only needed for double opt in style registration where you plan to spam the person that registered. Standard registration/verification schemes don't usually require this.
