Hiding sensitive/confidential information in log files

How would you go about hiding sensitive information from going into log files? Yes, you can consciously choose not to log sensitive bits of information in the first place, but there can be general cases where you blindly log error messages upon failures or trace messages while investigating a problem etc. and end up with sensitive information landing in your log files.
For example, you could be trying to insert an order record that contains the credit card number of a customer into the database. Upon a database failure, you may want to log the SQL statement that was just executed. You would then end up with the credit card number of the customer in a log file.
Is there a design paradigm that can be employed to "tag" certain bits of information as sensitive so that a generic logging pipeline can filter them out?

My current practice for the case in question is to log a hash of such sensitive information. This enables us to identify log records that belong to a specific claim (for example a specific credit-card number) but does not give anybody the power to just grab the logs and use the sensitive information for their evil purposes.
Of course, doing this consistently involves good coding practices. I usually choose to log all objects using their toString overrides (in Java or .NET), which serialize a hash of the value for any field marked with a Sensitive attribute.
Of course, SQL strings are more problematic, but we rely on our ORM for data persistence and log the state of the system at various stages rather than logging SQL queries, so it becomes a non-issue.
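To make the idea concrete, here is a minimal sketch of the "hash the marked fields in toString" approach, written in Python rather than Java/.NET. Everything named here (LOG_HASH_KEY, sensitive(), LogSafe, Order) is hypothetical and only for illustration. It also uses a keyed hash (HMAC-SHA256) instead of a bare hash, which is my own assumption and not something the answer above specifies; a bare hash of a low-entropy value such as a card number can be brute-forced.

```python
import hashlib
import hmac
from dataclasses import dataclass, field, fields

# Hypothetical key; it must be stored somewhere the log readers cannot see.
LOG_HASH_KEY = b"replace-with-a-secret-kept-out-of-the-logs"


def sensitive():
    """Mark a dataclass field as sensitive (hypothetical helper)."""
    return field(metadata={"sensitive": True})


def log_hash(value) -> str:
    """Keyed hash of a value, shortened so log lines stay readable."""
    digest = hmac.new(LOG_HASH_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class LogSafe:
    """Mixin whose text form shows a hash for every field marked sensitive."""

    def __repr__(self) -> str:
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            if f.metadata.get("sensitive"):
                parts.append(f"{f.name}=<hash:{log_hash(value)}>")
            else:
                parts.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


@dataclass(repr=False)  # keep the inherited __repr__ instead of the generated one
class Order(LogSafe):
    order_id: int
    card_number: str = sensitive()


order = Order(order_id=7, card_number="4111111111111111")
print(order)  # the card number itself never appears in the output
```

The same card number always produces the same hash, so you can still correlate log records for one card without being able to read the number back out of the log.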

I would personally regard the log files themselves as sensitive information and make sure to restrict access to them.

Logging a credit card number could be a PCI violation. And if you aren't PCI compliant, you will be charged higher card-processing fees. Either don't log sensitive information, or encrypt your entire log files.
Your idea of "tagging" sensitive information is intriguing. You could have a special data type for Sensitive information, that wrapped the real, underlying data type. Whenever this object is rendered as a character string, it just returns "***" or whatever.
However, this could require widespread coding changes, and it requires a level of conscious vigilance similar to that needed to avoid logging sensitive information in the first place.
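As a rough illustration of the wrapper type described above, here is a small Python sketch. The class name Sensitive and the reveal() method are made up for this example; the point is only that every way of turning the object into text yields "***".

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Sensitive(Generic[T]):
    """Wraps a value so that rendering it as text never reveals it."""

    def __init__(self, value: T) -> None:
        self._value = value

    def reveal(self) -> T:
        """Explicit access to the real value, easy to search for in code review."""
        return self._value

    def __str__(self) -> str:
        return "***"

    # repr() is what many loggers and debuggers end up calling.
    __repr__ = __str__

    def __format__(self, spec: str) -> str:
        # Covers f-strings and str.format(), whatever format spec is given.
        return "***"


card = Sensitive("4111111111111111")
print(f"charging card {card}")   # the number is masked
print(len(card.reveal()))        # real value only via an explicit call
```

Code that genuinely needs the value (for example, the call to the payment gateway) has to call reveal(), which at least makes every such use visible.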

In your example, you should be encrypting the credit card number or, better yet, not even storing it in the first place.
If, say, you were logging something else, like a login, you might want to explicitly replace a password with *****.
However, this manages to neatly avoid answering the question you've posed in the first place. In general, when dealing with sensitive information, it should be encrypted on its way to any form of permanent storage, be it a database file or a log file. Assume that a Bad Guy is going to be able to get their hands on either, and protect the information accordingly.

If you know what you're trying to filter, you can run your log output through a regex that scrubs it before it is written to the log.
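A sketch of what that could look like with Python's standard logging module. The pattern is an assumption on my part (13 to 16 digits, optionally separated by spaces or dashes, treated as a card number) and will need tuning for whatever you are actually trying to filter; ScrubFilter and scrub are hypothetical names.

```python
import logging
import re

# Assumption: 13-16 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(text: str) -> str:
    """Replace anything that looks like a card number."""
    return CARD_PATTERN.sub("[REDACTED]", text)


class ScrubFilter(logging.Filter):
    """Scrubs the fully formatted message before a handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True


handler = logging.StreamHandler()
handler.addFilter(ScrubFilter())
logger = logging.getLogger("orders")
logger.addHandler(handler)

logger.error("insert failed: %s", "INSERT INTO orders VALUES ('4111-1111-1111-1111')")
```

Attaching the filter to the handler rather than to individual log calls means blindly logged error messages go through it too, which is the case the question is worried about.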

Regarding SQL statements specifically, if your language supports it, you should be using parameters instead of putting values in the statement itself. In other words:
select * from customers where credit_card = ?
Then set the parameter to the credit card number.
Of course, if you plan to log SQL statements with parameters filled in, you'd need some other way to filter out sensitive data.
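For illustration, here is the same idea using Python's built-in sqlite3 module (the table, the data, and find_customer are made up for the example). Because the value is bound as a parameter, logging the statement text on failure does not put the card number in the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, credit_card TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?)", ("Alice", "4111111111111111"))

QUERY = "SELECT name FROM customers WHERE credit_card = ?"


def find_customer(card_number: str):
    try:
        # The value travels separately from the SQL text.
        return conn.execute(QUERY, (card_number,)).fetchall()
    except sqlite3.Error:
        # Only the statement text is logged; the bound value is not part of it.
        print("query failed:", QUERY)
        raise


print(find_customer("4111111111111111"))
```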

Refer to this tool, created exactly for this use case.
If you want to mask only selected fields during logging and keep the other field values as they are, you can try this.
https://github.com/senthilaru/sp-util
<dependency>
<groupId>com.immibytes</groupId>
<artifactId>sp-utils</artifactId>
<version>1.0.0-RELEASE</version>
</dependency>

Related

SQL Injection assistance required

I am trying to understand the below injection and what it is trying to do. What is it trying to get? The only portion I understand is the union and select part, but the full injection I am unsure of and need help understanding.
action=&aid=1&_FILES%5Btype%5D%5Btmp_name%5D=%5C%27%20or%20mid=#%60%5C%27%60%20/!50000union//!50000select/1,2,3,(select%20CONCAT(0x7c,userid,0x7c,pwd)+from+%60%23#__admin%60%20limit+0,1),5,6,7,8,9%23#%60%5C%27%60+&_FILES%5Btype%5D%5Bname%5D=1.jpg&_FILES%5Btype%5D%5Btype%5D=application/octet-stream&_FILES%5Btype%5D%5Bsize%5D=4294

Well, first we can url decode the string:
action=
&aid=1
&_FILES[type][tmp_name]=\' or mid=#`\'`/!50000union//!50000select/1,2,3,(select CONCAT(0x7c,userid,0x7c,pwd) from `##__admin` limit 0,1),5,6,7,8,9##`\'`
&_FILES[type][name]=1.jpg
&_FILES[type][type]=application/octet-stream
&_FILES[type][size]=4294
One of these parameters sticks out as pretty suspicious.
[tmp_name]=\' OR mid=#`\'`
/!50000union/
/!50000select/1,2,3,
(select CONCAT(0x7c,userid,0x7c,pwd)
from `##__admin`
limit 0,1)
,5,6,7,8,9##`\'`
In plain English, it's injecting a select query to get usernames and passwords in a format like |<user>|<password> (0x7c is the hex code for the | character) from the ##__admin table (which, according to @DCoder, is likely a placeholder for the actual table where these values would be kept) and appending it to your original select.
The !50000 stuff is for bypassing your web application firewall (if you have one). If you don't, then it may just be a bot or automated attempt. Or someone following a script to see what works. The numbers aren't really useful - it may be for evading a firewall or just for debugging purposes for the attacker to see how the output looks. It's hard to tell without being able to run it.
Here's what the SQL the attacker is trying to run would look like in 'plain SQL':
select
userid,
pwd
from
`##__admin`
Do you have a table like this? When you go to this url for your site, does it dump the user table? If not, then you may not even have a problem and it is just an automated scan. You may still have issues with SQL injection, even if it doesn't work, but having this in your logs is not evidence of a breach... it's definitely a red flag though.
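If you want to repeat the URL-decoding step from the start of this answer on other log entries, a few lines of Python will do it. This is only a sketch; the sample string is a shortened, harmless subset of the parameters from the question, and decode_params is a name I made up.

```python
from urllib.parse import parse_qsl


def decode_params(raw_query: str):
    """Split a query string into (name, value) pairs and percent-decode them."""
    # keep_blank_values=True so empty parameters such as "action=" are kept.
    return parse_qsl(raw_query, keep_blank_values=True)


sample = "action=&aid=1&_FILES%5Btype%5D%5Bname%5D=1.jpg&_FILES%5Btype%5D%5Bsize%5D=4294"
for name, value in decode_params(sample):
    print(f"{name} = {value}")
```

Note that parse_qsl also turns "+" into a space, which is what you want for reading a payload like the one in the question.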

It's adding extra columns to the result recordset, with user/pwd information. So in essence, the user wants to collect user accounts he or she wants to abuse.

It has to be noted that the SQL injection (if any) is made possible by another vulnerability.
It is clear that this application depends (or at least the malicious user believes it does) on some sort of homebrewed implementation of register_globals. The worst implementation ever.
To make this attack work, the application has to take GET variables and blindly convert them into global variables, so that the $_FILES array appears to come not from PHP's internal upload handling but from a mere GET request.

Visible User ID in Address Bar

Currently, to pass a user id to the server on certain views I use the raw user id.
http://example.com/page/12345 //12345 Being the users id
Although there is no real security risk in my specific application in exposing this data, I can't help feeling a little dirty about it. What is the proper solution? Should I somehow be disguising the data?
Maybe a better way to put my question is to ask what the standard approach is. Is it common for applications to show user IDs in plain view if it's not a security risk? If it is a security risk, how is it handled? I'm just looking for a pointer in the right direction here.

There's nothing inherently wrong with that. Lots of sites do it. For instance, Stack Overflow users can be enumerated using URLs of the form:
http://stackoverflow.com/users/123456
Using a normalized form of the user's name in the URL, either in conjunction with the ID or as an alternative to it, may be a nicer solution, though, e.g:
http://example.com/user/yourusername
http://example.com/user/12345/yourusername
If you go with the former, you'll need to ensure that the normalized username is set up as a unique key in your user database.
If you go with the latter, you've got a choice: if the normalized username in the database doesn't match the one in the URL, you can either redirect to the correct URL (like Stack Overflow does), or return a 404 error.

In addition to duskwuff's great suggestion to use the username instead of the ID itself, you could use UUIDs instead of integers. They are 128-bit in length so infeasible to enumerate, and also avoid disclosing exactly how many users you have. As an added benefit, your site is future proofed against user id limits if it becomes massively popular.
For example, with integer ids, an attacker could find out the largest user_id on day one, then come back in a week's or a month's time and find out what the largest user_id is now. They can keep doing this to monitor the rate of growth of your site - perhaps not a biggie for your example - but many organisations consider this sort of information commercially sensitive. It also helps to avoid social engineering, e.g. it makes it significantly harder for an attacker to email you asking to reset their password "because I've changed email providers and I've forgotten my old password but I remember my user id!". Give an attacker an inch and they'll take a mile.
I prefer to use Version/Type 4 (random) UUIDs; however, you could also use Version/Type 5 (SHA-1-based), so you could go UUID.fromName(12345) and get a UUID derived from the integer value, which is useful if you want to migrate existing data and need to update a bunch of foreign key values. Most major languages support UUIDs natively or through popular libraries (C and C++), although some database software might require some tweaking - I've used them with Postgres and MySQL, and both were easy transitions.
The downside is UUIDs are significantly longer and not memorable, but it doesn't sound like you need the ability for the user to type in the URLs manually. You do also need to check if the UUID already exists when creating a user, and if it does, just keep generating until an unused UUID is found - in practice given the size of the numbers, using Version 4 Random UUIDs you will have a better chance at winning the lottery than dealing with a collision, so it's not something that will impact performance etc.
Example URL: http://example.com/page/4586A0F1-2BAD-445F-BFC6-D5667B5A93A9
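As a quick sketch of both variants in Python's standard uuid module (USER_NAMESPACE and the function names are hypothetical, chosen just for this example):

```python
import uuid

# Hypothetical namespace for this site; any fixed UUID works as long as it never changes.
USER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def random_user_id() -> uuid.UUID:
    """Version 4: random, for newly created users."""
    return uuid.uuid4()


def migrated_user_id(legacy_id: int) -> uuid.UUID:
    """Version 5: derived from the old integer id, so it is repeatable."""
    return uuid.uuid5(USER_NAMESPACE, str(legacy_id))


print(random_user_id())
print(migrated_user_id(12345))  # same input always gives the same UUID
```

One caveat with the Version 5 route: the result is deterministic, so anyone who learns the namespace can compute the UUID for any integer id. It makes migration easy, but it only hides the sequence as long as the namespace stays secret.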

Are MongoDB ids guessable?

If you bind an api call to the object's id, could one simply brute force this api to get all objects? If you think of MySQL, this would be totally possible with incremental integer ids. But what about MongoDB? Are the ids guessable? For example, if you know one id, is it easy to guess other (next, previous) ids?
Thanks!

Update Jan 2019: As mentioned in the comments, the information below is true up until version 3.2. Version 3.4+ changed the spec so that machine ID and process ID were merged into a single random 5 byte value instead. That might make it harder to figure out where a document came from, but it also simplifies the generation and reduces the likelihood of collisions.
Original Answer:
+1 for Sergio's answer, in terms of answering whether they could be guessed or not, they are not hashes, they are predictable, so they can be "brute forced" given enough time. The likelihood depends on how the ObjectIDs were generated and how you go about guessing. To explain, first, read the spec here:
Object ID Spec
Let us then break it down piece by piece:
TimeStamp - completely predictable as long as you have a general idea of when the data was generated
Machine - this is an MD5 hash of one of several options, some of which are more easily determined than others, but highly dependent on the environment
PID - again, not a huge number of values here, and could be sleuthed for data generated from a known source
Increment - if this is a random number rather than an increment (both are allowed), then it is less predictable
To expand a bit on the sources. ObjectIDs can be generated by:
MongoDB itself (but can be migrated, moved, updated)
The driver (on any machine that inserts or updates data)
Your Application (you can manually insert your own ObjectID if you wish)
So, there are things you can do to make them harder to guess individually, but without a lot of forethought and safeguards, for a normal data set, the ranges of valid ObjectIDs should be fairly easy to work out since they are all prefixed with a timestamp (unless you are manipulating this in some way).
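To see how much structure there is to work with, here is a small sketch that pulls a legacy (pre-3.4) ObjectID apart without any MongoDB driver. The field widths (4-byte timestamp, 3-byte machine, 2-byte PID, 3-byte counter) are those of the legacy layout described above; the function names and the sample id are just for illustration.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Break a 24-character hex ObjectID into its legacy components."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectID is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": raw[7:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


def next_in_sequence(hex_id: str) -> str:
    """Guess the id the same process would hand out next within the same second."""
    raw = bytes.fromhex(hex_id)
    counter = (int.from_bytes(raw[9:12], "big") + 1) % (1 << 24)
    return (raw[:9] + counter.to_bytes(3, "big")).hex()


sample = "507f1f77bcf86cd799439011"
print(split_object_id(sample))
print(next_in_sequence(sample))
```

With one known id, an attacker only has to vary the timestamp over a plausible window and the counter over nearby values, which is a far smaller search space than 96 random bits.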

Mongo's ObjectIds were never meant to be a protection against brute-force attacks (or any attack, for that matter). They simply offer global uniqueness. You should not assume that an object can't be accessed by a user just because that user should not know its id.
For actual protection of your resources, employ other techniques.
If you want to defend against unauthorized access, put some authorization logic in your app (allow access to legitimate users, deny it for everyone else).
If you want to hinder dumping all objects, use some kind of rate limiting. Combine with authorization if applicable.
Optional reading: Eric Lippert on GUIDs.

How to implement/use a secure 'read-once' local file access system?

Does anybody know of a secure 'read-once' local file access system? Or how one might create one? I realise that if data is to be used on a system, then it must be capable of being read, but I think it may be possible to severely limit how data is made available and reduce the possibility of it being copied and used elsewhere.
These are my requirements:
I want to store a 'secure/encrypted' data-file on a USB stick (could be read-only CD/DVD, but better if read/write USB or even a floppy) and have this file capable of being read once (and mainly only once), on a decoded block-by-block basis, once a password has been entered. The file content is probably basic text/xml (or text-encoded data) and is to be read mainly as a sequential stream. The data (ideally) can be read by normal Windows file-access methods, i.e. a std file, FSO objects (stream and text file), all BASIC PC (VB6/VB.NET) file handling methods, even Excel text (import). Yes, I know this probably defeats the object (as such a file can then be opened/saved), but I would still want this possibility. Finally, once the 'access' criteria had been met, the device would prevent further access.
Access to the data would be on a local PC system only. No LAN, no device sharing supported. Data on the device should not be copyable by normal means. Data would be written to the device using normal methods if possible or a special application if necessary.
To keep things simple, just one password, one file, one use, and one user would be great, but other possible enhancements include: (as icing on the cake)...
allowing 'n' opens
having multiple passwords for 2 or more users, acting individually
silo-passwords, having 2 or more users sign together to get access (or even having at least n from m users sign together to get access)
Password prompt should be given on first block-access, independent of the application calling the first block
Password could be embedded/automatic
tie the access to a nominated machine/mac/ip/disk serial number (or other machine-code)
tie the access to a nominated program/application
if possible, delete and securely overwrite the data file
My first guess at doing this suggests that it would need a 'pseudo-device' driver that would appear as an extension to (or replacement of) the std removable-device driver. The driver would handle each file block, sector by sector, and refuse to serve further decoded blocks if not authorised. The device should not give normal directory listings, but some form of content summary may be given to a user (optional).
Unlike a DRM system, I don't want any form of on-line access/authentication (but would consider it); I would prefer a self-contained system.
I have looked long and hard for such a device/system, and haven't found one yet. Most devices and system tools (e.g. Iomega/IronKey) appear to unlock access to files, but without limit, i.e. read-many, once unlocked.
Performance is not an issue. Slow floppy read-rate would be okay. I am agnostic about the encryption method; anything reasonably strong, 40-bit+ (128-bit), would be fine. I can't tell you what the data is or what it's for, I just need a way to give data to somebody and limit, as far as possible, its use and what they can do with it. It's a real requirement to protect confidential data and not meant for DRM or MP3s/videos or similar.
I am an 'office' developer and not really familiar with device-drivers or DRM - Now where would I start with such a project? Is there anything out-there available to joe-public already?
Thanks - Tim.
PS: Update
I should point out that I just wish to pass data between ourselves and a single specific nominated service-provider. I don't want them to copy the data we provide. It will be used once to support a 'singular' one-off process and then be done with. As the data is 'streamed/read' it should be 'consumed'. If the process fails, we will re-issue the data to the service-provider. The data remains our property; it is not being sold/licensed.
I do realise that no solution will be foolproof, but the risk/reward ratio should dissuade casual attempts to break the system. The data has no explicit commercial value.
PPS: It's a real requirement... What would you do?
Judging by the upvotes on @erikson's thoughtful answer, you guys are saying 'not possible / don't bother' - but apart from personally supervising that the data is used according to our wishes, what would you do?

Executive summary: this isn't a realistic solution. Re-think the process so that "read-once" isn't necessary.
A few companies (Disappearing Inc. comes to mind, and they had at least one competitor) tried to make "self-destructing" email on general-purpose hardware in the late 90s. They spent millions of dot.com dollars to develop systems that didn't really work.
The only potential solution I know of is the use of a Trusted Platform Module. These are fairly common, as they are required in all computers bought by the US government. However, their capabilities vary. You'd need one that supported something called remote attestation, which allows software to perform integrity checks on itself. With this capability, you could write software that would enforce your data destruction policy. However, I don't think this feature is widely used. My laptop has a TPM, but it doesn't support this.
You should also be aware that there is a lot of hostility against "trusted computing," because it can be used to limit the functionality of a machine. This violates the right to do as you please with your property. TPMs might make sense for corporate or government machines, but not for personal computers.
Other aspects of your problem, such as granting multiple users access to the data or requiring several users to act together to gain access, are easier.
Encrypting data for multiple users is typically achieved by generating a key, encrypting the data with that "content encryption key", then encrypting the key (which is relatively small) with a "key encryption key" (which could be a password) belonging to each intended recipient.
Requiring some number of users to enter a password can be done securely with Shamir Secret Sharing, as I learned here on SO.
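For the "n from m" requirement, here is a minimal sketch of Shamir Secret Sharing over a prime field, using only the Python standard library. It is meant to show the mechanics, not to be production cryptography: the prime, the function names, and the idea of treating the content encryption key as an integer are all assumptions made for this example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points) -> int:
    """Lagrange interpolation at x = 0 gives back the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, shares=5)
print(recover_secret(shares[:3]))  # any three shares are enough
```

In practice each share would be protected by its holder's password, and the recovered number would be the content encryption key described above.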
Based on the comments on the question, especially the "mailing label printing service" analogy, I'm afraid my initial answer isn't really relevant.
In a case like that, I can only see a legal solution. Disallow storage of your data in the contract. If it's worth suing them for violating the contract, do so.
Cryptographically speaking, the best thing I could think of would be to "watermark" such a "mailing list" with information that would help me prove that a copy of the list was disclosed by a particular vendor. Knowing that a watermark exists might deter any deliberate disclosures, and could help leverage a fast settlement in the case of accidental disclosure. This could use steganographic techniques within records as well as fake records in the collection.
Algorithms for doing this might already exist, but I'm not familiar with the field. Researching "digital watermarks" might be useful. Even if it only turns up algorithms for protected video and audio, perhaps these could be adapted to work with other media.

There are several problems with your approach.
If you can read the data from any application, you can save the data anywhere. I would think this would defeat the purpose of any 'only-one-access' policy.
To get a device driver to handle your scenario, you would need deep knowledge of file-system-programming, which at least under windows is no easy undertaking. Even then, it would be hard to enforce the one time access prerequisite.
Programs have different file-access strategies, which might break your assumptions. E.g. an application may open a file once to get its size, then close and reopen it, to load its data. How should this be enforced? Do you want to limit 'OpenFile' calls? do you want to limit 'read byte' calls? Do you want to limit ... jumping around in the file?
When your medium gets copied, by whatever means, you have no way of knowing that. The games industry has tried for years to bind games to the original CD, and has failed miserably.
I think what would be feasible is a container format with an encoder/decoder, or something like that. (See BitLocker in Windows 7.) That would guarantee that you can only decode the data once to a local disc, and it would then delete the container on your medium (beware: check first whether the medium is writable, and bind the container to a serial number or the name of the medium so that the container cannot be copied).
Another possibility would be a separate USB device, which you can only use once to extract the data from it. Then you would only need to write a driver once in user mode with WinUSB. Encrypted USB sticks use this approach.
But I really think this is a bad idea, because it is very easy to get around any countermeasure when the receiving person can read all the data from the medium and save it anywhere else.

What should I be doing to secure my web form UI?

I have a mostly desktop programming background. In my spare time I dabble in web development, page faulting my way from problem to solution with some success. I have reached the point where I need to allow site logins and collect some data from a community of users (that is the plan anyway).
So, I realize there is a whole world of nefarious users out there who are waiting eagerly for an unsecured site to decorate, vandalize and compromise. If I am really lucky, a couple of those users might make their way to my site. I would like to be reasonably prepared for them.
I have a UI to collect information from a logged in user and some of that information is rendered into HTML pages. My site is implemented with PHP/MySQL on the back end and some javascript stuff on the front. I am interested in any suggestions/advice on how I should tackle any of the following issues:
Cross Site Scripting : I am hoping this will be reasonably simple for me since I am not supporting marked down input, just plain text. Should I be just scanning for [A-Za-z ]* and throwing everything else out? I am pretty ignorant about the types of attacks that can be used here, so I would love to hear your advice.
SQL injection : I am using parameterized queries (mysqli) here, so I am hoping I am OK in this department. Is there any extra validation I should be doing on the user entered data to protect myself?
Trollish behaviour : I am supporting polylines drawn by the user on a Google Map, so (again if I am lucky enough to get some traffic) I expect to see a few hand drawn phalluses scrawled across western Europe. I am planning to implement some user driven moderation (flagging as inappropriate, SO style), but I would be interested in any other suggestions for discouraging this kind of behavior.
Logins : My current login system is a pretty simple web form, MySQL query in PHP, md5 encoded password validation, and a stored session cookie. I hope the system is simple enough to be secure, but I wonder if there are vulnerabilities here I am not aware of?
I hope I haven't been too verbose here and look forward to hearing your comments.

Your first problem is that you are concerned with your UI. A simple rule to follow is that you should never assume the submitted data is coming from a UI that you created. Don't trust the data coming in, and sanitize the data going out. Use PHP's strip_tags and/or htmlentities.
Certain characters (<,>,",') can screw up your HTML and permit injection, but should be allowed. Especially in passwords. Use htmlentities to permit the use of these characters. Just think about what would happen if certain characters were output without being "escaped".
Javascript based checks and validation should only be used to improve the user experience (i.e. prevent a page reload). Do not use eval except as an absolute last resort.
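To show what "escaping on output" actually does to those characters, here is a tiny sketch. It uses Python's html.escape as a stand-in for PHP's htmlentities/htmlspecialchars (the question is about PHP; Python is used here only for illustration), and render_comment is a made-up name.

```python
import html


def render_comment(user_text: str) -> str:
    """Escape user text at output time, just before it goes into the page."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


print(render_comment("<script>alert('x')</script>"))
print(render_comment('He said "hi" & left'))
```

The stored data is left untouched, so characters like < or ' remain perfectly usable in passwords and free text; they are only neutralised at the moment they are written into HTML.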

Cross Site Scripting can be easily taken care of with htmlentities. There is also a function called strip_tags, which removes the tags from the post, and you'll note that it allows you to whitelist certain tags. If you do decide to allow specific tags through in the future, keep in mind that the attributes on these tags are not cleaned in any way; this can be used to insert javascript into the page (onClick etc.) and is not really recommended. If you want to have formatting in the future, I'd recommend implementing a formatting language (like [b] for bold or something similar) to stop your users from just entering straight html into the page.
SQL Injection is also easily taken care of as you can prepare statements and then pass through the user data as arguments to the prepared statement. This will stop any user input from modifying the sql statement.
CSRF (Cross-Site Request Forgery) is an often overlooked vulnerability that allows an attacker to submit data from a victim's account using the form. For a GET form, this is usually done by specifying your form's GET string as an img src (the image loads for the victim, the GET request fires and the form is processed, but the user is unaware). If you use POST, the attacker can use javascript to auto-submit a hidden form to do the same thing. To solve this you need to generate a key for each form, keeping one copy in the session and one in the form itself (as a hidden input). When the form is submitted, you compare the key from the input with the key in the session and only continue if they match.
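The per-form key check can be sketched in a few lines. This is framework-neutral Python rather than PHP, with a plain dict standing in for the server-side session; the function names and the "csrf_token" key are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a token, remember it in the session, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token  # embed this in the form as a hidden input


def is_valid_csrf(session: dict, submitted: str) -> bool:
    """Accept the submission only if the form's token matches the session's."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


session = {}
token = issue_csrf_token(session)
print(is_valid_csrf(session, token))     # the genuine form
print(is_valid_csrf(session, "forged"))  # a request built by an attacker
```

An attacker's page can make the victim's browser send the request, but it cannot read the token out of your form, so the forged request fails the comparison.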
Some security companies also recommend that you use the attribute 'autocomplete="off"' on login forms so the password isn't saved.

Against XSS, htmlspecialchars is quite enough; use it to escape the output.
SQL injection: if MySQL parses your query before it adds the parameters, then AFAIK it's not possible to inject anything malicious.

I would look into something else besides only allowing [A-Za-z]* into your page. Just because you have no intention of allowing any formatting markup now doesn't mean you won't have a need for it down the line. Personally I hate rewriting things that I didn't design to adapt to future needs.
You might want to put together a whitelist of accepted tags, and add/remove from that as necessary, or look into encoding any markup submitted into plain text.
