I need to delete my input file securely once I have finished with it. At the moment I'm overwriting all the data with zeros, but this is messy: my temp folder fills up with old files, and the names of the files are themselves a security issue.
Rather than just moving the files to the recycle bin, I would like them to skip it and simply disappear, in conjunction with being wiped byte by byte, since data recovery software can recover items even beyond the recycle bin. As the names are also sensitive, I need to rename the files before I delete them.
This is a progressive problem. What is "secure" for one application is insecure for another. If security is really important and you find yourself asking these kinds of questions on Stack Overflow, then you most likely need to contract with an external security consultant. Examples of "really important" include financial information, medical records, or anything else where there is a law or contract requiring the securing of the data. I don't say this to be mean or to imply that you are incapable of solving the problem, but to point out that this is a rather complex and evolving problem.
Basically, to accomplish what you want to accomplish:
Once the code you wrote finishes with the file, change the file size to zero - this makes recovery more difficult because the original file size is lost.
Then rename the file (RenameFile) to a different name.
Finally, delete the file using DeleteFile, which does not move the file to the recycle bin.
Make sure you maintain an exclusive handle on the files the whole time they are on the disk too, or they can just be copied before they are deleted.
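Here is a minimal Python sketch of that sequence, just to make the steps concrete. The helper name, the chunked zero-fill, and the placeholder rename target are my own choices, and none of this addresses journaling file systems or SSD wear leveling; os.rename/os.remove correspond to RenameFile/DeleteFile on Windows:

```python
import os

def wipe_and_delete(path):
    # Hypothetical helper: overwrite, truncate, rename, then delete.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:              # keep the file open while wiping
        remaining = size
        while remaining > 0:
            chunk = min(remaining, 64 * 1024)
            f.write(b"\x00" * chunk)          # overwrite the original contents
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())                  # force the overwrite to disk
        f.truncate(0)                         # step 1: change the file size to empty
    anonymous = os.path.join(os.path.dirname(path) or ".", "wiped.tmp")
    os.rename(path, anonymous)                # step 2: lose the original name
    os.remove(anonymous)                      # step 3: delete, bypassing the recycle bin
```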
As I said, this is a progressive problem. This is a really basic solution, and it is subject to a number of vulnerabilities. So depending on the level of security needed, you might consider never letting the file be written to disk at all, or using multiple overwrite passes. If security is really important, then actually burning the hard drive platter at a high temperature, and then smashing it, is the only way to be sure.
Edit: It appears you removed your code sample.
There are third-party utilities to do this kind of thing from the command line - I found that PGP Command Line has this feature, and if you search around you can probably find a free app that will do it from the command line. You could then just call the command from your app in order to securely delete the file.
I would say that if you are insistent upon writing your own code to do this, then instead of using all zeros, write random bytes to the disk. And don't use the built-in C++ rand function; use a more secure random number generator.
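For example, in Python the operating system's cryptographic random source is exposed as os.urandom, which could replace the zero fill in the earlier sketch (the helper name is again hypothetical):

```python
import os

def overwrite_with_random(path):
    # Same idea as before, but fill with cryptographically strong random bytes.
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            chunk = min(remaining, 64 * 1024)
            f.write(os.urandom(chunk))    # os.urandom, not a plain rand()
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())
```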
As Jim McKeeth said, this is not something you want to do yourself if there are serious legal repercussions for getting it wrong.
Jim has described well the issues with solving your problem in code. The problem is indeed progressive, and any solution you implement will only approximate complete security without ever attaining it. So one thing to do is to decide exactly what you need to protect the file against (snooping family members? co-workers? corporate espionage? totalitarian governments?), then design your solution accordingly and document its limitations.
I have a sort of orthogonal suggestion, though. Instead of - or in addition to - implementing secure wiping in code, you can require cooperation from users. For example, you can suggest (or require) that input files be stored on an encrypted volume. In corporate environments PGP Disk might be preferred, since it's a recognizable brand, while home users would be well served by the free and well-tested TrueCrypt. Both products support creating virtual encrypted volumes as well as encrypting whole partitions. This would go a long way toward keeping the names and contents of input files secure, even before you write a single line of code.
Deleting a file can be a touchy subject...
Depending on your customer's needs, I would like to point to the data remanence phenomenon: residual data left behind after a simple overwrite. Data erasure is a method of destroying that residual data.
There are a few standards on how to erase residual data; DoD 5220.22-M is the one most often referred to by "secure file delete" applications, but apparently the rules have changed:
As of the June 2007 edition of the DSS C&SM, overwriting is no longer acceptable for sanitization of magnetic media; only degaussing or physical destruction is acceptable.
So what I'm saying is, try to get the rules which your customer has to follow.
Beware of "wear leveling" algorithms used with flash storage. To promote even wear, files are moved around on the drive, and it's invisible to your app, and even the operating system. So you can "secure delete" the file all you want, and you will only affect the most recent copy of the file. But prior copies are recoverable/discoverable with recovery software. So the only way to solve that, is to encrypt the file contents.
I am making an application that will save information about certain files, and I was wondering what the best way to keep track of the files is. I was thinking of using the absolute path of a file, but that could change if the file is renamed. I found that if you run ls -i, each file has an id beside it that is unique(?). Is that OK to use as a unique file id?
The inode is unique per device, but I would not recommend using it: imagine your box crashes and you move all the files to a new file system - now all your files have new ids.
It really depends on your language of choice, but almost all of them include a library for generating UUIDs. While collisions are theoretically possible, it's a veritable non-issue. Generate the UUID, prepend it to the front of your file, and you are in business. As your implementation grows, it will also allow you to create a hash-table index of your files for quick look-ups later.
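For instance, in Python (the record fields and path here are only illustrative):

```python
import uuid

file_id = str(uuid.uuid4())                   # random 128-bit identifier
record = {"id": file_id, "path": "/data/report.txt", "notes": "..."}
index = {file_id: record}                     # hash-table index for quick look-ups
```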
The question is, "unique across what?"
If you need something unique on a given machine at a given point in time, then yes, the inode number + device number is nearly always unique - these can be obtained from stat() or similar in C, or os.stat() in Python. However, if you delete a file and create another, the inode number may be reused. Also, two different hosts may have a completely different idea of what the (device, inode) pairs are.
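For example, a quick sketch (the file name is a placeholder):

```python
import os

st = os.stat("somefile.txt")
machine_local_id = (st.st_dev, st.st_ino)   # device number + inode number
```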
If you need something to describe the file's content (so two files with the same content have the same id), you might look into one of the SHA or RIPEMD functions. This will be pretty unique - the odds of an accidental collision are astronomically low.
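A content-based id might look like this small Python sketch using SHA-256:

```python
import hashlib

def content_id(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()    # identical content always yields the identical id
```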
If you need some other form of uniqueness, please elaborate.
I need to replicate data in CouchDB from one database to another, but in the process I want to alter the documents being replicated over, mostly stripping out particular fields (though other applications are mentioned in the comments).
The replication would always be 100% one-way (though other applications mentioned in the comments could use bi-directional sync).
I would prefer it if this process did not increment their revision IDs, but that might be asking for too much.
But I don't see any design document function that does what I am trying to do.
As it seems CouchDB doesn't do this, what plans are there for adding it? And meanwhile, what workarounds are there?
No, there is no out-of-the-box solution, as this would defeat the whole purpose of multi-master, MVCC replication logic.
The only option I can see here is to create your own solution, but I would not call that replication; it is rather ETL (Extract, Transform, Load). And for ETL there are tools available that will do the trick, such as (mixing open source and commercial here):
Scriptella
CloverETL
Pentaho Data Integration, or to be more specific Kettle
Jaspersoft ETL
Talend has some tools as well
There are plenty more ETL tools on the market.
I believe the best approach here would be to break out the fields you want to filter out into a separate document and then filter out the document during replication.
Of course the best way would be to have built-in support for this, but a workaround that occurs to me is, instead of using the built-in replication, to code and use a custom replication that performs the needed alterations/transformations, still building on the other built-ins rather than going beneath them. With good coding, in many situations (especially if each master can push to its slaves), this could be nearly as efficient.
This requires efficient triggers on each source/master to detect any changes, which I believe CouchDB does offer (or at least PouchDB appears to); the trigger would then copy the changes to the other location, applying the full alterations along the way.
If the source of the change is unable to push the change to the final destination, this staged store may need to be local to the source, where the destination can pull from it - which could get pretty expensive, especially in multi-master, as each location has to not only store and maintain its own data but also the data (being sent) of everyone it sends to.
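To make the idea concrete, here is a rough Python sketch of such a one-way copy built only on the public HTTP API: the _changes feed detects updates, and _bulk_docs with new_edits=false writes the transformed documents while keeping their revision IDs. The URLs, database names, and stripped field names are placeholders, and checkpointing and error handling are omitted:

```python
import requests

SOURCE = "http://localhost:5984/source_db"   # placeholder URLs and names
TARGET = "http://localhost:5984/target_db"
STRIP = {"ssn", "internal_notes"}            # fields to remove (illustrative)

def transform(doc):
    # Keep _id and _rev, drop only the unwanted fields.
    return {k: v for k, v in doc.items() if k not in STRIP}

def copy_changes(since="0"):
    # The _changes feed is CouchDB's built-in "trigger" for detecting updates.
    resp = requests.get(f"{SOURCE}/_changes",
                        params={"include_docs": "true", "since": since})
    resp.raise_for_status()
    feed = resp.json()
    docs = [transform(row["doc"]) for row in feed["results"]
            if not row.get("deleted")]
    if docs:
        # new_edits=false stores the documents with their original _rev values
        # instead of generating new revisions on the target.
        requests.post(f"{TARGET}/_bulk_docs",
                      json={"docs": docs, "new_edits": False}).raise_for_status()
    return feed["last_seq"]   # persist this and pass it back in as `since` next time
```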
This replication would also place each source document's revision ID in the document's copy (ideally, and essentially so if the copy is itself to be updated, i.e. act as a master), in the form of either:
ideally the normal "_rev" property. Indeed this looks quite possible per it ("preserve their revisions ID") already done by the normal replication algorithm using the builtin "Bulk Docs API" which seemingly our varient would use, too
otherwise, a new copy object (with its own _rev) plus another field such as "_rev_original" telling the original rev. But would that work?
Clearly such a copy could be created without problem.
Probably no big deal if the destination is just reading the data.
It seems hairy if the destination is also writing the data, as we would now have to merge with these non-standard revisions. But it is doable.
Relevant to this (coding a custom/improved replication to provide this apparently missing functionality, ideally without altering Pouch and especially Couch source code), as starter/basis material (the standard method) here is the normal Couch replication algorithm, which unfortunately doesn't clearly say it only uses built-in operations, though it looks like it does, and also the official overview of what it does; I suspect Pouch implements this, likely in Pouch's replicate.js (latest release as of 2014.07).
Further implementation particulars? Those who would know, please add them here.
This is a "community wiki" answer so please extend it.
Also, please add in comments links to and details of anyone or any system already doing (or trying to do) this or something similar.
tl;dr: Should I store directories in CouchDB as a list of attachments, or as a single tarball?
I've been using CouchDB to store project documents. I just create documents via Futon and upload them directly from there. I've also written a script to bulk-upload directories. I am using it like a basic content repository. I replicate it, so other people on my team have a copy of the repository.
I noticed that saving directories as a series of files seems to have a lot of storage overhead, so instead I upload a .tar.gz file containing the directory. This does significantly reduce the size of the document but now any change to the directory requires replicating the entire tarball.
I am looking for thoughts or perspective on the matter.
It really depends on what you want to achieve. I will try to provide some options for you to consider.
Storing one tar.gz will save you space, but it makes the contents harder to work with. If you are simply archiving, it may work for you.
Storing all the attachments on one document works well for couchapps. The workflow is that you mess around with the attachments until you are ready to release the application, and then there is not a lot of replication overhead, because it usually happens only once. It is nice that they are on one document, because they all move/replicate as one bundle. The downsides of using this approach for a content management system are that you can accumulate a lot of history baggage that you have to compact on your local couch, and you will get a lot of conflicts during replication between couches, which couch will keep around for you to resolve. Therefore, if you choose this model, you should compact frequently to reduce disk size.
For a content management system, I would instead recommend using one document per attachment. That gives you fewer conflicts. There will be a slight overhead, as each doc has some space allocated for the doc itself, but the savings from not having to do frequent compaction and/or conflict resolution will be worth it.
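As a sketch of what the one-document-per-attachment layout might look like over the plain HTTP API (database URL, document id, and attachment name are placeholders, not a prescribed layout):

```python
import requests

COUCH = "http://localhost:5984/content_db"   # placeholder database URL

def upload_file(path, doc_id):
    # One document per file: create the doc, then attach the file's bytes to it.
    rev = requests.put(f"{COUCH}/{doc_id}", json={"source_path": path}).json()["rev"]
    with open(path, "rb") as f:
        requests.put(f"{COUCH}/{doc_id}/content",
                     params={"rev": rev},
                     data=f,
                     headers={"Content-Type": "application/octet-stream"}).raise_for_status()
```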
Hope that gives you some options to weigh out.
I am building a Windows application to store backups of sensitive files. The purpose of my application is to store a copy of a file together with its hash. The program or user will then display the hash publicly, in case the user needs to prove they had a backup of the sensitive file at a certain time.
Motivation:
Some situations where this might be useful are:
Someone has a job at a company where they think they might be accused of doing something illegal. If they were accused of changing some data over time, it would be convenient to have copies of sensitive files related to their case over a period of time.
A politician might take notes about things they did each day, many of them about classified or sensitive subjects, and then want to be able to disclose the files at a later date if they are accused of something (for instance, if the CIA said they were briefed on torture…). Not absolute proof, but it would be hard to create fake backup files for every potential scenario, especially several years into the future.
Just to be clear, this application is mostly just an excuse for me to practice my coding skills. I don’t recommend using any type of cryptographic software that hasn’t been scrutinized by several professionals.
Possible Solutions:
For my application, I need to find a good place to publicly store the hash values. Here are my ideas so far:
Send the hash values to a group of people through email. (disadvantage: could annoy people, but would create a traceable record)
Publish the hash values on a public blog (disadvantage: if I ever got in serious legal trouble someone with resources could try to attack the free service I used and erase my data)
Publish the hash values using some online security service that stores documents but does not allow you to delete them. (I am not sure something like this exists.)
What is the most secure and convenient way to publicly display my hash values?
Hash your set of hashes so that you have only one hash to record. Then publish this hash in the classifieds of a widely archived newspaper.
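As an illustration, a small Python sketch of hashing the set of hashes (it assumes the backup tool already has a hex digest per file; sorting is my own choice to make the result order-independent):

```python
import hashlib

def digest_of_digests(per_file_hashes):
    # Combine many per-file digests into the single value you publish.
    h = hashlib.sha256()
    for file_hash in sorted(per_file_hashes):
        h.update(file_hash.encode("ascii"))
    return h.hexdigest()
```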
Truly secure? Print out the hashes on a piece of paper along with legal text to the effect of "On this day XX/XX/XXXX I affirm these hashes to accurately identify these files with these dates" (not a lawyer, get one to verify this), then have it notarized. Then save that piece of paper in a secure location.
Does anybody know of a secure 'read-once' local file access system? Or how one might create one? I realise that if data is to be used on a system then it must be capable of being read, but I think it may be possible to severely limit how the data is made available and reduce the possibility of it being copied and used elsewhere.
These are my requirements:
I want to store a 'secure/encrypted' data file on a USB stick (it could be a read-only CD/DVD, but a read/write USB stick or even a floppy would be better) and have this file capable of being read once (and mainly only once), on a decoded block-by-block basis, once a password has been entered. The file content is probably basic text/XML (or text-encoded data) and is to be read mainly as a sequential stream. The data (ideally) can be read by normal Windows file-access methods, i.e. a standard file, FSO objects (stream and text file), all basic PC (VB6/VB.NET) file-handling methods, even Excel text import. Yes, I know this probably defeats the object (as such a file can then be opened/saved), but I would still want this possibility. Finally, once the 'access' criteria have been met, the device would prevent further access.
Access to the data would be on a local PC system only. No LAN, no device sharing supported. Data on the device should not be copyable by normal means. Data would be written to the device using normal methods if possible or a special application if necessary.
To keep things simple, just one password, one file, one use, and one user would be great, but other possible enhancements (as icing on the cake) include:
allowing 'n' opens
having multiple passwords: 2 or more users, acting individually
silo passwords: having 2 or more users sign together to get access (or even having at least n of m users sign together to get access)
password prompt given on first block access, independent of which application calls for the first block
password could be embedded/automatic
tying the access to a nominated machine/MAC/IP/disk serial number (or other machine code)
tying the access to a nominated program/application
if possible, deleting and securely overwriting the data file
My first guess at doing this suggests that it would need a 'pseudo-device' driver that would appear as an extension to (or replacement of) the standard removable-device driver. The driver would handle each file block, sector by sector, and refuse to serve further decoded blocks if not authorised. The device should not give normal directory listings, but some form of content summary could be given to a user (optional).
Unlike a DRM system, I don't want any form of on-line access/authentication (though I would consider it); I would prefer a self-contained system.
I have looked long and hard for such a device/system and haven't found one yet. Most devices and system tools (e.g. Iomega/IronKey) appear to unlock access to files, but without limit, i.e. read-many once unlocked.
Performance is not an issue; a slow floppy read rate would be okay. The encryption method is agnostic; anything reasonably strong, 40-bit+ (128-bit), would be fine. I can't tell you what the data is or what it's for; I just need a way to give data to somebody and limit, as far as possible, its use and what they can do with it. It's a real requirement to protect confidential data and is not meant for DRM or MP3s/videos or similar.
I am an 'office' developer and not really familiar with device drivers or DRM - so where would I start with such a project? Is there anything out there already available to Joe Public?
Thanks - Tim.
PS: Update
I should point out that I just wish to pass data between ourselves and a single, specific, nominated service provider. I don't want them to copy the data we provide. It will be used once to support a 'singular' one-off process and then be done with. As the data is 'streamed/read', it should be 'consumed'. If the process fails, we will re-issue the data to the service provider. The data remains our property; it is not being sold/licensed.
I do realise that no solution will be foolproof, but the risk/reward ratio should dissuade casual attempts to break the system. The data has no explicit commercial value.
PPS: It's a real requirement... What would you do?
Judging by the upvotes on #erikson's thoughtful answer, you guys are saying 'not possible / don't bother' - but apart from personally supervising that the data is used according to our wishes, what would you do?
Executive summary: this isn't a realistic solution. Re-think the process so that "read-once" isn't necessary.
A few companies (Disappearing Inc. comes to mind, and they had at least one competitor) tried to make "self-destructing" email on general-purpose hardware in the late 90s. They spent millions of dot.com dollars to develop systems that didn't really work.
The only potential solution I know of is the use of a Trusted Platform Module. These are fairly common, as they are required in all computers bought by the US government. However, their capabilities vary. You'd need one that supported something called remote attestation, which allows software to perform integrity checks on itself. With this capability, you could write software that would enforce your data destruction policy. However, I don't think this feature is widely used. My laptop has a TPM, but it doesn't support this.
You should also be aware that there is a lot of hostility against "trusted computing," because it can be used to limit the functionality of a machine. This violates the right to do as you please with your property. TPMs might make sense for corporate or government machines, but not for personal computers.
Other aspects of your problem, such as granting multiple users access to the data, or requiring multiple users to cooperate to gain access, are easier.
Encrypting data for multiple users is typically achieved by generating a key, encrypting the data with that "content encryption key", and then encrypting that key (which is relatively small) with a "key encryption key" (which could be derived from a password) belonging to each intended recipient.
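A sketch of that pattern using the pyca/cryptography package (recipient names, passwords, and KDF parameters are purely illustrative, not a vetted design):

```python
import base64, os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def kek_from_password(password: str, salt: bytes) -> Fernet:
    # Derive a key-encryption key from a recipient's password.
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=480_000)
    return Fernet(base64.urlsafe_b64encode(kdf.derive(password.encode())))

data = b"the sensitive file contents"
cek = Fernet.generate_key()               # content encryption key
ciphertext = Fernet(cek).encrypt(data)    # encrypt the bulk data once

# Wrap the (small) CEK once per intended recipient.
recipients = {"alice": "correct horse", "bob": "battery staple"}
wrapped = {}
for name, password in recipients.items():
    salt = os.urandom(16)
    wrapped[name] = (salt, kek_from_password(password, salt).encrypt(cek))

# Any single recipient can unwrap the CEK and then decrypt the data.
salt, token = wrapped["alice"]
cek_again = kek_from_password("correct horse", salt).decrypt(token)
assert Fernet(cek_again).decrypt(ciphertext) == data
```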
Requiring some number of users to enter a password can be done securely with Shamir Secret Sharing, as I learned here on SO.
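For reference, Shamir's scheme is short enough to sketch in a few lines of Python; this is a toy version over a prime field, not hardened or constant-time, and real use should rely on a reviewed library:

```python
import random

P = 2**127 - 1        # a Mersenne prime; all arithmetic is done mod P

def split(secret: int, k: int, n: int):
    """Split `secret` (< P) into n shares; any k of them reconstruct it."""
    rng = random.SystemRandom()
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert combine(shares[:3]) == 123456789   # any 3 of the 5 shares are enough
```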
Based on the comments on the question, especially the "mailing label printing service" analogy, I'm afraid my initial answer isn't really relevant.
In a case like that, I can only see a legal solution. Disallow storage of your data in the contract. If it's worth suing them for violating the contract, do so.
Cryptographically speaking, the best thing I could think of would be to "watermark" such a "mailing list" with information that would help me prove that a copy of the list was disclosed by a particular vendor. Knowing that a watermark exists might deter any deliberate disclosures, and could help leverage a fast settlement in the case of accidental disclosure. This could use steganographic techniques within records as well as fake records in the collection.
Algorithms for doing this might already exist, but I'm not familiar with the field. Researching "digital watermarks" might be useful. Even if it only turns up algorithms for protected video and audio, perhaps these could be adapted to work with other media.
There are several problems with your approach.
If you can read the data from any application, you can save the data anywhere. I would think this defeats the purpose of any 'only-one-access' policy.
To get a device driver to handle your scenario, you would need deep knowledge of file-system programming, which at least under Windows is no easy undertaking. Even then, it would be hard to enforce the one-time-access prerequisite.
Programs have different file-access strategies, which might break your assumptions. E.g. an application may open a file once to get its size, then close and reopen it to load its data. How should this be enforced? Do you want to limit 'OpenFile' calls? Do you want to limit 'read byte' calls? Do you want to limit... jumping around in the file?
When your medium gets copied, by whatever means, you have no way of knowing that. The games industry has tried for years to bind games to the original CD, and has failed miserably.
I think what would be feasible is a container format with an encoder/decoder, or something like that (see BitLocker in Windows 7). That would guarantee that you can only decode the data once to a local disc, and would then delete the container on your medium (beware: check first whether the medium is writable, and bind the container to a serial number or the name of the medium so that the container cannot be copied).
Another possibility would be a separate USB device that you can only use once to extract the data from. Then you would only need to write a driver once, in user mode with WinUSB. Encrypted USB sticks use this approach.
But I really think this is a bad idea, because you can very easily get around any countermeasure when the receiving person can read all the data from the medium and save it anywhere else.