We need random read (and later write) access to thousands of discrete ranges (each on the order of a few KB) within very large binary blobs (on the order of hundreds of GB). The current APIs force us to submit a separate request for each such range. Billing is one downside, of course, but the main problem is the client-side and network load of handling all these requests!
Are there any known ways of avoiding the massive overhead for access patterns like this?
Assume that reformatting the data is not viable, since the access patterns vary. Replicating the data in a multitude of versions optimized for each access pattern variation is also highly undesirable, for several reasons (optimization lead time, storage costs, data management, plus not all access patterns can be predicted - the known ones might not even be used).
Extending the "Range" REST API header to support multiple ranges would be the ideal solution, but obviously that's not ours to control.
Unfortunately, there is no other nice way to do that. The current API (I think you're using the Get Blob API) supports only a single range per request, not multiple ranges; the details are here.
As of now, there is no good workaround for this issue. I saw the UserVoice request you submitted; it's good feedback, and I have already upvoted it. I hope the MS team can implement it in a future release.
I've heard a few people say that you should never expose your internal ids to the outside world (for instance an auto_increment'ng primary key).
Some suggest having some sort of uuid column that you use instead for lookups.
I'm wondering really why this would be suggested and if it's truly important.
Using a uuid instead is basically just obfuscating the id. What's the point? The only thing I can think of is that auto_incrementing integers obviously point out the ordering of my db objects. Does it matter if an outside user knows that one thing was created before/after another?
Or is it purely that obfuscating the ids would prevent "guessing" at different operations on specific objects?
Is this even an issue I should be thinking about when designing an external-facing API?
Great answers. I'll add another reason why you don't want to expose your internal auto-incremented ID.
As a competing company, I can easily measure how many new users/orders/etc. you get every week/day/hour. I just need to create a user and/or order and subtract the ID I got last time from the new one.
So it's not only for security reasons; there are business reasons as well.
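The arithmetic behind that estimate is trivial. A minimal sketch, assuming the competitor notes the ID of a freshly created order at two points in time (the function name and the sample numbers are made up for illustration):

```python
from datetime import datetime

def estimate_rate_per_day(id_then: int, seen_then: datetime,
                          id_now: int, seen_now: datetime) -> float:
    """Estimate rows created per day between two observations of an
    auto-incrementing ID."""
    elapsed_days = (seen_now - seen_then).total_seconds() / 86400
    return (id_now - id_then) / elapsed_days

# Hypothetical observations: order IDs seen exactly one week apart.
rate = estimate_rate_per_day(
    10_450, datetime(2024, 1, 1),
    12_200, datetime(2024, 1, 8),
)
print(f"about {rate:.0f} orders per day")  # about 250 orders per day
```

A random identifier gives the observer nothing to subtract.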
Any information that you provide a malicious user about your application and its layout can and will be used against your application. One of the problems we face in (web) application security is that seemingly innocuous design decisions taken at the infancy of a project become Achilles' heels when the project scales larger. Letting an attacker make informed guesses about the ordering of entities can come back to haunt you in the following, somewhat unrelated ways:
The ID of the entity will inevitably be passed as a parameter at some point in your application. This will result in hackers eventually being able to feed your application arguments they ordinarily should not have access to. I've personally been able to view order details (on a very popular retailer's site) that I had no business viewing, as a URL argument no less. I simply fed the app sequential numbers from my own legitimate order.
Knowing the limits or at least the progression of primary key field values is invaluable fodder for SQL injection attacks, scope of which I can't cover here.
Key values are used not only in RDBMS systems, but also in other key-value mapping systems. Imagine if the JSESSIONID cookie order could be predetermined or guessed. Everybody with opposable thumbs would be replaying sessions in web apps.
And many more that I'm sure other people here will come up with.
SEAL team 6 doesn't necessarily mean there are 6 seal teams. Just keeps the enemy guessing. And the time spent guessing by a potential attacker is more money in your pocket any way you slice it.
As with many security-related issues, it's a subtle answer - kolossus gives a good overview.
It helps to understand how an attacker might go about compromising your API, and how many security breaches occur.
Most security breaches are caused by bugs or oversights, and attackers look for those. An attacker who is trying to compromise your API will first try to collect information about it - as it's an API, presumably you publish detailed usage documentation. An attacker will use this documentation and try lots of different ways to make your site crash (and thereby expose more information, if they're lucky), or react in ways you didn't anticipate.
You have to assume the attacker has lots of time, and will script their attack to try every single avenue - like a burglar with infinite time, who goes around your house trying every door and window, with a lock pick that learns from every attempt.
So, if your API exposes a method like getUserInfo(userid), and userID is an integer, the attacker will write a script to iterate from 0 upwards to find out how many users you have. They'll try negative numbers, and max(INT) + 1. Your application could leak information in all those cases, and - if the developer forgot to handle certain errors - may expose more data than you intended.
If your API includes logic to restrict access to certain data - e.g. you're allowed to execute getUserInfo for users in your friend list - the attacker may get lucky with some numbers because of a bug or an oversight. They'll then know that the info they are getting relates to a valid user, so they can build up a model of the way your application is designed. It's the equivalent of a burglar knowing that all your locks come from a single manufacturer, so they only need to bring that lock pick.
By itself, this may be of no advantage to the attacker - but it makes their life a tiny bit easier.
Given the effort of using UUIDs or another meaningless identifier, it's probably worth making things harder for the attacker. It's not the most important consideration, of course - it probably doesn't make the top 5 things you should do to protect your API from attackers - but it helps.
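To show how little effort is involved, here is a minimal sketch of keeping the integer key internal and handing out a random UUID instead. The in-memory `UserStore` class and its method names are hypothetical stand-ins for a real table with an extra indexed column:

```python
import uuid
from typing import Optional

class UserStore:
    """In-memory stand-in for a users table that has an internal integer
    key plus a separate random public identifier."""

    def __init__(self) -> None:
        self._next_id = 1          # internal auto-increment key, never exposed
        self._rows = {}            # internal id -> row
        self._by_public_id = {}    # public id -> internal id

    def create(self, name: str) -> str:
        internal_id = self._next_id
        self._next_id += 1
        public_id = str(uuid.uuid4())   # random, reveals no ordering or count
        self._rows[internal_id] = {"name": name, "public_id": public_id}
        self._by_public_id[public_id] = internal_id
        return public_id

    def get_user_info(self, public_id: str) -> Optional[dict]:
        internal_id = self._by_public_id.get(public_id)
        if internal_id is None:
            return None            # every unknown ID gets the same answer
        return self._rows[internal_id]
```

Iterating from 0 upwards now finds nothing. This does not replace the access checks themselves; it only removes the free hints.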
I am building a windows application to store backups of sensitive files. The purpose of my application is to store a copy of a file with its hash. The program or user will then display the hash publicly in case the user needs to prove they had the backup of the sensitive file at a certain time.
Motivation:
Some situations where this might be useful are:
Someone has a job at a company where they think they might be accused of doing something illegal. If they were accused of changing some data over time, it would be convenient to have copies of sensitive files related to their case over a period of time.
A politician might take notes about things they did each day, many of them about classified or sensitive subjects, and then want to be able to disclose their files at a later date if they are accused of something (for instance, if the CIA said they were briefed on torture…). Not absolute proof, but it would be hard to create fake backup files for every potential scenario, especially several years into the future.
Just to be clear, this application is mostly just an excuse for me to practice my coding skills. I don’t recommend using any type of cryptographic software that hasn’t been scrutinized by several professionals.
Possible Solutions:
For my application, I need to find a good place to publicly store the hash values. Here are my ideas so far:
Send the hash values to a group of people through email. (disadvantage: could annoy people, but would create a traceable record)
Publish the hash values on a public blog (disadvantage: if I ever got in serious legal trouble someone with resources could try to attack the free service I used and erase my data)
Publish the hash values using some online security service that stores documents but does not allow you to delete them. (I am not sure something like this exists.)
What is the most secure and convenient way to publicly display my hash values?
Hash your set of hashes so that you have only one hash to record. Then publish this hash in the classifieds of a widely archived newspaper.
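A minimal sketch of the "hash of hashes" step, assuming the individual file hashes are hex-encoded SHA-256 digests (the function name is made up, and sorting is my own choice so the result does not depend on file order):

```python
import hashlib
from typing import Iterable

def combined_digest(file_hashes: Iterable[str]) -> str:
    """Reduce many hex-encoded SHA-256 file hashes to one digest to publish."""
    outer = hashlib.sha256()
    for h in sorted(file_hashes):        # fixed order -> reproducible result
        outer.update(bytes.fromhex(h))
    return outer.hexdigest()

# Hypothetical file contents, hashed individually first.
hashes = [hashlib.sha256(data).hexdigest() for data in (b"notes-1", b"notes-2")]
print(combined_digest(hashes))
```

To prove anything later you need to keep the full list of individual hashes as well as the files, since the published value can only be recomputed from all of them.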
Truly secure? Print out the hashes on a piece of paper along with a legal text to the effect of, "On this day XX/XX/XXXX I affirm these hashes to be accurately identifying these files with these dates." (not a lawyer, get one to verify this), then have it notarized. Then, save that piece of paper in a secure location.
License keys are the de facto standard as an anti-piracy measure. To be honest, this strikes me as (in)Security Through Obscurity, although I really have no idea how license keys are generated. What is a good (secure) example of license key generation? What cryptographic primitive (if any) are they using? Is it a message digest? If so, what data would they be hashing? What methods do developers employ to make it difficult for crackers to build their own key generators? How are key generators made?
For old-school CD keys, it was just a matter of making up an algorithm for which CD keys (which could be any string) are easy to generate and easy to verify, but the ratio of valid-CD-keys to invalid-CD-keys is so small that randomly guessing CD keys is unlikely to get you a valid one.
INCORRECT WAY TO DO IT:
Starcraft and Half-life both used the same checksum, where the 13th digit verified the first 12. Thus, you could enter anything for the first 12 digits and guess the 13th (there are only 10 possibilities), leading to the infamous 1234-56789-1234.
The algorithm for verifying is public, and looks something like this:
int x = 3;
for (int i = 0; i < 12; i++)
{
    x += (2 * x) ^ digit[i];   // ^ is bitwise XOR here, not exponentiation
}
int lastDigit = x % 10;        // must match the 13th digit of the key
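For anyone who wants to try it, here is the same loop as a runnable Python sketch. It is a direct port of the snippet above as given in this answer, not something checked against the actual games; the function names are made up.

```python
def expected_check_digit(first12):
    """Run the loop above over the first 12 digits of a key."""
    x = 3
    for d in first12:
        x += (2 * x) ^ d        # ^ is bitwise XOR, as in C
    return x % 10

def is_valid_key(key: str) -> bool:
    digits = [int(c) for c in key if c.isdigit()]   # ignore the dashes
    if len(digits) != 13:
        return False
    return expected_check_digit(digits[:12]) == digits[12]

print(is_valid_key("1234-56789-1234"))  # True
```

Since the check digit depends only on the first 12 digits, exactly one of the ten possible last digits passes for any prefix, which is why guessing works.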
CORRECT WAY TO DO IT
Windows XP takes quite a bit of information, encrypts it, and puts the letter/number encoding on a sticker. This allowed MS to both verify your key and obtain the product-type (Home, Professional, etc.) at the same time. Additionally, it requires online activation.
The full algorithm is rather complex, but outlined nicely in this (completely legal!) paper, published in Germany.
Of course, no matter what you do, unless you are offering an online service (like World of Warcraft), any type of copy protection is just a stall: unfortunately, if it's a game of any value, someone will break (or at least circumvent) the CD-key algorithm and all other copy protections.
REAL CORRECT WAY TO DO IT:
For online services, life is a bit simpler, since even with the binary file you need to authenticate with their servers to make any use of it (e.g. have a WoW account). The CD-key algorithm for World of Warcraft - used, for instance, when buying playtime cards - probably looks something like this:
Generate a very large cryptographically-secure random number.
Store it in our database and print it on the card.
Then, when someone enters a playtime-card number, check if it's in the database, and if it is, associate that number with the current user so it can never be used again.
For online services, there is no reason not to use the above scheme; using anything else can lead to problems.
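The three steps above can be sketched as follows. This is only an illustration: the `CardDatabase` class is hypothetical, a dict stands in for the real database, and 128 bits is my own assumption for "very large".

```python
import secrets

class CardDatabase:
    """Dict-backed stand-in for the server-side table of issued codes."""

    def __init__(self) -> None:
        self._owner_by_code = {}    # code -> redeeming user, None if unused

    def issue_card(self) -> str:
        code = secrets.token_hex(16)        # 128 bits from the OS CSPRNG
        self._owner_by_code[code] = None    # store it; print it on the card
        return code

    def redeem(self, code: str, user: str) -> bool:
        if code not in self._owner_by_code:
            return False                    # never issued
        if self._owner_by_code[code] is not None:
            return False                    # already used by someone
        self._owner_by_code[code] = user    # bind it so it cannot be reused
        return True
```

There is no algorithm for a keygen to reproduce: a code is valid only because the server remembers issuing it.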
When I originally wrote this answer it was under an assumption that the question was regarding 'offline' validation of licence keys. Most of the other answers address online verification, which is significantly easier to handle (most of the logic can be done server side).
With offline verification the most difficult thing is ensuring that you can generate a huge number of unique licence keys and still maintain a strong algorithm that isn't easily compromised (unlike, say, a simple check digit).
I'm not very well versed in mathematics, but it struck me that one way to do this is to use a mathematical function that plots a graph.
The plotted line can have (if you use a fine enough frequency) thousands of unique points, so you can generate keys by picking random points on that graph and encoding the values in some way.
As an example, we'll plot this graph, pick four points and encode into a string as "0,-500;100,-300;200,-100;100,600"
We'll encrypt the string with a known and fixed key (horribly weak, but it serves a purpose), then convert the resulting bytes through Base32 to generate the final key
The application can then reverse this process (base32 to real number, decrypt, decode the points) and then check each of those points is on our secret graph.
It's a fairly small amount of code which would allow for a huge number of unique and valid keys to be generated.
It is, however, very much security by obscurity. Anyone taking the time to disassemble the code would be able to find the graphing function and encryption keys, then mock up a key generator, but it's probably quite useful for slowing down casual piracy.
Check this article on Partial Key Verification, which covers the following requirements:
License keys must be easy enough to type in.
We must be able to blacklist (revoke) a license key in the case of chargebacks or purchases with stolen credit cards.
No “phoning home” to test keys. Although this practice is becoming more and more prevalent, I still do not appreciate it as a user, so will not ask my users to put up with it.
It should not be possible for a cracker to disassemble our released application and produce a working “keygen” from it. This means that our application will not fully test a key for verification. Only some of the key is to be tested. Further, each release of the application should test a different portion of the key, so that a phony key based on an earlier release will not work on a later release of our software.
Important: it should not be possible for a legitimate user to accidentally type in an invalid key that will appear to work but fail on a future version due to a typographical error.
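To make the "test only part of the key" requirement concrete, here is a rough sketch of the idea. It is not the scheme from the article: the layout (a 4-byte seed followed by four check bytes), the use of HMAC-SHA256 and the placeholder secrets are all my own assumptions.

```python
import hashlib
import hmac
import secrets

# Hypothetical vendor-side secrets, one per check byte. Only the vendor's
# generator holds all of them; each release embeds just one.
VENDOR_SECRETS = [b"secret-0", b"secret-1", b"secret-2", b"secret-3"]

def check_byte(secret: bytes, seed: bytes) -> int:
    """One check byte: first byte of HMAC-SHA256(secret, seed)."""
    return hmac.new(secret, seed, hashlib.sha256).digest()[0]

def generate_key(seed: bytes = None) -> str:
    """Vendor side: random seed followed by every check byte."""
    seed = seed or secrets.token_bytes(4)
    checks = bytes(check_byte(s, seed) for s in VENDOR_SECRETS)
    return (seed + checks).hex().upper()

def release_accepts(key: str, index: int, embedded_secret: bytes) -> bool:
    """What one shipped release does: test a single check byte only."""
    try:
        raw = bytes.fromhex(key)
    except ValueError:
        return False
    if len(raw) != 4 + len(VENDOR_SECRETS):
        return False
    seed, checks = raw[:4], raw[4:]
    return checks[index] == check_byte(embedded_secret, seed)
```

A keygen built by disassembling the release that tests byte 1 can only get byte 1 right, so its keys are rejected by a later release that tests byte 2. One byte per check is far too weak on its own, and the sketch ignores the typo requirement, which would need a separate checksum over the whole key.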
I've not got any experience with what people actually do to generate CD keys, but (assuming you're not wanting to go down the road of online activation) here are a few ways one could make a key:
Require that the number be divisible by (say) 17. Trivial to guess, if you have access to many keys, but the majority of potential strings will be invalid. Similar would be requiring that the checksum of the key match a known value.
Require that the first half of the key, when concatenated with a known value, hashes down to the second half of the key. Better, but the program still contains all the information needed to generate keys as well as to validate them.
Generate keys by encrypting (with a private key) a known value + nonce. This can be verified by decrypting using the corresponding public key and verifying the known value. The program now has enough information to verify the key without being able to generate keys.
These are still all open to attack: the program is still there and can be patched to bypass the check. Cleverer might be to encrypt part of the program using the known value from my third method, rather than storing the value in the program. That way you'd have to find a copy of the key before you could decrypt the program, but it's still vulnerable to being copied once decrypted and to having one person take their legit copy and use it to enable everyone else to access the software.
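As an illustration of the second method (first half plus a known value hashes down to the second half), here is a minimal sketch. The constant, the key format and the choice of SHA-256 truncated to 8 hex characters are assumptions of mine.

```python
import hashlib
import secrets

# Hypothetical constant baked into both the generator and the program.
KNOWN_VALUE = b"hypothetical-product-constant"

def second_half(first: str) -> str:
    """Hash the first half together with the known value."""
    digest = hashlib.sha256(first.encode() + KNOWN_VALUE).hexdigest()
    return digest[:8].upper()

def make_key() -> str:
    first = secrets.token_hex(4).upper()      # 8 random hex characters
    return f"{first}-{second_half(first)}"

def is_valid(key: str) -> bool:
    first, sep, second = key.partition("-")
    return sep == "-" and second == second_half(first)
```

Note that `is_valid` needs `second_half`, which is all `make_key` needs too. That is exactly the weakness mentioned above: the validator ships with everything required to generate keys.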
CD keys aren't much of a security measure for any non-networked stuff, so technically they don't need to be securely generated. If you're on .NET, you can almost go with Guid.NewGuid().
Their main use nowadays is for the Multiplayer component, where a server can verify the CD Key. For that, it's unimportant how securely it was generated as it boils down to "Lookup whatever is passed in and check if someone else is already using it".
That being said, you may want to use an algorithm to achieve two goals:
Have a checksum of some sort. That allows your installer to display a "Key doesn't seem valid" message, solely to detect typos. (Adding such a check to the installer actually means that writing a key generator is trivial, as the hacker has all the code he needs. Leaving the check out and relying solely on server-side validation avoids that, at the risk of annoying your legitimate customers, who don't understand why the server doesn't accept their CD key because they aren't aware of the typo.)
Work with a limited subset of characters. Trying to type in a CD key and guessing "Is this an 8 or a B? A 1 or an I? A Q or an O or a 0?" is no fun - by using a subset of unambiguous characters/digits you eliminate that confusion.
That being said, you still want a large distribution and some randomness to avoid a pirate simply guessing a valid key (that's valid in your database but still in a box on a store shelf) and screwing over a legitimate customer who happens to buy that box.
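A minimal sketch of both goals plus the randomness, assuming a 29-character alphabet with the look-alikes listed above removed and a position-weighted check character. Both choices are mine, not from any particular product.

```python
import secrets

# Digits and letters with the confusable ones (0/O, 1/I, 8/B, Q) removed.
ALPHABET = "2345679ACDEFGHJKLMNPRSTUVWXYZ"   # 29 symbols

def check_char(body: str) -> str:
    """Position-weighted sum of symbol values, modulo the alphabet size."""
    total = sum((i + 1) * ALPHABET.index(c) for i, c in enumerate(body))
    return ALPHABET[total % len(ALPHABET)]

def generate_key(length: int = 15) -> str:
    """Random body from a CSPRNG, followed by one check character."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return body + check_char(body)

def looks_valid(key: str) -> bool:
    """Typo check only; the server still decides whether the key exists."""
    key = key.replace("-", "").upper()
    if len(key) < 2 or any(c not in ALPHABET for c in key):
        return False
    return check_char(key[:-1]) == key[-1]
```

Because 29 is prime and every weight is below 29, any single mistyped character changes the check value, so the installer can say "Key doesn't seem valid" without contacting the server. The check is public by design and gives no protection against a keygen.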
The key system must have several properties:
very few keys must be valid
valid keys must not be derivable even given everything the user has.
a valid key on one system is not a valid key on another.
others
One solution that should give you these would be to use a public-key signing scheme. Start with a "system hash": grab the MAC addresses of any NICs (sorted), the CPU-ID info, plus some other stuff, concatenate it all together and take an MD5 of the result (you really don't want to be handling personally identifiable information if you don't have to). Append the CD's serial number to form a blob, and refuse to boot unless some registry key (or some data file) holds a valid signature for that blob. The user activates the program by shipping the blob to you, and you ship back the signature.
Potential issues include that you are offering to sign practically anything, so you need to assume someone will run chosen-plaintext and/or chosen-ciphertext attacks. That can be mitigated by checking the serial number provided and refusing to handle requests from invalid ones, as well as refusing to handle more than a given number of queries from a given s/n in an interval (say 2 per year).
I should point out a few things: First, a skilled and determined attacker will be able to bypass any and all security in the parts that they have unrestricted access to (i.e. everything on the CD), the best you can do on that account is make it harder to get illegitimate access than it is to get legitimate access. Second, I'm no expert so there could be serious flaws in this proposed scheme.
If you aren't particularly concerned with the length of the key, a pretty tried and true method is the use of public and private key encryption.
Essentially have some kind of nonce and a fixed signature.
For example:
0001-123456789
Where 0001 is your nonce and 123456789 is your fixed signature.
Then encrypt this using your private key to get your CD key which is something like:
ABCDEF9876543210
Then distribute the public key with your application. The public key can be used to decrypt the CD key "ABCDEF9876543210", which you then verify the fixed signature portion of.
This then prevents someone from guessing what the CD key is for the nonce 0002 because they don't have the private key.
The only major downside is that your CD keys will be quite long when using 1024-bit private/public keys. You also need to choose a nonce long enough that you aren't encrypting a trivial amount of information.
The up side is that this method will work without "activation" and you can use things like an email address or licensee name as the nonce.
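Here is a toy sketch of that flow using textbook RSA with Python's built-in `pow`. Everything about it is for illustration only: the modulus is built from two small Mersenne primes (about 150 bits, trivially breakable), there is no padding, and the function names are made up. A real implementation should use a vetted crypto library and proper signatures.

```python
# Toy parameters: far too small for real use, chosen to keep this short.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus, ships with the app
E = 65537                            # public exponent, ships with the app
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, vendor only (Python 3.8+)

FIXED_SIGNATURE = "123456789"

def make_cd_key(nonce: str) -> str:
    """Vendor side: 'encrypt' nonce-signature with the private key."""
    m = int.from_bytes(f"{nonce}-{FIXED_SIGNATURE}".encode("ascii"), "big")
    if m >= N:
        raise ValueError("message too long for this toy modulus")
    return format(pow(m, D, N), "X")

def read_cd_key(cd_key: str):
    """App side: 'decrypt' with the public key and check the fixed part.
    Returns the nonce, or None if the key is not valid."""
    try:
        m = pow(int(cd_key, 16), E, N)
        text = m.to_bytes((m.bit_length() + 7) // 8, "big").decode("ascii")
    except ValueError:               # not hex, or not ASCII after decrypting
        return None
    nonce, sep, signature = text.rpartition("-")
    return nonce if sep and signature == FIXED_SIGNATURE else None

key = make_cd_key("0001")
print(key, read_cd_key(key))
```

The application only ever holds `N` and `E`, so it can check keys but cannot mint one for nonce 0002.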
I realize that this answer is about 10 years late to the party.
A good software license key/serial number generator consists of more than just a string of random characters or a value from some curve generator. Using a limited alphanumeric alphabet, data can be embedded into a short string (e.g. XXXX-XXXX-XXXX-XXXX) that includes all kinds of useful information such as:
Date created or the date the license expires
Product ID, product classification, major and minor version numbers
Custom bits like a hardware hash
Per-user hash checksum bits (e.g. the user enters their email address along with the license key and both pieces of information are used to calculate/verify the hash).
The license key data is then encrypted and then encoded using the limited alphanumeric alphabet. For online validation, the license server holds the secrets for decrypting the information. For offline validation, the decryption secret(s) are included with the software itself along with the decryption/validation code. Obviously, offline validation means the software isn't secure against someone making a keygen.
Probably the hardest part about creating a license key is figuring out how to cram as much data as possible into as few bytes as possible. Remember that users will be entering in their license keys by hand, so every bit counts and users don't want to type extremely long, complex strings in. 16 to 25 character license keys are the most common and balance how much data can be placed into a key vs. user tolerance for entering the key to unlock the software. Slicing up bytes into chunks of bits allows for more information to be included but does increase code complexity of both the generator and validator.
Encryption is a complex topic. In general, standard encryption algorithms like AES have block sizes that don't align with the goal of keeping license key lengths short. Therefore, most developers making their own license keys end up writing their own encryption algorithms (an activity which is frequently discouraged) or don't encrypt keys at all, which guarantees that someone will write a keygen. Suffice it to say that good encryption is hard to do right, and a decent understanding of how Feistel networks and existing ciphers work is a prerequisite.
Verifying a key is a matter of decoding and decrypting the string, verifying the hash/checksum, checking the product ID and major and minor version numbers in the data, verifying that the license hasn't expired, and doing whatever other checks need to be performed.
Writing a keygen is a matter of knowing what a license key consists of and then producing the same output that the original key generator produces. If the algorithm for license key verification is included in and used by the software, then it is just a matter of creating software that does the reverse of the verification process.
To see what the entire process looks like, here is a blog post I recently wrote that goes over choosing the license key length, the data layout, the encryption algorithm, and the final encoding scheme:
https://cubicspot.blogspot.com/2020/03/adventuring-deeply-into-software-serial.html
A practical, real-world implementation of the key generator and key verifier from the blog post can be seen here:
https://github.com/cubiclesoft/php-misc/blob/master/support/serial_number.php
Documentation for the above class:
https://github.com/cubiclesoft/php-misc/blob/master/docs/serial_number.md
A production-ready open source license server that generates and manages license keys using the above serial number code can be found here:
https://github.com/cubiclesoft/php-license-server
The above license server supports both online and offline validation modes. A software product might start its existence with online only validation. When the software product is ready to retire and no longer supported, it can easily move to offline validation where all existing keys continue to work once the user upgrades to the very last version of the software that switches over to offline validation.
A live demo of how the above license server can be integrated into a website to sell software licenses plus an installable demo application can be found here (both the website and demo app are open source too):
https://license-server-demo.cubiclesoft.com/
Full disclosure: I'm the author of both the license server and the demo site software.
There are also DRM behaviors that incorporate multiple steps to the process. One of the most well known examples is one of Adobe's methods for verifying an installation of their Creative Suite. The traditional CD Key method discussed here is used, then Adobe's support line is called. The CD key is given to the Adobe representative and they give back an activation number to be used by the user.
However, despite being broken up into steps, this falls prey to the same methods of cracking used for the normal process. The process used to create an activation key that is checked against the original CD key was quickly discovered, and generators that incorporate both of the keys were made.
However, this method still exists as a way for users with no internet connection to verify the product. Going forward, it's easy to see how these methods would be eliminated as internet access becomes ubiquitous.
All of the CD-only copy protection algorithms inconvenience honest users while providing no protection against piracy whatsoever.
The "pirate" only needs to have access to one legitimate CD and its access code; he can then make n copies and distribute them.
It does not matter how cryptographically secure you make the code: you need to supply it with the CD in plain text, or a legitimate user cannot activate the software.
Most secure schemes involve either the user providing the software supplier with some details of the machine which will run the software (CPU serial numbers, MAC addresses, IP address, etc.), or requiring online access to register the software on the supplier's website and receive an activation token in return. The first option requires a lot of manual administration and is only worth it for very high-value software; the second option can be spoofed and is absolutely infuriating if you have limited network access or are stuck behind a firewall.
On the whole, it's much easier to establish a trust relationship with your customers!
You can implement licensing in your software projects very easily using the Secure Licensing API (you need to download the desktop application for creating secure licenses from https://www.systemsoulsoftwares.com/).
It creates a unique UID for the client software based on the system hardware (CPU, motherboard, hard drive).
(The UID acts as a private key for that unique system.)
It lets you send an encrypted license string to the client system very easily; the license string is verified and works only on that particular system.
This method allows software developers or companies to store more information about the software/developer/distributor services/features/client.
It gives control over locking and unlocking the client software's features, saving developers the time of making multiple versions of the same software with different features.
It takes care of trial versions too, for any number of days.
It secures the license timeline by checking the date and time online during registration.
It exposes all hardware information to developers.
It has pre-built and custom functions that developers can access at every step of the licensing process to build more complex, secure code.
I need to delete my input file securely once I have finished with it. At the moment I'm overwriting all the data with zeros, but this is messy: my temp folder fills up with old files, and the names of the files are themselves a security issue.
Rather than moving them to the recycle bin, I would like them to skip it and just disappear. This is in addition to wiping them byte by byte, since data recovery software can recover items even after they have left the recycle bin. As the name is also sensitive, I need to rename the files before I delete them.
This is a progressive problem. What is "secure" for one application is insecure for another. If security is really important and you find yourself asking these kinds of questions on Stack Overflow, then you most likely need to contract with an external security consultant. Examples of really important include financial information, medical records, or anything else where there is a law or contract requiring the securing of the data. I don't say this to be mean or imply that you are incapable of solving the problem, but to point out that this is a rather complex and evolving problem.
Basically to accomplish what you want to accomplish:
Once your code finishes, change the file size to zero - this makes recovery more difficult because the original file size is lost.
Then rename the file (RenameFile) to a different name.
Finally delete the file using DeleteFile, which does not move the file to the recycle bin.
Make sure you maintain an exclusive handle on the files the whole time they are on the disk too, or they can just be copied before they are deleted.
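The truncate, rename and delete steps can be sketched like this. It is a Python illustration rather than the Win32/Delphi calls named above, the function name is made up, and it does not hold an exclusive handle between steps, so treat it as an outline of the sequence only.

```python
import os
import secrets

def truncate_rename_delete(path: str) -> None:
    """Basic (not forensic-grade) removal of a temporary input file."""
    # 1. Set the file size to zero so the original length is lost.
    with open(path, "r+b") as f:
        f.truncate(0)
        f.flush()
        os.fsync(f.fileno())

    # 2. Rename to a random name so the original name does not linger
    #    in the directory entry.
    directory = os.path.dirname(os.path.abspath(path))
    anonymous = os.path.join(directory, secrets.token_hex(8))
    os.replace(path, anonymous)

    # 3. Delete directly; os.remove does not go through the recycle bin.
    os.remove(anonymous)
```

Any overwrite pass has to happen before step 1, while the file still has its full length.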
As I said, this is a progressive problem. This is a really basic solution, and is subject to a number of vulnerabilities. So depending on the level of security needed you might consider never letting the file be written to disk, or using multiple pass overwrites. If security is really important, then actually burning the hard drive platter at a high temperature, and then smashing it is the only way to be sure.
Edit: It appears you removed your code sample.
There are third-party utilities to do this kind of thing from the command line. I found that PGP Command Line has this feature; if you search around you can probably find a free app that will do this from the command line. You could then just call the command from your app in order to securely delete the file.
I would say that if you are insistent upon writing your own code to do this, then instead of using all 0's, write random bytes to the disk. And don't use the built-in C++ rand function; use a more secure random number generator.
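A minimal sketch of the random-byte overwrite, written in Python for brevity instead of C++. `os.urandom` draws from the operating system's cryptographic generator; the function name, chunk size and pass count are arbitrary choices of mine.

```python
import os

def overwrite_with_random(path: str, passes: int = 1,
                          chunk_size: int = 64 * 1024) -> None:
    """Overwrite a file in place with bytes from the OS CSPRNG."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push the data to disk
```

This overwrites the file through the file system only. Whether the old blocks are really gone depends on the file system and the storage hardware.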
As Jim McKeeth said, this is not something you want to do yourself if there are serious legal repercussions for getting it wrong.
Jim has described well the issues with solving your problem in code. The problem is indeed progressive, and any solution you implement will only approximate complete security without ever attaining it. So one thing to do is to decide exactly what you need to protect the file against (snooping family members? co-workers? corporate espionage? totalitarian governments?), then design your solution accordingly and document its limitations.
I have a somewhat orthogonal suggestion, though. Instead of - or in addition to - implementing secure wiping in code, you can require cooperation from users. For example, you can suggest (or require) that input files be stored on an encrypted volume. In corporate environments PGP Disk might be preferred, since it's a recognizable brand, while home users would be well served by the free and well-tested TrueCrypt. Both products support creating virtual encrypted volumes as well as encrypting whole partitions. This would go a long way toward keeping the names and contents of input files secure, even before you write a single line of code.
Deleting a file can be touchy subject...
Depending on the needs of your customer, I would like to point to the phenomenon of data remanence, which is the residual data left after a simple overwrite. Data erasure is a method of destroying the residual data.
There are a few standards on how to erase the residual data, DoD 5220.22-M is mostly referred to by "secure file delete" applications, but apparently the rules have changed.
"As of the June 2007 edition of the DSS C&SM, overwriting is no longer acceptable for sanitization of magnetic media; only degaussing or physical destruction is acceptable."
So what I'm saying is, try to get the rules which your customer has to follow.
Beware of "wear leveling" algorithms used with flash storage. To promote even wear, files are moved around on the drive, and this is invisible to your app and even to the operating system. So you can "secure delete" the file all you want, and you will only affect the most recent copy of the file, while prior copies remain recoverable/discoverable with recovery software. The only way to solve that is to encrypt the file contents.