How difficult/easy is it to decode a custom base 62 encoding?

How difficult/easy is it to decode a custom base 62 encoding? - security

Here's what I am going to do to obfuscate database id's in permalinks:
1) XOR the id with a lengthy secret key
2) Scramble (rotate, flip, reverse) bits around a little in the XOR'ed integer in a reversible way
3) Base 62 encode the resulting integer with my own secret scrambled up sequence of all
alphanumeric characters (A-Za-z0-9)
How difficult would it be to convert my Base 62 encoding back to base 10?
Also How difficult is it to reverse engineer the whole process? (obviously without taking a peak at source or compiled code) I know 'only XOR' is pretty susceptible to basic analysis.
EDIT: the result should be not more than 8-9 chars long, 3DES and AES seem to produce very long encrypted texts and can't be practically used in URLs
Resulting strings look something like:
In [2]: for i in range(1, 11):
print code(i)
...:
9fYgiSHq
MdKx0tZu
vjd0Dipm
6dDakK9x
Ph7DYBzp
sfRUFaRt
jkmg0hl
dBbX9nHk4
ifqBZwLW
WdaQE630
As you can see 1 looks nothing like 2 so this seems to works great for obfuscation of id's.

If the attacker is allowed to play around with the input, it will be trivial for a skilled attacker to "decrypt" the data. A crucial property of modern crypto systems is the "Avalanche effect" which your system lacks. Basically it means that every bit of the output is connected with every bit of the input.
If an attacker of your system is allowed to see that, for example, id = 1000 produces the output "AAAAAA" and id=1001 produces "ABAAA" and id=1002 produces "ACAAA" the algorithm can be easily reversed, and the value of the key obtained.
That said, this question is a better fit for https://security.stackexchange.com/ or https://crypto.stackexchange.com/

The standard advice for anyone trying to develop their own cryptography is, "Don't". The advanced advice is to read Bruce Schneier's Memo to the Amateur Cipher Designer and then don't.
You are not the first person to need to obfuscate IDs, so there are already methods available. #CodesInChaos suggested a good method above; you should try that first to see if it meets your needs.

Related

Program to encrypt / decrypt text string in Assembly MIPS

I want to create a program that reads in input a string of characters, and through a predefined action (I was thinking of a sum with an integer randomly generated) encrypts the string by returning the encrypted string and the key to decode it in a second moment.
Could you give me any suggestions on how to treat the string?
I would like to do so :
li $v0,8
la $a0,buffer
li $a1,1024
syscall
move $s7,$a0
This is the code to read the string.
After that I want to do:
add $t0,$s5,$s3
When I add a random generated integer to the register contain the string.
After knowing the values of the random number and the sum, I can again get the original string with a subtraction.
Is it a proper method?

That depends somewhat on the purpose of the encryption. As I understand this, the approach you're suggesting is basically a form of a Caesar Cipher. While this will protect your string to some degree against casual observers, it will definitely not be suitable for serious security purposes. It is subject to brute-force attacks, known-plaintext and chosen-plaintext attacks, and frequency analysis.
The idea behind a brute-force attack is that, for any given string of a reasonable length, there will almost always be exactly one shift that will make the string make sense, so an attacker could repeatedly try different shifts until he found the one that made the string make sense. The first shift that makes the string make sense is is almost certainly the correct shift.
If you're doing a "classical" Caesar cipher (e.g. C = A, D = B, E = C, etc.), there are only 25 possible shifts, so on average an attacker could guess the plaintext in 12.5 guesses (and 25 guesses in the worst case). In a scheme like yours you'd have to use a very large range of enormous numbers in order to be able to defend against this even slightly. For example, if you were only doing shifts of between 1 - 100 an attacker could reconstruct the plaintext in an average of 50 guesses (and 100 guesses in the worst case), which is obviously not a defense against a motivated attacker, especially since this task lends itself to easy parallelization. Assuming I did my math right, even if you had a trillion possible shifts and it took 100 operations to do and test a particular shift, you could try all of them in under 7 seconds on an Intel i7 if I did my math right and, on average, it would take less than 3.5 seconds to find the correct answer using brute force.
The idea behind frequency analysis is that your text retains the same statistical characteristics as the host language. For example, in English the most common letter is "e," so if you find the most common letter in your ciphertext it probably corresponds to "e." You can then work out how much you shifted the string to get that particular output. For example, if "g" is the most frequent letter in the ciphertext, you can guess that g = e and that they therefore must have shifted the text over by two.
A known-plaintext attack is where an attacker has an example of both the plaintext and its corresponding ciphertext and they can use that information to reconstruct what the key must have been. A chosen-plaintext attack is basically the same thing except that the attacker gets to choose which plaintext he sees the corresponding ciphertext for. (Note that this is only a problem if you're reusing keys, especially if you're doing so in a predictable manner; if you never reuse keys reconstructing the key for the known/chosen plaintext won't give the attacker any information about the key you used for other messages).
I've never tried doing this in assembly language to tell the truth but if you want good security you might want to consider AES. If you're really interested in simplicity of implementation and are willing to go with something less secure, you might also go with XTEA.

Find substring in stream without storing substring in plain text

Lets say I have a large stream of data (for example packets coming in from a network), and I want to determine if this data contains a certain substring. There are multiple string searching algorithms, but they require the algorithm to know the plain text string they are searching for.
Lets say, the string being sought is a password, and you do not want to store it in plain text in this search application. It would however appear in the stream as plain text. You could for example, store the hash and length of the password. Then for every byte in the stream check if the next length byte data from the stream hash to the password hash you have a probable match.
That way you can determine if the password was in the stream, without knowing the password. However, hashing once for every byte is not fast/efficient.
Is there perhaps a clever algorithm that could find the plain text password in the stream, without directly knowing the plain text password (and instead some non-reversible equivalent). Alternatively could a low quality version of the password be used, with the risk of false positives? For example, if the search application only knew half the password (in plain text), it could with some error detect the full password without knowing it.
thanks
P.S This question comes from a hypothetical discussion I had with some friends, about alerting you if your password was spotted in plain text on a network.

You could use a low-entropy rolling hash to pre-screen each byte so that, for the cost of lg k bits of entropy, you reduce the number of invocations of the cryptographic hash by a factor of k.

SAT is an NP-hard problem. Suppose your password is n characters long. If you could find a way to make a large enough SAT instance that
used a contiguous sequence of m >= n bytes from the data stream as its 8m input bits, and
produced the output 1 if and only if the bits present at its inputs contains your password starting at an offset that is some multiple of 8 bits
then by "operating" this SAT instance as a circuit, you would have a password detector that is (at least potentially) very difficult to "invert".
In some ways, what you want is the opposite of Boolean logic minimisation. You want the biggest, hairiest circuit (ideally for some theoretically justified notions of size and hairiness :) ) that computes the truth table. It's easy enough to come up with truth-table-preserving ways to grow the original CNF propositional logic formula -- e.g., if you have two clauses A and B, then you can always safely add a new clause consisting of all the literals in either A or B -- but it's probably much harder to come up with ways to grow the formula in ways that will confuse a modern SAT solver, since a lot of research has gone into making these programs super-efficient at detecting and exploiting all kinds of structure in the problem.
One possible avenue for injecting "complications" is to make the circuit compute functions that are difficult for circuits to compute, like divisions or square roots, and then test the results of these for equality in addition to the raw inputs. E.g., instead of making the circuit merely test that X[1 .. 8n] = YOUR_PASSWORD, make it test that X[1 .. 8n] = YOUR_PASSWORD AND sqrt(X[1 .. 8n]) = sqrt(YOUR_PASSWORD). If a SAT solver is smart enough to "see" that the first test implies the second then it can immediately dispense with all the clauses corresponding to the second -- but since everything is represented at a very low level with propositional clauses, this relationship is (I hope; as I said, modern SAT solvers are pretty amazing) well obscured. My guess is that it's better to choose functions like sqrt() that are not one-to-one on integers: this will potentially cause a SAT solver to waste time exploring seemingly promising (but ultimately incorrect) solutions.

Is there an encryption technique that could turn an 8-digit number into something 10 or 11 digits or less?

Many of the encryption techniques I've seen can easily encrypt a simple 8 digit number like "12345678" but the result is often something like "8745b34097af8bc9de087e98deb8707aac8797d097f" (made up but you get the idea).
Is there a way to encrypt this 8 digit number but have the resulting encrypted value be the same or at least only a slightly longer number? An ideal target would be to end up with a 10 digit number or less. Is this possible while still maintaining a fairly strong encryption?
Update: I didn't make the output clear enough - I am wanting an 8-digit number to turn into an 8-digit number, not 8 bytes.

A lot here is going to depend on how seriously you mean your "public-key-encryption" tag. Do you actually want public key encryption, or are you just taking that possibility into account?
If you're willing to use symmetric encryption, producing 8 bytes of output from 8 bytes of input is pretty easy: just run 3DES in ECB (Electronic Code Book) mode, and that's what you'll get. The main weakness of ECB is that a given input will always produce the same result, so if your inputs might repeat an attacker will be able to see that repetition, and may be able to notice a pattern of "encrypted value X leads to action Y", even if they can't/don't break the encryption itself at all. If you can live with that, 3DES/ECB is probably your answer.
If you can't live with that, 3DES in CFB mode is probably the next best. This will produce 16 bytes of output from 8 bytes of input (note that it's not normally doubling the input size, but adding 8 bytes to the input size).
3DES is hardly what anybody would call a cutting edge algorithm, but I'd say it still qualifies as "fairly strong encryption". Part of its weakness as an algorithm stems from its relatively small block size, but that also minimizes expansion of the output.
Edit: Sorry, I forgot to the public-key possibility. With most public-key cryptography, the smallest result is roughly equal to the key size. With RSA encryption, that'll typically mean a minimum of something like 1024 bits (and often considerably more than that). To keep the result smaller, I'd probably use Elliptical Curve Cryptography, for which a ~200 bit key is reasonably secure against known attacks. This will still be larger than 3DES/CFB, but not outrageously so.

Well you could look a stream cipher which encrypts bytes 1:1. With N bytes input, there are N bytes encrypted/decrypted output. Such ciphers are usually based on an algorithm that creates a stream of random numbers, with the encryption key/IV acting as seed.
For some stream ciphers, look at the eSTREAM candidates. I don't know of any relevant attacks on HC-128 and HC-256, for example.

Random access encryption with AES In Counter mode using Fortuna PRNG:

I'm building file-encryption based on AES that have to be able to work in random-access mode (accesing any part of the file). AES in Counter for example can be used, but it is well known that we need an unique sequence never used twice.
Is it ok to use a simplified Fortuna PRNG in this case (encrypting a counter with a randomly chosen unique key specific to the particular file)? Are there weak points in this approach?
So encryption/decryption can look like this
Encryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = plaintext xor rndsubseq
ciphertext = AESEnc(xoredplaintext, PasswordBasedKey)
Decryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = AESDec(ciphertext, PasswordBasedKey)
plaintext = xoredplaintext xor rndsubseq
One observation. I came to the idea used in Fortuna by myself and surely discovered later that it is already invented. But as I read everywhere the key point about it is security, but there's another good point: it is a great random-access pseudo random numbers generator so to speak (in simplified form). So the PRNG that not only produces very good sequence (I tested it with Ent and Die Hard) but also allow to access any sub-sequence if you know the step number. So is it generally ok to use Fortuna as a "Random-access" PRNG in security applications?
EDIT:
In other words, what I suggest is to use Fortuna PRNG as a tweak to form a tweakable AES Cipher with random-access ability. I read the work of Liskov, Rivest and Wagner, but could not understand what was the main difference between a cipher in a mode of operation and a tweakable cipher. They said they suggested to bring this approach from high level inside the cipher itself, but for example in my case xoring the plain text with the tweak, is this a tweak or not?

I think you may want to look up how "tweakable block ciphers" work and have a look at how the problem of disc encryption is solved: Disk encryption theory. Encrypting the whole disk is similar to your problem: encryption of each sector must be done independently (you want independent encryption of data at different offsets) and yet the whole thing must be secure. There is a lot of work done on that. Wikipedia seems to give a good overview.
EDITED to add:
Re your edit: Yes, you are trying to make a tweakable block cipher out of AES by XORing the tweak with the plaintext. More concretely, you have Enc(T,K,M) = AES (K, f(T) xor M) where AES(K,...) means AES encryption with the key K and f(T) is some function of the tweak (in your case I guess it's Fortuna). I had a brief look at the paper you mentioned and as far as I can see it's possible to show that this method does not produce a secure tweakable block cipher.
The idea (based on definitions from section 2 of the Liskov, Rivest, Wagner paper) is as follows. We have access to either the encryption oracle or a random permutation and we want to tell which one we are interacting with. We can set the tweak T and the plaintext M and we get back the corresponding ciphertext but we don't know the key which is used. Here is how to figure out if we use the construction AES(K, f(T) xor M).
Pick any two different values T, T', compute f(T), f(T'). Pick any message M and then compute the second message as M' = M xor f(T) xor f(T'). Now ask the encrypting oracle to encrypt M using tweak T and M' using tweak T'. If we deal with the considered construction, the outputs will be identical. If we deal with random permutations, the outputs will be almost certainly (with probability 1-2^-128) different. That is because both inputs to the AES encryptions will be the same, so the ciphertexts will be also identical. This would not be the case when we use random permutations, because the probability that the two outputs are identical is 2^-128. The bottom line is that xoring tweak to the input is probably not a secure method.
The paper gives some examples of what they can prove to be a secure construction. The simplest one seems to be Enc(T,K,M) = AES(K, T xor AES(K, M)). You need two encryptions per block, but they prove the security of this construction. They also mention faster variants, but they require additional primitive (almost-xor-universal function families).

Even though I think your approach is secure enough, I don't see any benefits over CTR. You have the exact same problem, which is you don't inject true randomness to the ciphertext. The offset is a known systematic input. Even though it's encrypted with a key, it's still not random.
Another issue is how do you keep the FileUniqueKey secure? Encrypted with password? A whole bunch issues are introduced when you use multiple keys.
Counter mode is accepted practice to encrypt random access files. Even though it has all kinds of vulnerabilities, it's all well studied so the risk is measurable.

What's wrong with XOR encryption?

I wrote a short C++ program to do XOR encryption on a file, which I may use for some personal files (if it gets cracked it's no big deal - I'm just protecting against casual viewers). Basically, I take an ASCII password and repeatedly XOR the password with the data in the file.
Now I'm curious, though: if someone wanted to crack this, how would they go about it? Would it take a long time? Does it depend on the length of the password (i.e., what's the big-O)?

The problem with XOR encryption is that for long runs of the same characters, it is very easy to see the password. Such long runs are most commonly spaces in text files. Say your password is 8 chars, and the text file has 16 spaces in some line (for example, in the middle of ASCII-graphics table). If you just XOR that with your password, you'll see that output will have repeating sequences of characters. The attacker would just look for any such, try to guess the character in the original file (space would be the first candidate to try), and derive the length of the password from length of repeating groups.
Binary files can be even worse as they often contain repeating sequences of 0x00 bytes. Obviously, XORing with those is no-op, so your password will be visible in plain text in the output! An example of a very common binary format that has long sequences of nulls is .doc.

I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:
Determine how long the key is. This
is done by XORing the encrypted data
with itself shifted various numbers
of places, and examining how many
bytes are the same.
If the bytes that are equal are
greater than a certain percentage
(6% according to Bruce Schneier's
Applied Cryptography second
edition), then you have shifted the
data by a multiple of the keylength.
By finding the smallest amount of
shifting that results in a large
amount of equal bytes, you find the
keylength.
Shift the cipher text by the
keylength, and XOR against itself.
This removes the key and leaves you
with the plaintext XORed with the
plaintext shifted the length of the
key. There should be enough
plaintext to determine the message
content.
Read more at Encryption Matters, Part 1

XOR encryption can be reasonably* strong if the following conditions are met:
The plain text and the password are about the same length.
The password is not reused for encrypting more than one message.
The password cannot be guessed, IE by dictionary or other mathematical means. In practice this means the bits are randomized.
*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.

In addition to the points already mentioned, XOR encryption is completely vulnerable to known-plaintext attacks:
cryptotext = plaintext XOR key
key = cryptotext XOR plaintext = plaintext XOR key XOR plaintext
where XORring the plaintexts cancel each other out, leaving just the key.
Not being vulnerable to known-plaintext attacks is a required but not sufficient property for any "secure" encryption method where the same key is used for more than one plaintext block (i.e. a one-time pad is still secure).

Ways to make XOR work:
Use multiple keys with each key length equal to a prime number but never the same length for keys.
Use the original filename as another key but remember to create a mechanism for retrieving the filename. Then create a new filename with an extension that will let you know it is an encrypted file.
The reason for using multiple keys of prime-number length is that they cause the resulting XOR key to be Key A TIMES Key B in length before it repeats.
Compress any repeating patterns out of the file before it is encrypted.
Generate a random number and XOR this number every X Offset (Remember, this number must also be recreatable. You could use a RANDOM SEED of the Filelength.
After doing all this, if you use 5 keys of length 31 and greater, you would end up with a key length of approximately One Hundred Meg!
For keys, Filename being one (including the full path), STR(Filesize) + STR(Filedate) + STR(Date) + STR(Time), Random Generation Key, Your Full Name, A private key created one time.
A database to store the keys used for each file encrypted but keep the DAT file on a USB memory stick and NOT on the computer.
This should prevent the repeating pattern on files like Pictures and Music but movies, being four gigs in length or more, may still be vulnerable so may need a sixth key.
I personally have the dat file encrypted itself on the memory stick (Dat file for use with Microsoft Access). I used a 3-Key method to encrypt it cause it will never be THAT large, being a directory of the files with the associated keys.
The reason for multiple keys rather than randomly generating one very large key is that primes times primes get large quick and I have some control over the creation of the key and you KNOW that there really is no such thing as a truely random number. If I created one large random number, someone else can generate that same number.
Method to use the keys: Encrypt the file with one key, then the next, then the next till all keys are used. Each key is used over and over again till the entire file is encrypted with that key.
Because the keys are of different length, the overlap of the repeat is different for each key and so creates a derived key the length of Key one time Key two. This logic repeats for the rest of the keys. The reason for Prime numbers is that the repeating would occur on a division of the key length so you want the division to be 1 or the length of the key, hense, prime.
OK, granted, this is more than a simple XOR on the file but the concept is the same.
Lance

I'm just protecting against casual viewers
As long as this assumption holds, your encryption scheme is ok. People who think that Internet Explorer is "teh internets" are not capable of breaking it.
If not, just use some crypto library. There are already many good algorithms like Blowfish or AES for symmetric crypto.

The target of a good encryption is to make it mathematically difficult to decrypt without the key.
This includes the desire to protect the key itself.
The XOR technique is basically a very simple cipher easily broken as described here.
It is important to note that XOR is used within cryptographic algorithms.
These algorithms work on the introduction of mathematical difficulty around it.

Norton's Anti-virus used to use a technique of using the previous unencrypted letter as the key for next letter. That took me an extra half-hour to figure out, if I recall correctly.
If you just want to stop the casual viewer, it's good enough; I've used to hide strings within executables. It won't stand up 10 minutes to anyone who actually tries, however.
That all said, these days there are much better encryption methods readily available, so why not avail yourself of something better. If you are trying to just hide from the "casual" user, even something like gzip would do that job better.

Another trick is to generate a md5() hash for your password. You can make it even more unique by using the length of the protected text as an offset or combining it with your password to provide better distribution for short phrases. And for long phrases, evolve your md5() hash by combining each 16-byte block with the previous hash -- making the entire XOR key "random" and non-repetitive.

RC4 is essentially XOR encryption! As are many stream ciphers - the key is the key (no pun intended!) you must NEVER reuse the key. EVER!

I'm a little late in answering, but since no one has mentioned it yet: this is called a Vigenère cipher.
Wikipedia gives a number of cryptanalysis attacks to break it; even simpler, though, since most file-formats have a fixed header, would be to XOR the plaintext-header with the encrypted-header, giving you the key.

That ">6%" GeneQ mentions is the index of coincidence for English telegraph text - 26 letters, with punctuation and numerals spelled out. The actual value for long texts is 0.0665.
The <4% is the index of coincidence for random text in a 26-character alphabet, which is 1/26, or 0.385.
If you're using a different language or a different alphabet, the specific values will different. If you're using the ASCII character set, Unicode, or binary bytes, the specific values will be very different. But the difference between the IC of plaintext and random text will usually be present. (Compressed binaries may have ICs very close to that of random, and any file encrypted with any modern computer cipher will have an IC that is exactly that of random text.)
Once you've XORed the text against itself, what you have left is equivalent to an autokey cipher. Wikipedia has a good example of breaking such a cipher
http://en.wikipedia.org/wiki/Autokey_cipher

If you want to keep using XOR you could easily hash the password with multiple different salts (a string that you add to a password before hashing) and then combine them to get a larger key.
E.G. use sha3-512 with 64 unique salts, then hash your password with each salt to get a 32768 bit key that you can use to encrypt a 32Kib (Kilibit) (4KiB (kilibyte)) or smaller file. Hashing this many times should be less than a second on a modern CPU.
for something more secure you could try manipulating your key during encryption like AES (Rijndael). AES actually does XOR times and modifies the key each repeat of the key using a switch table. It became an internation standard so its quite secure.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string