SMS encryption/decryption in J2ME - java-me

I want to encrypt and decrypt SMS messages with J2ME. I tried the TEA algorithm, but it produces ciphertext that is longer than the original text.
Is there any other lightweight algorithm that produces ciphertext of the same length as the original text?

Check stream ciphers (like RC4). Those won't increase the space needed: the ciphertext has the same length as the plaintext.
If you want to encrypt a 7-bit text message, which can be 160 characters, then they won't help, because the ciphertext is binary and an SMS can only carry 140 octets of binary data.
But if the maximum length of your plaintext is 140 bytes, then they can be used.
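For illustration, here is a minimal RC4 sketch in plain Java syntax (a sketch only: RC4 is shown because it is tiny, not because it is strong, and on a MIDP device you would put this in a MIDlet rather than a main method). Note that the ciphertext is exactly as long as the plaintext, and that the same call both encrypts and decrypts:

    public class Rc4Demo {
        // RC4: key scheduling (KSA) followed by the keystream generator (PRGA).
        // Output is the input XORed with the keystream, so it has the same length.
        static byte[] crypt(byte[] key, byte[] data) {
            int[] s = new int[256];
            for (int i = 0; i < 256; i++) s[i] = i;
            for (int i = 0, j = 0; i < 256; i++) {
                j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
                int t = s[i]; s[i] = s[j]; s[j] = t;
            }
            byte[] out = new byte[data.length];
            for (int n = 0, i = 0, j = 0; n < data.length; n++) {
                i = (i + 1) & 0xFF;
                j = (j + s[i]) & 0xFF;
                int t = s[i]; s[i] = s[j]; s[j] = t;
                out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xFF]);
            }
            return out;
        }

        public static void main(String[] args) {
            byte[] key = "example key".getBytes();      // placeholder key
            byte[] plain = "Meet me at 6 pm".getBytes();
            byte[] cipher = crypt(key, plain);
            byte[] back = crypt(key, cipher);           // decrypting is the same operation
            System.out.println(plain.length + " -> " + cipher.length); // 15 -> 15
            System.out.println(new String(back));       // Meet me at 6 pm
        }
    }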

Stream encryption does work for this, but even a stream cipher only operates on bits, so you first need to compress the text into as few bits as possible. Using deflate would work in most cases, but it may fail (i.e. expand the data) on shorter messages or messages that contain incompressible data.
Another way is to pack your message into as few bits as possible yourself. E.g. the digits 0..9 are ASCII 30h to 39h, but they can also be stored in binary form as 0000 to 1001. Packed BCD would be a good option for encoding numbers. Mathematics majors would immediately see that you can use groups and division for minimum storage requirements, but that may cost quite a lot of CPU time (never mind the fun of developing that solution). Once you have the shortest binary representation, you encrypt it with the stream cipher.
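As a sketch of the digit-packing idea (assuming the message consists of decimal digits only), two digits can be packed into one byte:

    // Pack a digits-only string as packed BCD: two digits per byte.
    // "1234" becomes {0x12, 0x34}; an odd trailing digit is padded with 0xF.
    static byte[] packBcd(String digits) {
        byte[] out = new byte[(digits.length() + 1) / 2];
        for (int i = 0; i < digits.length(); i++) {
            int d = digits.charAt(i) - '0';       // 0..9
            if (i % 2 == 0) {
                out[i / 2] = (byte) (d << 4);     // high nibble
            } else {
                out[i / 2] |= (byte) d;           // low nibble
            }
        }
        if (digits.length() % 2 != 0) {
            out[out.length - 1] |= 0x0F;          // filler nibble for odd length
        }
        return out;
    }

packBcd("20240131") takes 4 bytes instead of the 8 bytes the ASCII digits would occupy; the packed bytes are then what you feed to the stream cipher.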
So now you have this nice short binary representation. Hopefully it will fit into the SMS message. If not, you need to encode the resulting binary representation.
Basically, what you need to do is:
send = Send(EncodeBin(Encrypt(EncodeTxt(plain))))
then
plain = DecodeTxt(Decrypt(DecodeBin(Receive(send))))
But don't let that stop you :)

What is the meaning of the name Lucky Thirteen?

From the paper Lucky Thirteen: Breaking the TLS and DTLS Record Protocols:
The details of which specific attacks are possible depends on the exact size of MAC tags output by the MAC algorithm negotiated by the Handshake Protocol, and also on the fact that the exactly 13 bytes of header data are incorporated in the MAC calculation (hence our title).
In addition, I read on the site of Royal Holloway, University of London:
The fact that the TLS MAC calculation includes 13 bytes of header information (5 bytes of TLS header plus 8 bytes of TLS sequence number) is, in part, what makes the attacks possible.
As I understand it, the attack is based on the padding mechanism, on the fact that the CBC mode of operation is used, and on differences in the time taken for the MAC calculation (and its compression function). I could not figure out how the size of the header data included in the MAC affects this.
Can anyone please explain what is the meaning of the name Lucky Thirteen?
Thank you.
META: this is not a programming question and would fit much better on security.SX where we already have Qs on related attacks like BEAST and POODLE. I thought I remembered seeing one on Lucky-Thirteen but can't find any on searching, so I suggest migrating this.
Calling it 'lucky' is, as they say, 'what passes for humour amongst cryptographers', but the importance of the pseudo-header being 13 bytes is outlined in the paragraph before the one you quoted:
for certain carefully chosen message lengths and when the HMAC-SHA1 MAC algorithm is used, then TLS messages containing at least two bytes of correct padding will be processed slightly faster than TLS messages containing one byte of correct padding or padding that is incorrectly formatted.
and detailed in section 4.2 of the paper: when using a CBC + HMAC-SHA1 ciphersuite, if the attacker systematically tampers with a 64-byte (not including the IV) ciphertext:
when the (tampered) decryption ends with valid 2-byte or longer padding, HMAC is performed on data consisting of 64 - 2 - 20 + 13 = 55 bytes or less (padding longer than 2 bytes means fewer than 55 bytes, and that quickly becomes very unlikely);
otherwise, HMAC is performed on 56 or 57 bytes.
Because of the MD padding done by SHA-1 (see section 2.1), the latter requires one more compression-function call than the former, and it is the time taken by that additional compression-function call that they statistically enhance and detect.
This gives a padding oracle from which plaintext can be recovered.
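A back-of-the-envelope way to see why 55 is the boundary (illustrative arithmetic only, not code from the paper): SHA-1 processes 64-byte blocks and appends at least 9 bytes of Merkle-Damgård padding (one 0x80 byte plus a 64-bit length), and HMAC-SHA1's inner hash prepends one 64-byte block for the padded key:

    // Number of SHA-1 compression-function calls for HMAC-SHA1's inner hash
    // over m bytes of data: one block for the ipad-masked key, then the data
    // plus at least 9 bytes of MD padding, rounded up to 64-byte blocks.
    static int innerCompressions(int m) {
        int total = 64 + m + 9;       // key block + data + minimum padding
        return (total + 63) / 64;     // round up to whole 64-byte blocks
    }

    // innerCompressions(55) == 2  (two or more bytes of valid padding)
    // innerCompressions(56) == 3  (one byte of valid padding)
    // innerCompressions(57) == 3  (padding treated as invalid)

So 55 bytes need two inner compression calls while 56 or 57 need three, and that single extra call is the timing difference the attack amplifies.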
The 'luckiness' of 13 here is that 13 plus 9 is only slightly more than 20.
As they note in 4.3, 12 would be even luckier, if SSL/TLS had been designed differently.

Using "seed" based math to recreate application instances

Okay, so I was thinking today about Minecraft, a game which I'm sure so many of you are familiar with, and while my question isn't directly related to the game, I find it much simpler to describe my question using the game as an example.
My question is: is there any way a type of "seed" or string of characters can be used to recreate an instance of a program (not in the literal programming sense) by storing a code which, when re-entered into the program as a string at run-time, could recreate the data it once held, in fields, text boxes, canvases, for example, exactly as it was?
As I understand it, Minecraft takes the string of ASCII characters you enter (which are really just numbers) and performs a series of operations on it which evaluate to some kind of finite hash or number... this number (again, as I understand it) is the representation of the string you entered. So it makes sense that a given string, when parsed by this algorithm, will always evaluate to the same hash: 1 + 1 will always equal 2, so a seed's value must always evaluate to the same thing in the end. And in doing so you have the ability to replicate worlds exactly, by entering this sort of key, which is evaluated the same way on every machine.
Now, if we can exactly replicate worlds like this, is it possible to bring the idea into a more abstract context, like the following?
Say you have an application, like Microsoft Word. Word saves the data you have entered as a file on your hard drive: it holds formatting data, the strings you've entered, the format of the file... all of that in a physical file. Now imagine that when you entered your essay into Word, instead of saving it and bringing your laptop to school, you clicked on "parse" and, instead of creating a file, you were given a hash code. Now you go to school knowing you have to print it, so you log onto a computer and open Word. Instead of "open" there is now an option called "evaluate"; you click it, enter the hash your other computer produced, and it recreates the exact essay you had written.
Is this possible, and if so, are there obvious implementations of this that I simply am not thinking of, or that are so seemingly part of everyday life that I don't recognize them? And finally, if it is possible, what methods and algorithms would go into such a thing?
[EDIT]
I had to do some research on the anatomy of a seed and I think this explains it well:
The limit is 32 characters or, for a numeric seed, 19 digits plus the minus sign. Numeric seeds can range from -9223372036854775808 to 9223372036854775807, which is a total of 18446744073709551616 possibilities. Text strings entered will be "hashed" to one of the numeric seeds in the above range. The "Seed for the World Generator" window only allows 32 characters to be entered and will not show or use any more than that.
But looking back on it, lossless compression IS EXACTLY what I was describing, after re-reading the wiki page and remembering that (you are very correct) the seed only partakes in the generation; the final data is stored as a "physical" file on the HDD, which (again, you are correct) is raw, uncompressed data in a file.
So in retrospect, I believe I was describing lossless compression, trying in my mind to figure out how the seed was able to replicate the exact same world, forgetting that the seed is only responsible for generating the world, not for saving or compressing it.
So thank you for your help, guys! It's really appreciated. I believe we can call this one solved!
There are several ways to obtain such a "string" that recovers your data. However, they're not all applicable in every context.
An actual seed, which initializes, for example, a pseudo-random number generator, then allows you to recreate the same sequence of pseudo-random numbers (see this question, and the sketch at the end of this answer).
This is probably similar to what Minecraft relies on, because the whole process of how to create a world based on some choices (possibly pseudo-random choices) is known in advance. Even though we pretend to have random numbers, computers are actually deterministic, which makes this possible.
If your document were generated randomly, then this would be applicable: with the same seed, the same gibberish comes out.
A key-value dictionary, or hash map. Then the values have to be accessible by both sides, and the string is the key that lets you retrieve the value.
Think, for example, of storing your Word file on an online server; then your key is the URL linking to your file.
Compressing all the information in your data into the string. This is much harder, and there are hard limits due to the entropy of the data; see Shannon's source coding theorem, for example.
You would be better off (as in, it would be easier) just compressing your file with a standard algorithm (zip or 7z or something else) rather than reimplementing it yourself, especially once your document starts containing fancy things (different styles, tables, pictures, unusual characters...).
With the simple hypothesis of 27 possible characters (26 letters and the space), Shannon himself shows in Prediction and Entropy of Printed English (Bell System Technical Journal, 30:1, January 1951, pp. 50-64; online version) that English has about 2.14 bits of entropy per letter. A 32-character string of 8-bit characters therefore carries 256 bits, enough for only about 120 letters of English text.
While 2.14 bits per letter is significantly better than the 8 bits we use for each ASCII character, it also shows it is very likely impossible to encode an English document in less than about a quarter of its size. And you'd still have to add punctuation and all the rest of the fuss.
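To illustrate the first possibility, here is a minimal Java sketch (the "world generation" is just a toy stand-in): the same seed reproduces exactly the same sequence of pseudo-random choices wherever the same generator and the same generation code are used.

    import java.util.Arrays;
    import java.util.Random;

    public class SeedDemo {
        // Deterministically "generate a world" from a seed: same seed, same world.
        static int[] generateWorld(long seed, int size) {
            Random rng = new Random(seed);
            int[] world = new int[size];
            for (int i = 0; i < size; i++) {
                world[i] = rng.nextInt(100);       // e.g. a terrain height between 0 and 99
            }
            return world;
        }

        public static void main(String[] args) {
            long seed = "my world".hashCode();      // a text seed hashed to a number
            int[] first = generateWorld(seed, 10);
            int[] again = generateWorld(seed, 10);  // regenerated later, or on another machine
            System.out.println(Arrays.equals(first, again)); // true: identical "worlds"
        }
    }

This only works because the generation procedure itself is shared in advance; it cannot recover arbitrary data such as an essay, which is exactly the entropy limitation described above.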

How do computers process ascii/text & images/color differently?

I've recently been thinking more about what kind of work computer hardware has to do to produce the things we expect.
Comparing text and color, it seems that both rely on combinations of 1s and 0s, with 256 possible combinations per byte. ASCII may define a byte such as 01100001 to be the letter 'a'. But the same pattern may also appear in a color, R(01100001), G(01100001), B(01100001), representing some random color. Considering that, at a low level, the computer is just reading these collections of 1s and 0s, what needs to happen to ensure the computer renders the color R(01100001), G(01100001), B(01100001) and not the letter 'a' three times on my screen?
I'm not entirely sure this question is appropriate for Stack Overflow, but I'll go ahead and give a basic answer anyway. It's actually a very complicated question, because depending on how deep you want to go into answering it, I could write an entire book on computer architecture.
So to keep it simple I'll just give you this: it's all a matter of context. First, let's tackle text.
When you open, say, a text editor, the implicit assumption is that the data to be displayed in it is textual in nature. The text to be displayed is some bytes in memory (possibly copied out of some bytes on disk). There's no magical internal context, from the memory's point of view, that says those bytes are text. Instead, the source for the text editor contains some commands that point to those bytes and say "these bytes represent 300 characters of text", for example. Then there's a complex sequence of steps, involving library code all the way down to hardware, that handles mapping those bytes according to an encoding like ASCII (there are many other ways of encoding text) to characters, finding those characters in a font, writing that font to the screen, etc.
The point is it doesn't have to interpret those bytes as text. It just does because that's what a text editor does. You could hypothetically open it in an image program and tell it to interpret those same 300 bytes as a 10x10 array (or image) of RGB values.
As for colors the same logic applies. They're just bytes in memory. But when the code that's drawing something to the screen has decided what pixels it wants to write with what colors, it will pipe those bytes via a memory mapping to the video card which will then translate them to commands that are sent to the monitor (still in some binary format representing pixels and the colors, though the reality is a lot more complicated), and the monitor itself contains firmware that then handles the detail of mapping those colors to the physical pixels. The numbers that represent the colors themselves will at some point be converted to a specific current to each R/G/B channel to raise or lower its intensity.
That's all I have time for right now, but it's a start.
Update: just to illustrate my point, I took the text of Flatland from here, which is just 216624 bytes of ASCII text (interpreted as such by your web browser based on context: the .txt extension helps, but the web server also provides a MIME type header informing the browser that it should be interpreted as plain text, and your browser might also analyze the bytes to determine that their pattern looks like plain text and that there isn't an overwhelming number of bytes that don't represent ASCII characters). I appended a few spaces to the end of the text so that its length is 217083, which is 269 * 269 * 3, and then plotted it as a 269 x 269 RGB image:
Not terribly interesting-looking. But the point is that I took those exact same bytes and told the software, "okay, these are RGB values now". That's not to say that looking at plain-text bytes as images can't be useful. For example, it can be a useful way to visualize an encryption algorithm. This shows an image that was encrypted with a pretty insecure algorithm: you can still get a very good sense of the patterns of bytes in the original, unencrypted file. If it were text and not an image, this would be no different, as text in a specific language like English also has known statistical patterns. A good encryption algorithm would make the encrypted image look more like random noise.
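A rough Java sketch of that experiment (the file name and the 269-pixel side are just the values from the example above; the text file is assumed to be padded to at least 269 * 269 * 3 bytes):

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.imageio.ImageIO;

    public class BytesAsImage {
        public static void main(String[] args) throws Exception {
            // Read the raw bytes of the plain-text file.
            byte[] bytes = Files.readAllBytes(Paths.get("flatland.txt"));

            int side = 269;
            BufferedImage img = new BufferedImage(side, side, BufferedImage.TYPE_INT_RGB);
            for (int y = 0; y < side; y++) {
                for (int x = 0; x < side; x++) {
                    int i = (y * side + x) * 3;
                    int r = bytes[i] & 0xFF;        // reinterpret three consecutive
                    int g = bytes[i + 1] & 0xFF;    // text bytes as one RGB pixel
                    int b = bytes[i + 2] & 0xFF;
                    img.setRGB(x, y, (r << 16) | (g << 8) | b);
                }
            }
            ImageIO.write(img, "png", new File("flatland.png"));
        }
    }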
Zero and one are just zero and one, nothing more. A byte is just a collection of 8 bits.
The meaning you assign to information depends on what you need at the moment, on what "language" you use to interpret your information. 65 is either the letter 'A' in ASCII or the number 65 if you're using it in, say, int a = 65 + 3.
At a low level, different (thousands of) machine instructions are executed to ensure that your data is treated properly, depending, for example, on the type of file you're reading, its headers, which process requests the data, and so on. The different high-level functions you use to handle different kinds of information expand to very different machine code.
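A trivial Java illustration of the "same value, different interpretation" point (65 is the ASCII code for the letter 'A'):

    public class SameBits {
        public static void main(String[] args) {
            int value = 65;
            int asNumber = value + 3;          // arithmetic interpretation: 68
            char asLetter = (char) value;      // character interpretation: 'A'
            System.out.println(asNumber);      // 68
            System.out.println(asLetter);      // A
        }
    }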

QR code compression

Is it possible to store about 20,000 characters in a QR code? (Or even more? http://blog.qr4.nl/page/QR-Code-Data-Capacity.aspx)
I would like to store only ASCII symbols (letters and numbers, plus a dash and so on).
As far as I know, it's possible to compress plain text with a ratio of 80-98%, which sounds promising: http://www.maximumcompression.com/index.html
Do you have any more experience with this? Thanks for sharing!
If your question is: "Is it possible to store 20K characters in a QR code?", then the answer is yes, it is possible.
If your question is: "Is it possible to guarantee you'll always be able to store 20K characters in a QR code thanks to compression?", the answer is no. There is no way to guarantee that, due to the pigeonhole principle.
If your question is: "Is there a comfortable zone where it is highly likely that a text input whose maximum size is 20K will fit into a QR code?", the proper answer is: it depends on your input data. And a riskier answer is: if you're dealing with "normal text" data, such as book content, you're probably asking for too much.
The 80-98% compression ratios you refer to are possible because the input data is extremely large (several MB) and the top-ranked algorithms are very slow. For "small" input data, such as 20K characters, the compression ratio for "normal text" will more likely be in the 50-70% range, depending on the algorithm's strength (PPM, for example, is very suitable for such input data).
Obviously, if your input data is a kind of log file, with a huge number of repetitions, then yes, a compression ratio > 95% is easily achievable.
But the compression ratio is not the only thing to take into consideration. For real-life usage, you'll also have to consider the QR code's size and a reasonable level of error correction for the printed QR code to survive. Betting on "maximum capacity with the lowest possible correction" is a fairly bad bet, at least for real-life scenarios. You'll have to ask around to find out what the reasonable limits of your QR code are. Most probably, printing capabilities will get in the way, and you'll have to settle for something less than the maximum.
One last point: don't forget that compressed data is "binary", not "alphanumeric". As a consequence, the final capacity of your QR code is given by the "binary" column of the capacity table, which is much smaller than the "alphanumeric" column.
QR codes have a special encoding mode for alphanumeric data (upper-case letters only, plus digits and a few symbols). It uses 11 bits per pair of characters (5.5 bits per character) and can store at most 4,296 characters in this mode.
This ought to be close to optimal. For simpler data (say, letters only), a compression algorithm like gzip might be able to achieve fewer bits per character. Of course, no standard reader would interpret the gzipped payload as text; only a special-purpose reader would be able to.
Can you get 5x more data into a QR code this way? No, almost surely not, unless it's a trivial case like 20,000 "a"s.
Even if you could, it would create a large, complex QR code. Anything holding more than a few hundred bytes gets hard to scan in practice. Version 40, the largest, is useless in the real world; even version 20 is.
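If you want to measure what compression actually buys for your particular data before committing to a byte-mode QR code, here is a quick sketch using Java's standard Deflater (the generated sample string is only a stand-in for your real payload):

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;

    public class QrPayloadSize {
        // Deflate-compress the input and return the compressed bytes.
        static byte[] deflate(byte[] input) {
            Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            return out.toByteArray();
        }

        public static void main(String[] args) {
            // Stand-in for the real ~20K-character payload.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 500; i++) {
                sb.append("ORDER-").append(i).append(";STATUS=OK;");
            }
            byte[] plain = sb.toString().getBytes();
            byte[] packed = deflate(plain);
            System.out.println("plain:      " + plain.length + " bytes");
            System.out.println("compressed: " + packed.length + " bytes");
            // The compressed size has to fit the *binary* capacity of the target
            // QR version and error-correction level, e.g. 2,953 bytes for version 40-L.
        }
    }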
In practice, when you want to use a QR code to store huge amounts of data, you simply store a URL pointing to the location of the data.
What is theoretically possible is very different from what is actually possible when you have to support real-life devices. Good luck scanning anything above version 10 (57x57 modules) with a low-end smartphone camera.

What is the purpose of Base64 encoding and why is it used in HTTP Basic Authentication?

I don't get Base64 "encryption".
If anyone can decode a Base64 string, what is its purpose?
Why is it used for HTTP Basic auth?
It's like telling someone my password reversed is OLLEH.
People seeing OLLEH will know the original password was HELLO.
Base64 is not encryption -- it's an encoding. It's a way of representing binary data using only printable (text) characters.
See this paragraph from the wikipedia page for HTTP Basic Authentication:
While encoding the user name and password with the Base64 algorithm typically makes them unreadable by the naked eye, they are as easily decoded as they are encoded. Security is not the intent of the encoding step. Rather, the intent of the encoding is to encode non-HTTP-compatible characters that may be in the user name or password into those that are HTTP-compatible.
It's normally called Base64 encoding, not encryption! The nice thing about Base64 encoding is that it allows you to represent (binary) data using only a limited, common subset of the available characters, far more efficiently than just writing a string of 1s and 0s as ASCII, for example.
Encryption requires a key (a string or algorithm) in order to decrypt; hence the "crypt" (root: cryptography).
Encoding maps one representation of characters or bytes onto another. In this case, arbitrary bytes of data can now be easily represented and transported using HTTP.
Base64 encoding is part of the MIME specifications. It provides a transport-safe encoding for data so that it won't get chewed on if/when it gets relayed through a host that uses a different encoding scheme than the one used by the original client.
There are lots of different hosts out on the intertubes, and you can't really assume support for anything other than 7-bit ASCII without risking data loss/confusion.
IBM mainframes, for instance, use an encoding called EBCDIC (which comes in lots of different flavors). Its code points are completely different from the code points used by ASCII-based machines: in ASCII, the letters A-Z are 0x41-0x5A; in EBCDIC the letters A-Z aren't even a contiguous range: A-I live at 0xC1-0xC9, J-R live at 0xD1-0xD9 and S-Z live at 0xE2-0xE9.
You might mean "Base 64 Encoding". Encryption is not the same as encoding.
Wikipedia: Encryption
In everyday language, a “code” is something secret. In science and engineering, a code is simply an agreement, a set of rules, of how to write something.
That code may be secret. In that case, it’s called an encryption. But in general, a code is not secret. Take the genetic code: it simply states that our DNA is built from four different bases (A, C, G and T) and that three bases taken together form one amino acid. There’s also a table of which three letters form which amino acid.
There’s nothing secret about this code.
Likewise, Base64 is not a secret code. Rather, it’s a code that allows storing data in six bits per character (thus there are 64 different entities, i.e. the “base” of the system is 64, just as the base of our decimal system is 10, since there are 10 different entities called “digits”).
By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set.
If the user name or password contains characters outside that set, HTTP would not be able to carry that text. To prevent this, we encode the user name and password with Base64 to make sure we are sending HTTP-compatible characters over HTTP. For more information, see Basic_access_authentication.
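For example, here is how a client could build (and trivially decode) the Basic auth header with Java's standard java.util.Base64; the credentials are placeholders:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class BasicAuthHeader {
        public static void main(String[] args) {
            String user = "alice";                  // placeholder credentials
            String password = "open sesame";

            // Encode "user:password" so the header carries only safe ASCII characters.
            String token = Base64.getEncoder()
                    .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            System.out.println("Authorization: Basic " + token);

            // Anyone can reverse it -- Base64 is encoding, not encryption.
            byte[] decoded = Base64.getDecoder().decode(token);
            System.out.println(new String(decoded, StandardCharsets.UTF_8)); // alice:open sesame
        }
    }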
