Help needed with perl crypt function - string

I am trying to generate a unique string/id from another, relatively large string (a directory path name), and thought of using the crypt function. However, it's not working as expected, most probably because I don't understand it.
Here are the code and output:
print "Enter a string:";
chomp(my $string = <STDIN>);
my $encrypted_string = crypt($string,'di');
print "\n the encrypted string is:$encrypted_string";
$ perl crypt_test
Enter a string:abcdefghi
the encrypted string is:dipcn0ADeg0Jc
$ perl crypt_test
Enter a string:abcdefgh
the encrypted string is:dipcn0ADeg0Jc
$ perl crypt_test
Enter a string:abcde
the encrypted string is:diGyhSp4Yvj4M
I couldn't understand why it returned the same encrypted string for the first two strings and a different one for the third. Note that the salt is the same for all.

The crypt(3) function only takes into account the first eight chars of the input string:
By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained. This 56-bit key is used to encrypt repeatedly a constant string (usually a string consisting of all zeros). The returned value points to the encrypted password, a series of 13 printable ASCII characters (the first two characters represent the salt itself).
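To see why "abcdefghi" and "abcdefgh" collide, here is a minimal Python model of only the key-derivation step quoted above. The helper `des_crypt_key` is hypothetical and for illustration only: it does not run DES or reproduce crypt's output, and zero-padding of short passwords is an assumption of this model.

```python
def des_crypt_key(password: str) -> int:
    """Model of crypt(3)'s key derivation: low 7 bits of the first 8 chars -> 56-bit key.

    Illustration only: real crypt(3) then feeds this key to DES, which is not modeled here.
    """
    key = 0
    # Characters after the eighth are never read; short inputs are zero-padded (assumption).
    padded = password[:8].ljust(8, "\0")
    for ch in padded:
        key = (key << 7) | (ord(ch) & 0x7F)  # keep only the lowest 7 bits of each char
    return key

# The ninth character "i" is ignored, so both inputs give the same 56-bit key.
print(des_crypt_key("abcdefghi") == des_crypt_key("abcdefgh"))  # True
print(des_crypt_key("abcde") == des_crypt_key("abcdefgh"))      # False
```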
So what you are seeing is by design - from perlfunc:
Creates a digest string exactly like the crypt(3) function in the C library
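If the goal is an ID for a long directory path, a digest that reads the whole input avoids the eight-character limit. A minimal sketch, assuming SHA-256 from Python's standard hashlib is acceptable (in Perl, the Digest::SHA module provides the same digest); `path_id` is a hypothetical name, and a hash is only unique in the practical sense that collisions are extremely unlikely:

```python
import hashlib

def path_id(path: str) -> str:
    """Return a fixed-length hex ID that depends on every character of path."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()

print(path_id("abcdefghi") == path_id("abcdefgh"))  # False: the ninth character matters
print(len(path_id("/some/long/directory/path")))   # 64
```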


New cipher problem, how to decode given enciphered code?

While trying to solve a Caesar cipher, I ran into a few problems.
# enciphered message = 'I wtvp olel decfnefcpd lyo lwrzctesxd'
plain = 'abcdefghijklmnopqrstuvwxyz'
cipher = 'lmnopqrstuvwxyzabcdefghijk'
cipher_text = input('Enter enciphered message: ')
clean_text = ' '
for i in cipher_text:
    if i != " ":
        clean_text = clean_text + plain[plain.index(cipher[(ord(i)-ord('a'))])]
    else:
        clean_text = clean_text + " "
print(clean_text)
Above is the code I wrote, and this is what I got as a result:
Enter enciphered message: I wtvp olel decfnefcpd lyo lwrzctesxd
n hega zwpw opnqypqnao wjz whcknepdio
Here are my related questions:
Why wasn't it decoded properly? Like, It should be 'I like data structures and algorithms'
I'm also confused about the capital "I" at the beginning of the enciphered message. Do you have any insight on that?
Finally I have no idea how to decode uppercase and lowercase at the same time; how should I do that?
(1) Why wasn't it decoded properly? Like, It should be 'I like data structures and algorithms'
First of all, the alphabet is already present in ASCII. So there is no need to redefine the alphabet in plain or cipher. The key is the offset from the plaintext character to the ciphertext character, wrapping around directly after the z.
So you generally convert from a character to an index in the alphabet from 0..25, then you add (for encryption) or subtract (for decryption) the key, modulo 26, the size of the alphabet. Then, to get the result, you convert back into a character. You're already doing the conversion to an index using ord(character) - ord('a'). The opposite can be done using chr.
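A minimal sketch of that ord/chr approach in Python, assuming only lowercase letters and spaces. The key 11 is read off the question's cipher alphabet, which maps 'a' to 'l'; `shift_char` and `decrypt` are hypothetical names.

```python
def shift_char(ch: str, key: int) -> str:
    """Shift a lowercase letter by key positions, wrapping around after 'z'."""
    index = ord(ch) - ord('a')                  # 'a'..'z' -> 0..25
    return chr((index + key) % 26 + ord('a'))   # Python's % 26 always yields 0..25

def decrypt(cipher_text: str, key: int) -> str:
    # Decrypting subtracts the key; anything that isn't a lowercase letter passes through.
    return ''.join(shift_char(c, -key) if 'a' <= c <= 'z' else c for c in cipher_text)

print(decrypt('wtvp olel decfnefcpd lyo lwrzctesxd', 11))
# like data structures and algorithms
```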
(2) I'm also confused about the capital "I" at the beginning of the enciphered message. Do you have any insight on that?
Well, there are more possibilities for single character words. The word A would be a prime suspect.
(3) Finally I have no idea how to decode uppercase and lowercase at the same time; how should I do that?
The best way is to create a variable that indicates that something is uppercase or not; is_uppercase would be a good name. Then convert the character to lowercase. Perform the encryption / decryption operation and then convert the resulting character back into uppercase, if required. That way your encryption / decryption operation is not affected at all and kept relatively simple.
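Following that description, a minimal Python sketch that records is_uppercase, works on the lowercase letter, and restores the case afterwards. The function names are hypothetical, only ASCII letters are shifted, and str.isascii needs Python 3.7 or later.

```python
def shift_letter(ch: str, key: int) -> str:
    """Shift one ASCII letter by key, preserving its case."""
    is_uppercase = ch.isupper()
    lower = ch.lower()
    shifted = chr((ord(lower) - ord('a') + key) % 26 + ord('a'))
    return shifted.upper() if is_uppercase else shifted

def caesar(text: str, key: int) -> str:
    # Only ASCII letters are shifted; spaces and punctuation are copied unchanged.
    return ''.join(
        shift_letter(c, key) if c.isascii() and c.isalpha() else c for c in text
    )

secret = caesar('I like Data Structures', 11)   # encrypt with key 11
print(secret)                                    # T wtvp Olel Decfnefcpd
print(caesar(secret, -11))                       # I like Data Structures
```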

How to convert an hex string into ascii string in Lua

I have no knowledge at all about Lua and I'm trying to craft a small script for nginx.
I'm using the following library ( to encrypt some data. Specifically I'm using the code for AES 256 CBC (SHA-512, salted) encryption and storing the hex-encoded encrypted string as shown in the example.
The issue now is that I need to get that hex string back to the decrypt method which expects an ASCII string.
This is an example of the encrypted hex string:
Just had to write one recently for pretty much the same reason. Abuse gsub: capture every two characters and replace them with precalculated values from a hex-number-to-character map.
-- Needs to be only done once
local hex_to_char = {}
for idx = 0, 255 do
    hex_to_char[("%02X"):format(idx)] = string.char(idx)
    hex_to_char[("%02x"):format(idx)] = string.char(idx)
end
-- Sometime later
str = "fdbcc47fe5825d49ac3429d4f8408fa4b6528dd99d938f122ee7f00ab71ae0c5c73d29d4f54ea1fbefe706b5dca04f6b6c6b8b96d9807ef58eaba07c6c6cefaf6ad8673b43a4e243fb2912fb4ff93de6488c4795ebb09ecd7a40b7c9dc2003be4ff93425d2d74688208fa4d2a8d22f32490666550f4b01340de708d7aa5bc8468d171da400f59fcff4e7d371d7ab9b48fdfde29aefc0af78b2f934927a7713994c1e8f9435067c851efc5d300405c74d"
print((str:gsub("(..)", hex_to_char))) -- extra parentheses keep only the string, not gsub's match count
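The same lookup-table idea, sketched in Python for comparison. The names are hypothetical; re.sub with a function plays the role of Lua's gsub with a table, and dict.get leaves an unknown pair unchanged, as gsub does when the table has no entry. Python also has a built-in, bytes.fromhex, that does the whole conversion.

```python
import re

# Built once: every two-digit hex string (both cases) -> the character with that value.
HEX_TO_CHAR = {}
for idx in range(256):
    HEX_TO_CHAR[f"{idx:02X}"] = chr(idx)
    HEX_TO_CHAR[f"{idx:02x}"] = chr(idx)

def hex_to_string(hex_str: str) -> str:
    # Replace each pair of characters with its table entry, like gsub("(..)", hex_to_char).
    return re.sub(r"..", lambda m: HEX_TO_CHAR.get(m.group(0), m.group(0)), hex_str)

sample = "48656c6c6f"
print(hex_to_string(sample))                    # Hello
print(bytes.fromhex(sample).decode("latin-1"))  # Hello (built-in equivalent)
```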

Perl's default string encoding and representation

In the following:
my $string = "Can you \x{FB01}nd my r\x{E9}sum\x{E9}?\n";
The \x{FB01} and \x{E9} are code points. And code points are encoded via an encoding scheme to a series of octets.
So the character é, which has the code point \x{E9}, is part of $string. But how does this work? Are all the characters in this sentence (including the ASCII ones) encoded via UTF-8?
If yes why do I get the following behavior?
use Encode;
my $str = "Some arbitrary string\n";
if (Encode::is_utf8($str)) {
    print "YES str IS UTF8!\n";
} else {
    print "NO str IT IS NOT UTF8\n";
}
This prints "NO str IT IS NOT UTF8\n"
Additionally Encode::is_utf8($string) returns true.
In what way are $string and $str different and one is considered UTF-8 and the other not?
And in any case what is the encoding of $str? ASCII? Is this the default for Perl?
In C, a string is a collection of octets, but Perl has two string storage formats:
String of 8-bit values.
String of 72-bit values. (In practice, limited to 32-bit or 64-bit.)
As such, you don't need to encode code points to store them in a string.
use feature 'say';
my $s = "\x{2660}\x{2661}";
say length $s; # 2
say sprintf '%X', ord substr($s, 0, 1); # 2660
say sprintf '%X', ord substr($s, 1, 1); # 2661
(Internally, an extension of UTF-8 called "utf8" is used to store the strings of 72-bit chars. That's not something you should ever have to know except to realize the performance implications, but there are bugs that expose this fact.)
Encode's is_utf8 reports which type of string a scalar contains. It's a function that serves absolutely no use except to debug the bugs I previously mentioned.
An 8-bit string can store the value of "abc" (or the string in the OP's $str), so Perl used the more efficient 8-bit (UTF8=0) string format.
An 8-bit string can't store the value of "\x{2660}\x{2661}" (or the string in the OP's $string), so Perl used the 72-bit (UTF8=1) string format.
Zero is zero whether it's stored in a floating point number, a signed integer or an unsigned integer. Similarly, the storage format of strings conveys no information about the value of the string.
You can store code points in an 8-bit string (if they're small enough) just as easily as a 72-bit string.
You can store bytes in a 72-bit string just as easily as an 8-bit string.
In fact, Perl will switch between the two formats at will. For example, if you concatenate $string with $str, you'll get a string in the 72-bit format.
You can alter the storage format of a string with the builtins utf8::downgrade and utf8::upgrade, should you ever need to work around a bug.
utf8::downgrade($s); # Switch to strings of 8-bit values (UTF8=0).
utf8::upgrade($s); # Switch to strings of 72-bit values (UTF8=1).
You can see the effect using Devel::Peek.
>perl -MDevel::Peek -e"$s=chr(0x80); utf8::downgrade($s); Dump($s);"
SV = PV(0x7b8a74) at 0x4a84c4
PV = 0x7bab9c "\200"\0
CUR = 1
LEN = 12
>perl -MDevel::Peek -e"$s=chr(0x80); utf8::upgrade($s); Dump($s);"
SV = PV(0x558a6c) at 0x1cc843c
PV = 0x55ab94 "\302\200"\0 [UTF8 "\x{80}"]
CUR = 2
LEN = 12
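The two dumps hold the same one-character string in different byte layouts: CUR = 1 for the 8-bit form and CUR = 2 for the upgraded form. Python does not expose Perl's internals, but a minimal sketch, assuming the 8-bit form matches Latin-1 and the upgraded form matches standard UTF-8 (both hold for this character), reproduces both byte sequences (octal \200 is 0x80, \302 is 0xC2):

```python
s = chr(0x80)                       # one character, code point U+0080

downgraded = s.encode("latin-1")    # 8-bit form: one byte per character
upgraded = s.encode("utf-8")        # UTF-8 form: a multi-byte sequence

print(len(s))                       # 1 character either way
print(downgraded, len(downgraded))  # b'\x80' 1      (Devel::Peek: "\200", CUR = 1)
print(upgraded, len(upgraded))      # b'\xc2\x80' 2  (Devel::Peek: "\302\200", CUR = 2)
```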
The \x{FB01} and \x{E9} are code points.
Not quite: the numeric values inside the braces are codepoints. The whole \x expression is just a notation for a character. There are several notations for characters, most of them starting with a backslash, but the common one is the simple string literal. You might as well write:
use utf8;
my $string = "Can you ﬁnd my résumé?\n";
#                     ↑       ↑   ↑
And code points are encoded via an encoding scheme to a series of octets.
True, but so far your string is a string of characters, not a buffer of octets.
But how does this work?
Strings consist of characters. That's just Perl's model. You as a programmer are supposed to deal with it at this level.
Of course, the computer can't, and the internal data structure must have some form of internal encoding. Far too much confusion ensues because "Perl can't keep a secret": the details leak out occasionally.
Are all the characters in this sentence (including the ASCII ones) encoded via UTF-8?
No, the internal encoding is lax UTF8 (no dash). It does not have some of the restrictions that UTF-8 (a.k.a. UTF-8-strict) has.
UTF-8 goes up to 0x10_ffff, UTF8 goes up to 0xffff_ffff_ffff_ffff on my 64-bit system. Codepoints greater than 0xffff_ffff will emit a non-portability warning, though.
In UTF-8 certain codepoints are non-characters or illegal characters. In UTF8, anything goes.
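Python's str type and its "utf-8" codec follow the strict rules, so they make a convenient sketch of two things strict UTF-8 forbids and lax UTF8 permits: code points above U+10FFFF, and surrogate code points such as U+D800 (one class of codepoints that are illegal in UTF-8).

```python
MAX_UNICODE = 0x10FFFF

print(hex(ord(chr(MAX_UNICODE))))  # 0x10ffff: the highest code point Python accepts

try:
    chr(MAX_UNICODE + 1)           # beyond Unicode's range
except ValueError as err:
    print("rejected:", err)

try:
    "\ud800".encode("utf-8")       # a lone surrogate is not valid UTF-8
except UnicodeEncodeError as err:
    print("rejected:", err.reason)
```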
… is an internals function, and is clearly marked as such. You as a programmer are not supposed to peek. But since you want to peek, no one can stop you. Devel::Peek::Dump is a better tool for getting at the internals.
Read for an introduction to the topic of encoding in Perl.
is_utf8 is a badly named function that doesn't mean what you think it means or have anything to do with that. The answer to your question is that $string doesn't have an encoding, because it's not encoded. When you call Encode::encode with some encoding, the result will be a string that is encoded and has a known encoding.

What the heck is a Perl string anyway?

I can't find a basic description of how string data is stored in Perl! It's like all the documentation assumes I already know this for some reason. I know about encode() and decode(), and I know I can read raw bytes into a Perl "string" and output them again without Perl screwing with them. I know about open modes. I also gather Perl must use some internal format to store character strings and can differentiate between character and binary data. Please, where is this documented?
Equivalent question is; given this perl:
$x = decode($y);
Decode to WHAT and from WHAT??
As far as I can figure, there must be a flag on the string data structure that says this is binary XOR character data (of some internal format, which BTW is a superset of Unicode). But I'd like it if that were stated in the docs or confirmed/discredited here.
This is a great question. To investigate, we can dive a little deeper by using Devel::Peek to see what is actually stored in our strings (or other variables).
First, let's start with an ASCII string:
$ perl -MDevel::Peek -E 'Dump "string"'
SV = PV(0x9688158) at 0x969ac30
PV = 0x969ea20 "string"\0
CUR = 6
LEN = 12
Then we can turn on Unicode I/O layers and do the same:
$ perl -MDevel::Peek -CSAD -E 'Dump "string"'
SV = PV(0x9eea178) at 0x9efcce0
PV = 0x9f0faf8 "string"\0
CUR = 6
LEN = 12
From there, let's try to manually add some wide characters:
$ perl -MDevel::Peek -CSAD -e 'Dump "string \x{2665}"'
SV = PV(0x9be1148) at 0x9bf3c08
PV = 0x9bf7178 "string \342\231\245"\0 [UTF8 "string \x{2665}"]
CUR = 10
LEN = 12
From that you can clearly see that Perl has interpreted this correctly as utf8. The problem is that if I don't write the character with the \x{} escape, the representation looks more like the regular string:
$ perl -MDevel::Peek -CSAD -E 'Dump "string ♥"'
SV = PV(0x9143058) at 0x9155cd0
PV = 0x9168af8 "string \342\231\245"\0
CUR = 10
LEN = 12
All Perl sees is bytes, and it has no way to know that you meant them as a Unicode character, unlike when you entered the escaped code point above. Now let's use decode and see what happens:
$ perl -MDevel::Peek -CSAD -MEncode=decode -E 'Dump decode "utf8", "string ♥"'
SV = PV(0x8681100) at 0x8683068
PV = 0x869dbf0 "string \342\231\245"\0 [UTF8 "string \x{2665}"]
CUR = 10
LEN = 12
Tada! Now you can see that the string is correctly represented internally, matching what you got with the \x{} escape.
The actual answer is it is "decoding" from bytes to characters, but I think it makes more sense when you see the Peek output.
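The same bytes-versus-characters distinction can be sketched in Python, where bytes and str are separate types, so the decode step is explicit. The byte string below is the UTF-8 for "string ♥" seen in the dumps (octal \342\231\245 is hex E2 99 A5); this only mirrors Encode's decode, it is not Perl.

```python
raw = b"string \xe2\x99\xa5"         # 10 octets, like CUR = 10 in the dumps
text = raw.decode("utf-8")           # bytes -> characters, like decode("utf8", ...)

print(len(raw))                      # 10
print(len(text))                     # 8: seven ASCII characters plus U+2665
print(hex(ord(text[-1])))            # 0x2665
print(text.encode("utf-8") == raw)   # True: encode goes back the other way
```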
Finally, you can make Perl treat your source code as UTF-8 by using the utf8 pragma, like so:
$ perl -MDevel::Peek -CSAD -Mutf8 -E 'Dump "string ♥"'
SV = PV(0x8781170) at 0x8793d00
PV = 0x87973b8 "string \342\231\245"\0 [UTF8 "string \x{2665}"]
CUR = 10
LEN = 12
Rather like the fluid string/number status of its scalar variables, the internal format of Perl's strings is variable and depends on the contents of the string.
Take a look at perluniintro, which says this.
Internally, Perl currently uses either whatever the native eight-bit character set of the platform (for example Latin-1) is, defaulting to UTF-8, to encode Unicode strings. Specifically, if all code points in the string are 0xFF or less, Perl uses the native eight-bit character set. Otherwise, it uses UTF-8.
What that means is that a string like "I have £ two" is stored as (bytes) I have \xA3 two. (The pound sign is U+00A3.) Now if I append a multi-byte Unicode character such as U+263A, a smiling face, Perl will convert the whole string to UTF-8 before it appends the new character, giving (bytes) I have \xC2\xA3 two\xE2\x98\xBA. Removing this last character again leaves the string UTF-8 encoded, as I have \xC2\xA3 two.
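The rule quoted from perluniintro can be modeled in a few lines of Python to check the byte sequences above. `perl_like_buffer` is a hypothetical helper; it assumes the native 8-bit set is Latin-1, and because it is stateless it does not model Perl keeping a string in UTF-8 after the wide character is removed.

```python
def perl_like_buffer(text: str) -> bytes:
    """Hypothetical model: 8-bit (Latin-1) if every code point <= 0xFF, else UTF-8.

    Does not model Perl's behavior of staying UTF-8 once a string has been upgraded.
    """
    if all(ord(c) <= 0xFF for c in text):
        return text.encode("latin-1")
    return text.encode("utf-8")

print(perl_like_buffer("I have \u00a3 two"))        # b'I have \xa3 two'
print(perl_like_buffer("I have \u00a3 two\u263a"))  # b'I have \xc2\xa3 two\xe2\x98\xba'
```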
But I wonder why you need to know this. Unless you are writing an XS extension in C the internal format is transparent and invisible to you.
Perl's internal string format is implementation-dependent, but usually a superset of UTF-8. It doesn't matter what it is, because you use decode and encode to convert strings between the internal format and other encodings.
Decode converts to Perl's internal format; encode converts from Perl's internal format.
Binary data is stored internally the same way characters 0 through 255 are.
Encode and decode just convert between formats. For example, encoding to UTF-8 produces a string in which every character is an octet, i.e. has a Perl character value from 0 through 255, so the string consists of UTF-8 octets.
Short answer: It's a mess
Slightly longer: The difference isn't visible to the programmer.
Basically you have to remember whether your string contains bytes or characters, where characters are Unicode codepoints. If you only encounter ASCII, the difference is invisible, which is dangerous.
Data itself and the representation of such data are distinct, and should not be confused. Strings are (conceptually) a sequence of codepoints, but are represented as a byte array in memory, and represented as some byte sequence when encoded. If you want to store binary data in a string, you re-interpret the number of a codepoint as a byte value, and restrict yourself to codepoints in 0–255.
(E.g. a file has no encoding. The information in that file has some encoding (be it ASCII, UTF-16 or EBCDIC at a character level, and Perl, HTML or .ini at an application level))
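That reinterpretation of byte values as codepoints 0–255 is easy to check in Python, using Latin-1 as the one-to-one mapping between the two (an illustration of the idea, not of how Perl stores the data):

```python
data = bytes(range(256))            # every possible byte value
as_text = data.decode("latin-1")    # byte n -> code point n, one-to-one

print(all(ord(c) == b for c, b in zip(as_text, data)))  # True
print(as_text.encode("latin-1") == data)                 # True: round-trips exactly
```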
The exact storage format of a string is irrelevant, but you can store complete integers inside such a string:
# this will work if your perl was compiled with large integers
my $string = chr 2**64; # this is so not unicode
say ord $string; # 18446744073709551615
The internal format is adjusted accordingly to accommodate such values; normal strings won't take up one integer per character.
Perl can handle more than Unicode can, so it's very flexible. Sometimes you want to interface with something that cannot, so you can use encode(...) and decode(...) to handle those transformations. see

Implementing Digest-MD5 in J2ME: How to compute the 16 octet MD5 hash of a String?

I am implementing digest-md5 in J2ME. In the computation of the client response, the following steps are given:
Create a string of the form "username:realm:password". Call this string X.
Compute the 16 octet MD5 hash of X. Call the result Y.
Create a string of the form "Y:nonce:cnonce:authzid". Call this string A1.
Create a string of the form "AUTHENTICATE:digest-uri". Call this string A2.
Compute the 32 hex digit MD5 hash of A1. Call the result HA1.
Compute the 32 hex digit MD5 hash of A2. Call the result HA2.
Create a string of the form "HA1:nonce:nc:cnonce:qop:HA2". Call this string KD.
Compute the 32 hex digit MD5 hash of KD. Call the result Z.
Does anyone here know how to implement step 2? I have an MD5 function that returns 32 hex digits, but I don't know how to compute a 16-octet MD5 hash.
I just would like to reiterate that I am using J2ME. In that case I can not simply use MessageDigest.
Thank you in advance. :)
see this
Use MD5 instead of SHA-256.
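Step 2 asks for the raw 16-octet digest, not its 32-character hex spelling. The 32 hex digits are just those 16 octets written two digits per octet, so each pair of hex digits can be parsed back into one byte (in J2ME, a loop calling Integer.parseInt(hex.substring(i, i + 2), 16) would do this). A minimal Python sketch of the listed steps, assuming UTF-8 for all strings; every function name is hypothetical, and following RFC 2831 the ":authzid" part is left out when there is no authzid.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """32 hex digit MD5, like the md5 function described in the question."""
    return hashlib.md5(data).hexdigest()

def hex_to_octets(hex32: str) -> bytes:
    """Turn a 32-hex-digit MD5 string back into its 16 raw octets."""
    return bytes.fromhex(hex32)

def client_response(username, realm, password, nonce, cnonce, authzid,
                    digest_uri, nc, qop):
    x = f"{username}:{realm}:{password}".encode("utf-8")   # step 1: X
    y = hex_to_octets(md5_hex(x))                           # step 2: Y, 16 octets, not hex text
    a1 = y + f":{nonce}:{cnonce}".encode("utf-8")           # step 3: A1 (Y stays binary)
    if authzid:                                             # omitted when there is no authzid
        a1 += f":{authzid}".encode("utf-8")
    a2 = f"AUTHENTICATE:{digest_uri}".encode("utf-8")       # step 4: A2
    ha1 = md5_hex(a1)                                       # step 5: HA1
    ha2 = md5_hex(a2)                                       # step 6: HA2
    kd = f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}".encode("utf-8")  # step 7: KD
    return md5_hex(kd)                                      # step 8: Z
```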
