Problems when processing bytes in Perl - string

I am working on a script that will encrypt a file using RSA. It reads and encrypts 1 byte at a time. Encrypting and decrypting normal .txt files works well, but when encrypting and then decrypting binary files (e.g. .gif) they come out corrupted.
This is done by the encryptFile and decryptFile subs.
I am using the IO::File module to open the files and I set binmode(':raw'), so each byte read is treated as a raw byte rather than text; that part can't be the problem.
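For reference, the handles are set up along these lines (a minimal sketch; the filenames are placeholders):
use IO::File;

my $file    = IO::File->new('input.gif', 'r')  or die "open: $!";
my $newFile = IO::File->new('output.enc', 'w') or die "open: $!";
binmode($file,    ':raw');   # raw bytes: no encoding layer, no newline translation
binmode($newFile, ':raw');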
When encrypting the bytes I first use my bytesToBigint() sub to translate the byte into an integer. Then I use my rsa::encrypt() sub to encrypt the integer. Now the encrypted integer will be much larger than 1 byte so I have to represent it in multiple bytes.
I do this in the sub bigintToBytes() which basically splits an integer into multiple bytes and stores them in a string. The string is returned and then written to the file.
For example, bigintToBytes(16739) returns the string 'Ac', because:
16739 (dec)
→ 0100000101100011 (binary)
→ 01000001 | 01100011
→ 65 | 99 (dec)
and chr(65) = 'A', chr(99) = 'c' → 'Ac'.
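A quick sanity check: pack gives the same two bytes for an unsigned 16-bit big-endian value.
perl -e 'print pack("n", 16739)'   # prints "Ac"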
This sub may be the cause, but why? Is it because I am storing bytes in a string? When calling read($buf, $bufsize) on files opened with IO::File, the bytes are likewise stored in a string, and that works.
I would be really thankful if you could point out what I am missing.
Here is the bigintToBytes sub:
sub bigintToBytes {
    use bignum;
    use bytes;
    my $bigint = shift;
    my $bits_in_int = length_in_bytes($bigint) * 8;
    my $bytes = '';
    my $new_byte = 0;
    my $count = 0;
    while ($bits_in_int > 0) {
        # shift the top bit of bigint into the new byte
        $new_byte = ($new_byte << 1) | ($bigint >> ($bits_in_int - 1));
        # remove that top bit from bigint
        $bigint = $bigint & (2**($bits_in_int - 1) - 1);
        $bits_in_int--;
        $count++;
        if ($count == 8) {
            $bytes = $bytes . chr($new_byte);
            $new_byte = 0;
            $count = 0;
        }
    }
    return $bytes;
}
I have also added use bytes; to the encryptFile and decryptFile subs, in case it matters.
I don't feel it is necessary to post the bytesToBigint, encryptFile and decryptFile subs, because bytesToBigint basically does the reverse and encryptFile/decryptFile just read the bytes and process them using these functions.
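(Roughly, bytesToBigint accumulates the bytes into one integer; this is a sketch of the idea, not the exact code:)
sub bytesToBigint {
    use bignum;
    my $bytes = shift;
    my $bigint = 0;
    for my $byte (split //, $bytes) {
        $bigint = $bigint * 256 + ord($byte);   # append 8 bits
    }
    return $bigint;
}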
Edit:
Here is the code snippet from the encryptFile sub. This is where bytesToBigint and bigintToBytes are used and the result is written to the file:
my $bufsize = 1;
my $buf = '';
while ($file->read($buf, $bufsize)) {
    my $msg = myMath::bytesToBigint($buf);
    my $enc_msg = rsa::encrypt($n, $enc_key, $msg);
    my $enc_msg_as_chars = myMath::bigintToBytes($enc_msg);
    # in case the encrypted unit is too small, pad to $min_bytes
    while (length($enc_msg_as_chars) < $min_bytes) {
        $enc_msg_as_chars = chr(0) . $enc_msg_as_chars;
    }
    $newFile->print($enc_msg_as_chars);
}
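The decryptFile loop mirrors this, reading $min_bytes at a time (a sketch by analogy; rsa::decrypt and $dec_key are assumed names):
my $buf = '';
while ($file->read($buf, $min_bytes)) {
    my $enc_msg = myMath::bytesToBigint($buf);
    my $msg = rsa::decrypt($n, $dec_key, $enc_msg);   # assumed counterpart to rsa::encrypt
    $newFile->print(myMath::bigintToBytes($msg));
}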

The problem in bigintToBytes is that it returns an empty string when the input integer is zero. So I added this guard at the top of the sub:
if ($bigint == 0) {
    return chr(0);
}
and the problem is solved!

Related

Writing bytes to a PDF

I am getting a response from an API in the form of bytes.
let a = "JSHDHHFHFHHFKFLLFLDMDMDMDMMSKKW==";
I want to write this into a PDF file.
The approach I have taken till now is to convert it to binary using atob. Then I convert that to a Uint8Array and write it to a file with fs.createWriteStream. When the file write completes, the output is of an unidentified type and looks like this:
%
1 0 obj
<</Filter/FlateDecode/First 141/N 20/Length 848/Type/ObjStm>>
stream
[... binary FlateDecode stream data, mojibake omitted ...]
endstream
endobj
22 0 obj
When I try the response in an online editor, it gives me the right output.
Here is the code I have used so far:
let encodedPDF = JSON.parse(d).Resp_Policy_Document.PDF_BYTES;
var bin = atob(encodedPDF);
var binaryLen = bin.length;
var bytes = new Uint8Array(binaryLen);
for (var i = 0; i < binaryLen; i++) {
    var ascii = bin.charCodeAt(i);
    bytes[i] = ascii;
}
let writer = fs.createWriteStream('Last.pdf');
writer.write(bin);
The data you get is Base64 encoded. That's a pretty common way for APIs to pass binary data. The giveaways? The equals signs at the end, and the use of ASCII uppercase and lowercase letters, numbers, +, and /.
So you need to decode it accordingly. Use something like this:
const pdfBinary = Buffer.from(a, 'base64');
The contents of this buffer, I guess, are a PDF document. You should write it directly to a file, without trying to convert it to a text string.
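Putting it together, a minimal end-to-end sketch (reusing the field names from the question's code):
const fs = require('fs');

const encodedPDF = JSON.parse(d).Resp_Policy_Document.PDF_BYTES;
const pdfBinary = Buffer.from(encodedPDF, 'base64');
fs.writeFileSync('Last.pdf', pdfBinary);   // write the raw buffer; no string conversion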

CRijndael only encrypting first 32 bytes of longer string

I'm using CRijndael (http://www.codeproject.com/Articles/1380/A-C-Implementation-of-the-Rijndael-Encryption-Decr) for encryption with a null (all-zero) IV. I know that's an issue, but for certain reasons I'm stuck with having to use it.
For longer strings (or ones that contain a few ampersands) only the first 32 bytes are ever encrypted. Shorter strings are encrypted without any issues. Code is below; any ideas?
char dataIn[] = "LONG STRING HERE";
string preInput = dataIn;
CRijndael aRijndael;
aRijndael.MakeKey("32-BIT-KEY-HERE", CRijndael::sm_chain0, 32, 16);
while (preInput.length() % 16 != 0) {
    preInput += '\0';
}
const char *encInput = preInput.c_str();
char szReq[1000];
aRijndael.Encrypt(preInput.c_str(), szReq, preInput.size(), CRijndael::CBC);
const std::string preBase64 = szReq;
std::string encoded = base64_encode(reinterpret_cast<const unsigned char*>(preBase64.c_str()), preBase64.length());

Converting Text to HTML In D

I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) to its HTML counterpart.
My proposal so far is to use a lookup table to map the 8-bit characters that have special meaning in HTML
string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;";
lutLatin1ToHTML[0x26] = "&amp;";
...
to their entities, using the function
pure string toHTML(in string src,
                   ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b);
}
It almost works, except that I don't know how to create a string from a ubyte (the no-translation case).
I tried
writeln(new string('a'));
but it prints garbage and I don't know why.
For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference
You can make a string from a ubyte most easily by doing "" ~ b, for example:
ubyte b = 65;
string a = "" ~ b;
writeln(a); // prints A
BTW, if you want to do a lot of html stuff, my dom.d and characterencodings.d might be useful:
https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
It has a html parser, dom manipulation functions similar to javascript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including latin1, and outputs ascii safe html with all special and unicode characters properly encoded.
assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");
stuff like that.
In this case Adam's solution works just fine, of course. (It takes advantage of the fact that ubyte is implicitly convertible to char, which is then appended to the immutable(char)[] array for which string is an alias.)
In general the safe way of converting types is to use std.conv.
import std.stdio, std.conv;

void main() {
    // utf-8
    char cc = 'a';
    string s1 = text(cc);
    string s2 = to!string(cc);
    writefln("%c %s %s", cc, s1, s2);

    // utf-16
    wchar wc = 'a';
    wstring s3 = wtext(wc);
    wstring s4 = to!wstring(wc);
    writefln("%c %s %s", wc, s3, s4);

    // utf-32
    dchar dc = 'a';
    dstring s5 = dtext(dc);
    dstring s6 = to!dstring(dc);
    writefln("%c %s %s", dc, s5, s6);

    // note: for a numeric ubyte this yields the decimal text "65", not "A"
    ubyte b = 65;
    string a = to!string(b);
}
NB. text() is actually intended for processing multiple arguments, but is conveniently short.

Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?
Long version:
I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.
Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).
The format is such that:
978-0-01-######-?
gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being the check digit.
Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?
After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:
private const int MUST_BE_LESS_THAN = 1000000; // 10^6, so the hash fits in 6 digits

public int GetStableHash(string s)
{
    uint hash = 0;
    // if you care, this can be done much faster with unsafe
    // using fixed char* reinterpreted as a byte*
    foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
    {
        hash += b;
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    // final avalanche
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    // we only want a positive integer < MUST_BE_LESS_THAN,
    // so a simple truncating cast is OK if not perfect
    return (int)(hash % MUST_BE_LESS_THAN);
}
public int CalculateChecksumDigit(ulong n)
{
    string sTemp = n.ToString();
    int iSum = 0;
    int iDigit = 0;
    // Calculate the checksum digit here.
    for (int i = sTemp.Length; i >= 1; i--)
    {
        iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
        // This appears to be backwards, but the
        // EAN-13 checksum must be calculated
        // this way to be compatible with UPC-A.
        if (i % 2 == 0)
        {   // even position (1-based, from the left): weight 3
            iSum += iDigit * 3;
        }
        else
        {   // odd position: weight 1
            iSum += iDigit * 1;
        }
    }
    return (10 - (iSum % 10)) % 10;
}
private void generateISBN()
{
    string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
    string fakeisbn = "978001" + titlehash;
    string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();
    SixDigitID.Text = fakeisbn + check;
}
The 6 digits allow for 1M possible values, which should be enough for most internal uses.
I would have used a sequence instead in this case, because a 6-digit hash has a relatively high chance of collisions.
So you can insert all the titles into a table and use their index numbers as the ISBN, either after sorting or without it.
This should make collisions almost impossible, but it requires keeping the set of "allocated" ISBNs to avoid collisions in the future, as well as the list of titles already in stock; that is information you would most probably want to keep anyway.
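A minimal sketch of that allocation idea (the class and names are mine, not the answer's):
using System.Collections.Generic;

class IsbnAllocator
{
    // title -> allocated 6-digit number; persist this map between runs
    private readonly Dictionary<string, int> _allocated = new Dictionary<string, int>();

    public int GetOrAllocate(string title)
    {
        if (!_allocated.TryGetValue(title, out int id))
        {
            id = _allocated.Count;   // next free slot, collision-free by construction
            _allocated[title] = id;
        }
        return id;                   // format with ToString("D6") for the ISBN
    }
}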
Another option is to break the ISBN standard and use hexadecimal/uuencoded barcodes; that may increase the possible range to a point where a cryptographic hash truncated to fit could work.
Also, since you are handling old book titles, which may have several editions capitalized and punctuated differently, I would strip punctuation and duplicated whitespace and convert everything to lowercase before hashing, to minimize the chance of a technical duplicate even though the strings differ (unless you want different editions to have different ISBNs, in which case you can ignore this paragraph).
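One way to do that normalization (a sketch of mine, not code from the answer):
using System.Text.RegularExpressions;

static string NormalizeTitle(string title)
{
    string t = title.ToLowerInvariant();
    t = Regex.Replace(t, @"[^\w\s]", "");  // strip punctuation
    t = Regex.Replace(t, @"\s+", " ");     // collapse runs of whitespace
    return t.Trim();
}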

Checksum Algorithm Producing Unpredictable Results

I'm working on a checksum algorithm, and I'm having some issues. The kicker is, when I hand-craft a "fake" message that is substantially smaller than the "real" data I'm receiving, I get a correct checksum. Against the real data, however, the checksum does not work properly.
Here's some information on the incoming data/environment:
This is a groovy project (see code below)
All bytes are to be treated as unsigned integers for the purpose of checksum calculation
You'll notice some finagling with shorts and longs in order to make that work.
The size of the real data is 491 bytes.
The size of my sample data (which appears to add correctly) is 26 bytes
None of my hex-to-decimal conversions are producing a negative number, as best I can tell
Some bytes in the file are not added to the checksum. I've verified that the switch for these is working properly, and when it is supposed to - so that's not the issue.
My calculated checksum, and the checksum packaged with the real transmission always differ by the same amount.
I have manually verified that the checksum packaged with the real data is correct.
Here is the code:
// add bytes to checksum
public void addToChecksum(byte[] bytes) {
    // if the checksum isn't enabled, don't add
    if (!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    for (int i = 0; i < bytes.length; i++) {
        byte[] tmpBytes = new byte[2];
        tmpBytes[0] = 0x00;
        tmpBytes[1] = bytes[i];
        ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
        long computedBytes = tmpBuf.getShort();
        logger.info(getHex(bytes[i]) + " = " + computedBytes);
        this.checksum += computedBytes;
    }
    if (this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
If anyone can find anything in this algorithm that could be causing drift from the expected result, I would greatly appreciate your help in tracking this down.
I don't see a line in your code where you reset this.checksum.
This way, you should always get this.checksum > previouschecksum, right? Is this intended?
Otherwise I can't find a flaw in the code above. Maybe your this.checksum is of the wrong type (short, for instance). That could roll over, so that you get negative values.
Here is an example of such behaviour:
import java.nio.ByteBuffer

short checksum = 0
byte[] bytes = new byte[491]
def count = 260
for (def i = 0; i < count; i++) {
    bytes[i] = 255
}
bytes.each { b ->
    byte[] tmpBytes = new byte[2];
    tmpBytes[0] = 0x00;
    tmpBytes[1] = b;
    ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
    long computedBytes = tmpBuf.getShort();
    checksum += computedBytes
    println "${b} : ${computedBytes}"
}
println checksum + "!=" + 255 * count
Just play around with the value of the count variable, which somehow corresponds to the length of your input.
Your checksum will keep incrementing until it rolls over to being negative (as it is a signed long integer)
You can also shorten your method to:
public void addToChecksum(byte[] bytes) {
    // if the checksum isn't enabled, don't add
    if (!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    this.checksum += bytes.inject( 0L ) { tot, it -> tot += it & 0xFF }
    if (this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
But that won't stop it rolling over to being negative. For the sake of saving 12 bytes per item you are generating a hash for, I would still suggest something like MD5, which is known to work, rather than rolling your own... However, I understand sometimes there are crazy requirements you have to stick to...
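For reference, computing an MD5 digest on the JVM is only a few lines with java.security (my sketch, not code from the thread):
import java.security.MessageDigest

byte[] payload = "example payload".getBytes("UTF-8")   // stand-in for the real message bytes
def md5 = MessageDigest.getInstance("MD5")
byte[] digest = md5.digest(payload)                    // 16-byte digest
println digest.encodeHex().toString()                  // hex string for logging or comparison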
