Converting compressed data in array of bytes to string - string

Suppose I have an Array[Byte] called cmp. val cmp = Array[Byte](120, -100).
Now, new String(cmp) gives x�, and (new String(cmp)).getBytes gives Array(120, -17, -65, -67) which isn't equal to the original Array[Byte](120, -100). This byte of -100 was part of an Array[Byte] obtained by compressing some string using Zlib.
Note: These operations were done in Scala's repl.

When you've got arbitrary binary data, never ever try to convert it to a string as if it's actually text data which has been encoded into binary data using a normal encoding such as UTF-8. (Even when you do have text data, always specify the encoding when calling the String constructor or getBytes().) Otherwise it's like trying to load an mp3 into an image editor and complaining when it doesn't look like a proper picture...
Basically, you should probably use base64 for this. There are plenty of base64 encoders around; I like this public domain one as it has a pretty sensible interface. Alternatively, you could use hex - that will be more readable if you want to be able to easily understand the original binary content from the text representation manually, but will take more space (2 characters for each original 1 byte, vs base64's 4 characters for each original 3 bytes).

More like Java, but java.io.ByteArrayInputStream, java.util.zip.InflaterInputStream and java.io.DataInputStream can be used.
import java.io._
val bis = new ByteArrayInputStream(cmp)
val zis = new InflaterInputStream(bis)
val dis = new DataInputStream(zis)
val str = dis.readUTF()
To go backwards,
val bos = new ByteArrayOutputStream()
val zos = new InflaterOutputStream(bos)
val dos = new DataOutputStream(zos)
dos.writeUTF(str)
val cmp = bos.toArray

Related

Convert bytes to string while replacing SOH character

Using python 3.6. I have a bytes array that is coming over a socket (a FIX message) that contains hex character 1, start of header, as a delimiter. The raw bytes look like
b'8=FIX.4.4\x019=65\x0135=0\x0152=20220809-21:37:06.893\x0149=TRADEWEB\x0156=ABFIXREPO\x01347=UTF-8\x0110=045\x01'
I want to store this bytes array to a file for logging. I have seen FIX messages where this delimiter is converted to ^A control character. The final string I would like to have is -
8=FIX.4.4^A9=65^A35=0^A52=20220809-21:37:06.893^A49=TRADEWEB^A56=ABFIXREPO^A347=UTF-8^A10=045^A
I have tried various different ways to achieve this but could not, for example, tried repr(bytes) and (ord(b) in bytes).
Any pointers are highly appreciated.
Thanks.
One way I can think of doing this is by splitting the original decoded byte array , and insert control character using a loop.
data = b'8=FIX.4.4\x019=65\x0135=0\x0152=20220809-21:37:06.893\x0149=TRADEWEB\x0156=ABFIXREPO\x01347=UTF-8\x0110=045\x01'
data = data.decode().split("\x01")
print(data)
ctl_char = "^A"
string =""
for i in data:
string+=i
string+=ctl_char
print(string)
Please note there can be a better way of doing it.

Groovy encodeBase64() returning unexpected result for PNG image file

I am trying to convert a PNG image file to Base64 encoding in Groovy.
Here is my code:
ImageFile = new File("D:/DATA/CustomScript/Logo.png").text;
String encoded = ImageFile.getBytes().encodeBase64().toString();
I get the following as result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALE/C/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDcz97396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAigEpBqQYP2JAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
The same image when passed through https://www.base64encode.org/ gives this result:
iVBORw0KGgoAAAANSUhEUgAAAIQAAABPCAIAAAClCfqHAAAABGdBTUEAALGPC/xhBQAAAAlwSFlzAAAOwwAADsMBx2+oZAAAAQ1JREFUeF7t1KGRgwAURdFVyHQbSwOkKlrIoECDSwusoYgDc497396Z/3eGUQxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgxIMSDFgBQDUgzIE2IcxzHP87qu176tJ8T4/X7Lsuz7fu3b6k1BigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAigEpBqQYkGJAnhNj27ZxHN/v9/f7vU5385wYn8/n9XoNwzBN03W6l/P8BwSpsfw4c1/6AAAAAElFTkSuQmCC
I have tried to highlight some of the differences. It is clear that both encoded strings are different.
Problem is that I have to pass this image's Base64 encoding to another system and it is accepting the one from https://www.base64encode.org/ but rejecting the one generated by Groovy.
Any ideas what I am doing wrong here?
You are hiting an encoding problem here. Binary data is not character data; character data is effected by encodings. Instead of text use the bytes of the file. E.g.
def f = "/tmp/screenshot-000.png" as File
assert f.bytes.encodeBase64().toString()==("/tmp/encoded_20190208131326.txt" as File).text
Answer from user cfrick was extremely helpful. Unfortunately, it didn't solve my problem. I believe the reason was that I was on an older version of Groovy.
This code eventually solved my problem:
String base64Image = "";
File file = new File(imagePath);
FileInputStream imageInFile = new FileInputStream(file);
byte[] imageData = new byte[file.size()];
imageInFile.read(imageData);
base64Image = Base64.getEncoder().encodeToString(imageData);

Scala - proper way to print a string to file

What is the proper way to print a string - and only the string - to file? When I try to do it the standard way known to me, i.e:
def printToFile(o:Object,n:String) = try{
val pathToOutput = "..\\some\\parent\\directory\\"
val path = Paths.get(pathToOutput + n)
val b = new ByteArrayOutputStream()
val os = new ObjectOutputStream(b)
os.writeObject(o)
Files.write(path, b.toByteArray,
StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING)
}catch{
case _:Exception => println("failed to write")
}
it always seems to prepend
’ NUL ENQtSTXT
Where the part after ENQt seems to vary.
(Doesn't matter if I declare oan Object or a String.)
This is very annoying because I want to print a couple of .dot-Strings (Graphviz) in order to then batch-process the resulting .dot-files to .pdf-files. The prepended nonsense, however, forces me to open each .dot-file and remove it manually - which kind of defeats the purpose of batch-processing them.
This has nothing to do with Scala specifically, it's the way the Java Standard Library works. When you do a writeObject you are writing a Serialized representation of the Object, together with a bunch of additional bytes the JVM can use to re-create that object. If you know the object is a String, then strong-type it (i.e., use printToFile(o:String,n:String) and you can use Files.write(path, o.getBytes, .... Otherwise you could use o.toString.getBytes.
Generally in JVM, if you want to write characters and not bytes, you should prefer *Writer over *OutputStream. In this case (assuming you have a File where you want to write and a String which you want to write):
val writer = new BufferedWriter(new FileWriter(file))
try {
writer.write(string)
} finally {
writer.close()
}
Or with the character-oriented overload of Files.write:
Files.write(path, Collections.singletonList(string), ...)

Node Buffers, from utf8 to binary

I'm receiving data as utf8 from a source and this data was originally in binary form (it was a Buffer). I have to convert back this data to a Buffer. I'm having a hard time figuring how to do this.
Here's a small sample that shows my problem:
var hexString = 'e61b08020304e61c09020304e61d0a020304e61e65';
var buffer1 = new Buffer(hexString, 'hex');
var str = buffer1.toString('utf8');
var buffer2 = new Buffer(str, 'utf8');
console.log('original content:', hexString);
console.log('buffer1 contains:', buffer1.toString('hex'));
console.log('buffer2 contains:', buffer2.toString('hex'));
prints
original content: e61b08020304e61c09020304e61d0a020304e61e65
buffer1 contains: e61b08020304e61c09020304e61d0a020304e61e65
buffer2 contains: efbfbd1b08020304efbfbd1c09020304efbfbd1d0a020304efbfbd1e65
Here, I would like buffer2 to be the exact same thing as buffer1.
How can I convert an utf8 string to its original binary Buffer?
You cannot expect binary data converted to utf8 and back again to be the same as the original binary data because of the way utf8 works (especially when invalid utf8 characters are replaced with \ufffd).
You have to use another format that correctly preserves the data. This could be 'hex', 'base64', 'binary', or some other binary-safe format provided by a third-party module. Obviously you should probably keep it as a Buffer if you can.
The accepted answer is misleading. Your main problem is that you're dealing with invalid UTF-8. If the data were valid, the conversion would not cause issues.
Specifically, take the first two bytes: e61b.
In binary, that's: 11100110, 00011011. This is invalid. Take a look at this diagram from the utf-8 wikipedia page.
This says that if a byte starts with 1110, the next byte must start with two bytes starting with 10 after it. This is not the case here.
Whenever js hits an invalid character, it replaces it with �, the unicode replacement character. The codepoint for that is U+FFFD, and the utf-8 encoding of that code point is efbfbd. Notice that this shows up in your output a few times.

Convert Binary data from file to readable string

I have binary data stored in a file. I am doing this:
byte[] fileBytes = File.ReadAllBytes(#"c:\carlist.dat");
string ascii = Encoding.ASCII.GetString(fileBytes);
This is giving me following result with lot of invalid characters. What am i doing wrong?
?D{F ?x#??4????? NBR-OF-CARSNUMBER-OF-CARS!"#??? NBR-OF-CARS$%??1y0#123?G??#$ NBR-OF-CARS%45??1y#  NUMBER-OF-CARSd?
hmm... seems like a save was made from a byte buffer where after NBR-OF-CARS was written some numeric data. If you have an access to the code that saves the file could you check if there are numbers over there and if there are - check does the code converts numbers to string before witing the value into the binary stream.

Resources