Bytes to string conversion with invalid characters

I need to parse UDP packets which can be invalid or contain some errors. I would like to replace invalid characters with . after a bytes to string conversion, in order to display the content of the packets.
How can I do it? This is my code:
package main

import (
	"fmt"
	"strings"
)

func main() {
	a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
	s := string(a)
	s = strings.Replace(s, string(rune(0xFFFD)), ".", -1)
	fmt.Println("s: ", s) // I would like to display "a..b."
	for _, r := range s {
		fmt.Println("r: ", r)
	}
	rs := []rune(s)
	fmt.Println("rs: ", rs)
}

The root problem with your approach is that the string resulting from type-converting a []byte contains no U+FFFD at all: this type conversion only copies bytes from the source to the destination, verbatim.
Just like byte slices, strings in Go are not obliged to contain UTF-8-encoded text; they can contain any data, including opaque binary data which has nothing to do with text.
But some operations on strings—namely type-converting them to []rune and iterating over them using range—do interpret strings as UTF-8-encoded text.
That is precisely where you got tripped up: your range debugging loop attempted to interpret the string, and each time an attempt at decoding a properly encoded code point failed, range yielded a replacement character, U+FFFD.
To reiterate: the string obtained by the type conversion does not contain the characters you wanted your strings.Replace call to replace.
As to how to actually make a valid UTF-8-encoded string out of your data, you might employ a two-step process:
Type-convert your byte slice to a string—as you already do.
Iterate over the string, interpreting it as UTF-8, and replace each U+FFFD that dynamically appears during this decoding with the character of your choice.
Something like this:
func sanitize(b []byte) string {
	var sb strings.Builder
	for _, c := range string(b) {
		if c == '\uFFFD' {
			sb.WriteByte('.')
		} else {
			sb.WriteRune(c)
		}
	}
	return sb.String()
}
A note on performance: since type-converting a []byte to string copies memory—because strings are immutable while slices are not—the first step with type-conversion might be a waste of resources for code dealing with large chunks of data and/or working in tight processing loops.
In this case, it may be worth using the DecodeRune function of the unicode/utf8 package, which works directly on byte slices.
An example from its docs can be easily adapted to work with the loop above.
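Something like this minimal sketch of that adaptation (the helper name sanitizeBytes is mine, for illustration), which replaces each byte of an invalid sequence with a dot while skipping the intermediate string:

package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeBytes replaces every byte of an invalid UTF-8 sequence with '.',
// decoding directly from the byte slice.
func sanitizeBytes(b []byte) string {
	var sb strings.Builder
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			// An invalid byte: DecodeRune consumed exactly one byte.
			sb.WriteByte('.')
		} else {
			sb.WriteRune(r)
		}
		b = b[size:]
	}
	return sb.String()
}

func main() {
	fmt.Println(sanitizeBytes([]byte{'a', 0xff, 0xaf, 'b', 0xbf})) // a..b.
}

The size == 1 check is the documented way to tell a decoding failure apart from a literal U+FFFD in the input, which decodes with size 3.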
See also: Remove invalid UTF-8 characters from a string

@kostix's answer is correct and explains very clearly the issue with scanning Unicode runes from a string.
Just adding the following remark: if your intention is to view characters only in the ASCII range (printable characters < 127) and you don't really care about other Unicode code points, you can be more blunt:
// create a byte slice with the same byte length as s
var bs = make([]byte, len(s))

// scan s byte by byte:
for i := 0; i < len(s); i++ {
	switch {
	case 32 <= s[i] && s[i] <= 126:
		bs[i] = s[i]
	// depending on your needs, you may also keep characters in the 0..31 range,
	// like 'tab' (9), 'linefeed' (10) or 'carriage return' (13):
	// case s[i] == 9, s[i] == 10, s[i] == 13:
	//	bs[i] = s[i]
	default:
		bs[i] = '.'
	}
}
fmt.Printf("rs: %s\n", bs)
This snippet (playground) will give you something close to the "text" part of hexdump -C.

You may want to use strings.ToValidUTF8() for this:
ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences replaced by the replacement string, which may be empty.
It "seemingly" does exactly what you need. Testing it:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
s := strings.ToValidUTF8(string(a), ".")
fmt.Println(s)
Output (try it on the Go Playground):
a.b.
I wrote "seemingly" because, as you can see, there's a single dot between a and b: there are 2 invalid bytes there, but they form a single invalid sequence.
Note that you may avoid the []byte => string conversion, because there's a bytes.ToValidUTF8() equivalent that operates on and returns a []byte:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
a = bytes.ToValidUTF8(a, []byte{'.'})
fmt.Println(string(a))
Output will be the same. Try this one on the Go Playground.
If it bothers you that multiple bytes (of one invalid sequence) may be shrunk into a single dot, read on.
Also note that to inspect arbitrary byte slices that may or may not contain text, you may simply use hex.Dump(), which generates an output like this:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
fmt.Println(hex.Dump(a))
Output:
00000000  61 ff af 62 bf                                      |a..b.|
There's your expected output a..b. with other (useful) data like the hex offset and hex representation of bytes.
To get a "better" picture of the output, try it with a little longer input:
a = []byte{'a', 0xff, 0xaf, 'b', 0xbf, 50: 0xff}
fmt.Println(hex.Dump(a))
00000000  61 ff af 62 bf 00 00 00  00 00 00 00 00 00 00 00  |a..b............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 ff                                           |...|
Try it on the Go Playground.

Related

Read JPEG magic number with FileChannel and ByteBuffer

I started digging into the Java NIO API and as a first try I wanted to read a JPEG file's magic number.
Here's the code:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.io.FileInputStream;

public class JpegMagicNumber {
	public static void main(String[] args) throws Exception {
		FileChannel file = new FileInputStream(args[0]).getChannel();
		ByteBuffer buffer = ByteBuffer.allocate(6);
		file.read(buffer);
		buffer.flip();
		System.out.println(Charset.defaultCharset().decode(buffer).toString());
		file.close();
		buffer.clear();
	}
}
I expect to get the magic number chars back, but all I get is garbage data in the terminal.
Am I doing something wrong?
Short answer: There is nothing particularly defective with the code. JPEG just has 'garbage' up front.
Long answer: JPEG internally is made up of segments, one after the other. These segments start with a 0xFF byte, followed by an identifier byte, and then an optional payload/content.
Example start:
FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00 FF E1
The image starts with the Start Of Image (SOI) segment, 0xFF 0xD8, which has no payload.
The next segment is 'application specific', 0xFF 0xE0. Two bytes follow with the length of the payload (these two bytes included!).
0x4A 0x46 0x49 0x46: JFIF ← perhaps what you were looking for?
JPEG doesn't have a magic number in the sense you were perhaps looking for, like 'PK' for zip or '‰PNG' for PNG. (The closest thing is 0xFF 0xD8 0xFF for the SOI and the first byte of the next segment.)
So your code does correctly read the first six bytes of the file, decode them into characters using your platform's default charset, and print them out; a JPEG header just looks that way.
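For comparison, here is a minimal sketch in Go (not the asker's Java; purely illustrative) that checks the closest thing JPEG has to a magic number, the three bytes 0xFF 0xD8 0xFF described above:

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: jpegcheck <file>")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Read the SOI marker (FF D8) plus the first byte of the next segment (FF).
	header := make([]byte, 3)
	if _, err := io.ReadFull(f, header); err != nil {
		panic(err)
	}
	if header[0] == 0xFF && header[1] == 0xD8 && header[2] == 0xFF {
		fmt.Println("looks like a JPEG")
	} else {
		fmt.Printf("not a JPEG: % X\n", header)
	}
}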

Decoding UDP Data with Node_pcap

I am trying to build a quick nodejs script to look at some data in my network. Using node_pcap I manage to decode almost everything except the payload data that is sent through the UDP protocol. This is my code (fairly basic, but it gives me the headers and payloads):
const interface = 'en0';
let filter = 'udp';

const pcap = require('pcap'),
	pcap_session = pcap.createSession(interface, filter);

pcap_session.on('packet', function (raw_packet) {
	let packet = pcap.decode.packet(raw_packet);
	let data = packet.payload.payload.payload.data;
	console.log(data.toString()); // not full data
});
When I try to print data using the toString() method, it gives me most of the data, but not the beginning. I have something like this printed:
Li?��ddn-�*ys�{"Id":13350715,... (I've cut the rest of the data, which is the rest of the JSON.)
But I suspect that the bit of data I can't read contains some useful info, such as how many packets there are, packet offsets and so on.
I managed to get a part of it from the payload buffer:
00 00 00 01 52 8f 0b 4a 4d 3f cb de 08 00 01 00 00 00 04 a4 00 00 26 02 00 00 26 02 00 00 00 03 00 00 00 00 00 00 09 2d 00 00 00 00 f3 03 01 00 00 2a 00 02 00 79 00 05 73 01 d2
Although I have an idea of what kind of data it can be I have no idea of its structure.
Is there a way that I could decode this bit of the buffer? I tried some of the buffer methods such as readInt32LE and readInt16LE, but in vain. Is there some reading somewhere that can guide me through the process of decoding it?
[Edit] The more I looked into it, the more I suspect the data to be BSON and not JSON; that would explain why I can read some bits of it but not everything. Any chance someone has managed to decode BSON from a packet?
How does the library know which packet decoder to use?
It starts at Layer 2 of the TCP/IP stack by identifying which link-layer protocol is used (https://github.com/node-pcap/node_pcap/blob/master/decode/pcap_packet.js#L29-L56):
switch (this.link_type) {
case "LINKTYPE_ETHERNET":
	this.payload = new EthernetPacket(this.emitter).decode(buf, 0);
	break;
case "LINKTYPE_NULL":
	this.payload = new NullPacket(this.emitter).decode(buf, 0);
	break;
case "LINKTYPE_RAW":
	this.payload = new Ipv4(this.emitter).decode(buf, 0);
	break;
case "LINKTYPE_IEEE802_11_RADIO":
	this.payload = new RadioPacket(this.emitter).decode(buf, 0);
	break;
case "LINKTYPE_LINUX_SLL":
	this.payload = new SLLPacket(this.emitter).decode(buf, 0);
	break;
default:
	console.log("node_pcap: PcapPacket.decode - Don't yet know how to decode link type " + this.link_type);
}
Then it moves up the stack and tries to decode the proper protocol based on the flags it finds in the header (https://github.com/node-pcap/node_pcap/blob/master/decode/ipv4.js#L12-L17), in this particular case for the IPv4 protocol:
IPFlags.prototype.decode = function (raw_flags) {
	this.reserved = Boolean((raw_flags & 0x80) >> 7);
	this.doNotFragment = Boolean((raw_flags & 0x40) > 0);
	this.moreFragments = Boolean((raw_flags & 0x20) > 0);
	return this;
};
Then in your case it would match the udp protocol (https://github.com/node-pcap/node_pcap/blob/master/decode/ip_protocols.js#L15):
protocols[17] = require("./udp");
Hence, if you check https://github.com/node-pcap/node_pcap/blob/master/decode/udp.js#L32, the packet is automatically decoded, and it exposes a toString method:
UDP.prototype.toString = function () {
	var ret = "UDP " + this.sport + "->" + this.dport + " len " + this.length;
	if (this.sport === 53 || this.dport === 53) {
		ret += (new DNS().decode(this.data, 0, this.data.length).toString());
	}
	return ret;
};
What does this mean for you?
In order to decode a UDP (or any other) packet, you just call the high-level API pcap.decode.packet(raw_packet) and then call the toString method to display the decoded body payload:
pcap_session.on('packet', function (raw_packet) {
	let packet = pcap.decode.packet(raw_packet);
	console.log(packet.toString());
});

Converting hex values in buffer to integer

Background: I'm using node.js to get the volume setting from a device via serial connection. I need to obtain this data as an integer value.
I have the data in a buffer ('buf'), and am using readInt16BE() to convert to an int, as follows:
console.log( buf )
console.log( buf.readInt16BE(0) )
Which gives me the following output as I adjust the external device:
<Buffer 00 7e>
126
<Buffer 00 7f>
127
<Buffer 01 00>
256
<Buffer 01 01>
257
<Buffer 01 02>
258
Problem: All looks well until we reach 127, then we take a jump to 256. Maybe it's something to do with signed and unsigned integers - I don't know!
Unfortunately I have very limited documentation about the external device, I'm having to reverse engineer it! Is it possible it only sends a 7-bit value? Hopefully there is a way around this?
Regarding a solution - I must also be able to convert back from int to this format!
Question: How can I create a sequential range of integers when 7F seems to be the largest value my device sends, which causes a big jump in my integer scale?
Thanks :)
127 is the maximum value of a signed 8-bit integer. If the value is overflowing into the next byte at 128, it is safe to assume you are not being sent a 16-bit value, but rather 2 signed 8-bit values, and reading the value as a 16-bit integer would be incorrect.
I would start by using the first byte as a multiplier of 128 and adding the second byte; this will give the series you are seeking.
buf = Buffer.from([0, 127]) // <Buffer 00 7f>
buf.readInt8(0) * 128 + buf.readInt8(1)
>127
buf = Buffer.from([1, 0]) // <Buffer 01 00>
buf.readInt8(0) * 128 + buf.readInt8(1)
>128
buf = Buffer.from([1, 1]) // <Buffer 01 01>
buf.readInt8(0) * 128 + buf.readInt8(1)
>129
The way to get back is to divide by 128, rounding down to the nearest integer for the first byte; the second byte contains the remainder.
i = 129
buf = Buffer.from([Math.floor(i / 128), i % 128])
><Buffer 01 01>
Needed to treat the data as two signed 8-bit values. As per @forrestj's answer, the solution is to do:
valueInt = buf.readInt8(0) * 128 + buf.readInt8(1)
We can also convert the int value into the original format by doing the following:
byte1 = Math.floor(valueInt / 128)
byte2 = valueInt % 128
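For illustration, here is the same round trip as a sketch in Go (the function names are mine; the scheme is simply high byte times 128 plus low byte, and its inverse):

package main

import "fmt"

// decode7 interprets two bytes as high*128 + low, matching the device's scheme.
func decode7(b [2]byte) int {
	return int(b[0])*128 + int(b[1])
}

// encode7 is the inverse: quotient and remainder by 128.
func encode7(v int) [2]byte {
	return [2]byte{byte(v / 128), byte(v % 128)}
}

func main() {
	for _, v := range []int{126, 127, 128, 129, 130} {
		b := encode7(v)
		fmt.Printf("%d -> % X -> %d\n", v, b[:], decode7(b))
	}
}

Running it shows the sequence crossing 127 without a gap: 128 encodes as 01 00, 129 as 01 01, and so on.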

How to convert hex text file to jpeg

I have been given a text file containing hex data which I know forms a jpeg image. Below is an example of the format:
FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64 00 64 00 00 FF E1 00 B8 45 78 69 00 00 4D
This is only a snippet but you get the idea.
Does anyone know how I could convert this back into the original jpeg?
To convert from a hex string to a byte, use Convert.ToByte with a base-16 parameter.
To convert a byte array to a Bitmap you put it in a Stream and use the Bitmap(Stream) constructor:
using System.Drawing;
using System.IO;
//..
string hexstring = File.ReadAllText(yourInputFile);
// the hex pairs in the file are separated by whitespace, so strip it first:
hexstring = string.Concat(hexstring.Split());
byte[] bytes = new byte[hexstring.Length / 2];
for (int i = 0; i < hexstring.Length; i += 2)
    bytes[i / 2] = Convert.ToByte(hexstring.Substring(i, 2), 16);

using (MemoryStream ms = new MemoryStream(bytes))
{
    Bitmap bmp = new Bitmap(ms);
    // now you can do this:
    bmp.Save(yourOutputfile, System.Drawing.Imaging.ImageFormat.Jpeg);
    bmp.Dispose(); // dispose of the Bitmap when you are done with it!
    // or that:
    pictureBox1.Image = bmp; // Don't dispose as long as the PictureBox needs it!
}
I guess that there are more LINQish ways, but as long as it works..
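The same idea as a short sketch in Go (the file names input.txt and output.jpg are placeholders), using encoding/hex after stripping the whitespace between the pairs:

package main

import (
	"encoding/hex"
	"os"
	"strings"
)

func main() {
	text, err := os.ReadFile("input.txt") // hex pairs separated by whitespace
	if err != nil {
		panic(err)
	}
	// Remove all whitespace so the decoder sees one continuous hex string.
	clean := strings.Join(strings.Fields(string(text)), "")
	raw, err := hex.DecodeString(clean)
	if err != nil {
		panic(err)
	}
	// The decoded bytes already are the JPEG; just write them out.
	if err := os.WriteFile("output.jpg", raw, 0o644); err != nil {
		panic(err)
	}
}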

Why do I get nonstandard responses from the TPM Through TBS?

I have a C++ program to do a basic TPM_GetCapability call through TPM Base Services and the Windows 7 SDK.
I've set up the program below:
int _tmain(int argc, _TCHAR* argv[])
{
    TBS_CONTEXT_PARAMS pContextParams;
    TBS_HCONTEXT hContext;
    TBS_RESULT rv;

    pContextParams.version = TBS_CONTEXT_VERSION_ONE;
    rv = Tbsi_Context_Create(&pContextParams, &hContext);
    printf("\n1 RESULT : %x STATUS : %x", rv, hContext);

    BYTE data[200] =
        {0, 0xc1,       /* TPM_TAG_RQU_COMMAND */
         0, 0, 0, 18,   /* blob length, bytes */
         0, 0, 0, 0x65, /* TPM_ORD_GetCapability */
         0, 0, 0, 0x06, /* TPM_CAP_VERSION */
         0, 0, 0, 0};   /* 0 bytes subcap */
    BYTE buf[4000];
    UINT32 len = 4000;

    rv = Tbsip_Submit_Command(hContext, 0, TBS_COMMAND_PRIORITY_NORMAL, data, 18, buf, &len);
    //CAPABILITY_RETURN* retVal = new CAPABILITY_RETURN(buf);
    //printf("\n2 Response Tag: %x Output Bytes: %x",tag,);
    printf("\n2 RESULT : %x STATUS : %x\n", rv, hContext);
    printBuf(buf, len);

    rv = Tbsip_Context_Close(hContext);
    printf("\n3 RESULT : %x STATUS : %x", rv, hContext);
    return 0;
}
My Return Buffer looks like:
00:C4:00:00:00:12:00:00:00:00:00:00:00:04:01:01:00:00
According to this doc (Section 7.1, TPM_GetCapability), I should get the following:
Looking at my output buffer, I am getting TPM_TAG_RSP_COMMAND, a value of 18 for my paramSize, 0 for my TPM_RESULT, 0x...04 for ordinal (not sure what this is supposed to mean), then 1,1,0,0 for my final bits. I'm at a loss as to how to decipher this.
The answer to your question:
You don't get a nonstandard response.
The response is perfectly fine; there is nothing nonstandard in it. It looks exactly like what is defined in the spec.
The response's content resp is also what is to be expected. A standard-conforming TPM has to answer with 01 01 00 00 when asked for TPM_CAP_VERSION.
Why?
First of all: The line stating TPM_COMMAND_CODE ordinal is not part of the response.
It has no PARAM # and no PARAM SZ. It is only relevant for calculating the HMAC of the response.
So the response is the following:
00 C4        tag
00 00 00 12  paramSize
00 00 00 00  returnCode
00 00 00 04  respSize
01 01 00 00  resp
You asked for the capability TPM_CAP_VERSION. Here is what the spec says:
Value: 0x00000006
Capability Name: TPM_CAP_VERSION
Sub cap: Ignored
Returns: a TPM_STRUCT_VER structure.
The major and minor version MUST indicate 1.1.
The firmware revision MUST indicate 0.0.
The use of this value is deprecated, new software SHOULD
use TPM_CAP_VERSION_VAL to obtain version and revision information
regarding the TPM.
So when you decode resp, which is a TPM_STRUCT_VER, you get the following:
typedef struct tdTPM_STRUCT_VER {
	BYTE major;    // ==> 1
	BYTE minor;    // ==> 1
	BYTE revMajor; // ==> 0
	BYTE revMinor; // ==> 0
} TPM_STRUCT_VER;
So 1.1 and 0.0, exactly according to specification.
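To make the layout concrete, here is a small parsing sketch in Go (for illustration only; it hard-codes the response bytes from the question and follows the field breakdown above):

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	resp := []byte{
		0x00, 0xC4, // tag: TPM_TAG_RSP_COMMAND
		0x00, 0x00, 0x00, 0x12, // paramSize: 18
		0x00, 0x00, 0x00, 0x00, // returnCode: TPM_SUCCESS
		0x00, 0x00, 0x00, 0x04, // respSize: 4
		0x01, 0x01, 0x00, 0x00, // resp: TPM_STRUCT_VER
	}
	// All TPM 1.x wire fields are big-endian.
	tag := binary.BigEndian.Uint16(resp[0:2])
	paramSize := binary.BigEndian.Uint32(resp[2:6])
	returnCode := binary.BigEndian.Uint32(resp[6:10])
	respSize := binary.BigEndian.Uint32(resp[10:14])
	ver := resp[14 : 14+respSize]
	fmt.Printf("tag=%#04x paramSize=%d returnCode=%d\n", tag, paramSize, returnCode)
	fmt.Printf("version %d.%d, revision %d.%d\n", ver[0], ver[1], ver[2], ver[3])
}

It prints version 1.1, revision 0.0, matching the spec's required answer for TPM_CAP_VERSION.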
