Why does Node.js shorten my buffer when writing to file?

I'm experiencing some weird behavior in NodeJS v14.16.1 on Windows 10.
When I append 4 bytes to a file, it's only appending 2 bytes:
const fs = require('fs');
let infoFile = fs.openSync('field.info', 'a');
fs.writeSync(infoFile, Buffer.from([0x06, 0x00, 0x04, 0x00]))
fs.closeSync(infoFile);
Output for field.info:
06 04
Expected output:
06 00 04 00
Buffer.from([0x06, 0x00, 0x04, 0x00, 0x00]) produces the expected output:
06 00 04 00 00
Calling .toString() on the Buffers produces strings of the correct length, so I can only imagine it's an issue in fs.writeSync. I can't find anything in the documentation that would indicate the issue...
Edit
fs.writeSync reports 4 bytes written but only 2 exist in the file.

This is almost certainly due to the editor you are opening the file with interpreting it as UTF-16: you see only two characters because each character represents two bytes. Checking the file properties in Windows Explorer clearly shows the correct file size.
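To double-check without relying on any editor, you can read the file back with Node itself and dump the raw bytes. A minimal sketch (assuming a freshly written field.info):
const fs = require('fs');
// Read the raw bytes back; no text decoding or charset guessing is involved.
const bytes = fs.readFileSync('field.info');
console.log(bytes.length);          // 4
console.log(bytes.toString('hex')); // '06000400'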

Related

bytes to string conversion with invalid characters

I need to parse UDP packets which can be invalid or contain some errors. I would like to replace invalid characters with . after a bytes to string conversion, in order to display the content of the packets.
How can I do it? This is my code:
package main

import (
	"fmt"
	"strings"
)

func main() {
	a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
	s := string(a)
	s = strings.Replace(s, string(rune(0xFFFD)), ".", -1)
	fmt.Println("s: ", s) // I would like to display "a..b."
	for _, r := range s {
		fmt.Println("r: ", r)
	}
	rs := []rune(s)
	fmt.Println("rs: ", rs)
}
The root problem with your approach is that the result of type converting []byte to string does not have any U+FFFDs in it: this type-conversion only copies bytes from the source to the destination, verbatim.
Just like byte slices, strings in Go are not obliged to contain UTF-8-encoded text; they can contain any data, including opaque binary data which has nothing to do with text.
But some operations on strings—namely type-converting them to []rune and iterating over them using range—do interpret strings as UTF-8-encoded text.
That is precisely where you got tripped up: your range debugging loop attempted to interpret the string as UTF-8, and each time an attempt to decode a properly encoded code point failed, range yielded the replacement character U+FFFD.
To reiterate, the string obtained by the type-conversion does not contain the characters you wanted your strings.Replace call to replace.
As to how to actually make a valid UTF-8-encoded string out of your data, you might employ a two-step process:
Type-convert your byte slice to a string—as you already do.
Use any means of interpreting the string as UTF-8 (such as a range loop) as you iterate, replacing each U+FFFD that the decoder yields for invalid bytes.
Something like this:
func replaceInvalid(b []byte) string {
	var sb strings.Builder
	for _, c := range string(b) {
		if c == '\uFFFD' {
			sb.WriteByte('.')
		} else {
			sb.WriteRune(c)
		}
	}
	return sb.String()
}
A note on performance: since type-converting a []byte to string copies memory—because strings are immutable while slices are not—the first step with type-conversion might be a waste of resources for code dealing with large chunks of data and/or working in tight processing loops.
In this case, it may be worth using the DecodeRune function of the unicode/utf8 package, which works on byte slices.
An example from its docs can be easily adapted to work with the loop above.
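For instance, a minimal sketch (the function name is mine, not from the docs) that performs the same replacement directly on the byte slice:
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// replaceInvalidBytes decodes b rune by rune; utf8.DecodeRune reports an
// invalid byte as utf8.RuneError with a size of 1.
func replaceInvalidBytes(b []byte) string {
	var sb strings.Builder
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		if r == utf8.RuneError && size == 1 {
			sb.WriteByte('.')
		} else {
			sb.WriteRune(r)
		}
		b = b[size:]
	}
	return sb.String()
}

func main() {
	fmt.Println(replaceInvalidBytes([]byte{'a', 0xff, 0xaf, 'b', 0xbf})) // a..b.
}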
See also: Remove invalid UTF-8 characters from a string
@kostix's answer is correct and explains very clearly the issue with scanning Unicode runes from a string.
Just adding the following remark: if your intention is to view only characters in the ASCII range (printable characters < 127) and you don't really care about other Unicode code points, you can be more blunt:
// create a byte slice with the same byte length as s
var bs = make([]byte, len(s))
// scan s byte by byte:
for i := 0; i < len(s); i++ {
	switch {
	case 32 <= s[i] && s[i] <= 126:
		bs[i] = s[i]
	// depending on your needs, you may also keep characters in the 0..31 range,
	// like 'tab' (9), 'linefeed' (10) or 'carriage return' (13):
	// case s[i] == 9, s[i] == 10, s[i] == 13:
	//	bs[i] = s[i]
	default:
		bs[i] = '.'
	}
}
fmt.Printf("rs: %s\n", bs)
playground
This will give you something close to the "text" column of hexdump -C.
You may want to use strings.ToValidUTF8() for this:
ToValidUTF8 returns a copy of the string s with each run of invalid UTF-8 byte sequences replaced by the replacement string, which may be empty.
It "seemingly" does exactly what you need. Testing it:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
s := strings.ToValidUTF8(string(a), ".")
fmt.Println(s)
Output (try it on the Go Playground):
a.b.
I wrote "seemingly" because as you can see, there's a single dot between a and b: because there may be 2 bytes, but a single invalid sequence.
Note that you may avoid the []byte => string conversion, because there's a bytes.ToValidUTF8() equivalent that operates on and returns a []byte:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
a = bytes.ToValidUTF8(a, []byte{'.'})
fmt.Println(string(a))
Output will be the same. Try this one on the Go Playground.
If it bothers you that multiple bytes (of one invalid sequence) may be collapsed into a single dot, read on.
Also note that to inspect arbitrary byte slices that may or may not contain text, you may simply use hex.Dump(), which generates an output like this:
a := []byte{'a', 0xff, 0xaf, 'b', 0xbf}
fmt.Println(hex.Dump(a))
Output:
00000000 61 ff af 62 bf |a..b.|
There's your expected output a..b. with other (useful) data like the hex offset and hex representation of bytes.
To get a "better" picture of the output, try it with a little longer input:
a = []byte{'a', 0xff, 0xaf, 'b', 0xbf, 50: 0xff}
fmt.Println(hex.Dump(a))
00000000 61 ff af 62 bf 00 00 00 00 00 00 00 00 00 00 00 |a..b............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 ff |...|
Try it on the Go Playground.

Convert binary <something> of hex bytes to list of decimal values

I have the following binary (something):
test = b'40000000111E0C09'
Every two digits is a hexadecimal number I want out, so the following is clearer than the above:
test = b'40 00 00 00 11 1E 0C 09'
0x40 = 64 in decimal
0x00 = 0 in decimal
0x11 = 17 in decimal
0x1E = 30 in decimal
You get the idea.
How can I use struct.unpack(fmt, binary) to get the values out? I ask about struct.unpack() because it gets more complicated... I have a little-endian 4-byte integer in there... The last four bytes were:
b'11 1E 0C 09'
What is the above in decimal, assuming it's little-endian?
Thanks a lot! This is actually from a CAN bus, which I'm accessing as a serial port (frustrating stuff..)
Assuming you have the bytes literal b'40000000111E0C09', you can use codecs.decode() with the 'hex' codec to decode it into its raw bytes form:
import struct
from codecs import decode
test = b'40000000111E0C09'
test_decoded = decode(test, 'hex') # from hex string to bytes
for i in test_decoded:
    print('{:#04x} {}'.format(i, i))
Prints:
0x40 64
0x00 0
0x00 0
0x00 0
0x11 17
0x1e 30
0x0c 12
0x09 9
To get the last four bytes as a little-endian unsigned 32-bit integer (UINT32), you can then do (see the struct docs):
print( struct.unpack('<I', test_decoded[-4:]) )
Prints:
(151789073,)

Read JPEG magic number with FileChannel and ByteBuffer

I started digging into Java NIO API and as a first try I wanted to read a JPEG file magic number.
Here's the code
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.io.FileInputStream;
public class JpegMagicNumber {
    public static void main(String[] args) throws Exception {
        FileChannel file = new FileInputStream(args[0]).getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(6);
        file.read(buffer);
        buffer.flip();
        System.out.println(Charset.defaultCharset().decode(buffer).toString());
        file.close();
        buffer.clear();
    }
}
I expect to get the magic number characters back, but all I get is garbage data in the terminal.
Am I doing something wrong?
Short answer: there is nothing particularly wrong with the code. JPEG just has 'garbage' up front.
Long answer: JPEG internally is made up of segments, one after the other. These segments start with a 0xFF byte, followed by an identifier byte, and then an optional payload/content.
Example start:
FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01 00 00 FF E1
The image starts with the Start Of Image (SOI) segment, 0xFF 0xD8, which has no payload.
The next segment is 'application specific', 0xFF 0xE0. Two bytes follow with the length of the payload (these two bytes included!).
0x4A 0x46 0x49 0x46 : JFIF ← perhaps what you were looking for?
JPEG doesn't have a magic number in the sense you were perhaps looking for, like 'PK' for zip or '‰PNG' for PNG. (The closest thing is 0xFF 0xD8 0xFF for the SOI and the first byte of the next segment.)
So your code does correctly read the first six bytes of the file, decode them into characters using your platform's default charset, and print them out; a JPEG header just looks that way.
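If all you want is a magic-number check, a minimal sketch (the class name is mine) that compares the raw SOI bytes 0xFF 0xD8 instead of decoding them as text could look like this:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class JpegSoiCheck {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(2);
            file.read(buffer);
            buffer.flip();
            // Compare the raw bytes instead of decoding them with a charset.
            boolean isJpeg = buffer.remaining() == 2
                    && (buffer.get() & 0xFF) == 0xFF
                    && (buffer.get() & 0xFF) == 0xD8;
            System.out.println(isJpeg ? "SOI marker found: looks like a JPEG" : "Not a JPEG");
        }
    }
}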

Converting hex values in buffer to integer

Background: I'm using node.js to get the volume setting from a device via serial connection. I need to obtain this data as an integer value.
I have the data in a buffer ('buf'), and am using readInt16BE() to convert to an int, as follows:
console.log( buf )
console.log( buf.readInt16BE(0) )
Which gives me the following output as I adjust the external device:
<Buffer 00 7e>
126
<Buffer 00 7f>
127
<Buffer 01 00>
256
<Buffer 01 01>
257
<Buffer 01 02>
258
Problem: All looks well until we reach 127, then we take a jump to 256. Maybe it's something to do with signed and unsigned integers - I don't know!
Unfortunately I have very limited documentation about the external device, I'm having to reverse engineer it! Is it possible it only sends a 7-bit value? Hopefully there is a way around this?
Regarding a solution - I must also be able to convert back from int to this format!
Question: How can I create a sequential range of integers when 7F seems to be the largest value my device sends, which causes a big jump in my integer scale?
Thanks :)
127 is the maximum value of a signed 8-bit integer. If the value rolls over into the next byte at 128, it is safe to assume you are not being sent one 16-bit value but rather two signed 8-bit values, and reading them as a single 16-bit integer would be incorrect.
I would start by using the first byte as a multiplier of 128 and adding the second byte; this gives the series you are seeking.
buf = Buffer.from([0, 127]) // <Buffer 00 7f>
buf.readInt8(0) * 128 + buf.readInt8(1)
>127
buf = Buffer.from([1, 0]) // <Buffer 01 00>
buf.readInt8(0) * 128 + buf.readInt8(1)
>128
buf = Buffer.from([1, 1]) // <Buffer 01 01>
buf.readInt8(0) * 128 + buf.readInt8(1)
>129
To convert back, divide by 128 and round down to get the first byte; the second byte is the remainder.
i = 129
buf = Buffer.from([Math.floor(i / 128), i % 128])
<Buffer 01 01>
I needed to treat the data as two signed 8-bit values. As per @forrestj's answer, the solution is to do:
valueInt = buf.readInt8(0) * 128 + buf.readInt8(1)
We can also convert the int value into the original format by doing the following:
byte1 = Math.floor(valueInt / 128)
byte2 = valueInt % 128

Why do I get nonstandard responses from the TPM through TBS?

I have a C++ program to do a basic TPM_GetCapability through TPM Base Services (TBS) and the Windows 7 SDK.
I've set up the program below:
#include <windows.h>
#include <tbs.h>
#include <tchar.h>
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[])
{
    TBS_CONTEXT_PARAMS pContextParams;
    TBS_HCONTEXT hContext;
    TBS_RESULT rv;
    pContextParams.version = TBS_CONTEXT_VERSION_ONE;
    rv = Tbsi_Context_Create(&pContextParams, &hContext);
    printf("\n1 RESULT : %x STATUS : %x", rv, hContext);
    BYTE data[200] =
        {0,0xc1,      /* TPM_TAG_RQU_COMMAND */
         0,0,0,18,    /* blob length, bytes */
         0,0,0,0x65,  /* TPM_ORD_GetCapability */
         0,0,0,0x06,  /* TPM_CAP_VERSION */
         0,0,0,0};    /* 0 bytes subcap */
    BYTE buf[4000];
    UINT32 len = 4000;
    rv = Tbsip_Submit_Command(hContext, 0, TBS_COMMAND_PRIORITY_NORMAL, data, 18, buf, &len);
    //CAPABILITY_RETURN* retVal = new CAPABILITY_RETURN(buf);
    //printf("\n2 Response Tag: %x Output Bytes: %x",tag,);
    printf("\n2 RESULT : %x STATUS : %x\n", rv, hContext);
    printBuf(buf, len);
    rv = Tbsip_Context_Close(hContext);
    printf("\n3 RESULT : %x STATUS : %x", rv, hContext);
    return 0;
}
My Return Buffer looks like:
00:C4:00:00:00:12:00:00:00:00:00:00:00:04:01:01:00:00
According to this doc (Section 7.1, TPM_GetCapability), I should get the following:
Looking at my output buffer, I am getting TPM_TAG_RSP_COMMAND, a value of 18 for my paramSize, 0 for my TPM_RESULT, 0x...04 for the ordinal (not sure what this is supposed to mean), then 1, 1, 0, 0 for my final bytes. I'm at a loss as to how to decipher this.
The answer to your question:
You don't get a nonstandard response.
The response is perfectly fine; there is nothing nonstandard in it. It looks exactly like it is defined in the spec.
The resp content you get is also what is to be expected. A standard-conformant TPM has to answer with 01 01 00 00 when asked for TPM_CAP_VERSION.
Why?
First of all: The line stating TPM_COMMAND_CODE ordinal is not part of the response.
It has no PARAM # and no PARAM SZ. It is only relevant for calculating the HMAC of the response.
So the response is the following:
00 C4 tag
00 00 00 12 paramSize
00 00 00 00 returnCode
00 00 00 04 respSize
01 01 00 00 resp
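For completeness, a minimal sketch (not part of the original answer) of decoding those big-endian fields from the raw response buffer in C++:
#include <cstdint>
#include <cstdio>

// TPM 1.2 structures are big-endian on the wire.
static uint32_t be32(const unsigned char* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

int main()
{
    const unsigned char resp[] = {0x00, 0xC4, 0x00, 0x00, 0x00, 0x12,
                                  0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                                  0x00, 0x04, 0x01, 0x01, 0x00, 0x00};
    unsigned tag        = (resp[0] << 8) | resp[1]; // 0x00C4 = TPM_TAG_RSP_COMMAND
    uint32_t paramSize  = be32(resp + 2);           // 18 bytes in total
    uint32_t returnCode = be32(resp + 6);           // 0 = TPM_SUCCESS
    uint32_t respSize   = be32(resp + 10);          // 4 bytes of resp payload
    printf("tag=%04X paramSize=%u returnCode=%u respSize=%u version=%d.%d rev=%d.%d\n",
           tag, paramSize, returnCode, respSize, resp[14], resp[15], resp[16], resp[17]);
    return 0;
}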
You asked for the capability TPM_CAP_VERSION. Here is what the spec says:
Value: 0x00000006
Capability Name: TPM_CAP_VERSION
Sub cap: Ignored
TPM_STRUCT_VER structure.
The major and minor version MUST indicate 1.1.
The firmware revision MUST indicate 0.0.
The use of this value is deprecated, new software SHOULD
use TPM_CAP_VERSION_VAL to obtain version and revision information
regarding the TPM.
So when you decode resp, which is a TPM_STRUCT_VER, you get the following:
typedef struct tdTPM_STRUCT_VER {
    BYTE major;    // ==> 1
    BYTE minor;    // ==> 1
    BYTE revMajor; // ==> 0
    BYTE revMinor; // ==> 0
} TPM_STRUCT_VER;
So 1.1 and 0.0, exactly according to specification.
