MessagePack slower than native node.js JSON - node.js

I just installed node-msgpack and tested it against native JSON. MessagePack is much slower. Anyone know why?
Using the author's own benchmark...
node ~/node_modules/msgpack/bench.js
msgpack pack: 4165 ms
msgpack unpack: 1589 ms
json pack: 1352 ms
json unpack: 761 ms

I'll assume you're talking about https://github.com/pgriess/node-msgpack.
Just looking at the source, I'm not sure how it could be fast. For example in src/msgpack.cc they have the following:
Buffer *bp = Buffer::New(sb._sbuf.size);
memcpy(Buffer::Data(bp), sb._sbuf.data, sb._sbuf.size);
In node terms, they are allocating and filling a new SlowBuffer for every request. You can benchmark the allocation part by doing the following:
var msgpack = require('msgpack');
var SB = require('buffer').SlowBuffer;
var tmpl = {'abcdef' : 1, 'qqq' : 13, '19' : [1, 2, 3, 4]};
console.time('SlowBuffer');
for (var i = 0; i < 1e6; i++)
    new SB(20); // 20 is the resulting size of their "DATA_TEMPLATE"
console.timeEnd('SlowBuffer');

console.time('msgpack.pack');
for (var i = 0; i < 1e6; i++)
    msgpack.pack(tmpl);
console.timeEnd('msgpack.pack');

console.time('stringify');
for (var i = 0; i < 1e6; i++)
    JSON.stringify(tmpl);
console.timeEnd('stringify');
// result - SlowBuffer: 915ms
// result - msgpack.pack: 5144ms
// result - stringify: 1524ms
So by just allocating memory for the message, they've already spent 60% of the stringify time. That's just one reason why it's so much slower.
Also take into account that JSON.stringify has gotten a lot of love from Google. It's highly optimized and would be difficult to beat.

I decided to benchmark all the popular Node.js modules for MessagePack binary encoding, along with the PSON (protocol JSON) encoding library, against JSON. The results are as follows:
JSON - fastest for encoding unless it includes a binary array
msgpack - second fastest normally, and fastest when a binary array is included
msgpack-js - consistently second to msgpack
pson - consistently slower than msgpack-js
msgpack5 - dog slow always
I have published the benchmarking repository and detailed results at https://github.com/mattheworiordan/nodejs-encoding-benchmarks
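For a sense of the methodology, each measurement is a timed round trip like the one in the answer above (a sketch, not the actual code from the repository; the payload here is made up):
var msgpack = require('msgpack');

var obj = { id: 42, name: 'benchmark', values: [1, 2, 3, 4, 5] };

// JSON round trip
console.time('json round trip');
for (var i = 0; i < 1e5; i++)
    JSON.parse(JSON.stringify(obj));
console.timeEnd('json round trip');

// msgpack round trip
console.time('msgpack round trip');
for (var i = 0; i < 1e5; i++)
    msgpack.unpack(msgpack.pack(obj));
console.timeEnd('msgpack round trip');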

Related

Contents of large file getting corrupted while reading records sequentially

I have a file with around 85 million JSON records. The file size is around 110 GB. I want to read from this file in batches of 1 million records (in sequence). I am reading the file line by line using a scanner and appending the records to a batch until it reaches 1 million. Here is the gist of what I am doing:
var rawBatch []string
batchSize := 1000000

file, err := os.Open(filePath)
if err != nil {
    // error handling
}

scanner := bufio.NewScanner(file)
for scanner.Scan() {
    rec := string(scanner.Bytes())
    rawBatch = append(rawBatch, rec)
    if len(rawBatch) == batchSize {
        for i := 0; i < batchSize; i++ {
            var tRec parsers.TRecord
            err := json.Unmarshal([]byte(rawBatch[i]), &tRec)
            if err != nil {
                // Error thrown here
            }
        }
        // process
        rawBatch = nil
    }
}
file.Close()
Sample of a correct record:
type TRecord struct {
    Key1 string `json:"key1"`
    Key2 string `json:"key2"`
}
{"key1":"15","key2":"21"}
The issue I am facing is that while reading these records, some of them come out corrupted, for example a colon changed to a semicolon, or a double quote changed to a '#'. I get this error:
Unable to load Record: Unable to load record in:
{"key1":#15","key2":"21"}
invalid character '#' looking for beginning of value
Some observations:
Once we start reading, the contents of the file itself get corrupted.
For every batch of 1 million, I saw 1 (or max 2) records getting corrupted. Out of 84 million records, a total of 95 records were corrupted.
My code works for a file of around 42 GB (23 million records). With a larger data file, my code behaves erroneously.
':' changes to ';', double quotes change to '#', and space changes to '!'. All these pairs, in their binary representations, differ by a single bit. Any chance that we have some accidental bit manipulation?
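That single-bit observation is easy to check (a quick Go sketch; the pairs are the substitutions seen above):
package main

import "fmt"

func main() {
    // each corrupted pair XORs to 00000001, i.e. exactly one flipped bit
    pairs := [][2]byte{{':', ';'}, {'"', '#'}, {' ', '!'}}
    for _, p := range pairs {
        fmt.Printf("%q ^ %q = %08b\n", p[0], p[1], p[0]^p[1])
    }
}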
Any ideas on why this is happening? And how can I fix it?
Details:
Go version used: go1.15.6 darwin/amd64
Hardware details: Debian GNU/Linux 9.12 (stretch), 224 GB RAM, 896 GB hard disk
As suggested by @icza in the comments,
That occasional, very rare 1 bit change suggests hardware failure (memory, processor cache, hard disk). I do recommend to test it on another computer.
I tested my code on some other machines, and it runs perfectly fine there. It looks like this occasional, rare bit flip, caused by a hardware failure, was the source of the issue.

How to display unpacked UDP data properly

I'm trying to write a small program that displays data received from a game (F1 2019) over UDP.
The F1 2019 game sends out data via UDP. I have been able to receive the packets, separate the header from the data, and unpack the data according to the structure in which it is sent, using the rawutil module.
The struct in which the packets are sent can be found here:
https://forums.codemasters.com/topic/38920-f1-2019-udp-specification/
I'm only interested in the telemetry packet.
import socket
import rawutil

# settings
ip = '0.0.0.0'
port = 20777

# listen for packets
listen_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listen_socket.bind((ip, port))

while True:
    # receive data
    data, address = listen_socket.recvfrom(65536)
    header = data[:20]
    telemetry = data[20:]

    # decode the header
    packetFormat, = rawutil.unpack('<H', header[:2])
    gameMajorVersion, = rawutil.unpack('<B', header[2:3])
    gameMinorVersion, = rawutil.unpack('<B', header[3:4])
    packetVersion, = rawutil.unpack('<B', header[4:5])
    packetId, = rawutil.unpack('<B', header[5:6])
    sessionUID, = rawutil.unpack('<Q', header[6:14])
    sessionTime, = rawutil.unpack('<f', header[14:18])
    frameIdentifier, = rawutil.unpack('<B', header[18:19])
    playerCarIndex, = rawutil.unpack('<B', header[19:20])

    # print all info (just for now)
    ## print('Packet Format : ', packetFormat)
    ## print('Game Major Version : ', gameMajorVersion)
    ## print('Game Minor Version : ', gameMinorVersion)
    ## print('Packet Version : ', packetVersion)
    ## print('Packet ID : ', packetId)
    ## print('Unique Session ID : ', sessionUID)
    ## print('Session Time : ', sessionTime)
    ## print('Frame Number : ', frameIdentifier)
    ## print('Player Car Index : ', playerCarIndex)
    ## print('\n\n')

    # start getting the packet data for each packet, starting with telemetry data
    if packetId == 6:
        speed, = rawutil.unpack('<H', telemetry[2:4])
        throttle, = rawutil.unpack('<f', telemetry[4:8])
        steer, = rawutil.unpack('<f', telemetry[8:12])
        brake, = rawutil.unpack('<f', telemetry[12:16])
        gear, = rawutil.unpack('<b', telemetry[17:18])
        rpm, = rawutil.unpack('<H', telemetry[18:20])
        print(speed)
The UDP specification states that the speed of the car is sent in km/h. However, when I unpack the packet, the speed comes out multiplied by 256, so 10 km/h shows up as 2560, for example.
Am I unpacking the data in the wrong way, or is something else causing this?
There is a similar problem with the steering: the spec says it should be between -1.0 and 1.0, but the actual values are either very large or very small.
screengrab here: https://imgur.com/a/PHgdNrx
Appreciate any help with this.
Thanks.
I recommend you don't use the unpack method, as with big structures (e.g. MotionPacket has 1343 bytes) your code will immediately get very messy.
However, if you desperately want to use it, call unpack only once, such as:
fmt = "<HBBBBQfBB"
size = struct.calcsize(fmt)
arr = struct.unpack(fmt, header[:size])
Alternatively, have a look at ctypes library, especially ctypes.LittleEndianStructure where you can set the _fields_ attribute to a sequence of ctypes (such as uint8 etc, without having to translate them to relevant symbols as with unpack).
https://docs.python.org/3.8/library/ctypes.html#ctypes.LittleEndianStructure
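For illustration, here is a minimal sketch of the header as a ctypes structure (field names and types follow the spec struct quoted below; data is the raw packet from the question's loop, and this is untested against the game, so treat it as a starting point):
import ctypes

class PacketHeader(ctypes.LittleEndianStructure):
    _pack_ = 1  # the game sends tightly packed data, so disable padding
    _fields_ = [
        ('m_packetFormat',     ctypes.c_uint16),
        ('m_gameMajorVersion', ctypes.c_uint8),
        ('m_gameMinorVersion', ctypes.c_uint8),
        ('m_packetVersion',    ctypes.c_uint8),
        ('m_packetId',         ctypes.c_uint8),
        ('m_sessionUID',       ctypes.c_uint64),
        ('m_sessionTime',      ctypes.c_float),
        ('m_frameIdentifier',  ctypes.c_uint32),
        ('m_playerCarIndex',   ctypes.c_uint8),
    ]

header = PacketHeader.from_buffer_copy(data[:ctypes.sizeof(PacketHeader)])
print(header.m_packetId)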
Alternatively alternatively, have a look at namedtuples.
Alternatively alternatively alternatively, there's a bunch of python binary IO libs, such as binio where you can declare a structure of ctypes, as this is a thin wrapper anyway.
To fully answer your question, the structure seems to be:
struct PacketHeader
{
    uint16  m_packetFormat;      // 2019
    uint8   m_gameMajorVersion;  // Game major version - "X.00"
    uint8   m_gameMinorVersion;  // Game minor version - "1.XX"
    uint8   m_packetVersion;     // Version of this packet type, all start from 1
    uint8   m_packetId;          // Identifier for the packet type, see below
    uint64  m_sessionUID;        // Unique identifier for the session
    float   m_sessionTime;       // Session timestamp
    uint    m_frameIdentifier;   // Identifier for the frame the data was retrieved on
    uint8   m_playerCarIndex;    // Index of player's car in the array
};
Meaning that the sequence of symbols for unpack should be <H4BQfLB, because uint in the struct is actually uint32 (L), where you had it as uint8 (B).
I also replaced BBBB with 4B. Hope this helps.
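Put together, a sketch of the corrected header decode (variable names follow the question's code; note the header is then 23 bytes, not 20):
import struct

fmt = '<H4BQfLB'
size = struct.calcsize(fmt)  # 23 bytes
(packetFormat, gameMajorVersion, gameMinorVersion, packetVersion,
 packetId, sessionUID, sessionTime, frameIdentifier,
 playerCarIndex) = struct.unpack(fmt, data[:size])
telemetry = data[size:]  # telemetry offsets shift by 3 compared to data[20:]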
Haider, I wanted to read the car speed from Formula 1 2019 too. I found your question, got some tips from it, and solved my issue, so now I think I must pay it back. The reason you get the speed multiplied by 256 is that you start from the wrong byte, and this data is little endian. The code you shared starts reading speed from the 22nd byte; if you start from the 23rd byte you will get the correct speed data.

GML room_goto() Error, Expecting Number

I'm trying to make a game that chooses a room from a pool of rooms using GML, but I get the following error:
FATAL ERROR in action number 3 of Create Event for object obj_control:
room_goto argument 1 incorrect type (5) expecting a Number (YYGI32)
at gml_Object_obj_control_CreateEvent_3 (line 20) - room_goto(returnRoom)
pool = ds_list_create()
ds_list_insert(pool, 0, rm_roomOne)
ds_list_insert(pool, 1, rm_roomTwo)
ds_list_insert(pool, 2, rm_roomThree)
ds_list_insert(pool, 3, rm_roomFour)
var returnIndex;
var returnRoom;
returnIndex = irandom(ds_list_size(pool))
returnRoom = ds_list_find_value(pool, returnIndex)
if (ds_list_size(pool) == 0) {
    room_goto(rm_menu_screen)
} else {
    room_goto(returnRoom)
}
I don't understand why the error message says it's expecting a number.
This is weird indeed... I think this should actually work, but I have no GM around to test :(
For now you can also solve this using "choose". This saves you a list (and saves memory, because you're not cleaning the list up by deleting it, so it stays resident in memory):
room_goto(choose(rm_roomOne, rm_roomTwo, rm_roomThree, rm_roomFour));
choose basically does exactly what you're looking for. It might not be the best way to go if you're re-using the group of items, though.
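If you do keep the list approach, the cleanup mentioned above would look something like this (a sketch):
room_goto(returnRoom);
ds_list_destroy(pool); // frees the list so it doesn't stay resident in memory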

Lua string from file

I'm trying to make a system which backs up and restores points for a gameserver, so it can safely restart without losing anything.
I have made a script to do just this and the actual backing up part works fine, but the restore part does not.
This is the script that runs if 'Backup(read)' is used (Backup(write) works perfectly as it is designed to do):
if (source and read) then
    System.LogAlways("[System] Restoring serverdata from file 'backup.CHK'");
    for line in source:lines() do
        Backup = {};
        Backup.Date = (Date or line:match("File Last Modified: (.-)"));
        Backup.Time = (Time or line:match("time: (.-)"));
        US = tonumber((US or line:match("us: (.-)")));
        NK = tonumber((NK or line:match("nk: (.-)")));
        local params = {
            class = "Player";
            position = {x = 1, y = 1, z = -1000};
            Respawn = { bRespawn = 0; nTimer = 0; bUnique = 1; };
            bUsable = 0;
            orientation = {0, 90, 135};
            name = "BackupEntity";
        };
        local ent = System.SpawnEntity(params);
        g_gameRules.game:SetTeam(1, ent.id);
        g_gameRules.game:SetSynchedEntityValue(playerId, 100, (NK/3));
        g_gameRules.game:SetTeam(2, ent.id);
        g_gameRules.game:SetSynchedEntityValue(playerId, 100, (US/3));
        System.RemoveEntity(params);
    end
    source:close();
    return;
end
I'm not sure what I'm doing wrong, and most of the sites I have looked at don't help much. The problem is that it's not reading any values from the file.
Any help will be appreciated :).
Edit:
The reason that we have to divide the score by 3 is because the server multiplies all scores by 3. If we were not to divide it by 3, then the score will always be 3 times larger on each restore.
Example contents of the backup.CHK file:
The server is dependent on this file, and writes to it every hour. Please do not edit.
File Last Modified: 11/07/2013
This file was generated by the servers' autobackup system.
--------------------------
time: 22:51
us: 453445
nk: 454567
A couple of ideas of what might be causing the problem:
Use of (.-) lazy matching, which matches the shortest pattern possible -- this can include an empty string. Usually you want to make the pattern as specific as possible while still matching the required inputs; e.g. (%d+) looks like an appropriate fit for us and nk (see the sketch after these notes).
The for line in source:lines() do reads one line at a time. That necessarily means not all the variables are going to be set inside the loop. Yet everything starting at local params and down uses those variables as if they were. It seems to me that section of code shouldn't even be in the loop.
Lastly, have you considered saving the Backup file as just another lua file? Doing so means you can let lua do the heavy lifting for you and you won't have to bother parsing it yourself. That also minimizes the risk for error.
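On the first point, a minimal sketch of more specific patterns (assuming the backup.CHK format shown above):
for line in source:lines() do
    -- (%d+) only matches when digits are present; match returns nil otherwise,
    -- so the previously captured value is kept instead of being overwritten
    US = tonumber(line:match("us: (%d+)")) or US
    NK = tonumber(line:match("nk: (%d+)")) or NK
end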

Performance Impact with FileOutputStream using OpenCSV

We are using OpenCSV
(http://opencsv.sourceforge.net/apidocs/au/com/bytecode/opencsv/CSVWriter.html)
to write a report from a file with xml content.
There are two ways to go about this:
i) Write using FileOutputStream
FileOutputStream fos = new FileOutputStream(file);
OutputStreamWriter osr= new OutputStreamWriter(fos);
writer = new CSVWriter(osr);
ii) Write using BufferedWriter
BufferedWriter out = new BufferedWriter(new FileWriter(file));
writer = new CSVWriter(out);
Does anybody know how the performance of writing this report is affected by choosing one option over the other?
To my understanding, OpenCSV does not care as long as it gets a Writer it can use.
The delta (difference) in performance would be in the step before, where the output stream is created from the file.
What is the performance impact of using OutputStreamWriter versus BufferedWriter ?
After running some benchmarks with Google Caliper, it appears that the BufferedWriter option is the fastest (but there's really not much of a difference, so I'd just use the option that you're comfortable with).
How to interpret results:
The FileOutputStreamWriter scenario corresponds with option i
The BufferedWriter scenario corresponds with option ii
The FileWriter scenario is one I added which just uses a plain old FileWriter.
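For reference, that third scenario is just the following (a sketch; same CSVWriter usage as the two options above):
CSVWriter writer = new CSVWriter(new FileWriter(file));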
Each benchmark was run 3 times: writing 1000, 10,000, and 100,000 rows.
The tests were run on Linux Mint, i5-2500k (1.6GHz) CPU, 8GB RAM, with Oracle JDK7 (writing to a SATA green HDD). Results would vary with a different setup, but this should be good for comparison purposes.
rows benchmark ms linear runtime
1000 FileOutputStreamWriter 6.10 =
1000 BufferedWriter 5.89 =
1000 FileWriter 5.96 =
10000 FileOutputStreamWriter 50.55 ==
10000 BufferedWriter 50.71 ==
10000 FileWriter 51.64 ==
100000 FileOutputStreamWriter 525.13 =============================
100000 BufferedWriter 505.05 ============================
100000 FileWriter 535.20 ==============================
FYI opencsv wraps the Writer you give it in a PrintWriter.
