How do I effectively send a large packet / combine smaller packets? - node.js

I have a larger buffer I'm trying to send as a packet. Node.js splits the buffer into smaller (~65k) packets. Once they are received by the client, how can I ensure the packets go together and effectively recombine them into a buffer?
Pretty much using this as a test:
// tcp socket
var buf = Buffer.alloc(265000);
socket.write(buf);
Then on client side I need to combine the 65k packets somehow together back into a buffer.
Thanks

TCP is free to break data up on the wire into packets of any size. The size can be different based on different implementations or physical transports. You cannot know exactly how this will happen and should not depend upon exactly how it is implemented. It can even vary depending upon which route your data takes.
Further, the .on('data', ...) event just gives you whatever data has arrived so far. While the order of the packets is guaranteed, there is no guarantee that if you write a certain set of bytes that they will all arrive in the same data event. They can be broken into smaller pieces and may arrive in smaller pieces. This is what happens at the lower level of TCP when you have no real protocol on top of TCP.
So, if you're sending a chunk of data over TCP, you have to invent your own protocol to know when you've got an entire set of data. There are a variety of different schemes for doing this.
Delimiter character. Some sort of delimiter character that won't occur in the actual data and indicates the end of a set of data. You read and parse the data until you get a delimiter character and then you know you have a complete set of data you can process. The HTTP protocol uses a CRLF sequence as a delimiter between header lines. Sometimes a zero byte is used as a delimiter.
Send length first. For binary data, the length of the data is often sent first and then the recipient knows how many bytes of data they're reading until they have a whole set.
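A minimal sketch of that length-prefix idea in Node.js (the helper names `frame` and `makeDeframer` are made up for illustration): each message is preceded by a 4-byte big-endian length, so the receiver always knows how many more bytes belong to the current message, no matter how TCP chunks them.

```javascript
// Sender side: prepend a 4-byte big-endian length to each payload.
function frame(payload) {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Receiver side: accumulate bytes, peel off complete messages.
function makeDeframer(onMessage) {
  let pending = Buffer.alloc(0);
  return chunk => {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 4) {
      const len = pending.readUInt32BE(0);
      if (pending.length < 4 + len) break; // wait for more data
      onMessage(pending.slice(4, 4 + len));
      pending = pending.slice(4 + len);
    }
  };
}

// Example: one frame split into awkward chunks still comes out whole.
const out = [];
const feed = makeDeframer(msg => out.push(msg.toString()));
const wire = frame(Buffer.from('hello'));
feed(wire.slice(0, 3)); // header split mid-way
feed(wire.slice(3));
console.log(out); // [ 'hello' ]
```

The same `feed` function works no matter where the chunk boundaries fall, which is exactly the property you need on top of TCP.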
Existing protocols. Something like the webSocket protocol lets you send messages of any size and it will automatically wrap them into packets that contain information about length so that they can be recombined for you automatically into the original set of data without you having to do this yourself. There are thousands of other protocols, one of which may be a perfect match for your needs so you can just use an existing implementation without having to write your own.
Once you have some mechanism for knowing when you've received a complete set of data, you then set up your data event handler to read data, collect it into a buffer and watch for the end of the data (using whichever mechanism you have selected). When you see the end of a set, you separate it out from any other data that may have arrived after it and then process it.
So, let's say you were using a zero byte as your delimiter and you've made sure that a zero cannot and does not occur in your real data. Then, you'd set up a data handler like this:
let accumulatedData = Buffer.alloc(0);
socket.on('data', data => {
    // add this chunk to whatever partial data we already have
    accumulatedData = Buffer.concat([accumulatedData, data]);
    // check for the delimiter character; one data event may contain
    // zero, one or several complete messages
    let offset;
    while ((offset = accumulatedData.indexOf(0)) !== -1) {
        // get the whole message into one Buffer
        const msg = accumulatedData.slice(0, offset);
        // keep the rest of the data for the next message,
        // skipping past the delimiter
        accumulatedData = accumulatedData.slice(offset + 1);
        // emit that we now have a whole msg
        socket.emit('_msg', msg);
    }
});
// if any accumulated data is still here at the end of the socket,
// notify about it
// this is optional as it may be a partial piece of data (no delimiter)
socket.on('end', () => {
    if (accumulatedData.length) {
        socket.emit('_msg', accumulatedData);
    }
});
// this is my own event which is emitted when a whole message is available
// for processing
socket.on('_msg', msg => {
    // code here to process whole msg
});
Note: This implementation removes the delimiter from the end of the msg

Nodejs is not splitting up the data; TCP/IP is. An IP packet (header plus payload) can be at most 65,535 bytes, roughly 64 KB. This is why your packets are being split up (fragmented).
This also means that TCP/IP will piece together the data at the receiving end. This is why you don't have to reassemble REST requests or websites. This is all handled by the lower network layers.
You may want to look at this example. You can edit the createServer() function to send more data like so:
var server = net.createServer(function(socket) {
    let buf = Buffer.alloc(265000);
    for (var i = 0; i < 264900; i++) {
        buf[i] = 0x45; // 'E' (assigning the string 'E' directly would coerce to 0)
    }
    buf[264900] = 0x0d; // '\r' (carriage return)
    buf[264901] = 0x0a; // '\n' (line feed)
    buf[264902] = 0;    // string terminator
    socket.write(buf);
    socket.pipe(socket);
});
The above (along with the other code from the gist) will respond to any request with a string containing 264900 'E's and a newline.
Now, you can use netcat (on Linux) to receive the data:
$ netcat 127.0.0.1 1337
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE ... etc
The buffer can contain anything and it will all still be transferred; a string is just easy to demonstrate.
In conclusion: Let the network do the work. You will need to read the incoming buffer on the client and save it to its own local buffer but that's pretty much it.
Further reading:
https://nodejs.org/api/net.html#net_socket_write_data_encoding_callback
https://techterms.com/definition/packet

Related

read expressjs request body in 4k chunks

I've got an ExpressJS endpoint that needs to take a request body and divide it into sequential 4k byte chunks or pages (aligned every 4k bytes) and do something different with each chunk. Since this is binary data extra care would need to be taken so there isn't any interpretation as unicode or anything.
My first thought was something like this:
req.on("data", chunk => {
// do something here
})
But the "do something here" would have to be: take whatever size of data is in the chunk, process it in 4k increments, and retain whatever is left over (< 4k) to append the next chunk to. When I ran a test I saw that the first chunk received was just under 32k bytes, so indeed with every request I would have the overhead of shuffling bytes around to get 4k-byte-aligned chunks.
My second thought was to do something like this:
req.on("readable", () => {
var chunk
while (null !== (chunk = req.read(4096))) {
// do something here
}
})
In this case the "do something here" would be similar, but since I would only be reading 4k at a time I should theoretically have a bit less work to do. In a test that I ran, I found that every read returned exactly 4k until the last read, which returned less than 4k. It would be ideal if it always worked like this, so that I never had to store part of a 4k page and append the remainder later, but I don't know of any guarantee that read() will always behave this way.
I'm not sure if I'm trying to re-invent the wheel here; maybe there is a mechanism already in place to (asynchronously) always pull off the next 4k without having to reassemble partial chunks.
What's the best practice for doing this?
You can accumulate the data into a higher scoped buffer variable as data events arrive and when you have 4k or more, break out a 4k chunk and process it:
const dataSizeToProcess = 1024 * 4; // 4k
let accumulatedData;

req.on('data', chunk => {
    // accumulate this chunk of data
    if (!accumulatedData) {
        // first chunk of data
        accumulatedData = chunk;
    } else {
        // add this chunk to the existing buffer
        accumulatedData = Buffer.concat([accumulatedData, chunk]);
    }
    // process as many whole chunks as we have in the accumulatedData
    while (accumulatedData.length >= dataSizeToProcess) {
        // get a whole chunk into its own buffer from the start
        const piece = accumulatedData.slice(0, dataSizeToProcess);
        // make accumulatedData be the rest of the data
        accumulatedData = accumulatedData.slice(dataSizeToProcess);
        // now process the data in the piece buffer
    }
});

req.on('end', () => {
    // process the last bit of data in accumulatedData
});
The documentation for "read" says:
The optional size argument specifies a specific number of bytes to read. If size bytes are not available to be read, null will be returned unless the stream has ended, in which case all of the data remaining in the internal buffer will be returned.
This ensures that you will either read the entire 4k or, at the end of the stream, read whatever remains. So there is no need for anything more complicated.

how to trim unknown first characters of string in code vision

I set up an ATmega16 (an 8-bit AVR microcontroller) to receive data from its serial port, which is connected to a Bluetooth module (HC-05), in order to receive a number sent by my Android app. The app sends the number as a string with a maximum length of 10 digits. The problem is that one or two unknown characters (?) appear at the beginning of the received string. I have to remove these unknown characters from the beginning of the string when they are present.
This problem only occurs with the HC-05; I had no problem when the numbers were sent by another microcontroller instead of the Android application.
here is what I send by mobile:
"430102030405060\r"
and what is received in the serial port of microcontroller:
"??430102030405060\r"
or
"?430102030405060\r"
here is USART Receiver interrupt code:
//-------------------------------------------------------------------------
// USART Receiver interrupt service routine
interrupt [USART_RXC] void usart_rx_isr(void)
{
    char status, data;
    status = UCSRA;
    data = UDR;
    if (data == 0x0D)
    {
        puts(ss);
        printf("\r");
        a = 0;
        memset(ss, '\0', sizeof(ss));
    }
    else
    {
        ss[a] = data;
        a += 1;
    }
    if ((status & (FRAMING_ERROR | PARITY_ERROR | DATA_OVERRUN)) == 0)
    {
        rx_buffer[rx_wr_index++] = data;
#if RX_BUFFER_SIZE == 256
        // special case for receiver buffer size=256
        if (++rx_counter == 0) rx_buffer_overflow = 1;
#else
        if (rx_wr_index == RX_BUFFER_SIZE) rx_wr_index = 0;
        if (++rx_counter == RX_BUFFER_SIZE)
        {
            rx_counter = 0;
            rx_buffer_overflow = 1;
        }
#endif
    }
}
//-------------------------------------------------------------------------
how can I remove extra characters (?) from the beginning of received data in codevision?
You do not need to remove them, just do not pass them to your processing.
You can either test each data character before putting it into your line buffer (ss), or, after the complete line has been received, look for the first relevant character and pass only the string starting from that position to your processing functions.
Var 1:
BOOL isGarbage(char c)
{
    return c < '0' || c > '9';
}

if (data == 0x0D)
{
    puts(ss);
    printf("\r");
    a = 0;
    memset(ss, '\0', sizeof(ss));
}
else
{
    if (!isGarbage(data))
    {
        ss[a] = data;
        a += 1;
    }
}
Var 2:
if (data == 0x0D)
{
    const char *actualString = ss;
    while (isGarbage(*actualString)) {
        actualString++;
    }
    puts(actualString);
    printf("\r");
    a = 0;
    memset(ss, '\0', sizeof(ss));
}
else
{
    ss[a] = data;
    a += 1;
}
However:
maybe you should try to solve the underlying issue rather than just fixing the symptom (suppressing the '?' characters).
What is the exact value of the questionable characters? I suspect that '?' is only being used to represent non-printable data.
Maybe your interface configuration is wrong and the sender uses software flow control on the line, in which case the suspicious characters are XON/XOFF bytes.
One additional note:
You may run into trouble if you use more complex functions or even peripheral devices from your interrupt service routine (ISR).
I would strongly suggest only filling buffers there and doing everything else in the main loop, triggered by some volatile flags and data buffers.
Also I do not get why you are using an additional buffer (ss) in the ISR, since it seems that there already is a RX-Buffer. The implementation looks like that there is a good RX-receive buffer implementation that should have some functions/possibilities to get the buffer contents within the main loop, so that you do not need to add your own code to the ISR.
Additional additional notes:
string array whose maximum length is equal to 10 digits.
I count more than 10 characters in that string. I hope your ss array is larger than that, and you should also consider that something may go wrong during transmission and you receive many more characters before the next '\r'. As written, there is no bounds check on a, so a long burst of data would overwrite other parts of your RAM.

Serial data acquisition program reading from buffer

I have developed an application in Visual C++ 2008 to read data periodically (50ms) from a COM Port. In order to periodically read the data, I placed the read function in an OnTimer function, and because I didn't want the rest of the GUI to hang, I called this timer function from within a thread. I have placed the code below.
The application runs fine, but it shows the following unexpected behaviour: after the data source (a hardware device or even a data emulator) stops sending data, my application continues to receive data for a period of time that is proportional to how long the read function has been running (EDIT: this excess period is in the same ballpark as the period of time the data was sent for). So if I start and stop the data flow immediately, this is reflected on my GUI, but if I start the data flow and stop it ten seconds later, my GUI continues to show data for 10 more seconds (EDITED).
I have made the following observations after exhausting all my attempts at debugging:
As mentioned above, this excess period of operation is proportional to how long the hardware has been sending data.
The frequency of incoming data is 50ms, so to receive 10 seconds worth of data, my GUI must be receiving around 200 more data packets.
The only buffer I have declared is abBuffer which is just a byte array of fixed size. I don't think this can increase in size, so this data is being stored somewhere.
If I change something in the data packet, this change, understandably, is shown on the GUI after a delay (because of the above points). But this would imply that the data received at the COM port is stored in some variable sized buffer from which my read function is reading data.
I have timed the read and processing periods. The latter is instantaneous while the former very rarely (3 times in 1000 reads (following no discernible pattern)) takes 16ms. This is well within the 50ms window the GUI has for each read.
The following is my thread and timer code:
UINT CMyCOMDlg::StartThread(LPVOID param)
{
    THREADSTRUCT *ts = (THREADSTRUCT*)param;
    ts->_this->SetTimer(1, 50, 0);
    return 0;
}

// Timer function that is called at regular intervals
void CMyCOMDlg::OnTimer(UINT_PTR nIDEvent)
{
    if (m_bCount == true)
    {
        DWORD NoBytesRead;
        BYTE abBuffer[45];
        if (ReadFile(m_hComm, &abBuffer, 45, &NoBytesRead, 0))
        {
            if (NoBytesRead == 45)
            {
                if ((abBuffer[0] == 0x10 && abBuffer[1] == 0x10) ||
                    (abBuffer[0] == 0x80 && abBuffer[1] == 0x80))
                {
                    fnSetData(abBuffer);
                }
                else
                {
                    CString value;
                    value.Append("Header match failed");
                    SetDlgItemText(IDC_RXRAW, value);
                }
            }
            else
            {
                CString value;
                value.Append(LPCTSTR(abBuffer), NoBytesRead);
                value.Append("\r\nInvalid Packet Size");
                SetDlgItemText(IDC_RXRAW, value);
            }
        }
        else
        {
            DWORD dwError2 = GetLastError();
            CString error2;
            error2.Format(_T("%d"), dwError2);
            SetDlgItemText(IDC_RXRAW, error2);
        }
        fnClear();
    }
    else
    {
        KillTimer(1);
    }
    CDialog::OnTimer(nIDEvent);
}
m_bCount is just a flag I use to kill the timer and the ReadFile function is a standard Windows API call. ts is a structure that contains a pointer to the main dialog class, i.e., this.
Can anyone think of a reason this could be happening? I have tried a lot of things, and also my code does so little I cannot figure out where this unexpected behaviour is happening.
EDIT:
I am adding the COM port settings and timeouts used below :
dcb.BaudRate = CBR_115200;
dcb.ByteSize = 8;
dcb.StopBits = ONESTOPBIT;
dcb.Parity = NOPARITY;
SetCommState(m_hComm, &dcb);
_param->_this=this;
COMMTIMEOUTS timeouts;
timeouts.ReadIntervalTimeout=1;
timeouts.ReadTotalTimeoutMultiplier = 0;
timeouts.ReadTotalTimeoutConstant = 10;
timeouts.WriteTotalTimeoutMultiplier = 1;
timeouts.WriteTotalTimeoutConstant = 1;
SetCommTimeouts(m_hComm, &timeouts);
You are processing at most one message per OnTimer() call. WM_TIMER is a low-priority message, so if the data source keeps sending a message every 50 milliseconds and your application ever falls behind, the unread messages accumulate in the driver's receive buffer; that backlog is the stale data you keep seeing after the source stops.
You can add while loop as follow:
while (true)
{
    if (::ReadFile(m_hComm, &abBuffer, sizeof(abBuffer), &NoBytesRead, 0))
    {
        if (NoBytesRead == sizeof(abBuffer))
        {
            ...
        }
        else
        {
            ...
            break;
        }
    }
    else
    {
        ...
        break;
    }
}
But there is another problem in your code. If your software checks the message while the data source is still sending the message, NoBytesRead could be less than 45. You may want to store the data into the message buffer like CString or std::queue<unsigned char>.
If the message doesn't contain a NULL at the end of the message, passing the message to the CString object is not safe.
Also if the first byte starts at 0x80, CString will treat it as a multi-byte string. It may cause the error. If the message is not a literal text string, consider using other data format like std::vector<unsigned char>.
By the way, you don't need to call SetTimer() from a separate thread; kicking off a timer takes no time. I also recommend calling KillTimer() somewhere outside of the OnTimer() function so that the code is more intuitive.
If the data source continuously keeps sending data, you may need to use PurgeComm() when you open/close the COMM port.

Remove NodeJs Stream padding

I'm writing an application where I need to strip the first X and last Y bytes from a stream. So what I need is basically a function I can pass to pipe that takes X and Y as parameters and removes the desired number of bytes from the stream as it comes through. My simplified setup is like this:
const rs = fs.createReadStream('some_file')
const ws = fs.createWriteStream('some_other_file')
rs.pipe(streamPadding(128, 512)).pipe(ws)
After that, some_other_file should contain all the contents of some_file minus the first 128 bytes and the last 512 bytes. I've read up on streams, but couldn't figure out how to do this properly, so that it also handles errors during the transfer and applies backpressure correctly.
As far as I know, I'd need a duplex stream, that, whenever I read from it, reads from its input stream, keeps track of where in the stream we are and skips the first 128 bytes before emitting data. Some tips on how to implement that would be very helpful.
The second part seems more difficult, if not impossible, because how would I know whether I've already reached the last 512 bytes before the input stream actually closes? I suspect that might not be possible, but I'm sure there must be a way to solve this problem, so if you have any advice on that, I'd be very thankful!
You can create a new Transform stream which does what you wish. As for dropping the last Y bytes, you can keep the last Y bytes buffered and simply discard them when the stream ends.
Something like this (assuming you're working with buffers).
const {Transform} = require('stream');

const ignoreFirst = 128,
      ignoreLast = 512;

let lastBuff,
    cnt = 0;

const myTrimmer = new Transform({
    transform(chunk, encoding, callback) {
        let len = Buffer.byteLength(chunk);
        // If we haven't ignored the first bit yet, make sure we do
        if (cnt < ignoreFirst) {
            let diff = ignoreFirst - cnt;
            // If we have more than we want to ignore, adjust pointer
            if (len > diff)
                chunk = chunk.slice(diff, len);
            // Otherwise unset chunk for later
            else
                chunk = undefined;
        }
        // Keep track of how many bytes we've seen
        cnt += len;
        // If we have nothing to push after trimming, just get out
        if (!chunk)
            return callback();
        // If we already have a saved buff, concat it with the chunk
        if (lastBuff)
            chunk = Buffer.concat([lastBuff, chunk]);
        // Get the new chunk length
        len = Buffer.byteLength(chunk);
        // If the length is no more than what we ignore at the end, save it and get out
        if (len <= ignoreLast) {
            lastBuff = chunk;
            return callback();
        }
        // Otherwise save the piece we might want to ignore and push the rest through
        lastBuff = chunk.slice(len - ignoreLast, len);
        this.push(chunk.slice(0, len - ignoreLast));
        callback();
    }
});
Then you add that to your pipeline, assuming you're reading from a file and writing to a file:
const rs = fs.createReadStream('some_file');
const ws = fs.createWriteStream('some_other_file');
rs.pipe(myTrimmer).pipe(ws);

recv with flags MSG_DONTWAIT | MSG_PEEK on TCP socket

I have a TCP stream connection used to exchange messages. This is inside the Linux kernel. The consumer thread keeps processing incoming messages. After consuming one message, I want to check if there are more pending messages, in which case I would process them too. My code to achieve this looks like below. krecv is a wrapper for sock_recvmsg() that passes the flags value through unmodified (krecv is from the ksocket kernel module).
With MSG_DONTWAIT, I expect it not to block, but apparently it blocks. With MSG_PEEK, if there is no data to be read, it should just return zero. Is this understanding correct? Is there a better way to achieve what I need here? I'm guessing this is a common requirement, since message passing across nodes is used frequently.
int recvd = 0;
do {
    recvd += krecv(*sockp, (uchar*)msg + recvd, sizeof(my_msg) - recvd, 0);
    printk("recvd = %d / %lu\n", recvd, sizeof(my_msg));
} while (recvd < sizeof(my_msg));
BUG_ON(recvd != sizeof(my_msg));

/* For some reason, below line _blocks_ even with no blocking flags */
recvd = krecv(*sockp, (uchar*)tempbuf, sizeof(tempbuf), MSG_PEEK | MSG_DONTWAIT);
if (recvd) {
    printk("more data waiting to be read");
    more_to_process = true;
} else {
    printk("NO more data waiting to be read");
}
You might check the buffer's length first:
int bytesAv = 0;
ioctl(m_Socket, FIONREAD, &bytesAv); // m_Socket is the client socket's fd
If there is data in it, then recv with MSG_PEEK will not block. If there is no data at all, then there is no need for MSG_PEEK, which might be what you want to do.
This is a very old question, but:
1. the problem persists, and
2. I ran into it.
At least for me (Ubuntu 19.04 with Python 2.7), MSG_DONTWAIT has no effect; however, if I set the timeout to zero (with the settimeout function), it works nicely. settimeout(0) puts the socket into non-blocking mode; in C the equivalent is to mark the socket non-blocking (e.g. with fcntl and O_NONBLOCK) or to set a receive timeout with setsockopt.
