NodeJS Writable streams writev alternating between 1 and highWaterMark chunks

So I have a stream that generates data and one that writes it to the database. Writing to the database is slow. I use the writev function to write a batch of 3000 chunks at once.
const generator = new DataGenerator(); // extends Readable
const dbWriter = new DBWriter({ highWaterMark: 3000 }); // extends Writable, implements _writev method
pipeline(
  generator,
  dbWriter
)
But when I log the chunk counts in the _writev method, I get the following output:
1
2031
969
1
1635
1365
1
1728
1272
1
...
I understand the first line being 1: a chunk arrives, the DB starts writing it, and 2031 chunks arrive in the meantime.
Then the DB starts writing those 2031 chunks and another 969 chunks arrive in the meantime, not 3000. And then in the next step, only 1 chunk is written again. It is as if the buffer only started accepting new chunks once everything had been written, rather than whenever the 3000-chunk buffer is not full.
What I would expect:
1
2031
3000
3000
3000
...
3000
123
Why?

Well, because there is no guarantee that you will get 3000 chunks of data; highWaterMark only sets the limit of the internal buffer that your writable stream has. It is normal to receive an arbitrary number of chunks, because the readable stream knows nothing about your buffer size.
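To see this in action, here is a minimal, self-contained sketch (a stand-in for the asker's DataGenerator and DBWriter, not the original code) of an object-mode Writable that implements writev: each batch it receives is simply whatever accumulated in the internal buffer while the previous write was pending, anywhere from a single chunk up to highWaterMark.
const { Readable, Writable, pipeline } = require('stream');

// Hypothetical stand-in for DataGenerator: an object-mode source that
// produces items in bursts, yielding to the event loop every 500 items.
const generator = Readable.from(async function* () {
  for (let i = 0; i < 20000; i++) {
    if (i % 500 === 0) await new Promise(resolve => setImmediate(resolve));
    yield { n: i };
  }
}());

// Stand-in for DBWriter: only writev is implemented, so single chunks also
// arrive here (as a one-element array) via the default _write fallback.
const dbWriter = new Writable({
  objectMode: true,
  highWaterMark: 3000,
  writev(chunks, callback) {
    console.log(chunks.length); // whatever was buffered: 1 up to highWaterMark
    setTimeout(callback, 50);   // simulate a slow database write
  },
});

pipeline(generator, dbWriter, (err) => {
  if (err) console.error(err);
});
The logged batch sizes will typically bounce around much like the output in the question, because _writev drains whatever happens to be queued at that moment rather than waiting for a full 3000-chunk buffer.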
Best regards.

Related

Fortran function to start reading from specific line of input file?

I am working with a very large input file but the information I need only starts halfway through the file. Is there a way to start reading the input file at a certain line?
I am currently reading each line from the start of the file just to skip it but this can take quite a while depending on the size of the input file.
implicit none
integer :: i
open(99, file = 'input.dat')
do i = 1, 10000
  read(99,*)
end do ! skips the first 10,000 lines of file input.dat
This works, but when I want to skip, say, 95,038,000 lines, it takes a considerable amount of time. Since I know which line I want to start reading from, is there a way to start from that line?
My input file looks like this (... indicating much of the same):
100
1 0.01 20000 20000
He 51.71286 -72.51866 -18.82236
He 26.74500 -55.83966 -21.50548
He 54.21926 10.26991 55.95801
...
He 53.88083 36.44334 -12.26679
He -73.74439 -15.63201 -73.70352
He -64.81084 -24.94384 -76.42190
100
2 0.01 20000 20000
He -75.32897 -18.60672 25.41119
He -26.30221 -58.53324 -61.39479
He -64.44293 -28.82557 -15.57422
...

MicroPython client does not receive text file larger than 4 KB (4096 bytes) from Python server

I have a MicroPython client on an ESP32 board and a Python server on Linux. I am trying to send a 5.5 KB text file from the Python server to the MicroPython client. It sends successfully, but the MicroPython client does not receive all the data. The code is as follows:
Python Server:
with open('downloads/%s' % (request_path), 'rb') as f:
    data = f.read()
    self.wfile.write(data)  # data is 5.5 KB
MicroPython Client
recvData = sock.read(4096).decode('utf-8').split("\r\n")
print("Response_Received:: %s" % recvData)
sock.close()
Response_Received:: ['HTTP/1.0 200 OK', 'Server: SimpleHTTP/0.6 Python/3.5.3', 'Date: Sat, 09 Jun 2018 09:29:41 GMT', '', '# Ity: asdasd\n# ksduygfkhsgdkjfksjdhfg\n kjdhsbfkjdhsbfkjcbsdjkvbjcxbvhweioufhoiweuoiruy98\n ...']
(The last element is several kilobytes of repetitive filler text from the file, and the output is cut off mid-content.)
The client receives only 4140 bytes of the data because of the buffer size (4096), so the 4th element of recvData is truncated. MicroPython does not accept more than this buffer size in a single read. How can I receive all of my data (5.5 KB) in the 4th element of the recvData array without any loss?
I have tried to read the data in fragments, but it was not successful:
fragments = []
while True:
    chunck = s.recv(4096)
    if not chunck:
        break
    fragments.append(chunck)
Since your goal is to write the file to the filesystem, the simplest solution is to stop trying to hold the entire file in memory. Instead of building up your fragments array, just write the received chunks to a file:
with open('datafile', 'wb') as fd:  # binary mode, since recv() returns bytes
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        fd.write(chunk)
This requires a constant amount of memory and can be used to receive files of arbitrary size.

TCP Stack Sending A Mbuf Chain: where is the loop?

I am new to the BSD TCP stack code, but we have an application that ported the BSD TCP stack and uses it in user mode. I am debugging an issue where this application stalls (stops sending data) in a strange situation. I have included part of the related call stack below.
Before calling sosend in uipc_socket, I verified that the length stored in the top mbuf, _m0->m_pkthdr.len, has the right value. The chain should hold 12 pieces of TCP payload, each 1368 bytes long. In the end, my callback function only got called 10 times instead of 12, and the 10 smaller mbuf chains contained less payload data after they were combined.
I checked each function in the call stack and could not find a loop anywhere that iterates from the head to the end of the mbuf chain, as I expected. The only loops I could find were the nested do ... while() loops in sosend_generic of uipc_socket.c; however, my code path only executed the loop once, since resid was set to 0 immediately after the (uio == NULL) check.
#2 0x00007ffff1acb868 in ether_output_frame (ifp=0xbf5f290, m=0x7ffff7dfeb00) at .../freebsd_plebnet/net/if_ethersubr.c:457
#3 0x00007ffff1acb7ef in ether_output (ifp=0xbf5f290, m=0x7ffff7dfeb00, dst=0x7fffffff2c8c, ro=0x7fffffff2c70) at .../freebsd_plebnet/net/if_ethersubr.c:429
#4 0x00007ffff1ada20b in ip_output (m=0x7ffff7dfeb00, opt=0x0, ro=0x7fffffff2c70, flags=0, imo=0x0, inp=0x7fffd409e000) at .../freebsd_plebnet/netinet/ip_output.c:663
#5 0x00007ffff1b0743b in tcp_output (tp=0x7fffd409c000) at /scratch/vindu/ec_intg/ims/src/third-party/nse/nsnet.git/freebsd_plebnet/netinet/tcp_output.c:1288
#6 0x00007ffff1ae2789 in tcp_usr_send (so=0x7fffd40a0000, flags=0, m=0x7ffeda1b7270, nam=0x0, control=0x0, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/netinet/tcp_usrreq.c:882
#7 0x00007ffff1a93743 in sosend_generic (so=0x7fffd40a0000, addr=0x0, uio=0x0, top=0x7ffeda1b7270, control=0x0, flags=128, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/kern/uipc_socket.c:1390
#8 0x00007ffff1a9387d in sosend (so=0x7fffd40a0000, addr=0x0, uio=0x0, top=0x7ffeda1b7270, control=0x0, flags=128, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/kern/uipc_socket.c:1434

DocumentDB performance issues

When running DocumentDB queries from C# code on my local computer, a simple DocumentDB query takes about 0.5 seconds on average. As another example, getting a reference to a document collection takes about 0.7 seconds on average. Is this to be expected? Below is my code for checking whether a collection exists; it is pretty straightforward, but is there any way of improving the poor performance?
// Create a new instance of the DocumentClient
var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);
// Get the database with the id=FamilyRegistry
var database = client.CreateDatabaseQuery().Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();
var stopWatch = new Stopwatch();
stopWatch.Start();
// Get the document collection with the id=FamilyCollection
var documentCollection = client.CreateDocumentCollectionQuery("dbs/"
+ database.Id).Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();
stopWatch.Stop();
// Get the elapsed time as a TimeSpan value.
var ts = stopWatch.Elapsed;
// Format and display the TimeSpan value.
var elapsedTime = String.Format("{0:00} seconds, {1:00} milliseconds",
ts.Seconds,
ts.Milliseconds );
Console.WriteLine("Time taken to get a document collection: " + elapsedTime);
Console.ReadKey();
Average output on local computer:
Time taken to get a document collection: 0 seconds, 752 milliseconds
In another piece of my code I'm doing 20 small document updates that are about 400 bytes each in JSON size and it still takes 12 seconds in total. I'm only running from my development environment but I was expecting better performance.
In short, this can be done end to end in ~9 milliseconds with DocumentDB. I'll walk through the changes required, and why/how they impact results below.
The very first query always takes longer in DocumentDB because it does some setup work (fetching physical addresses of DocumentDB partitions). The next couple requests take a little bit longer to warm the connection pools. The subsequent queries will be as fast as your network (the latency of reads in DocumentDB is very low due to SSD storage).
For example, if you modify your code above to measure 10 readings instead of just the first one, as shown below:
using (DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey))
{
    long totalRequests = 10;
    var database = client.CreateDatabaseQuery().Where(db => db.Id == "FamilyRegistry").AsEnumerable().FirstOrDefault();
    Stopwatch watch = new Stopwatch();
    for (int i = 0; i < totalRequests; i++)
    {
        watch.Start();
        var documentCollection = client.CreateDocumentCollectionQuery("dbs/" + database.Id)
            .Where(c => c.Id == "FamilyCollection").AsEnumerable().FirstOrDefault();
        Console.WriteLine("Finished read {0} in {1}ms ", i, watch.ElapsedMilliseconds);
        watch.Reset();
    }
}
Console.ReadKey();
I get the following results running from my desktop in Redmond against the Azure West US data center, i.e. about 50 milliseconds. These numbers may vary based on the network connectivity and distance of your client from the Azure DC hosting DocumentDB:
Finished read 0 in 217ms
Finished read 1 in 46ms
Finished read 2 in 51ms
Finished read 3 in 47ms
Finished read 4 in 46ms
Finished read 5 in 93ms
Finished read 6 in 48ms
Finished read 7 in 45ms
Finished read 8 in 45ms
Finished read 9 in 51ms
Next, I switch to Direct/TCP connectivity from the default of Gateway to improve the latency from two hops to one, i.e., change the initialization code to:
using (DocumentClient client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey, new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp }))
Now the operation to find the collection by ID completes within 23 milliseconds:
Finished read 0 in 197ms
Finished read 1 in 117ms
Finished read 2 in 23ms
Finished read 3 in 23ms
Finished read 4 in 25ms
Finished read 5 in 23ms
Finished read 6 in 31ms
Finished read 7 in 23ms
Finished read 8 in 23ms
Finished read 9 in 23ms
How about when you run the same test from an Azure VM or Worker Role also running in the same Azure DC? The same operation completes in about 9 milliseconds!
Finished read 0 in 140ms
Finished read 1 in 10ms
Finished read 2 in 8ms
Finished read 3 in 9ms
Finished read 4 in 9ms
Finished read 5 in 9ms
Finished read 6 in 9ms
Finished read 7 in 9ms
Finished read 8 in 10ms
Finished read 9 in 8ms
So, to summarize:
For performance measurements, please allow for a few measurement samples to account for startup/initialization of the DocumentDB client.
Please use TCP/Direct connectivity for lowest latency.
When possible, run within the same Azure region.
If you follow these steps, you should be able to get the best performance numbers with DocumentDB.

NodeJS FS Write to Read inconsistent data without overflow (Solved as Buffer.toString)

I'm using the fs.createWriteStream(path[, options]) function to create a write stream of text lines, each ending with \n.
But when the process has ended, if I go and check the file later, it seems to be corrupted, showing a few corrupted lines (about 0.05% of the lines look partially cut, as in a buffer overflow error).
And if I grow the internal stream buffer from 16k to 4M with the highWaterMark option when creating the stream, the error rate seems to change but does not disappear.
It's due to an error in reading, not in writing.
I was doing something like:
var lines = [],
    line = '',
    buf = new Buffer(8192);
while ((fs.readSync(fd, buf, 0, buf.length)) != 0) {
  lines = buf.toString().split("\n");
  lines[0] = line + lines[0];
  line = lines.pop();
}
But this method, which you can find here and there on the web, is really, really wrong!
You have to account for the real number of bytes read when you convert the buffer to a string, using buf.toString(null, 0, read_len):
var lines = [],
    line = '',
    buf = new Buffer(8192),
    read_len;
while ((read_len = fs.readSync(fd, buf, 0, buf.length)) != 0) {
  lines = buf.toString(null, 0, read_len).split("\n");
  lines[0] = line + lines[0];
  line = lines.pop();
}
