NodeJS FS Write to Read inconsistent data without overflow (Solved as Buffer.toString) - node.js

I'm using the fs.createWriteStream(path[, options]) function to create a write stream splitted in text lines each ending with \n.
But, when the process ended, if I go to check the stream leater it seems to be corrupted, showing some (few) corrupted lines (like a 0.05% of the lines looks partialy cutted like a buffer overflow error).
Anyway if I grow the internal stream buffer from 16k to 4M with the highWaterMark option at creation of the streem, the error rate seems to change but do not disappear!)

It's due to an error in reading, not in writing.
I was doing something like:
var lines=[],
line='',
buf=new Buffer(8192);
while ((fs.readSync(fd,buf,0,buf.length))!=0) {
lines = buf.toString().split("\n");
lines[0]=line+lines[0];
line=lines.pop();
but this method you can find here and there on the web is really really wrong!
You have to check the real buffer lengt when you convert it to string, using buf.toString(null, 0 ,read_len)!!
var lines=[],
line='',
buf=new Buffer(8192),
read_len;
while ((read_len=fs.readSync(fd,buf,0,buf.length))!=0) {
lines = buf.toString(null, 0 ,read_len).split("\n");
lines[0]=line+lines[0];
line=lines.pop();

Related

Is there a string size limit when feeding .fromstring() method as input?

I'm working on multiple well-formed xml files, whose sizes range from 100 MB to 4 GB. My goal is to read them as strings and then import them as ElementTree objects using .fromstring() method (from xml.etree.ElementTree module).
However, as the process goes through and the string size increases, two exceptions occured related to memory restriction :
xml.etree.ElementTree.ParseError: out of memory: line 1, column 0
OverflowError: size does not fit in an int
It looks like .fromstring() method enforces a string size limit to the input, around 1GB... ?
To debug this, I wrote a short script using a for loop:
xmlFiles_list = [path1, path2, ...]
for fp in xmlFiles_list:
xml_fo = open(fp, mode='r', encoding="utf-8")
xml_asStr = xml_fo.read()
xml_fo.close()
print(len(xml_asStr.encode("utf-8")) / 10**9) # display string size in GB
try:
etree = cElementTree.fromstring(xml_asStr)
print(".fromstring() success!\n")
except Exception as e:
print(f"Error :{type(e)} {str(e)}\n")
continue
The ouput is as following :
0.895206753
.fromstring() success!
1.220224531
Error :<class 'xml.etree.ElementTree.ParseError'> out of memory: line 1, column 0
1.328233473
Erreur :<class 'xml.etree.ElementTree.ParseError'> out of memory: line 1, column 0
2.567867904
Error :<class 'OverflowError'> size does not fit in an int
4.080672538
Error :<class 'OverflowError'> size does not fit in an int
I found multiple workarounds to avoid this issue : .parse() method or lxml module for bette performance. I just hope someone could shed some light on this :
Is there a specific string size limit in xml.etree.ET module and .fromstring() method ?
Why do I end up with two different exceptions as the string size increases ? Are they related to the same memory-allocation restriction ?
Python version/system: 3.9 (64 bits)
RAM : 32go
Hope my topic is clear enough, I'm new on stackoverflow

NodeJS Writable streams writev alternating between 1 and highWaterMark chunks

So I have a stream that generates data and one that writes them to the database. Writing to the databse is slow. I use the writev function to write a batch of 3000 chunk at once.
const generator = new DataGenerator(); // extends Readable
const dbWriter = new DBWriter({ highWaterMark: 3000 }); // extends Writable, implements _writev method
pipeline(
generator,
dbWriter
)
But when I log chunk counts in the _writev method, I get following output:
1
2031
969
1
1635
1365
1
1728
1272
1
...
I understand the first line is 1. A chunk comes, DB starts writing. 2031 chunks come in the meantime.
Then DB starts writing the 2031 chunks and another 969 chunks come in the meantime, not 3000. And then in the next step, only 1 is written again. Like if receiving chunks to buffer would reset only when everything is written, not when the 3000 buffer is not full.
What I would expect:
1
2031
3000
3000
3000
...
3000
123
Why?
Well because there is no warranty that you will get 3000 chunk of data, it does tell the limit of the inner buffer that your writable stream has. It is okay that you can receive an arbitrary amount of data because read stream knows nothing about your buffer size.
Best regards.

TCP Stack Sending A Mbuf Chain: where is the loop?

I am new to BSD TCP stack code here, but we have an application that converted and uses BSD TCP stack code in the user mode. I am debugging an issue that this application would hold (or not sending data) in a strange situation. I included part of the related callstack below.
Before calling sosend in uipc_socket, I have verified that the length stored in the top, _m0->m_pkthdr.len has the right value. The chain should hold 12 pieces of TCP payload each with 1368 bytes long. In the end, my callback function only got called 10 times instead of 12 and the 10 smaller mbuf chains contains less payload data after they were combined.
I checked each function in the call stack and I could not find a loop anywhere to iterate through from the head to the end of the mbuf chain as I expected. Only loops I could find was the nested do ... while() loops in sosend_generic of uipc_socket.c, however, my code path only executed the loop once, since the resid was set to 0 immediately after the (uio == NULL) check.
\#2 0x00007ffff1acb868 in ether_output_frame (ifp=0xbf5f290, m=0x7ffff7dfeb00) at .../freebsd_plebnet/net/if_ethersubr.c:457
\#3 0x00007ffff1acb7ef in ether_output (ifp=0xbf5f290, m=0x7ffff7dfeb00, dst=0x7fffffff2c8c, ro=0x7fffffff2c70) at .../freebsd_plebnet/net/if_ethersubr.c:429
\#4 0x00007ffff1ada20b in ip_output (m=0x7ffff7dfeb00, opt=0x0, ro=0x7fffffff2c70, flags=0, imo=0x0, inp=0x7fffd409e000) at .../freebsd_plebnet/netinet/ip_output.c:663
\#5 0x00007ffff1b0743b in tcp_output (tp=0x7fffd409c000) at /scratch/vindu/ec_intg/ims/src/third-party/nse/nsnet.git/freebsd_plebnet/netinet/tcp_output.c:1288
\#6 0x00007ffff1ae2789 in tcp_usr_send (so=0x7fffd40a0000, flags=0, m=0x7ffeda1b7270, nam=0x0, control=0x0, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/netinet/tcp_usrreq.c:882
\#7 0x00007ffff1a93743 in sosend_generic (so=0x7fffd40a0000, addr=0x0, uio=0x0, top=0x7ffeda1b7270, control=0x0, flags=128, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/kern/uipc_socket.c:1390
\#8 0x00007ffff1a9387d in sosend (so=0x7fffd40a0000, addr=0x0, uio=0x0, top=0x7ffeda1b7270, control=0x0, flags=128, td=0x7ffff1d5f3d0) at .../freebsd_plebnet/kern/uipc_socket.c:1434

Reading console output from mplayer to parse track's position/length

When you run mplayer, it will display the playing track's position and length (among some other information) through, what I'd assume is, stdout.
Here's a sample output from mplayer:
MPlayer2 2.0-728-g2c378c7-4+b1 (C) 2000-2012 MPlayer Team
Cannot open file '/home/pi/.mplayer/input.conf': No such file or directory
Failed to open /home/pi/.mplayer/input.conf.
Cannot open file '/etc/mplayer/input.conf': No such file or directory
Failed to open /etc/mplayer/input.conf.
Playing Bomba Estéreo - La Boquilla [Dixone Remix].mp3.
Detected file format: MP2/3 (MPEG audio layer 2/3) (libavformat)
[mp3 # 0x75bc15b8]max_analyze_duration 5000000 reached
[mp3 # 0x75bc15b8]Estimating duration from bitrate, this may be inaccurate
[lavf] stream 0: audio (mp3), -aid 0
Clip info:
album_artist: Bomba Estéreo
genre: Latin
title: La Boquilla [Dixone Remix]
artist: Bomba Estéreo
TBPM: 109
TKEY: 11A
album: Unknown
date: 2011
Load subtitles in .
Selected audio codec: MPEG 1.0/2.0/2.5 layers I, II, III [mpg123]
AUDIO: 44100 Hz, 2 ch, s16le, 320.0 kbit/22.68% (ratio: 40000->176400)
AO: [pulse] 44100Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
A: 47.5 (47.4) of 229.3 (03:49.3) 4.1%
The last line (A: 47.5 (47.4) of 229.3 (03:49.3) 4.1%) is what I'm trying to read but, for some reason, it's never received by the Process.OutputDataReceived event handler.
Am I missing something? Is mplayer using some non-standard way of outputting the "A:" line to the console?
Here's the code in case it helps:
Public Overrides Sub Play()
player = New Process()
player.EnableRaisingEvents = True
With player.StartInfo
.FileName = "mplayer"
.Arguments = String.Format("-ss {1} -endpos {2} -volume {3} -nolirc -vc null -vo null ""{0}""",
tmpFileName,
mTrack.StartTime,
mTrack.EndTime,
100)
.CreateNoWindow = False
.UseShellExecute = False
.RedirectStandardOutput = True
.RedirectStandardError = True
.RedirectStandardInput = True
End With
AddHandler player.OutputDataReceived, AddressOf DataReceived
AddHandler player.ErrorDataReceived, AddressOf DataReceived
AddHandler player.Exited, Sub() KillPlayer()
player.Start()
player.BeginOutputReadLine()
player.BeginErrorReadLine()
waitForPlayer.WaitOne()
KillPlayer()
End Sub
Private Sub DataReceived(sender As Object, e As DataReceivedEventArgs)
If e.Data = Nothing Then Exit Sub
If e.Data.Contains("A: ") Then
' Parse the data
End If
End Sub
Apparently, the only solution is to run mplayer in "slave" mode, as explained here: http://www.mplayerhq.hu/DOCS/tech/slave.txt
In this mode we can send commands to mplayer (via stdin) and the response (if any) will be sent via stdout.
Here's a very simple implementation that displays mplayer's current position (in seconds):
using System;
using System.Threading;
using System.Diagnostics;
using System.Collections.Generic;
namespace TestMplayer {
class MainClass {
private static Process player;
public static void Main(string[] args) {
String fileName = "/home/pi/Documents/Projects/Raspberry/RPiPlayer/RPiPlayer/bin/Electronica/Skrillex - Make It Bun Dem (Damian Marley) [Butch Clancy Remix].mp3";
player = new Process();
player.EnableRaisingEvents = true;
player.StartInfo.FileName = "mplayer";
player.StartInfo.Arguments = String.Format("-slave -nolirc -vc null -vo null \"{0}\"", fileName);
player.StartInfo.CreateNoWindow = false;
player.StartInfo.UseShellExecute = false;
player.StartInfo.RedirectStandardOutput = true;
player.StartInfo.RedirectStandardError = true;
player.StartInfo.RedirectStandardInput = true;
player.OutputDataReceived += DataReceived;
player.Start();
player.BeginOutputReadLine();
player.BeginErrorReadLine();
Thread getPosThread = new Thread(GetPosLoop);
getPosThread.Start();
}
private static void DataReceived(object o, DataReceivedEventArgs e) {
Console.Clear();
Console.WriteLine(e.Data);
}
private static void GetPosLoop() {
do {
Thread.Sleep(250);
player.StandardInput.Write("get_time_pos" + Environment.NewLine);
} while(!player.HasExited);
}
}
}
I found the same problem with another application that works more or less in a similar way (dbPowerAmp), in my case, the problem was that the process output uses Unicode encoding to write the stdout buffer, so I have to set the StandardOutputEncoding and StandardError to Unicode to be able start reading.
Your problem seems to be the same, because if "A" cannot be found inside the output that you published which clearlly shows that existing "A", then probably means that the character differs when reading in the current encoding that you are using to read the output.
So, try setting the proper encoding when reading the process output, try setting them to Unicode.
ProcessStartInfo.StandardOutputEncoding
ProcessStartInfo.StandardErrorEncoding
Using "read" instead of "readline", and treating the input as binary, will probably fix your problem.
First off, yes, mplayer slave mode is probably what you want. However, if you're determined to parse the console output, it is possible.
Slave mode exists for a reason, and if you're half serious about using mplayer from within your program, it's worth a little time to figure out how to properly use it. That said, I'm sure there's situations where the wrapper is the appropriate approach. Maybe you want to pretend that mplayer is running normally, and control it from the console, but secretly monitor file position to resume it later? The wrapper might be easier than translating all of mplayers keyboard commands into slave mode commands?
Your problem is likely that you're trying to use "readline" from within python on an endless line. That line of output contains \r instead of \n as the line separator, so readline will treat it as a single endless line. sed also fails this way, but other commands (such as grep) treat \r as \n under some circumstances.
Handling of \r is inconsistent, and can't be relied on. For instance, my version of grep treats \r as \n when matching IF output is a console, and uses \n to seperate the output. But if output is a pipe, it treats it as any other character.
For instance:
mplayer TMBG-Older.mp3 2>/dev/null | tr '\r' '\n' | grep "^A: " | sed 's/^A: *\([0-9.]*\) .*/\1/' | tail -n 1
I'm using "tr" here to force it to '\n', so other commands in the pipe can deal with it in a consistent manner.
This pipeline of commands outputs a single line, containing ONLY the ending position in seconds, with decimal point. But if you remove the "tr" command from this pipe, bad things happen. On my system, it shows only "0.0" as the position, as "sed" doesn't deal well with the '\r' line separators, and ALL the position updates are treated as the same line.
I'm fairly sure python doesn't handle \r well either, and that's likely your problem. If so, using "read" instead of "readline" and treating it like binary is probably the correct solution.
There are other problems with this approach though. Buffering is a big one. ^C causes this command to output nothing, mplayer must quit gracefully to show anything at all, as pipelines buffers things, and buffers get discarded on SIGINT.
If you really wanted to get fancy, you could probably cat several input sources together, tee the output several ways, and REALLY write a wrapper around mplayer. A wrapper that's fragile, complicated, and might break every time mplayer is updated, a user does something unexpected, or the name of the file being played contains something weird, SIGSTOP or SIGINT. And probably other things that I haven't though of.

When changing a file name, Recording Start is overdue for 3 seconds.

Using Two ASFWriter Filters in a graph.One is making wmv file,
Anather is for live streaming.
Carrying out streaming,
When changing a file name, Recording Start is overdue for 3 seconds.
so,The head of a New WMV is missing.
It's troubled.
CAMERA ------ InfTee Filter --- --- AsfWriter Filter → WMV FIle
X
Microphone --- InfTee Filter2 --- --- AsfWriter Filter2 → Live Streaming
void RecStart()
{
...
ConnectFilters(pInfTee,"Infinite Pin Tee Filter(1)",L"Output1",pASFWriter,"ASFWriter",L"Video Input 01"));
ConnectFilters(pInfTee,"Infinite Pin Tee Filter(2)",L"Output2",pASFWriter2,"ASFWriter",L"Video Input 01"));
ConnectFilters(pSrcAudio,"Audio Source",L"Capture",pInfTee2,"Infinite Pin Tee Filter",L"Input"));
ConnectFilters(pInfTee2,"Infinite Pin Tee Filter(1)A",L"Output1",pASFWriter,"ASFWriter",L"Audio Input 01"));
ConnectFilters(pInfTee2,"Infinite Pin Tee Filter(2)A",L"Output2",pASFWriter2,"ASFWriter",L"Audio Input 01"));
pASFWriter2->QueryInterface(IID_IConfigAsfWriter,(void**)&pConfig);
pConfig->QueryInterface(IID_IServiceProvider,(void**)&pProvider);
pProvider->QueryService(IID_IWMWriterAdvanced2, IID_IWMWriterAdvanced2, (void**)&mpWriter2);
mpWriter2->SetLiveSource(TRUE);
mpWriter2->RemoveSink(0);
WMCreateWriterNetworkSink(&mpNetSink);
DWORD dwPort = (DWORD)streamingPortNo;
mpNetSink->Open(&dwPort);
mpNetSink->GetHostURL(url, &url_len);
hr =mpWriter2->AddSink(mpNetSink);
pGraph->QueryInterface(IID_IMediaEventEx,(void **)&pMediaIvent);
pMediaIvent->SetNotifyWindow((OAHWND)this->m_hWnd,WM_GRAPHNOTIFY,0);
pGraph->QueryInterface(IID_IMediaControl,(void **)&pMediaControl);
pMediaControl->Run();
}
void OnTimer()
{
pMediaControl->Stop();
CComQIPtr<IFileSinkFilter,&IID_IFileSinkFilter> pIFS = pASFWriter;
pIFS->SetFileName(NewFilename,NULL);
pMediaControl->Run();
}
---------------------------------------------------------------------------
→ I think ... In order to wait for starting of streaming,
it is missing for 3 seconds in head of New WMV File.
Are there any measures?
---------------------------------------------------------------------------
When you restart the graph, you inevitably miss a fragment of data due to initialization overhead. And, it is impossible to switch files without stopping the graph. The solution is to use multiple graphs and keep capturing while the part with file writing is being reinitialized.
See DirectShow Bridges for a typical solution addressing this problem.

Resources