fs.readFile can't read file paths on windows, wrong encoding?


I'm passing the file path string "C:\random-folder\ui-debug.log" to fs.readFile() on a virtualized Windows 11 installation running under macOS, with an x64 Electron build and Node.js 18 (I don't know if that matters).
Using the following code:
const fs = require('fs')
const path = require('path')
const filepath = path.resolve(path.normalize(pathStr))
const file = fs.readFileSync(filepath)
but I always get this error:
TypeError [ERR_INVALID_ARG_VALUE]: The argument 'path' must be a string or Uint8Array without null bytes. Received 'C:\\random-folder\\ui-debug.log\x00'
readFileSync always seems to misinterpret the file path: a \x00 is always appended to the path it reports in the error. The same code works fine on macOS.
What is the correct way to read arbitrary file paths with nodejs/electron under Windows? Thanks!

The issue was with how the string was produced in the first place. Before being passed to readFileSync, the path was read from the clipboard and decoded as UCS-2:
const pathStr = clipboard.readBuffer('FileNameW').toString('ucs2')
On Windows, the decoded string ended with an extra null character (the \x00 shown in the error message). console.log does not display it, so the path looked normal in the console even though it internally carried that invisible trailing character.
I solved it with a regex that strips the null characters:
const pathStr = clipboard.readBuffer('FileNameW').toString('ucs2').replace(RegExp(String.fromCharCode(0), 'g'), '')
Now readFileSync reads the file just fine!
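To reproduce the problem without Electron's clipboard, here is a minimal Node.js sketch. It assumes, as the error message suggests, that the clipboard buffer holds a UTF-16LE ("ucs2") string ending in a NUL character; the buffer is built by hand rather than read from clipboard.readBuffer('FileNameW'), and decodeWidePath is a hypothetical helper name.

```javascript
const fs = require('fs');

// Hypothetical helper: decode a UTF-16LE ("ucs2") buffer and drop every NUL
// character, such as a trailing string terminator.
function decodeWidePath(buf) {
  return buf.toString('ucs2').replace(/\0/g, '');
}

// Simulated clipboard contents: the path followed by a NUL terminator.
const raw = Buffer.from('C:\\random-folder\\ui-debug.log\0', 'ucs2');

const naive = raw.toString('ucs2');
console.log(naive);                 // looks like a normal path in the console
console.log(JSON.stringify(naive)); // escaping reveals the trailing \u0000

// fs rejects any path containing a null byte before touching the disk.
let errorCode = null;
try {
  fs.readFileSync(naive);
} catch (err) {
  errorCode = err.code;
}
console.log(errorCode); // ERR_INVALID_ARG_VALUE

const cleaned = decodeWidePath(raw);
console.log(cleaned); // C:\random-folder\ui-debug.log
```

The literal regex /\0/g is equivalent to the answer's RegExp(String.fromCharCode(0), 'g'); both remove every NUL, not only a trailing one.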

Related

Comparing fs.readdirSync to what should be the same fs.readdirSync on MacOS vs Windows

Unexpected behavior when comparing two folders containing the same filenames on MacOS. On windows the comparison works. On MacOS .includes is never true.
Steps to reproduce:
Create 2 separate folders with filenames containing special characters ä, ö, ü.
for example:
'Aktivität.json',
'Anfängerin.json',
'Arbeitsgerät.json',
'Augenhöhle.json',
'Ausländer.json',
'Ängstlichkeit.json',
'Ärger.json',
'Ärztin.json',
'Bankdrücken.json',
'Bauchspeicheldrüse.json',
'Bäckerei.json'
Create and run this Node script:
import fs from "fs";
const dir = "../path/";
const path = `${dir}folder1/`;
let files = fs.readdirSync(path); // readdirSync is synchronous, so no await is needed
const pathDone = `${dir}folder2/`;
const filesDone = fs.readdirSync(pathDone);
console.log(files.length, filesDone.length);
// Keep the .json files from folder1 that are not in folder2
files = files.filter((val) => !filesDone.includes(val) && val.includes(".json"));
console.log(files);
console.log(filesDone);
I know it must have to do with how the filenames are encoded, but why does comparing two lists of identical filenames not work?

I still have no idea what is causing this, but it has nothing to do with the files coming from Windows. Some of the files were downloaded from my server using Cyberduck, and there seems to be a problem with how their filenames are saved (one that is not visible to me in Node, the terminal, or Finder). I am now just going to use the scp command instead of Cyberduck for this task.
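The answer leaves the cause open. One common explanation for this exact symptom (names that print identically but never compare equal) is Unicode normalization: "ä" can be stored either as the single code point U+00E4 (NFC) or as "a" followed by the combining diaeresis U+0308 (NFD), and some macOS file systems and file-transfer tools can produce the decomposed form. The sketch below assumes that is what happened and normalizes both listings to NFC before comparing; missingFiles and compareDirs are hypothetical helper names.

```javascript
const fs = require('fs');

// Return the .json names in `files` that are absent from `filesDone`,
// comparing in Unicode NFC form so that precomposed and decomposed
// spellings of the same name count as equal.
function missingFiles(files, filesDone) {
  const done = new Set(filesDone.map((name) => name.normalize('NFC')));
  return files.filter(
    (name) => name.endsWith('.json') && !done.has(name.normalize('NFC'))
  );
}

// Hypothetical wrapper for the question's two folders.
function compareDirs(dirA, dirB) {
  return missingFiles(fs.readdirSync(dirA), fs.readdirSync(dirB));
}

// Demonstration without touching the disk:
const nfc = 'Bäckerei.json'.normalize('NFC'); // "ä" as one code point
const nfd = 'Bäckerei.json'.normalize('NFD'); // "a" + combining diaeresis
console.log(nfc === nfd); // false, although both print the same

const result = missingFiles([nfc, 'Ärger.json'], [nfd]);
console.log(result); // [ 'Ärger.json' ]
```

Using a Set keeps the lookup cheap for large folders; the plain includes() from the question would work too once both sides are normalized.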

Reading file using python is not working properly in Linux

I'm running Python code that reads a fixed-width file extracted from an FTP server. The code works on Windows without any issue, but when I run the same code on a Linux EC2 instance it fails with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 1819: invalid start byte". Since I don't know the encoding of the source file, I pass None as the encoding. That also works fine on Windows, but on Linux it gives an error saying "encoding type None is not recognize".
I am using the codecs library to read the file, and my Python version is 3.7.3.
import codecs

# data_type_length: list of field widths, defined elsewhere in the script
with codecs.open("recode.dat", encoding=None, errors='replace') as open_src:
    with open("target_file.dat", 'w+', encoding=None) as open_tgt:
        for src_rec in open_src:
            new_rec = ''
            for f_length in data_type_length:
                f_length = int(f_length)
                field = '"' + src_rec[:f_length].strip() + '"|'
                new_rec += field
                src_rec = src_rec[f_length:]
            open_tgt.write(new_rec[:-1] + '\n')

require errors show private instead of actual file path

Today I got surprised by the following (you can try at node repl):
require("/tmp/bad.json")
SyntaxError: /private/tmp/bad.json: Unexpected token n in JSON at position 3
As you can see, I (intentionally) required a JSON file that contains a syntax error. However, the error message does not show the actual file path, which starts with /tmp/; instead, the path has been prefixed with /private.
Why is this?
I'm using node v8.15.0

This has nothing to do with Node or the Node version, but with the operating system. I was using OS X, where /tmp is just a symbolic link to /private/tmp, so when the error was triggered, the message showed the actual (resolved) path.
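The same effect can be reproduced without macOS's /tmp link. Node's CommonJS loader resolves symlinks to the real path before loading a module (unless started with --preserve-symlinks), and JSON parse errors from require() are prefixed with that resolved filename. This sketch assumes a platform where fs.symlinkSync can create a directory link (on Windows this may need extra privileges); the directory and file names are made up.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Made-up layout: <base>/real/bad.json, and <base>/alias -> <base>/real
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'require-demo-'));
const realDir = path.join(base, 'real');
const aliasDir = path.join(base, 'alias');
fs.mkdirSync(realDir);
fs.symlinkSync(realDir, aliasDir, 'dir');
fs.writeFileSync(path.join(realDir, 'bad.json'), '{ not json }');

let message = '';
try {
  // Require the broken file through the symlinked directory.
  require(path.join(aliasDir, 'bad.json'));
} catch (err) {
  message = err.message;
}

// The loader resolved the symlink, so the message starts with the real path.
// (On macOS, fs.realpathSync('/tmp') likewise returns '/private/tmp'.)
const realFile = fs.realpathSync(path.join(aliasDir, 'bad.json'));
console.log(message);
console.log(message.startsWith(realFile)); // true

fs.rmSync(base, { recursive: true, force: true });
```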

Tesseract not using path variable

Why does my Tesseract instance require me to explicitly set my datapath, but doesn't want to read the environment variable?
Let me clarify: running the code
ITesseract tesseract = new Tesseract();
String result = tesseract.doOCR(myImage);
Throws an error:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the
parent directory of your "tessdata" directory.
I already have set my environment variable, ie doing
echo $TESSDATA_PREFIX returns /usr/share/tessdata/
Now, setting the path variable explicitly in my code, ie:
ITesseract tesseract = new Tesseract();
tesseract.setDatapath("/usr/share/tessdata/");
String result = tesseract.doOCR(myImage);
WORKS PERFECTLY. Why?
I'm using Manjaro 17.0.5

The library was initially designed to use the data files bundled in its tessdata folder. In your case, if you want to read from the standard tessdata directory, you would want to set datapath as follows:
tesseract.setDatapath(System.getenv("TESSDATA_PREFIX"));

How does Python's Pyminizip compress_multiple work?

My Python version is 3.5 (Anaconda) on Windows 10. I'm using Pyminizip because I need password protection for my zip files, and zipfile doesn't support it yet.
I am able to zip a single file with pyminizip.compress, and the encryption works as expected. However, whenever I use pyminizip.compress_multiple, Python crashes, and I believe it's because my input is in the wrong format.
What I would like to know is: What's the acceptable format for input argument src file LIST path? From Pyminizip's documentation:
pyminizip.compress_multiple([u'pyminizip.so', 'file2.txt'], "file.zip", "1233", 4, progress)
Args:
1. src file LIST path (list)
2. dst file path (string)
3. password (string) or None (to create no-password zip)
4. compress_level(int) between 1 to 9, 1 (more fast) <---> 9 (more compress)
It seems the first argument src file LIST path should be a list containing all files required to be zipped. Accordingly, I tried to use compress_multiple to compress single file with command:
pyminizip.compress_multiple( ['Filename.txt'], 'output.zip', 'password', 4, optional)
and it led to a Python crash. So I tried adding the full path to the arguments:
pyminizip.compress_multiple( [os.getcwd(), 'Filename.txt'], ... )
and it still crashed. So I thought maybe I had to split the path like this:
path = os.getcwd().split( os.sep )
pyminizip.compress_multiple( [path, 'Filename.txt'], ...)
Still no luck. Any ideas?

Pyminizip needs each list entry to be the path of a file, either absolute or relative to the directory the script is run from.
Your example:
pyminizip.compress_multiple( [os.getcwd(), 'Filename.txt'], ... )
passes a list of two entries: os.getcwd() (a directory) and 'Filename.txt'. You need to combine them into a single path using os.path.join().
For your single-file example, you need:
pyminizip.compress_multiple( [os.path.join(os.getcwd(), 'Filename.txt')],...)
and for several files:
pyminizip.compress_multiple( [os.path.join(os.getcwd(), 'Filename1.txt'), os.path.join(os.getcwd(), 'Filename2.txt')],...)

According to https://pypi.org/project/pyminizip/, the usage of compress_multiple is:
pyminizip.compress_multiple([u'pyminizip.so', 'file2.txt'], [u'/path_for_file1', u'/path_for_file2'], "file.zip", "1233", 4, progress)
The second parameter is a bit confusing, but if used, it will create a zip file, which when uncompressed, will create a directory structure like:

Develop Reference