Alternative to fs.readdirSync for a large directory in Node.js

I have a single directory with a few million json files in it. I ultimately want to iterate over each file in the directory, read it, do something with the information and then write something into a database.
My script works perfectly when I use a test directory with a few hundred files. However, it stalls when I use the real directory. I strongly believe that I have pinpointed the problem to the use of:
fs.readdirSync('my dir path')
Converting this to the async version would not help anything, since I need the file names before anything else can happen anyway. However, my belief is that this operation hangs because it simply "takes too long" to read the entire directory.
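For what it's worth, the plain async form shows why switching alone would not help: the callback still fires only once, with the complete listing, after the whole directory has been read (a sketch with my placeholder path):
var fs = require('fs');

fs.readdir('my dir path', function (err, files) {
    if (err) throw err;
    // files is one array holding every entry, so with millions of files
    // everything is buffered in memory before any of it can be used
    console.log(files.length);
});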
For reference here is a broader portion of the function:
var fs = require('fs');

function traverseFS() {
    var path = 'my dir name and path';
    var files = fs.readdirSync(path);
    for (var i = 0; i < files.length; i++) {
        var currentFile = path + '/' + files[i]; // full path of the current file
        var fileText = fs.readFileSync(currentFile, 'utf8');
        var json = JSON.parse(fileText);
        if (json) {
            // do something
        }
    }
}
My question is either:
Is there something I can do to get this to work using readdirSync?
Is there another operation I should be using?

You would need to either use a child process (easiest) that creates a directory listing and parse that, or write your own streamable binding to scandir() (on *nix) and/or whatever the equivalent is on Windows. For the latter, you may want to use the libuv code (*nix, Windows) as a guide.
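A minimal sketch of the child-process approach, assuming a *nix system and a hypothetical directory path (ls -1 prints one name per line, and the stdout stream can be parsed incrementally as it arrives):
var spawn = require('child_process').spawn;

var dir = '/my/dir'; // hypothetical path
var ls = spawn('ls', ['-1', dir]); // on GNU ls, adding -f also skips sorting the whole listing
var leftover = '';

ls.stdout.setEncoding('utf8');
ls.stdout.on('data', function (chunk) {
    var lines = (leftover + chunk).split('\n');
    leftover = lines.pop(); // the last element may be a partial name; keep it for the next chunk
    lines.forEach(function (name) {
        if (name) handleFile(dir + '/' + name); // handleFile is your per-file work (hypothetical)
    });
});
ls.on('close', function (code) {
    if (leftover) handleFile(dir + '/' + leftover);
    console.log('listing finished with exit code ' + code);
});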

Related

How do I get the filename of an open std::fs::File in Rust?

I have an open std::fs::File, and I want to get its filename, e.g. as a PathBuf. How do I do that?
The simple solution would be to just save the path used in the call to File::open. Unfortunately, this does not work for me. I am trying to write a program that reads log files, and the program that writes the logs keeps changing the filenames as part of its log rotation. So the file may very well have been renamed since it was opened. This is on Linux, so renaming open files is possible.
How do I get around this issue, and get the current filename of an open file?
On a typical Unix filesystem, a file may have multiple filenames at once, or even none at all. The file metadata is stored in an inode, which has a unique inode number, and this inode number can be linked from any number of directory entries. However, there are no reverse links from the inode back to the directory entries.
Given an open File object in Rust, you can get the inode number through its metadata, using the ino() method from the std::os::unix::fs::MetadataExt extension trait. If you know the directory the log file is in, you can use std::fs::read_dir() to iterate over all entries in that directory; each entry also has an ino() method (via std::os::unix::fs::DirEntryExt), so you can find the one(s) matching your open file object. Of course this approach is subject to race conditions: the directory entry may already be gone again by the time you try to do anything with it.
On linux, files handles held by the current process can be found under /proc/self/fd. These look and act like symlinks to the original files (though I think they may technically be something else - perhaps someone who knows more can chip in).
You can therefore recover the (possibly changed) file name by constructing the correct path in /proc/self/fd using your file descriptor, and then following the symlink back to the filesystem.
This snippet shows the steps:
use std::fs::read_link;
use std::os::unix::io::AsRawFd;
use std::path::PathBuf;
// if f is your std::fs::File
// first construct the path to the symlink under /proc
let path_in_proc = PathBuf::from(format!("/proc/self/fd/{}", f.as_raw_fd()));
// ...and follow it back to the original file
let new_file_name = read_link(path_in_proc).unwrap();

Multiple Excel files using SSIS [duplicate]

I have a source location from which files are to be processed. Multiple files arrive at that location at random times (the package should run every 2 hours). I have to process only the new files; I cannot delete or move the already processed files from that location. I can only copy the files to an archive location. How can I achieve this?
You can achieve this using the following steps:
1. Use the Foreach File enumerator on your incoming folder and save the file name in an "IncomingFile" variable. Configure it to retrieve "Name and extension" (the code below assumes this; otherwise you will need to modify the script).
2. Create two SSIS variables: "ArchivePath" (String) and "IsLoaded" (Boolean, defaulted to False).
3. Add a Script Task that uses "IncomingFile" and "ArchivePath" as read-only variables and "IsLoaded" as a read-write variable.
4. Put the following code in the Script Task. It sets "IsLoaded" to True if the file already exists in the archive, and to False otherwise.
public void Main()
{
    // Requires "using System.IO;" at the top of the script.
    var archivePath = Dts.Variables["ArchivePath"].Value.ToString();
    var incomingFile = Dts.Variables["IncomingFile"].Value.ToString();

    // Full path the file would have in the archive folder.
    var fileFullPath = string.Format(@"{0}\{1}", archivePath, incomingFile);

    // True if a copy already exists in the archive, i.e. the file was already processed.
    bool isLoaded = File.Exists(fileFullPath);
    Dts.Variables["IsLoaded"].Value = isLoaded;

    Dts.TaskResult = (int)ScriptResults.Success;
}
Use a precedence constraint to call the Data Flow Task, with the evaluation operation set to "Expression". Put something like the following in the expression box:
@IsLoaded == FALSE
Hope this helps.
Your package should process the files in a given directory, then move them to another directory once processed. That way, everything the package finds in the source directory on each run is new and can be processed in full.
To process each file in a directory, use the Foreach Loop Container. You can specify a folder to look in, plus some expressions to filter. If, for instance, your filename contains a timestamp, you could use that timestamp to filter files in or out.
Use a Flat File Source to read the files, then use the File System Task to move them around.
To start, take a look at the answer here: Enumerate files in a folder using SSIS Script Task
The SSIS Script Task should enumerate all the files in a given folder, take a snapshot of the already-processed files from a table where you keep a log of what has been processed, ignore those, and return only the unprocessed files in an object variable for a Foreach Loop to consume.

fs.writeFile() and fs.readFile() strange behavior

I'm writing a desktop app using Electron and React. I want to store some information in a JSON file. I've tried both web-fs and browserify-fs to accomplish this, and neither works as expected. My setup is as follows:
project/app/(react files)
project/index.html
project/js/bundle.js
project/main.js
I'm using watchify to compile all the changes in the react files to the bundle.js file (which is read by index.html).
The following is run from app.js in project/app/ (which is also where the JSON file is stored):
import * as fs from 'browserify-fs';
...
fs.writeFile('./fileData.json', data, function(err){
if(err)console.log(err);
else console.log("success");
});
'success' is always logged to the console; however, the contents of the file are not updated, regardless of how I specify the path.
I've tried:
'./fileData.json'
'/fileData.json'
__dirname + '/fileData.json' (which tells me that __dirname couldn't be found)
(absolute path to fileData.json) (which tells me that /Users could not be found)
After doing the above, if I change the writeFile to readFile and log the contents to the console, the updated contents are printed. Even if I delete the fileData.json file, the read still succeeds.
This makes me believe that fs.writeFile() is writing to a different directory, not the one the process is being run from. Despite this, I cannot find any other fileData.json files anywhere on my computer. There are a couple of other weird behaviors:
When logging __filename (which should log the entire filepath), the only thing printed is "/app.js" with no leading file path.
Calling "process.cwd()" just gives me "/"
When calling fs.writeFile() with the full file path "/Users/...." I get a folder not found error
Anyone know what could be causing this behavior and how to fix it?
Edit - I also tried getting the absolute path by adding
var path = require('path')
var appDir = path.resolve('./app');
which again only gives me /app when it should be returning an absolute path
Can you confirm the same behavior when not using browserify-fs? Just use plain old fs. (Note you can do this straight from the Chrome dev tools console).
Looking at browserify-fs's page, it implements a virtual file system using a dependency called level-filesystem (which uses LevelDB). So the files you're expecting to get created on disk aren't being created there; they're being created inside a LevelDB database. You could probably find a LevelDB data file somewhere that contains what you were trying to write directly to the file system.
For simple writing/reading of a JSON file, I'd recommend https://github.com/sindresorhus/electron-config.
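If you want to stay with plain fs instead, one approach is to write into Electron's per-user data directory, which is always a real, writable location on disk (a sketch; "data" stands for whatever object you want to persist):
var fs = require('fs');
var path = require('path');
var app = require('electron').app; // from a renderer, use require('electron').remote.app

// userData is a per-user writable directory, e.g. ~/Library/Application Support/<app> on macOS
var file = path.join(app.getPath('userData'), 'fileData.json');
fs.writeFile(file, JSON.stringify(data, null, 2), function (err) {
    if (err) console.log(err);
    else console.log('wrote ' + file);
});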

node.js and ncp module - fails to copy single file

I am using Node.js v6.3.1 and ncp v2.0.0
I can only get ncp to copy the contents of a directory, but not a single file within that directory.
Here is the code copying the contents of a directory recursively that works:
var ncp = require("ncp").ncp;
ncp("source/directory/", "destination/directory/", callback);
...and here is the same code but with a file as the source:
var ncp = require("ncp").ncp;
ncp("source/directory/file.txt", "destination/directory/", callback);
From this, all I can think is that maybe ncp was specifically designed to copy directories recursively, not single files.
I had thought about using something like the fs module's read/write stream functions as described here, but really for consistency I was hoping to stick with ncp.
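For reference, the stream-based copy I was hoping to avoid would look roughly like this (a sketch; callback is the same completion callback as in the ncp calls):
var fs = require("fs");

var rs = fs.createReadStream("source/directory/file.txt");
var ws = fs.createWriteStream("destination/directory/file.txt");

rs.on("error", callback); // report read errors...
ws.on("error", callback); // ...and write errors
ws.on("close", function () { callback(null); }); // done: signal success with no error

rs.pipe(ws);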
Update:
I have found another package called node-fs-extra which does what I want without the need for me to add event handlers to the operations, like I would have to do with the fileSystem read/write solution.
Here is the code that is working:
var fsExtra = require("fs-extra");
fsExtra.copy("source/directory/file.txt", "destination/directory/file.txt", callback);
Obviously this still is inconsistent, but at least is a little less verbose.
Ok I have figured out what I was doing wrong.
I was trying to copy a file into a directory, where as I needed to copy and name the file inside a directory.
So here is my original code that does not work:
var ncp = require("ncp").ncp;
ncp("source/directory/file.txt", "destination/directory/", callback);
...and here is the fixed code working; notice the inclusion of a file name in the destination path:
var ncp = require("ncp").ncp;
ncp("source/directory/file.txt", "destination/directory/file.txt", callback);
So it looks like ncp won't just take the file as-is, but needs you to specify the destination file name to copy successfully. I guess I was assuming that it would just copy the file with the same name into the destination directory.

FS won't create file

I want to create a simple console application that would compare two files based on their filename and output the result into a new file.
My problem is that NodeJS refuses to create a new file if it doesn't exist, and acts like it doesn't exist even if I create it manually.
compare = (data) -> # data is being read from process.stdin
  fname = "#{data}_compare.txt"
  stdout.write "Attempting to compare #{data}" # stdout = process.stdout
  fs.writeFileSync fname, 'A test.'
NodeJS returns Error: ENOENT, no such file or directory in both cases (when I want it to create the file, as well as when the file already exists).
I want the file to be created in the same folder from where the application is run, so path shouldn't be an issue at all (and indeed is correct in the error message).
I tried to specify {flags: 'w'} too, but as far as I know that's the default value anyway, so it changed nothing.
I'm running on Windows 10, tried running command prompt under administrator too, still nothing. Any idea what could be causing this?
The data variable is read from stdin and therefore contains a newline at the end. This is probably what's causing the non-descriptive ENOENT error.
You can remove the newline (and any other whitespace that user might have accidentally entered) with data = data.trim()
This would be better than the substring solution, since the newline is two characters on Windows and only one character elsewhere.
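In plain JavaScript, the corrected write from the snippet above looks something like this:
var fs = require('fs');

process.stdin.setEncoding('utf8');
process.stdin.on('data', function (data) {
    var name = data.trim(); // strips the trailing \r\n (Windows) or \n (elsewhere)
    process.stdout.write('Attempting to compare ' + name);
    fs.writeFileSync(name + '_compare.txt', 'A test.');
});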
Make sure the path exists (not necessarily the file itself, but the folder structure), and that the process user has write permissions.
An ENOENT error tells you that a component of the specified pathname does not exist -- no entity (file or directory) could be found by the given path.
Make sure you are putting the 'dot':
'./path/to/file'
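A quick way to check where a relative path will actually land is to resolve it against the current working directory first (a sketch; './path/to/file' is the placeholder from above):
var path = require('path');

console.log(process.cwd());                  // the base directory for relative paths
console.log(path.resolve('./path/to/file')); // the absolute path fs will actually use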
