How do you recursively compare two directories in Node? - node.js

I'd like to compare two directories and all files within their subfolders. The folder structure is the same for both directories, but the files may differ. Call them directory A and directory B.
From that comparison I'd like to create a directory C and a directory D. All files in B that are newer than their counterpart in A, or that are not found in A, should be copied to C. Files missing from B that are found in A should be copied to D.
I'd like to use Node and either a library or some other CLI tool (git, perhaps) that can do this without too much effort.
What would be some good approaches to accomplish this?

Get the list of filenames of both directories as two arrays, then find the difference between them.
const _ = require('lodash');
const fs = require('fs');
const aFiles = fs.readdirSync('/path/to/A');
const bFiles = fs.readdirSync('/path/to/B');
_.difference(aFiles, bFiles).forEach(v => {
  // Files missing from B that are found in A should be copied to directory D
  // Copy file v to directory D
});
_.difference(bFiles, aFiles).forEach(v => {
  // Files missing from A that are found in B should be copied to directory C
  // Copy file v to directory C
});
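The snippet above only lists the top level of each directory and ignores modification times. A minimal sketch of the same idea extended to subfolders, using only Node's built-in fs and path modules. It assumes "newer" means a later modification time (mtimeMs) and that both trees fit in memory; the helper names walk, planCopies, and copyAll are hypothetical, not part of any library:

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect file paths relative to `root`, mapped to their mtime in ms.
function walk(root, dir = '', out = new Map()) {
  for (const entry of fs.readdirSync(path.join(root, dir), { withFileTypes: true })) {
    const rel = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      walk(root, rel, out);
    } else if (entry.isFile()) {
      out.set(rel, fs.statSync(path.join(root, rel)).mtimeMs);
    }
  }
  return out;
}

// Decide which relative paths go to C (new or newer in B) and to D (only in A).
function planCopies(dirA, dirB) {
  const a = walk(dirA);
  const b = walk(dirB);
  const toC = [...b.keys()].filter((rel) => !a.has(rel) || b.get(rel) > a.get(rel));
  const toD = [...a.keys()].filter((rel) => !b.has(rel));
  return { toC, toD };
}

// Copy the listed relative paths from `srcRoot` to `destRoot`, creating folders as needed.
function copyAll(relPaths, srcRoot, destRoot) {
  for (const rel of relPaths) {
    const dest = path.join(destRoot, rel);
    fs.mkdirSync(path.dirname(dest), { recursive: true });
    fs.copyFileSync(path.join(srcRoot, rel), dest);
  }
}

// Possible usage (paths are placeholders):
// const { toC, toD } = planCopies('/path/to/A', '/path/to/B');
// copyAll(toC, '/path/to/B', '/path/to/C');
// copyAll(toD, '/path/to/A', '/path/to/D');
```

Separating the planning step from the copying step makes it easy to print or review the two lists before anything is written to disk.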

There's an npm package for this called dir-compare:
const dircompare = require('dir-compare');
const options = { compareSize: true };
const path1 = "dir1";
const path2 = "dir2";
const res = dircompare.compareSync(path1, path2, options);
console.log(res);

Related

How to compare filenames with difference in special character encoding?

I am working with a system that syncs files between two vendors. The tooling is written in JavaScript and transforms file names before sending them to the destination. I am trying to fix a bug where it fails to compare file names correctly between the origin and the destination.
The script uses the file name to check whether the file already exists on the destination.
For example:
The following file name contains a special character that is encoded differently on the source and the destination.
source: Chinchón.jpg // hex code: ó
destination : Chinchón.jpg // hex code: 0xf3
The function that does the transformation is:
export const normalizeText = (text:string) => text
.normalize('NFC')
.replace(/\p{Diacritic}/gu, "")
.replace(/\u{2019}/gu, "'")
.replace(/\u{ff1a}/gu, ":")
.trim()
and the comparison is happening just like the following:
const array1 = ['Chinchón.jpg'];
console.log(array1.includes('Chinchón.jpg')); // false
Do I reverse the transformation before comparing? What's the best way to do that?
If I understood your question correctly, apply the same normalization to both sides before comparing:
// prepare dictionary
const rawDictionary = ['Chinchón.jpg']
const dictionary = rawDictionary.map(x => normalizeText(x))
...
const rawComparant = 'Chinchón.jpg'
const comparant = normalizeText(rawComparant)
console.log(dictionary.includes(comparant))
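To illustrate why normalizing both sides helps, here is a small sketch. It assumes the mismatch comes from one side storing "ó" as a single code point (U+00F3) and the other as "o" followed by a combining acute accent (U+0301); the question does not confirm this, and the helper name sameName is hypothetical:

```javascript
// Normalize both names to the same Unicode form (NFC) before comparing.
const sameName = (a, b) => a.normalize('NFC') === b.normalize('NFC');

const decomposed = 'Chincho\u0301n.jpg';  // "o" followed by U+0301 (combining acute accent)
const precomposed = 'Chinch\u00f3n.jpg';  // single code point U+00F3

console.log(decomposed === precomposed);         // false
console.log(sameName(decomposed, precomposed));  // true
console.log([precomposed].some((name) => sameName(name, decomposed))); // true
```

Array.prototype.includes compares with strict equality, so it cannot see through the two encodings; comparing normalized strings (or normalizing the whole list once up front) avoids that.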

Creating an empty folder on Dropbox with Python. Is there a simpler way?

Here's my sample code which works:
import os, io, dropbox
def createFolder(dropboxBaseFolder, newFolder):
    # creating a temp dummy destination file path
    dummyFileTo = dropboxBaseFolder + newFolder + '/' + 'temp.bin'
    # creating a virtual in-memory binary file
    f = io.BytesIO(b"\x00")
    # uploading the dummy file in order to cause creation of the containing folder
    dbx.files_upload(f.read(), dummyFileTo)
    # now that the folder is created, delete the dummy file
    dbx.files_delete_v2(dummyFileTo)
accessToken = '....'
dbx = dropbox.Dropbox(accessToken)
dropboxBaseDir = '/test_dropbox'
dropboxNewSubDir = '/new_empty_sub_dir'
createFolder(dropboxBaseDir, dropboxNewSubDir)
But is there a more efficient or simpler way to do this?
Yes, as Ronald mentioned in the comments, you can use the files_create_folder_v2 method to create a new folder.
That would look like this, modifying your code:
import dropbox
accessToken = '....'
dbx = dropbox.Dropbox(accessToken)
dropboxBaseDir = '/test_dropbox'
dropboxNewSubDir = '/new_empty_sub_dir'
res = dbx.files_create_folder_v2(dropboxBaseDir + dropboxNewSubDir)
# access the information for the newly created folder in `res`

Get the location of the script requiring the current script

I need to do some file operations with paths relative to the script that required the current one.
Say we have the following in ~/somewhere/file2.js
const y = require('~/file1.js');
And in ~/file1.js we have:
const x = require('./other/script.js'); //relative to ~/file1.js
And we invoke it like this:
cd ~/somedir
node ~/somewhere/file2.js
then within ~/other/script.js we can do this:
console.log(__dirname); // -> ~/other
console.log(__filename); // -> ~/other/script.js
console.log(process.cwd()); // -> ~/somedir
console.log(process.argv[0]); // -> path/to/node
console.log(path.resolve('.')); // -> ~/somedir
console.log(process.argv[1]); // -> ~/somewhere/file2.js
None of these is the path I need.
How, from ~/other/script.js, can I determine the location of the script that required us, i.e. ~/file1.js?
To put it another way:
~/somewhere/file2.js requires ~/file1.js
and
~/file1.js requires ~/other/script.js
From within ~/other/script.js I need to do file operations relative to ~/file1.js. How can I get its location?
I actually only need the directory in which file1.js sits, so either the filename or the directory will work for me.
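You can use module.parent.filename inside other/script.js. Note that module.parent is deprecated in recent Node versions and refers only to the first module that required the script, because modules are cached. Alternatively, pass __dirname as a parameter to your module, e.g. require('./other/script.js')(__dirname), provided your module exports a function.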

Spark: Traverse HDFS subfolders and find all files with name "X"

I have a HDFS path and I want to traverse through all the subfolders and find all the files within that have the name "X".
I have tried to do this:
FileSystem.get( sc.hadoopConfiguration )
.listStatus( new Path("hdfs://..."))
.foreach( x => println(x.getPath))
But this only searches for files within 1 level and I want all levels.
You need to collect the files recursively: loop over the entries of the path, and whenever an entry is a directory, call the same function on it again.
Below is a simple example that you can adapt to your configuration and test.
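import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

var fileSystem: FileSystem = _
var configuration: Configuration = _

def init(): Array[FileStatus] = {
  configuration = new Configuration
  fileSystem = FileSystem.get(configuration)
  // Replace "" with the root HDFS path you want to search
  val fileStatus: Array[FileStatus] = fileSystem.listStatus(new Path(""))
  getAllFiles(fileStatus)
}

// Recursively flatten directories into a single array of file statuses
def getAllFiles(fileStatus: Array[FileStatus]): Array[FileStatus] =
  fileStatus.flatMap { fs =>
    if (fs.isDirectory) getAllFiles(fileSystem.listStatus(fs.getPath))
    else Array(fs)
  }

After getting the file list, filter it for the files named "X", for example by checking fs.getPath.getName.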

How to find missing files?

I have several files (with the same dim) in a folder called data for certain dates:
file2011001.bin named like this "fileyearday"
file2011009.bin
file2011020.bin
.
.
file2011322.bin
Certain dates (files) are missing. What I need is to loop through these files:
if file2011001.bin exists, OK; if not, copy any file in the directory and name it file2011001.bin
if file2011002.bin exists, OK; if not, copy any file in the directory and name it file2011002.bin, and so on until file2011365.bin
I can list them in R:
dir<- list.files("/data/", "*.bin", full.names = TRUE)
I wonder if this is possible in R or any other language.
Pretty much what you'd expect:
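## expected names: file2011001.bin ... file2011365.bin
AllFiles = file.path("/data", sprintf("file2011%03d.bin", 1:365))
existing = AllFiles[file.exists(AllFiles)]
for(file in AllFiles)
{
  if(!file.exists(file))
  {
    ## file is missing: copy any existing file under the missing name
    file.copy(existing[1], file)
  }
}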
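Since the question is open to other languages, the same loop can be sketched in Node. This assumes the names follow the "file" + year + three-digit day pattern shown above and that any existing file is an acceptable source for the copy; the helper names expectedNames and fillMissing are hypothetical:

```javascript
const fs = require('fs');
const path = require('path');

// Build the expected names, e.g. file2011001.bin ... file2011365.bin.
function expectedNames(year = 2011, days = 365) {
  return Array.from({ length: days }, (_, i) =>
    `file${year}${String(i + 1).padStart(3, '0')}.bin`);
}

// Copy an existing file under every missing name; returns the names that were filled in.
function fillMissing(dir, names = expectedNames()) {
  const present = names.filter((n) => fs.existsSync(path.join(dir, n)));
  if (present.length === 0) throw new Error('no existing file to copy from');
  const missing = names.filter((n) => !present.includes(n));
  for (const n of missing) {
    fs.copyFileSync(path.join(dir, present[0]), path.join(dir, n));
  }
  return missing;
}

// Possible usage (path is a placeholder):
// fillMissing('/data');
```

The list of present files is computed once before copying, so the copies created along the way are never mistaken for original data.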
