We are using Azure File Shares (File shares, not GPV2, meaning we're not using blobs or queues, just File Shares) to store our files.
We need to check whether each path in a list of file paths exists or not.
Is there a "bulk" version of ShareFileClient.ExistsAsync ?
What's the best workaround otherwise ?
We tried calling Exists on each path, each call in its own task, but it takes too long to return (around 25 seconds for 250 paths):
var tasks = paths.AsParallel().Select(p => Task.Run(() =>
{
    // share is a captured variable of type ShareClient
    var dir = share.GetDirectoryClient(GetDirName(p));
    var file = dir.GetFileClient(GetFileName(p));
    var result = file.Exists();
    return result.Value;
}));
As such, there is no direct way to check the existence of multiple files in a single request.
However, there is a workaround:
What you can do is list the files and subdirectories in a directory using the GetFilesAndDirectoriesAsync(ShareDirectoryGetFilesAndDirectoriesOptions, CancellationToken) method. Once you have the list, you can loop over it and check whether a file with a particular name exists in the directory or not.
This will be much faster and more cost-efficient, because you make a single request to get the list of files instead of calling the FileClient.Exists method on each file, where each call is a separate network request.
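As a rough sketch of that approach (shown with the JavaScript SDK, @azure/storage-file-share, purely for illustration; the .NET equivalent call is the GetFilesAndDirectoriesAsync method mentioned above), you list a directory once, collect the file names into a set, and check each of your paths against it:
const { ShareServiceClient } = require("@azure/storage-file-share");

// Connection string and share name are placeholders for illustration.
const service = ShareServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING);
const share = service.getShareClient("my-share");

// Returns the subset of fileNames that exist in dirName,
// using one listing request instead of one Exists call per file.
async function existingFiles(dirName, fileNames) {
    const dir = share.getDirectoryClient(dirName);
    const found = new Set();
    for await (const item of dir.listFilesAndDirectories()) {
        if (item.kind === "file") found.add(item.name);
    }
    return fileNames.filter((name) => found.has(name));
}
If your paths span several directories, group them by directory first so you still issue only one listing request per directory.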
I want to search for files in a specific bucket (thousands of files) by name.
I managed to do that using gsutil:
gsutil ls gs://bucket-name/**name.json
but in the GCS npm library I found only the getFiles function, which only lets me list files if I know the directory/prefix. Here I want to search the whole bucket, so is there a way to do that?
Thanks for your help
The API does not support searching for a suffix. Behind the scenes, gsutil ls gs://bucket-name/**name.json lists the entire bucket and then filters on the client for objects whose names end in name.json.
You can do this with getFiles, just leave prefix and delimiter unset.
If you have a very large bucket this may take up a large amount of memory; that can be mitigated by manual pagination or with getFilesStream. It will still take a long time, but that is unavoidable.
Code to do this could look like:
const [files] = await storage.bucket("bucket-name").getFiles();
files.forEach(file => {
    if (file.name.endsWith("name.json")) {
        // do stuff
    }
});
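If the bucket is very large, a hedged sketch of the getFilesStream variant mentioned above (it keeps only a page of results in memory at a time; same bucket name and suffix assumed) could look like:
storage.bucket("bucket-name").getFilesStream()
    .on("data", file => {
        if (file.name.endsWith("name.json")) {
            // do stuff
        }
    })
    .on("error", console.error)
    .on("end", () => {
        // finished scanning the bucket
    });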
So, I am trying to download the contents of a directory via SFTP using Node.js, and so far I am getting stuck with an error.
I am using the ssh2-sftp-client npm package, and for the most part it works pretty well, as I am able to connect to the server and list the files in a particular remote directory.
Using the fastGet method to download a file also works without any hassle, and since all the methods are promise-based, I assumed I could easily download all the files in the directory simply enough, by doing something like:
let main = async () => {
    await sftp.connect(config.sftp);
    let data = await sftp.list(config.remote_dir);
    if (data.length) data.map(async x => {
        await sftp.fastGet(`${config.remote_dir}/${x.name}`, config.base_path + x.name);
    });
}
So it turns out the code above successfully downloads the first file, but then crashes with the following error message:
Error: Failed to get sandbox/demo2.txt: The requested operation cannot be performed because there is a file transfer in progress.
This seems to indicate that the promise from fastGet is resolving too early as the file transfer is supposed to be over when the next element of the file list is processed.
I tried to use the more traditional get() instead, but it uses streams and fails with a different error. After researching, it seems there has been a breaking change regarding streams in Node 10.x. In my case, calling get simply fails (it doesn't even download the first file).
Does anyone know a workaround for this? Or another package that can download several files over SFTP?
Thanks!
I figured out that, since the issue was concurrent download attempts on one client connection, I could manage it with one client per file download. I ended up with the following recursive function.
let getFromFtp = async (arr) => {
    if (arr.length == 0) return (processFiles());
    let x = arr.shift();
    conns.push(new Client());
    let idx = conns.length - 1;
    await conns[idx].connect(config.sftp.auth);
    await conns[idx]
        .fastGet(`${config.sftp.remote_dir}/${x.name}`, `${config.dl_dir}${x.name}`);
    await conns[idx].end();
    getFromFtp(arr);
};
Notes about this function:
The array parameter is a list of files to download, presumably fetched using list() beforehand.
conns was declared as an empty array and is used to hold our clients.
Array.prototype.shift() is used to gradually deplete the array as we go through the file list.
The processFiles() method is fired once all the files have been downloaded.
This is just the POC version; of course, error management still needs to be added.
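For completeness, an untested sketch of another way around the same problem: keep a single connection and await each transfer before starting the next, for example with a for...of loop instead of map (reusing the question's config fields):
let main = async () => {
    await sftp.connect(config.sftp);
    let data = await sftp.list(config.remote_dir);
    for (const x of data) {
        // awaiting inside the loop keeps only one transfer in progress at a time
        await sftp.fastGet(`${config.remote_dir}/${x.name}`, config.base_path + x.name);
    }
    await sftp.end();
};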
I'm trying to organize assets (images) into folders with a unique id for each asset, the reason being that each asset will have multiple formats (thumbnails, and formats optimized for the web and different viewports).
So every asset that I upload to the folder assets-temp/ is then moved and renamed by the function into assets/{unique-id}/original{extension}.
example: assets-temp/my-awesome-image.jpg should become assets/489023840984/original.jpg.
note: I also keep track of the files with their original name in the DB and in the original's file metadata.
The issue: The function runs and performs what I want, but it also adds a folder named assets/{uuid}/original/ with nothing in it...
The function:
exports.process_new_assets = functions.storage.object().onFinalize(async (object) => {
    // Run this function only for files uploaded to the "assets-temp/" folder.
    if (!object.name.startsWith('assets-temp/')) return null;

    // `bucket` and `id` are assumed to be defined elsewhere in the asker's code
    // (the storage bucket reference and a generated unique id).
    const file = bucket.file(object.name);
    const fileExt = path.extname(object.name);
    const destination = bucket.file(`assets/${id}/original${fileExt}`);
    const metadata = {
        id,
        name: object.name.split('/').pop()
    };

    // Move the file to the new location.
    return file.move(destination, {metadata});
});
I am guessing that this might happen if the operation of uploading the original image triggers two separate events: one that creates the directory assets-temp and one that creates the file assets-temp/my-awesome-image.jpg.
If I guessed right, the first operation will trigger your function with a directory object (named "assets-temp/"). This matches your first if, so the code will proceed and do
destination = bucket.file(`assets/${id}/original`) // fileExt being empty
and then call file.move - this will create the assets/{id}/original/ directory.
Simply improve your 'if' to exclude a file named "assets-temp/".
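A minimal sketch of that guard, assuming the placeholder is a zero-byte object whose name ends with a forward slash:
exports.process_new_assets = functions.storage.object().onFinalize(async (object) => {
    // Skip anything outside "assets-temp/", and skip the folder placeholder object itself.
    if (!object.name.startsWith('assets-temp/') || object.name.endsWith('/')) return null;
    // ... rest of the function unchanged
});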
According to the documentation, there is no such thing as folders in Cloud Storage; however, it is possible to emulate them, as the console GUI does. When you create a folder, what really happens is that an empty object (zero bytes) is created whose name ends with a forward slash. Folder names can also end with _$folder$, but my understanding is that this is how things worked in older versions, so for newer buckets the forward slash is enough.
I'm trying to build an archive file using the Archive package and I need to archive a whole directory.
So far I see that I can archive a single file by doing this:
io.File file = new io.File(pathToMyFile);
List<int> bytes = await file.readAsBytes();
io.FileStat stats = await file.stat();
Archive archive = new Archive();
archive.addFile(new ArchiveFile.noCompress('config.yaml', stats.size, bytes)
    ..mode = stats.mode
    ..lastModTime = stats.modified.millisecond);
List<int> data = new ZipEncoder().encode(archive, level: Deflate.NO_COMPRESSION);
await new io.File('output.war').writeAsBytes(data, flush: true);
But to create an ArchiveFile I need the bytes representing the file, and it would be nice to be able to get a whole directory as bytes for this. Is there a way to do that? The Dart API on Directory seems pretty limited.
How does one usually go about, say, copying a directory? Just call the system cp? I would like a solution that works on multiple platforms.
You do it file by file, recursively. Grinder has a task for this; take a look at the implementation: https://github.com/google/grinder.dart/blob/devoncarew_0.7.0-dev.1/lib/grinder_files.dart#L189
You get all the files in a directory with:
var fileList =
    new io.Directory(path.join(io.Directory.current.absolute.path, 'lib'))
        .list(recursive: true, followLinks: false);
and then you process one file after the other.
I want to parse through a root folder entered by the user, using multithreading and multiprocessing in different versions. But while walking the root folder, how can I distinguish whether the next entry is a folder or a file? To summarize, I want to learn how to tell whether the upcoming entry is a file or a folder. I want to know this because if it is a folder, I hand it off to a dynamically created thread and/or process; if it is a file, the existing thread or process can continue its work without needing to create a new thread and/or process. I hope I have expressed my problem clearly. I am waiting for your answers. Thank you.
You can check whether a path refers to a file or directory using the stat() function, and checking the st_mode field on the returned structure (see http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html).
On Windows, you can use GetFileAttributesEx to get the file attributes, which you can check to see if it is a file or a directory.
Note that whatever you use may be subject to a race condition if the file system is being updated by another thread or process at the same time, as the file/directory may be deleted and/or changed after you checked it and before you access it.
Here are some quick samples. It will be up to you to thread from multiple root locations, call these recursively, and sync all the data.
Under *nix systems:
DIR *dir = opendir("/root");
struct dirent *entry;
while ((entry = readdir(dir)) != NULL)
{
    if (entry->d_type == DT_DIR)
    {
        // do something
    }
}
closedir(dir);
Under Windows:
WIN32_FIND_DATA findData;
HANDLE hFind = FindFirstFile(TEXT("C:\\root\\*.*"), &findData);
if (hFind != INVALID_HANDLE_VALUE)
{
    do
    {
        if (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            // do something
        }
    } while (FindNextFile(hFind, &findData));
    FindClose(hFind);
}