Azcopy - Copy only files without folders - azure

As the title suggests, I am trying to copy all files with a specific extension, within a folder structure, to blob storage without recreating the local folder structure.
This works fine when I run the following:
azcopy cp 'H:\folder1\folder2\*.txt' 'https://storage.blob.core.windows.net/folderA/folderB/?saskey'
This copies all *.txt files to /folderB
I have tried many variations of the following:
azcopy.exe cp 'H:\folder1\*\*' 'https://storage.blob.core.windows.net/folderA/folderB/?saskey' --recursive --include-pattern '*.txt'
Regardless of what I try, I end up with the following:
/folderA/folderB
/folder1/fileA.txt
/folder2/fileB.txt
I was under the impression that is what the "--recursive" switch is for, but what I am doing is either not supported or my syntax is wrong.
I have read through this:
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-files#use-wildcard-characters
I could probably script it with something similar to this:
AzCopy - Wildcards In Middle Of Pattern?
But I was hoping this was built-in functionality.

What you are looking for is not supported. Using --recursive results in the subdirectory structure of the source being retained at the destination. I am not aware of any flag to prevent that.
That behavior actually helps avoid conflicts. Say, for example, you have the files /folder1/fileA.txt and /folder2/fileA.txt in the source. Copying them flat into the destination (without the subpaths) would cause a conflict, since both files are named fileA.txt.
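If the file names are unique across subfolders and you really do want a flat copy, one workaround is to drive azcopy from a small script that enumerates the files locally and uploads each one individually. A minimal Python sketch, assuming azcopy is on the PATH and using placeholder paths and a placeholder SAS token:

import subprocess
from pathlib import Path

source_root = Path(r'H:\folder1')  # local root to search
dest_url = 'https://storage.blob.core.windows.net/folderA/folderB'  # destination virtual directory
sas = '?saskey'  # placeholder SAS token

# Upload every *.txt found anywhere under the source tree directly into folderB,
# discarding its local subdirectory. Files with the same name in different
# subfolders would overwrite each other at the destination.
for path in source_root.rglob('*.txt'):
    subprocess.run(['azcopy', 'copy', str(path), f'{dest_url}/{path.name}{sas}'], check=True)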

Related

Entering a proper path to files on DBFS

I uploaded files to DBFS:
/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
I tried to access them with pandas, and I always get an error saying that such files don't exist.
I tried to use the following paths:
/dbfs/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
dbfs/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
./FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv
What is funny is that when I check them with dbutils.fs.ls I see all the files.
I found this solution, and I tried it already: Databricks dbfs file read issue
Moved them to a new folder:
dbfs:/new_folder/
I tried to access them from this folder, but it still didn't work for me. The only difference was that I had copied the files to a different place.
I checked as well the documentation: https://docs.databricks.com/data/databricks-file-system.html
I use Databricks Community Edition.
I don't understand what I'm doing wrong and why it's happening like that.
I don't have any other ideas.
The /dbfs/ mount point isn't available on the Community Edition (that's a known limitation), so you need to do what is recommended in the linked answer:
dbutils.fs.cp(
'dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv',
'file:/tmp/file_name.csv')
and then use /tmp/file_name.csv as the input path for pandas functions. If you need to write something to DBFS, do it the other way around: write to a local file under /tmp/..., then copy that file to DBFS.
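For example, a minimal sketch of both directions, run from a notebook cell (dbutils is available there; the input path is the placeholder from the question, and the output path is just a hypothetical example):

import pandas as pd

# DBFS -> local disk on the driver, then read with pandas
dbutils.fs.cp('dbfs:/FileStore/shared_uploads/name_surname#xxx.xxx/file_name.csv', 'file:/tmp/file_name.csv')
df = pd.read_csv('/tmp/file_name.csv')

# The other way around: write locally first, then copy the file back to DBFS
df.to_csv('/tmp/output.csv', index=False)
dbutils.fs.cp('file:/tmp/output.csv', 'dbfs:/FileStore/output.csv')  # hypothetical destination path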

Node: What is the right way to delete all the files from a directory?

So I was trying to delete all the files inside a folder using Node.
I came across 2 methods.
Method 1
Delete the folder using rmdir. Then, if I plan on adding images to the same folder, I use mkdir to create the same folder again and append the files to it.
Example: I have an Add Files button and a Delete All button. When I click Delete All, the folder gets deleted, and when I click Add, the folder gets created and the file gets added to that folder.
Method 2
Using readdir, I loop through the files, store them in an array, and then delete only the files instead of the folder.
Which is the best way to do it? If it's not among these, then please advise me on a better solution.
The rm function of ShellJS will do the trick. It works as a one-liner, it is cross-platform, and it is well tested and documented. It even supports recursive deletes.
Basically, something such as:
const { rm } = require('shelljs');
rm('-rf', '/tmp/*');
(Sample code taken from ShellJS' documentation.)

How to copy the content of a folder whose name is partially known in CentOS

I have a folder which has another folder inside it (let's say test and insidetest-<some random number>). What I am trying to do is copy the content of insidetest-... into another folder. The problem is that I only know half of the name of the folder inside the test folder; I do not know the random number attached to it. (For more explanation: I get a zip file from the Bitbucket API, and after unzipping it, it has this structure, so I can never know the exact name of the folder inside test.) If I knew it, I could simply use something like this:
cp home/test/* /home/myfolder/
But I cannot do it in this situation. Can anyone help?
If some part of the name is constant, then use a wildcard for the rest:
cp -r /home/test/insidetest-* /home/myfolder/
To copy only the contents of the matched folder (rather than the folder itself), append /* to the glob:
cp -r /home/test/insidetest-*/* /home/myfolder/
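If you'd rather do this from a script than from the shell, here is a small Python sketch of the same idea, using placeholder paths and assuming exactly one folder matches the pattern:

import glob
import shutil

# find the directory under /home/test whose name starts with 'insidetest-'
matches = glob.glob('/home/test/insidetest-*')
assert len(matches) == 1, matches

# copy its contents into /home/myfolder (dirs_exist_ok requires Python 3.8+)
shutil.copytree(matches[0], '/home/myfolder', dirs_exist_ok=True)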

Omit uploaded files with AzCopy

I have uploaded some files/folders to my Azure container with CloudBerry Explorer, but now I'm going to switch from CloudBerry to AzCopy.
What I need is to skip the files that have already been uploaded. I don't know if this can be done with an AzCopy parameter. The files to be uploaded are stored on a server, so doing it manually is impossible: there are thousands upon thousands of files/folders.
Thanks in advance.
As documented in the azcopy reference:
--overwrite string Overwrite the conflicting files and blobs at the destination if this flag is set to true. Possible values include 'true', 'false', 'ifSourceNewer', and 'prompt'. (default "true")
So something like this should work:
azcopy.exe copy "source location" "destination location" --overwrite=false
Use the /XO flag in the command (this is the older, pre-v10 AzCopy syntax). It will not copy/replace old files. Sample command:
AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /XO
If the files uploaded by the other tool have a different naming convention from the new ones, you could use the /Pattern option to upload only the new files.
E.g. if the old files are named like "abcxxxx" and the new files are named like "xyzxxx", specify /Pattern:xyz* to copy only the new files.
Or use the /xo option (meaning exclude old files) to copy new files only. Note that AzCopy compares the local files' change time with the 'Last Modified Time' of the destination blobs when you specify /xo or /xn, so make sure the uploaded old files' 'Last Modified Time' is the same as or newer than their local copies' change time; otherwise the old files will be uploaded again when you specify /xo. You can use the /MT option to set the 'Last Modified Time' to the local copies' change time during upload.
For more details, please visit http://aka.ms/azcopy
Thanks

Adding files to source control on Linux using cleartool

I have a file that I want to add to source control on Linux using cleartool.
I've followed the IBM documentation for this; I've tried this:
cleartool mkelem testScript.sh
I got an error: Can't modify directory "." because it is not checked out.
I would also like to know how I can check out/check in files or directories and set activities.
You need to check out the parent folder first.
cd /path/to/file/
# create and set an activity (UCM views only)
cleartool mkact newfile
# check out the parent directory so it can be modified
cleartool checkout -c "add file" .
# create the new element
cleartool mkelem testScript.sh
# check in the new element and its parent directory
cleartool checkin -nc testScript.sh .
The cleartool mkact command will only work if you are in a UCM view.
It will create and set a new activity, which will record the files and folders you modify.
Here, the new activity newfile will record the new version of the parent folder, as well as versions 0 and 1 of the file.
You should create separate questions for .. separate questions...
Going back to the original question: the reason it isn't working is, as VonC has pointed out, that you haven't checked out the parent of the file. Remember, when you run "cleartool mkelem", you are about to modify the contents of the parent directory (. in this case) by adding a new "pointer" to the element you're now creating. As with everything else in ClearCase, when you want to modify the contents of an element, you have to check it out first.
One of ClearCase's greatest strengths (and one of the hardest to wrap one's head around), IMO, is the concept of an "element". Everything behaves similarly with an element: making any change to an "element" (file or directory) means you have to check it out first.
In the case of a file, that's easy to grasp: you're just editing lines in a file. For a directory, it's almost as easy: you can think of a directory as just a list of pointers to data blobs. We make the name of a blob something convenient we can remember (like foo.java or myapplication.cc or README.md). But we can also change the name of the pointer (even though it points to the same data blob) by renaming a file. And we can remove the pointer to a blob without impacting the blob itself; that's essentially what "rmname" does.
In ClearCase's case, the mkelem command is a little bit special: it creates the initial data blob and adds a pointer to that data blob in the current directory (it does two things at once).
