Unzip 1million+ archives into correct folder structure - linux

First, I am completely new to Linux; I set up an AWS Ubuntu instance just for this project, so please be kind.
I have downloaded approximately 1 million .zip files containing .csv's in the following folder structure (financial data):
Main Folder
├── Exchange1
│ ├── Pair1
│ │ └── Month
│ │ └── .Zips
│ └── PairN
│ └── Month
│ └── .Zips
└── ExchangeN
├── Pair1
│ └── Month
│ └── .Zips
├── Pair2
│ └── Month
│ └── .Zips
└── PairN
└── Month
└── .Zips
and I would like to extract every zip under its parent Pair folder, disregarding the month folder, so that the new structure would look like this:
Main Folder
├── Exchange1
│ ├── Pair1
│ │ └── Extracted .csv's
│ └── PairN
│ └── Extracted .csv's
└── ExchangeN
├── Pair1
│ └── Extracted .csv's
├── Pair2
│ └── Extracted .csv's
└── PairN
└── Extracted .csv's
Appreciate any help, thanks.

Run this script in your main folder.
#!/usr/bin/env bash
# find all nested zip files and assign them to an array (arrays need bash, not sh)
files=( $(find . -iname "*.zip") )
for i in "${files[@]}"
do
# extract the path to unzip the archive into: ./Exchange/Pair
path=$(echo "$i" | cut -d '/' -f-3)
unzip "$i" -d "$path"
done
Please be careful when you run this. Assigning the output of ls/find to an array can have weird and unexpected consequences when the directory contains filenames with a newline, pipe etc.
Quoting Greg
Unix allows almost any character in a filename, including whitespace, newlines, commas, pipe symbols, and pretty much anything else you'd ever try to use as a delimiter except NUL.
Reference
1. Why you shouldn't parse the output of ls
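One way to sidestep the array problem is to let find hand each name straight to a small inline shell, so names are never split on whitespace. The following is only a sketch: it builds a throwaway tree of empty placeholder files (the names Exchange1, Pair1, "Pair 2" and the month folders are made up for illustration), and it prints the unzip commands instead of running them.

```shell
#!/bin/sh
# Throwaway layout with empty placeholder files (not real archives)
demo=$(mktemp -d)
mkdir -p "$demo/Exchange1/Pair1/2020-01" "$demo/Exchange1/Pair 2/2020-02"
touch "$demo/Exchange1/Pair1/2020-01/a.zip" "$demo/Exchange1/Pair 2/2020-02/b c.zip"

# Dry run: find passes the names as arguments, so spaces survive intact.
# ${zip%/*} drops the file name; a second %/* drops the Month folder.
plan=$(cd "$demo" && find . -iname '*.zip' -exec sh -c '
    for zip do
        month_dir=${zip%/*}
        printf "unzip -n \"%s\" -d \"%s\"\n" "$zip" "${month_dir%/*}"
    done' sh {} +)
printf '%s\n' "$plan"
```

If the printed commands look right, the printf line can be swapped for the real call, unzip -n "$zip" -d "${month_dir%/*}"; -n tells unzip not to overwrite files that already exist.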

The simplest thing which might possibly work is:
find . -iname "*.zip" -execdir unzip -d ../ {} ";"
issued from Main Folder/.
But first try with an echo for visual control:
find ./Exchange1/Pair1 -iname "*.zip" -execdir echo unzip -d ../ {} ";"
If this looks promising, make a copy of some test folders and try over there:
find ./Exchange1/ -iname "*.zip" -execdir unzip -d ../ {} ";"
If it works, take the real files.

Related

How do you remove an entire directory except for certain subdirectories in linux?

Suppose I run the following command in linux:
$ mkdir -p mp3 jpeg/dir1 jpeg/dir2 txt
$ touch mp3/1.mp3 mp3/2.mp3 mp3/3.mp3
$ touch jpeg/1.jpeg jpeg/2.jpeg jpeg/3.jpeg
$ touch txt/1.txt txt/2.txt txt/3.txt
This will create a directory structure like:
├── jpeg
│ ├── 1.jpeg
│ ├── 2.jpeg
│ └── 3.jpeg
│ └── dir1
│ └── dir2
├── mp3
│ ├── 1.mp3
│ ├── 2.mp3
│ └── 3.mp3
└── txt
├── 1.txt
├── 2.txt
└── 3.txt
How do I invoke the linux "rm" command to remove everything in the "jpeg" directory except for "dir2" subdirectory?
So I'm looking for a command that looks something like:
rm -rf -not dir2 jpeg
But when I run that command on Centos 7, I get the following error message:
rm: invalid option -- 'n'
My target output directory structure should look like:
├── jpeg
│
│
│
│
│ └── dir2
├── mp3
│ ├── 1.mp3
│ ├── 2.mp3
│ └── 3.mp3
└── txt
├── 1.txt
├── 2.txt
└── 3.txt
Would appreciate all/any help from the linux scripting community
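You can use this find command to delete everything in the jpeg directory except dir2:
find jpeg -mindepth 1 -maxdepth 1 -not -name dir2 -exec rm -rf {} +
Limiting the search to depth 1 means find never descends into dir2, so anything inside it is left alone. Combining -prune with -delete does not work for this, because -delete implies -depth, and -prune has no effect when -depth is on.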
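Before pointing a delete at real data, it may be worth rehearsing it on a scratch tree. The sketch below recreates the jpeg layout under a temporary directory and adds one file inside dir2 (keep.txt, a made-up name) to check that the contents of dir2 survive. It uses -maxdepth 1 with rm -rf, which is my choice for the rehearsal, so that find never visits anything inside dir2.

```shell
#!/bin/sh
# Scratch copy of the layout from the question, plus one file inside dir2
demo=$(mktemp -d)
mkdir -p "$demo/jpeg/dir1" "$demo/jpeg/dir2"
touch "$demo/jpeg/1.jpeg" "$demo/jpeg/2.jpeg" "$demo/jpeg/dir1/x" "$demo/jpeg/dir2/keep.txt"

# Look only at the direct children of jpeg and remove all of them but dir2
find "$demo/jpeg" -mindepth 1 -maxdepth 1 ! -name dir2 -exec rm -rf {} +

ls -A "$demo/jpeg"    # prints: dir2
```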

Copy multiple directories to another multiple directories from Linux shell [duplicate]

I want to copy several directories to other directories. How do I do it from the shell command prompt? For example:
Project
├── directory1
│ └── files1
├── directory2
│ └── files2
└── directory3
└── files3
to :
Project
├── directory1
│ └── files1
├── directory2
│ └── files2
├── directory3
│ └── files3
├── directory1.copy
│ └── files1
├── directory2.copy
│ └── files2
└── directory3.copy
└── files3
tried this:
mkdir directory{1..3}.copy
cp -r directory{1..3} directory{1..3}.copy
but all directories (and files inside) copy in directory3.copy
Indeed, cp treats every argument before the last as a source and the last one as the destination. If you want to copy to multiple places, you need a loop.
for dir in directory{1..3} # or: for dir in ./*/
do
dir=${dir%/} # drop the trailing slash that a glob such as ./*/ leaves behind
cp -r "$dir" "$dir.copy"
done
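A self-contained rehearsal of the loop, assuming a scratch directory and the directory1..3 / files1..3 names from the question. The .copy targets are deliberately not created beforehand: if a target already exists, cp -r places the source inside it instead of creating it.

```shell
#!/bin/sh
# Scratch layout matching the question
demo=$(mktemp -d)
for n in 1 2 3; do
    mkdir "$demo/directory$n"
    touch "$demo/directory$n/files$n"
done

# One cp per source directory, each with its own destination.
# The glob is expanded once, before any .copy directory exists.
for dir in "$demo"/directory[0-9]*; do
    cp -r "$dir" "$dir.copy"
done

ls "$demo"
```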

does exec mv command used with find deletes the files?

I want to rename a few files in a folder with one command rather than rename them one by one.
I have a folder named /u0x/XMLs where there are around 500 folders. Each of these folders has a file named PROCESSED_ADI.XML. I want to rename all these files to ADI.XML.
I tried to use find and -exec mv together:
XMLs]$ find . -name "*.XML" -exec mv {} ADI.XML \;
After the command was executed all the files have been deleted from the folders or moved from there, which I am not sure. Could someone shed some light on what went wrong and is there any way the files could be retrieved?
After the command was executed all the files have been deleted from the folders or moved from there, which I am not sure
The files were moved, not deleted. The destination ADI.XML has no directory part, so each renamed file landed in the directory where you ran the find command instead of staying in its own folder.
You can use -execdir if your find supports it.
find . -name "*.XML" -execdir mv -v {} ADI.XML \;
Here is definition of execdir
-execdir command {} +
Like -exec, but the specified command is run from the subdirectory containing the matched file, which is not normally the directory in which you started find. As with -exec, the {} should be quoted if find is being invoked from a shell. This a much more secure method for invoking commands, as it avoids race conditions during resolution of the paths to the matched files. As with the -exec action, the `+' form of -execdir will build a command line to process more than one matched file, but any given invocation of command will only list files that exist in the same subdirectory. If you use this option, you must ensure that your $PATH environment variable does not reference `.'; otherwise, an attacker can run any commands they like by leaving an appropriately-named file in a directory in which you will run -execdir. The same applies to having entries in $PATH which are empty or which are not absolute directory names. If any invocation with the `+' form returns a non-zero value as exit status, then find returns a non-zero exit status. If find encounters an error, this can sometimes cause an immediate exit, so some pending commands may not be run at all. The result of the action depends on whether the + or the ; variant is being used; -execdir command {} + always returns true, while -execdir command {} ; returns true only if command returns 0.
Here is a short demo of what happened and why your files disappeared.
Let's create some dummy directories and files. They can go anywhere, but here we will create them in /tmp.
cd /tmp
Let's create the directories and files.
mkdir -p u0x/XMLs/folder_{1..10} && touch u0x/XMLs/folder_{1..10}/PROCESSED_ADI.XML
Now check what was created inside those directories, using the tree command.
tree u0x/
Output
u0x/
└── XMLs
├── folder_1
│   └── PROCESSED_ADI.XML
├── folder_10
│   └── PROCESSED_ADI.XML
├── folder_2
│   └── PROCESSED_ADI.XML
├── folder_3
│   └── PROCESSED_ADI.XML
├── folder_4
│   └── PROCESSED_ADI.XML
├── folder_5
│   └── PROCESSED_ADI.XML
├── folder_6
│   └── PROCESSED_ADI.XML
├── folder_7
│   └── PROCESSED_ADI.XML
├── folder_8
│   └── PROCESSED_ADI.XML
└── folder_9
└── PROCESSED_ADI.XML
11 directories, 10 files
Now we execute your find command from inside u0x, with the -v flag/option added to mv.
(cd u0x && find . -name "*.XML" -exec mv -v {} ADI.XML \;)
Output
renamed './XMLs/folder_2/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_9/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_7/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_10/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_6/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_3/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_8/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_1/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_4/PROCESSED_ADI.XML' -> 'ADI.XML'
renamed './XMLs/folder_5/PROCESSED_ADI.XML' -> 'ADI.XML'
Now check what happened to the files using the tree command.
tree u0x/
Output
u0x/
├── ADI.XML
└── XMLs
├── folder_1
├── folder_10
├── folder_2
├── folder_3
├── folder_4
├── folder_5
├── folder_6
├── folder_7
├── folder_8
└── folder_9
11 directories, 1 file
If you take a good look at the tree output above, all of the PROCESSED_ADI.XML files are gone from their folders, but there is one ADI.XML inside the parent directory u0x.
As @Gordon Davidson mentioned, every file matching *.XML was moved to the same location, each one overwriting the previous, so now you have only one file, named ADI.XML.
Using -execdir will have the expected output you're looking for.
The output of the tree command if -execdir was used, should be:
tree u0x/
u0x/
└── XMLs
├── folder_1
│   └── ADI.XML
├── folder_10
│   └── ADI.XML
├── folder_2
│   └── ADI.XML
├── folder_3
│   └── ADI.XML
├── folder_4
│   └── ADI.XML
├── folder_5
│   └── ADI.XML
├── folder_6
│   └── ADI.XML
├── folder_7
│   └── ADI.XML
├── folder_8
│   └── ADI.XML
└── folder_9
└── ADI.XML
11 directories, 10 files
First of all, do not use a wildcard. Even with -execdir, you're creating a circumstance where all .XML files in a directory will be overwritten with the last match in that directory.
find . -type f -name 'PROCESSED_ADI.XML' -exec sh -c 'for i do echo mv "${i}" "$(dirname "$i")/ADI.XML"; done' _ {} +
Run this, as a dry run. If it looks ok, remove echo.
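To see that each file really stays in its own folder, the command can be rehearsed on a scratch tree first. This is a sketch under a few assumptions: the tree lives in a temporary directory, each file gets a distinct line of text so overwrites would be visible, and ${f%/*} is used in place of dirname (the two give the same directory for the paths find prints here).

```shell
#!/bin/sh
# Scratch tree with distinguishable file contents
demo=$(mktemp -d)
for n in 1 2 3; do
    mkdir -p "$demo/XMLs/folder_$n"
    echo "payload $n" > "$demo/XMLs/folder_$n/PROCESSED_ADI.XML"
done

# ${f%/*} is the directory part of the match, so each file is renamed in place
find "$demo/XMLs" -type f -name 'PROCESSED_ADI.XML' -exec sh -c '
    for f do
        mv -- "$f" "${f%/*}/ADI.XML"
    done' sh {} +

cat "$demo/XMLs/folder_2/ADI.XML"    # prints: payload 2
```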

Move files to parent directory of current location

I have a lot of folders that have a folder inside them, with files inside. I want to move the 2nd level files into the 1st level and do so without knowing their names.
Simple example:
Before running a script:
/temp/1stlevel/test.txt
/temp/1stlevel/2ndlevel/test.rtf
After running a script:
/temp/1stlevel/test.txt
/temp/1stlevel/test.rtf
I'm getting very close but I'm missing something and I'm sure it's simple/stupid. Here's what I'm running:
find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}"' sh {} \;
Here's what that's getting me:
mv: './1stlevel/2ndlevel/test.rtf' and './1stlevel/2ndlevel/test.rtf' are the same file
Any suggestions?
UPDATE: George, this is great stuff, thank you! I'm learning a lot and taking notes. Using the mv command instead of the more complicated one is brilliant. Far from the first time I've been accused of doing something the hardest way possible!
However, while it works great with 1 set of folders, if I have more, it doesn't work as intended. Here's what I mean:
Before:
new
└── temp
├── Folder1
│ ├── SubFolder1
│ │ └── SubTest1.txt
│ └── Test1.txt
├── Folder2
│ ├── SubFolder2
│ │ └── SubTest2.txt
│ └── Test2.txt
└── Folder3
├── SubFolder3
│ └── SubTest3.txt
└── Test3.txt
After:
new
└── temp
└── Folder3
├── Folder1
│ ├── SubFolder1
│ └── Test1.txt
├── Folder2
│ ├── SubFolder2
│ └── Test2.txt
├── SubFolder3
├── SubTest1.txt
├── SubTest2.txt
├── SubTest3.txt
└── Test3.txt
Desired:
new
└── temp
├── Folder1
│ ├── SubFolder1
│ ├── SubTest1.txt
│ └── Test1.txt
├── Folder2
│ ├── SubFolder2
│ ├── SubTest2.txt
│ └── Test2.txt
└── Folder3
├── SubFolder3
├── SubTest3.txt
└── Test3.txt
If one wanted to get fancy*:
new
└── temp
├── Folder1
│ ├── SubTest1.txt
│ └── Test1.txt
├── Folder2
│ ├── SubTest2.txt
│ └── Test2.txt
└── Folder3
├── SubTest3.txt
└── Test3.txt
I don't need to get fancy, though, 'cause later in my script I just remove empty folders.
BTW, that took me forever in Notepad++ to draw. What did you use?
Your find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}"' sh {} \;
attempt is very close to being right. 
A useful technique when debugging complex commands
is to insert echo statements to see what is happening. 
So, if we say
$ find . -mindepth 3 -type f -exec sh -c 'echo mv -i "$1" "${1%/*}"' sh {} \;
we get
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2/SubFolder2
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3/SubFolder3
which makes perfect sense — it’s finding all the files at depth 3 (and beyond),
stripping the last level off the pathname, and moving the file there. 
But mv (path_to_file) (path_to_directory) means
move the file into the directory.
So the command mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1
means move Folder1/SubFolder1/SubTest1.txt into Folder1/SubFolder1 —
but that’s where it already is. 
Therefore, you got error messages saying
that you were moving a file to where it already was.
As is clear from your illustration,
you want to move SubTest1.txt into Folder1. 
One quick fix is
$ find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*}/.."' sh {} \;
which uses .. to go up from SubFolder1 to Folder1:
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1/SubFolder1/..
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2/SubFolder2/..
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3/SubFolder3/..
I believe that that’s bad style, although I can’t figure out quite why. 
I would prefer
$ find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*/*}"' sh {} \;
which uses %/*/* to remove two components from the pathname of the file
to get what you really want,
mv -i ./Folder1/SubFolder1/SubTest1.txt ./Folder1
mv -i ./Folder2/SubFolder2/SubTest2.txt ./Folder2
mv -i ./Folder3/SubFolder3/SubTest3.txt ./Folder3
You can then use
$ find . -mindepth 2 -type d -delete
to delete the empty SubFolderN directories. 
If, through some malfunction, any of them is not empty,
find will leave it alone and issue a warning message.
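Both steps can be tried together on a scratch copy of the Folder1..3 layout from the question. The sketch below assumes a temporary directory and runs the two find commands from inside it, since the depth numbers depend on the starting point.

```shell
#!/bin/sh
# Scratch copy of the layout from the question
demo=$(mktemp -d)
for n in 1 2 3; do
    mkdir -p "$demo/Folder$n/SubFolder$n"
    touch "$demo/Folder$n/Test$n.txt" "$demo/Folder$n/SubFolder$n/SubTest$n.txt"
done

(
    cd "$demo" || exit 1
    # ${1%/*/*} strips "/SubFolderN/SubTestN.txt", leaving ./FolderN
    find . -mindepth 3 -type f -exec sh -c 'mv -i "$1" "${1%/*/*}"' sh {} \;
    # Remove the SubFolderN directories, which are now empty
    find . -mindepth 2 -type d -delete
)

ls "$demo/Folder1"
```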
Let me use this example to illustrate:
Tree structure:
new
└── temp
└── 1stlevel
├── 2ndlevel
│   └── text.rtf
└── test.txt
Move with:
find . -mindepth 4 -type f -exec mv {} ./*/* \;
Result after move:
new
└── temp
└── 1stlevel
├── 2ndlevel
├── test.txt
└── text.rtf
Where you run it from matters, I am running from one folder up from the temp folder, if you want to run it from the temp folder then the command would be:
find 1stlevel/ -mindepth 2 -type f -exec mv {} ./* \;
Or:
find ./ -mindepth 3 -type f -exec mv {} ./* \;
Please look closely at the section find ./ -mindepth 3, remember that -mindepth 1 means process all files except the starting-points. So if you start from temp and are after a file in temp/1st/2nd/ then you will access it with -mindepth 3 starting at temp. Please see: man find.
Now for the destination I used ./*/*, which reads as "from the current directory (one up from temp, mine was new) down to temp, then 1stlevel", so:
./: => new folder
./*: => new/temp folder
./*/*: => new/temp/1stlevel
But all that is for the find command but another trick is to use the mv command only from the new folder:
mv ./*/*/*/* ./*/*
This is run from the new folder in my example (in other words from one folder up the temp folder). Make adjustments to run it at different levels.
To run from the temp folder:
mv ./*/*/* ./*
If you're bothered about time, since you mentioned you had a lot of files, then the mv option beats the find option. See the time results for just three files:
find:
real 0m0.004s
user 0m0.000s
sys 0m0.000s
mv:
real 0m0.001s
user 0m0.000s
sys 0m0.000s
Update:
Since the OP wants a script that handles multiple folders, I came up with this:
#!/usr/bin/env bash
for i in ./*/*/*;
do
if [[ -d "$i" ]];
then
# Move the files to the new location
# and remove the directory only if the move succeeded and left it empty
mv "$i"/* "${i%/*}/" && rmdir "$i"
fi
done
How to: Run from the new folder: ./move.sh, remember to make the script executable with chmod +x move.sh.
Target directory structure:
new
├── move.sh
└── temp
├── folder1
│   ├── subfolder1
│   │   └── subtext1.txt
│   └── test1.txt
├── folder2
│   ├── subfolder2
│   │   └── subtext2.txt
│   └── test1.txt
└── folder3
├── subfolder3
│   └── subtext3.txt
└── test1.txt
Get fancy result:
new
├── move.sh
└── temp
├── folder1
│   ├── subtext1.txt
│   └── text1.txt
├── folder2
│   ├── subtext2.txt
│   └── text2.txt
└── folder3
├── subtext3.txt
└── text3.txt
mv YOUR-FILE-NAME ../
It should work this way if you have write permission.
Have your script navigate to each directory where you need the files moved "up," then you can have find find each file in the directory, then move them up one directory:
$ find . -type f -exec mv {} ../. \;

bash script to rename following a pattern in subdirectories and make a copy

I am trying to do an iterative renaming of certain files in all directories.
homefolder/folder1/ouput/XXXXX_ab.png
homefolder/folder1/ouput/XXXXX_abcdefg.png
homefolder/folder2/ouput/XXXXX_ab.png
homefolder/folder2/ouput/XXXXX_abcdefg.png
homefolder/folder3/ouput/XXXXX_ab.png
homefolder/folder3/ouput/XXXXX_abcdefg.png
...
homefolder/folder500/ouput/XXXXX_ab.png
homefolder/folder500/ouput/XXXXX_abcdefg.png
I want to get the folder name (ex. folder1, folder2, ... folder500) and pass it to the two png files as a prefix and remove those five Xs at the beginning of each file.
The pattern of those png files is:
XXXXX_ab.png
XXXXX_abcdefg.png
Only the first five characters differ between subdirectories; they should be replaced by the name of the containing folderN directory.
the results will be:
homefolder/folder1/ouput/folder1_ab.png
homefolder/folder1/ouput/folder1_abcdefg.png
homefolder/folder2/ouput/folder2_ab.png
homefolder/folder2/ouput/folder2_abcdefg.png
homefolder/folder3/ouput/folder3_ab.png
homefolder/folder3/ouput/folder3_abcdefg.png
...
homefolder/folder500/ouput/folder500_ab.png
homefolder/folder500/ouput/folder500_abcdefg.png
at the end of renaming, create a copy of these two newly renamed files inside another folder in the homefolder, for example all_png_folder.
find . -iname "*_ab.png" -exec rename _ab.png folder1_ab.png '{}' \;
find . -name "*_ab.png" -exec cp {} ./all_png_folder \;
Here is a start, the copying at the end should be a trivial addition.
#!/usr/bin/env bash
# Group the -name tests so -type f applies to both; read NUL-delimited names (needs bash 4.4+)
mapfile -d '' files < <(find . -type f \( -name "*_ab.png" -o -name "*_abcdefg.png" \) -print0)
for file in "${files[@]}"; do
foldername=$(cut -d '/' -f 2 <<< "$file")
# The name of the png-file minus the leading xxxxx_
pngfile=$(basename "$file" | cut -d '_' -f 2-)
destinationdir=$(dirname "$file")
mv "$file" "$destinationdir/${foldername}_$pngfile"
done
Demo
$ tree
.
├── folder1
│   └── ouput
│   ├── foo_bar.png
│   ├── xxxxx_abcdefg.png
│   └── xxxxx_ab.png
├── folder2
│   └── ouput
│   ├── xxxxx_abcdefg.png
│   └── xxxxx_ab.png
└── rename.sh
4 directories, 6 files
$ ./rename.sh
$ tree
.
├── folder1
│   └── ouput
│   ├── folder1_abcdefg.png
│   ├── folder1_ab.png
│   └── foo_bar.png
├── folder2
│   └── ouput
│   ├── folder2_abcdefg.png
│   └── folder2_ab.png
└── rename.sh
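For the copying step, one possible shape is to copy each file right after it is renamed. The sketch below is an assumption-laden rehearsal, not the script above: it works on a scratch tree, uses plain globs and parameter expansion instead of find and cut, and takes all_png_folder from the question as the collection directory.

```shell
#!/bin/sh
# Scratch tree with two folders and the two png names from the question
demo=$(mktemp -d)
for n in 1 2; do
    mkdir -p "$demo/folder$n/ouput"
    touch "$demo/folder$n/ouput/xxxxx_ab.png" "$demo/folder$n/ouput/xxxxx_abcdefg.png"
done
mkdir "$demo/all_png_folder"

for dir in "$demo"/folder*/ouput; do
    parent=${dir%/ouput}                  # .../folder1
    folder=${parent##*/}                  # folder1
    for png in "$dir"/*_ab.png "$dir"/*_abcdefg.png; do
        [ -e "$png" ] || continue         # skip a pattern that matched nothing
        name=${png##*/}                   # xxxxx_ab.png
        new="$dir/${folder}_${name#*_}"   # .../ouput/folder1_ab.png
        mv -- "$png" "$new"
        cp -- "$new" "$demo/all_png_folder/"
    done
done

ls "$demo/all_png_folder"
```

Because the folder name is part of every new file name, the copies from different folders do not collide inside all_png_folder.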
