Splitting a large directory into smaller ones in Linux

I have a large directory named application_pdf which contains 93k files. My use case is to split it into 3 smaller subdirectories (in a different location than the original large directory) containing around 30k files each.
Can this be done directly from the command line?
Thanks!

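Using bash (run from inside the large directory; the target directories must already exist):
x=("path/to/dir1" "path/to/dir2" "path/to/dir3")
c=0
for f in *
do
mv -- "$f" "${x[c]}"   # send each file to the next target in turn
c=$(( (c+1)%3 ))       # cycle 0, 1, 2, 0, ...
done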
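Before running this on 93k real files, it may help to rehearse the same round-robin loop on throwaway data. The sketch below assumes bash and uses made-up names (demo, src, dir1 to dir3) inside a temporary directory; it is an illustration, not a drop-in script.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the round-robin split on throwaway files.
# All names below (demo, src, dir1..dir3) are invented for this illustration.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dir1" "$demo/dir2" "$demo/dir3"
touch "$demo/src/file"{1..9}.pdf

x=("$demo/dir1" "$demo/dir2" "$demo/dir3")
c=0
for f in "$demo/src"/*
do
    [ -f "$f" ] || continue          # skip subdirectories and other non-regular files
    mv -- "$f" "${x[c]}"
    c=$(( (c+1) % ${#x[@]} ))        # cycle through however many targets are listed
done

# Report how many files ended up in each target.
for d in "${x[@]}"; do
    printf '%s: %s files\n' "${d##*/}" "$(find "$d" -type f | wc -l)"
done
```

Using ${#x[@]} instead of a hard-coded 3 means the loop keeps working if a fourth target is added to the array.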

If you have the rename command from Perl, you could try it like this:
rename --dry-run -pe 'my @d=("dirA","dirB","dirC"); $_=$d[$N%3] . "/$_"' *.pdf
In case you are not that familiar with the syntax:
-p says to create output directories, à la mkdir -p
-e says to execute the following Perl snippet
$d[$N%3] selects one of the directories in array @d as a function of the serially incremented counter $N provided to the snippet by rename
The output value is passed back to rename by setting $_
Remove the --dry-run if it looks good. Please run on a small directory with a copy of 8-10 files first, and make a backup before trying on all your 93k files.
Test
touch {0,1,2,3,4,5,6}.pdf
rename --dry-run -pe 'my @d=("dirA","dirB","dirC"); $_=$d[$N%3] . "/$_"' *.pdf
'0.pdf' would be renamed to 'dirB/0.pdf'
'1.pdf' would be renamed to 'dirC/1.pdf'
'2.pdf' would be renamed to 'dirA/2.pdf'
'3.pdf' would be renamed to 'dirB/3.pdf'
'4.pdf' would be renamed to 'dirC/4.pdf'
'5.pdf' would be renamed to 'dirA/5.pdf'
'6.pdf' would be renamed to 'dirB/6.pdf'
More for my own reference, but if you don't have the Perl rename command, you could do it just in Perl:
perl -e 'use File::Copy qw(move);my @d=("dirA","dirB","dirC"); my $N=0; @files = glob("*.pdf"); foreach $f (@files){my $t=$d[$N++%3] . "/$f"; print "Moving $f to $t\n"; move $f,$t}'

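Something like this might work, assuming the directory holds only the PDF files and their names contain no whitespace:
for x in $(ls -1 originPath | head -30000); do
mv -- "originPath/$x" destinationPath/   # ls prints bare names here, so the originPath/ prefix is added back
done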
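The loop above moves only one batch and depends on the output of ls. A possible alternative, sketched here under the assumption that bash arrays are available, fills each target with a contiguous chunk of names taken from a glob. The directory names and the chunk size of 4 are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: move files in contiguous chunks without parsing ls.
# origin, part1..part3 and chunk=4 are illustrative values only.
demo=$(mktemp -d)
origin="$demo/originPath"
mkdir -p "$origin"
touch "$origin/doc"{0..9}.pdf

targets=("$demo/part1" "$demo/part2" "$demo/part3")
chunk=4                                   # for 93k files and 3 targets this would be 31000
mkdir -p "${targets[@]}"

shopt -s nullglob                         # an empty directory yields an empty array
files=("$origin"/*.pdf)

for i in "${!files[@]}"; do
    n=$(( i / chunk ))                    # index of the target this file belongs to
    (( n < ${#targets[@]} )) || break     # leave any surplus files where they are
    mv -- "${files[i]}" "${targets[n]}/"
done
```

Moving one file per mv call is slower than passing many names at once, but it avoids hitting the kernel's argument-length limit with tens of thousands of names.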

Related

Alternative for AWK use

I'd love to have a more elegant solution for a mass rename of files, as shown below. Files were of format DEV_XYZ_TIMESTAMP.dat and we needed them as T-XYZ-TIMESTAMP.dat.
In the end, I copied them all (to be on the safe side) into a renamed folder:
ls -l *dat|awk '{system("cp " $10 " renamed/T-" substr($10, index($10, "_")+1))}'
So, first I listed all dat files, then picked up 10th column (file name) and executed a command using awk's system function.
The command was essentially copying of original filename into renamed folder with new file name.
New file name was created by removing (awk substring function) prefix before (including) _ and adding "T-" prefix.
Effectively:
cp DEV_file.dat renamed/T-file.dat
Is there a way to use cp or mv together with some regex rules to achieve the same in a bit more elegant way?
Thx

You may use this script:
for file in *.dat; do
f="${file//_/-}"
mv "$file" renamed/T-"${f#*-}"
done
You should avoid parsing the output of the ls command.
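To see what the two parameter expansions do before any file is moved, the same logic can be wrapped in a small function and printed. The function name new_name is hypothetical and the sample file names are taken from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print the name each file would get, without touching any files.
# new_name is a hypothetical helper name used only for this illustration.
new_name() {
    local f="${1//_/-}"           # replace every underscore with a hyphen
    printf 'T-%s\n' "${f#*-}"     # drop the shortest prefix ending in "-", then prepend "T-"
}

for file in DEV_XYZ_TIMESTAMP.dat ABC_DEF_TIMESTAMP.dat; do
    printf '%s -> %s\n' "$file" "$(new_name "$file")"
done
```

Note that this approach assumes the part before the first underscore contains no hyphen of its own; otherwise the prefix removal would stop at that hyphen instead.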

If you have the rename utility:
rename -E "s/[^_]*/T/" -e "s/_/-/g" *dat
Demo
$ ls -1
ABC_DEF_TIMESTAMP.dat
DEV_XYZ_TIMESTAMP.dat
$ rename -E "s/[^_]*/T/" -e "s/_/-/g" *
$ ls -1
T-DEF-TIMESTAMP.dat
T-XYZ-TIMESTAMP.dat
$

This is how I would do it:
cpdir=renamed
for file in *dat; do
newfile=$(echo "$file" | sed -e "s/[^_]*/T/" -e "y/_/-/")
cp "$file" "$cpdir/$newfile"
done
The sed script replaces the leading run of non-underscore characters with a single T and then replaces every _ with -. If cpdir might not exist before execution, you can simply add mkdir -p "$cpdir" after the first line.

Rename large folder of Jpegs

I have a large folder of jpegs, which I would like to rename sequentially to image01.jpg, image02.jpg ... image533.jpg etc.
I have tried using the following
find ‘/myImages/‘ -maxdepth 1 -name ‘*.jpg’ | sort -n | awk 'BEGIN{ x=1 }{printf "mv \"%s\" \”/myImages/image%04d.jpg\”\n”, $0, x++ }' | bash
which I got from here: http://www.algissalys.com/how-to/how-to-quickly-rename-modify-and-scale-all-images-in-a-directory-using-linux
However, this is only returning
>
And then nothing happens, any suggestions would be great.

The easiest way to do that is with rename which you can install with homebrew using:
brew install rename
Then, you can go into your directory containing the images and run:
rename --dry-run -X -e '$_ = "$N"' *jpg
Sample Output
'a.jpg' would be renamed to '1.jpg'
'article.jpg' would be renamed to '2.jpg'
'blob-0.jpg' would be renamed to '3.jpg'
'blob-1.jpg' would be renamed to '4.jpg'
'blob-2.jpg' would be renamed to '5.jpg'
'blob-3.jpg' would be renamed to '6.jpg'
If that looks correct, you can run it again without the --dry-run to actually do it, rather than just telling you what it will do.
If you want your names zero-padded, the easiest is to let rename work out how much padding you need automatically like this:
rename --dry-run -X -N ...01 -e '$_ = "$N"' *jpg
The benefits of using rename are that:
it is simple and powerful
it will warn you before overwriting any files
it can do a dry run and tell you what would happen without actually doing anything
If you want an explanation of the command '$_ = "$N"' then read on...
The rename command is actually a Perl script, so the part I mention above is just a Perl script enclosed in single quotes. The $N is just a Perl variable that expands to be a sequentially increasing number. The Perl special variable $_ is filled with the name of the current file before your little Perl script is executed, and crucially, you are expected to set it to the name you want that input file renamed as.
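If the rename utility is not available, roughly the same sequential, zero-padded numbering can be sketched in plain bash with printf. This is an assumption-laden illustration: the demo directory and sample names are invented, and the image prefix follows the naming asked for in the question.

```shell
#!/usr/bin/env bash
# Sketch: sequential, zero-padded renaming in plain bash.
# The demo directory and sample file names are invented for this illustration.
demo=$(mktemp -d)
touch "$demo/a.jpg" "$demo/article.jpg" "$demo/blob-0.jpg"

shopt -s nullglob
files=("$demo"/*.jpg)            # glob results come back sorted by name
width=${#files[@]}               # number of files, e.g. 533 ...
width=${#width}                  # ... which has 3 digits
(( width < 2 )) && width=2       # keep at least two digits, as in image01.jpg

n=1
for f in "${files[@]}"; do
    printf -v new '%s/image%0*d.jpg' "$demo" "$width" "$n"
    mv -n -- "$f" "$new"         # -n: do not overwrite an existing file
    n=$(( n + 1 ))
done
ls "$demo"
```

The padding width is derived from the number of files, so a folder of 533 images would get three digits while a small folder keeps two.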

You could do that with a bash script. Say you have the following in a file called rename_images.
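#!/bin/bash
# Usage: ./rename_images '<glob>' <new-name-prefix> <extension>
declare -a FILESERIES
FILESERIES=( $1 )   # left unquoted on purpose so the glob expands; avoids parsing ls
NUM=${#FILESERIES[@]}
NEWNAME=$2
EXT=$3
for (( i=0; i<NUM; i++ ))
do
FI=${FILESERIES[$i]}
NEWFILENAME="$NEWNAME$i$EXT"
mv -- "$FI" "$NEWFILENAME"
done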
To do what you need, run the script from within the folder with all the images as follows:
./rename_images '*.jpg' image .jpg
And you should be sorted.

rename all files in folder through regular expression

I have a folder with lots of files whose names have the following structure:
01.artist_name - song_name.mp3
I want to go through all of them and rename them using the regexp:
/^\d+\./
so I get only:
artist_name - song_name.mp3
How can I do this in bash?

You can do this in BASH:
for f in [0-9]*.mp3; do
mv "$f" "${f#*.}"
done
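The expansion ${f#*.} removes everything up to the first period, whatever comes before it. If some names might lack the numeric prefix, a stricter variant can be sketched with bash's regex matching; strip_track_number is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: strip a leading "digits + period" prefix only when it is really there.
# strip_track_number is a hypothetical helper name used for illustration.
strip_track_number() {
    if [[ $1 =~ ^[0-9]+\.(.*)$ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # everything after the numeric prefix
    else
        printf '%s\n' "$1"                   # no numeric prefix: leave the name unchanged
    fi
}

strip_track_number "01.artist_name - song_name.mp3"
strip_track_number "artist_name - song_name.mp3"
```

Inside the original loop, the result could be used as the mv target, for example mv -- "$f" "$(strip_track_number "$f")".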

Use the Perl rename utility. It might already be installed on your version of Linux, or be easy to find.
rename 's/^\d+\.//' -n *.mp3
With the -n flag, it will be a dry run, printing what would be renamed, without actually renaming. If the output looks good, drop the -n flag.

Use sed to do so:
for f in *.mp3;
do
new_name="$(echo "$f" | sed 's/^[^.]*\.//')"
mv -- "$f" "$new_name"
done
...in this case, the regular expression ^[^.]*\. matches everything up to and including the first period of the string.

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text which I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append the content of each file in path A to the end of the file with the same name in path B, without erasing the existing content of the file in path B.
For example, the content of $HOME/initial/*.desktop should end up in $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done

Firstly, I would back up $HOME/initial and $HOME/scripts, because there is lots of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this:
cd $HOME/initial
cat SomeFile >> $HOME/scripts/SomeFile
where SomeFile is the name of any file you choose to test with, such as baby.desktop. I would check that this has worked and then, if you are happy with that, go ahead and run the same command inside a loop:
cd $HOME/initial
for SOURCE in *
do
DESTINATION="$HOME/scripts/$SOURCE"
echo Appending "$SOURCE" to "$DESTINATION"
#cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
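As a rough illustration of the append loop, the sketch below assumes the .desktop entries are regular files and that only files with a counterpart in the destination should be touched. The temporary demo tree stands in for $HOME/initial and $HOME/scripts, and the file contents are invented.

```shell
#!/usr/bin/env bash
# Sketch: append each file in one directory to its namesake in another.
# The demo tree stands in for $HOME/initial and $HOME/scripts; contents are invented.
demo=$(mktemp -d)
mkdir -p "$demo/initial" "$demo/scripts"
printf 'Icon=/usr/share/a.png\n' > "$demo/initial/baby.desktop"
printf '[Desktop Entry]\n'       > "$demo/scripts/baby.desktop"
printf 'Icon=/usr/share/b.png\n' > "$demo/initial/orphan.desktop"   # has no counterpart

for src in "$demo/initial"/*.desktop; do
    dest="$demo/scripts/${src##*/}"          # same file name, other directory
    if [ -f "$dest" ]; then
        cat -- "$src" >> "$dest"             # >> appends, keeping the existing content
    else
        echo "skipping ${src##*/}: no matching file in scripts" >&2
    fi
done
cat "$demo/scripts/baby.desktop"
```

The existence check keeps >> from silently creating new files in the destination when a name has no counterpart there.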

I solved it. In case anyone wants to know how, it is very simple:
Using sed
I only needed the matching line (for example Icon=/usr/share/some_picture.png) from $HOME/initial/example.desktop appended to the file with the same name and format, $HOME/scripts/example.desktop, but I had a lot of .desktop files (2500 files):
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do sed -ne '/Icon=/ p' $i >> $HOME/scripts/$i ; done
_________
If you need to copy the whole content to the other file with the same name and format:
Using cat
cd $HOME/initial
STRING_LINE=`grep -l -R "Icon=" *.desktop`
for i in $STRING_LINE; do cat $i >> $HOME/scripts/$i ; done

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis so I can't just remove the duplicates. The OS is Linux (centos and ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.

Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$

I'd use this:
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp (in nanoseconds) appended.

You can create a new file name composed of the directory and the file name. To do that, change the destination argument in your original code:
for ...; do mv $file /quarantine/$(echo $file | sed 's:/:_:g') ; done
Note that you should replace the _ with a character unusual enough that it cannot be confused with characters already present in the file names.
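The same path flattening can be sketched with bash parameter expansion instead of sed. Here flatten_path is a hypothetical helper, "__" is an arbitrary separator choice, and the echo in front of mv makes the loop a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: build a unique quarantine name from the file's full path.
# flatten_path is a hypothetical helper; "__" is an arbitrary separator choice.
flatten_path() {
    local p=${1#./}                 # drop a leading "./"
    p=${p#/}                        # ... or a leading "/"
    printf '%s\n' "${p//\//__}"     # turn every remaining "/" into "__"
}

# Dry run: print the mv commands instead of executing them.
while IFS= read -r file; do
    echo mv -- "$file" "/quarantine/$(flatten_path "$file")"
done <<'EOF'
./c/foo
./d/foo
EOF
```

Once the printed commands look right, the heredoc can be replaced by < bad-stuff.txt and the echo removed.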
