Linux batch copy files into directories based on filename pattern


I have a list of almost 500 pdf files with the following filename structure:
XXXX-YYYY-MM-DD.pdf
where XXXX is a variable-length numeric code (1 to 4 digits) always delimited by "-", for example:
51-2016-08-22.pdf
776-2016-08-22.pdf
3881-2016-08-22.pdf
4-2016-08-22.pdf
2860-2016-08-22.pdf
The goal is to copy each file into its own directory, with each directory named after the numeric code (i.e. file 776-2016-08-22.pdf goes to directory 776). How can I use awk or sed to extract the variable-length field?
Here's my code:
for f in *.pdf
do
FOLDERNAME=`echo $f| awk (awk or sed missing code here)`
mkdir /my/dir/structure/$FOLDERNAME
cp $f /my/dir/structure/$FOLDERNAME/
done
Thanks for your support.

You can use:
for f in *.pdf; do
d="${f%%-*}"
mkdir -p "$d" && cp "$f" "$d"
done
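For anyone wondering what ${f%%-*} does: it removes the longest suffix that matches -*, so everything from the first hyphen onward is dropped and only the leading code remains. Here is a minimal sketch of the same idea that copies into a separate destination root, as the question's loop does. The function name copy_by_prefix and its argument are my own assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and argument are assumptions for illustration):
# copy every *.pdf in the current directory into "<dest_root>/<code>/",
# where <code> is the part of the filename before the first "-".
copy_by_prefix() {
    local dest_root=$1 f prefix
    for f in *.pdf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        prefix=${f%%-*}              # strip everything from the first "-" onward
        mkdir -p -- "$dest_root/$prefix" && cp -- "$f" "$dest_root/$prefix/"
    done
}
```

Run it from the directory holding the PDFs, for example: copy_by_prefix /my/dir/structure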

As ed-morton rightly pointed out, this is NOT a recommended solution because it fails in many cases (for example, the unquoted $f breaks on filenames containing whitespace). Please follow https://stackoverflow.com/a/39089589/3834860 instead.
I am keeping this answer for reference.
Use awk -F '-' to set the delimiter and '{print $1}' to print the first field before the delimiter.
for f in *.pdf
do
FOLDERNAME=`echo $f| awk -F '-' '{print $1}'`
mkdir /my/dir/structure/$FOLDERNAME
cp $f /my/dir/structure/$FOLDERNAME/
done

Related

How to move files where the first line contains a string?

I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried to use head -1 file | at the beginning, but it did not work.

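One change I would make to your script is to loop over the files and test only the first line of each. Note that grep -l cannot report the real filename when it reads from a pipe, so the loop tests with grep -q and moves "$file" directly:
for file in *.csv; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" /home/destination/folder
done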

You could try sed '1q' file.csv | grep ... to apply the regexp to the first line only.
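A rough sketch of how the sed '1q' idea could be wrapped in a loop. The function name move_if_first_line_matches and its argument order are hypothetical, chosen only for illustration; sed '1q' prints the first line and quits, so the rest of each file is never scanned by grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are assumptions):
#   move_if_first_line_matches PATTERN DEST FILE...
# Moves each FILE whose first line matches the extended regex PATTERN into DEST.
move_if_first_line_matches() {
    local pattern=$1 dest=$2 file
    shift 2
    for file in "$@"; do
        [ -f "$file" ] || continue
        # sed '1q' emits only line 1; grep -q reports the match via its exit status
        if sed '1q' < "$file" | grep -qE -- "$pattern"; then
            mv -- "$file" "$dest"/
        fi
    done
}
```

Example call: move_if_first_line_matches 'FindMyRegex' /home/destination/folder /home/user/folder/*.csv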

You don't need grep or find, as long as your files don't have embedded newlines.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv;
do sed -ns '1 { /yourPattern/F; q; }' $f;
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail syntactically if there are no hits.
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input - this is so the filenames aren't always the first one.
This does require GNU sed for the F.

gawk 'FNR==1{if($0~/PATTERN/)
printf "mv %s %s\n",FILENAME, "/target";nextfile}' /path/*.csv
First of all, the .*? in your regex .*?FindMyRegex serves no purpose and can be removed.
The above awk (gawk) one-liner builds mv file target command lines for you. Check them, and if you are satisfied, pipe the output to sh to execute the commands.
Replace PATTERN with your regex pattern, and /target with the real target directory.
The one-liner assumes that the filenames contain no special characters (spaces, for example); if they do, add double quotes around the filenames in the mv command.

using GNU awk to find the filenames, pipe the filenames into xargs
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"

Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob # Globs that match nothing expand to nothing
shopt -s dotglob # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the regular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).

Find files in different directories and operate on the filenames

$ ls /tmp/foo/
file1.txt file2.txt
$ ls /tmp/bar/
file20.txt
$ ls /tmp/foo/file*.txt | grep -o -E '[0-9]+' | sort -n | paste -s -d,
1,2
How can I fetch the numbers in the filenames from both directories? In the above example, I need to get 1,2,20. This is in a bash shell.
UPDATE:
$ ls /tmp/foo/file*.txt /tmp/bar/file*.txt /tmp/jaz99/file*.txt /tmp/nah/file*.txt | grep -o -E '[0-9]+' | sort -n | paste -s -d,
ls: cannot access /tmp/nah/file*.txt: No such file or directory
1,2,20,30,99
In this case, it should not print 99 (as it is not matched by *), and it should not print the error if a file is not found.

You can get this done using a loop with output of find:
s=
# run a loop using find command in a process substitution
while IFS= read -d '' -r file; do
file="${file##*/}" # strip down all directory paths
s+="${file//[!0-9]/}," # remove all non-numeric characters and append comma
done < <(find /tmp/{foo,bar,nah,jaz99} -name '*.txt' -print0 2>/dev/null)
echo "${s%,}" # remove last comma from string
Output
1,2,20,30

Here's my take on this. Use arrays. No need to use external tools like sed or awk or find.
#!/usr/bin/env bash
declare -a a=()
for f in /tmp/{foo,bar,nah}/file*.txt; do
[[ $f =~ file([0-9]+) ]] && a+=( "${BASH_REMATCH[1]}" )
done
IFS=,
echo "${a[*]}"
The [[...]] expression populates the $BASH_REMATCH array with regex components. You can use that to extract the numbers and place them in a new temporary array, which you can express with comma separators using $IFS.
Results:
$ mkdir /tmp/foo /tmp/bar
$ touch /tmp/foo/file{1,2}.txt /tmp/bar/file20.txt
$ ./doit
1,2,20
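For comparison, here is a sketch that keeps the sort -n | paste pipeline from the question but takes the number only from the file's base name, so digits in a directory name such as jaz99 are ignored and a missing directory produces no error. The function name list_file_numbers is an assumption of mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print the numbers N from all
# DIR/fileN.txt files in the given directories, sorted and comma-separated.
list_file_numbers() {
    local dir f name
    for dir in "$@"; do
        for f in "$dir"/file*.txt; do
            [ -e "$f" ] || continue       # unmatched glob or missing directory
            name=${f##*/}                 # strip the directory part
            name=${name#file}             # strip the "file" prefix
            printf '%s\n' "${name%.txt}"  # strip the extension
        done
    done | grep -E '^[0-9]+$' | sort -n | paste -s -d, -
}
```

Example call: list_file_numbers /tmp/foo /tmp/bar /tmp/jaz99 /tmp/nah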

Add leading zeros twice in filename

I have file names (from image tiles) consisting of two numbers separated by an underscore, e.g.
142_27.jpg
7_39.jpg
1_120.jpg
How can I (in linux) add leading zeros to both of these numbers? What I want is the file names as
142_027.jpg
007_039.jpg
001_120.jpg

You can use a single awk command to format filenames with leading zeroes using a printf:
for f in *.jpg; do
echo mv "$f" "$(awk -F '[_.]' '{printf "%03d_%03d.%s", $1, $2, $3}' <<< "$f")"
done
This will output:
mv 142_27.jpg 142_027.jpg
mv 1_120.jpg 001_120.jpg
mv 7_39.jpg 007_039.jpg
Once you're satisfied with the output, remove echo before mv command.

With perl based rename command
$ touch 142_27.jpg 7_39.jpg 1_120.jpg
$ rename -n 's/\d+/sprintf "%03d", $&/ge' *.jpg
rename(1_120.jpg, 001_120.jpg)
rename(142_27.jpg, 142_027.jpg)
rename(7_39.jpg, 007_039.jpg)
The -n option is for dry run, remove it for actual renaming
If perl based rename command is not available:
$ for f in *.jpg; do echo mv "$f" "$(echo "$f" | perl -pe 's/\d+/sprintf "%03d", $&/ge')"; done
mv 1_120.jpg 001_120.jpg
mv 142_27.jpg 142_027.jpg
mv 7_39.jpg 007_039.jpg
Change echo mv to just mv once dry run seems okay

You can do it with a little shell-script using sed:
for i in *.jpg;
do
new=`echo "$i" | sed -n -e '{ s/^\([0-9]\{0,3\}\)_\([0-9]\{0,3\}\).jpg/000\1_000\2.jpg/ };
{s/\([0-9]*\)\([0-9]\{3\}\)_\([0-9]*\)\([0-9]\{3\}\).jpg/\2_\4.jpg/p };'`;
mv "$i" "$new";
done;
I first prepend three zeros to each of the two numbers by default, and afterwards cut off as many leading digits as necessary so that only 3 digits are left in each position.

with bash substitution(a,b)
windows(bash)
for f in *.jpg;do a=${f%_*};b=${f#*_};mv $f $(printf "%03d_%07s" $a $b);done
linux
for f in *.jpg;do a=${f%_*};b=${f#*_};b=${b%.*};mv $f $(printf "%03d_%03d".jpg $a $b);done
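A quoted variant of the same parameter-expansion approach, written as a small function so the new name can be checked before anything is renamed. The name padded_name is hypothetical, and the sketch assumes both numbers are plain decimals without leading zeros (printf %d would read something like 08 as invalid octal).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print the zero-padded form of a
# name shaped like A_B.ext. Assumes A and B are decimals without leading zeros.
padded_name() {
    local base=${1%.*}    # "7_39"  (name without extension)
    local ext=${1##*.}    # "jpg"   (extension)
    # ${base%%_*} is the number before "_", ${base#*_} the number after it
    printf '%03d_%03d.%s\n' "${base%%_*}" "${base#*_}" "$ext"
}
```

Usage: for f in *.jpg; do echo mv -- "$f" "$(padded_name "$f")"; done (remove echo once the output looks right).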

linux-shell: renaming files to creation time

Good morning everybody,
for a website I'd like to rename files (pictures) in a folder from "1.jpg, 2.jpg, 3.jpg ..." to "yyyymmdd_hhmmss.jpg", so I'd like to read the creation times and use them as the names for the pics. Does anybody have an idea how to do that, for example with a linux-shell or with imagemagick?
Thank you!

Naming based on file system date
In the linux shell:
for f in *.jpg
do
mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
done
Explanation:
for f in *.jpg
do
This starts the loop over all jpeg files. A feature of this is that it will work with all file names, even ones with spaces, tabs or other difficult characters in the names.
mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"
This renames the file. It uses the -r option which tells date to display the date of the file rather than the current date. The specification +"%Y%m%d_%H%M%S" tells date to format it as you specified.
The file name, $f, is placed in double quotes wherever it is used. This ensures that odd file names will not cause errors.
The -n option to mv tells move never to overwrite an existing file.
done
This completes the loop.
For interactive use, you may prefer that the command is all on one line. In that case, use:
for f in *.jpg; do mv -n "$f" "$(date -r "$f" +"%Y%m%d_%H%M%S").jpg"; done
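One thing to keep in mind: with mv -n, two pictures taken within the same second map to the same name, and the second one is silently left unrenamed. Below is a sketch of one way to handle that by appending a counter. The function name rename_by_mtime and the _N suffix scheme are assumptions for illustration, and it relies on date -r accepting a file name as GNU date does. It is meant for a single pass over the original files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and "_N" suffix scheme are assumptions): rename each
# given file to YYYYmmdd_HHMMSS.jpg based on its modification time, adding
# _1, _2, ... when that name is already taken.
rename_by_mtime() {
    local f dir stamp new n
    for f in "$@"; do
        [ -f "$f" ] || continue
        dir=$(dirname -- "$f")
        stamp=$(date -r "$f" +%Y%m%d_%H%M%S) || continue
        new=$dir/$stamp.jpg
        n=0
        while [ -e "$new" ]; do      # name taken: try the next counter value
            n=$((n + 1))
            new=$dir/${stamp}_$n.jpg
        done
        mv -- "$f" "$new"
    done
}
```

Example call: rename_by_mtime *.jpg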
Naming based on EXIF Create Date
To name the file based on the EXIF Create Date (instead of the file system date), we need exiftool or equivalent:
for f in *.jpg
do
mv -n "$f" "$(exiftool -d "%Y%m%d_%H%M%S" -CreateDate "$f" | awk '{print $4".jpg"}')"
done
Explanation:
The above is quite similar to the commands for the file date but with the use of exiftool and awk to extract the EXIF image Create Date.
The exiftool command provides the date in a format like:
$ exiftool -d "%Y%m%d_%H%M%S" -CreateDate sample.jpg
Create Date : 20121027_181338
The actual date that we want is the fourth field in the output.
We pass the exiftool output to awk so that it can extract the field that we want:
awk '{print $4".jpg"}'
This selects the date field and also adds on the .jpg extension.

Thanks to @John1024!
I needed to rename files with different extensions at the same time, according to their last modification date:
for f in *; do
fn=$(basename "$f")
mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")_$fn"
done
"DSC_0189.JPG" ➜ "2016-02-21_18-22-15_DSC_0189.JPG"
"MOV_0131.avi" ➜ "2016-01-01_20-30-31_MOV_0131.avi"
If you don't want to keep the original filename:
mv "$fn" "$(date -r "$f" +"%Y-%m-%d_%H-%M-%S")"
Hope it helps noobs like me!

Try this
for file in `ls -1 *.jpg`; do name=`stat -c %y $file | awk -F"." '{ print $1 }' | sed -e "s/\-//g" -e "s/\://g" -e "s/[ ]/_/g"`.jpg; mv $file $name; done
Though there might be an easier way.

I created a shell script; I think it's mac only, linux might need other arguments.
#!/bin/bash
BASEDIR=$1;
for file in `ls -1 $BASEDIR`; do
TIMESTAMP=`stat -f "%B" $BASEDIR/$file`;
DATENAME=`date -r $TIMESTAMP +'%Y%m%d-%H%M%S'`-$file
mv -v $BASEDIR/$file $BASEDIR/$DATENAME;
done
When called with a directory path, it renames all files in that directory, prepending each file's creation date, like
../camera/P1210232.JPG -> ../camera/20220121-103456-P1210232.JPG

Change filename based on file creation time:
exiftool "-filename<FileCreateDate" -d %Y%m%d_%H%M%S%z%%-c.%%le input.jpg

Ordering a loop in bash

I've a bash script like this:
for d in /home/test/*
do
echo $d
done
Which outputs this:
/home/test/newer dir
/home/test/oldest dir
I'd like to order the folders by creation time so that the 'oldest dir' directory appears first in the list. I've tried ls and tree variations to no avail.
For example,
for d in `ls -d -c -1 $PWD/*`
Returns:
/home/test/oldest
dir
/home/test/newer
dir
Very close, but it does not respect the space in the directory name. My question: how can I have oldest dir on top and support the whitespace?

ls -d -c "$PWD"/* | while IFS= read -r line
do echo "$line"
done

Another technique, kind of a Schwartzian transform:
stat -c $'%Z\t%n' /home/test/* | sort -n | cut -f2- |
while IFS= read -r filename; do
# ...
This solution is fragile with filenames containing newlines.
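A fuller sketch of that decorate, sort, undecorate pipeline, wrapped in a function. The name oldest_first is hypothetical, and it assumes GNU stat (for --printf). Note that %Z is the inode change time rather than a true creation time, and names containing newlines will still break it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption; requires GNU stat): list the
# entries of directory $1, oldest change time first, one path per line.
oldest_first() {
    # decorate each path with its ctime, sort numerically, then drop the key
    stat --printf='%Z\t%n\n' -- "$1"/* | sort -n | cut -f2-
}
```

Usage: oldest_first /home/test | while IFS= read -r d; do echo "$d"; done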
