Alternative for AWK use - linux

I'd love to have a more elegant solution for a mass rename of files, as shown below. Files were of format DEV_XYZ_TIMESTAMP.dat and we needed them as T-XYZ-TIMESTAMP.dat.
In the end, I copied them all (to be on the safe side) into a renamed folder:
ls -l *dat|awk '{system("cp " $10 " renamed/T-" substr($10, index($10, "_")+1))}'
So, first I listed all the .dat files, then picked the 10th column (the file name) and executed a command using awk's system function.
The command essentially copied the original file into the renamed folder under a new name.
The new name was created by stripping everything up to and including the first _ (awk's substr and index functions) and prepending "T-".
Effectively:
cp DEV_file.dat renamed/T-file.dat
Is there a way to use cp or mv together with some regex rules to achieve the same thing a bit more elegantly?
Thx

You may use this script:
for file in *.dat; do
    f="${file//_/-}"
    mv "$file" renamed/T-"${f#*-}"
done
You should avoid parsing the output of the ls command.
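If you want to preview the result first, a minimal dry-run sketch (same logic, assuming the renamed/ directory from the question) just prints each command:

mkdir -p renamed
for file in *.dat; do
    f="${file//_/-}"
    echo mv "$file" renamed/T-"${f#*-}"    # drop the echo once the output looks right
done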

If you have the rename utility:
rename -E "s/[^_]*/T/" -e "s/_/-/g" *dat
Demo
$ ls -1
ABC_DEF_TIMESTAMP.dat
DEV_XYZ_TIMESTAMP.dat
$ rename -E "s/[^_]*/T/" -e "s/_/-/g" *
$ ls -1
T-DEF-TIMESTAMP.dat
T-XYZ-TIMESTAMP.dat
$
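Most Perl-based rename variants also accept -n for a dry run, printing the renames without performing them; assuming the same files:

$ rename -n -E "s/[^_]*/T/" -e "s/_/-/g" *dat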

This is how I would do it:
cpdir=renamed
for file in *dat; do
    newfile=$(echo "$file" | sed -e "s/[^_]*/T/" -e "y/_/-/")
    cp "$file" "$cpdir/$newfile"
done
The sed script transforms the leading run of non-underscore characters into a single T and then replaces every _ with -. If cpdir is not guaranteed to exist before execution, you can simply add mkdir "$cpdir" after the first line.
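You can see the transformation on its own by running the same sed expressions over a sample name from the question:

$ echo DEV_XYZ_TIMESTAMP.dat | sed -e "s/[^_]*/T/" -e "y/_/-/"
T-XYZ-TIMESTAMP.dat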

Related

Remove part of filename with common delimiter

I have a number of files with the following naming:
name1.name2.s01.ep01.RANDOMWORD.mp4
name1.name2.s01.ep02.RANDOMWORD.mp4
name1.name2.s01.ep03.RANDOMWORD.mp4
I need to remove everything between the last . and ep# from the file names and only have name1.name2.s01.ep01.mp4 (sometimes the extension can be different)
name1.name2.s01.ep01.mp4
name1.name2.s01.ep02.mp4
name1.name2.s01.ep03.mp4
This is a simpler version of @Jesse's answer:
for file in /path/to/base_folder/*    # globbing to get the files
do
    epno=${file#*.ep}
    mv "$file" "${file%.ep*}."ep${epno%%.*}".${file##*.}"
    # for the renaming part, see the note below
done
Note: Not yet comfortable with shell parameter expansion? Check [this].
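As a quick illustration of the expansions used above, applied to one of the sample names:

file=name1.name2.s01.ep01.RANDOMWORD.mp4
echo "${file#*.ep}"    # 01.RANDOMWORD.mp4  (strip shortest prefix matching *.ep)
echo "${file%.ep*}"    # name1.name2.s01    (strip shortest suffix matching .ep*)
epno=${file#*.ep}
echo "${epno%%.*}"     # 01                 (strip longest suffix matching .*)
echo "${file##*.}"     # mp4                (strip longest prefix matching *.)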
Using shell string manipulation (see http://www.tldp.org/LDP/abs/html/string-manipulation.html) you could achieve it like so.
You need to do this once per file-extension type.
for file in <directory>/*
do
    name=${file}
    firstchar="${name:0:1}"
    extension=${name##${firstchar}*.}
    lastchar=$(echo ${name} | tail -c 2)
    strip1=${name%.*$lastchar}
    lastchar=$(echo ${strip1} | tail -c 2)
    strip2=${strip1%.*$lastchar}
    mv "$name" "${strip2}.${extension}"
done
You can use rename (you may need to install it); it works like sed on filenames.
As an example:
$ for i in `seq 3`; do touch "name1.name2.s01.ep0$i.RANDOMWORD.txt"; done
$ ls -1
name1.name2.s01.ep01.RANDOMWORD.txt
name1.name2.s01.ep02.RANDOMWORD.txt
name1.name2.s01.ep03.RANDOMWORD.txt
$ rename 's/(name1.name2.s01.ep\d{2})\..*(.txt)$/$1$2/' name1.name2.s01.ep0*
$ ls -1
name1.name2.s01.ep01.txt
name1.name2.s01.ep02.txt
name1.name2.s01.ep03.txt
This expression matches your filenames, using two capture groups so that $1$2 in the replacement keeps the parts outside the RANDOMWORD:
(name1.name2.s01.ep\d{2})\..*(.txt)$

Linux: rename files in bulk using a bash script or a command-line one-liner

I have a list of, for example, 100 files with the naming convention:
<date>_<Time>_XYZ.xml.abc
<date>_<Time>_XYZ.xml
<date>_<Time>_XYZ.csv
for example
20140730_025373_XYZ.xml
20140730_015233_XYZ.xml.ab
20140730_015233_XYZ.csv
Now I want to write a script that removes anything between the two underscores. For example, in the above case:
remove 015233 and change 20140730_015233_XYZ.xml.ab to 20140730_XYZ.xml.ab
remove 015233 and change 20140730_015233_XYZ.csv to 20140730_XYZ.csv
I have tried a number of options using rename, cut, and mv, but I am getting varied results, not the ones I expect.
You could use the rename command if you want to rename the files inside the current directory:
rename 's/^([^_]*)_[^_]*(_.*)$/$1$2/g' *
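You can verify the substitution on a single name first with sed (same regex, extended syntax):

$ echo 20140730_015233_XYZ.xml.ab | sed -E 's/^([^_]*)_[^_]*(_.*)$/\1\2/'
20140730_XYZ.xml.ab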
You can use sed to compute the new names from a list (note that this only prints the transformed names; it does not rename anything by itself):
sed 's/\([^_]*\)_.*_\(.*\)/\1_\2/' files.list
You can also use the cut command to keep only the first and third _-delimited fields:
cut -d'_' -f1,3 filename
for FILE in *; do mv "$FILE" "${FILE/_*_/_}"; done
A more specific version is:
for FILE in *.xml *.xml.ab *.csv; do mv "$FILE" "${FILE/_*_/_}"; done
Further:
for FILE in *_*_*.xml *_*_*.xml.ab *_*_*.csv; do mv "$FILE" "${FILE/_*_/_}"; done
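The ${FILE/_*_/_} expansion replaces the (greedy) match of _*_ with a single underscore; for example:

FILE=20140730_015233_XYZ.xml.ab
echo "${FILE/_*_/_}"    # 20140730_XYZ.xml.ab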

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis so I can't just remove the duplicates. The OS is Linux (centos and ubuntu).
I enumerate the file names and locations and put them into a text file.
Then I do a for statement to move the files to quarantine.
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
    mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
I'd use this:
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
You'll get every file with a timestamp appended (in nanoseconds).
You can create a new file name composed of the directory and the filename. Thus you can add one more argument in your original code:
for ...; do mv "$file" /quarantine/$(echo "$file" | sed 's:/:_:g') ; done
Note that you may want to replace the _ with a character distinctive enough not to clash with existing names.
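For example, with the paths from the demo above, the sed call flattens each path into a unique, readable name:

$ echo ./c/foo | sed 's:/:_:g'
._c_foo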

Removing 10 Characters of Filename in Linux

I just downloaded about 600 files from my server and need to remove the last 11 characters from the filename (not including the extension). I use Ubuntu and I am searching for a command to achieve this.
Some examples are as follows:
aarondyne_kh2_13thstruggle_or_1250556383.mus should be renamed to aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 should be renamed to aarondyne_kh2_darknessofunknow.mp3
It seems that some duplicates might exist after I do this, but if the command fails to complete and tells me what the duplicates would be, I can always remove those manually.
Try using the rename command. It allows you to rename files based on a regular expression:
The following line should work out for you:
rename 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
The following changes will occur:
aarondyne_kh2_13thstruggle_or_1250556383.mus renamed as aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 renamed as aarondyne_kh2_darknessofunknow.mp3
You can preview the actions rename would take by specifying the -n flag, like this:
rename -n 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
For more information on how to use rename, simply open the manpage via: man rename
Not the prettiest, but very simple:
echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!'
You'll still need to write the rest of the script, but it's pretty simple.
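A minimal wrapper loop might look like this (a sketch; it assumes the files sit in the current directory and skips names the pattern doesn't match):

for filename in *.mus *.mp3; do
    newname=$(echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!')
    [ "$filename" != "$newname" ] && mv "$filename" "$newname"
done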
find . -type f -exec sh -c 'mv {} `echo -n {} | sed -E -e "s/[^/]{11}(\\.[^\\.]+)?$/\\1/"`' ";"
One way to go: get a list of your files, one per line (from ls, maybe), then:
ls....|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
This will print the mv a b commands, e.g.:
kent$ echo "aarondyne_kh2_13thstruggle_or_1250556383.mus"|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
mv aarondyne_kh2_13thstruggle_or_1250556383.mus aarondyne_kh2_13thstruggle_or.mus
To execute, just pipe it to sh.
I assume there are no spaces in your filenames.
This script assumes each file has just one extension. It would, for instance, rename "foo.something.mus" to "foo.mus". To keep all extensions, remove one hash mark (#) from the first line of the loop body. It also assumes that the base of each filename has at least 12 characters, so that removing 11 doesn't leave you with an empty name.
for f in *; do
    ext=${f##*.}
    new_f=${f%???????????.$ext}
    [ "$new_f" = "$f" ] && continue    # name too short for the pattern; skip
    new_f=$new_f.$ext
    if [ -f "$new_f" ]; then
        echo "Will not rename $f, $new_f already exists" >&2
    else
        mv "$f" "$new_f"
    fi
done

Unix: How to delete files listed in a file

I have a long text file with a list of file masks I want to delete.
Example:
/tmp/aaa.jpg
/var/www1/*
/var/www/qwerty.php
I need to delete them. I tried rm `cat 1.txt` and it says the list is too long.
I found this command, but when I check folders from the list, some of them still have files:
xargs rm <1.txt
A manual rm call removes files from those folders, so it is not a permissions issue.
This is not very efficient, but it will work if you need glob patterns (as in /var/www1/*):
for f in $(cat 1.txt) ; do
    rm "$f"
done
If you don't have any patterns and are sure your paths in the file do not contain whitespaces or other weird things, you can use xargs like so:
xargs rm < 1.txt
Assuming that the list of files is in the file 1.txt, then do:
xargs rm -r <1.txt
The -r option causes recursion into any directories named in 1.txt.
If any files are read-only, use the -f option to force the deletion:
xargs rm -rf <1.txt
Be cautious with input to any tool that does programmatic deletions. Make certain that the files named in the input file are really to be deleted. Be especially careful about seemingly simple typos. For example, if you enter a space between a file and its suffix, it will appear to be two separate file names:
file .txt
is actually two separate files: file and .txt.
This may not seem so dangerous, but if the typo is something like this:
myoldfiles *
Then instead of deleting all files that begin with myoldfiles, you'll end up deleting myoldfiles and all non-dot-files and directories in the current directory. Probably not what you wanted.
Use this:
while IFS= read -r file ; do rm -- "$file" ; done < delete.list
If you need glob expansion you can omit quoting $file:
IFS=""
while read -r file ; do rm -- $file ; done < delete.list
But be warned that file names can contain "problematic" content, which is why I would avoid the unquoted version. Imagine this pattern in the file:
*
*/*
*/*/*
This would delete quite a lot from the current directory! I would encourage you to prepare the delete list in a way that glob patterns aren't required anymore, and then use quoting as in my first example.
You can use '\n' to make the newline character the delimiter:
xargs -d '\n' rm < 1.txt
Be careful with -rf because it can delete things you don't want if 1.txt contains paths with spaces. That's why the newline delimiter is a bit safer.
On BSD systems, xargs has no -d option, but you can translate the newlines to NUL characters and use the -0 option like this:
tr '\n' '\0' < 1.txt | xargs -0 rm
xargs -I{} sh -c 'rm "{}"' < 1.txt should do what you want. Be careful with this command as one incorrect entry in that file could cause a lot of trouble.
This answer was edited after @tdavies pointed out that the original did not do shell expansion.
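A somewhat safer variant (my sketch, not from the original answer) passes each name to the shell as a positional argument instead of splicing it into the command text:

xargs -I{} sh -c 'rm -- "$1"' _ {} < 1.txt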
You can use this one-liner:
cat 1.txt | xargs echo rm | sh
Which does shell expansion but executes rm the minimum number of times.
Just to provide another way, you can also simply use the following command:
$ cat to_remove
/tmp/file1
/tmp/file2
/tmp/file3
$ rm $( cat to_remove )
In this particular case, due to the dangers cited in other answers, I would
Edit in e.g. Vim and :%s/\s/\\\0/g, escaping all space characters with a backslash.
Then :%s/^/rm -rf /, prepending the command. With -r you don't have to worry about directories being listed after the files they contain, and with -f it won't complain about missing files or duplicate entries.
Run all the commands: $ source 1.txt
cat 1.txt | xargs rm -f removes the listed files only.
cat 1.txt | xargs rm -rf removes them recursively, directories included. (xargs runs rm directly, so there is no need to pipe the result to bash.)
Here's another looping example. This one also contains an if-statement, as an example of checking whether the entry is a file (as opposed to, say, a directory):
for f in $(cat 1.txt); do if [ -f "$f" ]; then rm "$f"; fi; done
Here you can use a set of folders from deletelist.txt while excluding some patterns as well (csh syntax):
foreach f (`cat deletelist.txt`)
    rm -rf `ls $f | egrep -v "needthisfile|\.cpp$|\.h$"`
end
This will allow file names to have spaces (a reproducible example):
# Select files of interest; here, only text files for example.
find -type f -exec file {} \; > findresult.txt
grep ": ASCII text$" findresult.txt > textfiles.txt
# Leave only the path to the file, removing suffix and prefix.
sed -i -e 's/:.*$//' textfiles.txt
sed -i -e 's/\.\///' textfiles.txt
# Write a script that deletes the files in textfiles.txt.
IFS_backup=$IFS
IFS=$'\n'
for f in $(cat textfiles.txt); do
    rm "$f"
done
IFS=$IFS_backup
# Save the script as "some.sh" and run: sh some.sh
In case somebody prefers sed and removing without wildcard expansion:
sed "s/^\(.*\)$/rm -f -- '\1'/" deletelist.txt | /bin/sh
Reminder: use absolute pathnames in the file or make sure you are in the right directory.
And for completeness, the same with awk:
awk '{printf "rm -f -- '\''%s'\''\n",$0}' deletelist.txt | /bin/sh
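You can preview the generated commands by dropping the | /bin/sh; for a name with spaces:

$ printf '%s\n' 'my file.txt' | awk '{printf "rm -f -- '\''%s'\''\n",$0}'
rm -f -- 'my file.txt'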
Wildcard expansion will work if the single quotes are removed, but this is dangerous if a filename contains spaces; you would then need to quote everything except the wildcards.
