Removing a prefix from files recursively over SSH - Linux

I have a load of folders of images (a lot!), and some of the thumbnails have a 'tn' prefix while others don't. In order to be able to write a gallery for all of them, I'm trying to remove the 'tn' from the beginning of the files that have it, recursively, across the entire directory.
So, an offending thumbnail set would have files like:
tngal001-001.jpg
tngal001-002.jpg
tngal001-003.jpg
etc...
and I need them to be :
gal001-001.jpg
gal001-002.jpg
gal001-003.jpg
Or better still, if I could get the whole tngal001- off, that'd be amazing. So, in the directory gallery I have:
gal001/thumbnails/tngal001-001.jpg
gal001/thumbnails/tngal001-002.jpg
gal001/thumbnails/tngal001-003.jpg
etc...
gal002/thumbnails/tngal002-001.jpg
gal002/thumbnails/tngal002-002.jpg
gal002/thumbnails/tngal002-003.jpg
etc...
gal003/thumbnails/tngal003-001.jpg
gal003/thumbnails/tngal003-002.jpg
gal003/thumbnails/tngal003-003.jpg
etc...
and I'd prefer to have:
gal001/thumbnails/001.jpg
gal001/thumbnails/002.jpg
gal001/thumbnails/003.jpg
etc...
gal002/thumbnails/001.jpg
gal002/thumbnails/002.jpg
gal002/thumbnails/003.jpg
etc...
gal003/thumbnails/001.jpg
gal003/thumbnails/002.jpg
gal003/thumbnails/003.jpg
etc...
I have tried find . -type f -name "tn*" -exec sh -c 'for f; do mv "$f" "{f#tn}"; done' find sh {} +
and find . -type f -exec sh -c 'for file in tn*; do mv "$file" "${file#tn}"; done' findsh {} +
but I'm not getting it quite right. I just want to understand how to strip the letters off and rename recursively, as I'm just getting my head around this stuff. All the other questions I have found seem to be about stripping characters out of file names, and all the ASCII characters and escaping of spaces etc. are confusing me. I would appreciate it if someone could explain it in plain(ish) English. I'm not stupid, but I am a newbie to Linux! I know it's all logical once I understand what's happening.
Thanks in advance, Kirsty

find . -type f -name "tn*" -exec sh -c '
    for f; do
        fname=${f##*/}
        mv -i -- "$f" "${f%/*}/${fname#tn*-}"
    done
' sh {} +
You need to split "$f" into the parent path and the filename before you remove the prefix from the filename. You also forgot the $ in your parameter expansion ("{f#tn}" should be "${f#tn}").
${f##*/} removes the longest prefix */ and leaves the filename, e.g.
gal001/thumbnails/tngal001-001.jpg -> tngal001-001.jpg
(the same result as basename "$f")
${f%/*} removes the shortest suffix /* and leaves the parent path, e.g.
gal001/thumbnails/tngal001-001.jpg -> gal001/thumbnails
(the same result as dirname "$f")
${fname#tn*-} removes the shortest prefix tn*- from the filename, e.g.
tngal001-001.jpg -> 001.jpg
I added the -i option to prompt to overwrite an already existing file.
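If you want to see each expansion in isolation, here is a quick check in an interactive shell (the sample path is just an example):

```shell
f='gal001/thumbnails/tngal001-001.jpg'

fname=${f##*/}      # longest */ prefix removed -> filename
dir=${f%/*}         # shortest /* suffix removed -> parent path
new=${fname#tn*-}   # shortest tn*- prefix removed -> bare number

echo "$fname"       # tngal001-001.jpg
echo "$dir"         # gal001/thumbnails
echo "$new"         # 001.jpg
```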

You can loop over all the folders and files in your gallery and then rename them as follows.
Assuming you have your folder structure as
gallery/
gallery/gal001
gallery/gal002
gallery/gal003
...
gallery/gal001/thumbnails/
gallery/gal002/thumbnails/
gallery/gal003/thumbnails/
...
gallery/gal001/thumbnails/tngal001-001.jpg
gallery/gal001/thumbnails/tngal001-002.jpg
gallery/gal001/thumbnails/tngal001-003.jpg
Move to your gallery using cd gallery, then run the following code:
for j in *; do
    cd "$j/thumbnails" || continue
    for i in *; do
        echo "Renaming $j/thumbnails/$i --> $(echo "$i" | sed "s/tn$j-//1")"
        mv -i "$i" "$(echo "$i" | sed "s/tn$j-//1")"
    done
    cd ../..
done
Explanation
for j in *;
loops over all the folders in gallery, i.e. j takes the values gal001, gal002, gal003, etc.
cd "$j/thumbnails"
moves inside the 'gal001/thumbnails' directory.
for i in *; do
loops over all the files in the directory gal001/thumbnails; the name of each file is held in i.
echo "Renaming $j/thumbnails/$i --> $(echo "$i" | sed "s/tn$j-//1")"
prints the file name and the name it is being renamed to. (Remove it if you don't want verbose output.)
mv -i "$i" "$(echo "$i" | sed "s/tn$j-//1")"
renames $i (the current file in the loop); the -i flag prompts if a file with the new name already exists.
sed is a stream editor; it receives the filename by piping $i into it.
"s/previous/new/1" replaces the first occurrence of previous with new in the stream. Here it replaces tn + $j (where $j is the directory name, e.g. gal001), i.e. tngal001-, with the empty string (nothing between //).
cd ../.. to move back to gallery.
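As a quick sanity check of that sed substitution on a single sample name (with j set by hand rather than by the loop):

```shell
j=gal001
i=tngal001-001.jpg
echo "$i" | sed "s/tn$j-//1"    # prints: 001.jpg
```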

Related

How to delete smallest file if names are duplicate

I would like to clean up a folder with videos. I have a bunch of videos that were downloaded with different resolutions, so each file will start with the same name and then end with "_480p" or "_720p" etc.
I just want to keep the largest file of each such set.
So I am looking for a way to delete files based on
check if name before "_" is identical
if true, then delete all files except largest one
For a flexible and fast approach, you can gather a list of files ending in "[[:digit:]]+p", provide the names on stdin to awk, and let awk index an array with the file prefix (path + the part of the name before '_'), which is unique per set, storing the resolution number at that index.
Then it's simply a matter of comparing the stored resolution number for the prefix against the current file's number and deleting the lesser of the two.
Your find command to locate all files in the directory below the current, recursively, could be:
find ./tmp -type f -regex "^.*[0-9]+p$"
What I would do is then pipe the filename output to a short awk script where an array stores the last-seen number for a given file prefix. If the current record's (line's) resolution number is bigger than the value stored in the array, a filename using the array number is created and that file is deleted with system() using rm filename. If the current line's resolution number is less than what is already stored in the array for the prefix, you simply delete the current file.
You can do that as:
#!/usr/bin/awk -f
BEGIN { FS = "/" }
{
    num = $NF                               # last field is the filename (ends in number + 'p')
    prefix = $0                             # prefix is name up to "_[[:digit:]]+p"
    sub (/^.*_/, "", num)                   # isolate number
    sub (/p$/, "", num)                     # remove 'p' at end
    sub (/_[[:digit:]]+p$/, "", prefix)     # isolate path and name prefix
    if (prefix in a) {                      # prefix already in array a[] ?
        rmfile = $0                         # set file to remove to current
        if (num + 0 > a[prefix] + 0) {      # current number > array number
            rmfile = prefix "_" a[prefix] "p"   # form remove filename from array number
            a[prefix] = num                 # update array with higher num
        }
        system ("rm " rmfile)               # delete the file
    }
    else
        a[prefix] = num                     # no num for prefix yet, store the first
}
(note: the field-separator splits the fields using the directory separator so you have all file components to work with.)
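The three sub() calls can be sanity-checked on a single sample path before trusting the script with real files (the path ./tmp/b_1080p is just an example):

```shell
echo './tmp/b_1080p' | awk -F'/' '{
    num = $NF; prefix = $0
    sub(/^.*_/, "", num)                 # num    -> "1080p"
    sub(/p$/, "", num)                   # num    -> "1080"
    sub(/_[[:digit:]]+p$/, "", prefix)   # prefix -> "./tmp/b"
    print num, prefix
}'
# prints: 1080 ./tmp/b
```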
Example Use/Output
With a representative set of files in a tmp/ directory below the current directory, e.g.:
$ ls -1 tmp
a_480p
a_720p
b_1080p
b_480p
c_1080p
c_720p
Running the find command piped to the awk script named awkparse.sh would be as follows (don't forget to make the awk script executable):
$ find ./tmp -type f -regex "^.*[0-9]+p$" | ./awkparse.sh
Looking at the directory after piping the results of find to the awk script, the tmp/ directory now only contains the highest resolution (largest) files for any given filename, e.g.
$ ls -1
a_720p
b_1080p
c_1080p
This would be highly efficient. It could also handle all files in a nested directory structure where multiple directory levels hold files you need to clean out. Look things over and let me know if you have questions.
This shell script might be what you want:
previous_prefix=
for file in *_[0-9]*[0-9]p*; do
    prefix=${file%_*}
    resolution=${file##*_}
    resolution=${resolution%%p*}
    if [ "$prefix" = "$previous_prefix" ]; then
        if [ "$resolution" -gt "$greater_resolution" ]; then
            file_to_be_removed=$greater_file
            greater_file=$file
            greater_resolution=$resolution
        else
            file_to_be_removed=$file
        fi
        echo rm -- "$file_to_be_removed"
    else
        greater_resolution=$resolution
        greater_file=$file
        previous_prefix=$prefix
    fi
done
Drop the echo if the output looks good.
I would try to:
list all non-smallest files (non-480p): *_720p* and *_1080p*
for each of them replace *_720p*/*_1080p* in the name with all possible smaller resolutions
and try to delete those files with rm -f, whether they exist or not
#!/bin/bash -e
shopt -s nullglob   # shopt is a Bash builtin, so this needs bash, not sh
for file in *_1080p*; do
    rm -f -- "${file//_1080p/_720p}"
    rm -f -- "${file//_1080p/_480p}"
done
for file in *_720p*; do
    rm -f -- "${file//_720p/_480p}"
done
And here is a Bash script using nested loops to automate the above:
#!/bin/bash -e
shopt -s nullglob
res=(_1080p _720p _480p _240p)
for r in "${res[@]}"; do
    res=("${res[@]:1}")   # remove the first element of the res array
    for file in *"$r"*; do
        for r2 in "${res[@]}"; do
            rm -f -- "${file//$r/$r2}"
        done
    done
done

How do I use perl-rename to replace . with _ on linux recursively, except for extensions

I am trying to rename some files and folders recursively to format the names, and figured find and perl-rename might be the tools for it. I've managed to find most of the commands I want to run, but for the last two:
I would like for every . in a directory name to be replaced by _ and
for every . but the last in a file name to be replaced with _
So that ./my.directory/my.file.extension becomes ./my_directory/my_file.extension.
For the second task, I don't even have a command.
For the first task, I have the following command :
find . -type d -depth -exec perl-rename -n "s/([^^])\./_/g" {} +
Which renames ./the_expanse/Season 1/The.Expanse.S01E01.1080p.WEB-DL.DD5.1.H264-RARBG to ./the_expanse/Season 1/Th_Expans_S01E0_1080_WEB-D_DD__H264-RARBG, so it doesn't work: the character before each . is eaten along with it.
If I instead type:
find . -type d -depth -exec perl-rename -n "s/\./_/g" {} +
then ./the_expanse/Season 1/The.Expanse.S01E01.1080p.WEB-DL.DD5.1.H264-RARBG is renamed to _/the_expanse/Season 1/The_Expanse_S01E01_1080p_WEB-DL_DD5_1_H264-RARBG, which doesn't work either, because the leading . of the current directory is also replaced by _.
If someone could give me a solution to:
replace every . in a directory name by _ and
replace every . but the last in a file name with _
I'd be very grateful.
First, tackling the . in directory names:
# find all directories and remove the './' part of each and save to a file
$ find -type d | perl -lpe 's#^(\./|\.)##g' > list-all-dir
#
# dry run
# just print the result without actual renaming
$ perl -lne '($old=$_) && s/\./_/g && print' list-all-dir
#
# if it looked fine, rename them
$ perl -lne '($old=$_) && s/\./_/g && rename($old,$_)' list-all-dir
This part s/\./_/g is for matching every . and replacing it with _
Second, tackling the files: renaming every . except the one before the file extension.
# find all *.txt file and save or your match
$ find -type f -name \*.txt | perl -lpe 's#^(\./|\.)##g' > list-all-file
#
# dry run
$ perl -lne '($old=$_) && s/(?:(?!\.txt$)\.)+/_/g && print ' list-all-file
#
# if it looked fine, rename them
$ perl -lne '($old=$_) && s/(?:(?!\.txt$)\.)+/_/g && rename($old,$_) ' list-all-file
This part (?:(?!\.txt$)\.)+ is for matching every . except the last . before the file extension.
NOTE
Here I used .txt; replace it with whatever matches your files. The second snippet will rename input like this:
/one.one/one.one/one.file.txt
/two.two/two.two/one.file.txt
/three.three/three.three/one.file.txt
to such an output:
/one_one/one_one/one_file.txt
/two_two/two_two/one_file.txt
/three_three/three_three/one_file.txt
and you can test the regex with an online regex tester.

Create CSV file using file name and file contents in Linux

I have a folder with over 400K txt files.
With names like
deID.RESUL_12433287659.txt_234323456.txt
deID.RESUL_34534563649.txt_345353567.txt
deID.RESUL_44235345636.txt_537967875.txt
deID.RESUL_35234663456.txt_423452545.txt
Each file has different content
I want to grab file name and file content and put in CSV.
Something like:
file_name,file_content
deID.RESUL_12433287659.txt_234323456.txt,Content 1
deID.RESUL_34534563649.txt_345353567.txt,Content 2
deID.RESUL_44235345636.txt_537967875.txt,Content 3
deID.RESUL_35234663456.txt_423452545.txt,Content 4
I know how to grab all the files in a directory in CSV using:
find * > files.csv
How can I also grab the contents of the file?
find * is somewhat strange: find already scans recursively, so find . is enough to cover everything find * does (and, shell glob rules aside, it also includes hidden entries at the top level, which * skips).
We would need to iterate over the files. Also it would be nice to remove newlines.
# create file for a MCVE
while IFS=' ' read -r file content; do echo "$content" > "$file"; done <<EOF
deID.RESUL_12433287659.txt_234323456.txt Content 1
deID.RESUL_34534563649.txt_345353567.txt Content 2
deID.RESUL_44235345636.txt_537967875.txt Content 3
deID.RESUL_35234663456.txt_423452545.txt Content 4
EOF
{
    # I'm using `|` as the separator for columns
    # output header names
    echo 'file_name|file_content'
    # this is the heart of the script:
    # find the files
    # for each file execute `sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- <filename>`
    # printf - nice printing
    # "$(cat "$1")" - gets the file content and also removes trailing empty newlines. Neat.
    find . -type f -name 'deID.*' -exec sh -c 'printf "%s|%s\n" "$1" "$(cat "$1")"' -- {} \;
} |
    # nice formatting:
    column -t -s'|' -o ' '
will output:
file_name file_content
./deID.RESUL_44235345636.txt_537967875.txt Content 3
./deID.RESUL_35234663456.txt_423452545.txt Content 4
./deID.RESUL_34534563649.txt_345353567.txt Content 2
./deID.RESUL_12433287659.txt_234323456.txt Content 1

Replacing large string of text in Linux

I have several thousand WordPress files that were injected with a string such as the following:
I know I can do a replace with something like this:
find . -type f -exec sed -i 's/foo/bar/g' {} +
But I am having a problem getting the large string to be taken correctly. All the " and ' characters cause the string to break out of quoting on my command line.
Below is a sample string:
<?php if(!isset($GLOBALS["\x61\156\x75\156\x61"])) { $ua=strtolower($_SERVER["\x48\124\x54\120\x5f\125\x53\105\x52\137\x41\107\x45\1162]y4c##j0#67y]37]88y]27]28y]#%x5c%x782fr%x5c%x7825%x5%x7825s:*<%x5c%x7825j:,,Bjg!)%x5c%x7825j:>>1*!%x5c%x7825b:>1pde>u%x5c%x7825V<#65,47R25,d7R17,67R37,#%x5c%x7827!hmg%x5c%x7825!)!gj!<2,*j%x5c%x7825!-#1]#-bubE{h%x5c%x8984:71]K9]77]D4]82]K6]72]K9]78]K5]53]Kc#<%x5cujojRk3%x5c%x7860{666~7878Bsfuvso!sboepn)%x5c%x7825epnbss-x7827{ftmfV%x5c%x787f<*X&Z&S{ftmfV%x5c%x787f<*XAZASV<*w%x5c%x7825)p5c%x782f#00;quui#>.%x5c%x7825!<***f%x5c%x7827,111127-K)ebfsX%x5c%x7827u%x5c%x7825)7fmji%x5c%x7x7825)323ldfidk!~!<**qp%x5c%x7825!-uyfu%x5c%x7825)3of)fepdof%x5c%xp!*#opo#>>}R;msv}.;%x5c%x782f#%x5c%x782f#%x5c%x782f},;#-#}+;%x5c%x7%x78257-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x78257;uc%x7825Z<#opo#>b%x5c%x7<!fmtf!%x5c%x7825b:>%x5c%x7825s:%x5c%x70QUUI7jsv%x5c%x78257UFH#%x5c%x7827rfs%x5c%x78256~6<%x!Ydrr)%x5c%x7825r%x5c%x%x5c%x7825%x5c%x7827Y%x5c%x78256<.msv%x5cq%x5c%x7825%x5c%x785cSFWSFT%x5c%x7860%x5c%x7825}X;!s%x5c%x782fq%x5c%x7825>U<#16,47R57,27R66,#%x5c%x782fq%x560msvd}+;!>!}%x5c%x7827;!>tpI#7>%x5c%x782f7rfs%x5c%x78256<#o]1%x5c%x782f2e:4e, $rzgpabhkfk, NULL); $qenzappyva=$rzgpabhkfk; $qenzappyva=(798-677); $rlapmcvoxs=$qenzappyva-1; ?>
EXAMPLE of what I tried:
perl -pi -e 's/<?php if(!isset($GLOBALS["\x61\156\x75\156\x61"])) { $ua=strtolower($_SERVER["\x48\124\x54\120\x5f\125\x53\105\x52\137\x41\107\x45\116\x54"]); if ((! strstr($ua,"\x6d\163\x69\145")) and (! strstr($ua,"\x72\166\x3a\61\x31"))) $GLOBALS["\x61\156\x75\156\x61"]=1; } ?><?php $rlapmcvfunction fjfgg($n){%x7825_t%x5c%x7825:osvufs:~:<*9-1-r%x5c%x7825)s%x5c%x7825>%x5c%x782c%x7824*!|!%x5c%x7824-...x2a\57\x20"; $qenzappyva=substr($rlapmcvoxs,(48535-38422),(59-47)); $qenzappyva($rrzeotjace, $rzgpabhkfk, NULL); $qenzappyva=$rzgpabhkfk; $qenzappyva=(798-677); $rlapmcvoxs=$qenzappyva-1; ?>//g' /home/......../content-grid.php
-bash: !: event not found
If the match is identical and on a line of its own, you can use comm:
comm -23 source subtract
where subtract is the file with the lines to be removed from the source file; note that comm requires both inputs to be sorted. It's not an in-place replacement, so you have to write to a temp file and overwrite the source after making sure it does what you need.
If you don't care about the extra newline, the simple approach using sed would be:
find . -type f -exec sed -i 's/.*\\x61\\156\\x75\\156\\x61.*$//g' {} +
sed can also handle the newline, but that is a little more complex.
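One option for also eating the newline is Perl's slurp mode. This is a sketch against a throwaway sample file; sample.php and its contents are invented for the demo, with a shortened stand-in for the real injected string:

```shell
# create a sample file: one injected line between two clean lines
printf 'clean line 1\ninjected \\x61\\156\\x75\\156\\x61 payload\nclean line 2\n' > sample.php

# -0777 slurps the whole file so the matched line's trailing \n is removed too;
# /m makes ^ anchor at line starts inside the slurped text
perl -0777 -pi -e 's/^.*\\x61\\156\\x75\\156\\x61.*\n//mg' sample.php

cat sample.php    # only the two clean lines remain
```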

Batch remove substring from filename with special characters in BASH

I have a list of files in my directory:
opencv_calib3d.so2410.so
opencv_contrib.so2410.so
opencv_core.so2410.so
opencv_features2d.so2410.so
opencv_flann.so2410.so
opencv_highgui.so2410.so
opencv_imgproc.so2410.so
opencv_legacy.so2410.so
opencv_ml.so2410.so
opencv_objdetect.so2410.so
opencv_ocl.so2410.so
opencv_photo.so2410.so
They're the product of a series of mistakes made with batch renames, and now I can't figure out how to remove the middle ".so" from each of them. For example:
opencv_ocl.so2410.so should be opencv_ocl2410.so
This is what I've tried:
# attempt 1, replace the (first) occurrence of `.so` from the filename
for f in opencv_*; do mv "$f" "${f#.so}"; done
# attempt 2, escape the dot
for f in opencv_*; do mv "$f" "${f#\.so}"; done
# attempt 3, try to make the substring a string
for f in opencv_*; do mv "$f" "${f#'.so'}"; done
# attempt 4, combine 2 and 3
for f in opencv_*; do mv "$f" "${f#'\.so'}"; done
But all of those have no effect, producing the error messages:
mv: ‘opencv_calib3d.so2410.so’ and ‘opencv_calib3d.so2410.so’ are the same file
mv: ‘opencv_contrib.so2410.so’ and ‘opencv_contrib.so2410.so’ are the same file
mv: ‘opencv_core.so2410.so’ and ‘opencv_core.so2410.so’ are the same file
...
Try this in your mv command:
mv "$f" "${f/.so/}"
The first match of .so is replaced by the empty string. (Your attempts used ${f#pattern}, which removes a leading prefix only; since the names don't start with .so, nothing was removed, and mv was given identical source and destination names.)
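To see why the prefix-removal attempts were no-ops while the substitution works, compare the two expansions on one of the names:

```shell
f='opencv_calib3d.so2410.so'
echo "${f#.so}"    # prefix removal: name doesn't start with .so, so unchanged
echo "${f/.so/}"   # first-match substitution: opencv_calib3d2410.so
```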
a='opencv_calib3d.so2410.so'
echo "${a%%.so*}${a#*.so}"
opencv_calib3d2410.so
Where:
${a%%.so*} - the part before the first .so
${a#*.so} - the part after the first .so
