Ubuntu - Remove dir and ignore filetypes - linux

I'm trying to create a cronjob for Ubuntu where:
all empty directories are removed
a non-empty directory is removed only if the files it contains are all txt or csv files
Currently I have:
find /path -depth -type d -exec rmdir {} \; 2>/dev/null
What do I need to delete the folders which only have txt or csv files?
I don't want to delete all txt or csv files, just those folders which do not contain other filetypes.
Additional example:
Dir1
SubDir1
SubSubDir1
File.txt
File.csv
SubDir2
SubSubDir2
File.xml
SubSubDir1 should be deleted. Since SubDir1 and Dir1 are now empty, they should be deleted as well.
SubSubDir2 contains another filetype and should not be deleted.

You could count the entries of a folder that are neither csv nor txt files with something like:
find "$d" -maxdepth 1 -not -iname '*.csv' -a -not -iname '*.txt' | wc -l
If the folder is empty or contains exclusively txt and csv files, this prints 1, because the folder itself is then the only match.
To list folders children-first, so that deleting a parent never interferes with the handling of its subfolders:
find /path -depth -type d
All in all, you may be able to achieve what you want with:
while IFS= read -r d
do
    # A count of 1 means only the directory itself matched
    if [ "$(find "$d" -maxdepth 1 -not -iname '*.csv' -a -not -iname '*.txt' | wc -l)" -eq 1 ]
    then
        rm -rf -- "$d"
    fi
done < <(find /path -depth -type d)
But I also advocate a check somewhere so your cron doesn’t wipe your storage without your consent.
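One way to add such a check is a dry-run mode that is on by default, plus a refusal to run on an empty path or on /. The following is only a sketch: prune_text_only_dirs and DRY_RUN are hypothetical names, it assumes a find that supports -mindepth and -maxdepth (GNU find on Ubuntu does), and it deliberately never removes the starting directory itself.

```shell
# Sketch only: prune_text_only_dirs and DRY_RUN are hypothetical names.
prune_text_only_dirs() (
    root=$1
    # Refuse obviously dangerous or invalid targets.
    case $root in
        ''|/) echo "refusing to prune '$root'" >&2; exit 1 ;;
    esac
    [ -d "$root" ] || { echo "not a directory: $root" >&2; exit 1; }

    # -depth lists children before parents; -mindepth 1 spares $root itself.
    find "$root" -mindepth 1 -depth -type d | while IFS= read -r d; do
        # Count entries in $d that are anything other than a regular .txt/.csv file.
        others=$(find "$d" -mindepth 1 -maxdepth 1 \
            \( ! -type f -o \( ! -iname '*.txt' ! -iname '*.csv' \) \) | wc -l)
        [ "$others" -eq 0 ] || continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would remove: $d"
        else
            rm -rf -- "$d"
        fi
    done
)
```

Calling prune_text_only_dirs /path only prints candidates; setting DRY_RUN=0 first performs the deletion. In dry-run mode, parents that would become empty only after their children are removed are not listed, because nothing has actually been deleted yet.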

Related

Moving files with a pattern in their name to a folder with the same pattern as its name

My directory contains mix of hundreds of files and directories similar to this:
508471/
ae_lstm__ts_508471_detected_anomalies.pdf
ae_lstm__508471_prediction_result.pdf
mlp_508471_prediction_result.pdf
mlp__ts_508471_detected_anomalies.pdf
vanilla_lstm_508471_prediction_result.pdf
vanilla_lstm_ts_508471_detected_anomalies.pdf
598690/
ae_lstm__ts_598690_detected_anomalies.pdf
ae_lstm__598690_prediction_result.pdf
mlp_598690_prediction_result.pdf
mlp__ts_598690_detected_anomalies.pdf
vanilla_lstm_598690_prediction_result.pdf
vanilla_lstm_ts_598690_detected_anomalies.pdf
There are folders with an ID number as their names, like 508471 and 598690.
In the same path as these folders, there are pdf files that have this ID number as part of their name. I need to move all the pdf files with the same ID in their name, to their related directories.
I tried the following shell script but it doesn't do anything. What am I doing wrong?
I'm trying to loop over all the directories, find the files that have id in their name, and move them to the same dir:
for f in $(ls -d */); do
id=${f%?} # f value is '598690/', I'm removing the last character, `/`, to get only the id part
find . -maxdepth 1 -type f -iname *.pdf -exec grep $id {} \; -exec mv -i {} $f \;
done
#!/bin/sh
find . -mindepth 1 -maxdepth 1 -type d -exec sh -c '
    for d in "$@"; do
        id=${d#./}
        for file in *"$id"*.pdf; do
            [ -f "$file" ] && mv -- "$file" "$d"
        done
    done
' findshell {} +
This finds every directory inside the current one (finding, for example, ./598690). Then, it removes ./ from the relative path and selects each file that contains the resulting id (598690), moving it to the corresponding directory.
If you are unsure of what this will do, put an echo between && and mv; the script will then only list the mv actions it would perform.
And remember, do not parse ls.
The code below should do the required job.
for dir in */; do find . -mindepth 1 -maxdepth 1 -type f -name "*${dir%/}*.pdf" -exec mv {} "$dir" \; ; done
Here */ matches only the directories in the current directory, find searches only the files in the current directory whose names match *${dir%/}*.pdf (that is, file names containing the directory name as a substring), and finally mv moves the matching files into the directory.
On Unix, you can use the command below (it only echoes the mv commands it would run):
find . -name '*508471*' -exec bash -c 'echo mv $0 ${0/508471/598690}' {} \;
You may use this for loop from the parent directory of these pdf files and directories:
for d in */; do
compgen -G "*${d%/}*.pdf" >/dev/null && mv *"${d%/}"*.pdf "$d"
done
compgen -G is used to check if there is a match for given glob or not.
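compgen is a bash builtin, so the check above is not available in plain sh. A portable way to test whether a glob matched anything is to expand it into the positional parameters and check that the first one exists. The sketch below wraps this in a hypothetical helper, move_pdfs_by_id, that takes the parent directory as its argument; the name and the argument are assumptions for illustration.

```shell
# Sketch only: move_pdfs_by_id is a hypothetical helper name.
move_pdfs_by_id() (
    cd "$1" || exit 1
    for d in */; do
        [ -d "$d" ] || continue      # the glob stays literal when there are no directories
        id=${d%/}
        # Expand the glob into "$@"; if nothing matched, "$1" is the
        # unexpanded pattern, which does not exist as a file.
        set -- *"$id"*.pdf
        [ -e "$1" ] || continue
        mv -- "$@" "$d"
    done
)
```

The function body runs in a subshell, so the cd and the changed positional parameters do not leak into the calling shell.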

Shell script to loop and delete

Could someone help me with this? I have the folder structure shown below. I want to loop through every folder inside backuptest and delete all the folders except the one named with today's date. I want it to run as a cron job.
Use find for this:
today="$(date +%Y-%m-%d)"
find /path/to/backuptest/Server* -mindepth 1 -maxdepth 1 -type d -not -name "$today" -exec rm -R {} \;
Edit
To not delete directories other than those containing a date structure, use something like
find /path/to/backuptest/Server* -mindepth 1 -maxdepth 1 -type d -regex ".*2016-[0-1]*[0-9]-[0-3][0-9]$" -not -name "$today"
You can get today's date in whatever format you require via the date command. For example,
TODAY=$(date +%Y-%m-%d)
You can loop over the subfolders you want with a simple wildcard match:
for d in /path/to/backuptest/*/*; do
# ...
done
You can strip the directory portion from a file name with the basename command:
name=$(basename path/to/file)
You can glue that together something like this:
#!/bin/bash
TODAY=$(date +%Y-%m-%d)
for d in /path/to/backuptest/*/*; do
test "$(basename "$d")" = "$TODAY" || rm -rf "$d"
done
Update:
If you don't actually want to purge all subfolders except today's, but rather only those matching some particular name pattern, then one way to accomplish that would be to insert that pattern into the glob in the for command. For example, with bash's extended globbing enabled (shopt -s extglob), here
for d in /path/to/backuptest/*/+([0-9])-+([0-9])-+([0-9]); do
test "$(basename "$d")" = "$TODAY" || rm -rf "$d"
done
the only files / directories considered for deletion are those whose names consist of three nonempty, hyphen-separated strings of decimal digits. One could write patterns that more precisely match date string format if one preferred, but it does get messier the more discriminating you want the pattern to be.
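As an illustration of a somewhat stricter pattern, the sketch below uses plain bracket expressions, which need no extglob and also work in sh. It only considers names shaped like YYYY-MM-DD; it does not validate real calendar dates (2016-19-39 would still match). purge_old_backups and its optional second argument are hypothetical, added so the "today" value can be overridden when trying it out.

```shell
# Sketch only: purge_old_backups is a hypothetical helper name.
# Usage: purge_old_backups /path/to/backuptest [YYYY-MM-DD]
purge_old_backups() (
    root=$1
    today=${2:-$(date +%Y-%m-%d)}
    for d in "$root"/*/[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]; do
        [ -d "$d" ] || continue          # skips the literal pattern when nothing matches
        if [ "${d##*/}" != "$today" ]; then
            rm -rf -- "$d"
        fi
    done
)
```

Directories whose names do not look like a date, such as a config folder next to the backups, are never touched.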
You can do it with find:
date=$(date +%Y-%m-%d)
find backuptest -type d -not -name "$date" -not -name "backuptest" -not -name "Server*" -exec rm -rf {} \;
This:
find backuptest -type d -not -name "$date" -not -name "backuptest" -not -name "Server*"
looks for directories whose names differ from:
backuptest
Server*
$date (the current date)
and removes them with:
rm -rf

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fastq.gz into another file.
It should be run for all files in subdirectories in all directories. Could someone give a linux solution to do this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout) -
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This looks for all files under /top/path whose names match *_1.fastq.gz (or *_2.fastq.gz) and concatenates them into the desired file. -mindepth 2 restricts find to files at least one subdirectory below /top/path, so files directly in /top/path are not matched.
Note that plain cat is enough if the result should stay compressed, because concatenated gzip files form a valid gzip file; use zcat instead of cat only if you want decompressed output.
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, each line containing one:
while IFS= read -r directory
do
    find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
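Putting the pieces together for the layout in the question, the following sketch writes P922_101_1.fastq.gz and P922_101_2.fastq.gz into each sample directory. concat_fastq is a hypothetical name, and the sketch assumes file names contain no newlines. The paths are sorted so that the _1 and _2 outputs are assembled in the same order, which is an added assumption about what paired files need rather than something stated in the thread.

```shell
# Sketch only: concat_fastq is a hypothetical helper name.
# Usage: concat_fastq /top/path   (the directory holding P922_101, P922_102, ...)
concat_fastq() (
    top=$1
    for sample in "$top"/*/; do
        [ -d "$sample" ] || continue
        sample=${sample%/}
        name=${sample##*/}                       # e.g. P922_101
        for n in 1 2; do
            out="$sample/${name}_${n}.fastq.gz"
            # -mindepth 2 skips files directly in $sample, including $out itself.
            find "$sample" -mindepth 2 -type f -name "*_${n}.fastq.gz" |
                LC_ALL=C sort |
                while IFS= read -r f; do cat -- "$f"; done > "$out"
        done
    done
)
```

A sample directory without matching files ends up with empty output files; add a check before the redirection if that is not wanted.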

In Linux terminal, how to delete all files in a directory except one or two

In a Linux terminal, how to delete all files from a folder except one or two?
For example.
I have 100 image files in a directory and one .txt file.
I want to delete all files except that .txt file.
From within the directory, list the files, filter out all not containing 'file-to-keep', and remove all files left on the list.
ls | grep -v 'file-to-keep' | xargs rm
To avoid issues with spaces in filenames (remember to never use spaces in filenames), use find with -print0 and xargs with -0.
find 'path' -maxdepth 1 -type f -not -name 'file-to-keep' -print0 | xargs -0 rm
Or, mixing both approaches, use grep's -z option to filter the NUL-terminated names that find -print0 produces.
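The grep -z variant is not spelled out in the answer; a minimal sketch could look like the following. It assumes GNU grep (for -z) and an xargs that supports -0. delete_all_but is a hypothetical helper name, and -F -x make grep compare the whole name literally instead of treating it as a regular expression.

```shell
# Sketch only: delete_all_but is a hypothetical helper name.
# Usage: delete_all_but DIRECTORY FILE_TO_KEEP
delete_all_but() (
    dir=$1
    keep=$2
    cd "$dir" || exit 1
    # find prints NUL-terminated names such as "./photo 1.jpg";
    # grep -z reads and writes NUL-terminated records, -v drops the kept file.
    find . -maxdepth 1 -type f -print0 |
        grep -zvFx -- "./$keep" |
        xargs -0 rm -f --
)
```

Because every name travels NUL-terminated from find to rm, spaces and most other unusual characters in file names are handled safely.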
In general, using an inverted pattern search with grep should do the job. As you didn't define any pattern, I'd just give you a general code example:
ls -1 | grep -v 'name_of_file_to_keep.txt' | xargs rm -f
ls -1 lists one file per line, so that grep can search line by line. grep -v inverts the match, so any file name matching the pattern will NOT be deleted.
For multiple files, you may use grep -E (egrep):
ls -1 | grep -E -v 'not_file1.txt|not_file2.txt' | xargs rm -f
Update after question was updated:
I assume you want to delete all files in the current folder except those that end with .txt. So this should work too:
find . -maxdepth 1 -type f -not -name "*.txt" -exec rm -f {} \;
find supports a -delete option so you do not need to -exec. You can also pass multiple sets of -not -name somefile -not -name otherfile
user@host$ ls
1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt josh.pdf keepme
user@host$ find . -maxdepth 1 -type f -not -name keepme -not -name 8.txt -delete
user@host$ ls
8.txt keepme
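When the list of files to keep varies, the repeated -not -name tests can be built from the arguments. The sketch below uses the POSIX spelling ! for -not and a hypothetical helper name, delete_except; -print is added so each deleted file is listed. It assumes a find with -maxdepth and -delete, as in the transcript above.

```shell
# Sketch only: delete_except is a hypothetical helper name.
# Usage: delete_except DIRECTORY NAME_TO_KEEP...
delete_except() (
    dir=$1
    shift
    # Replace each remaining argument X by the three words: ! -name X
    for keep
    do
        set -- "$@" ! -name "$keep"
        shift
    done
    find "$dir" -maxdepth 1 -type f "$@" -print -delete
)
```

For the transcript above, delete_except . keepme 8.txt would remove the same files. Called with no names to keep, it deletes every regular file in the directory, so replacing -delete by nothing for a first trial run is a cheap precaution.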
Use the -not operator to exclude the file(s) or pattern(s) you don't want to delete. You can change the 1 passed to -maxdepth to control how many subdirectory levels deep files are deleted.
find . -maxdepth 1 -not -name "*.txt" -exec rm -f {} \;
You can also do:
find -maxdepth 1 \! -name "*.txt" -exec rm -f {} \;
In bash, you can use:
$ shopt -s extglob # Enable extended pattern matching features
$ rm !(*.txt) # Delete all files except .txt files

git find and rename a string in multiple filenames and folder names

Basically, I need to find all files and folders in my GitHub project whose names contain the string 'persons':
find . -type f -print | grep "persons"
find . -type d -print | grep "persons"
The above works for me.
But I also need to rename all of these files and folders, replacing 'persons' with 'members'.
Can I do that with a couple of commands, instead of manually renaming them one by one?
I don't know how to run git mv oldfilename newfilename recursively on the above.
for dir in `find /DIR -type d -iname '*persons*'` ; do
git mv "${dir}" "${dir/persons/members}"
done
Will do. For the files do it with -type f.
find . -depth -name persons | while IFS= read -r F; do mv "$F" "$(dirname "$F")/members"; done
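The two answers can be combined into one pass that handles files and folders whose names merely contain 'persons'. This is a sketch under a few assumptions: rename_persons and RENAME_CMD are hypothetical names, file names contain no newlines, and with the default git mv every matched path must hold tracked content (git mv refuses untracked files and empty directories). Only the last path component is rewritten, and -depth makes sure children are renamed before their parent directories.

```shell
# Sketch only: rename_persons and RENAME_CMD are hypothetical names.
# Usage: rename_persons /path/to/repo     (set RENAME_CMD=mv to try it outside git)
rename_persons() (
    cd "$1" || exit 1
    # -depth visits children before parents, so renaming a parent never
    # invalidates a path that is still waiting in the list.
    find . -depth -name '*persons*' ! -path './.git/*' | while IFS= read -r p; do
        base=${p##*/}
        new=$(printf '%s\n' "$base" | sed 's/persons/members/g')
        # Left unquoted on purpose so that "git mv" splits into two words.
        ${RENAME_CMD:-git mv} -- "$p" "${p%/*}/$new"
    done
)
```

With RENAME_CMD=mv the same loop can be tried on a scratch copy before touching the repository.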
