Move directories less than 1G - linux

I am trying to move all directories less than 1GB. I am trying to use this command:
du -h -d 1 -t -1G | xargs -0 mv -it /destination/dir/
But I get an error message:
mv: cannot stat [...] File name too long
Help would be greatly appreciated :)

I'm not sure why you're using the -0 argument to xargs as this specifies that the filenames are separated by null bytes rather than spaces. The output of du won't contain any null bytes so the entire output will be treated as a single filename, causing the error that you're seeing.
Anyway, I would suggest using find (though note that -size applied to a directory tests the size of the directory entry itself, not the total size of its contents):
find /path/to/source -type d -size -1024M -exec mv -it /path/to/destination {} +
If you're happy that du is already producing the list you want and you'd rather keep using it, you can add du's -0 (--null) switch so that it uses null-byte separators; note that you will still need to strip the leading size column (e.g. with cut -z -f 2-), since du prints the size and the path together, before your xargs -0 mv command will work.

So here is a workaround that serves my needs. Perhaps somebody can expand on it? Anyway, if you don't need to worry about subdirectories, then the following works.
du -Sb -t -1G | cut -f 2- | xargs -d "\n" mv -t /path/to/destination/
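If the directory names might contain newlines, the same pipeline can be made null-safe with GNU coreutils (du -0 and cut -z are GNU options; an untested sketch):
du -0 -Sb -t -1G | cut -z -f 2- | xargs -0 mv -t /path/to/destination/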

Related

How to delete X number of files in a directory

To get X number of files in a directory, I can do:
$ ls -U | head -40000
How would I then delete these 40,000 files? For example, something like:
$ "rm -rf" (ls -U | head -40000)
The tool you need for this is xargs. It converts its standard input into arguments to a command that you specify, splitting the input on whitespace by default, so each plain filename becomes one argument.
Thus, something like this would work (see the note below, though: ls shouldn't normally be parsed this way):
ls -U | head -40000 | xargs rm -rf
I would recommend before trying this to start with a small head size and use xargs echo to print out the filenames being passed so you understand what you'll be deleting.
Be aware that filenames containing unusual characters (spaces, newlines, quotes) can be a problem with this approach. If you are on a modern GNU system you may also wish to use the options of these commands that separate items with null characters; since a filename cannot contain a null character, that safely handles all possible names. I am not aware of a simple way to take the top X items when they are zero-separated, though.
So, for example, you can use this to delete all files in a directory:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 rm -rf
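For what it's worth, newer GNU coreutils (8.25 and later) do let head work with null separators via -z, which gives a way to take just the first 40,000 names safely. A sketch under that assumption:
find . -mindepth 1 -maxdepth 1 -print0 | head -z -n 40000 | xargs -0 rm -rf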
Use a bash array and slice it. If the number and size of arguments is likely to get close to the system's limits, you can still use xargs to split up the remainder.
files=( * )
printf '%s\0' "${files[@]:0:40000}" | xargs -0 rm
What about using awk as the filter?
find "$FOLDER" -maxdepth 1 -mindepth 1 -print0 \
| awk -v limit=40000 'NR<=limit;NR>limit{exit}' RS="\0" ORS="\0" \
| xargs -0 rm -rf
It will reliably remove at most 40,000 files (or folders). "Reliably" means regardless of which characters the filenames may contain.
Btw, to get the number of files in a directory reliably you can do:
find FOLDER -mindepth 1 -maxdepth 1 -printf '.' | wc -c
I ended up doing this since my folders were named with sequential numbers. This should also work for alphabetical folders:
ls -r releases/ | sed '1,3d' | xargs -I {} rm -rf releases/{}
Details:
list all the items in the releases/ folder in reverse order
slice off the first 3 items (which would be the newest if numeric/alpha naming)
for each item, rm it
In your case, you can replace ls -r with ls -U. Note, though, that sed '1,40000d' would delete everything except the first 40,000 entries; to remove the first 40,000 instead, replace the sed step with head -40000.
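Adapted to the original question, that would look something like this (a sketch; it assumes plain file names without quotes, backslashes or newlines):
ls -U | head -40000 | xargs -I {} rm -rf ./{}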

Grep inside files returned from ls and head

I have a directory with a large number of files. I am attempting to search for text located in at least one of the files. The text is likely located in one of the more recent files. What is the command to do this? I thought it would look something like ls -t | head -5 | grep abaaba.
For example, if ls -t | head -5 returns file1, file2, file3, file4, file5, I need to know which of those files contains abaaba.
It's not really clear what you are trying to do, but I assume efficiency is your main goal. I would use something like:
ls -t | while read -r f; do grep -lF abaaba "$f" && break;done
This prints only the first file containing the string and then stops the search. If you want to see the matching lines, use -H instead of -l. If your pattern is a regular expression rather than a plain string, drop -F (grep will run somewhat slower, though).
ls -t | while read -r f; do grep -H abaaba "$f" && break;done
Of course, if you want the search to continue through all the files, just drop the "&& break":
ls -t | while read -r f; do grep -HF abaaba "$f";done
If you have some idea of the time frame, it's a good idea to try find.
find . -maxdepth 1 -type f -mtime -2 -exec grep -HF abaaba {} \;
You can raise the number after -mtime to cover more than last 2 days.
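If there are many recent files, the + form of -exec avoids starting one grep per file (a minor variant of the same command):
find . -maxdepth 1 -type f -mtime -2 -exec grep -HF abaaba {} +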
If you're just doing this interactively, and you know you don't have spaces in your filenames, then you can do:
grep abaaba $(ls -t | head -5) # DO NOT USE THIS IN A SCRIPT
If writing this in an alias or for repeat future use, do it the "proper" way that takes more typing, but that doesn't break on spaces and other things in filenames.
If you have spaces but not newlines in the filenames, you can also do (in a subshell, so the IFS change doesn't leak):
( IFS=$'\n'; grep abaaba $(ls -t | head -5) )
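For filenames that may contain newlines as well, here is a GNU-only sketch (it relies on the -z/-0/--zero-terminated options of find, sort, head, cut and xargs):
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
    sort -z -r -n |
    head -z -n 5 |
    cut -z -f 2- |
    xargs -0 grep -lF abaaba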

Remove files not containing a specific string

I want to find the files not containing a specific string (in a directory and its sub-directories) and remove those files. How I can do this?
The following will work:
find . -type f -print0 | xargs --null grep -Z -L 'my string' | xargs --null rm
This will firstly use find to print the names of all the files in the current directory and any subdirectories. These names are printed with a null terminator rather than the usual newline separator (try piping the output to od -c to see the effect of the -print0 argument).
Then the --null parameter to xargs tells it to accept null-terminated inputs. xargs will then call grep on a list of filenames.
The -Z argument to grep works like the -print0 argument to find, so grep will print out its results null-terminated (which is why the final call to xargs needs a --null option too). The -L argument to grep causes grep to print the filenames of those files on its command line (that xargs has added) which don't match the regular expression:
my string
If you want simple matching without regular expression magic then add the -F option. If you want more powerful regular expressions then give a -E argument. It's a good habit to use single quotes rather than double quotes, as this protects you against any shell magic being applied to the string (such as variable substitution).
Finally you call xargs again to get rid of all the files that you've found with the previous calls.
The problem with calling grep directly from the find command with the -exec argument is that grep then gets invoked once per file rather than once for a whole batch of files as xargs does; the batched approach is much faster if you have lots of files. Also, don't be tempted to do stuff like:
rm $(some command that produces lots of filenames)
It's always better to pass it to xargs, as xargs knows the maximum command-line limits and will call rm multiple times if needed, each time with as many arguments as it can fit.
Note that this solution would have been simpler without the need to cope with files containing white space and newlines.
Alternatively
grep -r -L -Z 'my string' . | xargs --null rm
will work too (and is shorter). The -r argument to grep causes it to read all files in the directory and recursively descend into any subdirectories. Use the find ... approach if you want to do some other tests on the files as well (such as age or permissions).
Note that any of the single letter arguments, with a single dash introducer, can be grouped together (for instance as -rLZ). But note also that find does not use the same conventions and has multi-letter arguments introduced with a single dash. This is for historical reasons and hasn't ever been fixed because it would have broken too many scripts.
GNU grep and bash.
grep -rLZ "$str" . | while IFS= read -rd '' x; do rm "$x"; done
Use a find solution if portability is needed. This is slightly faster.
EDIT: This is how you SHOULD NOT do this! Reason is given here. Thanks to @ormaaj for pointing it out!
find . -type f | grep -v "exclude string" | xargs rm
Note: the grep pattern is matched against the full file path relative to the current directory (see the output of find . -type f).
One possibility is
find . -type f '!' -exec grep -q "my string" {} \; -exec echo rm {} \;
You can remove the echo if the output of this preview looks correct.
The equivalent with -delete is
find . -type f '!' -exec grep -q "my string" {} \; -delete
but then you don't get the nice preview option.
To remove files whose names don't contain a specific string:
Bash:
To do this, enable the extglob shell option as follows:
shopt -s extglob
And just remove all files whose names don't contain the string "fix":
rm !(*fix*)
If you want to keep all files whose names contain either "fix" or "class" and delete the rest:
rm !(*fix*|*class*)
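Before running rm, it can be worth previewing what the pattern matches, for example:
shopt -s extglob
printf '%s\n' !(*fix*|*class*)    # lists what would be removed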
Zsh:
To use extended globs in zsh, enable the extended_glob option as follows:
setopt extended_glob
Remove all files whose names don't contain the string, in this example "fix":
rm -- ^*fix*
If you want to keep all files whose names contain either "fix" or "class" and delete the rest:
rm -- ^(*fix*|*class*)
You can use the same approach for extensions; just change the pattern, e.g. *.zip or *.doc.
Here are the sources:
https://www.tecmint.com/delete-all-files-in-directory-except-one-few-file-extensions/
https://codeday.me/es/qa/20190819/1296122.html
I can think of a few ways to approach this. Here's one: find and grep to generate a list of files with no match, and then xargs rm them.
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' rm
This assumes GNU tools (grep -L and xargs -d are non-portable) and of course no filenames with newlines in them. It has the advantage of not running grep and rm once per file, so it'll be reasonably fast. I recommend testing it with "echo" in place of "rm" just to make sure it picks the right files before you unleash the destruction.
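For example, the dry run could look like this (the same pipeline, with echo standing in for rm):
find yourdir -type f -exec grep -F -L 'yourstring' '{}' + | xargs -d '\n' echo    # lists what would be removed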
This worked for me (note that it matches the string against the file path, not the file contents); you can remove the -f test if you're okay with deleting directories too.
myString="keepThis"
for x in $(find ./); do
    if [[ -f "$x" && ! "$x" =~ $myString ]]; then
        rm "$x"
    fi
done
Another solution (although not as fast), which also matches the string against file names rather than contents. The top solution didn't work in my case because the string I needed to use in place of 'my string' contains special characters.
find -type f ! -name "*my string*" -exec rm {} \; -print

Execute command for every file in the current dir

How can I execute a certain command for every file/folder in the current folder?
I've started with this as a base script, but it seems to work only by using temporary files, and I don't really like the idea. Is there any other way?
FOLDER=".";
DIRS=`ls -1 "$FOLDER">/tmp/DIRS`;
echo >"/tmp/DIRS1";
while read line ; do
SIZE=`du "$FOLDER$line"`;
echo $SIZE>>"/tmp/DIRS1";
done < "/tmp/DIRS";
For anyone interested, I wanted to make a list of folders sorted by their size. Here is the final result:
FOLDER="$1";
for f in $FOLDER/*; do
du -sb "$f";
done | sort -n | sed "s#^[0-9]*##" | sed "s#^[^\./]*##" | xargs -L 1 du -sh | sed "s|$FOLDER||";
which leads to du -sb $FOLDER/* | sort -n | sed "s#^[0-9]*##" | sed "s#^[^\./]*##" | xargs -L 1 du -sh | sed "s|$FOLDER||";
Perhaps xargs, which runs the command specified after it with the parameters received on stdin appended as arguments...
ls -1 $FOLDER | xargs du
But, in this case, why not...
du *
...? Or...
for X in *; do
    du "$X"
done
(Personally, I use zsh, where you can modify the glob pattern to only find say regular files, or only directories, only symlinks etc - I'm pretty sure there's something similar in bash - can dig for details if you need that).
Am I missing part of your requirement?
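For reference, the zsh glob qualifiers mentioned above look like this (a sketch; (.) selects regular files, (/) directories, (@) symlinks):
du -- *(.)    # regular files only
du -- *(/)    # directories only
du -- *(@)    # symlinks only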
The find command will let you execute a command for each item it finds, too. Without further arguments it will find all files and folders in the current directory, like this:
$ find -exec du -h {} \;
The {} part is the "variable" where the match is placed, here as the argument to du. \; ends the command.
It is useless to parse the output of ls to cycle over files; bash can do it with wildcard expansion.
Storing the result of du in a variable to output it to a file is also a useless use of a variable.
What I suggest:
for i in "$FOLDER"/*
do
    du "$i" >> "/tmp/DIRS1"
done
What's wrong with something like this?
function process() {
echo "Processing $1"
}
for i in *
do
process "$i"
done
You can put all the "work" you want done inside the function process. This will do it for your current directory.
This works for every file in the current directory:
for file in *
do
    /usr/local/mp3unicode/bin/mp3unicode -s cp1251 --id3v2-encoding unicode "$file"
done
The exec action can be invoked in two ways:
find . -type d -exec du -ch {} \;
find . -type d -exec du -ch {} +
In the first command, du is run once for each folder found. In the second, all the results of find are passed to du at once, which matters here if you want -c to produce a single grand total.
https://www.eovao.com/en/a/bash%20find%20exec%20linux/2/bash-execute-action-on-find-(-exec)-for-each-file
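For example, to see just that combined total (a sketch; it relies on the + form so du receives every directory in one invocation):
find . -type d -exec du -ch {} + | tail -n 1    # prints the single "total" line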

find string inside a gzipped file in a folder

My current problem is that I have around 10 folders, which contain gzipped files (around 5 each on average). This makes it 50 files to open and look at.
Is there a simpler method to find out if a gzipped file inside a folder has a particular pattern or not?
zcat ABC/myzippedfile1.txt.gz | grep "pattern match"
zcat ABC/myzippedfile2.txt.gz | grep "pattern match"
Instead of writing a script, can I do the same in a single line, for all the folders and sub folders?
for f in `ls *.gz`; do echo $f; zcat $f | grep <pattern>; done;
zgrep will look in gzipped files, has a -R recursive option, and a -H show me the filename option:
zgrep -R --include=*.gz -H "pattern match" .
OS-specific commands, as not all arguments work across the board:
Mac 10.5+: zgrep -R --include=\*.gz -H "pattern match" .
Ubuntu 16+: zgrep -i -H "pattern match" *.gz
You don't need zcat here because there is zgrep and zegrep.
If you want to run a command over a directory hierarchy, you use find:
find . -name "*.gz" -exec zgrep ⟨pattern⟩ \{\} \;
Also, “ls *.gz” is useless in a for loop; just use “*.gz” in the future.
Note that some versions of zgrep don't support -R.
I think the solution from "Nietzche-jou" could be a better answer, but I would add the -H option to show the file name, something like this:
find . -name "*.gz" -exec zgrep -H 'PATTERN' \{\} \;
Use the find command:
find . -name "*.gz" -exec zcat "{}" + | grep "test"
Or try using the recursive option (-r) of zcat.
Coming in a bit late on this: I had a similar problem and was able to resolve it using:
zcat -r /some/dir/here | grep "blah"
As detailed here:
http://manpages.ubuntu.com/manpages/quantal/man1/gzip.1.html
However, this does not show the original file that the result matched from, instead showing "(standard input)" as it's coming in from a pipe. zcat does not seem to support outputting a name either.
In terms of performance, this is what we got:
$ alias dropcache="sync && echo 3 > /proc/sys/vm/drop_caches"
$ find 09/01 | wc -l
4208
$ du -chs 09/01
24M
$ dropcache; time zcat -r 09/01 > /dev/null
real 0m3.561s
$ dropcache; time find 09/01 -iname '*.txt.gz' -exec zcat '{}' \; > /dev/null
0m38.041s
As you can see, using the find|zcat method is significantly slower than using zcat -r when dealing with even a small volume of files. I was also unable to make zcat output the file name (using -v will apparently output the filename, but not on every single line). It would appear that there isn't currently a tool that will provide both speed and name consistency with grep (i.e. the -H option).
If you need to identify the name of the file that the result belongs to, then you'll need to either write your own tool (could be done in 50 lines of Python code) or use the slower method. If you do not need to identify the name, then use zcat -r.
Hope this helps
find . -name "*.gz" | xargs zcat | grep "pattern" should do.
zgrep "string" ./*/*
You can use the above command to search for a string in the .gz files of a directory dir that has the following sub-directory structure:
/dir
    /childDir1
        /file1.gz
        /file2.gz
    /childDir2
        /file3.gz
        /file4.gz
    /childDir3
        /file5.gz
        /file6.gz
You can use this command:
zgrep "foo" $(find . -name "*.gz")
