I want to split files which are >500kb. First I use find to list all such files:
find . -maxdepth 1 -name '*.log' -size +500k
which returns "./filename". Then I use another command to split each file according to my requirement:
split -b 500k -d -a 4 filename filename
where filename is the output of the first command. Can someone help me combine the two, so that the output of the first command becomes the input of the second?
How about a one liner?
find . -maxdepth 1 -name '*.log' -size +500k -exec split -b 500k -d -a 4 '{}' '{}' \;
You can use a process substitution for this:
while IFS= read -r file
do
split -b 500k -d -a 4 "$file" "$file"
done < <(find . -maxdepth 1 -name '*.log' -size +500k)
That is: the while loop gets fed by the find output.
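If the file names might contain newlines, a null-delimited variant is safer; a sketch using find's -print0:
while IFS= read -r -d '' file
do
split -b 500k -d -a 4 "$file" "$file"
done < <(find . -maxdepth 1 -name '*.log' -size +500k -print0)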
I've seen a few answers regarding this, but as a newbie, I don't really understand how to implement that in my script.
It should be pretty easy (for those who know how to do this kind of thing).
I'm using a simple
for f in "/drive1/"images*.{jpg,png}; do
but this is simply overloading it and giving me
Argument list too long
How is this easiest solved?
Argument list too long workaround
The maximum argument list length is limited by your system configuration:
getconf ARG_MAX
2097152
But after some discussion about the differences between bash specifics and system (OS) limitations (see the comments), this question seems ill-posed:
Following the discussion in the comments, the OP tried something like:
ls "/simple path"/image*.{jpg,png} | wc -l
bash: /bin/ls: Argument list too long
This happens because of an OS limitation, not bash!
But tested with the OP's kind of code, this works fine:
for file in ./"simple path"/image*.{jpg,png} ;do echo -n a;done | wc -c
70980
Or similarly:
printf "%c" ./"simple path"/image*.{jpg,png} | wc -c
Reduce the line length by reducing the fixed part
As a first step, you could reduce the argument length by changing directory:
cd "/drive1/"
ls images*.{jpg,png} | wc -l
But as the number of files grows, you'll hit the limit again...
More general workaround:
find "/drive1/" -type f \( -name '*.jpg' -o -name '*.png' \) -exec myscript {} +
If you want this to NOT be recursive, you may add -maxdepth as 1st option:
find "/drive1/" -maxdepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) \
-exec myscript {} +
There, myscript will be run with the file names as arguments. The command line for myscript is built up until it reaches a system-defined limit:
myscript /drive1/file1.jpg '/drive1/File Name2.png' /drive1/...
From man find:
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invoca‐
tions of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}' is allowed within the command.
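To see the difference in invocation counts, compare the two -exec terminators (a sketch; echo stands in for a real command):
find . -name '*.jpg' -exec echo {} +     # few invocations, many names each
find . -name '*.jpg' -exec echo {} \;    # one invocation per file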
In-script sample
You could create your script like this:
#!/bin/bash
target=( "/drive1" "/Drive 2/Pictures" )
[ "$1" = "--run" ] && exec find "${target[#]}" -type f \( -name '*.jpg' -o \
-name '*.png' \) -exec $0 {} +
for file ;do
echo Process "$file"
done
Then you have to run this with --run as argument.
works with any number of files! (recursively! see the -maxdepth option)
permits many targets
permits spaces and special characters in file and directory names
you could run the same script directly on files, without --run:
./myscript hello world 'hello world'
Process hello
Process world
Process hello world
Using pure bash
Using arrays, you could do things like:
allfiles=( "/drive 1"/images*.{jpg,png} )
[ -f "$allfiles" ] || { echo No file found.; exit ;}
echo Number of files: ${#allfiles[@]}
for file in "${allfiles[@]}";do
echo Process "$file"
done
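If you then need to hand those files to an external command, you can stay under ARG_MAX by passing the array in slices (a sketch; mycommand is a hypothetical placeholder):
batch=1000
for ((i=0; i<${#allfiles[@]}; i+=batch)); do
    mycommand "${allfiles[@]:i:batch}"   # at most $batch names per invocation
done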
There's also a while read loop:
find "/drive1/" -maxdepth 1 -mindepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) |
while IFS= read -r file; do
or with zero terminated files:
find "/drive1/" -maxdepth 1 -mindepth 1 -type f \( -name '*.jpg' -o -name '*.png' \) -print0 |
while IFS= read -r -d '' file; do
I have a series of directories with (mostly) video files in them, say
test1
1.mpg
2.avi
3.mpeg
junk.sh
test2
123.avi
432.avi
432.srt
test3
asdf.mpg
qwerty.mpeg
I create a variable (video_dir) with the directory names (based on other parameters) and use that with find to generate the basic list. I then filter for file types based on another variable (video_types), because there are sometimes non-video files in the dirs, by piping through egrep. Then I shuffle the list around and save it to a file. That file is later used by mplayer to slideshow through the list.
I currently use the following command to accomplish that. I'm sure it's a horrible way to do it, but it works for me and it's quite fast even on big directories.
video_dir="/test1 /test2"
video_types=".mpg$|.avi$|.mpeg$"
find ${video_dir} -type f |
egrep -i "${video_types}" |
shuf > "$TEMP_OUT"
I now would like to add the ability to filter out files based on the resolution height of the video file. I can get that from:
mediainfo --Output='Video;%Height%' filename
Which just returns a number. I have tried using the -exec functionality of find to run that command on each file.
find ${video_dir} -type f -exec mediainfo --Output='Video;%Height%' {} \;
but that just returns the list of heights, not the filenames, and I can't figure out how to reject files based on a comparison like < 480.
I could do a for loop, but that seems like a bad (slow) idea.
Using info from @mark-setchell, I modified it to:
video_dir="test1"
find ${video_dir} -type f \
-exec bash -c 'h=$(mediainfo --Output="Video;%Height%" "$1"); [[ $h -gt 480 ]]' _ {} \; -print
Which works.
You can replace your egrep with the following so you are still inside the find command (-iname is case insensitive and -o represents a logical OR):
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
NEXT_BIT
The NEXT_BIT can then -exec bash and exit with status 0 or 1 depending on whether you want the current file included or excluded. So it will look like this:
-exec bash -c 'H=$(mediainfo -output ... "$1"); [ $H -lt 480 ] && exit 1; exit 0' _ {} \;
So, taking note of @tripleee's advice in the comments about superfluous exit statements, I get this:
find test1 test2 -type f \
\( -iname "*.mpg" -o -iname "*.avi" -o -iname "*.mpeg" \) \
-exec bash -c 'h=$(mediainfo ...options... "$1"); [ $h -lt 480 ]' _ {} \; -print
This Q&A was focused on one particular case, so the accepted answer is not as general as it could be.
find
If the list of files comes from find, one can use its filtering facilities, e.g. -exec:
find ${video_dir} -type f \
-exec COMMAND \; \
-print
Here
COMMAND is not enclosed in quotes -- find reads everything after -exec and up to a \;
find will expand {} to the current file name (including path -- you might find -execdir helpful, which will cd to the file's directory and replace {} with the leaf file name)
The exit code of COMMAND is treated as follows:
0 -> true
non-0 -> false
Note that you can build more complex expressions (e.g. -not -exec ...), which will be evaluated "from left to right, according to the rules of precedence ... -and is assumed where the operator is omitted." (per man find)
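As a concrete illustration of the exit status acting as a filter, this prints only those files whose contents match a pattern (a sketch, reusing the ${video_dir} variable from above; grep -q exits 0 on a match and prints nothing):
find ${video_dir} -type f \
    -exec grep -q 'some pattern' {} \; \
    -print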
xargs
If the list of files comes from elsewhere (and is available on stdin), you can use xargs as follows (from "If xargs is map, what is filter?"):
ls | xargs -I{} bash -c "COMMAND '{}' && echo '{}'"
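Note that embedding {} inside the bash -c command string breaks on names containing quotes. A safer, null-delimited sketch that passes the name as a positional argument instead (COMMAND is still your placeholder):
find . -type f -print0 |
    xargs -0 -I{} bash -c 'COMMAND "$1" && printf "%s\n" "$1"' _ {}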
Here is my solution.
#!/bin/bash
shopt -s extglob nullglob   # extglob enables the @(...) pattern below
video_dir=(/test1 /test2)
while IFS= read -rd '' file; do
if [[ $file = *.@(mpg|avi|mpeg|mp4) ]]; then
h=$(mediainfo --Output="Video;%Height%" "$file")
(( h >= 480 )) && echo "$file"
fi
done < <(find "${video_dir[#]}" -type f -print0)
With this solution you can process everything inside the while-read loop.
How can I search for files in directories that contain spaces in names, using find?
I use this script:
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
but files and directories that contain spaces in their names are not processed.
This will grab all the files that have spaces in them:
$ ls
more space  nospace  stillnospace  this is space
$ find -type f -name "* *"
./this is space
./more space
I don't know how to achieve your goal. But given your actual solution, the problem is not really with find but with the for loops, since spaces are taken as delimiters between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man page says, this matches the -0 option of xargs. Several other standard tools have an equivalent option. You probably have to rewrite your complex pipeline around those tools in order to cleanly process file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use for loop with 0-terminated arguments.
Do it like this
find . -type f -name "* *"
Instead of . you can specify the path where you want to find files matching your criteria.
Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
    path=$(dirname "$i")
    for j in `ls "$path" | grep POD` ; do
        file=$(basename "$j" .txt)
        # Do whatever you want with the file
    done
done
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the file name in an array instead and then remove duplicates with sort and uniq.
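One way to process each directory only once is to print just the directory part of every match and deduplicate it, null-delimited; a sketch assuming GNU find (-printf) and GNU sort (-z):
find /tmp/1 -iname '*.txt' -printf '%h\0' | sort -zu |
while IFS= read -rd '' path; do
    echo "Directory: $path"
done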
Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
while IFS= read -rd '' j; do
find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
done <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done < <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)
Never use for i in $(find ...) or similar, as it'll fail for file names containing whitespace, as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contain newlines. If you have that situation and can't fix the file names, then look into the -print0 option for find, piping it to xargs -0.
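For example, this passes every match safely, even names with spaces or newlines (a sketch; ls -l just stands in for a real action):
find "/tmp/1/" -iname '*.txt' -print0 | xargs -0 ls -l --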
I want to capture the output of the command below in a variable.
Command:
find . -iname 'FIL*'.TXT
The output is :
./FILE1.TXT
I want to capture ./FILE1.TXT into a variable A. But when I try
A=`find . -iname 'FIL*'.TXT`
the command displays the contents of the file. I want the value ./FILE1.TXT in the variable A.
# ls *.txt
test1.txt test.txt
# find ./ -maxdepth 1 -iname "*.txt"
./test1.txt
./test.txt
# A=$(find ./ -maxdepth 1 -iname "*.txt")
# echo $A
./test1.txt ./test.txt
You can ignore -maxdepth 1 if you want to. I had to use it for this example.
Or with a single file:
# ls *.txt
test.txt
# find ./ -maxdepth 1 -iname "*.txt"
./test.txt
# A=$(find ./ -maxdepth 1 -iname "*.txt")
# echo $A
./test.txt
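If more than one file can match, a flat string variable mashes the names together, as the first example shows; an array keeps them separate. A sketch assuming bash 4.4+ for mapfile -d:
mapfile -d '' -t files < <(find . -iname 'FIL*.TXT' -print0)
echo "first match: ${files[0]}"
echo "number of matches: ${#files[@]}"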
Did you try:
A="`find . -iname 'FIL*'.TXT`"
or:
A="`find . -iname 'FIL*'.TXT -print`"
A file does not have a value, but it does have contents. Use the following to display those contents:
find . -iname 'FIL*'.TXT -exec cat {} \;
If you want all the contents (of all such files) in a variable, then
A=$(find . -iname 'FIL*'.TXT -exec cat {} \;)
BTW you could have used
find . -iname 'FIL*.TXT' -print0 | xargs -0 cat
If you want the names of such files in a variable, try
A=$(find . -iname 'FILE*.txt' -print)
BTW, in some recent interactive shells (zsh, and bash version 4 but not earlier) you can just use a recursive glob; in bash it needs globstar, and an array keeps multiple matches separate:
shopt -s globstar
A=( **/FILE*.txt )
My feeling is that the ** feature is by itself worth switching to a newer shell, but it is just my opinion.
Also, don't forget that files may have several names or none. Read about inodes ...
How can I find all zero-byte files in a directory and its subdirectories?
I have done this:
#!/bin/bash
lns=`vdir -R *.* $dir| awk '{print $8"\t"$5}'`
temp=""
for file in $lns; do
if test $file = "0"; then
printf $temp"\t"$file"\n"
fi
temp=$file
done
But I only get results in the current directory, not subdirectories,
and if any file name contains a space then I get only the first word, followed by a tab.
To print the names of all files in and below $dir of size 0:
find "$dir" -size 0
Note that not all implementations of find will produce output by default, so you may need to do:
find "$dir" -size 0 -print
Two comments on the final loop in the question:
Rather than iterating over every other word in a string and seeing if the alternate values are zero, you can partially eliminate the issue you're having with whitespace by iterating over lines, e.g.:
printf '1 f1\n0 f 2\n10 f3\n' | while read -r size path; do
test "$size" -eq 0 && echo "$path"; done
Note that this will fail in your case if any of the paths output by ls contain newlines, and this reinforces 2 points: don't parse ls, and have a sane naming policy that doesn't allow whitespace in paths.
Secondly, to output the data from the loop, there is no need to store the output in a variable just to echo it. If you simply let the loop write its output to stdout, you accomplish the same thing but avoid storing it.
As an addition to the answers above:
If you would like to delete those files:
find "$dir" -size 0 -type f -delete
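To preview what would be removed before committing to -delete, print first:
find "$dir" -size 0 -type f -print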
No, you don't need to bother with grep:
find "$dir" -size 0 ! -name "*.xml"
Tested on Bash 4+.
This is the correct way to search for files of size 0:
find /path/to/dir -size 0 -type f -name "*.xml"
Search for multiple file extensions of size 0:
find /path/to/dir -size 0 -type f \( -iname \*.css -o -iname \*.js \)
Note: if you removed the \( ... \), the -o would split the expression, so -size 0 and -type f would apply only to the first pattern; every *.js file would then match, regardless of size.
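To make the precedence concrete, here is how find parses the expression without the parentheses (a sketch; -a is the implied AND):
# what you would have written:
find /path/to/dir -size 0 -type f -iname \*.css -o -iname \*.js
# how find actually groups it:
find /path/to/dir \( -size 0 -a -type f -a -iname \*.css \) -o -iname \*.js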