How to remove blank lines from text files? - linux

I want to remove all empty lines from some text file. I can do it with:
grep '[^[:blank:]]' < file1.dat > file1.dat.nospace
But I need to do it with n-files in a directory. How can I do it?
Any help would be appreciated. Thanks!

You can do it with find and sed, editing the files in place (each original is kept as a .bak backup):
find . -name '*.dat' -exec sed -i.bak '/^[[:blank:]]*$/d' {} +

Here is a way (the variable is quoted so file names containing spaces still work):
for filename in *.dat; do
grep '[^[:blank:]]' < "$filename" > "$filename.nospace"
done
Here is a more robust way, one that skips directories and does not mangle backslashes or leading whitespace in file names:
find . -maxdepth 1 -type f -name "*.dat" | while IFS= read -r filename; do
grep '[^[:blank:]]' < "$filename" > "$filename.nospace"
done
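Here is a way that is quicker to type; this is the way I would actually do it. It builds shell commands from the file names and pipes them to sh, so it is only safe when the names contain no double quotes, dollar signs, backticks, or backslashes: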
find *.dat -printf "grep '[^[:blank:]]' < \"%f\" > \"%f.nospace\"\n" | sh
here is a more robust version of that:
find . -maxdepth 1 -type f -name "*.dat" -printf "grep '[^[:blank:]]' < \"%f\" > \"%f.nospace\"\n" | sh
PS: if you only want to drop truly empty lines (lines that contain spaces or tabs are kept), the grep is:
grep -v '^$' < "$filename" > "$filename.nospace"

This one-liner could probably help you:
for a in /path/to/file_pattern*; do sed '/^[[:space:]]*$/d' "$a" > "$a.nospace"; done
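The answers above differ mainly in how the file names reach grep. As a minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do) and the mktemp utility, the names can be handed over as arguments so that no output parsing is involved. The scratch directory and sample file names are invented for the demonstration.

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical file names).
dir=$(mktemp -d)
printf 'a\n\n  \nb\n' > "$dir/file 1.dat"
printf '\nx\n\t\ny\n' > "$dir/file2.dat"

# For each .dat file, write a copy without empty or whitespace-only lines.
# find passes the names as arguments, so spaces in names are harmless.
find "$dir" -maxdepth 1 -type f -name '*.dat' -exec sh -c '
    for f in "$@"; do
        grep "[^[:blank:]]" < "$f" > "$f.nospace"
    done
' sh {} +

cat "$dir/file 1.dat.nospace"
```

The originals are left untouched; each cleaned copy gets the .nospace suffix used in the question.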

Related

How to find files that have long chains of zero bytes?

How can I find files that have long chains of consecutive 0s (zero bytes - 0x00) as a result of disk failure? For example, how can I find files that have more than 10000 zero bytes in sequence?
Sure, I can write a program using Java or other programming language, but is there a way to do it using more or less standard Linux command line tools?
Update
You can generate test file with dd if=/dev/zero of=zeros bs=1 count=100000.
This may be a start. Note the c suffix on -size: without it, find counts 512-byte blocks rather than bytes.
find /some/starting/point -type f -size +9999c -exec \
perl -nE 'if (/\x0{10000}/) {say $ARGV; close ARGV}' '{}' +
To test a single file named filename (this looks for a run of at least 1000 zero bytes; change the count to suit):
if tr -sc '\0' '\n' < filename | tr '\0' Z | grep -qE 'Z{1000}'; then
# ...
fi
You can now use a suitable find command to filter relevant files for test.
For example, all *.txt files in PWD:
while IFS= read -rd '' filename; do
if tr -sc '\0' '\n' < "$filename" | tr '\0' Z | grep -qE 'Z{1000}'; then
# For example, simply print "$filename"
printf '%s\n' "$filename"
fi
done < <(find . -type f -name '*.txt' -print0)
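The test above can be wrapped in a small function so that the run length becomes a parameter. This is only a sketch: the name has_zero_run, the scratch directory, and the sample files are invented for the demonstration, and it assumes a grep that accepts the requested repetition count.

```shell
#!/bin/sh
# has_zero_run FILE N (hypothetical helper): succeed if FILE contains at
# least N consecutive zero bytes.
has_zero_run() {
    # Every run of non-zero bytes collapses into one newline and each zero
    # byte becomes a Z, so a run of zero bytes ends up as a line of Zs.
    tr -sc '\0' '\n' < "$1" | tr '\0' Z | grep -qE "Z{$2}"
}

# Demo data: one file holding 2000 zero bytes, one plain text file.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/zeros" bs=1000 count=2 2>/dev/null
printf 'plain text\n' > "$dir/text"

found=$(
    for f in "$dir"/*; do
        if has_zero_run "$f" 1000; then
            printf '%s\n' "${f##*/}"
        fi
    done
)
printf '%s\n' "$found"
```

Only the file made of zero bytes is reported; the text file has no zero bytes, so grep finds nothing to match.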
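find and grep should work just fine, but note that \0 is not a zero-byte escape in grep -E. With GNU grep, use -P (Perl-compatible regex), plus -a to treat binary files as text and -l to print only the names of matching files:
grep -lPa '\x00{1000}' <file name>
if it's a single file or a group of files in the same directory.
If you want to search throughout the system there's:
find /dir/ -type f -exec grep -lPa '\x00{1000}' {} + 2> /dev/null
This is very slow though. If you're looking for something faster and can do without the thousand (or other large number of) zeros,
I'd suggest lowering the repetition count in the pattern, for example '\x00{8,}'.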

merge find command output with another command output and redirect to file

I am looking to combine the output of the Linux find and head commands (to derive a list of filenames) with output of another Linux/bash command and save the result in a file such that each filename from the "find" occurs with the other command output on a separate line.
So for example,
- if a dir testdir contains files a.txt, b.txt and c.txt,
- and the output of the other command is some number say 10, the desired output I'm looking for is
10 a.txt
10 b.txt
10 c.txt
On searching here, I saw folks recommending paste for similar merging, but I couldn't figure out how to use it in this scenario, as paste seems to expect files. I tried
paste $(find testdir -maxdepth 1 -type f -name "*.txt" | head -2) $(echo "10") > output.txt
paste: 10: No such file or directory
Would appreciate any pointers as to what I'm doing wrong. Any other ways of achieving the same thing are also welcome.
Note that if I wanted to make everything appear on the same line, I could use xargs and that does the job.
$ find testdir -maxdepth 1 -type f -name "*.txt" | head -2 | xargs echo "10" > output.txt
$ cat output.txt
10 a.txt b.txt
But my requirement is to merge the two command outputs as shown earlier.
Thanks in advance for any help!
find can handle both the -exec and -print directives, you just need to merge the output:
$ find . -maxdepth 1 -type f -name \*.txt -exec echo hello \; -print | paste - -
hello ./b.txt
hello ./a.txt
hello ./all.txt
Assuming your "command" requires the filename (here's a very contrived example):
$ find . -maxdepth 1 -type f -name \*.txt -exec sh -c 'wc -l <"$1"' _ {} \; -print | paste - -
4 ./b.txt
4 ./a.txt
7 ./all.txt
Of course, that's executing the command for each file. To restrict myself to your question:
cmd_out=$(echo 10)
for file in *.txt; do
echo "$cmd_out $file"
done
Try this (sed reads the file list from the pipe, so -i must not be used here):
$ find testdir -maxdepth 1 -type f -name "*.txt" | head -2 | sed 's/^/10 /' > output.txt
You can make xargs operate on one line at a time using -L1:
find testdir -maxdepth 1 -type f -name "*.txt" | xargs -L1 echo "10" > output.txt
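To tie the answers together, here is a minimal sketch of the "run the other command once, then prefix every file name" approach. It assumes a find that supports -maxdepth and file names without newlines; the scratch directory, the out file, and echo 10 as the stand-in command are all illustrative.

```shell
#!/bin/sh
# Demo data (hypothetical): three files in a scratch directory.
testdir=$(mktemp -d)
touch "$testdir/a.txt" "$testdir/b.txt" "$testdir/c.txt"
out=$(mktemp)

cmd_out=$(echo 10)    # stand-in for the other command; it runs only once

# Print the bare file names, keep the first two, and prefix each with
# the saved command output.
find "$testdir" -maxdepth 1 -type f -name '*.txt' -exec basename {} \; |
    sort | head -2 |
    while IFS= read -r name; do
        printf '%s %s\n' "$cmd_out" "$name"
    done > "$out"

cat "$out"
```

The sort is there only to make the choice of "first two" files deterministic, since find does not guarantee any order.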

Bash find and expression

Is there some way to make this work?
pFile=find ${destpath} (( -iname "${mFile##*/}" )) -o (( -iname "${mFile##*/}" -a -name "*[],&<>*?|\":'()[]*" )) -exec printf '.' \;| wc -c
I need pFile to hold the number of files with the same file name, or 0 if there are none.
I have to do it this way because if I only use:
pFile=$(find ${destpath} -iname "${mFile##*/}" -exec printf '.' \; | wc -c)
it doesn't count files with the same name when the name contains metacharacters.
Thanks
EDIT:
"${mFile##*/}" expands to the file name in the start folder, without the path.
echo "${mFile##*/}" -> goofy.mp3
Example
In the start folder I have:
goofy.mp3 - mickey[1].avi - donald(2).mkv - scrooge.3gp
In the destination folder I have:
goofy.mp3 - mickey[1].avi - donald(2).mkv - donald(1).mkv - donald(3).mkv - minnie.iso
I want this:
echo pFile -> 3
With:
pFile=$(find ${destpath} -iname "${mFile##*/}" -exec printf '.' \; | wc -c)
echo pFile -> 2
With:
pFile=$(find ${destpath} -name "*[],&<>*?|\":'()[]*" -exec printf '.' \; | wc -c)
echo pFile -> 4
By "same file name" I mean:
/path1/mickey[1].avi = /path2/mickey[1].avi
I am not sure I understood your intended semantics of ${mFile##*/}, however looking at your start/destination folder example, I have created the following use case directory structure and the script below to solve your issue:
$ find root -type f | sort -t'/' -k3
root/dir2/donald(1).mkv
root/dir1/donald(2).mkv
root/dir2/donald(2).mkv
root/dir2/donald(3).mkv
root/dir1/goofy.mp3
root/dir2/goofy.mp3
root/dir1/mickey[1].avi
root/dir2/mickey[1].avi
root/dir2/minnie.iso
root/dir1/scrooge.3gp
Now, the following script (I've used gfind to indicate that you need GNU find for this to work, but if you're on Linux, just use find):
$ pFile=$(($(gfind root -type f -printf "%f\n" | wc -l) - $(gfind root -type f -printf "%f\n" | sort -u | wc -l)))
$ echo $pFile
3
I'm not sure this solves your issue, however it does print the number you expected in your provided example.
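Another angle on the metacharacter problem is to avoid pattern matching altogether and compare names as plain strings. The sketch below assumes that reading of the question; the helper count_same_name and the scratch directories are invented for the demonstration, and unlike -iname the comparison is case-sensitive.

```shell
#!/bin/sh
# count_same_name NAME DIR (hypothetical helper): print the number of regular
# files under DIR whose base name is exactly NAME.
count_same_name() {
    find "$2" -type f -exec sh -c '
        name=$1; shift
        for f in "$@"; do
            # Plain string comparison, so [ ] ( ) are not treated as patterns.
            if [ "${f##*/}" = "$name" ]; then echo x; fi
        done
    ' sh "$1" {} + | wc -l
}

# Demo data mirroring the example in the question.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/goofy.mp3" "$src/mickey[1].avi" "$src/donald(2).mkv" "$src/scrooge.3gp"
touch "$dest/goofy.mp3" "$dest/mickey[1].avi" "$dest/donald(2).mkv" \
      "$dest/donald(1).mkv" "$dest/donald(3).mkv" "$dest/minnie.iso"

total=0
for mFile in "$src"/*; do
    n=$(count_same_name "${mFile##*/}" "$dest")
    total=$((total + n))
done
echo "$total"
```

Each file from the start folder is looked up by its exact name in the destination folder, and the per-file counts are added up.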

Can this be printed on same line?

This command will count the number of files in the sub-directories.
find . -maxdepth 1 -type d |while read dir;do echo "$dir";find "$dir" -type f|wc -l;done
Which looks like
./lib64
327
./bin
118
Would it be possible to have it to look like
327 ./lib64
118 ./bin
instead?
There are a number of ways to do this... Here's something that doesn't change your code very much. (I've put it in multiple lines for readability.)
find . -maxdepth 1 -type d | while read dir; do
echo `find "$dir" -type f | wc -l` "$dir"
done
Pipe into tr to remove or replace newlines. I expect you want the newline to be turned into a tab character, like this:
find . -maxdepth 1 -type d |while read dir;do
find "$dir" -type f|wc -l | tr '\n' '\t';
echo "$dir";
done
(Edit: I had them the wrong way around)
do echo -n "$dir "
The -n prevents echo from ending the line afterwards.
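For comparison, a sketch that builds each line with printf, so no newline has to be removed afterwards. It assumes a find that supports -mindepth and -maxdepth and directory names without newlines; the scratch directory and its contents are invented for the demonstration.

```shell
#!/bin/sh
# Demo data (hypothetical): two subdirectories with a few files.
top=$(mktemp -d)
mkdir "$top/bin" "$top/lib64"
touch "$top/bin/a" "$top/bin/b" "$top/lib64/c"

result=$(
    cd "$top" || exit 1
    # -mindepth 1 leaves out "." itself; sort only fixes the output order.
    find . -mindepth 1 -maxdepth 1 -type d | sort |
    while IFS= read -r dir; do
        count=$(find "$dir" -type f | wc -l)
        # $((count)) drops the padding that some wc implementations add.
        printf '%d %s\n' "$((count))" "$dir"
    done
)
printf '%s\n' "$result"
```

Capturing the count in a variable first keeps the count and the directory name on one line, in whichever order the format string asks for.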

Finding the number of files in a directory for all directories in pwd

I am trying to list all directories, each with its number of files next to it.
I can find the total number of files ls -lR | grep .*.mp3 | wc -l. But how can I get an output like this:
dir1 34
dir2 15
dir3 2
...
I don't mind writing to a text file or CSV to get this information if it's not possible to get it on screen.
Thank you all for any help on this.
This seems to work assuming you are in a directory where some subdirectories may contain mp3 files. It omits the top level directory. It will list the directories in order by largest number of contained mp3 files.
find . -mindepth 2 -name \*.mp3 -print0| xargs -0 -n 1 dirname | sort | uniq -c | sort -rn | awk '{print $2 "," $1}'
I updated this with print0 to handle filenames with spaces and other tricky characters and to print output suitable for CSV.
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c
Or, if order (dir-> count instead of count-> dir) is really important to you:
find . -type f -iname '*.mp3' -printf "%h\n" | sort | uniq -c | awk '{print $2" "$1}'
There are probably much better ways, but this seems to work.
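A variant that does not rely on GNU find's -printf and copes with spaces in directory names can be sketched as follows. It assumes a find that supports -iname, directory names without newlines or commas, and invented sample data in a scratch directory; awk does the counting in place of uniq -c.

```shell
#!/bin/sh
# Demo data (hypothetical): mp3 files spread over nested directories.
top=$(mktemp -d)
mkdir -p "$top/dir1" "$top/dir 2/sub"
touch "$top/dir1/a.mp3" "$top/dir1/b.MP3" "$top/dir1/notes.txt"
touch "$top/dir 2/c.mp3" "$top/dir 2/sub/d.mp3" "$top/dir 2/sub/e.mp3" "$top/dir 2/sub/f.mp3"

result=$(
    cd "$top" || exit 1
    # Print the directory part of every mp3 path, one per line.
    find . -type f -iname '*.mp3' -exec sh -c '
        for f in "$@"; do printf "%s\n" "${f%/*}"; done
    ' sh {} + |
    # Count identical lines, then emit "directory,count".
    awk '{ n[$0]++ } END { for (d in n) printf "%s,%d\n", d, n[d] }' |
    # Largest count first; ties are ordered by directory name.
    sort -t, -k2,2nr -k1,1
)
printf '%s\n' "$result"
```

Because awk counts whole lines, the input does not need to be sorted first, and the comma-separated output can be read as CSV.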
Put this in a shell script:
#!/bin/sh
for f in *
do
if [ -d "$f" ]
then
cd "$f"
c=`ls -l *.mp3 2>/dev/null | wc -l`
if test $c -gt 0
then
echo "$f $c"
fi
cd ..
fi
done
With Perl:
perl -MFile::Find -le'
find {
wanted => sub {
return unless /\.mp3$/i;
++$_{$File::Find::dir};
}
}, ".";
print "$_,$_{$_}" for
sort {
$_{$b} <=> $_{$a}
} keys %_;
'
Here's yet another way to even handle file names containing unusual (but legal) characters, such as newlines, ...:
# count .mp3 files (using GNU find)
find . -xdev -type f -iname "*.mp3" -print0 | tr -dc '\0' | wc -c
# list directories with number of .mp3 files
find "$(pwd -P)" -xdev -depth -type d -exec bash -c '
for ((i=1; i<=$#; i++ )); do
d="${@:i:1}"
mp3s="$(find "${d}" -xdev -type f -iname "*.mp3" -print0 | tr -dc "${0}" | wc -c )"
[[ $mp3s -gt 0 ]] && printf "%s\n" "${d}, ${mp3s// /}"
done
' '\0' '{}' +
