Printing the number of lines - linux

I have a directory that contains only .txt files. I want to print the number of lines for every file. When I write cat file.txt | wc -l the number of lines appears, but doing the same in a script is more complicated. I have this code:
for fis in `ls -R $1`
do
echo `cat $fis | wc -l`
done
I also tried wc -l $fis, as well as awk and grep, but none of them work. The output is:
cat: fis1: No such file or directory
0
How can I print the number of lines?

To find files recursively in subdirectories, use the find command, not ls -R, which is mainly intended for human reading.
find "$1" -type f -exec wc -l {} +
The problems with looping over the output of ls -R are:
Filenames with whitespace won't be parsed correctly.
It prints other output besides the filenames, such as directory headers and blank lines.
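When each file needs its own handling inside a loop, the find output can be read NUL-delimited so that whitespace in names survives. This is a minimal sketch, assuming bash and a find that supports -print0. The function name print_line_counts is made up for illustration.

```shell
#!/usr/bin/env bash
# print_line_counts DIR: print "<lines> <path>" for every .txt file under DIR.
# (Hypothetical helper name; not part of the original answer.)
print_line_counts() {
    local dir=$1 file count
    # -print0 ends each name with a NUL byte; read -d '' splits on it,
    # so spaces and even newlines in filenames are preserved.
    find "$dir" -type f -name '*.txt' -print0 |
        while IFS= read -r -d '' file; do
            # The arithmetic expansion strips any padding wc adds on some systems.
            count=$(( $(wc -l < "$file") ))
            printf '%s %s\n' "$count" "$file"
        done
}
```

Called as print_line_counts "$1" from a script, it prints one line per file, including files in subdirectories such as fis1.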

Not the cause of the problem here, but wrapping the command in echo and backticks is unnecessary.
You can simply use
wc -l "${fis}"
What goes wrong?
You have a subdirectory called fis1. Look at the output of ls:
# ls -R fis1
fis1:
file1_in_fis1.txt
When you are parsing this output, your script will try
echo `cat fis1: | wc -l`
The cat will tell you No such file or directory and wc counts 0.
As @Barmar explained, ls prints additional output you do not want.
Do not try to patch your attempt with | grep .txt and if [ -f "${fis}" ]; then ...; these will still fail for a name such as filename with spaces.txt. Use find or shopt instead (and accept the answer of @Barmar or @Cyrus).
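The shopt route is not spelled out above. The sketch below assumes it refers to bash's globstar option (bash 4 or later), which lets ** match across subdirectories. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# count_lines_globstar DIR: print "<lines> <path>" for each .txt file under DIR.
# Assumption: "shopt" in the answer means the globstar shell option.
count_lines_globstar() {
    local dir=$1 file
    (
        # Set the options in a subshell so the caller's settings are untouched.
        # globstar: ** matches zero or more directory levels.
        # nullglob: the pattern expands to nothing when no file matches.
        shopt -s globstar nullglob
        for file in "$dir"/**/*.txt; do
            [[ -f $file ]] || continue
            printf '%s %s\n' "$(( $(wc -l < "$file") ))" "$file"
        done
    )
}
```

Because the glob expands directly into separate words, names with spaces need no special treatment beyond quoting "$file".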

Related

Bash script to move first N files with specific name

I'm trying to move only 100 files with a specific extension (from the current directory to the parent directory), but the following attempt of mine does not work:
for file in $(ls -U | grep *.txt | tail -100)
do
mv $file ../
done
Can you point me to the correct approach?

Since you didn't quote *.txt, the shell expanded it to all the filenames ending in .txt. So your command is something like:
ls -U | grep file1.txt file2.txt file3.txt ... | tail -100
Since grep has filename arguments, it ignores its standard input. It outputs all the lines matching file1.txt in the remaining files. There are probably no matches, so nothing is piped to tail -100. And even if there were matches, the output would be lines from the files, not filenames, so it wouldn't be useful for the mv command.
You can loop over the filenames directly, and use a counter variable to stop after 100 files.
counter=0
for file in *.txt
do
if (( counter >= 100 ))
then break
fi
mv "$file" ../
((counter++))
done
This avoids the pitfalls of parsing the output of ls.
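The counter can also be replaced by slicing the list of arguments. This is a minimal sketch assuming bash; move_first_n is a made-up name, and the number of files and destination are passed in rather than hard-coded.

```shell
#!/usr/bin/env bash
# move_first_n N DEST FILE...: move at most N of the given files into DEST.
# (Hypothetical helper; the glob is expanded by the caller.)
move_first_n() {
    local n=$1 dest=$2
    shift 2
    # "${@:1:n}" takes the first n remaining arguments, each kept as one word.
    local batch=("${@:1:n}")
    # Nothing to move (for example, the glob matched no files).
    (( ${#batch[@]} > 0 )) || return 0
    mv -- "${batch[@]}" "$dest"/
}
```

For the case in the question, one possible call is shopt -s nullglob; move_first_n 100 .. ./*.txt. The glob expands in sorted order, so "first" here means first alphabetically.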

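This will do the job:
ls -U -- *.txt | tail -n 100 | while IFS= read -r filename; do mv -- "$filename" ../; done
while read handles spaces inside the filename; IFS= and -r additionally preserve leading or trailing whitespace and backslashes. Names containing newlines will still break.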

Run this in the text file directory:
#!/bin/bash
for txt_file in ./*.txt; do
((c++==100)) && break
mv "$txt_file" ../
done

Modify ls output to display [+] in front of directories

I am looking for a way to modify the ls output so that every directory displays [+] in front of the directory name, ideally via bashrc.
me@computer[~]$ ls
[+]directory [+]directory
[+]directory file.png
file file.txt
readme
Currently I am just customizing the color output:
LS_COLORS=$LS_COLORS:'di=1;37;4' ; export LS_COLORS

This might help you, but it gives you only one column output:
ls | sed -r "$(find -maxdepth 1 -type d | cut -d/ -f2 | sed "1 d; 2~1 { s:.*:s/^\\(&\\)$/[+]\\\\1/;:g}")"
It works by piping the output of ls through sed; the sed script is built dynamically by a pipeline that converts the list of directories into a list of s/^dirname$/[+]dirname/; sed script lines.
Just try out all the parts individually to see how it works.
For example, when run in /etc the output starts like this:
[+]acpi
adduser.conf
[+]adobe
[+]akonadi
aliases
aliases.db
You might want to alias the command in your bashrc.
And you might want to look into the tree command.

You can use :
ls -l : directories will start with d.
ls -p : a slash will be added into directory name like dir/
ls -F : will also add a slash after dir names and other marks to other file types (*, etc)
ls -d */ : As advised in comments, will list only dir names with a slash at the end. Remove -d to see also sub dir contents.
To manipulate the ls output you could do something like:
ls -l | awk 'NR > 1 && /^d/{print "[+]"$NF}; NR > 1 && /^[^d]/{print $NF}' | column
NR > 1 skips the "total" line that ls -l prints first. Because $NF is the last whitespace-separated field, names containing spaces are cut short.
You can also use find and avoid parsing ls, since parsing ls can break if file names contain unusual characters such as newlines.
find in this format (GNU find) produces output close to the ls version above:
find . -mindepth 1 -maxdepth 1 -printf '%Y %f\n' | awk '/^d/{print "[+]"$NF}; /^[^d]/{print $NF}' | column
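The find-based variant can also do the marking itself, which removes the awk step and keeps names with spaces intact. This is a sketch assuming GNU find (for -printf); list_marked is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_marked [DIR]: list the entries of DIR, prefixing directories with [+].
# Assumes GNU find, since -printf is not available in every find.
list_marked() {
    local dir=${1:-.}
    # Directories take the first -printf, everything else the second.
    # %f is the entry name without its leading path.
    find "$dir" -mindepth 1 -maxdepth 1 \
        \( -type d -printf '[+]%f\n' -o -printf '%f\n' \) | sort
}
```

Unlike plain ls, this also lists hidden entries, and the [+] prefix affects the sort order. The output can be piped through column for a multi-column layout.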

You could also try this bash function:
#!/usr/bin/env bash
myls() {
for i in *;do
[[ -d "${i}" ]] && {
printf "%s\n" "[+] ${i}"
continue;
}
printf "%s\n" "${i}"
done
}
Source the script in your .bashrc file. Whenever you want to use it, just call myls in the directory.
Note that it does not give you colored output.

Ordering a loop in bash

I've a bash script like this:
for d in /home/test/*
do
echo $d
done
Which outputs this:
/home/test/newer dir
/home/test/oldest dir
I'd like to order the folders by creation time so that the 'oldest dir' directory appears first in the list. I've tried ls and tree variations to no avail.
For example,
for d in `ls -d -c -1 $PWD/*`
Returns:
/home/test/oldest
dir
/home/test/newer
dir
Very close, but it does not respect the space in the directory name. My question, how would I have oldest dir on top and support the whitespace?

ls -d -c -- "$PWD"/* | while IFS= read -r line
do echo "$line"
done

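Another technique, a kind of Schwartzian transform:
stat -c $'%Z\t%n' /home/test/* | sort -n | cut -f2- |
while IFS= read -r filename; do
# ...
done
%Z is the time of the last status change (ctime) in seconds since the epoch, which is the closest widely available stand-in for creation time. This solution is fragile with filenames containing newlines.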
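The newline fragility can be avoided by keeping every stage NUL-delimited. This sketch assumes GNU find, sort and cut (for -printf, -z and -z respectively). oldest_first is a hypothetical name, and %C@ is the same status-change time as %Z, not a true creation time.

```shell
#!/usr/bin/env bash
# oldest_first DIR: print the entries of DIR as NUL-terminated paths,
# ordered by ctime with the oldest first. Assumes GNU find/sort/cut.
oldest_first() {
    local dir=$1
    # Decorate each path with its ctime, sort numerically, then drop the
    # timestamp again (decorate, sort, undecorate).
    find "$dir" -mindepth 1 -maxdepth 1 -printf '%C@\t%p\0' |
        sort -z -n | cut -z -f2-
}

# Example consumer:
#   oldest_first /home/test | while IFS= read -r -d '' d; do echo "$d"; done
```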

Problems with Grep Command in bash script

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:
UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \ -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE
grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done
On numerous occasions, when grepping for the current line in the file location list, it has put no output to the current dupechk file even though there have definitely been matches to the current line in the file location list (I ran the command in terminal with no issues).
I've rummaged around the internet to see if anyone else has had similar behaviour, and thus far all I have found is that it is something to do with buffered and unbuffered outputs from other commands operating before the grep command in the Bash script....
However no one seems to have found a solution, so basically I'm asking you guys if you have ever come across this, and any idea/tips/solutions to this problem...
Regards
Paul

The "problem" is the standard I/O library. When it is writing to a terminal
its output is line-buffered, but when it is writing to a pipe it is fully buffered.
Try changing
CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
to
CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`

Are there any directories with spaces in their names in $FILE_LOCTN_LIST? If there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.

A small bash script using md5sum and sort that detects duplicate files in the current directory:
CURRENT=""
md5sum * |
sort |
while read -r md5sum filename
do
[[ $CURRENT == "$md5sum" ]] && echo "$filename is duplicate"
CURRENT=$md5sum
done
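The same comparison can be done without sorting by remembering the first file seen for each checksum. This is a sketch assuming bash 4 or later (for associative arrays) and the md5sum tool; report_duplicates is a made-up name.

```shell
#!/usr/bin/env bash
# report_duplicates DIR: for each regular file in DIR whose MD5 checksum was
# already seen, print "<file> duplicates <first file with that checksum>".
report_duplicates() {
    local dir=$1 file sum
    local -A first_seen=()          # checksum -> first path with that checksum
    for file in "$dir"/*; do
        [[ -f $file ]] || continue  # skip directories and an unmatched glob
        sum=$(md5sum < "$file")     # prints "<hash>  -" when reading stdin
        sum=${sum%% *}              # keep only the hash
        if [[ -n ${first_seen[$sum]+set} ]]; then
            printf '%s duplicates %s\n' "$file" "${first_seen[$sum]}"
        else
            first_seen[$sum]=$file
        fi
    done
}
```

Reading each file through stdin keeps the filename out of md5sum's output, so unusual names cannot confuse the parsing.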

You tagged this linux, so I assume you have tools like GNU find, md5sum, uniq, sort etc. Here's a simple example to find duplicate files:
$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4 file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4 file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47 file2
$ find . -type f -exec md5sum "{}" \; | sort | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4 ./file
6f5902ac237024bdd0c176cb93063dc4 ./file1

Linux: Removing files that don't contain all the words specified

Inside a directory, how can I delete files that lack any of the words specified, so that only files that contain ALL the words are left? I tried to write a simple bash shell script using grep and rm commands, but I got lost. I am totally new to Linux, any help would be appreciated

How about:
grep -L foo *.txt | xargs rm
grep -L bar *.txt | xargs rm
If a file does not contain foo, then the first line will remove it.
If a file does not contain bar, then the second line will remove it.
Only files containing both foo and bar should be left
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
See also @Mykola Golubyev's post for placing in a loop.

list="Word1 Word2 Word3 Word4 Word5"
for word in $list
do
grep -L "$word" *.txt | xargs rm
done

Addition to the answers above: Use the newline character as delimiter to handle file names with spaces!
grep -L $word $file | xargs -d '\n' rm
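Newline delimiters still break on filenames that themselves contain newlines. This sketch uses NUL delimiters instead and assumes GNU grep (for -Z) and an xargs that supports -0 and -r. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# remove_files_without WORD FILE...: delete every FILE that does not contain
# the fixed string WORD. Assumes GNU grep and xargs.
remove_files_without() {
    local word=$1
    shift
    (( $# > 0 )) || return 0
    # -L lists files without a match, -Z ends each name with a NUL byte,
    # -F treats WORD as a literal string rather than a regular expression.
    # xargs -0 reads NUL-terminated names; -r skips rm when the list is empty.
    grep -LZ -F -e "$word" -- "$@" | xargs -0 -r rm --
}
```

Calling it once per word, for example remove_files_without foo *.txt followed by remove_files_without bar *.txt, leaves only the files that contain both words.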

grep -L word * | xargs rm

To do the same matching filenames (not the contents of files as most of the solutions above) you can use the following:
for file in `ls --color=never | grep -ve "\(foo\|bar\)"`
do
rm $file
done
As per comments:
for file in `ls`
shouldn't be used. The loop below does the same thing without using ls:
for file in *
do
if ! printf '%s\n' "$file" | grep -q -e "\(foo\|bar\)"; then
rm -- "$file"
fi
done
In the first loop, -ve inverts the match, so grep keeps only the filenames that contain neither foo nor bar; the second loop gets the same result by negating grep -q.
Any further words to be added to the list need to be separated by \|
e.g. one\|two\|three

First, remove the file-list:
rm flist
Then, for each of the words, add the file to the filelist if it contains that word:
grep -l WORD * >>flist
Then sort, uniqify and get a count:
sort flist | uniq -c >flist_with_count
All those files in flist_with_count whose count is lower than the number of words should be deleted. The format will be:
2 file1
7 file2
8 file3
8 file4
If there were 8 words, then file1 and file2 should be deleted. I'll leave the writing/testing of the script to you.
Okay, you convinced me, here's my script:
#!/bin/bash
rm -rf flist
for word in fopen fclose main ; do
grep -l ${word} *.c >>flist
done
rm $(sort flist | uniq -c | awk '$1 != 3 {print $2} {}')
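This removes the files in the directory that contain some, but not all three, of the words. Files containing none of the words never appear in flist, so they are left untouched.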

You could try something like this but it may break
if the patterns contain shell or grep meta characters:
(in this example one two three are the patterns)
for f in *; do
cmd=true
for p in one two three; do
cmd="grep -F \"$p\" \"$f\" && $cmd"
done
eval "$cmd" >/dev/null || rm "$f"
done
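The same check can be written without eval, which sidesteps the metacharacter problem mentioned above. This is a minimal sketch assuming bash; contains_all and prune_files are hypothetical names, and the words are treated as fixed strings.

```shell
#!/usr/bin/env bash
# contains_all FILE WORD...: succeed only if FILE contains every WORD.
contains_all() {
    local file=$1 word
    shift
    for word in "$@"; do
        # -q: no output, -F: literal string, -e: allow words starting with "-".
        grep -qF -e "$word" -- "$file" || return 1
    done
    return 0
}

# prune_files DIR WORD...: delete regular files in DIR that lack any WORD.
prune_files() {
    local dir=$1 file
    shift
    for file in "$dir"/*; do
        [[ -f $file ]] || continue   # skip subdirectories and an unmatched glob
        contains_all "$file" "$@" || rm -- "$file"
    done
}
```

A call such as prune_files . one two three keeps only the files that contain all three words. Replacing rm with echo first gives a dry run of what would be deleted.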

This will remove all files that contain neither Ping nor Sent. Files containing only one of the two words are kept:
grep -L 'Ping\|Sent' * | xargs rm
