Bash: List file size using ls, awk and grep - linux

I entered the following commands on the terminal:
ls -l someFile | awk '{print $5}' | grep [0-9]
As I understand it, ls -l prints files and directories in long format (including the permissions), so piping it to awk (to print the desired field, which is the file size) and then to grep [0-9] gave me the output I want: the file size highlighted in red.
For example:
drwxr-xr-x 2 zdgx6 students 103 Feb 23 2017 delosreyes_hw1
Output is 103 (since that's the size and the font is red)
However, when I tried this in my Bash script, like this:
echo " $file size: $( ls -l "$file" | awk '{print $5}' | grep [0-9] ) $regfile "
it output the file size correctly, but not highlighted in red. So I assumed my syntax was wrong, but I didn't get any errors.
Any idea why that might be?

echo "$(tput setaf 1)$(stat -c '%s' file.log)$(tput sgr0)"
512
You should not parse the output of ls to get details about a file; use stat instead. With GNU stat, the format %s prints the file size in bytes (%B, by contrast, prints the size of the blocks counted by %b, not the size of the file). Run stat --help to see the available format sequences. This also removes the need for an extra pipe into awk.
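As a minimal sketch of the stat approach, the size lookup can be wrapped in a small function. The name file_size is hypothetical, and the fallback to wc -c is an assumption for systems whose stat does not accept GNU's -c option:

```shell
# file_size is a hypothetical helper name, not part of any tool.
file_size() {
    # GNU stat prints the size in bytes with -c %s; if that fails
    # (for example with a non-GNU stat), count the bytes with wc instead.
    stat -c '%s' -- "$1" 2>/dev/null || wc -c < "$1" | tr -d ' '
}

# Usage (assumes $file is already set by the surrounding script):
# echo "$file size: $(file_size "$file")"
```

The function only prints the number, so colouring can be layered on top of it separately.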

Gnu grep will highlight the matched part of each line if:
you supply the command-line argument --color=always [Note 1], or
stdout is a terminal and you either supply the command-line argument --color=auto or do not specify --color (because auto is the default setting).
Running grep inside $(...) in order to capture the output as part of a bash expansion means that stdout will be redirected to a pipe, which is not a terminal. So unless you specify --color=always, match coloring will be disabled. That's usually what you want when you are processing the output of grep.
So you could "fix" this by using the --color=always option, but really the simpler solution is to send colour control codes directly, since that is the only reason you are using grep.
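The same terminal check that grep performs can be reproduced in a script with the shell's -t test. This is only a sketch; print_size is a hypothetical name, and it assumes tput is installed when stdout is a terminal:

```shell
# print_size is a hypothetical helper: it colours its argument red only
# when stdout is a terminal, mirroring grep's --color=auto behaviour.
print_size() {
    if [ -t 1 ]; then
        printf '%s%s%s\n' "$(tput setaf 1)" "$1" "$(tput sgr0)"
    else
        printf '%s\n' "$1"
    fi
}

print_size 103          # coloured when run directly in a terminal
plain=$(print_size 103) # inside $(...) stdout is a pipe, so no colour codes
```

This shows why the output captured in $(...) carries no colour: the check sees a pipe, not a terminal.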
Colour codes can be sent in a reasonably portable way using the tput utility, which is part of the ncurses package and will generally be installed on any Linux/BSD system. You'll want the following codes:
tput bold # Sets boldface (Otherwise, the colour will be washed out)
tput setaf 1 # 1 is red. 2 is green, 3 yellow, 4 blue, 5 magenta, 6 cyan and 7 white
After you output the highlighted text, you'll need to reset the console to normal:
tput sgr0 # Normal colour and style
So you could do, for example:
echo "$file size: $(tput bold)$(tput setaf 1)$(ls -l "$file" | awk '{print $5}')$(tput sgr0) $regfile"
If you were doing that a lot, you might want to save the tput outputs in bash variables:
bold_red=$(tput bold)$(tput setaf 1)
reset_col=$(tput sgr0)
echo "$file size: $bold_red$(ls -l "$file" | awk '{print $5}')$reset_col $regfile"
You could also hard-code typical console codes if you know what they are:
printf "%s size: \033[1;31m%s\033[0m %s\n" "$file" "$(ls -l "$file" | awk '{print $5}')" "$regfile"
Notes
For the benefit of those of us with a different notion of English orthography, grep allows both --color and --colour. To avoid confusion, I used the first one in the text here, although I typically use the second out of habit.

Thanks all! I find that using the grep --color=always was the simplest solution for my question. The other suggestions worked as well but we haven't discussed those commands in class.
Here's my code:
echo " $file $( ls -l "$file" | awk '{print $5}' | grep --color=always [0-9] ) $regfile "

Related

bash script: calculate sum size of files

I'm working on Linux and need to calculate the sum size of some files in a directory.
I've written a bash script named cal.sh as below:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done<`ls -l | grep opencv | awk '{print $5}'`
However, when I executed this script ./cal.sh, I got an error:
./cal.sh: line 6: `ls -l | grep opencv | awk '{print $5}'`: ambiguous redirect
And if I execute it with sh cal.sh, it seems to work but I will get some weird message at the end of output:
25
31
385758: File name too long
Why does sh cal.sh seem to work? Where does File name too long come from?
Alternatively, you can do:
du -cb *opencv* | awk 'END{print $1}'
The -b option reports each size in bytes and -c adds a grand total as the last line, which is the line awk prints.
Ultimately, as other answers will point out, it's not a good idea to parse the output of ls because it may vary between systems. But it's worth knowing why the script doesn't work.
The ambiguous redirect error is because you need quotes around your ls command i.e.:
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done < "`ls -l | grep opencv | awk '{print $5}'`"
But this still doesn't do what you want. The "<" operator is expecting a filename, which is being defined here as the output of the ls command. But you don't want to read a file, you want to read the output of ls. For that you can use the "<<<" operator, also known as a "here string" i.e.:
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
done <<< "`ls -l | grep opencv | awk '{print $5}'`"
This works as expected, but has some drawbacks. When using a "here string" the command must first execute in full, then store the output of said command in a temporary variable. This can be a problem if the command takes long to execute or has a large output.
IMHO the best and most standard method of iterating over a command's output line by line is the following:
ls -l | grep opencv | awk '{print $5}' | while read -r line ; do
echo "line: $line"
done
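One caveat with the pipe form, stated here as general shell behaviour rather than something from the answer above: by default each part of a pipeline runs in a subshell, so a variable set inside the loop is gone once the loop ends. A minimal sketch that keeps the running total and the final echo in the same subshell (sum_lines is a hypothetical name):

```shell
# sum_lines reads one integer per line on stdin and prints the total.
# The loop and the final echo live in the same function, so they run in
# the same (sub)shell and the total is still visible when it is printed.
sum_lines() {
    total=0
    while IFS= read -r n; do
        [ -n "$n" ] || continue   # ignore empty lines
        total=$((total + n))
    done
    echo "$total"
}

printf '25\n31\n' | sum_lines   # prints 56
```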
I would recommend against using that pipeline to get the sizes of the files you want - in general parsing ls is something that you should avoid. Instead, you can just use *opencv* to get the files and stat to print the size:
stat -c %s *opencv*
The format specifier %s prints the size of each file in bytes.
You can pipe this to awk to get the sum:
stat -c %s *opencv* | awk '{ sum += $0 } END { if (sum) print sum }'
The if is there to ensure that no input => no output.
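If the glob matches nothing, the shell passes the literal pattern to the command, which then complains about a missing file. The following sketch sums sizes in a loop and skips anything that is not a regular file. The name sum_sizes is hypothetical, and wc -c is used as a POSIX stand-in for stat -c %s, which is an assumption about portability rather than part of the answer above:

```shell
# sum_sizes prints the total size in bytes of the regular files it is given.
sum_sizes() {
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue     # also skips an unmatched literal pattern
        size=$(wc -c < "$f")        # byte count; stand-in for stat -c %s
        total=$((total + size))
    done
    echo "$total"
}

sum_sizes ./*opencv*
```

Unlike the awk version, this prints 0 rather than nothing when no file matches.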

optimize xargs argument enumeration

Can this usage of xargs argument enumeration be optimized further?
The aim is to inject a single argument into the middle of the actual command.
I do:
echo {1..3} | xargs -I{} sh -c 'for i in {};do echo line $i here;done'
or
echo {1..3} | for i in $(xargs -n1);do echo line $i here; done
I get:
line 1 here
line 2 here
line 3 here
which is what I need but I wondered if loop and temporary variable could be avoided?
You need to separate the input to xargs by newlines:
echo {1..3}$'\n' | xargs -I% echo line % here
For array expansions, you can use printf:
ar=({1..3})
printf '%s\n' "${ar[@]}" | xargs -I% echo line % here
(and if it's just for output, you can use it without xargs:
printf 'line %s here\n' "${ar[@]}"
)
Try without xargs. For most situations xargs is overkill.
Depending on what you really want you can choose a solution like
# Normally you want to avoid for and use while, but here you want the words split.
for i in $(echo {1..3} );do
echo line $i here;
done
# When you want 1 line turned into three, `tr` can help
echo {1..3} | tr " " "\n" | sed 's/.*/line & here/'
# printf will repeat itself when there are parameters left
printf "line %s here\n" $(echo {1..3})
# Using the printf feature you can avoid the echo
printf "line %s here\n" {1..3}
Maybe this?
echo {1..3} | tr " " "\n" | xargs -n1 sh -c ' echo "line $0 here"'
The tr replaces the spaces with newlines, so xargs sees three lines. I would not be surprised if there were a better (more efficient) solution, but this one is quite simple.
Please note I have modified my previous answer to remove the use of {}, which was suggested in the comments to eliminate a potential code injection vulnerability.
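To illustrate the injection concern: when xargs substitutes {} into the text of the sh -c script, the input becomes shell code, whereas an argument passed after the script only ever arrives as data. A minimal sketch of the second form, where line_here is a hypothetical name and the underscore is just a placeholder for $0:

```shell
# line_here reads space-separated tokens on stdin and prints one line each.
# Each token reaches the inner shell as the positional parameter $1, so it
# is never parsed as code, whatever characters it contains.
line_here() {
    tr ' ' '\n' | xargs -n1 sh -c 'echo "line $1 here"' _
}

echo 1 2 3 | line_here
```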
There is a little-known feature of GNU sed: you can add the e flag to the s command, and sed then executes whatever is in the pattern space as a command and replaces the pattern space with the output of that command.
If you are really only interested in the output of the echo commands, you might try this GNU sed example, which eliminates the temporary variable, the loop (and the xargs as well):
echo {1..3} | sed -r 's/([^ ]+)/echo "line \1 here"\n/ge'
it fetches one token (i.e. whatever is separated by the spaces)
replaces it with echo "line \1 here"\n command, with \1 replaced by the token
then executes echo
puts the output of the echo command back into pattern space
that means it outputs the result of the three echos
But an even better way to get the desired output is to skip the execution and do the transformation directly in sed, like this:
echo {1..3} | sed -r 's/([^ ]+) ?/line \1 here\n/g'

is it possible run a linux command on the output of a previous command ASSUMING that the previous command comes first?

I know that I can use `` to get the output of a command, for example:
echo `ls`
but is there a way for me to use the ls command first and then run echo on it? For example: ls <some special redirection> echo? I tried ls > echo and it does not do what I want.
The reason I am asking is that sometimes I write complicated commands to get certain output. For example, bjobs -u username01 | grep normal | awk '{print $1}' is a simple "complicated" command (sometimes there are 6 or 7 chained together). Currently I have to do
Mycommand `(complicated string of commands)`
but I would much rather just do
(complicated string of commands) <some special redirection> Mycommand
is this possible?
You may use xargs
ls | xargs echo
When you need to do more with the result, you can process it line by line.
Parsing ls should be avoided; this just rewrites the example:
ls | while read file; do
echo I found ${file}
done
This construction can be useful for more difficult parsing:
echo "red ford 2012
blue mustang 1998" | while read color car year; do
echo "My ${color} ${car} is from the year ${year}"
done
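A detail worth knowing when splitting lines this way is that the last variable named in read receives the whole remainder of the line. A small sketch with made-up input (describe_cars is a hypothetical name):

```shell
# describe_cars splits each input line into two fields plus "the rest".
describe_cars() {
    while read -r color car rest; do
        echo "color=$color car=$car rest=$rest"
    done
}

printf 'red ford 2012 low mileage\n' | describe_cars
# prints: color=red car=ford rest=2012 low mileage
```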

BASH: can grep on command line, but not in script

I've done this a million times already, but this time it's not working.
I'm trying to grep "$TITLE" from a file. On the command line it works and the "$TITLE" variable is not empty, but when I run the script it finds nothing.
*The title contains more than one word.
echo "$TITLE"
cat PAGE.$TEMP.2 | grep "$TITLE"
what i've tried:
echo "cat PAGE.$TEMP.2 | grep $TITLE"
to see if title is not empty and file name is actually there
Are you sure that $TITLE does not have leading or trailing whitespace which is not in the file? Your fix with the string would strip out whitespace before execution, so it would not see it.
For example, with a file containing 'Line one':
/home/user1> TITLE=' one '
/home/user1> grep "$TITLE" text.txt
/home/user1> cat text.txt | grep $TITLE
Line one
Try echo "<$TITLE>", or echo "$TITLE"|od -xc, which should enable you to spot errant chars.
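If stray whitespace does turn out to be the cause, one option is to trim the variable explicitly and keep the quotes, rather than relying on unquoted expansion. This is a sketch only; trim is a hypothetical helper, and grep -F is used on the assumption that the title should be matched as a literal string rather than a regular expression:

```shell
# trim removes leading and trailing whitespace from its argument.
trim() {
    printf '%s\n' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

TITLE=' one '
# Quoted, trimmed, and matched literally; the input stands in for the file.
printf 'Line one\n' | grep -F -- "$(trim "$TITLE")"
```

Unlike leaving the variable unquoted, this keeps a multi-word title together as a single pattern.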
This command
echo "cat PAGE.$TEMP.2 | grep $TITLE"
echoes a string that starts with 'cat'. It does not run a command. You would want
echo "$( cat PAGE.$TEMP.2 | grep $TITLE )"
although that is identical in functionality to the simpler
cat PAGE.$TEMP.2 | grep $TITLE
And as pointed out by others, there is no need to pipe a single file using cat; grep can read from files just fine:
grep "$TITLE" "PAGE.$TEMP.2"
(Your default behavior should be to quote parameter expansions, unless you can show it is incorrect to do so.)
Works for me:
~> cat test.dat
abc
cda
xyz
~> export GRP=cda
~> cat test.dat | grep $GRP
cda
Edit:
Also the proper way to use grep is:
~> grep $GRP test.dat

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.
grep -l pattern *.txt |
while read line; do
echo -n "$line: ";
head -c $n "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed provide an option, such as -r (GNU/Linux) or -E (FreeBSD), that enables modern-style (extended) regular expressions. This makes it possible to specify the number of characters numerically:
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient than the others proposed, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$1"
shift
grep -l "$pattern" "$@" |
while read -r file
do
# bs=1 count=N reads N bytes; dd writes its statistics to stderr
echo "$file:" $(dd if="$file" bs=1 count="$n" 2>/dev/null)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that @litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd be pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.
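Putting the argument handling together with head -c gives something like the sketch below. The function name grep_head is hypothetical, and it assumes a head that supports -c and file names that contain no newlines:

```shell
# grep_head PATTERN N FILE...
# For each file that contains PATTERN, print "name: " followed by the
# first N bytes of that file.
grep_head() {
    pattern=$1
    n=$2
    shift 2
    grep -l -- "$pattern" "$@" | while IFS= read -r file; do
        printf '%s: ' "$file"
        head -c "$n" -- "$file"
        echo
    done
}

# Usage: grep_head mypattern 25 *.txt
```

IFS= and -r keep leading spaces and backslashes in file names intact while reading grep's output.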
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head has a --verbose [-v] switch, which prints a header with each file name. GNU grep and xargs also offer --null, to accommodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --

Resources