Remove multiple spaces in ls -l output - linux

I need to display the filesize and the filename. Like this:
4.0K Desktop
I'm extracting these two fields using cut from the ls -l output:
ls -lhS | cut -d' ' -f5,9
Due to multiple spaces in the ls -l output, I'm getting a few erroneous outputs, like:
4.0K 19:54
4.0K 19:55
6
18:39
31
25
How should I fix this?
I need to accomplish this task using pipes only and no bash scripting ( output could be multiple pipes ) and preferably no sed, awk.
If no alternative to sed or awk is available- use of sed is OK.

You can avoid parsing ls output by using the stat command, which is part of GNU coreutils (it is a separate utility, not a bash feature), to get detailed file information.
# -c --format=FORMAT
# use the specified FORMAT instead of the default; output a newline after each use of FORMAT
# %n File name
# %s Total size, in bytes
stat -c '%s %n' *
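A minimal sketch of the stat approach on a throwaway directory, assuming GNU stat (the -c option is GNU-specific). The helper name list_sizes and the file names are made up for illustration.

```shell
# Hypothetical helper: print "<size-in-bytes> <name>" for every entry in a directory.
# Assumes GNU stat; -c is not available in BSD/macOS stat.
list_sizes() {
    ( cd "$1" && stat -c '%s %n' -- * )
}

# Demo on a temporary directory with made-up file names.
demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/a.txt"           # 5 bytes
printf 'hello world' > "$demo_dir/b file"    # 11 bytes, name contains a space
sizes=$(list_sizes "$demo_dir")
printf '%s\n' "$sizes"
rm -rf "$demo_dir"
```

Because stat prints the name itself, a space in a file name does not shift any columns, which is the main advantage over cutting fields out of ls -l.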

You can squeeze each run of repeated spaces into a single space with tr -s (translate characters, squeeze repeats) before using cut.
ls -lhS | tr -s ' ' | cut -d' ' -f 5,9
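To see what the squeeze changes, here is a small sketch that runs both variants on a single made-up line in the style of ls -lh output (the line itself is an assumption, not real output from the question).

```shell
# Made-up line; note the two spaces before the day of month "6".
line='drwxr-xr-x 2 user user 4.0K Jan  6 19:54 Desktop'

# Without squeezing, the extra space creates an empty field and shifts the columns.
raw=$(printf '%s\n' "$line" | cut -d' ' -f5,9)

# tr -s ' ' collapses each run of spaces into one before cut sees the line.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5,9)

printf 'raw:      %s\n' "$raw"        # 4.0K 19:54
printf 'squeezed: %s\n' "$squeezed"   # 4.0K Desktop
```

This still breaks for file names that contain spaces, since cut only keeps field 9.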

Or you could just submit to awk:
$ ls -lhS | awk '$0=$5 OFS $9'
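i.e. replace the whole record $0 with fields $5 and $9, separated by the output field separator OFS. Because the assigned value is a non-empty string, the pattern is true and awk prints the new record.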
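A rough illustration of why awk needs no tr step: by default it splits on runs of whitespace, so repeated spaces never produce empty fields. The sample line is made up.

```shell
# Made-up line in the style of `ls -lh` output, with two spaces before "6".
line='drwxr-xr-x 2 user user 4.0K Jan  6 19:54 Desktop'

# Assignment used as a pattern: $0 is replaced and, being non-empty, printed.
short=$(printf '%s\n' "$line" | awk '$0=$5 OFS $9')

# The same result written as an explicit action.
explicit=$(printf '%s\n' "$line" | awk '{print $5, $9}')

printf '%s\n%s\n' "$short" "$explicit"
```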

String delimiter is affecting some instances of its individual characters

So basically, I'm trying to print out a human readable list of file and directory sizes (without the current directory listed) with the following code
du -arh | sort -nr | tail -n +2 | awk -F"./" '{print $1 $2 $3}' | head -n $NUM
The NUM variable is just an argument given for the amount of items listed.
The output of the above without the awk delimiter command is
4.0K ./url-list
4.0K ./testurl.sh
4.0K ./diskhogger.sh
4.0K ./backup/url-list
4.0K ./backup.sh
However, adding the awk command outputs
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backuurl-list
4.0K backup.sh
A similar output occurs whenever there is a notable subdirectory.
./Library/Cache - LibrarCache, etc.
To be clear, I am attempting to cut out the "./" at the beginning of the file name without affecting the other forward slashes. My preferred output would be:
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backup/url-list
4.0K backup.sh
where the "backup/url-list" isn't affected.
Is the '.' in my delimiter a special character I don't know about? If not, what exactly is going on here?
I'm new to shell so any info on this would be great.
Thanks!
Edited for clarity.
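You can drop some of the other commands if you use awk. sub() removes only the first literal "./" on each line, and the exit rule stops after $NUM items have been printed:
du -arh | sort -nr | awk -v len="$NUM" 'NR>1{sub(/\.\//,""); print} NR>len{exit}'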
or simply use
... | sed 's_\./__' | head ...
-F"./" is treated as "any character followed by a forward slash". The separator is taken as a regular expression, in which . means "any character". To use a literal "dot followed by slash" (./) as the delimiter, use one of the following approaches:
. within character class
awk -F"[.]/" '{print $1,$2,$3}'
. escaped
awk -F'\\./' '{print $1,$2,$3}'
Yes. awk treats the delimiter as a regular expression. A "." in a regular expression matches any character. Hence awk will split your lines up everywhere there is a character preceding a "/".
If you wish to match a literal ".", the easiest way of doing it is to put it in square brackets, to make it a character class, matching only the character "."
You end up with
du -arh | sort -nr | tail -n +2 | awk -F"[.]/" '{print $1 $2}' | head -n $NUM
However, note that this is not your only problem. If you have a directory whose name ends with a "." and that directory contains files, "./" will appear more than once on some lines of the du output (e.g. a file named "bar" in a directory named "foo." gives you "./foo./bar"). A better solution is therefore to use awk's sub() function, which replaces only the first match, to replace the first literal "./" with "". The dot has to be escaped here as well, because sub() also takes a regular expression.
du -arh | sort -nr | tail -n +2 | awk '{sub(/\.\//,""); print}' | head -n $NUM
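The difference between the unescaped separator and an escaped sub() can be sketched on two made-up lines in the style of du output (size, tab, path). The sample paths are assumptions chosen to hit both problem cases.

```shell
# Two made-up lines: a subdirectory, and a directory whose name ends in a dot.
sample=$(printf '4.0K\t./backup/url-list\n4.0K\t./foo./bar\n')

# Unescaped "./" means "any character followed by a slash",
# so the "p/" in "backup/" is also treated as a separator.
unescaped=$(printf '%s\n' "$sample" | awk -F"./" '{print $1 $2 $3}')

# Escaped dot with sub(): only the first literal "./" on each line is removed.
escaped=$(printf '%s\n' "$sample" | awk '{sub(/\.\//, ""); print}')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

The unescaped variant yields "backuurl-list" and "foobar", while the escaped sub() keeps "backup/url-list" and "foo./bar" intact.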

how to capture the 1st command output in terminal and to print that variable using the last command in the same line

actual output comes as
$ grep -Hcw count copy_hb_script
copy_hb_script:5
I'm using the command below to get the expected output, but it does not work:
grep -Hcw count copy_hb_script | awk '{print $1}' |xargs ls -ld | awk '{print $8 " " $9 }'
The output I get is
03:49 copy_hb_script
The count for the file is missing. Is there an alternative way to get the timestamp together with the count, like below?
03:49 copy_hb_script:5
You can avoid parsing ls output and use the stat command (part of GNU coreutils) for detailed file information.
# 'stat -c %y' produces output such as 2016-09-15 16:03:40.655456000 +0530
# Stripping off extra information after the '.' using string-manipulation
# Running the grep with the count together with the previous command
modDate=$(stat -c %y copy_hb_script); echo "${modDate%.*}" "$(grep -Hcw count copy_hb_script)"
This produces output like
2016-09-15 16:03:40 copy_hb_script:5
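If only the hour and minute are wanted, as in the "03:49 copy_hb_script:5" form asked for, the same stat output can be trimmed further with parameter expansion. This is a sketch assuming GNU stat and a temporary file with made-up contents. The variable names are illustrative only.

```shell
# Made-up file in which the word "count" appears on two lines.
demo_file=$(mktemp)
printf 'count one\nno match here\ncount two\n' > "$demo_file"

mod_date=$(stat -c %y "$demo_file")      # e.g. 2016-09-15 16:03:40.655456000 +0530
hhmm=${mod_date#* }                      # drop the date part
hhmm=${hhmm%:*}                          # drop seconds, fraction and time zone
matches=$(grep -Hcw count "$demo_file")  # <file>:<number of matching lines>

result="$hhmm $matches"
printf '%s\n' "$result"
rm -f "$demo_file"
```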

Bash shell script for finding file size

Consider:
var=`ls -l | grep TestFile.txt | awk '{print $5}'`
I am able to read file size, but how does it work?
Don't parse ls
size=$( stat -c '%s' TestFile.txt )
Yes, so basically you could divide it into 4 parts:
ls -l
List the current directory content (-l for long listing format)
| grep TestFile.txt
Pipe the result and look for the file you are interested in
| awk '{print $5}'
Pipe the result to awk, which splits each line on whitespace and prints the fifth column, which happens to be the file size in this case (this is fragile: it depends on the exact column layout of ls -l, and grep may match more than one file name)
var=`...`
The backquotes (`) enclose commands. The output of the commands gets stored in the var variable.
NOTE: You can get the file size directly by using du -b TestFile.txt or stat -c %s TestFile.txt
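A short sketch of the two direct alternatives from the note, run against a temporary file of known size. It assumes GNU stat and GNU du. The variable names are made up.

```shell
# Temporary file with exactly 10 bytes of content.
demo_file=$(mktemp)
printf '0123456789' > "$demo_file"

# stat prints only what the format asks for.
size_stat=$(stat -c '%s' "$demo_file")

# du -b prints "<bytes><TAB><name>", so keep only the first tab-separated field.
size_du=$(du -b "$demo_file" | cut -f1)

printf 'stat: %s bytes, du: %s bytes\n' "$size_stat" "$size_du"
rm -f "$demo_file"
```

Using $( ... ) instead of backquotes stores the result in the same way and is easier to nest and read.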

How to return substring from a linux command

I'm connecting to Exadata machines and want to find the value of the "ORACLE_HOME" variable on them, so I'm using this command:
ls -l /proc/<pid>/cwd
this is the output:
2 oracle oinstall 0 Jan 23 21:20 /proc/<pid>/cwd -> /u01/app/database/11.2.0/dbs/
I need to get the last part:
/u01/app/database/11.2.0 (I don't want the "/dbs/" there)
I will be using this command several times on different machines. So how can I get this substring from the whole output?
Awk and grep are good for these types of issues.
New:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | sed 's#/dbs/##'
Old:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | egrep -o '^.+[.0-9]'
awk prints the last column of its input, which is the output of your ls command, and grep then extracts the beginning of that string up to the last digit or dot. This is a situational solution and perhaps not the best.
Parsing the output of ls is generally considered sub-optimal. I would use something more like this instead:
dirname "$(readlink -f /proc/<pid>/cwd)"
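The readlink and dirname combination can be tried without a running Oracle process. The sketch below builds a throwaway directory layout and a symlink that stands in for /proc/<pid>/cwd. All paths and variable names here are made up, and readlink -f is assumed to be the GNU coreutils version.

```shell
# Throwaway layout mimicking the question: <root>/11.2.0/dbs plus a symlink to it.
demo_root=$(cd "$(mktemp -d)" && pwd -P)   # physical path, so results compare cleanly
mkdir -p "$demo_root/11.2.0/dbs"
ln -s "$demo_root/11.2.0/dbs" "$demo_root/cwd"

target=$(readlink -f "$demo_root/cwd")     # resolves to <root>/11.2.0/dbs
oracle_home=$(dirname "$target")           # strips the last component: <root>/11.2.0

printf '%s\n' "$oracle_home"
rm -rf "$demo_root"
```

Unlike the sed variant, this does not depend on the last directory being called "dbs". It simply drops whatever the final path component is.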

Grep - returning both the line number and the name of the file

I have a number of log files in a directory. I am trying to write a script to search all the log files for a string and echo the name of the files and the line number that the string is found.
I figure I will probably have to use 2 grep's - piping the output of one into the other since the -l option only returns the name of the file and nothing about the line numbers. Any insight in how I can successfully achieve this would be much appreciated.
Many thanks,
Alex
$ grep -Hn root /etc/passwd
/etc/passwd:1:root:x:0:0:root:/root:/bin/bash
combining -H and -n does what you expect.
If you want to print only the file name and line number, without the matching text:
$ grep -Hn root /etc/passwd | cut -d: -f1,2
/etc/passwd:1
or with awk :
$ awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd
file=/etc/passwd
line=1
if you want to create shell variables, use eval (piping into bash would set them in a child process, not in your current shell):
$ eval "$(awk -F: '/root/{print "file=" ARGV[1] "\nline=" NR}' /etc/passwd)"
$ echo $line
1
$ echo $file
/etc/passwd
Use -H. If you are using a grep that does not have -H, specify two filenames. For example:
grep -n pattern file /dev/null
My version of grep kept returning text from the matching line, which I wasn't sure if you were after... You can also pipe the output to an awk command to have it ONLY print the file name and line number
grep -rHn "text" . | awk -F: '{print $1 ":" $2}'
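Putting the pieces together for several log files at once, here is a small sketch on a temporary directory. The file names and log contents are made up for illustration.

```shell
# Two made-up log files with "error" on known lines.
demo_dir=$(mktemp -d)
printf 'start\nerror: disk full\n' > "$demo_dir/a.log"
printf 'error: timeout\nok\nerror: retry\n' > "$demo_dir/b.log"

# -H prints the file name, -n the line number; cut keeps only those two fields.
hits=$( cd "$demo_dir" && grep -Hn 'error' *.log | cut -d: -f1,2 )

printf '%s\n' "$hits"
rm -rf "$demo_dir"
```

A single grep is enough here. The cut step assumes that the file names themselves contain no colon.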
