String delimiter is affecting some instances of its individual characters - linux

So basically, I'm trying to print out a human readable list of file and directory sizes (without the current directory listed) with the following code
du -arh | sort -nr | tail -n +2 | awk -F"./" '{print $1 $2 $3}' | head -n $NUM
The NUM variable is just an argument given for the amount of items listed.
The output of the above without the awk delimiter command is
4.0K ./url-list
4.0K ./testurl.sh
4.0K ./diskhogger.sh
4.0K ./backup/url-list
4.0K ./backup.sh
However, adding the awk command outputs
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backuurl-list
4.0K backup.sh
A similar output occurs whenever there is a nested subdirectory; for example, ./Library/Cache becomes LibrarCache.
To be clear, I am attempting to cut out the "./" at the beginning of the file name without affecting the other forward slashes. My preferred output would be:
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backup/url-list
4.0K backup.sh
where the "backup/url-list" isn't affected.
Is the '.' in my delimiter a special character I don't know about? If not, what exactly is going on here?
I'm new to shell so any info on this would be great.
Thanks!

You can drop some of the other commands if you're using awk:
du -arh | sort -nr | awk -v len="$NUM" 'NR>1{sub(/\.\//,""); print} NR>len{exit}'
or simply use
... | sed 's_\./__' | head ...

-F"./" is treated as "any character followed by a forward slash". It's taken as a regex pattern, where . means "any character". To use "dot followed by slash" (./) as a delimiter, use one of the following approaches:
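Put the . inside a character class:
awk -F"[.]/" '{print $1,$2,$3}'
Escape the . (the backslash is doubled because awk processes escape sequences in the -F value before using it as a regex):
awk -F'\\./' '{print $1,$2,$3}'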
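To see the difference concretely, here is a small sketch that runs both delimiters on one du-style line. It assumes a POSIX awk; a space stands in for du's tab so the output is easy to read, and the helper names split_regex and split_literal are made up for this illustration.

```shell
#!/bin/sh
# Sample du-style line (a space replaces du's tab for readability).
sample='4.0K ./backup/url-list'

# Regex delimiter: "." matches any character, so "./" also matches the "p/"
# inside "backup/", and the name is split in two places.
split_regex() { printf '%s\n' "$1" | awk -F'./' '{print $1 $2 $3}'; }

# Character class: "[.]" matches only a literal dot.
split_literal() { printf '%s\n' "$1" | awk -F'[.]/' '{print $1 $2}'; }

split_regex "$sample"     # prints: 4.0K backuurl-list
split_literal "$sample"   # prints: 4.0K backup/url-list
```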

Yes. When the field separator is longer than one character, awk treats it as a regular expression, and a "." in a regular expression matches any character. Hence awk splits your lines everywhere a "/" is preceded by any character.
If you wish to match a literal ".", the easiest way is to put it in square brackets, making it a character class that matches only the character ".".
You end up with
du -arh | sort -nr | tail -n +2 | awk -F"[.]/" '{print $1 $2}' | head -n $NUM
However, note that this is not your only problem. If you have a directory whose name ends with a "." and that directory contains files, you will have more than one entry of "./" on some lines in the results from "du". (e.g. a file named "bar" in a directory named "foo." gives you "foo./bar".) A better solution is therefore to use the sub() function in awk to replace the first instance of "./" with "".
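du -arh | sort -nr | tail -n +2 | awk '{sub(/\.\//,""); print}' | head -n $NUM
(The pattern is written /\.\// so that the dot is matched literally; a plain "./" would again be a regex in which . matches any character.)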
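Separately from the delimiter issue, sort -n does not understand the K/M/G suffixes that du -h prints: it compares only the leading number, so 4.0K sorts above 1.2M, and tail -n +2 may then drop something other than the "." total. A minimal sketch that avoids both problems, assuming GNU du (which separates size and name with a tab) and GNU sort (whose -h option compares human-readable sizes); list_biggest is a hypothetical name:

```shell
#!/bin/sh
# Sketch: print the N largest entries below the current directory, without
# the leading "./" and without the "." total line.
# Assumes GNU du (tab-separated output) and GNU sort (-h).
# list_biggest is a hypothetical helper name.
list_biggest() {
    du -ah . \
        | sort -hr \
        | awk -F'\t' -v n="$1" '
            $2 == "." { next }          # skip the total for the current directory
            {
                sub(/^\.\//, "", $2)    # strip only a leading "./" from the name
                print $1 "\t" $2
                if (++count >= n) exit  # stop after n entries
            }'
}

# Usage: list_biggest "$NUM"
```

Matching "." by name, rather than relying on it being the first line after sorting, keeps the result correct even when the sort order is not what you expect.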

Related

Remove multiple spaces in ls -l output

I need to display the filesize and the filename. Like this:
4.0K Desktop
I'm extracting these two fields using cut from the ls -l output:
ls -lhS | cut -d' ' -f5,9
Due to multiple spaces in the ls -l output, I'm getting a few erroneous outputs, like:
4.0K 19:54
4.0K 19:55
6
18:39
31
25
How should I fix this?
I need to accomplish this task using pipes only, without bash scripting (the solution may use multiple pipes), and preferably without sed or awk.
If there is no alternative to sed or awk, using sed is OK.
You can avoid parsing ls output and use the stat command, which is part of GNU coreutils, for detailed file information.
# -c --format=FORMAT
# use the specified FORMAT instead of the default; output a newline after each use of FORMAT
# %n File name
# %s Total size, in bytes
stat -c '%s %n' *
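If you want sizes like 4.0K rather than raw byte counts, one option is to pipe stat into numfmt. This is a sketch assuming GNU coreutils (both stat -c and numfmt are GNU tools); human_sizes is a hypothetical wrapper name:

```shell
#!/bin/sh
# Sketch: human-readable size and name for the given files, without parsing ls.
# Assumes GNU coreutils (stat -c and numfmt). human_sizes is a hypothetical name.
human_sizes() {
    # %s = size in bytes, %n = file name; numfmt converts the first field
    # (the byte count) to IEC units such as K and M and leaves the rest alone.
    stat -c '%s %n' -- "$@" | numfmt --to=iec
}

# Usage: human_sizes *
```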
You can squeeze repeated spaces into one with tr -s (translate characters) before using cut:
ls -lhS | tr -s ' ' | cut -d' ' -f 5,9
Or you could just submit to awk:
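ls -lhS | awk '$0=$5 OFS $9'
i.e., replace the whole record $0 with fields $5 and $9 separated by the output field separator OFS. The assignment serves as the pattern, and since its value is never empty, awk prints the rewritten record.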

Bash shell script for finding file size

Consider:
var=`ls -l | grep TestFile.txt | awk '{print $5}'`
I am able to read the file size, but how does it work?
Don't parse ls
size=$( stat -c '%s' TestFile.txt )
Yes, so basically you could divide it into 4 parts:
ls -l
List the current directory content (-l for long listing format)
| grep TestFile.txt
Pipe the result and look for the file you are interested in
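| awk '{print $5}'
Pipe the result to an awk program that splits each line on whitespace and prints the fifth field, which happens to be the file size in this case. This is fragile, though: grep also matches other names containing TestFile.txt (for example TestFile.txt.bak), and field positions in ls -l output can vary (device files, for example, show major and minor numbers instead of a size).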
var=`...`
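The backquotes (`) perform command substitution: the output of the enclosed commands is stored in the variable var. The $(...) form, used in the stat example above, does the same thing and is easier to nest.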
NOTE: You can get the file size directly by using du -b TestFile.txt or stat -c %s TestFile.txt
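A small sketch of a helper that prefers GNU stat and falls back to wc -c, which is POSIX and prints only the byte count when it reads from standard input. The name file_size is hypothetical, and the fallback assumes stat -c fails on non-GNU systems (as it does with BSD stat, which uses -f instead).

```shell
#!/bin/sh
# Sketch: print a file's size in bytes without parsing ls.
# file_size is a hypothetical helper name.
file_size() {
    # GNU stat (coreutils): %s is the total size in bytes
    if stat -c '%s' -- "$1" 2>/dev/null; then
        return 0
    fi
    # Portable fallback: wc -c on stdin prints just the count;
    # tr removes the leading padding some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# Usage: size=$(file_size TestFile.txt)
```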

get the first word as result of ls -l

I need to use ls -l, and I would like the result to be just the first word of the file name. For instance, for a result like this
-rw-r--r-- 1 root root 9 Sep 21 23:11 best file 1.txt
I would like to have only
best
as the result, because I need to put this value into a variable. Another way that doesn't use ls -l is also fine.
...Sorry to bother you again... If the file is in a subdirectory, how can I leave the directory name out of the result? Thanks
You don't need to use ls -l (lowercase L).
Instead, use ls -1 (the number one), which just outputs the names of the files, and then keep only the first space-separated word with cut:
ls -1 | cut -d' ' -f1
(That is -1, the number one, not the letter l.)
To store the value into a variable, do:
var=$(ls -1 | cut -d' ' -f1)
Note that it is not a good idea to parse ls: the number of columns may vary, etc. You can read more about this in "Why you shouldn't parse the output of ls".
Update
Note that there is not even a need to use -1 (one); ls alone suffices:
ls | cut -d' ' -f1
As BroSlow comments below, "because they are EOL (end of line) separated across a pipe".
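For the follow-up about subdirectories, shell parameter expansion can do the whole job without ls or cut. A minimal sketch; first_word is a hypothetical helper name, and it assumes the file name is passed as an argument (for example from a glob):

```shell
#!/bin/sh
# Sketch: first word of a file name, ignoring any directory part.
# first_word is a hypothetical helper name.
first_word() {
    name=${1##*/}                  # drop the directory part: "sub/best file 1.txt" -> "best file 1.txt"
    printf '%s\n' "${name%% *}"    # drop everything from the first space on
}

# Usage:
#   var=$(first_word "sub/best file 1.txt")    # var=best
#   for f in ./*; do first_word "$f"; done
```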
If you have only one row to output, this will work fine:
var=`ls -l | awk '{ print $9 }'`
echo ${var}
Otherwise, you need to use grep to filter the output for the correct file.
set -- $(ls -l)
echo ${11} # Assumes the file is the FIRST one listed.
Should do the trick. But I'm not sure if that's really what you want. For one thing, ls -l also prints an extra header line. Why do you say that you need to use ls -l? If you could state the actual problem, maybe we can find a much better solution together...
awk can pick the first word for you:
ls | awk '{print $1}'
Try:
ls -al|awk 'NR==4{ print $9 }'
Line 4 is the first file entry (after the "total" line and the . and .. entries), and $9 is column 9, which holds the desired word.

How to return substring from a linux command

I'm connecting to Exadata machines and want to get information about the ORACLE_HOME variable on them, so I'm using this command:
ls -l /proc/<pid>/cwd
this is the output:
2 oracle oinstall 0 Jan 23 21:20 /proc/<pid>/cwd -> /u01/app/database/11.2.0/dbs/
I need to get the last part:
/u01/app/database/11.2.0 (I don't want the "/dbs/" there)
I will be using this command several times on different machines. How can I get this substring from the whole output?
Awk and grep are good for these types of issues.
New:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | sed 's#/dbs/##'
Old:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | egrep -o '^.+[.0-9]'
awk prints the last field of the ls output (the symlink target), and grep then keeps the beginning of that string up to its last digit or dot. This is a situational solution and perhaps not the best.
Parsing the output of ls is generally considered sub-optimal. I would use something like this instead:
dirname "$(readlink -f /proc/<pid>/cwd)"
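The readlink approach can be wrapped in a small function for reuse across machines. A sketch assuming Linux /proc and a readlink that supports -f (GNU coreutils or BusyBox); home_from_cwd is a hypothetical name, and it only gives the right answer when the process's working directory is exactly one level below the directory you want, as with .../11.2.0/dbs here:

```shell
#!/bin/sh
# Sketch: parent directory of a process's working directory (Linux /proc).
# home_from_cwd is a hypothetical helper name; pass the PID as $1.
home_from_cwd() {
    cwd=$(readlink -f "/proc/$1/cwd") || return 1   # resolve the cwd symlink
    dirname -- "$cwd"                                # strip the last component (e.g. /dbs)
}

# Usage: home_from_cwd 12345
```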

how to compare output of two ls in linux

So here is a task I can't solve. I have a directory with .h files and a directory with .i files that have the same names as the .h files. With a single command, I want to list all the .h files that have no matching .i file. It's not a hard problem, and I could do it in some programming language, but I'm curious how it would look on the command line :). To be more specific, here is the algorithm:
get file names without extensions from ls *.h
get file names without extensions from ls *.i
compare them
print all names from 1 that are not met in 2
Good luck!
diff \
<(ls dir.with.h | sed 's/\.h$//') \
<(ls dir.with.i | sed 's/\.i$//') \
| grep '^<' \
| cut -c3-
diff <(ls dir.with.h | sed 's/\.h$//') <(ls dir.with.i | sed 's/\.i$//') runs ls on the two directories, cuts off the extensions, and compares the two lists. Then grep '^<' keeps the lines that diff marks with "< ", i.e. the names found only in the first listing, and cut -c3- removes that "< " prefix. Note that the <(...) process substitution requires bash (or ksh/zsh), not plain sh.
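Another option, not taken from the answers here, is comm, which compares two sorted lists and can print only the lines unique to the first one. A POSIX sh sketch using temporary files instead of process substitution; only_in_h is a hypothetical name, and, like the ls | sed pipelines above, it assumes file names without embedded newlines:

```shell
#!/bin/sh
# Sketch: base names of .h files in $1 that have no matching .i file in $2.
# only_in_h is a hypothetical helper name; it uses comm instead of diff.
only_in_h() {
    h_list=$(mktemp) || return 1
    i_list=$(mktemp) || { rm -f "$h_list"; return 1; }
    # comm needs both inputs sorted the same way
    (cd "$1" && ls) | sed -n 's/\.h$//p' | sort > "$h_list"
    (cd "$2" && ls) | sed -n 's/\.i$//p' | sort > "$i_list"
    comm -23 "$h_list" "$i_list"   # -2 hides names only in $2, -3 hides names in both
    rm -f "$h_list" "$i_list"
}

# Usage: only_in_h dir.with.h dir.with.i
```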
ls ./dir_h/*.h | sed -r -n 's:.*dir_h/([^.]*).h$:dir_i/\1.i:p' | xargs ls 2>&1 | \
grep "No such file or directory" | awk '{print $4}' | sed -n -r 's:dir_i/([^:]*).*:dir_h/\1:p'
ls -1 dir1/*.hh dir2/*.ii | awk -F"/" '{print $NF}' | awk -F"." '{a[$1]++; b[$0]} END{for(i in a) if(a[i]==1 && ((i".hh") in b)) print i}'
Explanation:
ls -1 dir1/*.hh dir2/*.ii
The above lists all the *.hh and *.ii files in both directories.
awk -F"/" '{print $NF}'
The above prints just the file name, without its directory path.
awk -F"." '{a[$1]++; b[$0]} END{for(i in a) if(a[i]==1 && ((i".hh") in b)) print i}'
The above builds two associative arrays: a counts how many times each file name (without its extension) appears, and b records each full file name.
If both the .hh and the .ii file exist, the count in a is 2; if only one of them exists, it is 1. So we need the names whose count is 1 and that belong to a header file (.hh).
The header check is done in the END block by testing whether name.hh is a key of b. It uses (i".hh") in b rather than b[i".hh"], because the elements of b are created without a value, so b[i".hh"] would always be false.
Assuming bash is your shell:
for file in dir_with_h/*.h; do
[ -e "$file" ] || continue; # skip the literal pattern if no .h file matches
name=${file%.h}; # trim trailing ".h" file extension
name=${name#dir_with_h/}; # trim leading folder name
if [ ! -e "dir_with_i/${name}.i" ]; then
echo "${name}";
fi
done
Undoubtedly this can be ported to virtually all other shells. I find this less cryptic than some other approaches (although that is surely my problem), but it is a little wordy. As such, a shell script might help you recall it.

Resources