Shell Script to extract only file name - linux

I have file format like this (publishfile.txt)
drwxrwx---+ h655201 supergroup 0 2019-04-24 09:16 /data/xyz/invisible/se/raw_data/OMEGA
drwxrwx---+ h655201 supergroup 0 2019-04-24 09:16 /data/xyz/invisible/se/raw_data/sample
drwxrwx---+ h655201 supergroup 0 2019-04-24 09:16 /data/xyz/invisible/se/raw_data/sample(1)
I just want to extract the names OMEGA, sample, and sample(1). How can I do that? I have used basename in my code, but it doesn't work in a for loop. Here is my sample code:
for line in $(cat $BASE_PATH/publishfile.txt)
do
FILE_PATH=$(echo "line"| awk '{print $NF}' )
done
FILE_NAME=($basename $FILEPATH)
But this code also doesn't work when used outside the for loop.

awk -F / '{ print $NF }' "$BASE_PATH"/publishfile.txt
This simply says that the delimiter is a slash and we want the last field from each line.
You should basically never run Awk on each input line in a shell while read loop (let alone a for loop); Awk itself does this by default, much faster and better than the shell.

In your code above you have a typo. Your code reads:
FILE_NAME=($basename $FILEPATH)
but it should read
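FILE_NAME=$(basename "$FILEPATH")
That should work fine inside or outside of a loop, provided the variable name matches the one assigned earlier (the loop sets FILE_PATH, not FILEPATH) and the loop echoes "$line" rather than the literal string "line".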
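If each name really is needed in a shell variable inside a loop, a minimal sketch could look like the one below. It assumes the path is the last space-separated field on each line and contains no spaces itself; print_names is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: print the last path component of each listing line.
# Assumes the path is the last space-separated field and has no spaces.
print_names() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue      # skip blank lines
        file_path=${line##* }           # drop everything up to the last space
        file_name=${file_path##*/}      # like basename, for paths without a trailing slash
        printf '%s\n' "$file_name"
    done
}

# Usage: print_names < "$BASE_PATH/publishfile.txt"
```

Reading with while read keeps each line intact, whereas for line in $(cat ...) splits the file into individual words.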

Try this:
cat $BASE_PATH/publishfile.txt | awk '{print $7}' | sed 's/.*\///'
the output will be:
OMEGA
sample
sample(1)
UPDATE: cat x.txt | sed 's/.*\///' on its own should also work, as long as every line contains at least one slash (/).
For the commands used, the manuals are: cat, awk, sed

Related

String delimiter is affecting some instances of its individual characters

So basically, I'm trying to print out a human readable list of file and directory sizes (without the current directory listed) with the following code
du -arh | sort -nr | tail -n +2 | awk -F"./" '{print $1 $2 $3}' | head -n $NUM
The NUM variable is just an argument given for the amount of items listed.
The output of the above without the awk delimiter command is
4.0K ./url-list
4.0K ./testurl.sh
4.0K ./diskhogger.sh
4.0K ./backup/url-list
4.0K ./backup.sh
However, adding the awk command outputs
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backuurl-list
4.0K backup.sh
A similar output occurs whenever there is a notable subdirectory.
./Library/Cache - LibrarCache, etc.
To be clear, I am attempting to cut out the "./" at the beginning of the file name without affecting the other forward slashes. My preferred output would be:
4.0K url-list
4.0K testurl.sh
4.0K diskhogger.sh
4.0K backup/url-list
4.0K backup.sh
where the "backup/url-list" isn't affected.
Is the '.' in my delimiter a special character I don't know about? If not, what exactly is going on here?
I'm new to shell so any info on this would be great.
Thanks!
Edited for clarity.
You can get rid of some of the other commands if you're using awk:
du -arh | sort -nr | awk -v len="$NUM" 'NR>1{sub(/\.\//,""); print} NR>len{exit}'
or simply use
... | sed 's_\./__' | head ...
-F"./" is treated as "any character followed by a forward slash". The separator is interpreted as a regular expression, in which . means "any character". To use a literal "dot followed by slash" (./) as the delimiter, use one of the following approaches:
. within character class
awk -F"[.]/" '{print $1,$2,$3}'
. escaped
awk -F'\\./' '{print $1,$2,$3}'
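To make the difference concrete, here is a small sketch that splits one made-up du line with both separators and keeps the second field. The sample line and the variable names are assumptions for illustration only.

```shell
# Split a sample du line with each separator and keep the second field.
line='4.0K ./backup/url-list'

# Unescaped dot: both "./" and "p/" match, so "backup" loses its last letter.
regex_split=$(printf '%s\n' "$line" | awk -F"./" '{print $2}')

# Bracketed dot: only the literal "./" matches.
literal_split=$(printf '%s\n' "$line" | awk -F"[.]/" '{print $2}')

printf '%s\n%s\n' "$regex_split" "$literal_split"
# prints:
# backu
# backup/url-list
```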
Yes. awk treats the delimiter as a regular expression. A "." in a regular expression matches any character. Hence awk will split your lines up everywhere there is a character preceding a "/".
If you wish to match a literal ".", the easiest way of doing it is to put it in square brackets, to make it a character class, matching only the character "."
You end up with
du -arh | sort -nr | tail -n +2 | awk -F"[.]/" '{print $1 $2}' | head -n $NUM
However, note that this is not your only problem. If you have a directory whose name ends with a "." and that directory contains files, some lines in the du output will contain "./" more than once (e.g. a file named "bar" in a directory named "foo." gives you "foo./bar"). A better solution is therefore to use awk's sub() function to replace only the first instance of a literal "./" with "".
du -arh | sort -nr | tail -n +2 | awk '{sub("[.]/",""); print}' | head -n $NUM
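As a quick check of the "foo./bar" case, the sketch below removes only the first literal "./" from each line. strip_leading_dot_slash is a hypothetical helper name and the input line is made up.

```shell
# Hypothetical helper: remove only the first literal "./" from each line.
strip_leading_dot_slash() {
    awk '{ sub(/[.]\//, ""); print }'
}

printf '%s\n' '4.0K ./foo./bar' | strip_leading_dot_slash
# prints: 4.0K foo./bar
```

Because sub() replaces a single match, the later "./" inside the path is left alone, which splitting on the separator cannot guarantee.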

Remove multiple spaces in ls -l output

I need to display the filesize and the filename. Like this:
4.0K Desktop
I'm extracting these two fields using cut from the ls -l output:
ls -lhS | cut -d' ' -f5,9
Due to multiple spaces in the ls -l output, I'm getting a few erroneous outputs, like:
4.0K 19:54
4.0K 19:55
6
18:39
31
25
How should I fix this?
I need to accomplish this task using pipes only and no bash scripting ( output could be multiple pipes ) and preferably no sed, awk.
If no alternative to sed or awk is available- use of sed is OK.
You can avoid parsing ls output and use the stat command, which ships with GNU coreutils, for detailed file information.
# -c --format=FORMAT
# use the specified FORMAT instead of the default; output a newline after each use of FORMAT
# %n File name
# %s Total size, in bytes
stat -c '%s %n' *
You can squeeze the repeated spaces with tr -s before using cut:
ls -lhS | tr -s ' ' | cut -d' ' -f 5,9
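The effect of the squeeze can be seen on a single line. This is only a sketch: the sample ls -l line below is made up, and the variable names are illustrative.

```shell
# A made-up ls -l line with runs of spaces used for column alignment.
line='-rw-r--r--  1 user users    6 Apr  1 18:39 notes.txt'

# Without squeezing, every extra space creates an empty field for cut.
raw=$(printf '%s\n' "$line" | cut -d' ' -f5,9)

# tr -s ' ' collapses each run of spaces into one before cut counts fields.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5,9)

printf '%s\n%s\n' "$raw" "$squeezed"
# prints:
# users 6
# 6 notes.txt
```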
Or you could just submit to awk:
$ ls -lhS | awk '$0=$5 OFS $9'
i.e., replace the whole record $0 with fields $5 and $9 separated by the output field separator OFS.
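The one-liner above uses an assignment as the pattern, so awk prints the record whenever the assigned value counts as true. A more explicit variant is sketched below; size_and_name is a hypothetical helper name, and it assumes file names contain no whitespace. It also skips short lines such as the "total" header that ls -l prints first.

```shell
# Hypothetical helper: print size and name from ls -l style lines.
# Assumes file names contain no whitespace; lines with fewer than
# nine fields (such as the "total" header) are skipped.
size_and_name() {
    awk 'NF >= 9 { print $5, $9 }'
}

# Usage: ls -lhS | size_and_name
```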

get the first word as result of ls -l

I need to use ls -l, and I would like the result to be just the first word of the file name. For instance, for a result like this:
-rw-r--r-- 1 root root 9 Sep 21 23:11 best file 1.txt
I would like to have only
best
as result because I need to put this value into a variable. It is ok as well if there is another way instead of using ls -l.
...sorry to bother you again...if the file is under a sub-directory, how can I hide the folder from the result? Thanks
You don't need to use ls -l (L).
Instead, use ls -1 (number one), that just outputs the names of the files, and then filter out the first column with cut:
ls -1 | cut -d' ' -f1
    ^
    number one, not letter L
To store the value into a variable, do:
var=$(ls -1 | cut -d' ' -f1)
Note it is not a good thing to parse ls: the number of columns may vary, etc. You can read more about the topic in Why you shouldn't parse the output of ls
Update
Note that there is not even a need to use -1 (one); ls alone suffices:
ls | cut -d' ' -f1
As BroSlow comments below, "because they are EOL (end of line) separated across a pipe".
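Since parsing ls is discouraged, one alternative is to let the shell expand the file names itself. The sketch below is one way to do that, assuming "first word" means the text before the first space; first_words is a hypothetical helper name. Stripping the directory part also covers the follow-up about files in a sub-directory.

```shell
# Hypothetical helper: print the first word of each file name in a directory,
# without calling ls. Assumes "word" means the text up to the first space.
first_words() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        name=${path##*/}             # strip the directory part
        printf '%s\n' "${name%% *}"  # keep the text before the first space
    done
}

# Usage: var=$(first_words /some/dir)
```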
If you have only one row to output, this will work fine:
var=`ls -l | awk '{ print $9 }'`
echo ${var}
Or you need to use grep to filter your output for the correct file.
set -- $(ls -l)
echo ${11} # Assumes the file is the FIRST one listed.
Should do the trick. But I'm not sure if that's really what you want. For one thing, ls -l also prints an extra header line. Why do you say that you need to use ls -l? If you could state the actual problem, maybe we can find a much better solution together...
awk can pick the first word for you;
ls | awk '{print $1}'
Try:
ls -al|awk 'NR==4{ print $9 }'
Row number 4 will have first line of files. $9 indicates column 9 which will have desired word.

How to return substring from a linux command

I'm connecting to an Exadata machine and want to get information about the "ORACLE_HOME" variable on it. So I'm using this command:
ls -l /proc/<pid>/cwd
this is the output:
2 oracle oinstall 0 Jan 23 21:20 /proc/<pid>/cwd -> /u01/app/database/11.2.0/dbs/
I need to get the last part:
/u01/app/database/11.2.0 (I don't want the "/dbs/" there)
I will be using this command several times on different machines. So how can I get this substring from the whole output?
Awk and grep are good for these types of issues.
New:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | sed 's#/dbs/##'
Old:
ls -l /proc/<pid>/cwd | awk '{print ($NF) }' | egrep -o '^.+[.0-9]'
Awk prints the last column of its input, which is the output of your ls command, and then grep grabs the beginning of that string up to the last occurrence of a digit or dot. This is a situational solution and perhaps not the best.
Parsing the output of ls is generally considered sub-optimal. I would use something more like this instead:
dirname $(readlink -f /proc/<pid>/cwd)
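Wrapped in a function with quoting, the same idea could be sketched as follows. parent_of_link_target is a hypothetical helper name, and the sketch assumes a readlink that supports -f, as GNU coreutils does.

```shell
# Hypothetical helper: resolve a symlink and print the parent of its target.
# Assumes a readlink that supports -f (for example GNU coreutils).
parent_of_link_target() {
    target=$(readlink -f "$1") || return 1
    dirname "$target"
}

# Usage: oracle_home=$(parent_of_link_target "/proc/<pid>/cwd")
```

readlink -f prints the resolved target without a trailing slash, so dirname removes exactly the final "dbs" component.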

How to get the latest filename alone in a directory?

I am using
ls -ltr /homedir/mydirectory/work/ |tail -n 1|cut -d ' ' -f 10
But this is a very crude way of getting the desired result, and it's also unreliable.
The output I get on simply executing
ls -ltr /homedir/mydirectory/work/ |tail -n 1
is
-rw-r--r-- 1 user pusers 1764 Apr 1 12:06 firstfile.xml
So here I get the file name.
But if the output on doing the above command is like
-rw-r--r-- 100 user pusers 1764 Apr 1 12:06 firstfile.xml
the first command fails! Understandably so, as I am cutting out the 10th space-delimited field, which is no longer the right one.
So how can I refine it?
Why do you use the -l flag for ls if you don't need it? Make ls simply output the file names if you don't need more information, instead of trying to parse its non-uniform long-format output.
LAST_MODIFIED_FILE=`ls -tr | tail -n 1`
If you really want to achieve this using your method, then, use awk instead of cut
ls -ltr /var/log/ |tail -n 1| awk '{print $9}'
Extending user529758's answer, to get the latest file whose name matches a given prefix, use the command below:
ls -tr Filename* | tail -n 1
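If file names may contain newlines or other unusual characters, a loop that compares modification times avoids parsing ls altogether. This is only a sketch: latest_file is a hypothetical helper name, and it assumes a shell whose test builtin supports -nt (bash, dash and ksh do).

```shell
# Hypothetical helper: print the name of the most recently modified
# regular file in a directory, without parsing ls output.
# Assumes the shell's test builtin supports -nt.
latest_file() {
    newest=
    for path in "$1"/*; do
        [ -f "$path" ] || continue
        if [ -z "$newest" ] || [ "$path" -nt "$newest" ]; then
            newest=$path
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "${newest##*/}"
}

# Usage: LAST_MODIFIED_FILE=$(latest_file /homedir/mydirectory/work)
```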
