Unix: Sort 'ls' by return value of program - linux

How can I use a program as a sort key in a Unix shell? In other words, how do I sort the output of 'ls' (or any other program) by the return value of a program applied to each line?

I'll give two example solutions:
A one-line command that is simpler and therefore something I'd try to use first.
A bash script that allows sorting a list by output from an arbitrary bash function that reads each line of the list as input.
Example 1 (without executing command on each line)
If the question is how to, in general, sort outputs of programs like ls, below is an example specific to ls that sorts by inode. However, every program may have its own idiosyncrasies when generating its output so this example may have to be adapted:
ls -ail /home/user/ | tail -n+2 | tr -s ' ' | sort -t' ' -k1,1 -g
Here are the different parts of this command broken down:
ls -ail /home/user/
Lists all (-a) files in directory /home/user/ in list (-l) format with inode (-i).
tail -n+2
Starts output at the second line, cutting off the first line (the "total" line) of the ls output.
tr -s ' '
Combines (-s) multiple spaces (' ') for sort.
sort -t' ' -k1,1 -g
Sorts the list by the first field (-k1,1), compared as general numeric values (-g), with fields separated by one space (' ').
Example 2 (executing command with each line as input)
Here is a more adaptable example: a bash script I worked up to show how the list of files generated by ls -a1 can be fed into a bash function, getinode, which uses stat to output the inode of each file. A while loop repeats this for each file and saves the data in comma-delimited format by repeatedly prepending to a variable named OUTPUT, which is sorted at the end by sort using the first field.
The important part is that the function getinode can be anything, so long as it outputs a string. I set up getinode to receive a file path as input (first argument $1) and to then output the inode to stdout via echo $INODE. The script calls getinode via $(getinode "$FILEPATH").
#!/bin/bash
# Usage: lsinodesort.sh [directory]
# Refs/attrib:
# [1]: How to sort a csv file by sorting on a single field. https://stackoverflow.com/a/44744800
# [2]: How to read a while loop variable. https://stackoverflow.com/a/16854326
WORKDIR="$1" # read directory from first argument
getinode() {
    # Usage: getinode [path]
    INODE="$(stat --format=%i "$1")" # GNU stat; prints the inode number
    echo "$INODE"
}
if [ -d "$WORKDIR" ]; then
    FILELIST="$(ls -a1 "$WORKDIR")" # save `ls` output (not named LINES, which bash itself uses)
else
    exit 1 # not a valid directory
fi
OUTPUT=""
while IFS= read -r line; do
    path="$WORKDIR/$line" # Determine path.
    if [ -f "$path" ]; then # Check if path is a regular file.
        FILEPATH="$path"
        FILENAME="$(basename "$path")" # Determine filename from path.
        FILEINODE="$(getinode "$FILEPATH")" # Get inode.
        OUTPUT="$FILEINODE,$FILENAME"$'\n'"$OUTPUT" # Prepend inode and file name to OUTPUT.
    fi
done <<< "$FILELIST" # See [2].
OUTPUT="$(printf '%s' "$OUTPUT" | sort -t, -k1,1n)" # Sort numerically by inode. See [1].
printf '%s\n' "inode,filename" "$OUTPUT" # Print header, then the sorted rows.
When I run it on my own home folder I get output like this:
inode,filename
3932162,.bashrc
3932165,.bash_logout
3932382,.zshrc
3932454,.gitconfig
3933234,.bash_aliases
3933512,.profile
3933612,.viminfo
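The same idea can be written more generally as "decorate, sort, undecorate": prefix each line with the key that a program prints for it, sort on that prefix, then strip the prefix again. Below is a minimal sketch, assuming the key program prints a single numeric line and that the input lines contain no tab characters. The names sort_by_key and strlen are hypothetical helpers made up for this illustration, not standard tools.

```shell
#!/bin/bash
# sort_by_key CMD [ARGS...]: read lines from stdin, run CMD on each line,
# and print the lines ordered by the number CMD prints (hypothetical helper).
sort_by_key() {
    local line key
    while IFS= read -r line; do
        key="$("$@" "$line" < /dev/null)"   # key = output of the program for this line
        printf '%s\t%s\n' "$key" "$line"    # decorate: key<TAB>line
    done | sort -t $'\t' -k1,1n | cut -f2-  # sort numerically on the key, then undecorate
}

# Example key program: length of the string (illustrative only).
strlen() { printf '%s\n' "${#1}"; }

printf '%s\n' ccc a bb | sort_by_key strlen
```

If the key should be the program's exit status rather than its output, the key line could instead run the program and record $?.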

I'm not sure I understand your question, so I'll try to rephrase it first.
If I'm not mistaken, you want to sort the output of a program (it may be ls or any other command in a Unix shell).
I suggest using the pipeline feature available in Unix shells.
For instance, you can sort the output of the ls command using:
ls /home | sort
This works with the output of any command, not just ls.
By the way, ls has optional flags of its own for sorting its results, if that's your specific use case:
ls -S # for sorting by file size
ls -t # for sorting by modification time
You can also append the --reverse or -r flag for displaying the result in reverse order.
As for the sort command, it also has flags that let you customize the result to your needs:
sort -n # for sorting numerically instead of alphabetically
sort -k5 # for sorting based on the 5th column
sort -t "," # for using the comma as a field separator
You can combine them, for example to sort the output of 'ls -l' by fields 2 and 5 (numeric) and then by field 9 (alphabetical). The columns of ls -l are separated by whitespace, so no -t option is needed here:
ls -l /home/$USER | sort -k2,2n -k5,5n -k9

Related

How to create a dynamic command in bash?

I want to have a command in a variable that runs a program and specifies the output filename for it depending on the number of files that exist (so that it works on a new file each time).
Here is what I have:
export MY_COMMAND="myprogram -o ./dir/outfile-0.txt"
However, I would like the outfile number to increase each time MY_COMMAND is executed. You may assume myprogram creates the file soon enough before the next call, so the number can be derived from the number of files that exist in the directory ./dir/. I cannot change myprogram itself or the way MY_COMMAND is used.
Thanks in advance.
Given that you can't change myprogram (its -o option will always write to the file given on the command line), and assuming that something else, also out of your control, is running MY_COMMAND so you can't change the way MY_COMMAND gets called, you still have control of MY_COMMAND itself.
For the rest of this answer I'm going to change the name MY_COMMAND to callprog mostly because it's easier to type.
You can define callprog as a variable as in your example export callprog="myprogram -o ./dir/outfile-0.txt", but you could instead write a shell script and name that callprog, and a shell script can do pretty much anything you want.
So, you have a directory full of outfile-<num>.txt files and you want to output to the next non-colliding outfile-<num+1>.txt.
Your shell script can get the numbers by listing the files, cutting out only the numbers, sorting them, and then taking the highest number.
If we have these files in dir:
outfile-0.txt
outfile-1.txt
outfile-5.txt
outfile-10.txt
ls -1 ./dir/outfile*.txt produces the list
./dir/outfile-0.txt
./dir/outfile-1.txt
./dir/outfile-10.txt
./dir/outfile-5.txt
(using outfile and .txt means this will work even if there are other files not named outfile)
Scrape out the number by piping the list through the stream editor sed: capture the number and keep only that part:
ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:'
(I'm using colon : instead of the standard slash / so I don't have to escape the directory separator in dir/outfile)
Now you just need to pick the highest number. Sort the numbers and take the top one:
| sort -rn | head -1
Sorting with -n is numeric rather than lexicographic, and -r reverses the order so the highest number comes first, not last.
Putting it all together, this will list the files, edit the names keeping only the numeric part, sort, and get just the first entry. You want to assign that to a variable to work with it, so it is:
high=$(ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' | sort -rn | head -1)
In the shell (I'm using bash) you can do math on that with $((high + 1)), so if high is 10, the expression produces 11.
You would use that as the numeric part of your filename.
The whole shell script then just needs to use that number in the filename. Here it is, with lines broken for better readability:
#!/bin/sh
high=$(ls -1 ./dir/outfile*.txt \
    | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' \
    | sort -rn | head -1)
echo "myprogram -o ./dir/outfile-$((high + 1)).txt"
Of course you wouldn't echo myprogram, you'd just run it.
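Parsing ls can also be avoided by looping over a glob and stripping the prefix and suffix with parameter expansion. The following is a sketch under the same file-naming assumption (outfile-<num>.txt), written for bash. The function name next_outfile_number is hypothetical, and the demo directory is created only for illustration.

```shell
#!/bin/bash
# next_outfile_number DIR: print one more than the highest N among
# DIR/outfile-N.txt, or 0 if there are no such files (hypothetical helper).
next_outfile_number() {
    local dir=$1 f n high=-1
    for f in "$dir"/outfile-*.txt; do
        [ -e "$f" ] || continue                    # the glob matched nothing
        n=${f##*/outfile-}                         # strip everything up to "outfile-"
        n=${n%.txt}                                # strip the ".txt" suffix
        case $n in ''|*[!0-9]*) continue ;; esac   # skip names without a pure number
        if [ "$((10#$n))" -gt "$high" ]; then      # 10# forces base 10 (leading zeros)
            high=$((10#$n))
        fi
    done
    echo "$((high + 1))"
}

# Demo with the file names from the example above.
demo=$(mktemp -d)
touch "$demo"/outfile-0.txt "$demo"/outfile-1.txt "$demo"/outfile-5.txt "$demo"/outfile-10.txt
next_outfile_number "$demo"    # prints 11
rm -r "$demo"
```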
You could do this in a bash function in your .bashrc by using wc to get the number of files in the directory and then adding 1 to the result:
yourfunction () {
    dir=/path/to/dir
    filenum=$(( $(ls "$dir" | wc -l) + 1 ))
    myprogram -o "$dir/outfile-${filenum}.txt"
}
This gets the number of files in $dir and adds 1 to that number to get the number you need for the filename. If you place it in your .bashrc or .bash_aliases and then source .bashrc, it should work like any other shell command.
You can try exporting a function for MY_COMMAND to run.
next_outfile () {
    myprogram -o ./dir/outfile-${_next_number}.txt
    ((_next_number ++ ))
}
export -f next_outfile
export MY_COMMAND="next_outfile" _next_number=0
This relies on a "private" global variable _next_number being initialized to 0 and not otherwise modified.

bash - Diff a command with a file (specific)

So it's pretty hard for me to describe what I want to do, but I'll try:
(Because of some private information I changed the names.)
I want to "diff" a command's output against a text file I created.
The command output looks like:
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
And the txtfile:
Blabla1
Blabla2
If Blabla1 is found anywhere in the command output, that's fine. But as you can see, Blabla2 won't be found anywhere in the command output, and that difference is what I want as output.
I hope you understand what I mean and can possibly help me.
Greetings,
Can
UPDATE:
@hek2mgl
So my command is:
./factory.sh listapplications | grep -i running
This command shows this:
'ftp' '1' (7 objects + dependencies), STATUS: 'RUNNING' - 'XSD Da
'abc' '5.1.0' (14 objects + dependencies), STATUS: 'RUNNING' - '2017-10-13: Fix fuer Bug 2150'
'name' '1.0.2' (5 objects + dependencies), STATUS: 'RUNNING'
And I want to compare that output with my textfile:
ftp
abc
name
missing
alsomissing
So when I compare these two, it should check whether each word from my text file appears ANYWHERE in the command output. If a word is found anywhere, nothing should be printed for it.
As you can see, "missing" and "alsomissing" won't be found. I want these two as the output at the end.
What you might be interested in is grep in combination with 'process substitution'. If your file with patterns is file.txt and your command to execute is cmd then you can use
grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
This will output the patterns in file.txt which are not matched in the output of cmd.
In case of the Blabla example, the above line will output
Blabla2
How it works is the following. The first part will search for all patterns listed in file.txt in the output of cmd and will only output the matched parts. This means that
% grep -o -F -f file.txt <(cmd)
Blabla1
This output is now piped to another command that will try to find all lines in file.txt which do not match any of the patterns coming from the pipe (-f -):
% grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
Blabla2
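To try the pipeline without the real command, the sample data from the question can be wired in as a stand-in. This is only a sketch: cmd here is a shell function that prints the example output, missing_patterns is a hypothetical wrapper name, and reading patterns from standard input with -f - is assumed to be supported by your grep (GNU grep does).

```shell
#!/bin/bash
# Stand-in for the real command; prints the sample output from the question.
cmd() {
    cat <<'EOF'
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
EOF
}

# missing_patterns FILE: print the lines of FILE that never occur in cmd's output.
missing_patterns() {
    grep -o -F -f "$1" <(cmd) | grep -v -F -f - "$1"
}

patterns=$(mktemp)
printf '%s\n' Blabla1 Blabla2 > "$patterns"
missing_patterns "$patterns"    # prints Blabla2
rm -f "$patterns"
```

Because the match is a fixed-string substring match, a pattern such as Blabla1 would also count as found in a line containing Blabla10.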
So ... this seems to do it, using bash process substitution:
$ cat file1
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
$ cat file2
Blabla1
Blabla2
$ grep -vFf <(awk '{gsub(/[^[:alnum:]]/,"",$1);print $1}' file1) file2
Blabla2
The awk script takes the first field, strips non-alphanumeric characters from it (i.e. the single quotes) and outputs just that first field. The grep option -f uses the "virtual" file created by the aforementioned process substitution as a list of fixed strings to search for within the input file (file2), and the -v reverses the search, showing you only what was not found.
If the regex in the gsub() is too greedy, you might replace it with something like $1=substr($1,2,length($1)-2).
You could alternately do this in (POSIX) awk alone, without relying on bash process substitution:
$ awk 'NR==FNR{a[substr($1,2,length($1)-2)];next} $1 in a{next} 1' file1 file2
Blabla2
This reads the stripped first field of file1 into the keys of an array, then for each line of file2 checks for the existence of that key in the array, skipping lines that match and printing any left over. (The 1 at the end of the script is short-hand for "print this line".)
You can also use awk only:
awk '
    # Store patterns of text.file in an array (p)atterns.
    # Initialize their count of occurrence with 0
    NR==FNR{
        p[$0]=0
        next
    }
    # Replace the quotes around BlaBla... in cmd output.
    # Increase the count of occurrence of the pattern
    {
        gsub("'\''", "")
        p[$1]++
    }
    # At the end of the input print those patterns which
    # did not appear in cmd output, meaning their count of
    # occurrence is zero.
    END{
        for(i in p){
            if(p[i]==0){
                print i
            }
        }
    }' text.file cmd.txt
PS: Alternatively, you can use process substitution instead of storing the command output in a file: replace cmd.txt with <(cmd).

How can we increment a string variable within a for loop

#!/bin/bash
for i in $(ls);
do
    j=1
    echo "$i"
done
Actual output (not what I expected):
autodeploy
bin
config
console-ext
edit.lok
I need output like the numbered directory list below, so that entering a number selects the corresponding entry:
1.)autodeploy
2.)bin
3.)config
4.)console-ext
5.)edit.lok
For example, if I give 2 as input, it should print "bin".
Per BashFAQ #1, a while read loop is the correct way to read content line-by-line:
#!/usr/bin/env bash
enumerate() {
    local line i
    i=0
    while IFS= read -r line; do
        ((++i))
        printf '%d.) %s\n' "$i" "$line"
    done
}
ls | enumerate
However, ls is not an appropriate tool for programmatic use; the above is acceptable if the results of ls are only for human consumption, but not if they're going to be parsed by a machine -- see Why you shouldn't parse the output of ls(1).
If you want to list files and let the user choose among them by number, pass the results of a glob expression to select:
select filename in *; do
    echo "$filename" && break
done
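For a non-interactive variant, the glob results can be stored in an array, printed with their numbers, and then indexed by the chosen number. This is a sketch assuming bash and a non-empty directory. list_and_pick is a hypothetical helper name, and the demo directory only recreates the names from the question.

```shell
#!/bin/bash
# list_and_pick DIR N: print a numbered listing of DIR, then the N-th entry (1-based).
list_and_pick() {
    local dir=$1 choice=$2 i
    local entries=("$dir"/*)                 # glob instead of parsing ls
    for i in "${!entries[@]}"; do
        printf '%d.)%s\n' "$((i + 1))" "${entries[i]##*/}"
    done
    if (( choice >= 1 && choice <= ${#entries[@]} )); then
        printf 'Selected: %s\n' "${entries[choice - 1]##*/}"
    fi
}

# Demo using the names from the question.
demo=$(mktemp -d)
touch "$demo"/autodeploy "$demo"/bin "$demo"/config "$demo"/console-ext "$demo"/edit.lok
list_and_pick "$demo" 2
rm -r "$demo"
```

Array indices are zero-based, which is why the listing prints i + 1 and the lookup uses choice - 1.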
I don't understand what you mean in your question by like Directory list, but following your example, you do not need to write a loop:
ls|nl -s '.)' -w 1
If you want to avoid ls, you can do the following, but be careful: this only works if the directory entries do not contain whitespace, because fmt would break such names across two lines:
echo *|fmt -w 1 |nl -s '.)' -w 1

Recursive search grep

I'm trying to search through HDFS for parquet files and list them out. I'm using this, which works great. It looks through all of the subdirectories in /sources/works_dbo and gives me all the parquet files:
hdfs dfs -ls -R /sources/works_dbo | grep ".*\.parquet$"
However, I just want to return the first file it encounters per subdirectory, so that each subdirectory only appears on a single line in my output. Say I had this:
sources/works_dbo/test1/file1.parquet
sources/works_dbo/test1/file2.parquet
sources/works_dbo/test2/file3.parquet
When I run my command I expect the output to look like this:
sources/works_dbo/test1/file1.parquet
sources/works_dbo/test2/file3.parquet
... | awk '!seen[gensub(/[^/]+$/,"",1)]++' file
sources/works_dbo/test1/file1.parquet
sources/works_dbo/test2/file3.parquet
The above uses GNU awk for gensub(), with other awks you'd use a variable and sub():
awk '{path=$0; sub(/[^/]+$/,"",path)} !seen[path]++'
It will work for any mixture of any length of paths.
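The portable variant can be tried on the sample paths from the question. This is a small sketch, assuming the input is one path per line (the extra columns printed by hdfs dfs -ls would have to be stripped first). first_per_dir is a hypothetical wrapper name, and the regex is passed as a string so that the slash needs no escaping.

```shell
#!/bin/bash
# first_per_dir: read paths on stdin and keep only the first path seen per directory.
first_per_dir() {
    awk '{ dir = $0; sub("[^/]+$", "", dir) } !seen[dir]++'
}

printf '%s\n' \
    sources/works_dbo/test1/file1.parquet \
    sources/works_dbo/test1/file2.parquet \
    sources/works_dbo/test2/file3.parquet |
    first_per_dir
```

Input order is preserved, because a line is printed the moment its directory is seen for the first time.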
You can use sort -u (unique) with / as the delimiter and using the first three fields as key. The -s option ("stable") makes sure that the file retained is the first one encountered for each subdirectory.
For this input
sources/works_dbo/test1/file1.parquet
sources/works_dbo/test1/file2.parquet
sources/works_dbo/test2/file3.parquet
the result is
$ sort -s -t '/' -k 1,3 -u infile
sources/works_dbo/test1/file1.parquet
sources/works_dbo/test2/file3.parquet
If the subdirectories are of variable length, this awk solution may come in handy:
hdfs dfs -ls -R /sources/works_dbo | awk '
BEGIN { FS = "/"; OFS = "/" }
{
    file = $NF                     # file name is always the last field
    $NF = ""; folder = $0          # drop the last field; what remains is the folder (with trailing /)
    if (!(folder in seen_dirs))    # cache the first file per folder
        seen_dirs[folder] = file
}
END {
    for (f in seen_dirs)           # after all rows are processed, print the cache
        print f seen_dirs[f]       # folder already ends in /, so concatenate
}'
Using Perl:
hdfs dfs -ls -R /sources/works_dbo | grep '.*\.parquet$' | \
perl -MFile::Basename -nle 'print unless $h{ dirname($_) }++'
In the perl command above:
-M loads File::Basename module;
-n causes Perl to apply the expression passed via -e for each input line;
-l strips the line terminator from each input line and adds it back when printing;
$_ is the default variable keeping the currently read line;
dirname($_) returns the directory part for the path specified by $_;
$h is a hash where keys are directory names, and values are integers 0, 1, 2 etc;
the line is printed to the standard output, unless the directory name is seen in the previous iterations, i.e. the hash value $h{ dirname($_) } is non-zero.
By the way, instead of piping the result of hdfs dfs -ls -R via grep, you can use the find command:
hdfs dfs -find /sources/works_dbo -name '*.parquet'

How to escape square brackets in a ls output

I'm having some problems escaping square brackets in file names.
I need to compare two lists. The ls output is the first list and the second is the file ARQ02.
#!/bin/bash
exec 3< <(ls /home/lint)
while read arq <&3; do
    var=`grep -e "$arq" ARQ02`
    if [ "$?" -ne 0 ] ; then
        echo "$arq" >> result
    fi
done
exec 3<&-
Sorry for my bad english.
Your immediate problem is that you must instruct grep to interpret the search term as a literal rather than a regular expression, using the -F option:
var=$(grep -Fe "$arq" ARQ02)
That way, any regex metacharacters that happen to be in the output from ls /home/lint - such as [ and ] - will still be treated as literals and won't break the grep invocation.
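The effect of -F can be seen with a name that contains brackets. This is a minimal sketch: is_listed is a hypothetical helper name, and the temporary list file stands in for ARQ02.

```shell
#!/bin/bash
# is_listed NAME FILE: succeed if NAME occurs literally in some line of FILE.
is_listed() {
    grep -q -F -e "$1" "$2"
}

list=$(mktemp)                         # stand-in for ARQ02
printf '%s\n' 'report[1].txt' > "$list"

if is_listed 'report[1].txt' "$list"; then
    echo "found"
else
    echo "missing"
fi
rm -f "$list"
```

Without -F, the same pattern is read as a regular expression in which [1] is a bracket expression matching the single character 1, so it would match report1.txt but not the bracketed name itself.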
That said, it looks like your command could be streamlined, such as by using the output from ls /home/lint directly as the set of search strings to pass to grep at once, using the -f option:
grep -Ff <(ls /home/lint) ARQ02 > result
<(...) is a so-called process substitution, which, simply put, presents the output from a command as if it were a (temporary) file, which is what -f expects: a file containing the search terms for grep.
Alternatively, if:
the lines of ARQ02 contain only filenames that fully match (some of) the filenames in the output from ls /home/lint, and
you don't mind sorting or want to sort the matches stored in result,
consider HuStmpHrrr's helpful answer.
I have to assume my interpretation is correct. Based on that, a one-liner can solve your problem. There are two assumptions I need to make here: your file names don't contain newlines, and you are using a modern bash:
comm -23 <(printf "%s\n" * | sort) <(sort ARQ02)
In bash, <() runs a command in a subshell and presents its stdout as a file. comm is the command that computes the difference between two sorted input streams.
To explain in detail:
comm
    -23 # suppress lines unique to ARQ02 and lines in common
    <(printf "%s\n" * | # print all the files in the current folder, one per line
    sort) # sort them
    <(sort ARQ02)
Sorting is necessary because comm only compares its inputs incrementally, line by line.
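The one-liner can be wrapped in a function and checked against a throwaway directory. This is a sketch only: not_in_list is a hypothetical name, and the directory and list file are created just for the demonstration. A file name with square brackets needs no special handling, since comm compares whole lines literally.

```shell
#!/bin/bash
# not_in_list DIR LISTFILE: print the names in DIR that are not a line of LISTFILE.
not_in_list() {
    comm -23 <(cd "$1" && printf '%s\n' * | sort) <(sort "$2")
}

demo=$(mktemp -d)
touch "$demo/a[1].txt" "$demo/b.txt" "$demo/c.txt"
list=$(mktemp)                       # stand-in for ARQ02
printf '%s\n' 'b.txt' > "$list"

not_in_list "$demo" "$list"          # lists a[1].txt and c.txt
rm -r "$demo" "$list"
```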
