How do I access or sort the files of a directory in an order other than alphabetical? - linux

I run the following commands in Linux on a PDF file to convert its pages to image files. The commands run over the same PDF twice, once for the top part of each page (-H) and once for the bottom part (-y):
pdftoppm -H 700 -f 30 -l 40 -png rl.pdf top
pdftoppm -y 700 -f 30 -l 40 -png rl.pdf bottom
The output (the list of generated files) would be:
bottom-001.png
bottom-002.png
top-001.png
top-002.png
However, I want to access and process them in the following order (for ffmpeg):
top-001.png
bottom-001.png
top-002.png
bottom-002.png
To reach this goal, you may either suggest another way of naming the output files or a script that sorts the output files into this order.

sort -n -t- -s -k2
Sort numerically on the second field, using - as the field separator; the -s flag makes the sort stable, so lines with the same page number keep their input order (list the top files first in the input if you want top above bottom).
Alternatively sort the first field in reverse:
sort -t- -k2n -k1r
For example the following command:
echo 'bottom-001.png
bottom-002.png
top-001.png
top-002.png' | sort -t- -k2n -k1r
outputs:
top-001.png
bottom-001.png
top-002.png
bottom-002.png
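If the goal is to feed the frames to ffmpeg in exactly this order, one possible follow-up is to turn the sorted list into a playlist for ffmpeg's concat demuxer. This is only a sketch, not part of the answer above: the frames.txt name and the 1-second duration per frame are arbitrary, and it assumes the file names contain no whitespace.
for f in $(ls *-*.png | sort -t- -k2n -k1r); do
    printf "file '%s'\nduration 1\n" "$f"   # one concat-demuxer entry per frame
done > frames.txt
ffmpeg -f concat -i frames.txt -pix_fmt yuv420p out.mp4
The relative paths in frames.txt are resolved relative to the list file, so run this from the directory containing the images.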

Another solution in this case is adding a suffix (in alphabetical order) to the output files of each command and moving them to a new directory:
mkdir -p out   # the target directory must exist before the mv below
pdftoppm -H 450 -f 30 -l 40 -png rl.pdf page
for file in *.png; do
mv "$file" "out/${file%.png}_a.png"
done
pdftoppm -y 700 -f 30 -l 40 -png rl.pdf page
for file in *.png; do
mv "$file" "out/${file%.png}_b.png"
done
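As a possible follow-up (not part of the original answer): with the _a/_b suffixes, plain alphabetical order is already the desired frame order, so ffmpeg can read the files directly with a glob pattern. This assumes an ffmpeg build with glob support; the frame rate is arbitrary.
ffmpeg -framerate 1 -pattern_type glob -i 'out/*.png' -pix_fmt yuv420p out.mp4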

Variants of sorting with the ls command
$ ls --help
...
  -r, --reverse              reverse order while sorting
  -S                         sort by file size, largest first
      --sort=WORD            sort by WORD instead of name: none (-U), size (-S),
                               time (-t), version (-v), extension (-X)
  -t                         sort by modification time, newest first
  -u                         with -lt: sort by, and show, access time;
                               with -l: show access time and sort by name;
                               otherwise: sort by access time, newest first
  -U                         do not sort; list entries in directory order
  -v                         natural sort of (version) numbers within text
  -X                         sort alphabetically by entry extension
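For example, with GNU ls:
ls -lt     # sort by modification time, newest first
ls -ltr    # reverse: newest last, handy for spotting what just changed
ls -lS     # sort by file size, largest first
ls -v      # natural sort: file2.png comes before file10.png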

Related

Linux commands to get Latest file depending on file name

I am new to Linux. I have a folder with many files in it and I need to get the latest file depending on the file name. Example: I have 3 files RAT_20190111.txt RAT_20190212.txt RAT_20190321.txt. I need a Linux command to move the latest file here, RAT_20190321.txt, to a specific directory.
If the file name pattern remains the same, then you can try the below command:
mv $(ls RAT*|sort -r|head -1) /path/to/directory/
As pointed out by @wwn, there is no need to use sort: since the file names are lexicographically sortable, ls should already do the job of sorting them, so the command becomes:
mv $(ls RAT*|tail -1) /path/to/directory
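For example, with the three files from the question:
$ ls RAT*
RAT_20190111.txt  RAT_20190212.txt  RAT_20190321.txt
$ ls RAT* | tail -1
RAT_20190321.txt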
The following command works.
ls -p | grep -v '/$' | sort | tail -n 1 | xargs -d '\n' -r mv -t /path/to/directory --
The command lists the entries (ls -p marks directories with a trailing /, which grep -v '/$' filters out), sorts them, takes the last one and moves it to the required directory (mv -t names the target directory up front, so xargs can append the file name at the end).
Hope it helps.
Use the below command:
cp "$(ls | tail -n 1)" /data...

Why does du give different results?

I use the du command very often. Recently, I used it to find the 5 largest files in the root directory of my server with the following command:
sudo du -ah / | sort -nr | head -n 5
Result was:
1016K /var/cache/apt/archives/fonts-dejavu-core_2.35-1_all.deb
1016K /bin/bash
1008K /usr/src/linux-aws-headers-4.4.0-1052/fs
1008K /usr/src/linux-aws-headers-4.4.0-1049/fs
1004K /var/awslogs/lib/python2.7/site-packages/botocore/data/ec2/2016-09-15/
I then removed -h, and observed an entirely different result:
sudo du -a / | sort -nr | head -n 5
Result:
2551396 /
1189240 /usr
894000 /var
541836 /usr/lib
406276 /var/lib
From the man page of du,
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
According to my understanding, including or excluding -h should not change the results, only the size format.
Could you help me understand why this would happen?
1016K is numerically greater than 2M: sort -n just extracts the leading numbers (1016 and 2) and ignores the suffix.
Try sort -h if your sort supports it.
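For example, assuming a GNU coreutils sort that supports -h (human-numeric sort):
sudo du -ah / | sort -hr | head -n 5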
The sort command does not take the K/M/G suffixes into account; it only compares the leading numbers.
With -h, your directories' sizes are probably something like 2G or 4G while the files stay at 1016K or 1008K, and 2 is smaller than 1016, so the directories get sorted towards the end of the list.
Without -h, the total sizes of directories (actually: directory trees) are clearly greater than the sizes of the files inside them, so the directories are sorted to the top of the list.

grep - limit number of files read

I have a directory with over 100,000 files. I want to know if the string "str1" exists as part of the content of any of these files.
The command:
grep -l 'str1' *
takes too long, as it reads all of the files.
How can I ask grep to stop reading any further files if it finds a match? Any one-liner?
Note: I have tried grep -l 'str1' * | head but the command takes just as much time as the previous one.
Naming 100,000 files in your command arguments is going to cause a problem; it probably exceeds the maximum length of a command line.
But you don't have to name all the files if you use the recursive option with just the name of the directory the files are in (which is . if you want to search files in the current directory):
grep -l -r 'str1' . | head -1
Use grep -m 1 so that grep stops after finding the first match in a file. It is extremely efficient for large text files.
grep -m 1 str1 * /dev/null | head -1
If there is only a single file, the /dev/null above ensures that grep still prints the file name in the output.
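With GNU grep, the -H option is an alternative to the /dev/null trick; it forces the file name to be printed even when only one file is searched (a side note, not part of the original answer):
grep -m 1 -H 'str1' * | head -1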
If you want to stop after finding the first match in any file:
for file in *; do
if grep -q -m 1 str1 "$file"; then
echo "$file"
break
fi
done
The for loop also saves you from the "argument list too long" issue when you have a directory with a large number of files, because the glob is expanded inside the shell rather than passed as arguments to an external command.
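A variant sketch, assuming GNU find and grep: let find supply the file names in batches (which also sidesteps the argument-list limit) and take the first reported match:
find . -maxdepth 1 -type f -exec grep -l 'str1' {} + | head -n 1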

Merge sort gzipped files

I have 40 files of 2GB each, stored on an NFS architecture. Each file contains two columns: a numeric id and a text field. Each file is already sorted and gzipped.
How can I merge all of these files so that the resulting output is also sorted?
I know sort -m -k 1 should do the trick for uncompressed files, but I don't know how to do it directly with the compressed ones.
PS: I don't want the simple solution of uncompressing the files to disk, merging them, and compressing them again, as I don't have enough disk space for that.
This is a use case for process substitution. Say you have two files to sort, sorta.gz and sortb.gz. You can give the output of gunzip -c FILE.gz to sort for both of these files using the <(...) shell operator:
sort -m -k1 <(gunzip -c sorta.gz) <(gunzip -c sortb.gz) >sorted
Process substitution substitutes a command with a file name that represents the output of that command, and is typically implemented with either a named pipe or a /dev/fd/... special file.
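For instance (a bash-specific aside), you can see the file name that process substitution generates by echoing it:
echo <(true)    # prints something like /dev/fd/63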
For 40 files, you will want to create the command with that many process substitutions dynamically, and use eval to execute it:
cmd="sort -m -k1 "
for input in file1.gz file2.gz file3.gz ...; do
cmd="$cmd <(gunzip -c '$input')"
done
eval "$cmd" >sorted # or eval "$cmd" | gzip -c > sorted.gz
#!/bin/bash
FILES=file*.gz # list of your 40 gzip files
# (e.g. file1.gz ... file40.gz)
WORK1="merged.gz" # first temp file and the final file
WORK2="tempfile.gz" # second temp file
> "$WORK1" # create empty final file
> "$WORK2" # create empty temp file
gzip -qc "$WORK2" > "$WORK1" # compress content of empty second
# file to first temp file
for I in $FILES; do
echo current file: "$I"
sort -k 1 -m <(gunzip -c "$I") <(gunzip -c "$WORK1") | gzip -c > "$WORK2"
mv "$WORK2" "$WORK1"
done
The easiest way to fill $FILES is with bash globbing (file*.gz) or with a whitespace-separated list of the 40 filenames. The input files in $FILES stay unchanged.
In the end, the 80 GB of data are compressed in $WORK1. While this script runs, no uncompressed data are written to disk.
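As an optional sanity check (a sketch, not part of the answer), you can verify that the merged stream really is sorted on the merge key without writing uncompressed data to disk:
gunzip -c "$WORK1" | sort -c -k 1 && echo "merged output is sorted"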
Adding a differently flavoured multi-file merge within a single pipeline: it takes all (pre-sorted) files in $OUT/uniques, sort-merges them and compresses the output; lz4 is used because of its speed:
find $OUT/uniques -name '*.lz4' |
awk '{print "<( <" $0 " lz4cat )"}' |
tr "\n" " " |
(echo -n sort -m -k3b -k2 " "; cat -; echo) |
bash |
lz4 \
> $OUT/uniques-merged.tsv.lz4
It is true that there are zgrep and other common utilities that work with compressed files, but in this case you need to sort/merge the uncompressed data and compress the result.

Linux sorting "ls -al" output by date

I want to sort the output of the "ls -al" command according to date. I am able to do that easily for one column with the command:
$ ls -al | sort -k6 -M -r
But how do I do it for both columns 6 and 7 simultaneously? The command:
$ ls -al | sort -k6 -M -r | sort -k7 -r
prints out results I do not understand.
The final goal would be to see all the files ordered from the most recently modified (or vice versa).
Here is an example of the data to be sorted and the command used: [screenshot omitted]
With sort, if you specify -k6, the key starts at field 6 and extends to the end of the line. To truncate it and only use field 6, you should specify -k6,6. To sort on multiple keys, just specify -k multiple times. Also, you need to apply the M modifier only to the month, and the n modifier to the day. So:
ls -al | sort -k 6,6M -k 7,7n -r
Do note Charles' comment about abusing ls though. Its output cannot be reliably parsed. A good demonstration of this is that the image you've posted shows the month/date in columns 4 and 5, so it's not clear why you want to sort on columns 6 and 7.
The final goal would be to see all the files from the most recently modified
ls -t
or (for reverse, most recent at bottom):
ls -tr
The ls man page describes this in more detail, and lists other options.
You could try ls -lsa -it -r

Resources