Unusual behaviour of linux's sort command - linux

On Linux shell the result of echo -e "arrays2 2\narrays 2\narrays3 2" | sort is
arrays 2
arrays2 2
arrays3 2
and the result of echo -e "arrays2 28\narrays 28\narrays3 28" | sort is
arrays2 28
arrays 28
arrays3 28
Why does the string arrays2 28 appear on the first line in the second case?
Is this a bug, or am I missing something?
I tried this on RHEL4 and Ubuntu 11.04.
Thanks.

The behaviour is locale-dependent:
echo -e "arrays2 28\narrays 28\narrays3 28" | LANG=C sort
prints
arrays 28
arrays2 28
arrays3 28
While
echo -e "arrays2 28\narrays 28\narrays3 28" | LANG=de_DE.UTF-8 sort
prints
arrays2 28
arrays 28
arrays3 28
(Note that the locale must be installed for this to take effect. If the locale doesn't exist, the behaviour will be the same as with LANG=C.)

If you change the locale from en_US.utf8 to the old default, it works the way you expect:
echo -e "aaa\nfoo\narrays2 28\narrays 28\narrays3 28" | LC_ALL=C sort -
aaa
arrays 28
arrays2 28
arrays3 28
foo
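A minimal sketch of the locale effect described in these answers. The helper name sort_bytewise is an assumption made for illustration. It sets LC_ALL rather than LANG because LC_ALL takes precedence over both LANG and LC_COLLATE, so the result should not depend on which of those the caller has exported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort standard input by raw byte value, ignoring the
# collation rules of whatever locale the caller has set.
sort_bytewise() {
    LC_ALL=C sort "$@"
}

# In the C locale a space (0x20) sorts before the digit 2 (0x32),
# so "arrays 28" comes before "arrays2 28".
printf '%s\n' "arrays2 28" "arrays 28" "arrays3 28" | sort_bytewise
```

Running locale in the same shell shows which collation settings are currently in effect.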

Related

converting 4 digit year to 2 digit in shell script

I have a file as follows:
$cat file.txt
1981080512 14 15
2019050612 17 18
2020040912 19 95
Here the 1st column represents dates as YYYYMMDDHH
I would like to write the dates as YYMMDDHH. So the desired output is:
81080512 14 15
19050612 17 18
20040912 19 95
My script:
while read -r x;do
yy=$(echo $x | awk '{print substr($0,3,2)}')
mm=$(echo $x | awk '{print substr($0,5,2)}')
dd=$(echo $x | awk '{print substr($0,7,2)}')
hh=$(echo $x | awk '{print substr($0,9,2)}')
awk '{printf "%10s%4s%4s\n",'$yy$mm$dd$hh',$2,$3}'
done < file.txt
It is printing
81080512 14 15
81080512 17 18
Any help please. Thank you.
Please don't kill me for this simple answer, but what about this:
cut -c 3- file.txt
You simply cut the first two digits by showing character 3 till the end of every line (the -c switch indicates that you need to cut characters (not bytes, ...)).
You can do it with a single call to AWK's substr function. Let the content of file.txt be
1981080512 14 15
2019050612 17 18
2020040912 19 95
then
awk '{$1=substr($1,3);print}' file.txt
output
81080512 14 15
19050612 17 18
20040912 19 95
Explanation: I used the substr function to take the 3rd and subsequent characters of the 1st column and assign them back to that column, then printed the changed line.
(tested in gawk 4.2.1)
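For comparison, a sketch that stays inside bash and uses substring expansion instead of an external tool. The function name strip_century is an assumption for illustration. It also avoids the problem in the question's loop, where the final awk has no input file, so it reads from the loop's standard input and consumes the remaining lines of file.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every line on standard input, drop the first two
# characters of the first field and print the rest of the line unchanged.
strip_century() {
    local stamp rest
    while read -r stamp rest; do
        # ${stamp:2} is the value of stamp starting at offset 2 (third character).
        printf '%s %s\n' "${stamp:2}" "$rest"
    done
}

printf '%s\n' "1981080512 14 15" "2019050612 17 18" "2020040912 19 95" | strip_century
```

With the real data the call would be strip_century < file.txt. Because read splits on whitespace, runs of several spaces between columns are collapsed to one.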

Remove first n "words" from string variable in Bash

I want to remove the first 4 words from my string variable "DATES".
Does someone have a simple solution for this?
Here is my example:
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
WC=$(echo $DATES | wc -w)
DATE_COUNT=$(( $WC / 4 - 1 ))
for i in {0..$DATE_COUNT}
do
YEAR=$(echo $DATES | awk '{print $3}')
MONTH=$(echo $DATES | awk '{print $2}')
MONTH=$( date --date="$(printf "01 %s" $MONTH)" +"%m")
DAY=$(echo $DATES | awk '{print $1}')
TIME=$(echo $DATES | awk '{print $4}' | sed 's/://g')
DATE_ARRAY[$i]="$YEAR$MONTH$DAY$TIME"
#Remove first 4 words from string
done
Use cut.
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
echo $DATES | cut -d' ' -f 5-
Output:
30 May 2021 10:23:01 29 May 2021 10:24:01
You can even use it for a cleaner solution than awk, like this:
YEAR=$(echo $DATES | cut -d' ' -f 3)
General version to remove n first words
remove_n_first_words(){
echo $2 | cut -d' ' -f $(($1+1))-
}
remove_n_first_words 4 "$DATES"
Using bash regex operator =~:
$ [[ $DATES =~ ^(([^ ]+ +){4})(.*) ]] && echo ${BASH_REMATCH[3]}
30 May 2021 10:23:01 29 May 2021 10:24:01
Maybe use read?
DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
read -ra dates <<< "$DATES"; echo "${dates[@]:4}"
Or just store the data in an array directly.
DATES=(31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01)
echo "${DATES[@]:4}"
To get the total number of words/elements, like with wc -w
echo "${#DATES[*]}"
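The pieces above can be combined into a loop that consumes four words per iteration, which is what the question's loop was missing. This is only a sketch: drop_words is a hypothetical helper name, and the month-to-number conversion from the question is left out to keep the example independent of the date command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the words of "$2" after dropping the first "$1".
drop_words() {
    local n=$1
    local -a words
    read -ra words <<< "$2"
    # Slice the array from index n onward and join the words with spaces.
    echo "${words[*]:$n}"
}

DATES="31 May 2021 10:22:01 30 May 2021 10:23:01 29 May 2021 10:24:01"
while [[ -n $DATES ]]; do
    # Take the first four words; anything left over lands in the dummy variable.
    read -r day month year clock _ <<< "$DATES"
    echo "$year $month $day ${clock//:/}"
    DATES=$(drop_words 4 "$DATES")
done
```

The loop ends when drop_words returns an empty string, so no word count has to be computed in advance.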

Illogical number priority in file names in BASH [duplicate]

I wrote a script in BASH, part of which is supposed to take files from a specified directory in numerical order. The files in that directory are named as follows: 1, 2, 3, 4, 5, etc. When I ran this script with 10 files in the directory, something that appears quite illogical to me occurred: the script took the files in a strange order: 10, 1, 2, 3, etc.
How do I make it process the files from the lowest number to the highest?
Also, I am using the following line of code to define loop and path:
for file in /dir/*
Don't know if it matters, but I'm using Fedora 33 as OS.
Directory entries are sorted alphabetically, so "10" comes before "2".
If I list 20 files whose names correspond to the 20 first integers, I get:
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
I can pipe the list through 'sort -n' to sort it numerically rather than alphabetically. The following command:
for i in $(ls | sort -n) ; do echo $i ; done
produces the following output:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
i.e. your command:
for file in /dir/*
should be rewritten (each name is then relative to /dir, and the names must not contain whitespace):
for file in $(ls /dir | sort -n)
If you have GNU sort then use the -V flag.
for file in /dir/* ; do echo "$file" ; done | sort -V
Or store the data in an array.
files=(/dir/*); printf '%s\n' "${files[@]}" | sort -V
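To loop over the sorted names rather than just print them, the sorted list can be read back into an array. The following is a sketch under two assumptions: GNU sort is available (for -V), and no name contains a newline. The helper name natural_order and the sample names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given names one per line in version order.
# Assumes GNU sort (for -V) and names that contain no newlines.
natural_order() {
    printf '%s\n' "$@" | sort -V
}

# Collect the ordered names in an array, then loop over the array so each
# name stays a single word even if it contains spaces.
mapfile -t ordered < <(natural_order file10 file2 file1)
for name in "${ordered[@]}"; do
    echo "processing $name"
done
```

With real files the call would be natural_order /dir/* in place of the three sample names.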
As an aside, if you have the option and work once ahead of time is preferable to sorting every time, you could also format the names of your directories with leading zeroes. This is frequently a better design when possible.
I made both for some comparisons.
$: echo [0-9][0-9]/ # perfect list based on default string sort
00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/ 16/ 17/ 18/ 19/ 20/
That also filters out any non-numeric names, and any non-directories.
$: for d in [0-9][0-9]/; do echo "${d%/}"; done
00
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
If I show both single- and double-digit versions (I made both)
$: shopt -s extglob
$: echo @(?|??)
0 00 01 02 03 04 05 06 07 08 09 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
Only the single-digit versions without leading zeroes get out of order.
The shell sorts the names by the locale order (not necessarily the byte value) of each individual character. Anything that starts with 1 will go before anything that starts with 2, and so on.
There are two main ways to tackle your problem:
sort -n (numeric sort) the file list, and iterate that.
Rename or recreate the target files (if you can), so all numbers are the same length (in bytes/characters). Left pad shorter numbers with 0 (eg. 01). Then they'll expand like you want.
Using sort (properly):
mapfile -td '' myfiles < <(printf '%s\0' * | sort -zn)
for file in "${myfiles[@]}"; do
# what you were going to do
done
sort -z for zero/null terminated lines is common but not POSIX. It makes processing paths/data that contain newlines safe. Without -z:
mapfile -t myfiles < <(printf '%s\n' * | sort -n)
# Rest is the same.
Rename the target files:
#!/bin/bash
cd /path/to/the/number/files || exit 1
# Gets length of the highest number. Or you can just hardcode it.
length=$(printf '%s\n' * | sort -n | tail -n 1)
length=${#length}
for i in *; do
mv -n "$i" "$(printf "%.${length}d" "$i")"
done
Examples for making new files with zero padded numbers for names:
touch {000..100} # Or
for i in {000..100}; do
> "$i"
done
If it's your script that made the target files, something like $(printf %.Nd [file]) can be used to left pad the names before you write to them. But you need to know the length in characters of the highest number first (N).
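A small sketch of the padding step on its own. The helper name pad_number is an assumption for illustration. It uses a zero-padded field width (%0*d) rather than the precision form (%.Nd) shown above; for non-negative integers both give the same result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: left-pad a non-negative decimal number with zeros.
# $1 is the target width, $2 is the number.
pad_number() {
    # 10# forces base 10, so input such as 08 is not rejected as invalid octal.
    printf '%0*d\n' "$1" "$((10#$2))"
}

for n in 1 9 10 100; do
    pad_number 3 "$n"
done
```

Numbers that are already wider than the requested width are printed unchanged, so the width has to be chosen from the largest number in the set.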

how can i cut off the strings from an output in Bash shell?

The command I run is as follows:
rpm -qi setup | grep Install
The output of the command:
Install Date: Do 30 Jul 2020 15:55:28 CEST
I would like to trim this output further so that only the following remains:
30 Jul 2020
The rest of the output should not be displayed.
What is the simplest way to get this result in bash?
Use grep -Po like so (-P = use Perl regex engine, and -o = print just the match, not the entire line):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | grep -Po '\d{1,2}\s+\w{3}\s+\d{4}'
You can also use cut like so (-d' ' = split on single spaces, -f4-6 = print fields 4 through 6):
echo 'Install Date: Do 30 Jul 2020 15:55:28 CEST' | cut -d' ' -f4-6
Output:
30 Jul 2020
You can do it using just rpm's --queryformat option and bash's printf:
$ printf '%(%d %b %Y)T\n' $(rpm -q --queryformat '%{INSTALLTIME}\n' setup)
29 Apr 2020
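Another option, sketched here with bash's read builtin only: split the line into words and keep the fourth to sixth. The function name extract_date is an assumption for illustration, and the sketch assumes the line has the same word layout as the output shown in the question. Unlike cut -d' ', read treats a run of several spaces as one separator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the day, month and year words out of a line
# shaped like "Install Date: Do 30 Jul 2020 15:55:28 CEST".
extract_date() {
    local day month year
    # The first three words and everything after the year go to the dummy "_".
    read -r _ _ _ day month year _ <<< "$1"
    echo "$day $month $year"
}

extract_date 'Install Date: Do 30 Jul 2020 15:55:28 CEST'
```

With the real command the argument would be "$(rpm -qi setup | grep Install)".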

Using variables with sed

I'm trying to delete a part of a file using sed in Linux (Ubuntu). Specifically, I want to delete the first lines of a log file until the first occurrence of the current system date (using the pattern '10 Jan 13').
So, I store the date in a variable
root@server:/# VAR_DATE=`date -R | cut -c6-11`
And after that, I use sed
root@server:/# cat log_file.txt | sed -n -e '/$VAR_DATE/,$p'
But it doesn't work. I've tried a lot of combinations with the same result:
root@server:/# cat log_file.txt | sed -n -e '/"$VAR_DATE"/,$p'
root@server:/# cat log_file.txt | sed -n -e '/"${VAR_DATE}"/,$p'
root@server:/# cat log_file.txt | sed -n -e "/$VAR_DATE/,$p"
What am I doing wrong?
Use double quotes so that the shell expands the variable $vardate, and escape the last $ so that the shell does not try to expand $p: sed -n "/$vardate/,\$p" file
$ cat file
6 Jan 13
7 Jan 13
8 Jan 13
9 Jan 13
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
$ vardate="10 Jan 13"
$ sed -n "/$vardate/,\$p" file
10 Jan 13
11 Jan 13
12 Jan 13
13 Jan 13
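If the variable might contain characters that are special to sed (a slash, a dot, a bracket), one alternative is to pass it to awk as data instead of splicing it into a sed script. This is a sketch of a different technique, not the sed approach above: the function name print_from_first is an assumption, and the text is matched as a literal substring with index() rather than as a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy standard input to standard output, starting at
# the first line that contains the literal string given as "$1".
print_from_first() {
    # -v hands the string to awk as a variable, so the shell quoting of the
    # awk program itself never changes. Note that awk still interprets
    # backslash escapes inside the -v value.
    awk -v needle="$1" 'index($0, needle) { found = 1 } found'
}

vardate="10 Jan 13"
printf '%s\n' "8 Jan 13" "9 Jan 13" "10 Jan 13" "11 Jan 13" | print_from_first "$vardate"
```

With the log from the question the call would be print_from_first "$VAR_DATE" < log_file.txt.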

Resources