Bash script sorting of lines in file - linux

Hi, I'm writing a bash script. I need to remove blank lines and sort the numbers in ascending order. Can anyone help?
for line in $(sed '/^$/d' "$userInput"); do
    myarray[$index]="$line"
    index=$((index + 1))
done
Currently I'm using the code above to remove the blank lines, but I am not able to sort the numbers.
$userInput is the file. The file contains a few lines of numbers, e.g.
1,4,5,6,2,3

If you have perl, and assuming index is zero-based, you could do:
declare -a myarray=($(
perl -F, -laE '/\S/ and say join ",", sort { $a <=> $b } map { $_ + 0 } @F' "$userInput"
))
index=${#myarray[@]}
The Perl script does:
read input from file $userInput
-F, : set comma as the delimiter for the autosplit option
-l : don't treat the newline character as part of the line
-a : autosplit input lines into array @F
-E : program follows
/\S/ and : require that the input line contains non-whitespace (if not true, the following command is skipped)
map { $_ + 0 } @F : convert the elements of @F to numbers (i.e. strip whitespace) (denote the result as @r1)
sort { $a <=> $b } @r1 : sort the elements of @r1 numerically (denote the result as @r2)
join ",", @r2 : construct a comma-delimited string from the elements of @r2 (denote the result as @r3)
say @r3 : output @r3 with a trailing newline
The lines output by the Perl script are used as elements of a new bash array myarray. Finally, we set index from the number of elements in myarray.
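If Perl is not an option, the same per-line result can be had from coreutils alone. A minimal sketch, assuming a GNU userland (tr, sort, paste) and bash 3.1+ for the += array append:
declare -a myarray=()
while IFS= read -r line; do
    [[ $line =~ [^[:space:]] ]] || continue   # skip blank/whitespace-only lines
    # split on commas, sort numerically, re-join with commas
    myarray+=( "$(tr ',' '\n' <<<"$line" | sort -n | paste -sd, -)" )
done < "$userInput"
index=${#myarray[@]}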

Related

awk command to split filename based on substring

I have a directory in which the file names are like:
Abc_def_ijk.txt-1
Abc_def_ijk.txt-2
Abc_def_ijk.txt-3
Abc_def_ijk.txt-4
Abc_def_ijk.txt-5
Abc_def_ijk.txt-6
Abc_def_ijk.txt-7
Abc_def_ijk.txt-8
Abc_def_ijk.txt-9
I'd like to divide them into 4 variables as below:
v1=Abc_def_ijk.txt-1,Abc_def_ijk.txt-5,Abc_def_ijk.txt-9
v2=Abc_def_ijk.txt-2,Abc_def_ijk.txt-6
v3=Abc_def_ijk.txt-3,Abc_def_ijk.txt-7
v4=Abc_def_ijk.txt-4,Abc_def_ijk.txt-8
If the number of files increases, the extra files should go into the above variables in the same fashion. I'm looking for an awk one-liner to achieve this.
I would do it using GNU AWK in the following way. Let file.txt content be
Abc_def_ijk.txt-1
Abc_def_ijk.txt-2
Abc_def_ijk.txt-3
Abc_def_ijk.txt-4
Abc_def_ijk.txt-5
Abc_def_ijk.txt-6
Abc_def_ijk.txt-7
Abc_def_ijk.txt-8
Abc_def_ijk.txt-9
then
awk '{arr[NR%4]=arr[NR%4] "," $0}END{print substr(arr[1],2);print substr(arr[2],2);print substr(arr[3],2);print substr(arr[0],2)}' file.txt
output
Abc_def_ijk.txt-1,Abc_def_ijk.txt-5,Abc_def_ijk.txt-9
Abc_def_ijk.txt-2,Abc_def_ijk.txt-6
Abc_def_ijk.txt-3,Abc_def_ijk.txt-7
Abc_def_ijk.txt-4,Abc_def_ijk.txt-8
Explanation: I store lines in array arr and decide where to put a given line based on the line number (NR) modulo (%) four (4). I concatenate what is currently stored (an empty string if nothing so far) with , and the content of the current line ($0); this results in a leading comma, which I remove using the substr function, i.e. by starting at the 2nd character.
(tested in GNU Awk 5.0.1)
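The one-liner prints the four groups; to land them in variables v1 through v4 as the question asks, one option (a sketch assuming bash 4+ for readarray) is:
readarray -t parts < <(awk '{arr[NR%4]=arr[NR%4] "," $0}
    END{for (i=1; i<=4; i++) print substr(arr[i%4], 2)}' file.txt)
v1=${parts[0]} v2=${parts[1]} v3=${parts[2]} v4=${parts[3]}
echo "$v1"   # Abc_def_ijk.txt-1,Abc_def_ijk.txt-5,Abc_def_ijk.txt-9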

How do I concatenate each line of 2 variables in bash?

I have 2 variables, NUMS and TITLES.
NUMS contains the string
1
2
3
TITLES contains the string
A
B
C
How do I get output that looks like:
1 A
2 B
3 C
paste -d' ' <(echo "$NUMS") <(echo "$TITLES")
Having multi-line strings in variables suggests that you are probably doing something wrong. But you can try
paste -d ' ' <(echo "$nums") - <<<"$titles"
The basic syntax of paste is to read two or more file names; you can use a command substitution to replace a file anywhere, and you can use a here string or other redirection to receive one of the "files" on standard input (where the file name is then conventionally replaced with the pseudo-file -).
The default column separator from paste is a tab; you can replace it with a space or some other character with the -d option.
You should avoid upper case for your private variables; see also Correct Bash and shell script variable capitalization
Bash variables can contain even very long strings, but this is often clumsy and inefficient compared to reading straight from a file or pipeline.
Convert them to arrays, like this:
NUMS=($NUMS)
TITLES=($TITLES)
Then loop over the indexes of either array, let's say NUMS, like this:
for i in "${!NUMS[@]}"; do
    # echo the desired output
    echo "${NUMS[$i]} ${TITLES[$i]}"
done
Awk alternative:
awk 'FNR==NR { map[FNR]=$0; next } { print map[FNR] " " $0 }' <(echo "$NUMS") <(echo "$TITLES")
For the first file/variable (FNR==NR), set up an array called map with the per-file record number (FNR) as the index and the line as the value. Then, for the second file, print the entry in the array as well as the line, separated by a space.
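Using the sample values from the question, the two-file idiom behaves like this:
NUMS=$'1\n2\n3'
TITLES=$'A\nB\nC'
awk 'FNR==NR { map[FNR]=$0; next } { print map[FNR] " " $0 }' \
    <(echo "$NUMS") <(echo "$TITLES")
# 1 A
# 2 B
# 3 C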

How do I add a new column with a specific word to a file in linux?

I have a file with one column containing 2059 ID numbers.
I want to add a second column with the word 'pop1' for all the 2059 ID numbers.
The second column will just mean that the ID number belongs to population 1.
How can I do this in linux using awk or sed?
The file currently has one column which looks like this
45958
480585
308494
I want it to look like:
45958 pop1
480585 pop1
308494 pop1
Maybe not the most elegant solution, and it doesn't use sed or awk, but I would do this:
while read -r line; do echo "$line pop1" >> newfile; done < test
This command appends to the file 'newfile', so be sure that it is empty or does not exist before executing the command.
Here is the resource I used, on reading a file line by line : https://www.cyberciti.biz/faq/unix-howto-read-line-by-line-from-file/
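For reference, since the question asks for awk or sed specifically, either of these standard one-liners does the same job (file names test and newfile taken from the answer above):
awk '{print $0, "pop1"}' test > newfile
sed 's/$/ pop1/' test > newfile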
A Perl solution.
$ perl -lpi -e '$_ .= " pop1"' your-file-name
Command line options:
-l : remove newline from input and replace it on output
-p : put each line of input into $_ and print $_ at the end of each iteration
-i : in-place editing (overwrite the input file)
-e : run this code for each line of the input
The code ($_ .= " pop1") just appends your string to the input record.

Merge values for same key

Is it possible to use awk to merge the values of the same key into one row?
For instance
a,100
b,200
a,131
a,102
b,203
b,301
Can I convert them to a file like this:
a,100,131,102
b,200,203,301
You can use awk like this:
awk -F, '{a[$1] = a[$1] FS $2} END{for (i in a) print i a[i]}' file
a,100,131,102
b,200,203,301
We use -F, to set comma as the delimiter and use array a to keep the aggregated values.
Reference: Effective AWK Programming
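One caveat: the iteration order of for (i in a) is unspecified in POSIX awk, so the rows may come out in any order. With GNU awk you can force sorted keys; a sketch using gawk's PROCINFO["sorted_in"]:
awk -F, '{a[$1] = a[$1] FS $2}
    END{PROCINFO["sorted_in"] = "@ind_str_asc"; for (i in a) print i a[i]}' file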
If Perl is an option,
perl -F, -lane '$a{$F[0]} = "$a{$F[0]},$F[1]"; END{for $k (sort keys %a){print "$k$a{$k}"}}' file
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code
-F autosplit modifier, in this case splits on ,
@F is the array of words in each line, indexed starting with $F[0]
$F[0] is the first element in @F (the key)
$F[1] is the second element in @F (the value)
%a is a hash which stores a string containing all matches of each key
tl;dr
If you presort the input, it is possible to use sed to join the lines, e.g.:
sort foo | sed -nE ':a; $p; N; s/^([^,]+)([^\n]+)\n\1/\1\2/; ta; P; s/.+\n//; ba'
A bit more explanation
The above one-liner can be saved into a script file. See below for a commented version.
parse.sed
# A goto label
:a
# Always print when on the last line
$p
# Read one more line into pattern space and join the
# two lines if the key fields are identical
N
s/^([^,]+)([^\n]+)\n\1/\1\2/
# Jump to label 'a' and redo the above commands if the
# substitution command was successful
ta
# Assuming sorted input, we have now collected all the
# fields for this key, print it and move on to the next
# key
P
s/.+\n//
ba
The logic here is as follows:
Assume sorted input.
Look at two consecutive lines. If their key fields match, remove the key from the second line and append the value to the first line.
Repeat 2. until key matching fails.
Print the collected values and reset to collect values for the next key.
Run it like this:
sort foo | sed -nEf parse.sed
Output:
a,100,102,131
b,200,203,301
With datamash
$ datamash -st, -g1 collapse 2 <ip.txt
a,100,131,102
b,200,203,301
From manual:
-s, --sort
sort the input before grouping; this removes the need to manually pipe the input through 'sort'
-t, --field-separator=X
use X instead of TAB as field delimiter
-g, --group=X[,Y,Z]
group via fields X,[Y,Z]
collapse
comma-separated list of all input values
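Equivalently, if you drop -s you presort yourself, which mirrors the sed approach above:
sort -t, -k1,1 ip.txt | datamash -t, -g1 collapse 2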

Convert data into desired form using linux

I have data in a tab separated file in the following form (filename.tsv):
#a 0 Espert A trius
#b 9 def J
I want to convert the data into the following form (I am introducing <abc> in every second line):
##<a>
<0 Espert> <abc> <A trius>.
##<b>
<9 def> <abc> <J>.
I am introducing <abc> in every second line. I know how to do the same in Python using the csv module, but I am trying to learn linux commands. Is there a way to do the same in the linux terminal using commands like grep?
awk seems like the right tool for the job:
awk '{
  printf "##<%s>\n<%s %s> <abc> <%s%s%s>.\n",
    substr($1,2),
    $2,
    $3,
    $4,
    (length($5) ? " " : ""),
    $5
}' filename.tsv
awk loops over all lines in the input file and breaks each line into fields by runs of tabs and/or spaces; $1 refers to the first field, $2, to the second, ...
printf functions the same as in C: a format (template) string containing placeholders is followed by corresponding arguments to substitute for the placeholders.
substr($1,2) returns the substring of the 1st field starting at the 2nd character (i.e., a for the 1st line, b for the 2nd) - note that indices in awk are 1-based.
(length($5) ? " " : "") is a C-style ternary expression that returns a single space if the 5th field is nonempty, and an empty string otherwise.
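Run against the sample filename.tsv above, this prints exactly the requested form:
##<a>
<0 Espert> <abc> <A trius>.
##<b>
<9 def> <abc> <J>.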
