How to sort and ignore spaces? - linux

I'm trying to sort a file but I can't get the results I want.
I have this file:
742550111 aaa aaa aaa aaa aaa 2008 3 1 1
5816470687 aa a dissertation for the 933 2 2 2
Each field is separated by a tab, and I would like to sort on the second column.
When I try sort test.txt -t\t -k 2, the output is the same as in the file.
But the output I want to have is :
5816470687 aa a dissertation for the 933 2 2 2
742550111 aaa aaa aaa aaa aaa 2008 3 1 1
I think that's because sort ignores the spaces between the words.
So I tried this command: LC_ALL=C sort test.txt -t\t -k 2, but it still doesn't work.
Do you have any ideas?

An unquoted \t is reduced to a plain t by the shell, so sort was splitting on the letter t rather than on tabs. Bash expands $'\t' to a real tab:
LC_ALL=C sort file -t $'\t' -k 2
Output:
5816470687 aa a dissertation for the 933 2 2 2
742550111 aaa aaa aaa aaa aaa 2008 3 1 1
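If your shell is not bash (so $'\t' is not available), a portable alternative is to embed a real tab with printf. A minimal sketch, using the same test.txt:
LC_ALL=C sort -t "$(printf '\t')" -k 2 test.txt
In any POSIX shell, "$(printf '\t')" expands to a literal tab character, so sort receives the tab as its field separator.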

Related

How to compare two columns in the same file and store the differences, along with the unchanged column, in a new file?

Row Actual Expected
1 AAA BBB
2 CCC CCC
3 DDD EEE
4 FFF GGG
5 HHH HHH
I want to compare Actual and Expected and store the differences in a file, like:
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
I have used awk -F, '{if ($2!=$3) {print $1,$2,$3}}' Sample.csv, but it only seems to compare integer values, not strings.
You can use awk to do this:
awk '{if($2!=$3) print $0}' oldfile > newfile
where
$2 and $3 are the second and third columns
!= means the second and third columns do not match
$0 means the whole line
> newfile redirects the output to a new file
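Since the file is named Sample.csv and the question already uses -F,, the same test can also be written with an explicit comma separator. A sketch, assuming comma-separated input with a header row that should be kept:
awk -F, 'NR==1 || $2!=$3' Sample.csv > newfile
NR==1 keeps the header line; every other row is printed only when its second and third fields differ (awk compares strings here, not just numbers).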
I prefer an awk solution (it can handle more fields and is easier to understand), but you could use
sed -r '/\t([^ ]*)\t\1$/d' Sample.csv
which, assuming tab-separated columns, deletes every line whose second and third fields are identical.
Assuming the file uses tabs (or some other delimiter) to separate the columns, tsv-filter from eBay's TSV Utilities supports this type of field comparison directly. For the file above:
$ tsv-filter --header --ff-str-ne 2:3 file.tsv
Row Actual Expected
1 AAA BBB
3 DDD EEE
4 FFF GGG
The --ff-str-ne option compares two fields in a row for non-equal strings.
Disclaimer: I'm the author.

Mainframe Sort - Getting count based on field

My requirement is to get a count of records for each value in the key field.
For example:
AAA 1234
AAA 111
...
AAA 112
BBB 123
BBB 123
...
BBB 333
CCC 333
Output should be:
AAA 2000
BBB 300
CCC 1
I am using the Sort card:
SORT FIELDS=(1,3,CH,A)
OUTFIL REMOVECC,NODETAIL,
SECTIONS=(1,3,TRAILER3=(1,3,X,COUNT=(M10,LENGTH=10)))
But I need the count to be left-justified; currently it is displayed with leading spaces.
How can I make these count results left-justified?

Insert a line before a specific ID and renumber the ID column

A file contains ID, Name, and other columns. I want to insert a row with a name and its details before a specific ID, and the ID column should then be updated so the numbering stays in sequence.
Example
Sample File content:
Header1
Header2
1 AAA ...
2 BBB ...
3 CCC ...
4 XXX ...
5 YYY ...
6 ZZZ ...
Footer
I want to insert MMM ... before ID #4, i.e. before the row 4 XXX ...
Desired output:
Header1
Header2
1 AAA ...
2 BBB ...
3 CCC ...
4 MMM ...
5 XXX ...
6 YYY ...
7 ZZZ ...
Footer
I can do the insert itself with the following command, but I'm not sure how to update the ID column with the proper numbering:
sed '/^\s*4/ i 4 MMM ...' file
Any help solving this would be appreciated.
One option could be:
awk '/^4/ {print ++i, "MMM"} /^[0-9]/ {$1=++i} 1' file
Explanation
/^4/ {print ++i, "MMM"} on the line starting with 4, print MMM preceded by the incremented counter.
/^[0-9]/ {$1=++i} on lines starting with a digit, set the first field to the incremented counter.
1 prints the line.
Test
$ awk '/^4/ {print ++i, "MMM"} /^[0-9]/ {$1=++i} 1' file
Header1
Header2
1 AAA ...
2 BBB ...
3 CCC ...
4 MMM
5 XXX ...
6 YYY ...
7 ZZZ ...
Footer
$ awk '/^4 /{print "4 MMM ..."; inc=1} /^[[:digit:]]/{$1+=inc} 1' file
Header1
Header2
1 AAA ...
2 BBB ...
3 CCC ...
4 MMM ...
5 XXX ...
6 YYY ...
7 ZZZ ...
Footer
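This second command offsets instead of renumbering: it prints the new 4 MMM ... row, then adds 1 to the ID of every following numeric row (before that point inc is still 0, so earlier IDs are untouched). A slightly more general sketch of the same idea, with the target ID passed in as an awk variable (the name target is my own):
awk -v target=4 '$1==target && !done {print target, "MMM ..."; done=1; inc=1} /^[0-9]/ {$1+=inc} 1' file
With target=4 this gives the same output as above; change target to insert before a different ID.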

How to replace the character I want in a line

1 aaa bbb aaa
2 aaa ccccccccc aaa
3 aaa xx aaa
How do I replace the second aaa with yyy on each line, to get:
1 aaa bbb yyy
2 aaa ccccccccc yyy
3 aaa xx yyy
Issuing the following command will solve your problem.
:%s/\(aaa.\{-}\)aaa/\1yyy/g
Another way would be with \zs and \ze, which mark the beginning and end of a match in a pattern. So you could do:
:%s/aaa.*\zsaaa\ze/yyy
In other words, find "aaa" followed by anything and then another "aaa", and replace that with "yyy".
If you have three "aaa"s on a line, this won't work, though, and you should use \{-} instead of *. (See :h non-greedy)
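Outside of Vim, the same edit can be done from the shell. A sketch using sed's numeric occurrence flag, which substitutes only the Nth match on each line:
sed 's/aaa/yyy/2' file
The trailing 2 tells sed to replace only the second occurrence of aaa on every line, leaving the first one alone.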

Add a line counter to lines matching a pattern

I need to prepend a line counter to lines matching specific patterns in a file, while still outputting the lines that do not match these patterns.
For example, if my file looks like this:
aaa 123
bbb 456
aaa 666
ccc 777
bbb 999
and the patterns I want to count are 'aaa' and 'ccc', I'd like to get the following output:
1:aaa 123
bbb 456
2:aaa 666
3:ccc 777
bbb 999
Preferably I'm looking for a Linux one-liner. The shell or tool doesn't matter as long as it's installed by default in most distros.
With awk:
awk '{if ($1=="aaa" || $1=="ccc") {a++; $0=a":"$0}} {print}' file
1:aaa 123
bbb 456
2:aaa 666
3:ccc 777
bbb 999
Explanation
Loop through the lines checking whether the first field is aaa or ccc. If it is, increment the counter a and prepend it to the line ($0). Finally, print the line in every case: if the pattern matched, the line now starts with the counter and a colon; otherwise it is printed unchanged.
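The same idea can be written a little more compactly (a sketch; the regex matches lines whose first word is aaa or ccc):
awk '/^(aaa|ccc) /{$0=++a":"$0} 1' file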
The following approach is in Perl:
use strict;
use warnings;

open my $fh, '<', 'abc.txt' or die "Cannot open abc.txt: $!";
my $incremental_val = 1;
while (my $line = <$fh>) {
    chomp $line;
    # prefix matching lines with the counter, pass everything else through
    if ($line =~ m/^aaa / || $line =~ m/^ccc /) {
        print "$incremental_val : $line\n";
        $incremental_val++;
        next;
    }
    print "$line\n";
}
close $fh;
The output will be as follows.
1 : aaa 123
bbb 456
2 : aaa 666
3 : ccc 777
bbb 999
