BASH: order strings by their last "field" (after the "/" symbol)

I'm looking for a method to sort lines alphabetically by their last "field".
So, if my output (maybe from a grep command) is:
mike/downloads.png
mike/public/system.png
mike/root/alphabet.png
the result should be:
root/alphabet.png
downloads.png
public/system.png
beacuse "alphabet" , "downloads" and "system" are order alphabetically.
should I firts cut and sort them with " cut -f2 -d"/" | sort " ? and then merge the rest of the path?
or there is an easier way?
Any helps will be appreciated.
Thanks
(example modified)

Sort has a -t parameter to specify the field delimiter, and -k to specify the field to sort on, so you can write:
sort -t/ -k 3

Thank you all! I have finally found what I was looking for:
first
awk -F'/' '{print $NF,$0}'
then
sort
and finally
sed -n 's/[^/]*\///p'
and the output will be
folder/file.png
file.png
folder/folder2/file.png
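Chained together, those three steps form a single pipeline (with file standing in for whatever produces the listing). Note that the final sed strips everything up to and including the first /, which is also why the first path component disappears from the output:
awk -F'/' '{print $NF,$0}' file | sort | sed -n 's/[^/]*\///p'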

As the number of fields is dynamic, you could prepend the last field to the start of the line before sorting and remove it afterwards:
$ awk -F'/' '{print $NF,$0}' file | sort | awk '{print $2}'
mike/root/alphabet.png
mike/downloads.png
mike/public/system.png
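If the paths may contain spaces, the trailing awk '{print $2}' would only print up to the first space. A sketch of the same idea using a tab as the temporary separator (assuming the paths themselves contain no tabs):
awk -F'/' '{print $NF "\t" $0}' file | sort | cut -f2-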

This specifies the third field with a field delimiter of /
sort -t'/' -k 3
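As a quick check on the sample paths (which have exactly three /-separated fields); bear in mind the key position is tied to the path depth, which is why the prepend-the-key approach above is more general:
printf 'mike/public/system.png\nmike/root/alphabet.png\n' | sort -t'/' -k 3
mike/root/alphabet.png
mike/public/system.png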

Related

Identify duplicate lines in a file that have 2 fields using linux command line

I have a file composed of 2 fields that contains a long list of entries, where the first field is the ID and the second field is a counter.
What I want is to display the duplicated IDs.
example of the file:
tXXXXXXXXXX 12345
tXXXXXXXXXX 53321
tXXXXXXXXXXXX 422642
I know the logic of how to solve this problem (I need to iterate or loop over the file), but I do not know how to write the syntax of the command.
I will appreciate any help.
You can use this:
perl -ne '++$i;print $i," ",$_ if $line{$_}++' FILENAME
If you mean you just want a list of duplicate IDs in the file, then this can be easily achieved with cut, sort and uniq.
cat <filename> | cut -f1 -d ' ' | sort | uniq -d
If you want to print all the lines with duplicate IDs on, the below can be used:
FILE=/tmp/sdhjkhsfds ; for dupe in $(cat $FILE | cut -f1 -d ' ' | sort | uniq -d); do cat $FILE | grep $dupe | awk '{print $1, $2}'; done
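A single-pass awk alternative is sketched below; like the commands above, it assumes the ID is the first whitespace-separated field, and it prints each ID that occurs more than once:
awk '{count[$1]++} END {for (id in count) if (count[id] > 1) print id}' FILENAME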

Can't make pipe operator function properly

I'm trying to get the second column of a file, get the first 10 results and sort it in alphanumerical order but it doesn't seem to work.
cut -f2 file.txt | head -10 | sort -d
I get this output:
NM_000242
NM_000525
NM_001005850
NM_001136557
NM_001204426
NM_001204836
NM_001271762
NM_001287216
NM_006952
NM_007253
If I sort the file first and get the first 10 lines of the sorted file it works
cut -f2 refGene.txt | sort -d | head -10
I get this output:
NM_000014
NM_000015
NM_000016
NM_000017
NM_000018
NM_000019
NM_000020
NM_000021
NM_000022
NM_000023
I don't want to sort the file and get the sorted result, I'd like to get the first 10 lines first and then sort them in alphanumerical order. What did I miss here?
Thanks
Well, it works correctly: NM_000525 is before NM_001005850, and the latter is before NM_006952.
But if you need to sort the second part (after the _) numerically, then you can do:
cut -f2 file.txt | head -10 | sort -t_ -k1,1 | sort -s -t_ -k2 -n
-s is a stable sort
Assuming the format is the same in the whole file (two letters, an underscore, then numbers).
EDIT: An even shorter version would be:
cut -f2 file.txt | head -10 | sort -t_ -k1,1 -k2n
Explanation:
-t_ uses _ as the field separator (for selecting which field to sort on)
-k1,1 sorts alphabetically on the first field only (without the ,1 the key would extend through the second field as well)
-k2n sorts numerically on the second field
So it first sorts by the first field (alphabetically) and then by the second field (numerically, converting the string to a number and sorting on that).
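As a quick check, piping three of the identifiers above through the numeric variant shows the difference from plain lexicographic order (on GNU sort, sort -V would give a similar "natural" ordering here):
printf 'NM_001005850\nNM_006952\nNM_000525\n' | sort -t_ -k1,1 -k2n
NM_000525
NM_006952
NM_001005850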

Get last n characters of one field and complete second field of a string in Linux

I have 2 lines in a file:
MUMBAI,918889986665,POSTPAID,CRBT123,CRBT,SYSTEM,151004,MONTHLY,160201,160302
MUMBAI,912398456781,POSTPAID,SEGP,SEGP30,SMS,151004,MONTHLY,160201,160302
I want to cut fields 2 and 4 in the above lines. The condition is: from field 2, I need only the last ten digits.
Desired output:
8889986665,CRBT
2398456781,SEGP30
I am trying the below command:
cut -d',' -f2 test.txt | cut -c3-12 && cut -d',' -f4 test.txt
My output:
8889986665
2398456781
CRBT
SEGP30
Kindly help me achieve the desired output.
Solution 2:
Here is the solution which will serve the purpose:
cut -d',' -f2,4 test.txt | sed 's/.*\([0-9]\{10\}\),\(.*\)/\1,\2/'
8889986665,CRBT123
2398456781,SEGP
cut gives us the second and fourth fields.
Inside sed, .* skips the initial characters until the first pattern ahead is matched.
The first pattern is 10 digits followed by a comma:
\([0-9]\{10\}\),
The second pattern is the rest of the line: \(.*\)
Now we print both patterns with a comma in between: \1,\2
Note that the number 10 can be replaced by the number of characters to be extracted before the delimiter, and [0-9] can be replaced by . if those characters can be of any type.
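For example, to keep only the last 4 digits instead of 10, the same command with the count changed (a sketch against the same test.txt):
cut -d',' -f2,4 test.txt | sed 's/.*\([0-9]\{4\}\),\(.*\)/\1,\2/'
6665,CRBT123
6781,SEGP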
Solution 1:
Using cut will be easiest for you in this case.
You first need to get the desired fields (2,4) filtered from the line and then do more filtering (keep only the last 10 characters of field #2):
$ cut -d',' -f2,4 test.txt | cut -c3-
8889986665,CRBT123
2398456781,SEGP
This is job best done using awk:
awk -F, -v n=10 '{print substr($2, length($2)-n+1, n) FS $5}' file
8889986665,CRBT
2398456781,SEGP30
The substr() call prints the last n characters of the 2nd column.
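To see what that substr() call does in isolation, a quick sketch on a single value:
echo "918889986665" | awk -v n=10 '{print substr($0, length($0)-n+1, n)}'
8889986665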
sed -r 's/[^,]+,..([^,]+,)([^,]+,)([^,]+),.*/\1\3/' file
8889986665,CRBT123
2398456781,SEGP
cat test.txt | cut -f 2,4 -d ","
Assuming your file is test.txt.

How to use sed and wc command to handle whitespace

If I have a CSV file and I want to know the number of columns, I'll use the following command:
head -1 CSVFile.csv | sed 's/,/\t/g' | wc -w
However, when a column name contains a space, the command doesn't work and gives me a nonsense figure.
What would be the way to edit this command such that it gives me the correct number of columns?
e.g. in my file I could have a column name like (t - ZK) or (e - 22).
For example, my file could be (first 2 rows):
ZZ(v - 1),Tat(t - 1000)
1.1240128401924,2929292929
You are piping the sed output to wc -w, which returns the number of words in the output. So if a field header contains spaces, its parts are counted as separate words.
You can use awk:
head -1 CSVFile.csv | awk -F, '{print NF}'
This would return the number of columns in the file (assuming the file is comma-delimited).
Maybe use the last line instead of the first. Change "head" to "tail". That would be a quick, easy solution.
Try using awk
awk -F, 'NR==1 {print NF; exit}' CSVFile.csv
If you wish to use a chain of head, sed and wc:
Use sed to replace the delimiter with a newline \n instead of a tab \t, and then count the number of lines with wc -l instead of counting words with wc -w:
head -1 CSVFile.csv | sed 's/,/\n/g' | wc -l
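The same idea works with tr, which sidesteps sed's handling of \n in the replacement on some non-GNU seds (same assumption: no commas inside quoted header names):
head -1 CSVFile.csv | tr ',' '\n' | wc -l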
perl -ane 'print scalar(@F)-1 if($.==1)' your_file
Assuming there is no "," inside a header name (like field1,"Surname,name",field3, ...):
sed "1 s/[^,]//g;q" CSVFile.csv | wc -c
This could also be done entirely in sed, but that gets a bit heavy just for counting.
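On the two-column example above, the sed step leaves only the commas of the header line, and wc -c then counts them plus the trailing newline:
sed "1 s/[^,]//g;q" CSVFile.csv
,
so wc -c reports 2, the number of columns.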

unix - count of columns in file

Given a file with data like this (i.e. stores.dat file)
sid|storeNo|latitude|longitude
2|1|-28.03720000|153.42921670
9|2|-33.85090000|151.03274200
What would be a command to output the number of column names?
i.e. in the example above it would be 4 (the number of pipe characters + 1 in the first line).
I was thinking something like:
awk '{ FS = "|" } ; { print NF}' stores.dat
but it returns all lines instead of just the first, and for the first line it returns 1 instead of 4.
awk -F'|' '{print NF; exit}' stores.dat
Just quit right after the first line.
This is a workaround (for me: I don't use awk very often):
Display the first row of the file containing the data, replace all pipes with newlines and then count the lines:
$ head -1 stores.dat | tr '|' '\n' | wc -l
Unless you're using spaces in there, you should be able to use | wc -w on the first line.
wc is "Word Count", which simply counts the words in the input file. If you send only one line, it'll tell you the amount of columns.
You could try
cat FILE | awk '{print NF}'
Perl solution similar to Mat's awk solution:
perl -F'\|' -lane 'print $#F+1; exit' stores.dat
I've tested this on a file with 1000000 columns.
If the field separator is whitespace (one or more spaces or tabs) instead of a pipe:
perl -lane 'print $#F+1; exit' stores.dat
If you have python installed you could try:
python -c 'import sys;f=open(sys.argv[1]);print len(f.readline().split("|"))' \
stores.dat
This is usually what I use for counting the number of fields:
head -n 1 file.name | awk -F'|' '{print NF; exit}'
Select any row in the file (in the example below, it's the 2nd row) and count the number of columns, where the delimiter is a space:
sed -n 2p text_file.dat | tr ' ' '\n' | wc -l
Proper pure bash way
Simply counting columns in file
Under bash, you could simply:
IFS=\| read -ra headline <stores.dat
echo ${#headline[@]}
4
A lot quicker as there are no forks, and reusable since $headline holds the full header line. You could, for example:
printf " - %s\n" "${headline[@]}"
- sid
- storeNo
- latitude
- longitude
Note: this syntax correctly handles spaces and other characters in column names.
Alternative: robust check for the maximum number of columns across rows
What if some rows contain extra columns?
This command searches for the longest line, counting only separators:
tr -dc $'\n|' <stores.dat |wc -L
3
If there are at most 3 separators, then there are 4 fields... Or, if you consider that
each separator (|) is preceded by a "before" part and followed by an "after" part, trimmed to 1 letter per word:
tr -dc $'\n|' <stores.dat|sed 's/./b&a/g;s/ab/a/g;s/[^ab]//g'|wc -L
4
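To see what wc -L is measuring in the first command, the tr step reduces every row of the sample stores.dat to just its separators, so the longest line length is the maximum separator count:
tr -dc $'\n|' <stores.dat
|||
|||
|||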
Counting columns in a CSV file
Under bash, you may use csv loadable plugins:
enable -f /usr/lib/bash/csv csv
IFS= read -r line <file.csv
csv -a fields <<<"$line"
echo ${#fields[@]}
4
For more info, see How to parse a CSV file in Bash?
Based on Cat Kerr's response.
This command works on Solaris:
awk '{print NF; exit}' stores.dat
You may try:
head -1 stores.dat | grep -o \| | wc -l
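Note that this counts the separators rather than the fields, so the result is one less than the column count; a sketch that adds the 1 back (assuming the same stores.dat):
echo $(( $(head -1 stores.dat | grep -o '|' | wc -l) + 1 ))
4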
