Finding different groups between 0 and 1000 that are in the file - linux

I have a file with 7 fields separated by a :. Field 4 holds the group number. I want to display the group numbers within 0-1000. If there is a duplicate, I only want to print one copy of it, along with the other group numbers that don't have a duplicate.
I have to use grep, awk, sort and uniq.
I don't know the first place to start. Can someone please help me?

awk to the rescue!
$ awk -F: '$4>=0 && $4<=1000 && !a[$4]++' file
The conditions are straightforward: the array indexed by $4 holds a nonzero count for values already seen, so duplicates are not printed; only the first occurrence of each value still has a zero count (before the ++ increment) and gets printed.
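As a quick illustration with made-up passwd-style lines (the file name and its contents are just examples), adding a print $4 action lists only the group numbers themselves:
$ cat file
alice:x:1001:100:Alice:/home/alice:/bin/bash
bob:x:1002:100:Bob:/home/bob:/bin/bash
carol:x:1003:2000:Carol:/home/carol:/bin/bash
$ awk -F: '$4>=0 && $4<=1000 && !a[$4]++ {print $4}' file
100
The duplicate 100 is printed once, and 2000 is dropped because it falls outside the 0-1000 range.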

Related

sed/awk | single digits to two digits (zero) after a second slash

maybe someone can help me briefly...
for example in file.txt...
nw-3001-e0z-4581a/2/5
sed 's/\<[0-9]\>/0&/' file.txt ...
nw-3001-e0z-4581a/02/5
but I want the zero padding only after the second slash; the first number should remain a single digit
thanks in advance! greetz
Could you please try the following, written and tested with the shown samples. It simply sets the field separator and output field separator to / for the awk program, then adds a 0 before the 3rd column (if it contains only a single digit) and prints the line.
echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} {$3=sprintf("%02d",$3)} 1'
You can use
awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1' file.txt
Details:
BEGIN{FS=OFS="/"} - sets input/output field separator to /
$NF ~ /^[0-9]$/ - if last field is a single digit
{$NF="0"$NF} - prepend last field with 0
1 - print the result.
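With the shown sample this produces:
$ echo "nw-3001-e0z-4581a/2/5" | awk 'BEGIN{FS=OFS="/"} $NF ~ /^[0-9]$/ {$NF="0"$NF}1'
nw-3001-e0z-4581a/2/05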
Using sed:
sed -rn 's#(^.*/)(.*/)([[:digit:]]{1}$)#\1\20\3#p' <<< "nw-3001-e0z-4581a/2/5"
Split the string into 3 sections using regular expressions (-r). Ensure that the last section has one digit only with [[:digit:]]{1} and substitute the line for the first and second sections, followed by "0" and the third section, printing the result.
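For the sample string this prints:
nw-3001-e0z-4581a/2/05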
$ sed 's:/:&0:2' file
nw-3001-e0z-4581a/2/05
If that's not all you need, then edit your question to show more truly representative sample input/output, including cases that this doesn't work for.

AWK - Show lines where column contains a specific string

I have a document (.txt) composed like that.
info1: info2: info3: info4
And I want to show some information by column.
For example, I have some different information in the "info3" field, and I want to see only the lines that contain "test" in the "info3" column.
I think I have to use sort but I'm not sure.
Any idea?
The previous answers assume that the third column is exactly equal to test. It looks like you are looking for lines where the value contains test, so we need awk's match function:
awk -F: 'match($3, "test")' file
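For example, with a made-up file laid out like the question's sample (the contents are just for illustration), the substring match also picks up values such as test123:
$ cat file
a: b: test: d
a: b: test123: d
a: b: prod: d
$ awk -F: 'match($3, "test")' file
a: b: test: d
a: b: test123: d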
You can use awk for this. Assuming your columns are delimited by : and column 3 holds the value test, the command below lists only those lines.
awk -F':' '$3=="test"' input-file
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt

use uniq -d on a particular column?

Have a text file like this.
john,3
albert,4
tom,3
junior,5
max,6
tony,5
I'm trying to fetch records where the column 2 value is the same. My desired output:
john,3
tom,3
junior,5
tony,5
I'm checking if we can use uniq -d on second column?
Here's one way using awk. It reads the input file twice, but avoids the need to sort:
awk -F, 'FNR==NR { a[$2]++; next } a[$2] > 1' file file
Results:
john,3
tom,3
junior,5
tony,5
Brief explanation:
FNR==NR is a common awk idiom that is true only for the first file in the arguments list. Here, the value of column two is used as an array key and its count is incremented. On the second read of the file, we simply check whether the count for column two's value is greater than one (the next keyword skips the rest of the code during the first pass).
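If reading the file twice is not an option (for example, when the input comes from a pipe), here is a rough single-pass sketch that buffers the lines in memory instead:
awk -F, '{count[$2]++; line[NR]=$0; key[NR]=$2}
         END{for (i=1; i<=NR; i++) if (count[key[i]] > 1) print line[i]}' file
This keeps the original order at the cost of holding the whole file in memory.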
You can use uniq on fields (columns), but not easily in your case.
uniq's -f and -s options skip fields and characters respectively when comparing lines. However, neither of these quite does what you want.
-f divides fields by whitespace, but yours are separated by commas.
-s skips a fixed number of characters, but your names are of variable length.
Overall though, uniq is used to compress input by consolidating duplicates into unique lines. You are actually wishing to retain duplicates and eliminate singletons, which is the opposite of what uniq is used to do. It would appear you need a different approach.
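If you do want uniq -d in the picture anyway, one rough workaround (assuming the column 2 values never appear elsewhere at the end of a line) is to extract the column, find the duplicated values, and match them back against the file:
cut -d, -f2 file | sort | uniq -d | while IFS= read -r val; do
    grep ",${val}\$" file    # lines whose second field is a duplicated value
done
For the shown data this yields the four desired lines, though in general the output is grouped by value rather than kept in the original file order.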

Print Min and Max from file in Linux

This is a homework assignment and I'm a bit stumped here. The objective is as follows:
Create a file called grades that will contain quiz scores. The file should be created so that
there is only one quiz score per line. Write a script called minMax that will accept a parameter
that represents the file grades and then determine the minimum and maximum scores received
on the quizzes. Your script should display the output in the following format:
Your highest quiz score is #.
Your lowest quiz score is #.
What I have done to accomplish this is first sort the grades so that they are in order. Then I attempted to pipe that into a command like this:
sort grades |awk 'NR==1;END{print}' grades
The output I get is the first and last entry of the file, but it's no longer sorted, and I'm not sure how to pick out the first and last values to print them. Is it $1 and $2?
Any help would be greatly appreciated.
sort -n grades | sed -n '1s/.*/Lowest: &/p;$s/.*/Highest: &/p;'
Lowest: 2
Highest: 19
You need to sort -n if you want to sort by number.
With sed, you may handle it in one pass.
Multiple sed commands are concatenated with ;.
1s and $s apply the substitution to the first and last line respectively.
& is the whole matched line.
p prints the result.
-n suppresses automatic printing in general, so only the lines flagged with p are output.
You can use head and tail:
head will get the first line,
tail will get the last line.
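Putting that together, a minimal sketch of the minMax script (assuming the grades file is passed as the first argument, as the assignment describes):
#!/bin/bash
# minMax: report the highest and lowest quiz score in the file given as $1
high=$(sort -n "$1" | tail -n 1)   # largest score after numeric sort
low=$(sort -n "$1" | head -n 1)    # smallest score after numeric sort
echo "Your highest quiz score is $high."
echo "Your lowest quiz score is $low."
Run it as ./minMax grades.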

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How to delete the lines that appear at least twice in the same file in bash? What I mean is that I want to have this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want to have duplicated lines in the given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". It is probably simplest to pre-process the data if you want to treat whitespace differently.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares everything from the 2nd character to the end of the line, which for the sample data (a single-character first column) effectively ignores the first column. This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. The first time it is seen, the line is printed. In the first case, we are using the entire input line contained in $0. In the second case, we are only using the substring consisting of everything from the 2nd character onward.
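If the first column can be wider than one character, here is a sketch that keys on everything after the first whitespace-separated field instead of a fixed character offset (this assumes your layout really is number-then-text):
awk '{key = $0; sub(/^[^ ]+ +/, "", key)} !x[key]++' file.txt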
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries adjacent to each other
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; it only checks consecutive lines.
Like this:
sort file.txt | uniq
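If you don't need a separate uniq step, sort -u does the sorting and de-duplication in one go:
sort -u file.txt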
