Linux: remove duplicate lines

I have a text file. I would like to remove all duplicate lines.
I tried these, but they did not work:
sort -ur file.txt
or
uniq -D -f 2 file.txt
file.txt
34.78.54.21 websrv1 nameweb
34.78.54.21 nameweb
I just need one line.

From your input I assume you are referring to the first field (34.78.54.21) as the duplicate. If you just want to keep the first occurrence of each address, then this will work for you:
awk '!a[$1]++' file.txt
Output:
34.78.54.21 websrv1 nameweb
This command checks whether $1 already exists as a key in the array. If it does not, it is added to the array and the default action, printing the line, happens. For any later line whose $1 is already in the array, the expression evaluates to false and the line is not printed.
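For clarity, the same logic can be written in an expanded form; this is just a sketch equivalent to the one-liner above:
awk '{ if (!($1 in a)) { a[$1] = 1; print } }' file.txt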

Related

How to get first word of every line and pipe it into dmenu script

I have a text file like this:
first state
second state
third state
Getting the first word from every line isn't difficult, but the problem comes when adding the extra \n required to separate every word (selection) in dmenu, per its syntax:
echo -e "first\nsecond\nthird" | dmenu
I haven't been able to figure out how to add the separating \n. I've tried this:
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
But it doesn't work. I also tried this:
lol=$(grep -o "^\S*" states.txt | perl -ne 'print "$_"')
But same deal. Not sure what I'm doing wrong.
Your problem is in the awk script. awk already treats each input line as a record, and you can control how records are separated in the output via the ORS variable (output record separator). By default this separator is the newline, which should be good enough for your purpose.
Now to print the first word of every input record (each line in the input stream in this case), you just need to print the first field:
awk '{print $1}' textfile | dmenu
If you need the output to include the explicit \n string (not the control character), then you can just overwrite the ORS variable to fit your needs:
awk 'BEGIN{ORS="\\n"}{print $1}' textfile | dmenu
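With the sample file, that last pipeline would hand dmenu the single line below (literal backslash-n separators, including a trailing one); the plain awk '{print $1}' form is what dmenu actually expects:
first\nsecond\nthird\n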
This could also be done with a simple while loop. Here, while reads the file and splits each line into two variables: first, which holds the first field, and rest, which holds the remainder of the line. Each first value is then printed and the collected output is piped to dmenu:
while read -r first rest
do
  echo "$first"
done < "Input_file" | dmenu
Based on the text file example, the following should achieve what you require:
awk '{ printf "%s\\n",$1 }' textfile | dmenu
Print the first space-separated field of each line along with a literal \n (the \n needs to be escaped to stop it being interpreted by awk)
In your code
state=$(awk '{for(i=1;i<=NF;i+=2)print $(i)'\n'}' text.txt)
you attempted to use ' inside your awk code. However, the shell takes the code to be whatever lies between ' and the next following ', so the code awk actually receives is {for(i=1;i<=NF;i+=2)print $(i), and this does not work. You should use " for strings inside awk code.
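For reference, a corrected form of that command might look like this (a sketch: print already terminates each record with a newline, which is all dmenu needs):
state=$(awk '{ print $1 }' text.txt)
echo "$state" | dmenu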
If you merely want to get the nth column, cut will be enough in most cases. Let the content of states.txt be
first state
second state
third state
then you can do:
cut -d ' ' -f 1 states.txt | dmenu
Explanation: treat space as delimiter (-d ' ') and get 1st column (-f 1)
(tested in cut (GNU coreutils) 8.30)

Find a line and modify it in a csv file given an input

I have a CSV file with a list of workers and I want to make a script to modify their work group given their IDs. Lines in the CSV file look like this:
Before:
ID TAG GROUP
niub16677500;B00;AB0
After:
ID TAG GROUP
niub16677500;B00;BC0
How can I do this?
I'm working with the awk and sed commands but I haven't managed to get anything working so far.
With awk:
awk -F';' -v OFS=';' -v id="niub16677500" -v new_group="BC0" '{if($1==id)$3=new_group}1' input.csv
ID;TAG;GROUP
niub16677500;B00;BC0
Redirect the output to a file and note that the csv header should use the same field separator as the body.
Explanations:
-F';' to have input field separator as ;
-v OFS=';' same for the output FS
-v id="niub16677500" -v new_group="BC0" define the variables that you are going to use in the awk commands
'{if($1==id)$3=new_group}1' when the first column is equal to the value contained in the variable id, overwrite the 3rd field, then print the line (the trailing 1)
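Since the goal is a reusable script, a minimal wrapper around this command might look as follows (the script name and argument order are assumptions, not from the original):
#!/bin/bash
# usage: ./change_group.sh <id> <new_group> <csv_file>   (hypothetical name)
id="$1"
new_group="$2"
file="$3"
awk -F';' -v OFS=';' -v id="$id" -v new_group="$new_group" \
    '{ if ($1 == id) $3 = new_group } 1' "$file" > "$file.tmp" && mv "$file.tmp" "$file"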
With sed:
id="niub16677500"; new_group="BC0"; sed "/^$id/s/;[^;]*$/;$new_group/" input.csv
ID;TAG;GROUP
niub16677500;B00;BC0
You can either do an in-place change using the -i.bak option, or redirect the output to a file.
Explanations:
Store the values in 2 variables
/^$id/ when you reach a line that starts with the ID stored in the variable id, run the sed search and replace
s/;[^;]*$/;$new_group/ the search-and-replace command, which replaces the last field with the new value
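For example, to edit the file in place while keeping a backup (a sketch using the -i.bak option mentioned above; the .bak suffix is arbitrary):
id="niub16677500"; new_group="BC0"
sed -i.bak "/^$id/s/;[^;]*$/;$new_group/" input.csv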
sed can do it:
echo 'niub16677500;B00;AB0' | sed 's/\(^niub16677500;...;\)\(...\)$/\1BC0/'
will replace the AB0 group in your example with BC0, by matching the ID, a semicolon, any 3 characters, and another semicolon, then matching the remaining 3 characters. In the output it repeats the first captured group with \1 and appends BC0.
You can use :
sed 's/\(^niub16677500;...;\)\(...\)$/\1BC0/' <old_file >new_file
to make a new_file with this change.
https://www.grymoire.com/Unix/Sed.html is a great resource, you should take a look at it.

Bash script - delete duplicates

I need to extract names from a file and delete duplicates.
output.txt:
Server001-1
Server001-2
Server001-3
Server001-4
Server002-1
Server002-2
Server003-1
Server003-2
Server003-3
I need the output to be only as follows.
Server001-1
Server002-1
Server003-1
So, print only the first server in every server group (Server00*) and delete the rest of that group.
Try simply with awk:
awk -F"-" '!a[$1]++' Input_file
Explanation: set the field separator to - and create an array named a whose index is the current line's 1st field. The condition !a[$1] checks whether the current line's 1st field is not yet present in array a; if it is absent, the line is printed. The ++ then increments that field's occurrence count in array a, so the next time a line with the same 1st field appears it will not be printed.
awk -F- 'dat[$1]=="" { dat[$1]=$0 } END { for (i in dat) {print dat[i]}}' filename
result:
Server001-1
Server002-1
Server003-1
Create an array keyed on the first dash-delimited field, storing the complete line only when there is no entry for that key yet. This ensures that only the first unique entry is stored. Then loop through the array and print. Note that for (i in dat) does not guarantee any particular output order; a GNU awk variant that forces one is sketched below.
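If output order matters, GNU awk can force a traversal order via PROCINFO["sorted_in"] (a GNU-awk-only sketch):
awk -F- 'dat[$1]=="" { dat[$1]=$0 } END { PROCINFO["sorted_in"]="@ind_str_asc"; for (i in dat) print dat[i] }' filename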
Simple GNU datamash solution:
datamash -t'-' -g1 first 2 <file
-t'-' - field separator
-g1 - group lines by the 1st field
first 2 - get only the first value of the 2nd field for each group. This can also be changed to the min 2 operation
The output:
Server001-1
Server002-1
Server003-1
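Note that datamash -g expects its input to be grouped on the key field already; if the file were not sorted by the 1st field, the -s flag would make datamash sort it first (a sketch):
datamash -s -t'-' -g1 first 2 <file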
Since you've mentioned the string format as Server00*, you can simply use this one:
grep -E "Server[0-9]+-1$" file
Server[0-9]+ covers cases like Server1000, Server100000, etc. (\d is Perl syntax and would need grep -P, not -E), and the $ anchor stops a name like Server001-10 from also matching
or even
grep '[-]1$' file
Output for both :
Server001-1
Server002-1
Server003-1
A simple way, with just one command line, to get a general unique result:
nin output.txt nul "^(\w+)-\d+" -u -w
Explanation:
nul is the Windows null device, like /dev/null on Linux.
-u gets the unique result; -w outputs whole lines. To ignore case, use -i.
"^(\w+)-\d+" is the same regex syntax as in C++/C#/Java/Scala, etc.
To save to a file: nin output.txt nul "^(\w+)-\d+" -u -w > result.txt
To save to a file with summary info: nin output.txt nul "^(\w+)-\d+" -u -w -I > result.txt
For future automation with nin.exe: the result count is the return value %ERRORLEVEL%.
nin.exe / nin.gcc* is a single portable executable to get the difference or intersection of keys/lines between 2 files, or between a pipe and a file. See the tools directory of my open project: https://github.com/qualiu/msr.
You can also see the colorful built-in usage examples: https://qualiu.github.io/msr/usage-by-running/nin-Windows.html

Generate record of files which have been removed by grep as a secondary function of primary command

I asked a question here to remove unwanted lines which contained strings which matched a particular pattern:
Remove lines containing string followed by x number of numbers
anubhava provided a good line of code which met my needs perfectly. This code removes any line which contains the string vol followed by a space and three or more consecutive numbers:
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile
The command will be used on a fairly large CSV file and will be initiated by crontab. For this reason, I would like to keep a record of the lines this command removes, just so I can go back and check that the correct data is being removed. I guess it will be some sort of log containing the lines that did not make the final cut. How can I add this functionality?
Drop grep and use awk instead:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print >> "deleted"; next} 1' file
The above uses GNU awk for word delimiters (\<) and will append every deleted line to a file named "deleted". Consider adding a timestamp too:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print systime(), $0 >> "deleted"; next} 1' file
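If you would rather keep grep, an alternative sketch is two passes over the same file, with and without -v, appending the removed lines to the log (the filenames are assumptions):
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile
grep -E '\bvol([[:blank:]]+[[:digit:]]+){2}' file >> deleted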

Delete lines from a file matching first 2 fields from a second file in shell script

Suppose I have setA.txt:
a|b|0.1
c|d|0.2
b|a|0.3
and I also have setB.txt:
c|d|200
a|b|100
Now I want to delete from setA.txt the lines whose first 2 fields match a line in setB.txt, so the output should be:
b|a|0.3
I tried:
comm -23 <(sort setA.txt) <(sort setB.txt)
But the equality is defined over the whole line, so it won't work. How can I do this?
$ awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt
b|a|0.3
This reads through setB.txt just once, extracts the needed information from it, and then reads through setA.txt while deciding which lines to print.
How it works
-F\|
This sets the field separator to a vertical bar, |.
FNR==NR{seen[$1,$2]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read. Thus, when FNR==NR, we are reading the first file, setB.txt. If so, set the value of associative array seen to true, 1, for the key consisting of fields one and two. Lastly, skip the rest of the commands and start over on the next line.
!seen[$1,$2]
If we get to this command, we are working on the second file, setA.txt. Since ! means negation, the condition is true if seen[$1,$2] is false which means that this combination of fields one and two was not in setB.txt. If so, then the default action is performed which is to print the line.
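The same pattern generalizes to any choice of key fields; for example, to match on the first field only (a sketch, not from the original answer):
awk -F\| 'FNR==NR{seen[$1]=1;next} !seen[$1]' setB.txt setA.txt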
This should work:
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p' setB.txt | sed -f- setA.txt
How this works:
sed -n 's#\(^[^|]*|[^|]*\)|.*#/^\1/d#p'
generates an output:
/^c|d/d
/^a|b/d
which is then used as a sed script for the next sed after the pipe and outputs:
b|a|0.3
(IFS=$'|'; cat setA.txt | while read x y z; do grep -q -P "\Q$x|$y|\E" setB.txt || echo "$x|$y|$z"; done; )
Explanation: grep -q means only test whether grep can find the regexp, but do not output anything; -P means use Perl syntax, so that the | is matched as-is thanks to the \Q...\E construct.
IFS=$'|' makes bash use | instead of whitespace (space, tab, etc.) as the token separator.
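A slightly safer sketch of the same idea: read -r avoids backslash mangling and the grep pattern is anchored to the start of the line (fields containing regex metacharacters would still need escaping):
while IFS='|' read -r x y z; do
  grep -q "^$x|$y|" setB.txt || printf '%s|%s|%s\n' "$x" "$y" "$z"
done < setA.txt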
