add a column with different label - linux

I want to add a column that contains two different labels. Let's say I have this text
aa bb cc
dd ee ff
gg hh ii
ll mm nn
oo pp qq
and I want to add 1 at the first column of the first two lines and 2 at the first column of the remaining lines, so that eventually I will get this text:
1 aa bb cc
1 dd ee ff
2 gg hh ii
3 ll mm nn
4 oo pp qq
Do you know how to do it?
thanks

Assuming you are processing a text file in a Linux shell, you could use awk for this. Your problem description says you want two labels, 1 and 2; that would be
awk '{print (NR<=2 ? "1 " : "2 ") $0}' input.txt
Your expected output says you want label 1 for the first two lines and then to count up from 2 starting with the third line; that would be
awk '{print (NR<=2 ? "1 " : NR-1 " ") $0}' input.txt

I'm assuming that you want to do this using the shell, if your data is in a file called input.txt, you can either use cat -n or nl.
% tail -n+2 input.txt | cat -n
1 dd ee ff
2 gg hh ii
3 ll mm nn
4 oo pp qq
% tail -n+2 input.txt | nl
1 dd ee ff
2 gg hh ii
3 ll mm nn
4 oo pp qq
The first line can be added back manually.
Note that the two commands behave differently if your input file contains empty lines: cat -n numbers them, while nl skips them by default.
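To reproduce the full expected output in one go, a minimal sketch (assuming the data lives in input.txt as above): label the first line with 1 by hand, then let nl number the rest. The -w1 -s' ' options make nl print a bare number followed by a single space.

```shell
# sample data from the question
printf '%s\n' 'aa bb cc' 'dd ee ff' 'gg hh ii' 'll mm nn' 'oo pp qq' > input.txt

# prefix "1 " to the first line, then number the remaining lines from 1
{ printf '1 %s\n' "$(head -n1 input.txt)"
  tail -n +2 input.txt | nl -w1 -s' '; }
```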

Could you please try the following and let me know if it helps you.
1st solution: Use a variable named count with an initial value of 1. If the line number is 1 or 2, simply prepend 1 to $1; otherwise increment count and prepend its value to $1.
awk -v count=1 '{$1=NR==1||NR==2?1 FS $1:++count FS $1} 1' Input_file
2nd solution: If the line number is 1 or 2, prepend 1 to $1; otherwise, if the line is not empty, prepend NR-1 (the line number minus 1) to $1.
awk '{$1=NR==1||NR==2?1 FS $1:(NF?FNR-1 FS $1:"")} 1' Input_file

How to remove lines based on another file? [duplicate]

This question already has answers here:
How to delete rows from a csv file based on a list values from another file?
(3 answers)
Closed 2 years ago.
Now I have two files as follows:
$ cat file1.txt
john 12 65 0
Nico 3 5 1
king 9 5 2
lee 9 15 0
$ cat file2.txt
Nico
king
Now I would like to remove each line whose first column contains a name from the second file.
Ideal result:
john 12 65 0
lee 9 15 0
Could anyone tell me how to do that? I have tried the code like this:
for i in `less file2.txt`; do sed "/$i/d" file1.txt; done
But it does not work properly.
You don't need to iterate; you just need grep with the -v option to invert the match and -w to force the pattern to match only whole words:
grep -wvf file2.txt file1.txt
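To see why -w matters, here is a small sketch with hypothetical data (not from the question): without -w, a name that is a prefix of another name would remove too much.

```shell
printf '%s\n' 'king 9 5 2' 'kingdom 1 2 3' > f1.txt
printf '%s\n' 'king' > f2.txt

grep -vf f2.txt f1.txt    # removes both lines: "king" also matches inside "kingdom"
grep -wvf f2.txt f1.txt   # keeps "kingdom 1 2 3"
```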
This job suits awk:
awk 'NR == FNR {a[$1]; next} !($1 in a)' file2.txt file1.txt
john 12 65 0
lee 9 15 0
Details:
NR == FNR { # While processing the first file
a[$1] # store the first field in an array a
next # move to next line
}
!($1 in a) # while processing the second file
# if first field doesn't exist in array a then print

printf format specifiers in awk does not work for multiple parameters

I'm trying to write a Bash script named example7 which accepts as parameters a file name (let's call it File 1) and a list of numbers (below we'll call it List 1). The program needs to print the columns from File 1 after aligning them to the right or left according to the numbers in List 1 (this is achievable with awk's printf command).
Example
Suppose the contents of an F1 file are:
A abcd ddd eee zz tt
ab gggwe 12 88 iii jjj
yaara yyzz 12abcd xyz x y z
After running the program by command:
example7 F1 -8 -7 6 4
Output:
A abcd ddd eee
ab gggwe 12 88
yaara yyzz 12abcd xyz
In the example above there are 7 spaces between A and abcd, 6 spaces between abcd and ddd, and one space between ddd and eee.
Another example:
After running the program by command:
example7 F1 -8 -7 6 4 5
Output:
A abcd ddd eee zz
ab gggwe 12 88 iii
yaara yyzz 12abcd xyz x
In the example above there are 7 spaces between A and abcd, 6 spaces between abcd and ddd, one space between ddd and eee, 3 spaces between eee and zz, two spaces between 88 and iii, and 4 spaces between xyz and x.
I've tried doing something like this:
file=$1
shift
awk '{printf "%'$1's\n" ,$1}' $file
but it only works for one number and one parameter, and I don't know how to do it for multiple columns and multiple parameters. Any help will be appreciated.
Set an awk variable to all the remaining parameters, then split it and loop over them.
file=$1
shift
awk -v sizes="$*" '{words = split(sizes, s); for(i = 1; i <= words; i++) printf("%" s[i] "s", $i); print ""; }' "$file"
It's generally wrong to try to substitute a shell variable directly into an awk script. You should prefer to set an awk variable using -v, and then use awk's own string concatenation operation, as I did with s[i].
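Putting it together, the complete example7 script might look like this (a sketch following the question's naming; the awk body is the same one-liner as above, just laid out with comments):

```shell
#!/bin/bash
# example7: print the columns of a file aligned by a list of widths
file=$1
shift    # the remaining positional parameters are the widths

# hand the width list to awk as one string and split it inside awk
awk -v sizes="$*" '{
    words = split(sizes, s)        # s[1..words] holds the widths
    for (i = 1; i <= words; i++)
        printf("%" s[i] "s", $i)   # a negative width left-aligns the field
    print ""
}' "$file"
```

Running ./example7 F1 -8 -7 6 4 then prints the first four columns of each line with those widths, as in the question's first example.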

How to add number of identical line next to the line itself? [duplicate]

This question already has answers here:
Find duplicate lines in a file and count how many time each line was duplicated?
(7 answers)
Closed 7 years ago.
I have a file file.txt which looks like this
a
b
b
c
c
c
I want to know the command which gets file.txt as input and produces the output
a 1
b 2
c 3
I think uniq is the command you are looking for. The output of uniq -c is a little different from your format, but this can be fixed easily.
$ uniq -c file.txt
1 a
2 b
3 c
If you want to count the occurrences, you can use uniq with -c.
If the file is not sorted, you have to use sort first:
$ sort file.txt | uniq -c
1 a
2 b
3 c
If you really need the line first, followed by the count, swap the columns with awk:
$ sort file.txt | uniq -c | awk '{ print $2 " " $1}'
a 1
b 2
c 3
You can use this awk:
awk '!seen[$0]++{ print $0, (++c) }' file
a 1
b 2
c 3
seen is an array that holds each unique line: !seen[$0]++ is true only the first time a line is encountered, since the value is 0 before the post-increment. In the action we print the record and an incrementing counter.
Update: based on a comment below, if the intent is to get the repeat count in the 2nd column, then use this awk command:
awk 'seen[$0]++{} END{ for (i in seen) print i, seen[i] }' file
a 1
b 2
c 3
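One caveat with the END version: for (i in seen) visits the array keys in an unspecified order, so the sorted output shown above is not guaranteed. If order matters, pipe through sort, for example:

```shell
printf '%s\n' a b b c c c > file.txt
awk 'seen[$0]++{} END{for (i in seen) print i, seen[i]}' file.txt | sort
```

which prints a 1, b 2, c 3 for the sample file regardless of the awk implementation's iteration order.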

Change format of text file

I have a file with many lines of tab separated data in the following format:
1 1 2 2
3 3 4 4
5 5 6 6
...
and I would like to change the format to:
1 1
2 2
3 3
4 4
5 5
6 6
Is there a not too complicated way to do this? I don't have any experience with using awk, sed, etc.
Thanks
If you just want to group your file in blocks of X columns, you can make use of xargs -nX:
$ xargs -n2 < file
1 1
2 2
3 3
4 4
5 5
6 6
To have more control, for example to print an empty line after every 4th field, you can also use this awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) printf "%s%s", $i, (i%2?OFS:RS); print ""}' file
1 1
2 2
3 3
4 4
5 5
6 6
# <-- note there is an empty line here
Explanation
After each odd field it prints OFS; after each even field it prints RS.
Note that OFS is the output field separator (a space by default) and RS is the record separator (a newline by default). Since the input uses tab as the field separator, FS and OFS are redefined in the BEGIN block.
This is probably the simplest way which allows for customisation
awk '{print $1,$2"\n"$3,$4}' file
For a blank line in between:
awk '{print $1,$2"\n"$3,$4"\n"}' file
although fedorqui's answer with xargs is probably the simplest if this isn't needed.
As Ed pointed out, this wouldn't work if there were blanks in the fields; that can be resolved using
awk 'BEGIN{FS=OFS="\t"} {print $1,$2 ORS $3,$4 ORS}' file
With perl:
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file
Feed the above command's output to sed to delete the trailing blank line:
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file | sed '$d'

count using awk commands

I have fileA.txt and a few lines of it are shown below:
AA
BB
CC
DD
EE
And I have fileB.txt, which has text like shown below:
Group col2 col3 col4
1 pp 4567 AA,BC,AB
1 qp 3428 AA
2 pp 3892 AA
3 ee 28399 AA
4 dd 3829 BB,CC
1 dd 27819 BB
5 ak 29938 CC
For every line in fileA.txt, it should count the number of times it is present in the last column of fileB.txt, deduplicated by the group in column 1 of fileB.txt.
Sample output should look like:
AA 3
BB 2
CC 2
AA is present 4 times, but it appears twice in group "1". If a value is present more than once in the same group (column 1), it should be counted only once; therefore in the above output AA's count is 3.
Any help using awk or any other one-liners?
Here is an awk one-liner that should work:
awk '
NR==FNR && !seen[$4,$1]++{count[$4]++;next}
($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
Explanation:
The NR==FNR && !seen[$4,$1]++ pattern is true only the first time a given (column 4, group) pair is seen; duplicate occurrences within the same group don't increment the counter.
$1 in count checks whether column 1 of the second file is present in the array; if it is, we print it along with its count.
Output:
$ awk 'NR==FNR && !seen[$4,$1]++{count[$4]++;next}($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
AA 3
BB 2
CC 1
Update based on the modified question:
awk '
NR==FNR {
n = split($4,tmp,/,/);
for(x = 1; x <= n; x++) {
if(!seen[$1,tmp[x]]++) {
count[tmp[x]]++
}
}
next
}
($1 in count) {
print $1, count[$1]
}' fileB.txt fileA.txt
Outputs:
AA 3
BB 2
CC 2
Pure bash (4.0 or newer):
#!/bin/bash
declare -A items=()
# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt
# read fourth column from fileB and increment for each match
while read -r _ _ _ item _; do
[[ ${items[$item]} ]] || continue # skip unrecognized values
items[$item]=$(( items[$item] + 1 )) # otherwise, increment
done <fileB.txt
# print output
for key in "${!items[@]}"; do # iterate over keys
value="${items[$key]}" # look up values
printf '%s\t%s\n' "$key" "$value" # print them together
done
A simple awk one-liner.
awk 'NR>FNR{if($0 in a)print$0,a[$0];next}!a[$4,$1]++{a[$4]++}' fileB.txt fileA.txt
Note the order of files.
