Using AWK to add a column to a TSV (tab-separated file)

I want to add a column to the beginning of a tab-delimited file using awk, so that a line like
col1 col2 col3
ends up like
345 col1 col2 col3
So far I have this:
awk '{FS=" "; OFS=" "; print '345' $0;}' file.tsv > output.tsv
but I end up with
345col1 col2 col3
Where am I going wrong?
Thanks

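You need a comma after the '345'. Without it, awk concatenates 345 and $0 into a single string; with it, print separates the two items with OFS.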
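A minimal sketch of the fix, assuming a POSIX shell and any POSIX awk: it passes the new value with -v instead of splicing it into the quoted program, and sets FS and OFS once in a BEGIN block. The function name prepend_col is hypothetical, chosen only for illustration.

```shell
# prepend_col VALUE [FILE...]
# Hypothetical helper: prints each input line with VALUE added as a new first
# tab-separated column. Reads stdin when no FILE is given.
prepend_col() {
  pc_val=$1
  shift
  # The comma in "print val, $0" makes awk put OFS (a tab) between the items;
  # "print val $0" would concatenate them instead (345col1...).
  awk -v val="$pc_val" 'BEGIN { FS = OFS = "\t" } { print val, $0 }' "$@"
}

# Usage:
# prepend_col 345 file.tsv > output.tsv
```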

You can simply do:
awk '{print "345\t"$0}' file.tsv > output.tsv

This awk should work:
awk '{FS=OFS="\t"; print '345', $0}' file
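Note that the '345' inside the single-quoted program actually ends and restarts the shell quoting, so awk sees a bare 345; that works for a plain number, but "345" is the safer spelling.
EDIT: Output on OS X, piped through od -c to show the tabs and prove it works there too: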
$> awk '{FS=OFS="\t"; print '345', $0}' file|od -c
0000000 3 4 5 \t c o l 1 \t c o l 2 \t c o
0000020 l 3 \n
0000023
$> uname -a
Darwin US143639.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 i386

This might work for you (GNU sed):
sed 's/^/345\t/' file

Related

Reformat item with line feeds from a CSV file

I have a CSV file and would like to replace the newlines with -, only in the brothers column, using bash:
name,brothers,age,adress
------------------------
john,"marc
peter
paul
alex",18,street
thomas,mike,20,place

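Awk is perfect for this, though this version needs GNU awk (RS='^$' and the three-argument match() are gawk extensions). It joins the lines with spaces; change the " " in gsub() to "-" to get dashes as asked: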
awk -v RS='^$' -v ORS= '{while ( match($0,/"[^"]+"/,a) ) {gsub(/\n/," ",a[0]); print substr($0,1,RSTART-1) a[0]; $0=substr($0,RSTART+RLENGTH)} print}' your.csv
outputs:
name,brothers,age,adress
------------------------
john,"marc peter paul alex",18,street
thomas,mike,20,place

Ungainly combo of csvtool, sed, & bash:
csvtool pastecol 2 1- input.csv \
<(csvtool col 2 input.csv | \
sed -n '/"/,/"/{:a;N;$!ba;s/\([^"]\)\n/\1-/g;};p') | \
csvtool trim r -
Output:
name,brothers,age,adress
------------------------
john,marc-peter-paul-alex,18,street
thomas,mike,20,place
Except for the sed part, it's not that bad. csvtool pastecol replaces column 2 with an edited copy, and csvtool trim r then removes the extra trailing comma that pastecol adds.
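If neither GNU awk nor csvtool is available, here is a minimal sketch in plain POSIX awk. It assumes that line breaks only ever occur inside double-quoted fields, tracks whether a quote is still open by counting the quotes on each line, and joins continuation lines with - as the question asks. Unlike the csvtool version, it keeps the surrounding quotes. The function name join_quoted_lines is hypothetical.

```shell
# join_quoted_lines [FILE...]
# Hypothetical helper: joins lines that belong to the same double-quoted CSV
# field, using "-" in place of the embedded newlines.
join_quoted_lines() {
  awk '
    {
      # gsub() with "&" leaves the line unchanged but returns the number of quotes.
      n = gsub(/"/, "&")
      rec = (inside ? rec "-" $0 : $0)
      # An odd number of quotes flips us into or out of a quoted field.
      if (n % 2) inside = !inside
      if (!inside) print rec
    }
    END { if (inside) print rec }  # unterminated quote: print what was collected
  ' "$@"
}
```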

print a file content side by side bash

I have a file with the contents below. I need to print each pair of lines side by side:
hello
1223
man
2332
xyz
abc
Desired output:
hello 1223
man 2332
xyz abc
Is there an alternative to the paste command?

You can use this awk:
awk '{ORS = (NR%2 ? FS : RS)} 1' file
hello 1223
man 2332
xyz abc
This sets ORS (the output record separator) to the input field separator (FS, a space by default) on odd-numbered lines and to the input record separator (RS, a newline) on even-numbered lines, so each pair of lines is joined with a space.
To get aligned, tabular output, pipe the result to column -t:
awk '{ORS = (NR%2 ? FS : RS)} 1' file | column -t
hello 1223
man 2332
xyz abc

awk/gawk solution:
$ gawk 'BEGIN{ OFS="\t"} { COL1=$1; getline; COL2=$1; print(COL1,COL2)}' file
hello 1223
man 2332
xyz abc
Bash solution (no paste command; it reads two lines per loop iteration):
$ while read -r col1 && read -r col2; do printf '%s\t%s\n' "$col1" "$col2"; done < file
hello 1223
man 2332
xyz abc
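Both awk answers assume an even number of lines: with an odd count, the ORS version leaves the last line without a trailing newline, and the getline version prints the last value twice. A minimal sketch in POSIX awk that prints a leftover last line on its own (the function name pair_lines is hypothetical):

```shell
# pair_lines [FILE...]
# Hypothetical helper: prints lines 1 and 2, 3 and 4, ... side by side, tab-separated.
pair_lines() {
  awk '
    NR % 2 { prev = $0; next }       # odd line: remember it
    { print prev "\t" $0 }           # even line: print it next to the previous one
    END { if (NR % 2) print prev }   # odd line count: print the last line alone
  ' "$@"
}
```

Unlike the bash loop, this keeps the last line when the file has an odd number of lines.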

Match specific column with grep command

I am having trouble matching a specific column with grep. I have a test file (test.txt) like this:
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 841 A 6 ,$,.,,. BJJJJE
Bra001325 866 C 2 ,. HJ
I want to extract all lines that have the number 866 in the second column. When I use grep, I get every line that contains that number anywhere:
grep "866" test.txt
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 866 C 2 ,. HJ
How can I match a specific column with grep?

Try this:
$ awk '$2 == 866' test.txt
There is no need to add {print}: awk's default action is to print the line when the condition is true.
With grep (-P is a GNU grep option):
$ grep -P '^\S+\s+866\b' *
But awk can print filenames too, and it is more robust than grep here (nextfile moves on to the next file after the first match; remove it to print every matching line):
$ awk '$2 == 866{print FILENAME":"$0; nextfile}' *
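grep -P depends on GNU grep's Perl-compatible regex support, which not every grep provides. A sketch of the same match using POSIX extended regular expressions via grep -E, assuming whitespace-separated columns; the function name second_col_866 is hypothetical:

```shell
# second_col_866 [FILE...]
# Hypothetical helper: prints lines whose second whitespace-separated column is exactly 866.
second_col_866() {
  # ^[^[:space:]]+    first column
  # [[:space:]]+866   separator, then 866
  # ([[:space:]]|$)   866 must end the column (so 8660 does not match)
  grep -E '^[^[:space:]]+[[:space:]]+866([[:space:]]|$)' "$@"
}
```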

In my case the fields are separated by a comma and a space rather than whitespace, so I had to set the separator explicitly; otherwise it doesn't work for me (on Ubuntu 18.04.1):
awk -F ', ' '$2 == 866' test.txt

How to extract the integer or decimal at beginning of each input line, using Linux/Unix utilities?

Given input such as:
1
1a
1.1b
2.0c
How can I extract the integer or decimal number at the beginning of each input line, using only Linux/Unix command-line utilities?

Using awk, you could say:
awk '{print $0+0}'

Awk is available on Linux, BSD, and most other Unix-like operating systems. Converting the line to a number keeps only its leading numeric part:
echo "1" | awk '{a+=$0; print a}' # output 1
echo "1a" | awk '{a+=$0; print a}' # output 1
echo "1.1b" | awk '{a+=$0; print a}' # output 1.1
echo "2.0c" | awk '{a+=$0; print a}' # output 2

Some more awk.
To strip everything from the first letter onward (the number keeps its original form, so 2.0 stays 2.0):
$ awk 'gsub(/[[:alpha:]].*/,x,$1) + 1' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1.1
2.0
For the integer part only:
$ awk '{print int($0)}' << EOF
1
1a
1.1b
2.0c
EOF
1
1
1
2
---edit---
If the file contains blank lines, this version avoids printing 0 for them:
$ awk 'NF{$0+=0}1' << EOF
1
1a
1.1b
2foot4c
2
EOF
1
1
1.1
2
2

Here is a way to do this with sed:
echo "12.3abc" | sed -n 's/^\([0-9.][0-9.]*\).*/\1/p'
Output:
12.3
The group in parentheses matches the run of digits and periods at the beginning of the line, and '.*' matches everything after it.
The \1 replaces the entire line with just the part matched inside the parentheses; because of -n and the p flag, lines that don't start with a digit or period are not printed at all.

Assuming your grep supports -o and \+ in a basic regular expression (GNU grep supports both):
grep -o '^[0-9.]\+' data.in
NB: This matches any run of digits and periods at the start of the line, so 1.2.3x yields 1.2.3.
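If you want at most one decimal point, here is a sketch using match() in POSIX awk; it prints the number exactly as written (2.0 stays 2.0) and skips lines that don't start with a digit. The pattern is an assumption about what counts as a number here (no sign, no leading period), and the function name leading_number is hypothetical.

```shell
# leading_number [FILE...]
# Hypothetical helper: prints the integer or decimal at the start of each line.
leading_number() {
  # match() sets RLENGTH to the length of the match anchored at the line start.
  awk 'match($0, /^[0-9]+(\.[0-9]+)?/) { print substr($0, 1, RLENGTH) }' "$@"
}
```

With this pattern, 1.2.3x gives 1.2 rather than 1.2.3.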

Making horizontal String vertical shell or awk

I have a string
ABCDEFGHIJ
I would like it to print:
A
B
C
D
E
F
G
H
I
J
i.e. from horizontal to vertical, with no other changes to the characters. Bonus points for a one-liner that also puts a number next to each character. It'd be nice if this were an awk or shell script, but I am open to learning new things. :) Thanks!

If you just want to convert a string to one character per line, tell awk that each input character is a separate field (an empty FS) and that output fields are separated by a newline, then rebuild each record by assigning a field to itself:
awk -v FS= -v OFS='\n' '{$1=$1}1'
e.g.:
$ echo "ABCDEFGHIJ" | awk -v FS= -v OFS='\n' '{$1=$1}1'
A
B
C
D
E
F
G
H
I
J
If you want field numbers next to each character, see @Kent's solution or pipe to cat -n.
The sed solution you posted is non-portable and will fail with some seds on some OSs. It also adds an unwanted blank line at the end of the output, which then gets its own line number when you pipe to cat -n, so it's not a good alternative. You should accept @Kent's answer.
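POSIX leaves the behavior of an empty FS unspecified, so FS= (and FS="" in the next answer) relies on your awk splitting the record into single characters, which GNU awk does. A sketch that avoids relying on that, using only length() and substr(); passing 1 to the hypothetical function to_vertical adds the position numbers asked for as the bonus:

```shell
# to_vertical [NUMBER]
# Hypothetical helper: prints each character of each input line on its own line.
# With NUMBER=1, each character is prefixed by its position (1 A, 2 B, ...).
to_vertical() {
  awk -v number="${1:-0}" '{
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (number + 0) print i, c
      else print c
    }
  }'
}

# Usage:
# echo "ABCDEFGHIJ" | to_vertical      # one character per line
# echo "ABCDEFGHIJ" | to_vertical 1    # numbered
```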

awk one-liner:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)print i,$i}'
Test:
kent$ echo "ABCDEF"|awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)print i,$i}'
1 A
2 B
3 C
4 D
5 E
6 F

I figured this one out on my own with sed:
sed 's/./&\n/g' horiz.txt > vert.txt

One more awk
echo "ABCDEFGHIJ" | awk '{gsub(/./,"&\n"); printf "%s", $0}'
A
B
C
D
E
F
G
H
I
J

This might work for you (GNU sed):
sed 's/\B/\n/g' <<<ABCDEFGHIJ
For line numbers:
sed 's/\B/\n/g' <<<ABCDEFGHIJ | sed = | sed 'N;y/\n/ /'
or:
sed 's/\B/\n/g' <<<ABCDEFGHIJ | cat -n