Treat spaces as spaces after n columns - linux

How can I run the bash column command so that, after the first n columns, it treats spaces as literal spaces and not as separators?
Input:
field1 field2 field3 field 4 with spaces
foo1 foo2 foo3 foo4
bar1 bar2 bar3 bar 4 with spaces
Output:
col1    col2    col3    col4
field1  field2  field3  field 4 with spaces
foo1    foo2    foo3    foo4
bar1    bar2    bar3    bar 4 with spaces
Maybe replace spaces with another character before the column command and then replace it back with spaces afterwards? awk or sed might be the right tool for this, but I'm not too familiar with them.
Any help is appreciated! Please don't shoot me down, this is my first question here...
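That replace-and-restore idea works. A minimal sketch, assuming GNU sed and that ":" never occurs in the data: protect the spaces from the 4th one onward with a placeholder, let column align the table, then translate the placeholder back:
sed 's/ /:/4g' file | column -t | tr ':' ' '
The s/ /:/4g replaces the 4th space of each line and every one after it, so column -t only sees three real separators per line.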

Another awk, which replaces the first 3 spaces with tabs:
awk '{for (i=1; i<=3; ++i) sub(/ +/, "\t")} 1' file
field1 field2 field3 field 4 with spaces
foo1 foo2 foo3 foo4
bar1 bar2 bar3 bar 4 with spaces
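To get the aligned table the question is after, the tab-separated result can then be handed to column, telling it to split on tabs only ($'\t' is bash's ANSI-C quoting for a literal tab):
awk '{for (i=1; i<=3; ++i) sub(/ +/, "\t")} 1' file | column -t -s$'\t'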

How about this
$ cat t
field1 field2 field3 field 4 with spaces
foo1 foo2 foo3 foo4
bar1 bar2 bar3 bar 4 with spaces
$ cat t | sed -E 's/^([^ ]+) ([^ ]+) ([^ ]+) (.+)$/\1\t\2\t\3\t\4/g'
field1 field2 field3 field 4 with spaces
foo1 foo2 foo3 foo4
bar1 bar2 bar3 bar 4 with spaces
$

How about building another variable, here s4 to cause confusion:
$ awk '
BEGIN {
    OFS="\t"
}
{
    for (i = 4; i <= NF; i++)
        s4 = s4 (s4 == "" ? "" : " ") $i
    print $1, $2, $3, s4
    s4 = ""
}' file
Output:
field1 field2 field3 field 4 with spaces
foo1 foo2 foo3 foo4
bar1 bar2 bar3 bar 4 with spaces
If field 4 can contain runs of multiple spaces that must be preserved, set FS="[ ]" so that every single space acts as a separator; the default FS treats any run of whitespace as one separator and would collapse them.

Using awk and the example data in a file "spaces", while still utilising column:
awk '{ printf "%s:%s:%s:", $1, $2, $3
       for (i = 4; i <= NF; i++)
           printf "%s%s", $i, (i < NF ? " " : "")
       printf "\n"
     }' spaces | column -t -s":"
Use awk to separate the first three fields from the remainder with ":" and then pipe through to column using ":" as the separator.

This might work for you (GNU sed):
sed 'y/ /\t/;s/\t/ /4g' file
Translate all spaces to tabs and then replace the 4th tab and thereafter with spaces.
If you prefer a kind of symmetry:
sed 's/ /\t/g;s/\t/ /4g' file

Related

Swap column x of tab-separated values file with column x of second tsv file

Let's say I have:
file1.tsv
Foo\tBar\tabc\t123
Bla\tWord\tabc\tqwer
Blub\tqwe\tasd\tqqq
file2.tsv
123\tzxcv\tAAA\tqaa
asd\t999\tBBB\tdef
qwe\t111\tCCC\tabc
And I want to overwrite column 3 of file1.tsv with column 3 of file2.tsv to end up with:
Foo\tBar\tAAA\t123
Bla\tWord\tBBB\tqwer
Blub\tqwe\tCCC\tqqq
What would be a good way to do this in bash?
Take a look at this awk:
awk 'FNR==NR{a[NR]=$3;next}{$3=a[FNR]}1' OFS='\t' file{2,1}.tsv > output.tsv
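Note that file{2,1}.tsv brace-expands to file2.tsv file1.tsv, so the replacement column is read and stored first. The default FS splits on any whitespace, so a field containing a space would throw the field numbering off; a tab-only sketch of the same logic:
awk 'BEGIN{FS=OFS="\t"} FNR==NR{a[FNR]=$3; next} {$3=a[FNR]} 1' file2.tsv file1.tsv > output.tsv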
If you want to use just bash, with little more effort:
while IFS=$'\t' read -r a1 a2 _ a4; do
    IFS=$'\t' read -ru3 _ _ b3 _
    printf '%s\t%s\t%s\t%s\n' "$a1" "$a2" "$b3" "$a4"
done <file1.tsv 3<file2.tsv >output.tsv
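The -u3 makes the second read pull from file descriptor 3, which is attached to file2.tsv on the done line, so the two files are consumed in lockstep, one line each per iteration.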
Output:
Foo Bar AAA 123
Bla Word BBB qwer
Blub qwe CCC qqq
Another way to do this, with a correction as pointed out by @PesaThe:
paste -d$'\t' <(cut -d$'\t' -f1,2 file1.tsv) <(cut -d$'\t' -f3 file2.tsv) <(cut -d$'\t' -f4 file1.tsv)
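Here the outer cut calls extract columns 1-2 and column 4 from file1.tsv, the middle one takes column 3 from file2.tsv, and paste stitches the three streams back together with tabs.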
The output will be:
Foo Bar AAA 123
Bla Word BBB qwer
Blub qwe CCC qqq

Comparing two text files with grep

I have two files (a.txt, b.txt)
a.txt is a list of English words (one word per row)
b.txt contains in every row: a number, a space character, and a 5-65 character long string
(for example, b.txt can contain: 1234 dsafaaraehawada)
I would like to know which rows in b.txt contain words from a.txt, and how many of them?
Example input:
a.txt
green
apple
bar
b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
output:
2 1212 greensdsdappleded
1 123 bardws
The first row contains 'green' and 'apple' (2).
The second row contains nothing.
The third row contains 'bar' (1).
That's all I would like to know.
The code (By Mr. Barmar):
grep -F -o -f a.txt b.txt | sort | uniq -c | sort -nr
But it needs to be modified.
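As written, that pipeline counts how often each word matched across the whole file, not per line of b.txt. A per-line variant in plain bash (a sketch; it runs grep once per line, so it will be slow on large files):
while IFS= read -r line; do
    n=$(grep -oF -f a.txt <<< "$line" | wc -l)
    (( n > 0 )) && printf '%s %s\n' "$n" "$line"
done < b.txt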
Try something like this:
awk 'NR==FNR{A[$1]; next} {t=0; for (i in A) t+=gsub(i,"&",$2)} t{print t, $0}' file1 file2
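Note that gsub counts every occurrence of a word in the line (replacing each match with itself via "&", which leaves $2 unchanged), while the index-based answer below counts each dictionary word at most once per line; for the sample data both give the same result.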
Try something like this:
awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
Test:
$ cat a.txt
green
apple
bar
$ cat b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
$ awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
2 1212 greensdsdappleded
1 123 bardws

Re-ordering columns with a Perl one-liner

How do you reorganize this with a one-liner?
foo r1.1 abc
foo r10.1 pqr
qux r2.1 lmn
bar r33.1 xpq
# In fact there could be more fields that precede the column with "rxx.x".
Into this
r1.1 foo abc
r10.1 foo pqr
r2.1 qux lmn
r33.1 bar xpq
Basically, move the second column to the front, keeping everything that follows it in the same order.
Assuming your text is in the file "test", this will do it:
perl -lane 'print "$F[1] $F[0] $F[2]"' test
If you have more than three columns, you will want something like:
perl -lane 'print join q( ), $F[1], $F[0], @F[2..@F-1]'
$ perl -pale '$_ = "@F[1,0,2..$#F]"' file
If it's tab-separated, a little more is needed:
$ perl -pale 'BEGIN { $"="\t"; } $_ = "@F[1,0,2..$#F]"' file
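The -a switch autosplits each line into the @F array, and $" is Perl's list-separator variable (a space by default): it is what gets placed between the elements when the @F slice is interpolated inside double quotes, so setting it to a tab keeps the output tab-separated.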
Content of 'infile':
foo r1.1 abc
foo r10.1 pqr
qux r2.1 lmn
bar r33.1 xpq
Perl one-liner:
perl -pe 's/\A(\S+\s+)(\S+\s+)/$2$1/' infile
Result:
r1.1 foo abc
r10.1 foo pqr
r2.1 qux lmn
r33.1 bar xpq
The basic answers have been provided by others; I considered the case of fixed-width data with possibly empty fields:
>cat spacedata.txt
foo  r1.1   abc
foo  r10.1  pqr
qux  r2.1   lmn
bar  r33.1  xpq
     r1.2   cake
is   r1.2   alie
>perl -lpwE '$_=pack "A7A5A*", (unpack "A5A7A*")[1,0,2];' spacedata.txt
r1.1   foo  abc
r10.1  foo  pqr
r2.1   qux  lmn
r33.1  bar  xpq
r1.2        cake
r1.2   is   alie
file
a 5 ss
b 3 ff
c 2 zz
cat file | awk '{print $2, $1, $3}' # prints columns 2,1,3
5 a ss
3 b ff
2 c zz
# or, to sort numerically by column 2 and write to new_file
sort -n -k2 file > new_file
new_file
c 2 zz
b 3 ff
a 5 ss

How to do a numeric UNIX sort on fields with a character attached in front of the number

I have a very large data (12G) that looks like this:
foo r1.1 abc
foo r10.1 pqr
qux r2.1 lmn
bar r33.1 xpq
What I want to do is sort the 2nd field numerically, yielding (in reality there are more leading fields):
foo r1.1 abc
qux r2.1 lmn
foo r10.1 pqr
bar r33.1 xpq
I tried the following, but it won't work:
sort -k1 -n
What's the right way to do it?
How about sort -k2.2n, since the field starts with just an r; the key then begins at the second character of field 2, right after the r.
You almost had it - you need to do:
sort -k2.2 -n
-k1 starts the key at the first field; here it has to start at the second field, and at its second character, so that the numeric comparison skips the leading r.
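If your sort is GNU sort, version sort can also order an embedded number behind a letter prefix, so this should work as well:
sort -k2V file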

How to use `grep` to choose lines with column > 1?

I get results like the ones below from a pipeline on Linux:
1 test1
1 test2
2 test3
1 test4
3 test5
1 test6
1 test7
How can I use grep to retrieve only the lines where the first column is > 1?
Don't use grep for this. Try awk instead:
<pipeline> | awk '$1>1 {print $0}'
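With the sample data saved in a file (named results.txt here just for the demonstration):
$ awk '$1>1 {print $0}' results.txt
2 test3
3 test5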
grep -v "^1"
-v selects non-matching lines
^ is the start of a line
EDIT: As pointed out in the comments, this solution also filters out lines starting with multi-digit numbers that begin with 1 (such as "12 asd"), which should be kept. Adding a space after the 1 solves the problem:
grep -v "^1 "
use the "^" char, it marks the beginning of a line
-v will not include lines starting with 1
include the extra space, so it will exclude lines like "1 asd" but not "12 asd"
grep -v "^1 "
