How to use paste command for different lengths of columns

I have:
file1.txt file2.txt file3.txt
8 2 2
1 2 1
8 1 0
3 3
5 3
3
4
I want to paste all three of these columns side by side into ofile.txt.
I tried:
paste file1.txt file2.txt file3.txt > ofile.txt
The result I got in ofile.txt:
ofile.txt:
8 2 2
1 2 1
8 1 0
3 3
5 3
3
4
What it should look like:
ofile.txt
8 2 2
1 2 1
8 1 0
3 3
5 3
3
4

You can try this paste command in bash using process substitution:
paste <(sed 's/^[[:blank:]]*//' file1.txt) file2.txt file3.txt
8 2 2
1 2 1
8 8 0
3 3
5 3
3
4
The sed command removes leading whitespace from each line of file1.txt before paste reads it.

I can reproduce your output when I create input files whose lines start with a tab.
paste also puts a tab between the columns, so the tabs already in the input shift the columns.
You can see the result when I replace the tabs with -:
# more x* | tr '\t' '-'
::::::::::::::
x1
::::::::::::::
-1a
-1b
-1c
-1d
::::::::::::::
x2
::::::::::::::
-2a
-2b
::::::::::::::
x3
::::::::::::::
-3a
-3b
-3c
-3d
-3e
-3f
-3g
# paste x? | tr '\t' '-'
-1a--2a--3a
-1b--2b--3b
-1c---3c
-1d---3d
---3e
---3f
---3g
Decide how you want the result to look. If you want correct indentation, you need to append tab-only lines to the files that have fewer lines. Or post-process the result: turn 3 tabs into 4, and 4 tabs at the beginning of a line into 5.
sed -e 's/\t\t\t/\t\t\t\t/' -e 's/^\t\t\t\t/\t\t\t\t\t/'
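For comparison, here is a minimal sketch, assuming three small made-up files (x1, x2, x3 in a scratch directory, not the asker's data) whose lines do not start with a tab. In that case paste fills the gaps left by the shorter files with empty fields:

```shell
tmp=$(mktemp -d)                      # scratch directory for the sample files
printf '1a\n1b\n'     > "$tmp/x1"     # 2 lines
printf '2a\n'         > "$tmp/x2"     # 1 line
printf '3a\n3b\n3c\n' > "$tmp/x3"     # 3 lines
# Replace tabs with - to make the field boundaries visible
result=$(paste "$tmp/x1" "$tmp/x2" "$tmp/x3" | tr '\t' '-')
printf '%s\n' "$result"
rm -rf "$tmp"
```

Every output line contains exactly two separators here, so each value stays in its own column even where a shorter file has run out of lines.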

Related

bash cut columns to one file and save onto the end of another file

I would like to cut two columns from one file and stick them on the end of a second file. The two files have exactly the same number of lines.
file1.txt
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
file2.txt
a b c d e f g h i j
a b c d e f g h i j
a b c d e f g h i j
a b c d e f g h i j
So far I have been using
cut -f9-10 file2.txt | paste file1.txt - > file3.txt
which outputs exactly what I want
1 2 3 4 5 6 7 8 9 10 i j
1 2 3 4 5 6 7 8 9 10 i j
1 2 3 4 5 6 7 8 9 10 i j
However, I don't want to create a new file; I would prefer to change file1.txt so that it contains the above. I've tried
cut -f9-10 file2.txt | paste file1.txt -
but it simply prints everything on screen. Is there a way of just adding columns 9 and 10 to the end of file1.txt?

Use sponge from moreutils! It soaks up all of its standard input before writing to the file, so you can replace a file in place at the end of a pipeline that also reads it.
cut -f9-10 file2.txt | paste file1.txt - | sponge file1.txt
Note you can also do what you are doing by using paste with a process substitution.
$ paste -d' ' file1.txt <(awk '{print $(NF-1), $NF}' file2.txt) | sponge file1.txt
$ cat file1.txt
1 2 3 4 5 6 7 8 9 10 i j
1 2 3 4 5 6 7 8 9 10 i j
1 2 3 4 5 6 7 8 9 10 i j
This joins file1.txt with two last columns from file2.txt using ' ' as delimiter.
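If sponge is not installed, a common alternative is to write to a temporary file and move it over the original. A minimal sketch, assuming small tab-separated sample files in a scratch directory (the contents and the name file1.new are made up for illustration):

```shell
tmp=$(mktemp -d)
printf '1\t2\n1\t2\n'       > "$tmp/file1.txt"
printf 'a\tb\tc\na\tb\tc\n' > "$tmp/file2.txt"
# Build the combined file next to the original, then replace the
# original only if paste (the last command in the pipeline) succeeded.
cut -f2-3 "$tmp/file2.txt" | paste "$tmp/file1.txt" - > "$tmp/file1.new" &&
    mv "$tmp/file1.new" "$tmp/file1.txt"
result=$(cat "$tmp/file1.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Redirecting straight to file1.txt would not work, because the shell truncates the file before paste gets to read it.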

Extract n-th line from file in bash loop

I would like to extract lines from a file n at a time and save each group to a new file. For example, I have index.txt:
cat index.txt
1 AAAGCGT
2 ACGAAGT
3 ACCTTGT
4 ATAATGT
5 AGGGTGT
6 AGCCAGT
7 AGTTCGT
8 AATGCAG
9 AAAGCGT
10 ACGAAGT
and output should be
cat index.1.txt:
1 AAAGCGT
2 ACGAAGT
cat index.2.txt:
3 ACCTTGT
4 ATAATGT
cat index.3.txt:
5 AGGGTGT
6 AGCCAGT
And so on. So I would like to loop over the input file, take 2 rows at a time, and save each pair to a new file.

It doesn't give you exactly the names you want, but:
split -l 2 index.txt index.
seems like the easiest solution. It will create files with names beginning with the final argument, so you will get names like 'index.aa' and 'index.ab'.
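If the numbered names matter, the split output can be renamed afterwards. A rough sketch, assuming a made-up five-line index.txt in a scratch directory; the glob index.a? only covers the first 26 pieces, so a larger input would need a wider pattern:

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$tmp/index.txt"
split -l 2 "$tmp/index.txt" "$tmp/index."
names=$(cd "$tmp" && ls index.a?)      # default suffixes: aa, ab, ac, ...
# Rename to index.1.txt, index.2.txt, ... in glob (alphabetical) order
n=0
for f in "$tmp"/index.a?; do
    n=$((n + 1))
    mv "$f" "$tmp/index.$n.txt"
done
last=$(cat "$tmp/index.3.txt")
printf '%s\n' "$names"
rm -rf "$tmp"
```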

This will work for any number of grouped lines just by changing the 2 to a 3 or whatever number you like:
$ awk 'NR%2==1{++i} {print > ("index." i ".txt")}' index.txt
$ ls index.?.txt
index.1.txt index.2.txt index.3.txt index.4.txt index.5.txt
$ tail index.?.txt
==> index.1.txt <==
1 AAAGCGT
2 ACGAAGT
==> index.2.txt <==
3 ACCTTGT
4 ATAATGT
==> index.3.txt <==
5 AGGGTGT
6 AGCCAGT
==> index.4.txt <==
7 AGTTCGT
8 AATGCAG
==> index.5.txt <==
9 AAAGCGT
10 ACGAAGT

awk '{print >"index."(x+=NR%2)".txt"}' file
This increments x every two lines starting from 1 and then prints the line into a file with that name
cat index.1.txt:
1 AAAGCGT
2 ACGAAGT
cat index.2.txt:
3 ACCTTGT
4 ATAATGT
cat index.3.txt:
5 AGGGTGT
6 AGCCAGT
In some awks, extra parens may be required as shown below (As commented by Ed Morton)
awk '{print >("index."(x+=NR%2)".txt")}' file

I would say:
awk '{file=int((NR+1)/2)".txt"; print > file}' file
int((NR+1)/2) maps every line number to a file number:
1 --> 1
2 --> 1
3 --> 2
x --> int((x+1) / 2)
So you get these files:
$ cat 1.txt
1 AAAGCGT
2 ACGAAGT
or
$ cat 3.txt
5 AGGGTGT
6 AGCCAGT
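The same mapping can be generalized to any group size. A sketch under the assumption that n is the number of lines per file and that the output names follow the index.N.txt pattern from the question; int((NR-1)/n)+1 gives the same result as int((NR+1)/2) when n is 2. The sample input is made up:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c d e f g > "$tmp/in.txt"
# n is the group size; line NR goes to file number int((NR-1)/n)+1
(cd "$tmp" && awk -v n=3 '{ out = "index." (int((NR - 1) / n) + 1) ".txt"; print > out }' in.txt)
first=$(cat "$tmp/index.1.txt")
third=$(cat "$tmp/index.3.txt")
printf '%s\n' "$first"
rm -rf "$tmp"
```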

Missing numbers from two sequences

How do I find the missing numbers in two sequences using a bash script?
For example, I have a file which contains the following data:
1 1
1 2
1 3
1 5
2 1
2 3
2 5
output : missing numbers are
1 4
2 2
2 4

This awk one-liner gives the requested output for the specified input:
$ awk '$2!=l2+1&&$1==l1{for(i=l2+1;i<$2;i++)print l1,i}{l1=$1;l2=$2}' file
1 4
2 2
2 4

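A solution using grep (bash brace expansion generates every expected pair, and grep removes the pairs that are present in the file):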
printf "%s\n" {1..2}" "{1..5} | grep -vf file
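One caveat: with plain grep -vf, each line of file is treated as a regular expression that may match anywhere in a line, so with larger numbers a pattern such as 1 1 would also remove 1 10. A hedged sketch that adds grep's -F (fixed strings) and -x (whole-line match) options, and uses nested loops instead of brace expansion so it also runs in plain sh. The temporary file holds a copy of the data from the question:

```shell
file=$(mktemp)
printf '%s\n' '1 1' '1 2' '1 3' '1 5' '2 1' '2 3' '2 5' > "$file"
missing=$(
    for a in 1 2; do
        for b in 1 2 3 4 5; do
            echo "$a $b"          # every pair that should exist
        done
    done | grep -Fxv -f "$file"   # drop the pairs that do exist
)
printf '%s\n' "$missing"
rm -f "$file"
```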

How can I separate some repeated patterns in a row into multiple rows using bash script?

I have some problem with bash script.
I've got a string which has some repeated patterns like this.
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 ...
The fields are separated by tabs.
I want it to look like this...
1 2 3 4
1 2 3 4
1 2 3 4
…
How can I solve this problem in a bash script with tools like cut, sed, awk ...?
I've tried commands like cut -f 'seq 4, 4, 40' example.txt
but it doesn't work.
It looks very easy, but it is difficult for me.

You can use sed like this:
s='1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4'
p='1 2 3 4'
echo "$s"|sed "s/$p\s*/&\n/g"
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
Live Demo: http://ideone.com/P59OCJ

Here's a pure bash solution:
IFS=$'\t' set -- $(<input_file)
seen=()
while [[ $1 ]]; do
    if (( ${seen[$1]} )); then # If we've seen the value before, start a new line.
        echo
        unset seen
    fi
    printf '%s ' "$1"
    seen[$1]=1
    shift
done

If you know the ending number of your sequence beforehand, you can do something like:
LAST_NUMBER=4
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
Just replace 4 with the last number from the sequence
If you don't know the number, you have to search through it using the following:
#!/bin/bash
declare -A CHECKED_NUMBERS
LAST_NUMBER=
while read LINE; do
    SPLIT_LINE=$(cut -d" " -f1- <<< "$LINE")
    for number in $SPLIT_LINE; do
        if [ "${CHECKED_NUMBERS[$number]}" == "1" ]; then
            LAST_NUMBER=$number
        else
            CHECKED_NUMBERS[$number]=1
        fi
    done
done < example.txt
# do the replacement
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt

An awk version
awk '{for (i=1;i<=NF;i++) {printf "%s"(i%4?" ":"\n"),$i}}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
A GNU awk version
awk -v RS="\t" '{printf "%s"(NR%4?" ":"\n"),$0}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

xargs may help:
kent$ echo "1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4"|xargs -n4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4

This might work for you:
printf "%s\t%s\t%s\t%s\n" $string
or, if you want the fields space-separated:
printf "%s %s %s %s\n" $string
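Another option in the same spirit, sketched under the assumption that the input is a single tab-separated line and the group size is 4: put every field on its own line with tr, then let paste read four lines of standard input per output row (each - stands for standard input). The sample string is made up:

```shell
s=$(printf '1\t2\t3\t4\t1\t2\t3\t4\t1\t2\t3\t4')
# One field per line, then four lines per row, joined by tabs
rows=$(printf '%s\n' "$s" | tr '\t' '\n' | paste - - - -)
printf '%s\n' "$rows"
```

Adding -d ' ' to paste would join the fields with spaces instead of tabs.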

Records filtering

I have this kind of file file-1:
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
3 1 3.1876823799523781
3 2 3.1876823799523781
3 3 3.1876823799523781
3 4 3.1876823799523781
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
and I have another file too, file-2:
1
2
4
I would like to keep only those records from file-1 whose first-column value appears in file-2, so I would like to get this output:
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
Can anybody help a little?

awk 'NR==FNR{f2[$1];next}$1 in f2' file-2 file-1
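The one-liner relies on the NR==FNR idiom: NR counts lines across all input files while FNR restarts for each file, so the two are equal only while the first file is being read. A commented sketch on made-up sample data (the scratch files and the array name keep are assumptions for illustration, not the asker's files):

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 1 1.15' '2 1 2.19' '3 1 3.18' '4 1 0.62' > "$tmp/file-1"
printf '%s\n' 1 2 4 > "$tmp/file-2"
filtered=$(awk '
    NR == FNR { keep[$1]; next }   # first file: remember each key
    $1 in keep                     # second file: print rows whose key was remembered
' "$tmp/file-2" "$tmp/file-1")
printf '%s\n' "$filtered"
rm -rf "$tmp"
```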

Very simple using join:
join file-1 file-2
The files must be sorted for join to work. The sort is based on text, not numeric values, so you may need to sort into a temp file first. Something like:
sort file-2 > sorted.tmp
sort file-1 | join - sorted.tmp
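A minimal sketch of the sort-then-join approach, assuming made-up sample data in a scratch directory (the file names mirror the question, the contents do not). LC_ALL=C is set here only so that sort and join agree on the ordering, and -k1,1 sorts on the join field. Note that the output comes back in text order, not numeric order:

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 a' '2 b' '3 c' '10 d' > "$tmp/file-1"
printf '%s\n' 2 10                     > "$tmp/file-2"
LC_ALL=C sort "$tmp/file-2" > "$tmp/sorted.tmp"
# Sort file-1 on its first field, then join on that field
joined=$(LC_ALL=C sort -k1,1 "$tmp/file-1" | LC_ALL=C join - "$tmp/sorted.tmp")
printf '%s\n' "$joined"
rm -rf "$tmp"
```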

You can use the -f option in grep to read patterns from a file. But first you must change the patterns so that they match the first field only. You can do this by using sed to add a ^ to the beginning and a space to the end of each pattern in file-2, and using process substitution in your command.
The complete command is:
grep -f <(sed -e "s/^/^/g" -e "s/$/ /g" file-2) file-1

This might work for you:
sed 's/.*/\/^& \/p/' file-2 | sed -nf - file-1

Here is another way to do in awk:
awk 'NR==FNR{a[$1];next} !($1 in a){next}1' file-2 file-1
