Change format of text file - linux

I have a file with many lines of tab separated data in the following format:
1 1 2 2
3 3 4 4
5 5 6 6
...
and I would like to change the format to:
1 1
2 2
3 3
4 4
5 5
6 6
Is there a not too complicated way to do this? I don't have any experience with using awk, sed, etc.
Thanks

If you just want to group your file in blocks of X columns, you can make use of xargs -nX:
$ xargs -n2 < file
1 1
2 2
3 3
4 4
5 5
6 6
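Note that xargs joins the items with spaces, so the tabs in the input become spaces in the output, and it treats quotes and backslashes in the data specially. If the output should stay tab separated, a minimal sketch with tr and paste might help (the sample file below is a hypothetical stand-in for your data):

```shell
# Hypothetical sample data, tab separated as in the question.
f=$(mktemp)
printf '1\t1\t2\t2\n3\t3\t4\t4\n5\t5\t6\t6\n' > "$f"

# Put every field on its own line, then let paste join the lines
# two at a time; paste separates the joined lines with a tab.
pairs=$(tr '\t' '\n' < "$f" | paste - -)
printf '%s\n' "$pairs"

rm -f "$f"
```

Adding more "-" arguments to paste gives wider groups, e.g. paste - - - for three fields per line.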
To have more control and print an empty line after 4th field, you can also use this awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) printf "%s%s", $i, (i%2?OFS:RS); print ""}' file
1 1
2 2
3 3
4 4
5 5
6 6
# <-- note there is an empty line here
Explanation
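After each odd-numbered field, it prints OFS (here a tab).
After each even-numbered field, it prints RS (a newline). The final print "" adds the empty line after each input line.
Note FS stands for field separator, which defaults to whitespace, OFS is the output field separator, and RS stands for record separator, which defaults to a newline. As your fields are separated by tabs, we set both FS and OFS to a tab in the BEGIN block.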
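The same idea extends to other group sizes. A sketch, assuming a hypothetical awk variable n for the number of fields per output line and a made-up sample file; this variant does not print the extra empty line:

```shell
# Hypothetical sample data, tab separated as in the question.
f=$(mktemp)
printf '1\t1\t2\t2\n3\t3\t4\t4\n' > "$f"

# n (an illustrative name) is the number of fields per output line.
# A field is followed by OFS unless it closes a group of n fields
# or is the last field of the input line; then it is followed by ORS.
regrouped=$(awk -v n=2 'BEGIN{FS=OFS="\t"}
  {for (i=1;i<=NF;i++) printf "%s%s", $i, (i%n && i<NF ? OFS : ORS)}' "$f")
printf '%s\n' "$regrouped"

rm -f "$f"
```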

This is probably the simplest way which allows for customisation
awk '{print $1,$2"\n"$3,$4}' file
For a line between
awk '{print $1,$2"\n"$3,$4"\n"}' file
although fedorqui's answer with xargs is probably the simplest if this isn't needed.
As Ed pointed out, this wouldn't work if there were blanks in the fields. This could be resolved using
awk 'BEGIN{FS=OFS="\t"} {print $1,$2 ORS $3,$4 ORS}' file

Through perl,
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file
Feed the above command's output to sed to delete the last blank line:
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file | sed '$d'

Related

How to remove lines based on another file? [duplicate]

Now I have two files as follows:
$ cat file1.txt
john 12 65 0
Nico 3 5 1
king 9 5 2
lee 9 15 0
$ cat file2.txt
Nico
king
Now I would like to remove each line which contains a name from the second file in its first column.
Ideal result:
john 12 65 0
lee 9 15 0
Could anyone tell me how to do that? I have tried the code like this:
for i in 'less file2.txt'; do sed "/$i/d" file1.txt; done
But it does not work properly.
You don't need a loop. Use grep with -f to read the patterns from file2.txt, -v to invert the match, and -w to match only whole words (note that this matches a name anywhere on the line, not just in the first column):
grep -wvf file2.txt file1.txt
This job suits awk:
awk 'NR == FNR {a[$1]; next} !($1 in a)' file2.txt file1.txt
john 12 65 0
lee 9 15 0
Details:
NR == FNR { # While processing the first file
a[$1] # store the first field in an array a
next # move to next line
}
!($1 in a) # while processing the second file
# if first field doesn't exist in array a then print
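The grep and awk answers differ when a name also appears outside the first column. A small sketch with made-up data (the third line of file1.txt here is hypothetical, not from the question):

```shell
dir=$(mktemp -d)
# "king" appears in the third column of the last line, not the first.
printf 'john 12 65 0\nNico 3 5 1\nlee 9 king 0\n' > "$dir/file1.txt"
printf 'Nico\nking\n' > "$dir/file2.txt"

# grep -w matches the word anywhere on the line, so the lee line is dropped.
grep_out=$(grep -wvf "$dir/file2.txt" "$dir/file1.txt")

# awk compares only the first field, so the lee line is kept.
awk_out=$(awk 'NR == FNR {a[$1]; next} !($1 in a)' "$dir/file2.txt" "$dir/file1.txt")

printf 'grep:\n%s\nawk:\n%s\n' "$grep_out" "$awk_out"
rm -rf "$dir"
```

If the names can only ever occur in the first column, both commands give the same result.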

Adding new line to file with sed

I want to add a new line to the top of a data file with sed, and write something to that line.
I tried this as suggested in How to add a blank line before the first line in a text file with awk :
sed '1i\
\' ./filename.txt
but it printed a backslash at the beginning of the first line of the file instead of creating a new line. The terminal also throws an error if I try to put it all on the same line ("1i\": extra characters after \ at the end of i command).
Input :
1 2 3 4
1 2 3 4
1 2 3 4
Expected output
14
1 2 3 4
1 2 3 4
1 2 3 4
$ sed '1i\14' file
14
1 2 3 4
1 2 3 4
1 2 3 4
but just use awk for clarity, simplicity, extensibility, robustness, portability, and every other desirable attribute of software:
$ awk 'NR==1{print "14"} {print}' file
14
1 2 3 4
1 2 3 4
1 2 3 4
Basically you are concatenating two files: a file containing one line and the original file. By its name this is a task for cat:
cat - file <<< 'new line'
# or
echo 'new line' | cat - file
where - stands for stdin.
You can also use cat together with process substitution if your shell supports this:
cat <(echo 'new line') file
Btw, with sed it should be simply:
sed '1i\new line' file
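All of the commands above print to stdout and leave the file itself untouched. If the file should actually be changed, one possible sketch goes through a temporary file. The function name prepend_line and its arguments are made up for this example:

```shell
# prepend_line TEXT FILE  (hypothetical helper, not a standard command)
# Writes TEXT as a new first line of FILE via a temporary file.
prepend_line() {
    tmp=$(mktemp) || return 1
    # Build the new content first, then copy it back over the original
    # so the original file keeps its permissions.
    { printf '%s\n' "$1"; cat "$2"; } > "$tmp" && cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage with a throwaway sample file:
f=$(mktemp)
printf '1 2 3 4\n1 2 3 4\n' > "$f"
prepend_line 14 "$f"
result=$(cat "$f")
printf '%s\n' "$result"
rm -f "$f"
```

GNU sed users can get a similar effect with sed -i, but that option is not available in every sed in the same form.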

Move Last Four Lines To Second Row In Text File

I need to move the last 4 lines of a text file to the second row in the text file.
I'm assuming that tail and sed are used, but I haven't had much luck so far.
Here is a head and tail solution. Let us start with the same sample file as Glenn Jackman:
$ seq 10 >file
Apply these commands:
$ head -n1 file ; tail -n4 file; tail -n+2 file | head -n-4
1
7
8
9
10
2
3
4
5
6
Explanation:
head -n1 file
Print first line
tail -n4 file
Print last four lines
tail -n+2 file | head -n-4
Print the lines starting with line 2 and ending before the fourth-to-last line.
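A negative count for head -n is not specified by POSIX, so head -n-4 may fail outside GNU systems. A sketch of the same three steps that computes the line range with wc and sed instead; the function name and its parameters are invented for illustration, and it assumes the file has at least N+2 lines:

```shell
# move_last_after_first N FILE  (hypothetical helper)
# Prints FILE with its last N lines moved directly below the first line.
move_last_after_first() {
    n=$1
    file=$2
    # Arithmetic expansion strips any padding that wc may print.
    total=$(( $(wc -l < "$file") ))
    head -n 1 "$file"        # first line
    tail -n "$n" "$file"     # last N lines
    # Remaining lines: from line 2 up to the line before the last N.
    if [ "$total" -gt $((n + 1)) ]; then
        sed -n "2,$((total - n))p" "$file"
    fi
}

# Usage with the same sample as above (lines 1 to 10):
f=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$f"
moved=$(move_last_after_first 4 "$f")
printf '%s\n' "$moved"
rm -f "$f"
```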
If I'm assuming correctly, ed can handle your task:
seq 10 > file
ed file <<'COMMANDS'
$-3,$m1
w
q
COMMANDS
cat file
1
7
8
9
10
2
3
4
5
6
lines 7,8,9,10 have been moved to the 2nd line
$-3,$m1 means: for the range of lines from "$-3" (3 lines before the last line) to "$" (the last line), move them ("m") below the first line ("1")
Note that the heredoc has been quoted so the shell does not try to interpret the strings $- and $m1 as variables
If you don't want to actually modify the file, but instead print to stdout:
ed -s file <<'COMMANDS'
$-3,$m1
%p
Q
COMMANDS
Here is an awk solution:
seq 10 > file
awk '{a[NR]=$0} END {for (i=1;i<=NR-4;i++) if (i==2) {for (j=NR-3;j<=NR;j++) print a[j];print a[i]} else print a[i]}' file
1
7
8
9
10
2
3
4
5
6

Cannot get this simple sed command

This sed command is described as follows
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I have to add another pipe at the end of that command I think which "quits as soon as we match a regular expression representing 5 or more digits at the end of a record"
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?
Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).
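One thing to keep in mind: q prints the current line before quitting, so the first record with a five-digit price still appears in the output. A sketch with made-up records (the file contents are hypothetical; the ford line carries a five-digit number in its second field to show why the $ anchor matters):

```shell
cars=$(mktemp)
# Hypothetical records; the price is the fifth and last field.
printf '%s\n' \
  'ford 12345 65 blue 300' \
  'fiat 2 70 red 500' \
  'bmw 3 99 black 15000' \
  'vw 4 80 grey 10000' > "$cars"

# Anchored: quits at the first price with 5+ digits, but prints that line.
with_q=$(sort -nk 5 "$cars" | sed '/[0-9][0-9][0-9][0-9][0-9]$/q')

# With -n, q prints nothing, so only the lines before the match appear.
stop_before=$(sort -nk 5 "$cars" | sed -n -e '/[0-9][0-9][0-9][0-9][0-9]$/q' -e p)

# Unanchored: the 12345 in the ford line already triggers the quit.
unanchored=$(sort -nk 5 "$cars" | sed '/[0-9][0-9][0-9][0-9][0-9]/q')

printf 'with q:\n%s\nstop before:\n%s\nunanchored:\n%s\n' \
  "$with_q" "$stop_before" "$unanchored"
rm -f "$cars"
```

Whether the matching line should be printed depends on how the exercise is meant to be read, so treat the second form as one option rather than the expected answer.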
Another way.
As you didn't post a sample file, I did it as a guess.
Here I'm looking for lines with the word "chevy" where field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot the -i flag from grep, so the correct command is (IGNORECASE requires GNU awk):
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000
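IGNORECASE is a GNU awk extension; in other awk implementations it is just an ordinary variable and has no effect on matching. A sketch of a portable variant using tolower(), with a shortened, hypothetical version of the sample file above:

```shell
cars=$(mktemp)
# Shortened, hypothetical sample in the same layout as above.
printf '%s\n' \
  'Chevy 2 3 4 10000' \
  'chEvy 2 3 4 1000' \
  'CHEVY 2 3 4 2000' \
  'Prevy 2 3 4 1000' > "$cars"

# Lower-case the line before matching, then test the price in field 5.
# A pattern without an action prints the matching line.
cheap_chevys=$(awk 'tolower($0) ~ /chevy/ && $5 < 10000' "$cars")
printf '%s\n' "$cheap_chevys"

rm -f "$cars"
```

Replacing ~ with !~ would select the lines that do not mention chevy, which is closer to the grep -iv in the question.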
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'

select the second line to last line of a file

How can I select the lines from the second line to the line before the last line of a file by using head and tail in unix?
For example if my file has 15 lines I want to select lines from 2 to 14.
tail -n +2 /path/to/file | head -n -1
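A negative count for head -n is not specified by POSIX, so head -n -1 may not work everywhere. If head and tail are not strictly required, a sed sketch that deletes the first and the last line does the same job (the sample file is made up):

```shell
f=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 > "$f"

# 1d deletes the first line, $d deletes the last one;
# everything in between is printed unchanged.
middle=$(sed '1d;$d' "$f")
printf '%s\n' "$middle"

rm -f "$f"
```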
perl -ne 'print if($.!=1 and !(eof))' your_file
tested below:
> cat temp
1
2
3
4
5
6
7
> perl -ne 'print if($.!=1 and !(eof))' temp
2
3
4
5
6
>
alternatively in awk you can use below:
awk '{a[count++]=$0}END{for(i=1;i<count-1;i++) print a[i]}' your_file
To print all lines but first and last ones you can use this awk as well:
awk 'NR>2 {print prev} {prev=$0}'
This always prints the previous line, starting from the third line, so line 1 is never printed. The last line won't be printed either, because when reading it we are printing the penultimate! The test is on NR rather than on the saved line, so empty lines and lines containing only 0 are printed too.
Test
$ seq 10 | awk 'NR>2 {print prev} {prev=$0}'
2
3
4
5
6
7
8
9

Resources