one-liner: print all lines except the last 3? - linux

I would like to simulate GNU's head -n -3, which prints all lines except the last 3, because head on FreeBSD doesn't have this feature. So I am thinking of something like
seq 1 10 | perl -ne ...
Here I have used 10 lines, but it can be any number larger than 3.
Can it be done in Perl, or some other way, in bash on FreeBSD?
A super primitive solution would be
seq 1 10 | sed '$d' | sed '$d' | sed '$d'

seq 1 10 | perl -e '@x=("")x3;while(<>){print shift @x;push @x,$_}'
or
perl -e '@x=("")x3;while(<>){print shift @x;push @x,$_}' file
or
command | perl -pe 'BEGIN{@x=("")x3}push @x,$_;$_=shift @x'
perl -pe 'BEGIN{@x=("")x3}push @x,$_;$_=shift @x' file

seq 1 10 | perl -ne 'push @l, $_; print shift @l if @l > 3'
All of these keep a three-line queue and print the line that arrived three lines earlier, so the final three lines are never emitted.

Plain bash plus two simple tools (wc and cut):
head -n $(($(wc -l file | cut -c-8)-3)) file
Disclaimer: I don't have access to FreeBSD right now, but this does work in bash on OSX, where wc right-aligns the line count in an 8-character field (which is why cut -c-8 extracts it).
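A more portable variant avoids relying on that column layout by feeding the file on wc's stdin, so it prints just the number (a sketch; it assumes the file has more than 3 lines, since a negative count is an error for head):
head -n "$(( $(wc -l < file) - 3 ))" file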

This works with a pipe as well as an input file:
seq 1 10 | perl -e'@x=<>;print@x[0..$#x-3]'

Nobody seems to have used sed and tac yet, so here's one (on FreeBSD, where GNU tac may be missing, tail -r does the same job):
$ seq 10 | tac | sed '1,3d' | tac
1
2
3
4
5
6
7

How about:
seq 1 10 | perl -ne 'print if ( !eof )' | perl -ne 'print if ( !eof )' | perl -ne 'print if ( !eof )'
Each pass prints every line except the last one it reads, so three chained passes drop the final three lines.

This awk one-liner seems to do the job:
awk '{a[NR%4]=$0}NR>3{print a[(NR-3)%4]}' file
It stores each incoming line in a four-slot circular buffer (a[NR%4]) and, once past line 3, prints the line that arrived three lines earlier.

Or do it with bash alone if you have version 4.0 or newer:
seq 1 10 | (readarray -t LINES; printf '%s\n' "${LINES[@]:(-3)}")
Update: This one would remove the last three lines instead of showing only them.
seq 1 10 | (readarray -t L; C=${#L[@]}; printf '%s\n' "${L[@]:0:(C > 3 ? C - 3 : 0)}")
For convenience, it can be wrapped in a function:
function exclude_last_three {
local L C
readarray -t L; C=${#L[@]}
printf '%s\n' "${L[@]:0:(C > 3 ? C - 3 : 0)}"
}
seq 1 10 | exclude_last_three
seq 11 20 | exclude_last_three

Here's a late answer, because I was running into something like this yesterday.
This solution is:
pure bash
one-liner
reads the input stream only once
reads the input stream line-by-line, not all at once
Tested on Ubuntu, Red Hat, and OSX.
$ seq 1 10 | { n=3; i=1; while IFS= read -r ln; do [ $i -gt $n ] && cat <<< "${buf[$((i%n))]}"; buf[$((i%n))]="$ln"; ((i++)); done; }
1
2
3
4
5
6
7
$
It works by reading lines into a circular buffer implemented as an n-element array.
n is the number of lines to cut off the end of the file.
For every line i we read, we can echo the line i-n from the circular buffer, then store the line i in the circular buffer. Nothing is echoed until the first n lines are read. (i mod n) is the index into the array which implements the circular buffer.
Because the requirement is for a one-liner, I tried to make it fairly brief, unfortunately at the expense of readability.
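For comparison, here is the same circular-buffer idea written out readably as a function (a sketch; the name drop_last is made up for illustration):
drop_last() {
  local n=$1 i=1 ln
  local -a buf
  while IFS= read -r ln; do
    # once n lines are buffered, emit the line that arrived n lines ago
    (( i > n )) && printf '%s\n' "${buf[i % n]}"
    buf[i % n]=$ln
    (( i++ ))
  done
}
seq 1 10 | drop_last 3   # prints 1 through 7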

Another awk solution that uses only a minimal amount of buffering and starts printing lines quickly, without needing to read all the input first. It also works with pipes and large files.
awk 'BEGIN{X = 3; for(i = 0; i < X; ++i)getline a[i]}{i %= X; print a[i]; a[i++] = $0}'
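Tracing it against the usual test input (with gawk or BSD awk, which accept getline in a BEGIN block), the result should match the other answers:
$ seq 1 10 | awk 'BEGIN{X = 3; for(i = 0; i < X; ++i)getline a[i]}{i %= X; print a[i]; a[i++] = $0}'
1
2
3
4
5
6
7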

Related

How to grab every nth and (n+1)th line from a text file

I have a huge text file where I need to grab every nth and (n+1)th line into a separate file.
I came across the sed command, but that only helped me grab every nth line from a text file.
For example:
$ sed -n '0~4p' somefile
Any suggestions on grabbing the (n+1)th line at the same time?
This might work for you (GNU sed):
sed -n '4~4,+1p' file
This prints every 4th line plus the line immediately after it, starting at line 4.
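With the same seq 20 input used below, it should produce the pairs 4/5, 8/9, 12/13, 16/17 and then 20 on its own (both ~ and ,+N are GNU extensions):
$ seq 20 | sed -n '4~4,+1p'
4
5
8
9
12
13
16
17
20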
Here's one possible solution (guessing that this is the required output):
$ seq 20 | sed -n '0~4{N;p}'
4
5
8
9
12
13
16
17
As per the manual for the N command:
Add a newline to the pattern space, then append the next line of input
to the pattern space. If there is no more input then sed exits without
processing any more commands. When -z is used, a zero byte (the ascii
‘NUL’ character) is added between the lines (instead of a new line).
If 20 should also be part of the output for the above input, you can use:
seq 20 | sed -n '0~4{p;n;p}'
# alternative solution if ~ address range isn't supported
seq 20 | sed -n '1b; n;n;p;n;p'
The ~ address syntax in sed is non-standard, but might be made to do what you want. It seems more natural to use awk, though:
awk 'NR>1 && NR % n < 2' n=5
(eg):
$ yes | nl | sed 20q | awk 'NR>1 && NR % n < 2' n=5
5 y
6 y
10 y
11 y
15 y
16 y
20 y

Print second last line from variable in bash

VAR="1\n2\n3"
I'm trying to print out the second-to-last line. One-liner in bash!
I've gotten this far: printf -- "$VAR" | head -2
However, it prints out too much.
I can do this with a file no problem: tail -2 ~/file | head -1
You've almost done this task by yourself. Try:
VAR="1\n2\n3"; printf -- "$VAR"|tail -2|head -1
Here is one pure bash way of doing this (negative array indices need bash ≥ 4.3):
readarray -t arr < <(printf -- "$VAR") && echo "${arr[-2]}"
2
You may also use this awk as a single command:
VAR="1\n2\n3"
awk -F '\\\\n' '{print $(NF-1)}' <<< "$VAR"
2
Maybe more efficient: use a temporary variable and parameter expansions:
var=$'1\n2\n3' ; tmpvar=${var%$'\n'*} ; echo "${tmpvar##*$'\n'}"
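Spelled out, the two expansions work like this (same commands, only annotated):
var=$'1\n2\n3'
tmpvar=${var%$'\n'*}      # % strips the shortest suffix matching <newline><anything>: drops the last line
echo "${tmpvar##*$'\n'}"  # ## strips the longest prefix matching <anything><newline>: keeps the last remaining line, prints 2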
Use echo -e for backslash interpretation (so \n becomes a real newline), then print the line of interest by its number using NR:
$ echo -e "${VAR}" | awk 'NR==2'
2
With more lines, tail and head can be combined to print any particular line:
$ echo -e "$VAR" | tail -2 | head -1
2
Or do a fancy sed, keeping the previous line in the hold space (x) and deleting every line but the last:
$ echo -e "$VAR" | sed 'x;$!d'
2
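Annotated, that sed program reads (same command, comments are mine):
# x    swap the pattern space (current line) with the hold space (previous line)
# $!d  on every line except the last, delete the pattern space and restart the cycle
# On the last line, the swap leaves the second-to-last line in the pattern space,
# which sed auto-prints at the end of the cycle.
echo -e "$VAR" | sed 'x;$!d'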

Easy way of selecting certain lines from a file in a certain order

I have a text file, with many lines. I also have a selected number of lines I want to print out, in certain order. Let's say, for example, "5, 3, 10, 6". In this order.
Is there some easy and "canonical" way of doing this? (with "standard" Linux tools, and bash)
When I tried the answers from this question
Bash tool to get nth line from a file
it always prints the lines in order they are in the file.
A one-liner using sed:
for i in 5 3 10 6 ; do sed -n "${i}p" < ff; done
A rather efficient method if your file is not too large is to read it all in memory, in an array, one line per field using mapfile (this is a Bash ≥4 builtin):
mapfile -t array < file.txt
Then you can echo all the lines you want in any order, e.g.,
printf '%s\n' "${array[4]}" "${array[2]}" "${array[9]}" "${array[5]}"
to print the lines 5, 3, 10, 6. Now you'll feel it's a bit awkward that the array fields start with a 0 so that you have to offset your numbers. This can be easily cured with the -O option of mapfile:
mapfile -t -O 1 array < file.txt
this will start assigning to array at index 1, so that you can print your lines 5, 3, 10 and 6 as:
printf '%s\n' "${array[5]}" "${array[3]}" "${array[10]}" "${array[6]}"
Finally, you want to make a wrapper function for this:
printlines() {
local i
for i; do printf '%s\n' "${array[i]}"; done
}
so that you can just state:
printlines 5 3 10 6
And it's all pure Bash, no external tools!
As @glennjackmann suggests in the comments, you can make the helper function also take care of reading the file (passed as an argument):
printlinesof() {
# $1 is filename
# $2,... are the lines to print
local i array
mapfile -t -O 1 array < "$1" || return 1
shift
for i; do printf '%s\n' "${array[i]}"; done
}
Then you can use it as:
printlinesof file.txt 5 3 10 6
And if you also want to handle stdin:
printlinesof() {
# $1 is filename or - for stdin
# $2,... are the lines to print
local i array file=$1
[[ $file = - ]] && file=/dev/stdin
mapfile -t -O 1 array < "$file" || return 1
shift
for i; do printf '%s\n' "${array[i]}"; done
}
so that
printf '%s\n' {a..z} | printlinesof - 5 3 10 6
will also work.
Here is one way using awk:
awk -v s='5,3,10,6' 'BEGIN{split(s, a, ","); for (i=1; i<=length(a); i++) b[a[i]]=i}
b[NR]{data[NR]=$0} END{for (i=1; i<=length(a); i++) print data[a[i]]}' file
Testing:
cat file
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Line 11
Line 12
awk -v s='5,3,10,6' 'BEGIN{split(s, a, ","); for (i=1; i<=length(a); i++) b[a[i]]=i}
b[NR]{data[NR]=$0} END{for (i=1; i<=length(a); i++) print data[a[i]]}' file
Line 5
Line 3
Line 10
Line 6
First, generate a sed expression that prints each wanted line prefixed with a sequence number, which you can later use to sort the output back into the requested order:
#!/bin/bash
lines=(5 3 10 6)
sed=''
i=0
for line in "${lines[@]}" ; do
sed+="${line}s/^/$((i++)) /p;"
done
for i in {a..z} ; do echo $i ; done \
| sed -n "$sed" \
| sort -n \
| cut -d' ' -f2-
I'd probably use Perl, though:
for c in {a..z} ; do echo $c ; done \
| perl -e 'undef @lines{@ARGV};
while (<STDIN>) {
$lines{$.} = $_ if exists $lines{$.};
}
print @lines{@ARGV};
' 5 3 10 6
You can also use Perl instead of hacking with sed in the first solution:
for c in {a..z} ; do echo $c ; done \
| perl -e ' %lines = map { $ARGV[$_], ++$i } 0 .. $#ARGV;
while (<STDIN>) {
print "$lines{$.} $_" if exists $lines{$.};
}
' 5 3 10 6 | sort -n | cut -d' ' -f2-
Another take: let sed print each wanted line's number (=) and text (p), paste them into number/text pairs, and then emit the lines in the requested order:
l=(5 3 10 6)
printf "%s\n" {a..z} |
sed -n "$(printf "%d{=;p};" "${l[@]}")" |
paste - - | {
while IFS=$'\t' read -r nr text; do
line[nr]=$text
done
for n in "${l[@]}"; do
echo "${line[n]}"
done
}
You can use the nl trick: number the lines in the input and join the output with the list of desired line numbers. Additional sorts are needed to make the join possible, as it needs sorted input (so the nl trick is used once more to number the expected lines):
#! /bin/bash
LINES=(5 3 10 6)
lines=$( IFS=$'\n' ; echo "${LINES[*]}" | nl )
for c in {a..z} ; do
echo $c
done | nl \
| grep -E '^\s*('"$( IFS='|' ; echo "${LINES[*]}")"')\s' \
| join -12 -21 <(echo "$lines" | sort -k2n) - \
| sort -k2n \
| cut -d' ' -f3-

Write the contents of the variable to a file

How can I save the contents of the variable sum in this operation?
$ seq 1 5 | awk '{sum+=$1} end {print sum; echo "$sum" > test_file}'
It looks like you're confusing BASH syntax and Awk. Awk is a programming language, and it has very different syntax from BASH.
$ seq 1 5 | awk '{ sum += $1 } END { print sum }'
15
You want to capture that 15 into a file:
$ seq 1 5 | awk '{ sum += $1 } END { print sum }' > test_file
That is using the shell's redirection. The > appears outside of the Awk program where the shell has control, and redirects standard out into the file test_file.
You can also redirect inside of Awk, using Awk's own redirection. It happens to use the same > syntax as the shell:
$ seq 1 5 | awk '{ sum += $1 } END { print sum > "test_file" }'
Note that the file name has to be quoted, or Awk will assume that test_file is a variable, and you'll get some error about redirecting to a null file name.
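To make the difference concrete (out.txt is just an example name here):
# quoted: writes to a file literally named test_file
seq 5 | awk '{ sum += $1 } END { print sum > "test_file" }'
# unquoted: test_file is an awk variable, so you must assign it a file name yourself
seq 5 | awk -v test_file=out.txt '{ sum += $1 } END { print sum > test_file }'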
To write your output into a file, you have to redirect to "test_file" like this:
$ seq 5 | awk '{sum+=$1} END{print sum > "test_file"}'
$ cat test_file
15
Your version was not working because you were not quoting test_file, so for awk it was considered a variable. And as you have not defined it beforehand, awk couldn't redirect properly. David W's answer explains it pretty well.
Note also that seq 5 is equivalent to seq 1 5.
In case you want to save the result into a variable, you can use the var=$(command) syntax:
$ sum=$(seq 5 | awk '{sum+=$1} END{print sum}')
$ echo $sum
15
echo won't work in the awk command. Try this:
seq 1 5 | awk '{sum+=$1} END {print sum > "test_file"}'
You don't need awk for this. You can say:
$ seq 5 | paste -sd+ | bc > test_file
$ cat test_file
15
This question is tagged with bash, so here is a pure bash solution:
for ((i=1; i<=5; i++)); do ((sum+=i)); done; echo "$sum" > 'test_file'
Or this one:
for i in {1..5}; do ((sum+=i)); done; echo "$sum" > 'test_file'
http://sed.sourceforge.net/grabbag/scripts/add_decs.sed
#! /bin/sed -f
# This is an alternative approach to summing numbers,
# which works a digit at a time and hence has unlimited
# precision. This time it is done with lookup tables,
# and uses only 10 commands.
G
s/\n/-/
s/$/-/
s/$/;9aaaaaaaaa98aaaaaaaa87aaaaaaa76aaaaaa65aaaaa54aaaa43aaa32aa21a100/
:loop
/^--[^a]/!{
# Convert next digit from both terms into analog form
# and put the two groups next to each other
s/^\([0-9a]*\)\([0-9]\)-\([^-]*\)-\(.*;.*\2\(a*\)\2.*\)/\1-\3-\5\4/
s/^\([^-]*\)-\([0-9a]*\)\([0-9]\)-\(.*;.*\3\(a*\)\3.*\)/\1-\2-\5\4/
# Back to decimal, but keeping the carry in analog form
# \2 matches an `a' if there are at least ten a's, else nothing
#
# 1------------- 3- 4----------------------
# 2 5----
s/-\(aaaaaaaaa\(a\)\)\{0,1\}\(a*\)\([0-9b]*;.*\([0-9]\)\3\5\)/-\2\5\4/
b loop
}
s/^--\([^;]*\);.*/\1/
h

Take nth column in a text file

I have a text file:
1 Q0 1657 1 19.6117 Exp
1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
I want to take the 2nd and 4th word of every line like this:
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
I'm using this code:
nol=$(cat "/path/of/my/text" | wc -l)
x=1
while [ $x -le "$nol" ]
do
line=($(sed -n "$x"p /path/of/my/text))
echo ""${line[1]}" "${line[3]}"" >> out.txt
x=$(( $x + 1 ))
done
It works, but it is very complicated and takes a long time to process long text files.
Is there a simpler way to do this?
iirc:
cat filename.txt | awk '{ print $2, $4 }'
or, as mentioned in the comments:
awk '{ print $2, $4 }' filename.txt
You can use the cut command:
cut -d' ' -f3,5 < datafile.txt
prints
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
where
-d' ' means: use a space as the delimiter
-f3,5 means: take and print the 3rd and 5th columns
cut is also much faster on large files than a pure shell loop. If your file is delimited with runs of whitespace, you can squeeze them to single spaces first, like:
sed 's/[\t ][\t ]*/ /g' < datafile.txt | cut -d' ' -f3,5
where the (gnu) sed will replace any tab or space characters with a single space.
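If tr is handier, an equivalent variant (a sketch, using the same tab-or-space set as the sed expression) does the squeeze too:
tr -s '\t ' ' ' < datafile.txt | cut -d' ' -f3,5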
For a variant - here is a perl solution too:
perl -lanE 'say "$F[2] $F[4]"' < datafile.txt
For the sake of completeness:
while read -r _ _ one _ two _; do
echo "$one $two"
done < file.txt
Instead of _, an arbitrary variable (such as junk) can be used as well. The point is just to extract the columns.
Demo:
$ while read -r _ _ one _ two _; do echo "$one $two"; done < /tmp/file.txt
1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495
One more simple variant -
$ while read line
do
set $line # assigns words in line to positional parameters
echo "$3 $5"
done < file
If your file contains n lines, then your script has to read the file n times; so if you double the length of the file, you quadruple the amount of work your script does — and almost all of that work is simply thrown away, since all you want to do is loop over the lines in order.
Instead, the best way to loop over the lines of a file is to use a while loop, with the condition-command being the read builtin:
while IFS= read -r line ; do
# $line is a single line of the file, as a single string
: ... commands that use $line ...
done < input_file.txt
In your case, since you want to split the line into an array and the read builtin has special support for populating an array variable, you can write:
while read -r -a line ; do
echo ""${line[1]}" "${line[3]}"" >> out.txt
done < /path/of/my/text
or better yet:
while read -r -a line ; do
echo "${line[1]} ${line[3]}"
done < /path/of/my/text > out.txt
However, for what you're doing you can just use the cut utility:
cut -d' ' -f2,4 < /path/of/my/text > out.txt
(or awk, as Tom van der Woerdt suggests, or perl, or even sed).
If you are using structured data, this has the added benefit of not invoking an extra shell process to run tr and/or cut or something. ...
(Of course, you will want to guard against bad inputs with conditionals and sane alternatives.)
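For instance, a minimal sketch of such a guard, skipping lines that are too short:
while read -r -a line ; do
  # ignore malformed lines with fewer than 4 whitespace-separated fields
  (( ${#line[@]} >= 4 )) || continue
  echo "${line[1]} ${line[3]}"
done < /path/of/my/text > out.txt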
...
while read line ;
do
lineCols=( $line ) ;
echo "${lineCols[0]}"
echo "${lineCols[1]}"
done < $myFQFileToRead ;
...
