Merge sed commands into a script - linux

I need to write these two sed commands in a single script.
sed -n '10,20p' file.txt | sed '1!G;h;$!d'
It selects lines in the range from 10 to 20 and prints them in reverse order.
Could anybody please help me with this?

If sed is still your preferred tool, then:
$ seq 20 | sed -n '10,15{10!G;h;15p}'
15
14
13
12
11
10
However, I don't like that the line numbers (10 & 15) need to be specified twice. sed -n '10,20p' file.txt | tac seems better...

This seems to work:
#!/bin/sed -Ef
# On every line except 10, append the hold space (the lines collected so far, in reverse order).
10!G
# Save the accumulated pattern space back into the hold space.
h
# Delete everything except line 20, where the reversed range 10-20 gets printed.
20!d
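Saved as, say, reverse_range.sed (the name is just for illustration) and made executable, it runs like any other script; given seq 25 > file.txt it prints lines 20 down to 10:
$ chmod +x reverse_range.sed
$ ./reverse_range.sed file.txt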

Related

How To Delete First X Lines Based On Minimum Lines In File

I have a file with 10,000 lines. Using the following command, I am deleting all lines after line 10,000.
sed -i '10000,$ d' file.txt
However, now I would like to delete the first X lines so that the file has no more than 10,000 lines.
I think it would be something like this:
sed -i '1,$x d' file.txt
Where $x would be the number of lines over 10,000. I'm a little stuck on how to write the if, then part of it. Or, I was thinking I could use the original command and just cat the file in reverse?
For example, if we wanted just 3 lines from the bottom (seems simpler after a few helpful answers):
Input:
Example Line 1
Example Line 2
Example Line 3
Example Line 4
Example Line 5
Expected Output:
Example Line 3
Example Line 4
Example Line 5
Of course, if you know a more efficient way to write the command, I would be open to that too. Your positive input is highly appreciated.
tail can do exactly what you want.
tail -n 10000 file.txt
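tail writes to stdout, so to trim the file in place you'd go through a temporary file (the temp-file name is just for illustration):
tail -n 10000 file.txt > file.txt.tmp && mv file.txt.tmp file.txt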
For simplicity, I would reverse the file, keep the first 10000 lines, then re-reverse the file.
It makes saving the file in place a touch more complicated:
source=file.txt
temp=$(mktemp)
tac "$source" | sed '10000 q' | tac > "$temp" && mv "$temp" "$source"
Without reversing the file, you'd count the number of lines and do some arithmetic:
sed -i "1,$(( $(wc -l < file.txt) - 10000 )) d" file.txt
$ awk -v n=3 '{a[NR%n]=$0} END{for (i=NR+1;i<=(NR+n);i++) print a[i%n]}' file
Example Line 3
Example Line 4
Example Line 5
This keeps a rolling buffer of the last n lines, indexed by NR%n, and prints them in order at the end. Add -i inplace if you have GNU awk and want to do "inplace" editing.
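If you don't have GNU awk, a portable way to get the same in-place effect is a temporary file (a sketch; the temp-file name is just for illustration):
awk -v n=10000 '{a[NR%n]=$0} END{for (i=NR+1;i<=NR+n;i++) print a[i%n]}' file.txt > file.txt.tmp && mv file.txt.tmp file.txt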
To keep the first 10000 lines :
head -n 10000 file.txt
To keep the last 10000 lines :
tail -n 10000 file.txt
Testing with your example file:
tail -n 3 file.txt
Example Line 3
Example Line 4
Example Line 5
tac file.txt | sed "$x q" | tac | sponge file.txt
The sponge command (from the moreutils package) is useful here in avoiding an additional temporary file.
tail -10000 <<<"$(cat file.txt)" > file.txt
Okay, not "just" tail, but this way it's capable of in-place truncation: the $(cat file.txt) captures the whole file in memory before the redirection truncates it.

Why does Linux grep not give the correct count for line breaks?

On Ubuntu 10.04.4 LTS, I did the following small test and got a surprising result:
First, I created a file with 5 lines and name it as a.txt:
echo -e "1\n2\n3\n4\n5" > a.txt
$ cat a.txt
1
2
3
4
5
Then I ran wc to count the number of lines:
$ wc -l a.txt
5 a.txt
However, when I ran grep to count the number of lines that have line breaks, I got an answer that I did not understand:
$ grep -c -P '\n' a.txt
3
My question is: how does grep get this number? Shouldn't it be 4?
Please Read The Fine Manual!
seq 1 5 | wc -l
5
seq 1 5 | grep -ac $'\n'
5
I don't understand where the problem is!?
seq 1 5 | hd
00000000 31 0a 32 0a 33 0a 34 0a 35 0a |1.2.3.4.5.|
Explanation:
The -a switch tells grep to process the input as text even when it looks like binary, i.e. not to care about the formatting.
The $'\n' syntax is resolved by bash itself, before grep runs. This makes it possible to pass control characters as arguments to any command under bash.
grep cannot see the newline character: it searches for a pattern within each line.
Consider using grep -c -P '$' a.txt to match the end of each line.
The newline character is not part of the lines. grep uses the newline character as the record separator and removes it from the lines, so that patterns with $ work as expected. For example, to search for lines ending with foo you can use the pattern foo$ instead of foo\n$, which would be very inconvenient.
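For example:
$ printf 'foo\nbar\n' | grep -c 'foo$'
1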
So grep -c -P '\n' a.txt should give you 0. If you're getting 3, that sounds extremely strange, but perhaps it can be explained by the "highly experimental" remark in man grep:
-P, --perl-regexp
Interpret PATTERN as a Perl regular expression (PCRE, see
below). This is highly experimental and grep -P may warn of
unimplemented features.
I'm on Debian/Wheezy, which is much more recent than Ubuntu 10.04. If -P is "highly experimental" today, it's not too difficult to imagine it was buggy in older systems. This is just a guess, though.
To count the number of newlines, use wc -l, not a grep -c hack.
Btw, interestingly:
$ printf hello >> a.txt
$ wc -l a.txt
5 a.txt
$ grep -c '' a.txt
6
That is, printf doesn't print a newline, so after we append "hello" to a.txt there won't be a newline at the end of the file. So wc -l counts newline characters, not exactly "lines", while grep '' (the empty pattern) matches every line, including the final one without a trailing newline, which is why it reports 6.
I think you want to use
$ grep -c -P "." a.txt
5
$ echo "6" >> a.txt
$ grep -c -P "." a.txt
6
$ cat a.txt
1
2
3
4
5
6

Linux - About sorting shell output

I have output from a customised log file like this:
8 24 yum
8 24 yum
8 24 make
8 24 make
8 24 cd
8 24 cd
8 25 make
8 25 make
8 25 make
8 26 yum
8 26 yum
8 26 make
8 27 yum
8 27 install
8 28 ./linux
8 28 yum
I'd like to know if there's any way to count the number of occurrences of specific values in the third field. For example, I may want to count only the occurrences of cd, yum and install.
You can use awk to select on the third-field values and wc -l to count the matching lines.
awk '$3=="cd"||$3=="yum"||$3=="install"||$3=="cat" {print $0}' file | wc -l
You can also use egrep, but this will look for these words not only on the third field, but everywhere else in the line.
egrep "(cd|yum|install|cat)" file | wc -l
If you want to count a specific word in the third field, you can do the above with a single comparison:
awk '$3=="cd" {print $0}' file | wc -l
A classic shell script to do the job is:
awk '{print $3}' "$file" | sort | uniq -c | sort -n
Extract values from column 3 with awk, sort the identical names together, count the repeats, sort the output in increasing order of count. The sort | uniq -c | sort -n part is a common meme.
If you're using GNU awk, you can do it all in the awk script; it might be more efficient, but for really humongous files it can run out of memory where the pipeline doesn't (sort spills to disk when necessary; writing code to spill to disk in awk is not sensible).
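For example, with GNU awk both the counting and the ordering can happen inside the script (a sketch; PROCINFO["sorted_in"] is gawk-only):
gawk '{cnt[$3]++}
END {
    PROCINFO["sorted_in"] = "@val_num_asc"  # iterate in ascending order of count
    for (cmd in cnt) print cnt[cmd], cmd
}' file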
Use cut, sort and uniq:
$ cut -d" " -f3 inputfile | sort | uniq -c
2 cd
1 install
1 ./linux
6 make
6 yum
For your input, this:
awk '{++a[$3]}END{for(i in a)print i "\t" a[i];}' file
Would print:
cd 2
install 1
./linux 1
make 6
yum 6
Using awk to count the occurrences of field three and sort to order the output:
$ awk '{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
1 ./linux
2 cd
6 make
6 yum
So filter by command:
$ awk '/cd|yum|install/{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n
1 install
2 cd
6 yum
To stop partial matches (in the same way that the pattern grep would also match inside egrep), use the word boundaries \< and \>, so the filter would be /\<cd\>|\<yum\>|\<install\>/.
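Putting that together (note that \< and \> are GNU regex extensions, so this assumes gawk):
$ awk '/\<cd\>|\<yum\>|\<install\>/{a[$3]++}END{for(k in a)print a[k],k}' file | sort -n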
You can use grep to filter by multiple terms at the same time:
cut -f3 -d' ' file | grep -x -e yum -e make -e install | sort | uniq -c
Explanation:
The -x flag matches only lines that match the pattern exactly, as if it were anchored with ^pattern$.
The cut extracts the 3rd column only.
We sort and count with uniq -c at the end, after all the junk has been removed from the input, for efficiency.
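With the sample input from the question, this prints:
1 install
6 make
6 yum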
I guess you want to count the values of yum, install and cd separately. If so, you should go for 3 separate awk statements:
awk '$3=="cd" {print $0}' file | wc -l
awk '$3=="yum" {print $0}' file | wc -l
awk '$3=="install" {print $0}' file | wc -l

select multiple lines using the linux command sed

I have an example [file] from which I want to grab lines 3-6 and lines 11-13, then sort them with a one-line command and save the result as 3_6-11_13. These are the commands I have used thus far, but I haven't gotten the desired output:
sed -n '/3/,/6/p'/11/,/13/p file_1 > file_2 | sort -k 2 > file_2
sed -n 3,6,11,13p file_1 > file_2 | sort -k 2 file_2
Is there a better way to shorten this? I have thought about using awk, but I have stayed with sed so far.
With sed you're allowed to specify addresses by number like so:
sed -n '3,6p'
The -n is to keep sed from automatically printing output.
Then you can run multiple commands, at least in GNU sed, by separating those commands with semicolons:
sed -n '3,6p; 11,13p' file_1 | sort -k2 > 3_6-11_13
sed can combine multiple commands using the -e option:
$ sed -e 'comm' -e 'comm' file.txt
or you can separate commands using semicolons:
$ sed 'comm;comm;comm' file.txt
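For the ranges in the question, that could look like this (using the question's file names):
$ sed -n -e '3,6p' -e '11,13p' file_1 | sort -k 2 > 3_6-11_13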

Print a file, skipping the first X lines, in Bash [duplicate]

I have a very long file which I want to print, skipping the first 1,000,000 lines, for example.
I looked into the cat man page, but I did not see any option to do this. I am looking for a command to do this or a simple Bash program.
You'll need tail. Some examples:
$ tail great-big-file.log
< Last 10 lines of great-big-file.log >
If you really need to SKIP a particular number of "first" lines, use
$ tail -n +<N+1> <filename>
< filename, excluding first N lines. >
That is, if you want to skip N lines, you start printing line N+1. Example:
$ tail -n +11 /tmp/myfile
< /tmp/myfile, starting at line 11, or skipping the first 10 lines. >
If you want to just see the last so many lines, omit the "+":
$ tail -n <N> <filename>
< last N lines of file. >
Easiest way I found to remove the first ten lines of a file:
$ sed 1,10d file.txt
In the general case where X is the number of initial lines to delete, credit to commenters and editors for this:
$ sed 1,Xd file.txt
If you have GNU tail available on your system, you can do the following:
tail -n +1000001 huge-file.log
It's the + character that does what you want. To quote from the man page:
If the first character of K (the number of bytes or lines) is a
`+', print beginning with the Kth item from the start of each file.
Thus, as noted in the comment, putting +1000001 starts printing with the first item after the first 1,000,000 lines.
If you want to skip the first two lines:
tail -n +3 <filename>
If you want to skip the first x lines:
tail -n +$((x+1)) <filename>
A less verbose version with AWK:
awk 'NR > 1e6' myfile.txt
But I would recommend spelling the count as a plain integer (NR > 1000000).
Use the sed delete command with a range address. For example:
sed 1,100d file.txt # Print file.txt omitting lines 1-100.
Alternatively, if you want to only print a known range, use the print command with the -n flag:
sed -n 201,300p file.txt # Print lines 201-300 from file.txt
This solution should work reliably on all Unix systems, regardless of the presence of GNU utilities.
Use:
sed -n '1d;p'
This command will delete the first line and print the rest.
If you want to see the first 10 lines you can use sed as below:
sed -n '1,10 p' myFile.txt
Or if you want to see lines from 20 to 30 you can use:
sed -n '20,30 p' myFile.txt
Just to propose a sed alternative. :) To skip the first one million lines, try | sed '1,1000000d'.
Example:
$ perl -wle 'print for (1..1_000_005)'|sed '1,1000000d'
1000001
1000002
1000003
1000004
1000005
You can do this using the head and tail commands:
head -n <num> | tail -n <lines to print>
where <num> is 1,000,000 plus the number of lines you want to print.
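For example, to print 10 lines after skipping the first 1,000,000 (file name assumed):
head -n 1000010 myfile.txt | tail -n 10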
This shell script works fine for me:
#!/bin/bash
awk -v initial_line="$1" -v end_line="$2" '{
    if (NR >= initial_line && NR <= end_line)
        print $0
}' "$3"
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it will extract from the second to the fourth line of the file):
edu@debian5:~$ ./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are as expected :-)
cat <file> | awk '{if (NR > 6) print $0}'
I needed to do the same and found this thread.
I tried "tail -n +, but it just printed everything.
The more +lines worked nicely on the prompt, but it turned out it behaved totally different when run in headless mode (cronjob).
I finally wrote this myself:
skip=5
FILE="/tmp/filetoprint"
tail -n "$(( $(wc -l < "$FILE") - skip ))" "$FILE"
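For what it's worth, tail -n +$((skip+1)) "$FILE" (the + form shown in the answers above) gives the same result without counting the lines first.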
