How would I delete the 6 lines starting from every instance of a word I see?
I think this sed command will do what you want:
sed '/bar/,+5d' input.txt
It removes any line containing the text bar, plus the five following lines.
Run it as above to see the output. When you know it is working correctly, use the switch --in-place=.backup to actually perform the change.
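For example, given a hypothetical input.txt:
keep me
bar
one
two
three
four
five
keep me too
the command prints only:
keep me
keep me too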
This simple Perl script will remove every line that contains the word "DELETE6", plus the 5 following lines (6 in total). It also saves the previous version of the file as FILENAME.bak. To run the script:
perl script.pl FILE_TO_CHANGE
#!/usr/bin/perl
use strict;
use warnings;
my $remove_count = 6;
my $word = "DELETE6";
local $^I = ".bak";
my $delete_count = 0;
while (<>) {
    $delete_count = $remove_count if /$word/;
    print if $delete_count <= 0;
    $delete_count--;
}
HTH
perl -i.bak -n -e '$n++; $n = -5 if /foo/; print if $n > 0' data.txt
(Here a match drives $n negative, so the matching line and the five after it leave $n at or below zero and are not printed.)
perl -ne 'print unless (/my_word/ and $n = 1) .. ++$n == 7'
Note that if my_word occurs in the skipped-over lines, the counter will not be reset.
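If you do want the counter to reset in that case, a minimal variant (a sketch, not from the original answer):
perl -ne '$skip = 6 if /my_word/; print unless $skip-- > 0'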
Hello, let's say I have a file such as:
$OUT some text
some text
some text
$OUT
$OUT
$OUT
how can I use sed to replace the 3 consecutive $OUT lines with "replace-thing"?
and get:
$OUT some text
some text
some text
replace-thing
With sed:
sed -n '1h; 1!H; ${g; s/\$OUT\n\$OUT\n\$OUT/replace-thing/g; p;}' file
GNU sed does not require the semicolon after p.
With commentary
sed -n '    # without printing every line:
    # next 2 lines read the entire file into memory
    1h      # line 1: store current line in the hold space
    1!H     # not line 1: append a newline and current line to hold space
    # now do the search-and-replace on the file contents
    ${      # on the last line:
        g                                       # replace pattern space with contents of hold space
        s/\$OUT\n\$OUT\n\$OUT/replace-thing/g   # do replacement
        p                                       # and print the revised contents
    }
' file
This is the main reason I only use sed for very simple things: once you start using the lesser-used commands, you need extensive commentary to understand the program.
Note that the commented version does not work with the BSD-derived sed on macOS -- the comments break it, but removing them is OK.
In plain bash:
pattern=$'$OUT\n$OUT\n$OUT' # using ANSI-C quotes
contents=$(< file)
echo "${contents//$pattern/replace-thing}"
And the Perl one-liner:
perl -0777 -pe 's/\$OUT(\n\$OUT){2}/replace-thing/g' file
Here -0777 slurps the whole file at once, so the regex can match across line boundaries.
For this particular task, I recommend using awk instead (I hope that's an option too).
Update: to replace every run of 3 $OUT lines, use the following (thanks to @thanasisp and @glenn jackman):
cat input.txt | awk '
BEGIN {
    i = 0
    p = "$OUT"          # pattern to match (the whole line)
    n = 3               # number of consecutive matches
    r = "replace-thing"
}
$0 == p {
    ++i
    if (i == n) {
        print(r)
        i = 0           # reset the counter
    }
    next
}
{
    while (i > 0) {     # print back a partial run of fewer than n matches
        print(p)
        --i
    }
    print($0)
}
END {
    while (i > 0) {     # same for a partial run at end of file
        print(p)
        --i
    }
}'
(Runs of fewer than n matching lines are printed back unchanged.)
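Run against the sample input from the question, the script prints:
$OUT some text
some text
some text
replace-thing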
If you just want to replace the 3rd $OUT usage, use:
cat input.txt | awk '
BEGIN {
    i = 0
    p = "\\$OUT"        # pattern to match
    n = 3               # Nth match
    r = "replace-thing"
}
$0 ~ p {
    ++i
    if (i == n) {
        print(r)
    }
}
i != n || $0 !~ p {
    print($0)
}'
This might work for you (GNU sed):
sed -E ':a;N;s/[^\n]*/&/3;Ta;/^(\$OUT\n?){3}$/d;P;D' file
Gather up 3 lines in the pattern space, and if each of those 3 lines is exactly $OUT, delete them. Otherwise, print and delete the first line and repeat.
In an attempt to debug Apache on a very busy server, we have used strace to log all our processes. Now I have thousands of individual straces in a folder, and I need to find the ones that have a value of 1.0 or greater. This is the command we used to generate the straces:
mkdir /strace; ps auxw | grep httpd | awk '{print"-p " $2}' | xargs strace -o /strace/strace.log -ff -s4096 -r
This has generated files with the name strace.log.29382 (where 29382 is the PID of the process).
Now, if I run this command:
for i in `ls /strace/*`; do echo $i; cat $i | cut -c6-12 | sort -rn | head -c 8; done
it will output the filename and the top runtime value, i.e.
/strace/strace.log.19125
0.13908
/strace/strace.log.19126
0.07093
/strace/strace.log.19127
0.09312
What I am looking for is only to output those with a value of 1.0 or greater.
Sample data: https://pastebin.com/Se89Jt1i
This data does not contain anything 1.0 or greater, but it is the first set of numbers I am trying to filter against.
What I do not want to have show up
0.169598 close(85) = 0
What I do want to find
1.202650 accept4(3, {sa_family=AF_INET, sin_port=htons(4557), sin_addr=inet_addr("xxx.xxx.xxx.xxx")}, [16], SOCK_CLOEXEC) = 85
My pipeline sorts the values so the highest value in the file always comes first.
As I am more used to Perl, here is a Perl solution; it should be possible to translate it to awk.
One-liner
perl -ane 'BEGIN{@ARGV=</strace/*>}$max=$F[0]if$F[0]>$max;if(eof){push@A,$ARGV if$max>1;$max=0};END{print"$_\n"for@A}'
There is no need to sort the files to get the maximum value; it is simply kept in a variable. The part which may be interesting to modify to get more information:
push @A, $ARGV
can be changed to
push @A, "$ARGV:$max"
to also get the value.
How it works:
-a flag: from perl -h: autosplit mode with -n or -p (splits $_ into @F), by default delimited by one or more spaces.
BEGIN{} and END{} blocks are executed at the beginning and the end; the part which is not in those blocks is executed for each line, as with awk.
</strace/*> is a glob which expands to the list of files
@ARGV is a special array which contains the command line arguments (here the list of files to process)
eof is a function which returns true when the current line is the last of the current file
$ARGV is the current file name
push appends elements to an array
The script version, with warnings enabled, which is useful for fixing bugs:
#!/usr/bin/perl
use strict;
use warnings;
sub BEGIN {
    use File::Glob ();
    @ARGV = glob('/strace/*');
}
my (@A, @F);
my $max = 0;
while (defined($_ = readline ARGV)) {
    @F = split(' ', $_, 0);
    $max = $F[0] if $F[0] > $max;
    if (eof) {
        push @A, "${ARGV}:$max" if $max > 1;
        $max = 0;
    }
}
print "$_\n" foreach (@A);
I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab-delimited text file.
I've tried the following code, but it doesn't work. What is the correct way of accomplishing this? Thank you
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
You can call a one-liner, as Borodin and zdim said. Which one is right for you is still not clear, because you don't say whether the 71st column means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2, or the field 6789? Borodin's answer assumes it is 6789, while zdim's assumes it is 2. Both showed a solution for their respective case, but these are stand-alone solutions: programs of their own, to be run from the command line.
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});
    # tab/field-based (split on tabs so empty fields keep their position):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet into one-liners, but you should not shell out to those from within a Perl script.
Since you need the exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints a line only when the three-character string starting at the 71st column does not match either 0/0 or ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, not capturing. It will work fine without ?:, which is there only for efficiency's sake.
perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile
The problem with your command is that you are attempting to capture the output of a command which produces no output - all the matches are redirected to a file, so that's where all the output is going.
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
Try it!
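For instance, a minimal sketch of that approach in Perl, reusing the filename variables from your script:
open my $in, '<', $Variantlinestsvfile or die "cannot open $Variantlinestsvfile: $!";
open my $out, '>', $MDLtsvfile or die "cannot open $MDLtsvfile: $!";
while (<$in>) {
    # skip lines whose 71st tab-separated field is ./. or 0/0
    print {$out} $_ unless m{^([^\t]*\t){70}(\./\.|0/0)\t};
}
close $in;
close $out;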
awk -F'\t' '{ if ($71 != "./." && $71 != "0/0") print }' old_file.txt > new_file.txt
The -F'\t' matters for a tab-delimited file: with the default whitespace splitting, empty fields would shift the column count.
I was given a text file with a whole bunch of data sorted in columns. The columns are separated by commas.
How could I divide one column by another and print the result? I am using Perl right now, so it has to be done in Perl. How could I do this?
This is what I have so far:
#!/usr/bin/perl
open (FILE, 'census2008.txt');
while (<FILE>) {
    chomp;
    ($sumlev, $stname,$ctyname,$popestimate2008,$births2008,$deaths2008) = split(",");
}
close (FILE);
exit;
There are several options:
Read the file in line by line, split the columns on ',' and divide the relevant columns (don't forget to handle the divide-by-zero error); see the sketch after this list.
Do the same thing as a one-liner:
$ perl -F/,/ -lane 'print( $F[1] == 0 ? "" : $F[3]/$F[1] )' file.txt
Utilize a ready-to-use CPAN module like Text::CSV
Of course, there are more unorthodox/crazy/unspeakable alternatives à la TMTOWTDI ™, so one could:
Parse out the relevant columns with a regex and divide the matches:
if (/^\d*,(\d+),\d*,(\d+)/) { say $2/$1 if $1 != 0; }
Do it with s///e:
$ perl -ple 's!^\d*,(\d+),\d*,(\d+).*$! $1 == 0 ? "" : $2/$1 !e' file.txt
Get the shell to do the dirty work via backticks:
sub print_divide { say `cat file.txt | some_command_line_command` }
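Here is a minimal sketch of the first option from the list above (the column positions are assumptions based on the question's split; adjust them for the real file):
#!/usr/bin/perl
use strict;
use warnings;

# line-by-line: split on commas and divide births by deaths
open my $fh, '<', 'census2008.txt' or die "cannot open census2008.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    my ($births, $deaths) = @fields[4, 5];    # assumed column positions
    if ($deaths == 0) {
        print "N/A\n";                        # handle divide-by-zero
    } else {
        printf "%.3f\n", $births / $deaths;
    }
}
close $fh;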
#!/usr/bin/env perl
# divides column 1 by column 2 of some ','-delimited file,
# read from standard input.
# usage:
# $ cat data.txt | 8458760.pl
while (<STDIN>) {
    my @values = split(/,/, $_);
    print $values[0] / $values[1] . "\n";    # assumes column 2 is never zero
}
If you have fixed-width columns of data you could use unpack along the lines of the following (in the template, each An takes n characters and each x skips one, here the comma):
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    my ($sumlev,$stname,$ctyname,$popest,$births,$deaths)
        = unpack("A2xA10xA15xA7xA5xA5");
    printf "%-15s %4.2f\n", $ctyname, $births/$deaths;
}
__DATA__
10,Main      ,My City        ,  10000,  200,  150
12,Poplar    ,Somewhere      ,   3000,   90,  100
13,Maple     ,Your Place     ,   9123,  100,   90
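For the sample data this prints:
My City         1.33
Somewhere       0.90
Your Place      1.11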
I want to strip a chunk of lines from a big text file. I know the start and end line number. What is the most elegant way to get the content (the lines between A and B) out to some file?
I know the head and tail commands - is there even a quicker (one step) way?
The file is over 5 GB and it contains over 81 million lines.
UPDATED: The results
time sed -n 79224100,79898190p BIGFILE.log > out4.log
real 1m9.988s
time tail -n +79224100 BIGFILE.log | head -n +`expr 79898190 - 79224100` > out1.log
real 1m11.623s
time perl fileslice.pl BIGFILE.log 79224100 79898190 > out2.log
real 1m13.302s
time python fileslice.py 79224100 79898190 < BIGFILE.log > out3.log
real 1m13.277s
The winner is sed. The fastest, the shortest. I think Chuck Norris would use it.
sed -n '<A>,<B>p' input.txt
This works for me in GNU sed:
sed -n 'I,$p; Jq'
where I is the first and J the last line to print. The q quits when the indicated line has been processed, so sed does not read through the rest of a big file.
For example, these large numbers work:
$ yes | sed -n '200000000,${=;p};200000005q'
200000000
y
200000001
y
200000002
y
200000003
y
200000004
y
200000005
y
I guess big files need a bigger solution...
fileslice.py:
import sys
import itertools
for line in itertools.islice(sys.stdin, int(sys.argv[1]) - 1, int(sys.argv[2])):
sys.stdout.write(line)
invocation:
python fileslice.py 79224100 79898190 < input.txt > output.txt
Here's a perl solution :)
fileslice.pl:
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
my $first = $ARGV[1];
my $last = $ARGV[2];
my $fd = IO::File->new($ARGV[0], 'r') or die "Unable to open file $ARGV[0]: $!\n";
my $i = 0;
while (<$fd>) {
    $i++;
    next if ($i < $first);
    last if ($i > $last);
    print $_;
}
Run it as:
perl fileslice.pl file 79224100 79898190