Selecting surrounding lines around the missing sequence numbers [closed] - linux

I have a file whose contents are as follows:
TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT
If a number in the 4th field of the sequence is missing, I want to print the previous file name and the next file name as output:
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT

This awk variant seems to produce the required output:
awk -F_ '$4>c+1{print p"\n"$0}{p=$0;c=$4}'
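Here p holds the previous line and c its fourth field, so the first rule fires whenever the field jumps by more than one. Against the sample data (assuming the list is saved in file.txt) it would be invoked as:
awk -F_ '$4>c+1{print p"\n"$0}{p=$0;c=$4}' file.txt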

A simple Perl way:
perl -F_ -lane 'print "$o\n$_" if $F[3]-$n>1;$o=$_;$n=$F[3]' < file

In Perl you could do something like this:
use strict;
use warnings;
my $prev_line;
my $prev_val;
while (<>) {
    # get the 4th value
    my $val = (split '_')[3];
    # skip if invalid line
    next if !defined $val;
    # print if missed sequence
    if (defined($prev_val) && $val > $prev_val + 1) {
        print $prev_line . $_;
    }
    # save for next iteration
    $prev_line = $_;
    $prev_val = $val;
}
Save that in foo.pl and run it with something like:
cat file.txt | perl foo.pl
I'm sure it can be shortened quite a lot. You could use something like this if all lines are valid:
perl -n -e '$v=(/[^_]+/g)[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt
or
perl -naF_ -e '$v=$F[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt

As far as I understand what you need, here is a Perl script that does the job:
#!/usr/local/bin/perl
use strict;
use warnings;
my $prev = '';
my %seq1;
while (<DATA>) {
    chomp;
    my ($seq1, $seq2) = $_ =~ /^.*?(\d+)_(\d+)_\d+\.TXT$/;
    $seq1{$seq1} = $seq2 - 1 unless exists $seq1{$seq1};
    if ($seq1{$seq1} + 1 != $seq2) {
        print $prev, "\n", $_, "\n";
    }
    $prev = $_;
    $seq1{$seq1} = $seq2;
}
__DATA__
TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT
output:
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT

I used glob to get the files (it's possible that it's as simple as <TEST_*.TXT>).
use strict;
use warnings;
my %last = ( name => '', group => '', seq => 0 );
foreach my $file ( sort glob('TEST_[0-9][0-9][0-9][0-9]_sample[0-9][0-9]_[0-9]_*.TXT') ) {
    my ( $group, $seq ) = $file =~ m/(\d{4,}_sample\d+)_(\d+)/;
    if ( $group eq $last{group} && $seq - $last{seq} > 1 ) {
        print join( "\n", $last{name}, $file, '' );
    }
    @last{ qw<name group seq> } = ( $file, $group, $seq );
}

Related

Need to open a file and replace multiple strings

I have a really big XML file. It has certain incrementing numbers inside, which I would like to replace with a different incrementing number. I've looked, and here is what someone suggested here before. Unfortunately I can't get it to work :(
In the code below all instances of 40960 should be replaced with 41984, all instances of 40961 with 41985, etc. Nothing happens. What am I doing wrong?
use strict;
use warnings;
my $old = 40960;
my $new = 41984;
my $string;
my $file = 'file.txt';
rename($file, $file.'.bak');
open(IN, '<'.$file.'.bak') or die $!;
open(OUT, '>'.$file) or die $!;
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
    while(<IN>) {
        $_ =~ s/$old/$new/g;
        print OUT $_;
    }
}
close(IN);
close(OUT);
Other answers give you better solutions to your problem. Mine concentrates on explaining why your code didn't work.
The core of your code is here:
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
    while(<IN>) {
        $_ =~ s/$old/$new/g;
        print OUT $_;
    }
}
You increment the values of $old and $new outside of your loops. And you never change those values again. So you're only making the same substitution (changing 40961 to 41985) 42 times. You never try to change any other numbers.
Also, look at the while loop that reads from IN. On your first iteration (when $i is 0) you read all of the data from IN and the file pointer is left at the end of the file. So when you go into the while loop again on your second iteration (and all subsequent iterations) you read no data at all from the file. You need to reset the file pointer to the start of your file at the end of each iteration.
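For instance, a minimal sketch of that reset, assuming the same IN filehandle:
# Rewind IN to the start of the file before the next pass;
# seek with WHENCE 0 moves to an absolute byte offset.
seek(IN, 0, 0) or die "Cannot rewind: $!";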
Oh, and the basic logic is wrong. If you think about it, you'll end up writing each line to the output file 42 times. You need to do all possible substitutions before writing the line. So your inner loop needs to be the outer loop (and vice versa).
Putting those suggestions together, you need something like this:
my $change = 1024;
while (<IN>) {
    # Reset the base on every line, or later lines would be
    # checked against the wrong range of numbers
    my $old = 40960;
    # Easier way to write your loop
    for my $i ( 1 .. 42 ) {
        my $new = $old + $change;
        # Use \b to mark word boundaries
        s/\b$old\b/$new/g;
        $old++;
    }
    # Print each output line only once
    print OUT $_;
}
Here's an example that works line by line, so the size of the file is immaterial. The example assumes you want to replace things like "45678", but not "fred45678". The example also assumes that there is a range of numbers, and you want them replaced with a new range offset by a constant.
#!/usr/bin/perl
use strict;
use warnings;
use constant MIN => 40000;
use constant MAX => 90000;
use constant DIFF => +1024;
sub repl { $_[0] >= MIN && $_[0] <= MAX ? $_[0] + DIFF : $_[0] }
while (<>) {
    s/\b(\d+)\b/repl($1)/eg;
    print;
}
exit(0);
Invoked with the file you want to transform as an argument, it produces altered output on stdout. With the following input ...
foo bar 123
40000 50000 60000 99999
fred60000
fred 60000 fred
... it produces this output.
foo bar 123
41024 51024 61024 99999
fred60000
fred 61024 fred
There are a couple of classic Perlisms here, but the example shouldn't be hard to follow if you RTFM appropriately.
Here is an alternative way which reads the input file into a string and does all the substitutions at once:
use strict;
use warnings;
{
    my $old = 40960;
    my $new = 41984;
    my ($regexp) = map { qr/$_/ } join '|', map { $old + $_ } 0..41;
    my $file = 'file.txt';
    rename($file, $file.'.bak');
    open(IN, '<'.$file.'.bak') or die $!;
    my $str = do { local $/; <IN> };
    close IN;
    $str =~ s/($regexp)/do_subst($1, $old, $new)/ge;
    open(OUT, '>'.$file) or die $!;
    print OUT $str;
    close OUT;
}
sub do_subst {
    my ( $old, $old_base, $new_base ) = @_;
    my $i = $old - $old_base;
    my $new = $new_base + $i;
    return $new;
}
Note: Can probably be made more efficient by using Regexp::Assemble
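For instance, a sketch of that idea (untested; Regexp::Assemble builds a trie-optimized alternation from the same 42 numbers):
use Regexp::Assemble;
my $ra = Regexp::Assemble->new;
$ra->add($old + $_) for 0 .. 41;   # the same 42 numbers as the join '|' above
my $regexp = $ra->re;              # compiled pattern, a drop-in for the s///ge above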

Perl - count words of a file

I want to count words in a file, with the result being the number of occurrences of each word.
My script:
#!/usr/bin/perl
#use strict;
#use warnings;
use POSIX qw(strftime);
$datestring = strftime "%Y-%m-%d", localtime;
print $datestring;
my @files = <'/mnt/SESSIONS$datestring*'>;
my $latest;
foreach my $file (@files) {
    $latest = $file if $file gt $latest;
}
@temp_arr = split('/', $latest);
open(FILE, "<$latest");
print "file loaded \n";
my @lines = <FILE>;
close(FILE);
#my @temp_line;
foreach my $line (@lines) {
    @line = split(' ', $line);
    #push(@temp_arr);
    $line =~ s/\bNT AUTHORITY\\SYSTEM\b/NT__AUTHORITY\\SYSTEM/ig;
    print $line;
    #print "$line[0] $line[1] $line[2] $line[3] $line[4] $line[5] \n";
}
My log file:
       SID USER                      TERMINAL        PROGRAM
---------- ------------------------- --------------- -------------------------
         1 SYSTEM                    titi            toto (fifi)
         2 SYSTEM                    titi            toto (fofo)
         4 SYSTEM                    titi            toto (bobo)
         5 NT_AUTHORITY\SYSTEM       titi            roro
         6 NT_AUTHORITY\SYSTEM       titi            gaga
         7 SYSTEM                    titi            gogo (fifi)
5 rows selected.
I want this result:
User = 3 SYSTEM with program toto
, User = 1 SYSTEM with program gogo
Thanks for any information
I see yours as a two-step problem -- you want to parse the log files, but then you also want to store elements of that data into a data structure that you can use to count.
This is a guess, based on your sample data, but if your data is fixed-width, one way you can parse it into fields is to use unpack. I think substr might be more efficient, so consider how many files you need to parse and how long each is.
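If you went the substr route instead, a minimal sketch of the same field extraction (the offsets mirror the unpack template in the code below; unlike unpack's A format, substr keeps the trailing padding, so it is trimmed by hand):
my $user    = substr($_, 11, 25);
my $program = substr($_, 53, 25);
s/\s+$// for ($user, $program);   # strip the space padding that A-format would drop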
I would store the data into a hash and then dereference it after the files have all been read.
my %counts;
open my $IN, '<', 'logfile.txt' or die;
while (<$IN>) {
    next if length($_) < 51;
    my ($sid, $user, $terminal, $program) = unpack 'A9 @11 A25 @37 A15 @53 A25', $_;
    next if $sid eq '---------'; # you need some way to filter out bogus or header rows
    $program =~ s/\(.+//;        # based on your example, turn toto (fifi) into toto
    $counts{$user}{$program}++;
}
close $IN;
while (my ($user, $ref) = each %counts) {
    while (my ($program, $count) = each %$ref) {
        print "User = $count $user with program $program\n";
    }
}
Output from program:
User = 3 SYSTEM with program toto
User = 1 SYSTEM with program gogo
User = 1 NT_AUTHORITY\SYSTEM with program roro
User = 1 NT_AUTHORITY\SYSTEM with program gaga
This code detects the size of the input fields automatically (your snippet looks like output from an Oracle query) and prints the results:
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
open my $file, '<', 'input.log' or die "$!";
my $data = {};
my @cols_size = ();
while (<$file>) {
    my $line = $_;
    if ($line =~ /--/) {
        foreach (split(/\s/, $line)) {
            push(@cols_size, length($_) + 1);
        }
        next;
    }
    next unless (@cols_size);
    next if ($line =~ /rows selected/);
    my ($sid, $user, $terminal, $program) = unpack('A' . join('A', @cols_size), $line);
    next unless ($sid);
    $program =~ s/\(\w+\)//;
    $data->{$user}->{$program}++;
}
close $file;
foreach my $user (keys %{$data}) {
    foreach my $program (keys %{$data->{$user}}) {
        say sprintf("User = %s %s with program %s", $data->{$user}->{$program}, $user, $program);
    }
}
I don't understand $counts{$user}{$program}++;
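A brief aside, since that line seems to be the sticking point: Perl autovivifies nested structures, so the first time a given $user/$program pair is seen, the inner hash springs into existence and ++ turns undef into 1. A minimal sketch:
my %counts;
$counts{'SYSTEM'}{'toto'}++;   # %counts is now ( SYSTEM => { toto => 1 } )
$counts{'SYSTEM'}{'toto'}++;   # second hit:    ( SYSTEM => { toto => 2 } )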

Parsing the output of SGE's (qstat -j "*") using AWK or Perl

I have the output from SGE qstat command that looks like this:
http://dpaste.com/1177012/plain/
It is obtained with the following command:
$ qstat -j "*"
What I want to do is to parse the output of qstat -j "*"
into table format:
#job_number submission_time owner usage
526715 Sat Apr 13 18:43:19 2013 yminakuc cpu=33:04:05:52, mem=2471753193.24440 GBs, io=619.41401, vmem=864.175G, maxvmem=920.232G
....
I was thinking of creating a script that can be used as a pipe:
$ qstat -j "*" | ./mycodeparse.pl
What's the way to do it in AWK or Perl?
Or is there any available unix tool for that?
I'm stuck with the following construct (logic):
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %hash;
my $curr;
while ( my $line = <> ) {
    chomp $line;
    if ( $line == /^=/ ) {
        $hash{$line} = $curr = [];
    }
    elsif ( $line =~ /^job_number/ ||
            $line =~ /^owner/ ||
            $line =~ /^usage/ ||
            $line =~ /^submission_time/ ) {
        push @$curr, $line;
    }
}
print Dumper \%hash;
# The hash prints everything instead of just the selected lines.
That format is fairly close to YAML, so one option would be to close the gap:
perl -lne 'BEGIN { print "---" } if (/^=/) { $new = 1; next } if ($new) { s/^/- /; $new = 0 } else { s/^/ / } print' paste > paste.yml
And then load it normally:
#! /usr/bin/env perl
use common::sense;
use YAML 'LoadFile';
die "usage: $0 <file.yml>\n" unless @ARGV == 1;
my %jobs = map { $_->{job_number}, $_ } @{ LoadFile(shift) };
say "#job_number submission_time owner usage";
for (keys %jobs) {
    say join("\t", $_, @{ $jobs{$_} }{"submission_time", "owner", "usage 1"});
}
As the ugliness of that "usage 1" suggests, you might want to massage the keys as well. The keys also vary as to names_with_underlines vs. 'names with spaces'. Of course, you can key the %jobs hash on whatever value you want, or else skip building it and just process the arrayref:
for (@{ LoadFile(shift) }) {
    say join("\t", @{$_}{"job_number", "submission_time", "owner", "usage 1"});
}
Output:
#job_number submission_time owner usage
5276175 Sat Apr 13 18:43:19 2013 yminakuc cpu=33:04:05:52, mem=2471753193.24440 GBs, io=619.41401, vmem=864.175G, maxvmem=920.232G
606837 Fri Dec 14 19:20:55 2012 ataiba
6252671 Wed May 8 23:08:22 2013 harukao cpu=9:13:06:40, mem=13115128.89679 GBs, io=19.38717, vmem=16.202G, maxvmem=19.131G
Regarding your edited-in attempt: the basic idea is very sound, but you make a few mistakes (trivial mistake: == instead of =~ when matching /^=/; more significant mistake: keying your hash off the line of ='s, which is the same for every record, with the result that you only end up dumping the last record) and you miss a few tricks: e.g. you store unprocessed record lines instead of breaking them up into keys and values.
This alteration demonstrates: pushing hashes onto an array, and splitting the lines for only some of the keys:
if ( $line =~ /^=/ ) {
    push @array, $curr = {};
}
elsif ( $line =~ / ^ (job_number
                    | owner
                    | usage
                    | submission_time)
                    .*?: \s* (.+)/x) {
    $curr->{$1} = $2;
}
Excerpt of the dumped output:
{
  'usage' => 'cpu=33:04:05:52, mem=2471753193.24440 GBs, io=619.41401, vmem=864.175G, maxvmem=920.232G',
  'owner' => 'yminakuc',
  'job_number' => '5276175',
  'submission_time' => 'Sat Apr 13 18:43:19 2013'
},
With a small change you can store $curr into %hash with whatever key after you exhaust your input or run into another ==== line.
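For instance, a minimal sketch of that change, assuming the same $curr hashrefs:
if ( $line =~ /^=/ ) {
    # file the completed record under its job number before starting a new one
    $hash{ $curr->{job_number} } = $curr if $curr && $curr->{job_number};
    $curr = {};
}
# ... and once more after the loop, for the final record
$hash{ $curr->{job_number} } = $curr if $curr && $curr->{job_number};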

Reading specified line using Perl program and command-line arguments

So, let's say I am writing a Perl program:
./program.perl 10000 < file
I want it to read the 10000th line of "file" only. How could I do it using input redirection in this form? It seems that I keep getting something along the lines of 10000 is not a file.
I thought this would work:
#!/usr/bin/perl -w
$line_num = 0;
while ( defined ($line = <>) && $line_num < $ARGV[0]) {
    ++$line_num;
    if ($line_num == $ARGV[0]) {
        print "$line\n";
        exit 0;
    }
}
But it failed miserably.
If there are command-line arguments, then <> opens the so-named files and reads from them, and if not, then it takes from standard-input. (See "I/O Operators" in the perlop man-page.)
If, as in your case, you want to read from standard-input whether or not there are command-line arguments, then you need to use <STDIN> instead:
while ( defined ($line = <STDIN>) && $line_num < $ARGV[0]) {
Obligatory one-liner:
perl -ne 'print if $. == 10000; exit if $. > 10000'
$. counts lines read from stdin. -n implicitly wraps program in:
while (<>) {
...program...
}
You could use Tie::File:
use Tie::File;
my ($name, $num) = @ARGV;
tie my @file, 'Tie::File', $name or die $!;
print $file[$num - 1];   # Tie::File records are zero-indexed
untie @file;
Usage:
perl script.pl file.csv 10000
You could also do this very simply using awk:
awk 'NR==10000' < file
or sed:
sed -n '10000p' file
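Both of those read to the end of the file even after the match; with a large file you could exit early with a minor variation on the same commands:
awk 'NR==10000{print; exit}' file
sed -n '10000{p;q;}' file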

How can I parse people's full names into user names in Perl?

I need to convert a name in the format Parisi, Kenneth into the format kparisi.
Does anyone know how to do this in Perl?
Here is some sample data that is abnormal:
Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.
These specific names should end up as czelleb, jeilt, cwoods, cwelkt, etc.
I have one more condition that is ruining my name builder:
O'Neil, Paul
So far, Vinko Vrsalovic's answer is working the best when weird/corrupt names are in the mix, but this example above would come out as "pneil"... I'd be damned below Judas if I can't get that o between the p and the n.
vinko@parrot:~$ cat genlogname.pl
use strict;
use warnings;
my @list;
push @list, "Zelleb, Charles F.,,IV";
push @list, "Eilt, John,, IV";
push @list, "Woods, Charles R.,,III";
push @list, "Welkt, Craig P.,,Jr.";
for my $name (@list) {
    print gen_logname($name)."\n";
}
sub gen_logname {
    my $n = shift;
    # Filter out unneeded characters
    $n =~ s/['-]//g;
    # This regex will grab the lastname, a comma, optionally a space (the
    # optional space is my addition) and the first char of the name,
    # which seems to satisfy your condition
    $n =~ m/(\w+), ?(.)/;
    return lc($2.$1);
}
vinko@parrot:~$ perl genlogname.pl
czelleb
jeilt
cwoods
cwelkt
I would start by filtering the abnormal data so you only have regular names. Then something like this should do the trick:
$t = "Parisi, Kenneth";
$t =~ s/(.+),\s*(.).*/\l$2\l$1/;
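A quick sanity check of that substitution (a sketch):
my $t = "Parisi, Kenneth";
$t =~ s/(.+),\s*(.).*/\l$2\l$1/;
print "$t\n";   # kparisi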
Try:
$name =~ s/(\w+),\s(\w)/$2$1/;
$name = lc $name;
\w here matches an alphanumerical character. If you want to be more specific, you could also use [a-z] instead, and pass the i flag (case insensitive):
$name =~ s/([a-z]+),\s([a-z])/$2$1/i;
Here's a one-line solution, assuming you store all the names in a file called "names" (one per line) and you will do duplicate-name detection somehow later.
cat names | perl -e 'while(<>) {/^\s*(\S*)?,\s*(\S)/; print lc "$2$1\n";}' | sed s/\'//g
It looks like your input data is comma-separated. To me, the clearest way to do this would be split into components, and then generate the login names from that:
while (<>) {
    chomp;
    my ($last, $first) = split /,/, lc $_;
    $last  =~ s/[^a-z]//g;   # strip out nonletters
    $first =~ s/[^a-z]//g;   # strip out nonletters
    my $logname = substr($first, 0, 1) . $last;
    print $logname, "\n";
}
$rowfetch =~ s/['-]//g;   # all chars inside the [ ] will be filtered out
$rowfetch =~ m/(\w+), ?(.)/;
$rowfetch = lc($2.$1);
This is how I ended up using Vinko Vrsalovic's solution... it's inside a while loop that goes through an SQL query result. Thanks again, Vinko.
This should do what you need:
use strict;
use warnings;
use 5.010;
while ( <DATA> ) {
    say abbreviate($_);
}
sub abbreviate {
    for ( @_ ) {
        s/[-']+//g;
        tr/A-Z/a-z/;
        tr/a-z/ /c;
        return "$2$1" if /([a-z]+)\s+([a-z])/;
    }
}
__DATA__
Zelleb, Charles F.,,IV
Eilt, John,, IV
Woods, Charles R.,,III
Welkt, Craig P.,,Jr.
O'Neil, Paul
output:
czelleb
jeilt
cwoods
cwelkt
poneil
