Perl - count words of a file - linux

i want to count words in a file and want result the number of same word
my script
#!/usr/bin/perl
#use strict;
#use warnings;
use POSIX qw(strftime);
$datestring = strftime "%Y-%m-%d", localtime;
print $datestring;
my #files = <'/mnt/SESSIONS$datestring*'>;
my $latest;
foreach my $file (#files) {
$latest = $file if $file gt $latest;
}
#temp_arr=split('/',$latest);
open(FILE,"<$latest");
print "file loaded \n";
my #lines=<FILE>;
close(FILE);
#my #temp_line;
foreach my $line(#lines) {
#line=split(' ',$line);
#push(#temp_arr);
$line =~ s/\bNT AUTHORITY\\SYSTEM\b/NT__AUTHORITY\\SYSTEM/ig;
print $line;
#print "$line[0] $line[1] $line[2] $line[3] $line[4] $line[5] \n";
}
My log file
SID USER TERMINAL PROGRAM
---------- ------------------------- --------------- -------------------------
1 SYSTEM titi toto (fifi)
2 SYSTEM titi toto (fofo)
4 SYSTEM titi toto (bobo)
5 NT_AUTHORITY\SYSTEM titi roro
6 NT_AUTHORITY\SYSTEM titi gaga
7 SYSTEM titi gogo (fifi)
5 rows selected.
I want result :
User = 3 SYSTEM with program toto
, User = 1 SYSTEM with program gogo
Thanks for any information

I see yours as a two-step problem -- you want to parse the log files, but then you also want to store elements of that data into a data structure that you can use to count.
This is a guess, based on your sample data, but if your data is fixed-width, one way you can parse that into the fields is to use unpack. I think substr might more efficient, so consider how many files you need to parse and how long each is.
I would store the data into a hash and then dereference it after the files have all been read.
my %counts;
open my $IN, '<', 'logfile.txt' or die;
while (<$IN>) {
next if length ($_) < 51;
my ($sid, $user, $terminal, $program) = unpack 'A9 #11 A25 #37 A15 #53 A25', $_;
next if $sid eq '---------'; # you need some way to filter out bogus or header rows
$program =~ s/\(.+//; # based on your example, turn toto (fifi) into toto
$counts{$user}{$program}++;
}
close $IN;
while (my ($user, $ref) = each %counts) {
while (my ($program, $count) = each %$ref) {
print "User = $count $user with program $program\n";
}
}
Output from program:
User = 3 SYSTEM with program toto
User = 1 SYSTEM with program gogo
User = 1 NT_AUTHORITY\SYSTEM with program roro
User = 1 NT_AUTHORITY\SYSTEM with program gaga

This code detect automatically the size of input fields (your snippet seems an output from Oracle query) and print the results:
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
open my $file, '<', 'input.log' or die "$?";
my $data = {};
my #cols_size = ();
while (<$file>) {
my $line = $_;
if ( $line =~ /--/) {
foreach (split(/\s/, $line)) {
push(#cols_size, length($_) +1);
}
next;
}
next unless (#cols_size);
next if ($line =~ /rows selected/);
my ($sid, $user, $terminal, $program) = unpack('A' . join('A', #cols_size), $line);
next unless ($sid);
$program =~ s/\(\w+\)//;
$data->{$user}->{$program}++;
}
close $file;
foreach my $user (keys %{$data}) {
foreach my $program (keys %{$data->{$user}}) {
say sprintf("User = %s %s with program %s", $data->{$user}->{$program}, $user, $program);
}
}

i don't understand $counts{$user}{$program}++;

Related

Need to open a file and replace multiple strings

I have a really big xml file. It has certain incrementing numbers inside, which i would like to replace with a different incrementing number. I've looked and here is what someone suggested here before. Unfortunately i cant get it to work :(
In the code below all instances of 40960 should be replaced with 41984, all instances of 40961 with 41985 etc. Nothing happens. What am i doing wrong?
use strict;
use warnings;
my $old = 40960;
my $new = 41984;
my $string;
my $file = 'file.txt';
rename($file, $file.'.bak');
open(IN, '<'.$file.'.bak') or die $!;
open(OUT, '>'.$file) or die $!;
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
while(<IN>) {
$_ =~ s/$old/$new/g;
print OUT $_;
}
}
close(IN);
close(OUT);
Other answers give you better solutions to your problem. Mine concentrates on explaining why your code didn't work.
The core of your code is here:
$old++;
$new++;
for (my $i = 0; $i < 42; $i++) {
while(<IN>) {
$_ =~ s/$old/$new/g;
print OUT $_;
}
}
You increment the values of $old and $new outside of your loops. And you never change those values again. So you're only making the same substitution (changing 40961 to 41985) 42 times. You never try to change any other numbers.
Also, look at the while loop that reads from IN. On your first iteration (when $i is 0) you read all of the data from IN and the file pointer is left at the end of the file. So when you go into the while loop again on your second iteration (and all subsequent iterations) you read no data at all from the file. You need to reset the file pointer to the start of your file at the end of each iteration.
Oh, and the basic logic is wrong. If you think about it, you'll end up writing each line to the output file 42 times. You need to do all possible substitutions before writing the line. So your inner loop needs to be the outer loop (and vice versa).
Putting those suggestions together, you need something like this:
my $old = 40960;
my $change = 1024;
while (<IN>) {
# Easier way to write your loop
for my $i ( 1 .. 42 ) {
my $new = $old + $change;
# Use \b to mark word boundaries
s/\b$old\b/$new/g;
$old++;
}
# Print each output line only once
print OUT $_;
}
Here's an example that works line by line, so the size of file is immaterial. The example assumes you want to replace things like "45678", but not "fred45678". The example also assumes that there is a range of numbers, and you want them replaced with a new range offset by a constant.
#!/usr/bin/perl
use strict;
use warnings;
use constant MIN => 40000;
use constant MAX => 90000;
use constant DIFF => +1024;
sub repl { $_[0] >= MIN && $_[0] <= MAX ? $_[0] + DIFF : $_[0] }
while (<>) {
s/\b(\d+)\b/repl($1)/eg;
print;
}
exit(0);
Invoked with the file you want to transform as an argument, it produces altered output on stdout. With the following input ...
foo bar 123
40000 50000 60000 99999
fred60000
fred 60000 fred
... it produces this output.
foo bar 123
41024 51024 61024 99999
fred60000
fred 61024 fred
There are a couple of classic Perlisms here, but the example shouldn't be hard to follow if you RTFM appropriately.
Here is an alternative way which reads the input file into a string and does all the substitutions at once:
use strict;
use warnings;
{
my $old = 40960;
my $new = 41984;
my ($regexp) = map { qr/$_/ } join '|', map { $old + $_ } 0..41;
my $file = 'file.txt';
rename($file, $file.'.bak');
open(IN, '<'.$file.'.bak') or die $!;
my $str = do {local $/; <IN>};
close IN;
$str =~ s/($regexp)/do_subst($1, $old, $new)/ge;
open(OUT, '>'.$file) or die $!;
print OUT $str;
close OUT;
}
sub do_subst {
my ( $old, $old_base, $new_base ) = #_;
my $i = $old - $old_base;
my $new = $new_base + $i;
return $new;
}
Note: Can probably be made more efficient by using Regexp::Assemble

Perl, using a linux command on users stored in a hash

Here is the code:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
my #temparray;
my $count = 0;
my #lastarray;
my $lastbash;
#Opens the file /etc/shadow and puts the users with an uid over 1000 but less that 65000 into an array.
open( my $passwd, "<", "/etc/passwd") or die "/etc/passwd failed to open.\n";
while (my $lines = <$passwd>) {
my #splitarray = split(/\:/, $lines );
if( $splitarray[2] >= 1000 && $splitarray[2] < 65000) {
$temparray[$count] =$splitarray[0];
print "$temparray[$count]\n";
$count++;
}
}
close $passwd;
foreach (#temparray) {
$lastbash = qx(last $temparray);
print "$lastbash\n";
}
What I want to do is use the built in linux command "last" on all the users stored in the #temparray. And i want the output to be like this:
user1:10
user2:22
Where 22 and 10 being the number of times they logged in. How can I achieve this ?
I have tried several different ways but I always end up with errors.
The following should perform the task as requested:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
my #temparray;
my $count = 0;
my #lastarray;
my $lastbash;
#Opens the file /etc/shadow and puts the users with an uid over 1000 but less that 65000 into an array.
open( my $passwd, "<", "/etc/passwd") or die "/etc/passwd failed to open.\n";
while (my $lines = <$passwd>) {
my #splitarray = split(/\:/, $lines );
if( $splitarray[2] >= 1000 && $splitarray[2] < 65000) {
$temparray[$count] =$splitarray[0];
print "$temparray[$count]\n";
$count++;
}
}
close $passwd;
foreach (#temparray) {
my #lastbash = qx(last $_); #<----Note the lines read in go to the $_ variable. Note use of my. You also read the text into array.
print $_.":".#lastbash."\n"; #<----Note the formatting. Reading #lastbash returns the number of elements.
}
You don't really need the $count, you could just do push #temparray, $splitarray[0].
That said, I'm not sure why you need #temparray either... You can just run the command against the users as you find them.
my $passwd = '/etc/passwd';
open( my $fh, '<', $passwd )
or die "Could not open file '$passwd' : $!";
my %counts;
# Get `last` counts and store them %counts
while ( my $line = <$fh> ) {
my ( $user, $uid ) = ( split( /:/, $line ) )[ 0, 2 ];
if ( $uid >= 1000 && $uid < 65000 ) {
my $last = () = qx{last $user};
$counts{$user} = $last
}
}
close $fh;
# Sort %counts keys by value (in descending order)
for my $user ( sort { $counts{$b} <=> $counts{$a} } keys %counts ) {
printf "%s:\t %3d\n", $user, $counts{$user};
}

Comparing and displaying hash values

I am able to print all of the lines from /etc/passwd by UID and username.
I would like to compare the values of UID and display corresponding usernames by <150 and >150.
this is my while loop and count
while(<PASSWD>){
chomp;
my #f = split /:/;
sort #f;
#{$passwd{$f[3]}}=#f;
print #f[3 , 0], "\n";
}
my $count = keys(%passwd);
print $count, "\n";
sort #f does nothing - sort returns the list sorted, but does not change it in place. If you added use warnings; to your programme, Perl would tell you.
This is how I would do it:
#!/usr/bin/perl
use warnings;
use strict;
open my $PASSWD, '<', '/etc/passwd' or die $!;
my %passwd;
while (<$PASSWD>) {
chomp;
my #f = split /:/;
#{ $passwd{ $f[3] } } = #f;
}
my $reported = 0;
for my $k (sort { $a <=> $b } keys %passwd) {
if ($k > 150 and not $reported) {
$reported = print "Over 150\n";
}
print "$k\n";
}
You can also grep the keys for the small ones:
my #under150 = grep $_ < 150, keys %passwd;
print $_->[0], "\n" for #passwd{ #under150 };

My array is showing empty after I insert huge data into it in perl

#!/usr/bin/perl -w
################################################################################
##Get_Duration.pl
#
# This is a perl script which is used to parse the audio files
# present in the device and build's the xml containing all the
# track i.e both audio and video files duration
#
# The xml file is created in the name of ParsedMetadataInformation.xml
# in <ATAF Path>/tmp/ directory.
#
#
# CHANGE HISTORY
# --------------------------------------------------------------------------
use strict;
use warnings;
use Env;
use File::Find;
use XML::TreePP;
use Data::Dumper;
my $data;
if (not defined $ATAF){
print "=====================================================\n";
print "ERROR: ATAF Path is not set.\n";
print "(Example: export ATAF=/home/roopa/ATAF)\n";
print "=====================================================\n";
exit 1;
}
print "Enter the Absolute path for the device to be scanned\n";
print "(Example: /media/RACE_1.6A)\n";
$DB::single=1;
my #metadataInfo = ();
print "Enter Path:";
my $configDir = <STDIN>;
chomp $configDir;
my #configFiles;
find( sub {push #configFiles, "$File::Find::name$/" if (/\.mp3|\.wma|\.wav|\.ogg| \.flac| \.m4a|\.mp4|\.avi|\.mpg|\.mpeg|\.mov|\.wmv|\.m4b$/i)}, $configDir);
chomp #configFiles;
if (!#configFiles){
print "=====================================================\n";
print "ERROR: No Files Found!!!\n";
print "=====================================================\n";
exit -1;
}
my $tpp = XML::TreePP->new();
my $metadataHashTree1 = ();
print "=====================================================\n";
print "Extracting the Metadata Information\n";
print "=====================================================\n";
foreach my $file (#configFiles){
print "Currently in: $file\n";
(my $fileName = $file) =~ s/^.*\///g;
$file =~ s/([\!\$\^\*\&\(\)\|\}\{\[\]\:\"\;\'\?\>\<\,\=\`\s])/\\$1/g;
#metadataInfo = (`ffmpeg -i $fileName`);
my $size= scalar (#metadataInfo);
#chomp #metadataInfo;
foreach my $eachfile (#metadataInfo){
if ($eachfile =~ m/^Duration: /i){
$eachfile =~ m/Duration:(.*?),/;
$data= $1;
$metadataHashTree1->{$fileName}->{'Duration'}=$data;
}
}
}
print "=====================================================\n";
print "Building XML tree\n";
print "=====================================================\n\n";
my $xml = $tpp->write($metadataHashTree1);
sleep 5;
print "=====================================================================\n";
print "Writing the XML tree in <ATAF Path>/tmp/ParsedMetadataInformation.xml\n";
print "=====================================================================\n\n";
open (FILEHANDLE, ">$ATAF/tmp/ParsedDurationInformation.xml") or die "ERROR: $!\n";
print FILEHANDLE $xml;
close FILEHANDLE;
sleep 5;
print "=====================================================\n";
print "Successfully Completed!!!\n";
print "=====================================================\n\n";
########################################################################################
In the above program I am trying to get the duration of a track using ffmpeg command and saving the output in #metadataInfo. But the array size shows 0 if I try to print using the command
$size= scalar (#metadataInfo);
"$File::Find::name$/"
should be
$File::Find::name
Appending $/ makes no sense.
You don't convert the file name to a shell literal.
`ffmpeg -i $fileName`
should be
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote('ffmpeg', '-i', $fileName);
`$cmd`
This will handle problems such as a spaces in the file name.
You don't check if the backticks succeeded. What's the value of $?? And if that's -1, what's the value of $!?

unix - breakdown of how many records in file with number of character occurrences

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to get counts of how many records (as defined by a specific EOL such as "^%!") had how many occurrences of a specfic character. (sorted descending by the number of occurrences)
For example, with this sample file:
jdk,|ljn^%!dk,|sn,|fgc^%!
ydfsvuyx^%!67ds5,|bvujhy,|s6d75
djh,|sudh^%!nhjf,|^%!fdiu^%!
Suggested input: delimiter EOL and filename as arguments.
bash/perl some_script_name ",|" "^%!" samplefile
Desired output:
occs count
3 1
2 1
1 2
0 2
This is because the 1st record had one delimiter, 2nd record had 2, 3rd record had 0, 4th record had 3, 5th record had 1, 6th record had 0.
Bonus pts if you can make the delimiter and EOL argument accept hex input (ie 2C7C) or normal character input (ie ,|) .
Script:
#!/usr/bin/perl
use strict;
$/ = $ARGV[1];
open my $fh, '<', $ARGV[2] or die $!;
my #records = <$fh> and close $fh;
$/ = $ARGV[0];
my %counts;
$counts{(split $_)-1}++ for #records;
delete $counts{-1};
print "$_\t$counts{$_}\n" for (reverse sort keys %counts);
Test:
perl script.pl ',|' '^%!' samplefile
Output:
3 1
2 1
1 2
0 2
This is what perl lives for:
#!perl -w
use 5.12.0;
my ($delim, $eol, $file) = #ARGV;
open my $fh, "<$file" or die "error opening $file $!";
$/ = $eol; # input record separator
my %counts;
while (<$fh>) {
my $matches = () = $_ =~ /(\Q$delim\E)/g; # "goatse" operator
$counts{$matches}++;
}
say "occs\tcount";
foreach my $num (reverse sort keys %counts) {
say "$num\t$counts{$num}";
}
(if you haven't got 5.12, remove the "use 5.12" line and replace the say with print)
A solution in awk:
BEGIN {
RS="\\^%!"
FS=",\\|"
max_occ = 0
}
{
if(match($0, "^ *$")) { # This is here to deal with the final separator.
next
}
if(NF - 1 > max_occ) {
max_occ = NF - 1
}
count[NF - 1]=count[NF - 1] + 1
}
END {
printf("occs count\n")
for(i = 0; i <= max_occ; i++) {
printf("%s %s\n", i, count[i])
}
}
Well, there's one more empty record at the end of the file which has 0. So, here's a script to do what you wanted. Adding headers and otherwise tweaking the printf output is left as an excercise for you. :)
Basically, read the whole file in, split it into records, and for each record, use a /g regex to count the sub-delimiters. Since /g returns an array of all matches, use #{[]} to make an arrayref then deref that in scalar context to get a count. There has to be a more elegant solution to that particular part of the problem, but whatever; it's perl line noise. ;)
user#host[/home/user]
$ ./test.pl ',|' '^%!' test.in
3 1
2 1
1 2
0 3
user#host[/home/user]
$ cat test.in
jdk,|ljn^%!dk,|sn,|fgc^%!
ydfsvuyx^%!67ds5,|bvujhy,|s6d75
djh,|sudh^%!nhjf,|^%!fdiu^%!
user#host[/home/user]
$ cat test.pl
#!/usr/bin/perl
my( $subdelim, $delim, $in,) = #ARGV;
$delim = quotemeta $delim;
$subdelim = quotemeta $subdelim;
my %counts;
open(F, $in) or die qq{Failed opening $in: $?\n};
foreach( split(/$delim/, join(q{}, <F>)) ){
$counts{ scalar(#{[m/.*?($subdelim)/g]}) }++;
}
printf( qq{%i% 4i\n}, $_, $counts{$_} ) foreach (sort {$b<=>$a} keys %counts);
And here's a modified version which only keeps fields which contain at least one non-space character. That removes the last field, but also has the consequence of removing any other empty fields. It also uses $/ and \Q\E to reduce a couple of explicit function calls (thank, Alex). And, like the previous one, it works with strict + warnings;
#!/usr/bin/perl
my( $subdelim, $delim, $in ) = #ARGV;
local $/=$delim;
my %counts;
open(F, $in) or die qq{Failed opening $in: $?\n};
foreach ( grep(/\S/, <F>) ){
$counts{ scalar(#{[m/.*?(\Q$subdelim\E)/g]}) }++;
}
printf( qq{%i% 4i\n}, $_, $counts{$_} ) foreach (sort {$b<=>$a} keys %counts);
If you really only want to remove the last record unconditionally, I'm partial to using pop:
#!/usr/bin/perl
my( $subdelim, $delim, $in ) = #ARGV;
local $/=$delim;
my %counts;
open(F, $in) or die qq{Failed opening $in: $?\n};
my #lines = <F>;
pop #lines;
$counts{ scalar(#{[m/.*?(\Q$subdelim\E)/g]}) }++ foreach (#lines);
printf( qq{%i% 4i\n}, $_, $counts{$_} ) foreach (sort {$b<=>$a} keys %counts);

Resources