I made a script in Tcl which receives a huge input file, reads it line by line, and then modifies the data in some way.
The problem starts when I need to do the same with *.gz files, which contain the data file.
The only thing I found by Google search is how to do it using gzcat, and that didn't work; it's also no good because it reads the whole file (I think?) and I don't want to process the whole file.
In short: I need to read a gz file line by line. How do I do it?
Example of what I did with a normal file:
set fh [open <some path> r]
while {[gets $fh line]>=0} {
    do something with $line
}
What I tried and couldn't understand/make work for me:
set pipeline [open "| zcat foo.gz"]
set data [read $pipeline]
close $pipeline
Thanks!
If you have Tcl 8.6, just do:
set fh [open <SomePath.gz> r]
zlib push gunzip $fh
while {[gets $fh line]>=0} {
    do something with $line
}
close $fh
With 8.5 or before, going via an external gzcat process is the simplest way.
set ZCAT_PROGRAM gzcat; # Might be called something else on your system
set fh [open |[list $ZCAT_PROGRAM <SomePath.gz>] r]
while {[gets $fh line]>=0} {
    do something with $line
}
close $fh
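One caveat worth adding (my note, not from the original answer): closing a pipeline channel raises an error if the child process exited with a non-zero status, so if the .gz file might be corrupt it is worth catching that:
if {[catch {close $fh} err]} {
    puts stderr "decompression failed: $err"
}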
You can also do it with gzip if you pass the right flags; that has the advantage that the program is pretty consistently called gzip when it is present at all:
set fh [open |[list gzip -d -c <SomePath.gz>] r]
while {[gets $fh line]>=0} {
    do something with $line
}
close $fh
(The -d option does decompression, and the -c option sends the output to stdout so we can read it from the pipeline.)
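Wrapping the 8.6 approach up, here is a minimal sketch of a reusable helper; the proc name foreachGzLine and its error handling are my own invention, not a standard API:
# Run $script once per line of a gzip-compressed file (requires Tcl 8.6).
proc foreachGzLine {path varName script} {
    upvar 1 $varName line
    set fh [open $path r]
    zlib push gunzip $fh
    try {
        while {[gets $fh line] >= 0} {
            uplevel 1 $script
        }
    } finally {
        close $fh
    }
}
# Usage:
# foreachGzLine SomePath.gz line { puts $line }
Because the zlib transform decompresses as the channel is read, this streams the data rather than loading the whole file into memory, which addresses the original concern about huge inputs.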
Related
The string to be searched is:
the file_is being created_automaically {
period=20ns }
The Perl script I am using follows (this script works fine for a single-line string but not for a multi-line one):
#!/usr/bin/perl
my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
#print "@files";
system ("rm -rf $dir/log.txt");
my $list;
foreach $list(@files){
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
else {
while (<LOGFILE>){
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: File contain the required string\n";
close (File);
break;
}
}
close (LOGFILE);
}
}
This code does not compile; it contains errors that cause it to fail to execute. You should never post code that you have not first tried to run.
The root of your problem is that for a multiline match, you cannot read the file in line-by-line mode; you have to slurp the whole file into a variable. However, your program contains many flaws. I will demonstrate. Here follow excerpts of your code (with indentation and missing curly braces fixed).
First off, always use:
use strict;
use warnings;
This will save you many headaches and long searches for hidden problems.
system ("rm -rf $dir/log.txt");
This is better done in Perl, where you can control for errors:
unlink "$dir/log.txt" or die "Cannot delete '$dir/log.txt': $!";
foreach my $list (@files) {
# ^^
Declare the loop variable in the loop itself, not before it.
if( !open(LOGFILE, "$list")){
open (File, ">>", "$dir/log.txt");
select (File);
print " $list \: unable to open file";
close (File);
You never have to explicitly select a file handle before you print to it. You just print to the file handle: print File "....". What you are doing is just changing the STDOUT file handle, which is not a good thing to do.
Also, this is error logging, which should go to STDERR instead. This can be done simply by opening STDERR to a file at the beginning of your program. Why do this? Because you may not be debugging the program at a terminal, for example when it runs via the web or from some other process where STDERR does not show up on your screen. Otherwise it is just extra work while debugging.
open STDERR, ">", "$dir/log.txt" or die "Cannot open 'log.txt' for overwrite: $!";
This has the added benefit of you not having to delete the log first. And now you do this instead:
if (! open LOGFILE, $list ) {
warn "Unable to open file '$list': $!";
} else ....
warn goes to STDERR, so it is basically the same as print STDERR.
Speaking of open, you should use three argument open with explicit file handle. So it becomes:
if (! open my $fh, "<", $list )
} else {
while (<LOGFILE>) {
Since you are looking for a multiline match, you need to slurp the file(s) instead. This is done by setting the input record separator to undef. Typically like this:
my $file = do { local $/; <$fh> }; # $fh is our file handle, formerly LOGFILE
Next how to apply the regex:
if($_ =~ /".*the.*automaically.*\{\n.*period\=20ns.*\}"/) {
$_ =~ is optional. A regex automatically matches against $_ if no other variable is used.
You should probably not use " in the regex, unless you have " in the target string. I don't know why you put it there; maybe you think strings need to be quoted inside a regex. If so, that is wrong. To match the string you have above, you do:
if( /the.*automaically.*{.*period=20ns.*}/s ) {
You don't have to escape curly braces {} or the equals sign = with a backslash, and you don't have to use quotes. The /s modifier makes . (the wildcard character) also match newline, so we can remove \n. We can remove .* from the start and end of the string, because that is implied; regex matches are always partial unless anchors are used.
break;
The break keyword is only used with the switch feature, which is experimental, and you are neither using it nor have it enabled. So it is just a bareword, which is wrong. If you want to exit a loop prematurely, you use last. Note that we don't need last here, because we slurp the file, so there is no loop.
Also, you generally should pick suitable variable names. If you have a list of files, the variable that contains the file name should not be called $list, I think. It is logical that it is called $file. And the input file handle should not be called LOGFILE, it should be called $input, or $infh (input file handle).
This is what I get if I apply the above to your program:
use strict;
use warnings;

my $dir = "/home/vikas";
my @files = glob( $dir . '/*' );
my $logfile = "$dir/log.txt";

open STDERR, ">", $logfile or die "Cannot open '$logfile' for overwrite: $!";

foreach my $file (@files) {
    if (! open my $input, "<", $file) {
        warn "Unable to open '$file': $!";
    } else {
        my $txt = do { local $/; <$input> };
        if ($txt =~ /the.*automaically.*{.*period=20ns.*}/s) {
            print " $file : File contains the required string\n";
        }
    }
}
Note that the print goes to STDOUT, not to the error log. It is not common practice to have STDOUT and STDERR to the same file. If you want, you can simply redirect output in the shell, like this:
$ perl foo.pl > output.txt
The following sample code demonstrates usage of a regex for the multiline case, with a logger($fname,$msg) subroutine.
The code snippet assumes that the input files are relatively small and can be read into the variable $data (i.e. that the computer has enough memory to hold them).
NOTE: the input data files should be distinguishable from the other files in the home directory $ENV{HOME}; in this code sample they are assumed to match the pattern test_*.dat. You probably do not intend to scan absolutely all files in your home directory (there could be many thousands of files, but you are interested in only a few).
#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';

my($dir,$re,$logfile);

$dir = '/home/vikas/';
$re = qr/the file_is being created_automaically \{\s+period=20ns\s+\}/;
$logfile = $dir . 'logfile.txt';

unlink $logfile if -e $logfile;

for ( glob($dir . "test_*.dat") ) {
    if( open my $fh, '<', $_ ) {
        my $data = do { local $/; <$fh> };
        close $fh;
        logger($logfile, "INFO: $_ contains the required string")
            if $data =~ /$re/gsm;
    } else {
        logger($logfile, "WARN: unable to open $_");
    }
}

exit 0;

sub logger {
    my $fname = shift;
    my $text  = shift;
    open my $fh, '>>', $fname
        or die "Couldn't open $fname";
    say $fh $text;
    close $fh;
}
Reference: regex modifiers, unlink, perlvar
Well, I tried to find my answer online but actually I didn't, and I really need help.
I have a text file (file.txt) that contains:
C:/Users/00_file/toto.odb,
dis,455,
stre,54,
stra,25,
C:/Users/00_file/tota.odb,
And a Tcl script that allows me to read the values of each line:
set Infile [open "C:/Users/00_file/file.txt" r]
set filelines [split $Infile ","]
set Namepath [lindex $filelines 1 0] #*doesn't work*
set dis [lindex $filelines 2 0] # *work good*
...
The problem is that when I want the complete line 1 of the text file with my Tcl script, some information is missing and characters disappear.
How can I get the complete string (line 1 of my text file)?
Thanks a lot!
You open the file for reading but you don't actually read from it. $Infile is just (basically) a pointer to a file descriptor, not the contents of the file:
% set fh [open file.txt r]
% puts $fh
file3
The idiomatic way to read from a file: line-by-line
set fh [open "C:/Users/00_file/file.txt" r]
set data [list]
while {[gets $fh line] != -1} {
    lappend data [split $line ,]
}
close $fh
Or, read the whole file and split it on newlines
set fh [open "C:/Users/00_file/file.txt" r]
set data [lmap line [split [read -nonewline $fh] \n] {split $line ,}]
close $fh
Then access the data
set Namepath [lindex $data 0 0] ;# first line, first field
set dis [lindex $data 1 1] ;# second line, second field
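For illustration, with the sample file.txt from the question the parsed structure looks like this; note that the trailing comma on each line produces an empty final field:
# $data is a list of lists, one sublist per line:
#   {C:/Users/00_file/toto.odb {}}
#   {dis 455 {}}
#   {stre 54 {}}
#   {stra 25 {}}
#   {C:/Users/00_file/tota.odb {}}
puts [lindex $data 0 0]   ;# -> C:/Users/00_file/toto.odb
puts [lindex $data 1 1]   ;# -> 455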
Tcl code will be as follows:
set file [open c:/filename.txt]
set file_device [read $file]
set data [split $file_device "\n"]
for {set count 0} {$count < 2} {incr count} {
    # For every iteration one line will be printed.
    # Splitting on \n is used to break the data at the end of each line.
    # The open command opens the file at the given path.
    # The read command is used to read the opened file.
    puts [lindex $data $count]
}
close $file
This will take the lines one after another.
I am using Net::OpenSSH:
my $ssh = Net::OpenSSH->new("$linux_machine_host");
Using the SSH object, a few commands are executed multiple times over N hours.
At times I need to look for error messages, such as Timeout, in the /var/adm/messages file.
My suggestion:
$ssh->capture2("echo START >> /var/adm/messages");
$ssh->capture2("some command which will be run in background for n hours");
$ssh->capture2("echo END >> /var/adm/messages");
Then read all lines between START and END and grep for the required error message.
$ssh->capture2("grep -A 100000 "START" /var/adm/messages | grep -B 100000 END");`
Without writing START and END into the messages file, can I tail /var/adm/messages at some point and capture any new messages appearing afterwards?
Are there any Net::OpenSSH methods which would capture new lines and write them into a file?
You can read the messages file via SFTP (see Net::SFTP::Foreign):
# untested!
use Net::SFTP::Foreign::Constants qw(:flags);
...
my $sftp = $ssh->sftp;
# open the messages file creating it if it doesn't exist
# and move to the end:
my $fh = $sftp->open("/var/adm/messages",
SSH2_FXF_READ|SSH2_FXF_CREAT)
or die $sftp->error;
seek($fh, 0, 2);
$ssh->capture2("some command which...");
# look for the size of /var/adm/messages now so that we
# can ignore any lines that may be appended while we are
# reading it:
my $end = (stat $fh)[7];
# and finally read any lines added since we opened it:
my @msg;
while (1) {
    my $pos = tell $fh;
    last if $pos < 0 or $pos >= $end;
    my $line = <$fh>;
    last unless defined $line;
    push @msg, $line;
}
Note that you are not taking into account that the messages file may be rotated. Handling that would require more convoluted approaches.
I want to replace SVT-ATL with SVT in all the lines of a file, without disturbing the other text.
Using the code below:
set fileDest3 "$dirName/$filename"
set fpr [open $fileDest3 r+]
set line [gets $fpr]
regsub -all "SVT-ATL" $line "SVT" line
puts $fpr "$line"
Because you're changing the length of lines, you must rewrite the whole file. (Well, you could theoretically leave the lines before the first thing being changed a lot, but that's a whole bunch more work.) The simplest way is to read it all in, string map to perform the change (in the simplest case; regsub if things are trickier) and then write it all back out (chan seek to the beginning first, of course). As you're shortening things, you'll need to finish with a chan truncate.
set fileDest3 "$dirName/$filename"
set fptr [open $fileDest3 r+]
set newContents [string map {"SVT-ATL" "SVT"} [read $fptr]]
chan seek $fptr 0
puts -nonewline $fptr $newContents
chan truncate $fptr
close $fptr
The puts has a -nonewline so you don't get an extra terminating newline; the one that was there originally will still be in (as we're reading it all in and not just line-by-line).
package require fileutil
proc cmd data {
string map {SVT-ATL SVT} $data
}
if {[catch {fileutil::updateInPlace [file join $dir $filename] cmd}]} {
error "failed to change file"
}
The Tcllib fileutil::updateInPlace command takes care of the low-level details of opening, reading, applying a given command to the content, truncating, writing, and closing files that you want updated. You simply provide a command like cmd here and enjoy the odds ever being in your favor.
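If the change were trickier than a fixed string, the same pattern works with a regular expression in the callback; a hypothetical variant (the name cmdRe is mine):
# Hypothetical callback using regsub instead of string map
proc cmdRe data {
    regsub -all {SVT-ATL} $data "SVT"
}
fileutil::updateInPlace [file join $dir $filename] cmdRe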
Documentation: catch, error, if, package, proc, string
The fileutil package is documented here: fileutil
set timestamp [clock format [clock seconds] -format {%Y%m%d%H%M%S}]
set filename "yourfilenamehere.txt"
set temp $filename.tmp.$timestamp
set backup $filename.bak.$timestamp
set in [open $filename r]
set out [open $temp w]
# line-by-line, read the original file
while {[gets $in line] != -1} {
    # Modify $line by replacing 'SVT-ATL' with 'SVT'
    regsub -all "SVT-ATL" $line "SVT" line
    # then write the modified line to the 'tmp' file
    puts $out $line
}
close $in
close $out
# This is to rename the current file to backup file
file rename -force $filename $backup
# This is to rename the tmp file to the original file
file rename -force $temp $filename
Reference: Glenn Jackman & Donal Fellows
Update:
If you don't want to create a new file then, as Jerry pointed out, we can at least read all the file content at once, apply our string replacement, and then write it back to the file.
# Reading the file content
set fd [ open "yourfilename" r ]
set data [ read $fd ]
close $fd
# Replacing the string now...
regsub -all "SVT-ATL" $data "SVT" data
# Opening file with 'w' mode which will truncate the file
set fd [ open "yourfilename" w ]
puts $fd $data
close $fd
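A small note of my own, echoing the -nonewline point from the seek-and-truncate answer above: read keeps the file's final newline and puts appends another, so the file grows by one newline on every run. Pairing read -nonewline with puts -nonewline avoids that:
set fd [ open "yourfilename" r ]
set data [ read -nonewline $fd ]
close $fd
regsub -all "SVT-ATL" $data "SVT" data
set fd [ open "yourfilename" w ]
puts -nonewline $fd $data
close $fd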
I would consider
exec sed -i {s/SVT-ATL/SVT/g} "$dirName/$filename"
How can I send these values
24.215729
24.815729
25.055134
27.123499
27.159186
28.843474
28.877798
28.877798
to a Tcl script as input arguments?
As you know, we can't use a pipe because Tcl doesn't accept input that way!
What can I do to store these numbers for the Tcl script (the count of the numbers can vary from 0 to N; in this example it's 7)?
This is pretty easy to do in bash: dump the list of values into a file and then run:
tclsh myscript.tcl $(< datafilename)
And then the values are accessible in the script with the argument variables:
puts $argc; # This is a count of all values
puts $argv; # This is a list containing all the arguments
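Inside the script you can then consume $argv like any other Tcl list; a small illustrative example (the running total is just for demonstration):
# Print each value passed on the command line and total them up
set total 0.0
foreach v $argv {
    puts "value: $v"
    set total [expr {$total + $v}]
}
puts "count: $argc, sum: $total"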
You can read data piped to stdin with commands like
set data [gets stdin]
or from temporary files, if you prefer. For example, the following program's first part (an example from wiki.tcl.tk) reads some data from a file, and the other part then reads data from stdin. To test it, put the code into a file (e.g. reading.tcl), make it executable, create a small file somefile, and execute it via e.g.
./reading.tcl < somefile
#!/usr/bin/tclsh
# Slurp up a data file
set fsize [file size "somefile"]
set fp [open "somefile" r]
set data [read $fp $fsize]
close $fp
puts "Here is file contents:"
puts $data
puts "\nHere is from stdin:"
set momo [read stdin $fsize]
puts $momo
A technique I use when coding is to put data in my scripts as a literal:
set values {
24.215729
24.815729
25.055134
27.123499
27.159186
28.843474
28.877798
28.877798
}
Now I can just feed them into a command one at a time with foreach, or send them as a single argument:
# One argument
TheCommand $values
# Iterating
foreach v $values {
    TheCommand $v
}
Once you've got your code working with a literal, switching it to pull the data from a file is pretty simple. You just replace the literal with code to read a file:
set f [open "the/data.txt"]
set values [read $f]
close $f
You can also pull the data from stdin:
set values [read stdin]
If there are a lot of values (more than, say, 10–20MB) then you might be better off processing the data one line at a time. Here's how to do that, reading from stdin:
while {[gets stdin v] >= 0} {
    TheCommand $v
}