Efficient transfer of console data, tar & gzip/bzip2 without creating intermediary files - Linux

Linux environment. So, we have this program 't_show' which, when executed with an ID, writes price data for that ID to the console. There is no other way to get this data.
I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth and a minimum number of connections. On the destination server the data should end up as a separate file for each ID, named:
<id>.dat
Something like this would be the long-winded solution:
On the source server:
files=$(seq 1 10000)
for id in $files
do
    ./t_show $id > $id
done
tar cf - $files | nice gzip -c > dat.tar.gz
On the destination server:
scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar
That is, write each output to its own file, compress & tar, send over network, extract.
It has the problem that I need to create a new file for each id. This takes up tonnes of space and doesn't scale well.
Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across the network, skipping tar)?
The tar archive would still need to extract on the destination server as a separate file for each ID, as described above.
Thanks to anyone who takes the time to help.

You could just send the data formatted in some way and parse it on the receiver.
foo.sh on the sender:
#!/bin/bash
# Emit, for each ID, a header line "id size" followed by exactly size bytes of data
for (( id = 1; id <= 10000; id++ ))
do
    data="$(./t_show "$id")"
    size=$(wc -c <<< "$data")
    echo "$id $size"
    cat <<< "$data"
done
On the receiver:
ssh -C user@server './foo.sh' | while read -r id size; do
    dd of="$id.dat" bs=1 count="$size"
done
ssh -C compresses the data during transfer

You can at least tar stuff over an ssh connection:
tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"
How to populate the archive without intermediary files, however, I don't know.
EDIT: OK, I suppose you could do it by writing the tar file manually. The tar header format is documented and doesn't seem too complicated, but that isn't exactly my idea of convenient...

I don't think this is possible with a plain Bash script, but you could have a look at the Archive::Tar module for Perl, or similar modules for other scripting languages.
The Perl module has an add_data method to create a "file" on the fly and add it to the archive for streaming across the network.
The documentation is available on CPAN.

You can do better without tar:
#!/bin/bash
for id in $(seq 1 10000)
do
    ./t_show $id
done | gzip
The only difference is that you will not get the boundaries between different IDs.
Now put that in a script, say show_me_the_ids, and from the client (the destination) run:
ssh user@source ./show_me_the_ids | gunzip
And there they are!
Alternatively, you can specify the -C flag to compress the SSH connection and drop the gzip/gunzip steps altogether.
If you are really into it you may try ssh -C, gzip -9 and other compression programs.
Personally I'd bet on lzma -9.
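If you do need per-ID boundaries, a minimal variation (my own sketch, untested against t_show; the HEADER marker string and the script name are my inventions and must not collide with anything in the real data) is to print a marker line before each ID's output and split the stream on it at the receiving end, which is essentially what the accepted solution below does:
#!/bin/bash
# Sender: show_me_the_ids_with_markers (hypothetical name)
for id in $(seq 1 10000)
do
    echo "HEADER $id"      # marker line announcing the next ID
    ./t_show "$id"
done
On the receiving side, awk can switch output files whenever it sees a marker line:
ssh -C user@source ./show_me_the_ids_with_markers | awk '
    /^HEADER / { if (out) close(out); out = $2 ".dat"; next }
    out        { print > out }
'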

I would try this:
(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"
This will print "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host, where you can start your import script or program, which reads the lines from standard input.
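As an illustration, a minimal sketch of such an import script (my own example, not part of the answer; it assumes each ID's data fits on the single "ID: data" line produced above, and the script name is hypothetical) might be:
#!/bin/bash
# import.sh: read "ID: data" lines from standard input and write each ID's data to <ID>.dat
while IFS= read -r line
do
    id=${line%%:*}       # everything before the first colon is the ID
    data=${line#*: }     # everything after "ID: " is the data
    printf '%s\n' "$data" > "$id.dat"
done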
HTH

Thanks all
I've taken the advice to 'just send the data formatted in some way and parse it on the receiver', since that seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.
Perl script. It breaks the IDs into groups of 1000; the IDs are the source_id keys in a hash table. All data for a group is sent over a single ssh connection, delimited by 'HEADER' lines, so the receiver can write each block to the appropriate file. This is a lot more efficient:
sub copy_tickserver_files {
    my $self = shift;
    my $cmd = 'cd tickserver/ ; ';
    my $i = 1;
    while ( my ($source_id, $dest_id) = each ( %{ $self->{id_translations} } ) ) {
        $cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
        $i++;
        if ( $i % 1000 == 0 ) {
            $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
            $self->copy_tickserver_files_subset( $cmd );
            $cmd = 'cd tickserver/ ; ';
        }
    }
    $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
    $self->copy_tickserver_files_subset( $cmd );
}

sub copy_tickserver_files_subset {
    my $self = shift;
    my $cmd = shift;
    my $output = '';
    open TICKS, $cmd;
    while(<TICKS>) {
        if ( m{HEADER [ ] ([0-9]+) }mxs ) {
            my $id = $1;
            $output = "$self->{tmp_dir}/$id.ts";
            close TICKSOP;
            open TICKSOP, '>', $output;
            next;
        }
        next unless $output;
        print TICKSOP "$_";
    }
    close TICKS;
    close TICKSOP;
}

Related

finding a file in directory using perl script

I'm trying to develop a perl script that looks through all of the user's directories for a particular file name without the user having to specify the entire pathname to the file.
For example, let's say the file of interest is data.list. It's located in /home/path/directory/project/userabc/data.list. At the command line, normally the user would have to specify the full pathname in order to access it, like so:
cd /home/path/directory/project/userabc/data.list
Instead, I want the user just to have to enter script.pl ABC at the command line; then the Perl script will automatically run and retrieve the information in data.list, which in my case means counting the number of lines and uploading the result using curl. The rest is done; I just need the part where it can automatically locate the file.
Even though this is very feasible in Perl, it looks more appropriate for Bash:
#!/bin/bash
filename=$(find ~ -name "$1" )
wc -l "$filename"
curl .......
The main issue would of course be if you have multiple files with the same name, say for example /home/user/dir1/data1 and /home/user/dir2/data1. You will need a way to handle that, and how you handle it will depend on your specific situation.
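For example, a minimal sketch that simply processes every match (whether that is the right policy depends on your situation; it also still assumes no newlines in the file names) could look like this:
#!/bin/bash
# Process every file matching the name given as $1 anywhere under the home directory
find ~ -type f -name "$1" | while IFS= read -r filename
do
    wc -l "$filename"
    # curl ....... (the per-file upload would go here)
done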
In Perl that would be much more complicated:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0;    # $running_under_some_shell

use strict;

# Import the module File::Find, which will do all the real work
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# For the convenience of &wanted calls, including -eval statements:
# Here, we "import" specific variables from the File::Find module.
# The purpose is to be able to just type '$name' instead of the
# complete '$File::Find::name'.
use vars qw/*name *dir *prune/;
*name  = *File::Find::name;
*dir   = *File::Find::dir;
*prune = *File::Find::prune;

# We declare the sub here; the content of the sub will be created later.
sub wanted;

# This is a simple way to get the first argument. There is no
# checking on validity.
our $filename = $ARGV[0];

# Traverse the desired filesystem. /home is the top directory where we
# start our search. The sub wanted will be executed for every file
# we find.
File::Find::find({wanted => \&wanted}, '/home');
exit;

sub wanted {
    # Check if the file is our desired filename
    if ( /^$filename\z/ ) {
        # Open the file, read it and count its lines
        my $lines = 0;
        open(my $F, '<', $name) or die "Cannot open $name";
        while (<$F>) { $lines++; }
        print("$name: $lines\n");
        # Your curl command here
    }
}
You will need to look at the argument parsing, for which I simply used $ARGV[0], and I don't know what your curl command looks like.
A simpler (though not recommended) way would be to abuse Perl as a sort of shell:
#!/usr/bin/perl
#
my $fn=`find /home -name '$ARGV[0]'`;
chomp $fn;
my $wc=`wc -l '$fn'`;
print "$wc\n";
system ("your curl command");
The following code snippet demonstrates one of many ways to achieve the desired result.
The code takes one parameter, a word to look for inside files named data.list in any subdirectory, and prints the list of matching files to the terminal.
The code uses the subroutine lookup($dir,$filename,$search), which calls itself recursively whenever it comes across a subdirectory.
The search starts from the current working directory (the question did not specify a starting directory).
use strict;
use warnings;
use feature 'say';

my $search = shift || die "Specify what to look for";
my $fname  = 'data.list';
my $found  = lookup('.',$fname,$search);

if( @$found ) {
    say for @$found;
} else {
    say 'Not found';
}

exit 0;

sub lookup {
    my $dir    = shift;
    my $fname  = shift;
    my $search = shift;

    my $files = [];          # always return an array reference, even when nothing is found
    my @items = glob("$dir/*");

    for my $item (@items) {
        if( -f $item && $item =~ /\b$fname\b/ ) {
            my $found;
            open my $fh, '<', $item or die $!;
            while( my $line = <$fh> ) {
                $found = 1 if $line =~ /\b$search\b/;
                if( $found ) {
                    push @{$files}, $item;
                    last;
                }
            }
            close $fh;
        }
        if( -d $item ) {
            my $ret = lookup($item,$fname,$search);
            push @{$files}, $_ for @$ret;
        }
    }

    return $files;
}
Run as script.pl search_word
Output sample
./capacitor/data.list
./examples/data.list
./examples/test/data.list
Reference:
glob,
Perl file test operators

How can I use bash to gather all NFS mount points with multiple configuration files to check that each mount is writeable?

I am trying to create a script that will dynamically find all NFS mount points that should be writable and check that they are still writable. However, I can't seem to get my head around connecting the mounts to their share directories.
So for example I have a server's /etc/auto.master like this (I've sanitized some of the data):
/etc/auto.master
/nfs1 /etc/auto.nfs1 --ghost
/nfs2 /etc/auto.nfs2 --ghost
And each of those files has:
/etc/auto.nfs1
home -rw,soft-intr -fstype=nfs server1:/shared/home
store -rw,soft-intr -fstype=nfs server2:/shared/store
/etc/auto.nfs2
data -rw,soft-intr -fstype=nfs oracleserver1:/shared/data
rman -rw,soft-intr -fstype=nfs oracleserver1:/shared/rman
What I'm trying to get out of that is
/nfs1/home
/nfs1/store
/nfs2/data
/nfs2/rman
without getting any erroneous or commented entries caught in the net.
My code attempt is this:
#!/bin/bash
for automst in `grep '^/' /etc/auto.master | awk -F" " '{for(i=1;i<=NF;i++){if ($i ~ /etc/){print $i}}}'`;
do
    echo $automst > /tmp/auto.mst
done
AUTOMST=`cat /tmp/auto.mst`
for mastermount in `grep '^/' /etc/auto.master | awk -F" " '{for(i=1;i<=NF;i++){if ($i ~ /etc/){print $i}}}'`;
do
    grep . $mastermount | grep -v '#' | awk {'print $1'};
done > /tmp/nfsmounteddirs
for dir in `cat /tmp/nfsmounteddirs`;
do
    if [ -w "$dir" ]; then echo "$dir is writeable"; else echo "$dir is not writeable!"; fi
done
I have 600 Linux servers, many with their own individual NFS setups, and we don't have an alerting solution in place that can check this. While writing all those individual scripts would be "a" solution, it would be a nightmare to manage and a lot of work, so the dynamic aspect would be very useful.
awk '/^\// {                                 # Process lines that begin with a /
    fil=$2;                                  # Track the map file name
    nfs=$1                                   # Track the nfs mount point
    while (getline line < fil > 0) {         # Read the file named by fil until the end of the file
        split(line,map,",");                 # Split the line into array map with , as the delimiter
        split(map[1],map1,/[[:space:]]+/);   # Further split map[1] into map1 with spaces as the delimiter
        if(map1[2]~/w/ && line !~ /^#/) {
            print nfs" "map1[1]              # If w is in the options and the line is not a comment, print the nfs and share
        }
    }
    close(fil)                               # Close the file after we have finished reading
}' /etc/auto.master
One liner:
awk '/^\// { fil=$2;nfs=$1;while (getline line < fil > 0) { split(line,map,",");split(map[1],map1,/[[:space:]]+/);if(map1[2]~/w/ && line !~ /^#/) { print nfs" "map1[1] } } close(fil) }' /etc/auto.master
Output:
/nfs1 home
/nfs1 store
/nfs2 data
/nfs2 rman
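To connect this back to the writability check, the awk output can be fed straight into a loop that joins the two fields into a path and tests it with -w, for example (a sketch only, to be run on each server):
#!/bin/bash
# Combine the awk one-liner above with a writability test on each share
awk '/^\// { fil=$2;nfs=$1;while (getline line < fil > 0) { split(line,map,",");split(map[1],map1,/[[:space:]]+/);if(map1[2]~/w/ && line !~ /^#/) { print nfs" "map1[1] } } close(fil) }' /etc/auto.master |
while read -r nfs share
do
    dir="$nfs/$share"
    if [ -w "$dir" ]; then
        echo "$dir is writeable"
    else
        echo "$dir is not writeable!"
    fi
done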

batch download from URL

I want to download thousands of files from a URL. Each line in "FileName.txt" contains the name of a file to download. I am using a Perl script to take each file name from "FileName.txt" and download it after a random delay. I run the script as "./program.pl Filename.txt".
Filename.txt
A
B
C
B
program.pl
#!/usr/bin/perl
$file1=$ARGV[0];
open(FP1, $file1);
while($s1=<FP1>)
{
    chomp ($s1);
    $range = 5;
    $minimum = 3;
    $random_number = int(rand($range)) + $minimum;
    `wget --wait="$random_number" "http://URL=$s1"`;
}
I am getting the output for the first few files but not for the remaining ones. For the remaining files, $ emacs fileD.txt gives
[13] 29699
Could you kindly tell me why I am getting "[13] 29699", and what is the best way to download files after a random time interval? Sorry, the while loop in the program above does not display correctly. Thanks.
You don't show what the URLs in the file look like, but presumably some of them contain &, which puts the process in the background. You should use single quotes for wget's argument or use the list form of system.
Further, wget's --wait parameter is only relevant if you are using wget itself to traverse links from a given URL. In your case, you need your Perl script to sleep between invoking wget for each URL:
#!/usr/bin/env perl

use strict;
use warnings;

use constant WAIT_MINIMUM => 3;
use constant WAIT_RANGE   => 5;

my ($url_list_file) = @ARGV;
defined($url_list_file)
    or die "Need URL list\n";

open my $fh, '<', $url_list_file
    or die "Cannot open '$url_list_file': $!";

while (my $url = <$fh>) {
    $url =~ s/\R\z//;
    my @cmd = (wget => "http://$url");
    print "@cmd\n";
    my $error = system @cmd;
    if ($error) {
        warn "'@cmd' failed: $?";
    }
    sleep WAIT_MINIMUM + rand(WAIT_RANGE);
}
What is URL= supposed to mean? wget takes the URL as a plain parameter. It seems you need:
`wget --wait=$random_number 'http://$s1'`;

How to rename multiple files in terminal (LINUX)?

I have a bunch of files with no pattern at all in their names in a directory. All I know is that they are all JPG files. How do I rename them so that they have some sort of sequence in their names?
I know in Windows all you do is select all the files and rename them all to the same name, and Windows automatically adds sequence numbers to compensate for the duplicate file names.
I want to be able to do that in Linux (Fedora), but it seems you can only do it from the terminal. Please help, I am lost.
What is the command for doing this?
The best way to do this is to run a loop in the terminal, going from picture to picture and renaming each one with a number that increases by one on every iteration.
You can do this with:
n=1
for i in *.jpg; do
    p=$(printf "%04d.jpg" ${n})
    mv "${i}" "${p}"
    let n=n+1
done
Just enter it into the terminal line by line.
If you want to put a custom name in front of the numbers, you can put it before the percent sign in the third line.
If you want to change the number of digits in the names' number, just replace the '4' in the third line (don't change the '0', though).
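For example, to get names like holiday_00001.jpg (five digits, with the illustrative prefix holiday_), the third line would become:
p=$(printf "holiday_%05d.jpg" ${n})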
I will assume that:
There are no spaces or other weird control characters in the file names
All of the files in a given directory are jpeg files
With that in mind, to rename all of the files to 1.jpg, 2.jpg, and so on:
N=1
for a in ./* ; do
    mv $a ${N}.jpg
    N=$(( $N + 1 ))
done
If there are spaces in the file names:
find . -type f | awk 'BEGIN{N=1}
{print "mv \"" $0 "\" " N ".jpg"
N++}' | sh
Should be able to rename them.
The point being, Linux/UNIX does have a lot of tools which can automate a task like this, but they have a bit of a learning curve to them
Create a script containing:
#!/bin/sh
filePrefix="$1"
sequence=1
for file in $(ls -tr *.jpg) ; do
    renamedFile="$filePrefix$sequence.jpg"
    echo $renamedFile
    currentFile="$(echo $file)"
    echo "renaming \"$currentFile\" to $renamedFile"
    mv "$currentFile" "$renamedFile"
    sequence=$(($sequence+1))
done
exit 0
If you named the script, say, RenameSequentially then you could issue the command:
./RenameSequentially Images-
This would rename all *.jpg files in the directory to Images-1.jpg, Images-2.jpg, etc., in order of oldest to newest... tested in the OS X command shell.
I wrote a perl script a long time ago to do pretty much what you want:
#
# reseq.pl renames files to a new named sequence of filenames
#
# Usage: reseq.pl newname [-n seq] [-p pad] fileglob
#
use strict;

my $newname = $ARGV[0];
my $seqstr  = "01";
my $seq     = 1;
my $pad     = 2;
shift @ARGV;

if ($ARGV[0] eq "-n") {
    $seqstr = $ARGV[1];
    $seq = int $seqstr;
    shift @ARGV;
    shift @ARGV;
}
if ($ARGV[0] eq "-p") {
    $pad = $ARGV[1];
    shift @ARGV;
    shift @ARGV;
}

my $filename;
my $suffix;

for (@ARGV) {
    $filename = sprintf("${newname}_%0${pad}d", $seq);
    if (($suffix) = m/.*\.(.*)/) {
        $filename = "$filename.$suffix";
    }
    print "$_ -> $filename\n";
    rename ($_, $filename);
    $seq++;
}
You specify a common prefix for the files, a beginning sequence number and a padding factor.
For example:
# reseq.pl abc -n 1 -p 2 *.jpg
Will rename all matching files to abc_01.jpg, abc_02.jpg, abc_03.jpg...

How to find/cut for only the filename from an output of ls -lrt in Perl

I want the file name from the output of ls -lrt, but I am unable to find a file name. I used the command below, but it doesn't work.
$cmd=' -rw-r--r-- 1 admin u19530 3506 Aug 7 03:34 sla.20120807033424.log';
my $result=`cut -d, -f9 $cmd`;
print "The file name is $result\n";
The result is blank. I need the file name as sla.20120807033424.log
So far, I have tried the below code, and it works for the filename.
Code
#!/usr/bin/perl
my $dir = <dir path>;
opendir (my $DH, $dir) or die "Error opening $dir: $!";
my %files = map { $_ => (stat("$dir/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my #sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
print "the file is $sorted_files[0] \n";
use File::Find::Rule qw( );
use File::stat qw( stat );
use List::Util qw( reduce );

my ($oldest) =
    map $_ ? $_->[0] : undef,                             # 4. Get rid of stat data.
    reduce { $a->[1]->mtime < $b->[1]->mtime ? $a : $b }  # 3. Find the one with the oldest mtime.
    map [ $_, scalar(stat($_)) ],                         # 2. stat each file.
    File::Find::Rule                                      # 1. Find relevant files.
        ->maxdepth(1)                                     #    Don't recurse.
        ->file                                            #    Just plain files.
        ->in('.');                                        #    In whatever dir.
See the documentation for File::Find::Rule, File::stat and List::Util.
You're making it harder for yourself by using -l. This will do what you want
print((`ls -brt`)[0]);
But it is generally better to avoid shelling out unless Perl can't provide what you need, and this can be done easily
print "$_\n" for (sort { -M $a <=> -M $b } glob "*")[0];
If the name of the log file is under your control, i.e., free of spaces or other special characters, perhaps a quick & dirty job will do:
my $cmd=' -rw-r--r-- 1 admin u19530 3506 Aug 7 03:34 sla.20120807033424.log more more';
my @items = split ' ', $cmd;
print "log filename is : @items[8..$#items]";
print "\n";
It's not possible to do it reliably with -lrt - if you were willing to choose other options you could do it.
BTW you can still sort by reverse time with -rt even without the -l.
Also if you must use ls, you should probably use -b.
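For example (my own illustration, assuming the newest entry is the one wanted):
ls -rtb *.log | tail -n 1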
my $cmd = ' -rw-r--r-- 1 admin u19530 3506 Aug 7 03:34 sla.20120807033424.log';
$cmd =~ / ( \S+) $/x or die "can't find filename in string " ;
my $filename = $1 ;
print $filename ;
Disclaimer: this won't work if the filename has spaces, and probably under other circumstances. The OP will know the naming conventions of the files concerned. I agree there are more robust ways that don't use ls -lrt.
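For completeness, a more robust approach that avoids parsing ls entirely (a sketch, assuming bash and that the most recently modified *.log file in the current directory is what is wanted) uses the shell's -nt test:
#!/bin/bash
# Find the most recently modified .log file without parsing ls output
newest=
for f in *.log
do
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
        newest=$f
    fi
done
printf 'The file name is %s\n' "$newest"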
Maybe like this:
ls -lrt *.log | perl -lane 'print $F[-1]'
