batch download from URL - linux

I want to download thousands of files from a URL. Each line in "FileName.txt" contains the name of a file to download. I am using a Perl script to take each file name from "FileName.txt" and download it after a random delay. I run the script as "./program.pl Filename.txt".
Filename.txt
A
B
C
B
program.pl
#!/usr/bin/perl
$file1=$ARGV[0];
open(FP1, $file1);
while($s1=<FP1>)
{
    chomp ($s1);
    $range = 5;
    $minimum = 3;
    $random_number = int(rand($range)) + $minimum;
    `wget --wait="$random_number" "http://URL=$s1"`;
}
I am getting the output for the first few files but not for the remaining ones. For a remaining file, $ emacs fileD.txt gives:
[13] 29699
Could you kindly tell me why I am getting "[13] 29699", and what is the best way to download files after a random time interval? Sorry, the code around the while loop did not paste correctly. Thanks.

You don't show what the URLs actually look like, but presumably some of them contain &, which puts the process in the background. You should use single quotes around wget's argument, or use the list form of system.
Further, wget's --wait parameter is only relevant if you are using wget itself to traverse links from a given URL. In your case, you need your Perl script to sleep between invoking wget for each URL:
#!/usr/bin/env perl
use strict;
use warnings;

use constant WAIT_MINIMUM => 3;
use constant WAIT_RANGE   => 5;

my ($url_list_file) = @ARGV;
defined($url_list_file)
    or die "Need URL list\n";

open my $fh, '<', $url_list_file
    or die "Cannot open '$url_list_file': $!";

while (my $url = <$fh>) {
    $url =~ s/\R\z//;
    my @cmd = (wget => "http://$url");
    print "@cmd\n";
    my $error = system @cmd;
    if ($error) {
        warn "'@cmd' failed: $?";
    }
    sleep WAIT_MINIMUM + rand(WAIT_RANGE);
}

What does URL= mean? wget takes the URL as a plain parameter. It seems you need:
`wget --wait=$random_number 'http://$s1'`;

Related

finding a file in directory using perl script

I'm trying to develop a Perl script that looks through all of the user's directories for a particular file name, without the user having to specify the entire path to the file.
For example, let's say the file of interest is data.list, located in /home/path/directory/project/userabc/data.list. At the command line, the user would normally have to specify the path to the file in order to access it, like so:
cd /home/path/directory/project/userabc/data.list
Instead, I want the user to just enter script.pl ABC on the command line; the Perl script should then automatically locate and read data.list, which in my case means counting the number of lines and uploading the result using curl. The rest is done; I just need the part that automatically locates the file.
Even though this is perfectly feasible in Perl, it looks more appropriate for Bash:
#!/bin/bash
filename=$(find ~ -name "$1" )
wc -l "$filename"
curl .......
The main issue would of course be if you have multiple files with the same name, say /home/user/dir1/data1 and /home/user/dir2/data1. You will need a way to handle that, and how you handle it depends on your specific situation.
In Perl that would be much more complicated:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; #$running_under_some_shell
use strict;
# Import the module File::Find, which will do all the real work
use File::Find ();
# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.
# for the convenience of &wanted calls, including -eval statements:
# Here, we "import" specific variables from the File::Find module
# The purpose is to be able to just type '$name' instead of the
# complete '$File::Find::name'.
use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;
# We declare the sub here; the content of the sub will be created later.
sub wanted;
# This is a simple way to get the first argument. There is no
# checking on validity.
our $filename=$ARGV[0];
# Traverse the desired filesystem. /home is the top directory where we
# start our search. The sub wanted will be executed for every file
# we find
File::Find::find({wanted => \&wanted}, '/home');
exit;
sub wanted {
    # Check if the file is our desired filename
    if ( /^$filename\z/ ) {
        # Open the file, read it and count its lines
        my $lines = 0;
        open(my $F, '<', $name) or die "Cannot open $name";
        while (<$F>) { $lines++; }
        print("$name: $lines\n");
        # Your curl command here
    }
}
You will need to look at the argument parsing, for which I simply used $ARGV[0], and I don't know what your curl command looks like; a minimal check is sketched below.
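If you want a bare-bones check on that argument, something like this would do (the usage message is just an example):
my $filename = $ARGV[0];
defined $filename
    or die "Usage: $0 <filename>\n";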
A simpler (though not recommended) way would be to abuse Perl as a sort of shell:
#!/usr/bin/perl
#
my $fn=`find /home -name '$ARGV[0]'`;
chomp $fn;
my $wc=`wc -l '$fn'`;
print "$wc\n";
system ("your curl command");
The following code snippet demonstrates one of many ways to achieve the desired result.
The code takes one parameter, a word to look for inside file(s) named data.list in all subdirectories, and prints the list of matching files to the terminal.
The code uses the subroutine lookup($dir,$filename,$search), which calls itself recursively whenever it comes across a subdirectory.
The search starts from the current working directory (the question did not specify a starting directory).
use strict;
use warnings;
use feature 'say';

my $search = shift || die "Specify what to look for";
my $fname  = 'data.list';

my $found = lookup('.', $fname, $search);

if( @$found ) {
    say for @$found;
} else {
    say 'Not found';
}

exit 0;

sub lookup {
    my $dir    = shift;
    my $fname  = shift;
    my $search = shift;

    # Start with an empty array ref so callers can always dereference the
    # result, even when nothing is found in this directory.
    my $files = [];

    my @items = glob("$dir/*");

    for my $item (@items) {
        if( -f $item && $item =~ /\b$fname\b/ ) {
            my $found;
            open my $fh, '<', $item or die $!;
            while( my $line = <$fh> ) {
                $found = 1 if $line =~ /\b$search\b/;
                if( $found ) {
                    push @{$files}, $item;
                    last;
                }
            }
            close $fh;
        }
        if( -d $item ) {
            my $ret = lookup($item, $fname, $search);
            push @{$files}, $_ for @$ret;
        }
    }

    return $files;
}
Run as script.pl search_word
Output sample
./capacitor/data.list
./examples/data.list
./examples/test/data.list
Reference:
glob,
Perl file test operators

Adding custom header to specific files in a directory

I would like to add a unique one-line header to each FOCUS*.tsv file in a specified directory. After that, I would like to combine all of these files into one file.
First I tried a sed command:
my $cmd9 = `sed -i '1i$SampleID[4]' $tsv_file`; print $cmd9;
It looked like it worked, but after I combined all of the files into one file in the next section of the code, the inserted row was listed four times for each file.
I then tried the following Perl script to accomplish the same, but it deleted the content of the file and only printed the added header.
I'm looking for the simplest way to accomplish this.
Here is what I've tried.
#!perl
use strict;
use warnings;
use Tie::File;

my $home = "/data/";
my $tsv_directory = $home."test_all_runs/".$ARGV[0];
my $tsvfiles      = $home."test_all_runs/".$ARGV[0]."/tsv_files.txt";

my @run_directory = ();
@run_directory = split /\//, $tsv_directory;
print "The run directory is #############".$run_directory[3]."\n";

my $cmd  = `ls $tsv_directory/FOCUS*\.tsv > $tsvfiles`;  #print "$cmd";
my $cmda = "ls $tsv_directory/FOCUS*\.tsv > $tsvfiles";  #print "$cmda";

my @tsvfiles = ();
#this code opens the vcf_files.txt file and passes each line into an array for individual manipulation
open(TXT2, "$tsvfiles");
while (<TXT2>){
    push (@tsvfiles, $_);
}
close(TXT2);

foreach (@tsvfiles){
    chop($_);
}
#this loop works fine
for my $tsv_file (@tsvfiles){
    open my $in,  '>', $tsv_file       or die "Can't write new file: $!";
    open my $out, '>', "$tsv_file.new" or die "Can't write new file: $!";
    $tsv_file =~ m|([^/]+)-oncomine.tsv$| or die "Can't extract Sample ID";
    my $sample_id = $1;
    #print "The sample ID is ############## $sample_id\n";
    my $headerline = $run_directory[3]."/".$sample_id;
    print $out $headerline;
    while( <$in> ) {
        print $out $_;
    }
    close $out;
    close $in;
    unlink($tsv_file);
    rename("$tsv_file.new", $tsv_file);
}
Thank you
Apparently, the wrong '>' when opening the file for reading was the problem and it got solved.
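For reference, the read handle only needs '<'; the '>' used originally truncates the file the moment it is opened:
open my $in, '<', $tsv_file or die "Can't read '$tsv_file': $!";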
However, I'd like to make a few comments on some of the rest of the code.
The list of files is built by running external ls redirected to a file, then reading this file into an array. However, that is exactly the job of glob and all of that is replaced by
my @tsvfiles = glob "$tsv_directory/FOCUS*.tsv";
Then you don't need the chomp either, and the chop that is used would actually hurt, since it removes the last character, not just the newline (or, really, $/).
Use of chop is probably not what you want; if you are removing the line ending ($/), use chomp, as illustrated below.
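A quick illustration of the difference (the file name here is made up):
my $with_newline    = "FOCUS_sample.tsv\n";
my $without_newline = "FOCUS_sample.tsv";
chomp $with_newline;      # "FOCUS_sample.tsv"  - removes the trailing $/ only
chomp $without_newline;   # "FOCUS_sample.tsv"  - nothing to remove, string unchanged
chop  $without_newline;   # "FOCUS_sample.ts"   - chop always removes the last character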
To extract a match and assign it, a common idiom is
my ($sample_id) = $tsv_file =~ m|([^/]+)-oncomine.tsv$|
or die "Can't extract Sample ID: $!";
Note that I also added $!, to actually print the error. Otherwise we just don't know what it was.
The unlink and rename appear to be overwriting one file with another. You can do that by using move from the core module File::Copy
use File::Copy qw(move);
move("$tsv_file.new", $tsv_file)
    or die "Can't move $tsv_file.new to $tsv_file: $!";
which renames the .new file to $tsv_file, overwriting it.
As for how the files need to be combined, a more precise explanation would be needed; a simple concatenation sketch is shown below.
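If a plain concatenation of the (already headered) files in glob order is acceptable, a minimal sketch could look like this; the output name combined.tsv is made up:
open my $combined, '>', "$tsv_directory/combined.tsv"
    or die "Can't write combined file: $!";
for my $tsv_file (glob "$tsv_directory/FOCUS*.tsv") {
    open my $in, '<', $tsv_file or die "Can't read '$tsv_file': $!";
    print {$combined} $_ while <$in>;   # copy every line, header included
    close $in;
}
close $combined;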

How do I interactively access Linux sub-directories through a perl script?

Here is my sample code:
#!/usr/bin/perl
use strict;
print "Enter the name of the input file and its relevant path:\n";
my $file1 = <STDIN>;
chomp $file1;
open (DOC,"$file1") || die "Could not open $file1, $!";
Is there a better way to interactively specify a file name along with its path?
A way by which, similar to a Linux command-line interface, a user can:
Use Tab to auto-complete the path?
View the contents of the current directory by pressing Ctrl+D?
You can use the Perl module Term::Complete to allow the user to auto-complete the path.
Term::Complete requires an array with the words that shall be auto-completed, so you would need to read the contents of the current directory and save them to an array.
Example:
#!/usr/bin/perl
use strict;
use warnings;
use Term::Complete;
use File::Find qw(finddepth);

my @files;
finddepth(sub {
    return if ($_ eq '.' || $_ eq '..');
    push @files, $File::Find::name;
}, '/');

my $input = Complete('File: ', \@files);

Setting Binary Transfer mode

My Perl script below is very basic. It goes and copies a .zip file located on one server and transfers it to another server.
#!/usr/bin/perl -w
use strict;
use warnings;
my $remotehost ="XXXXXX";
my $remotepath = "/USA/Fusion_Keyword_Reports";
my $remoteuser = "XXXXXXX";
my $remotepass = "XXXXXXX";
my $inputfile ="/fs/fs01/crmdata/SYWR/AAM/list8.txt";
my $remotefile1;
#my $DIR="/fs/fs01/crmdata/SYWR/AAM";
open (FILEIN, "<", $inputfile) or die "can't open list8 file";
while (my $line = <FILEIN>) {
    if ($line =~ m/Keywords-Report(.*?)/i && $line !~ m/Keywords-Report-loopback/i) {
        print $line;
        $remotefile1 = $line;
        last;
    }
}
close FILEIN;
print "remotefile $remotefile1\n";
my $DIR1="/fs/fs01/crmdata/SYWR/AAM/$remotefile1";
my $cmd= "ftp -in";
my $ftp_command = "open $remotehost
user $remoteuser $remotepass
cd $remotepath
asc
get $remotefile1
bye
";
open (CMD, "|$cmd");
print CMD $ftp_command;
close (CMD);
exit(0);
When I run the script it does work, but I get a warning, and the file that gets transferred is corrupted as a result.
226 Transfer complete.
WARNING! 40682 bare linefeeds received in ASCII mode.
File may not have transferred correctly.
I did some reading and I think I need to set the transfer mode to binary. However I am really not sure how to do that in my script. Additionally, I am not sure that is the right solution either.
I would really appreciate your thoughts about this error. If setting the transfer mode to Binary will fix this problem can you please show me where I would do that?
my $ftp_command = "open $remotehost
user $remoteuser $remotepass
cd $remotepath
binary
get $remotefile1
bye
";

using perl fetch a .txt file and for every line in that file do something [duplicate]

I'm pretty new to Perl, so please bear with me.
I have a .txt file containing some lines like this:
doc1.20131010.zip
doc2.20131010.zip
doc3.20131010.zip
doc4.20131010.zip
I made this code:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;
use Net::SFTP::Attributes;
use Net::SFTP;
use constant {
    HOST      => "x.x.x.x",
    USER_NAME => "sftptest",
    PASSWORD  => "**********",
    DEBUG     => "0",
};
my $REMOTE_DIR = "IN";
my $LOCAL_DIR = "/home/rec";
my $sftp = Net::SFTP->new (
    HOST,
    timeout  => 240,
    user     => USER_NAME,
    password => PASSWORD,
    autodie  => 1,
);
#
# Fetch Files
#
#my $res = $sftp->ls($REMOTE_DIR,sub { print $_[0]{longname}, "\n" });
#print "$res";
my $ls = $sftp->ls($REMOTE_DIR)
    or die "ls failed: " . $sftp->error;
open my $fh, '>', '/home/rec/listing' or die "unable to create file: $!";
print $fh $_->{filename}, "\n" for @$ls;
close $fh;
open F, "</home/docs/listing";
for my $line (<F>) {
    #print "$line";
    $sftp->get("$line", "$line");
}
Now, when I run the above code, it should fetch the files listed above; instead I get this:
Couldn't stat remote file: No such file or directory at ./r.pl line 40.
You probably need to remove the newline after reading the file names from the filehandle:
for my $line (<F>) {
    chomp($line);
    $sftp->get($line, $line);
}
or, more commonly,
while (my $line = <F>) {
    chomp($line);
    $sftp->get($line, $line);
}
You use use autodie;, yet you have:
open my $fh, '>', '/home/rec/listing' or die "unable to create file: $!";
No need for the or die... since the program will automatically die.
You also have use feature qw(say);, yet you use print instead of say. Part of the point of say is that it appends the newline for you, which makes a stray newline like the one causing your error easy to spot.
You should also check the return result of your $sftp->get($line, $line); call to see whether it was successful or not.
If you did both of these, you would have seen that your $sftp->get($line, $line) was failing because you forgot to chomp the newline at the end of each file name.
Instead, you used:
print $line;
which printed the file name; since the name already carried a newline, the output looked fine. With say, you would have seen the extra blank line and spotted the problem immediately.
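Putting those points together, the fetch loop might look roughly like this. It is a sketch that reuses the $sftp handle, use autodie and use feature 'say' from your script; whether Net::SFTP's get returns a false value on failure can vary by version, so treat the final check as best-effort:
open my $list, '<', '/home/rec/listing';    # autodie reports failure, no "or die" needed
while (my $line = <$list>) {
    chomp $line;                            # strip the trailing newline before using it as a name
    say "fetching $line";                   # say appends the newline for you
    $sftp->get($line, $line)
        or warn "get failed for '$line'";
}
close $list;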
