Log-in only once using wget multiple times on same ftp-server - linux

Basically, I am running wget on a file containing multiple URLs:
wget -i list_of_urls
and I notice that for each row in "list_of_urls", wget performs a log-in step to the FTP server that I'm downloading from. It logs in automatically, without me entering any username or password. Each line produces the output
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|.21... connected.
Logging in as anonymous ... Logged in!
followed by the file downloading.
Is there any way to log in only once, for the first row, and then reuse that session to download all the following rows? Since the URLs point to the same FTP server, just different files, logging in for every row feels wasteful.
Edit: changed from "website" to "FTP server" since that was what I actually meant, thanks. Added a sample output of the log-in message.

After some fiddling around, I think switching to the rsync protocol solved the problem. This works in my case because the file host serves the same files over both FTP and rsync. For small file lists I simply use
rsync $(tr '\n' ' ' <list_of_urls) /usrpath/
which was much faster than using wget over FTP. I had to include the $(tr '\n' ' ' <list_of_urls) because the list of URLs is newline-separated, while rsync takes space-separated sources on the command line. The rsync protocol appears to log in only once and then download all the files, which would explain why it went so much faster.
Another problem arises with this method when list_of_urls is very long, which I haven't solved yet.
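One possible workaround, sketched below and not fully tested, is to batch the URLs with xargs so each rsync invocation stays under the kernel's argument-length limit; the batch size of 100 is arbitrary, and GNU xargs is assumed for the -a option:
# Feed the newline-separated URL list to rsync in batches of 100;
# each batch should still log in only once.
xargs -a list_of_urls -n 100 sh -c 'rsync "$@" /usrpath/' _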

Related

How do I curl a URL with an unknown filename at the end?

I'm talking to a server that creates a new zip file daily, e.g. data-1234.zip. Every day the previous zip is removed and a new one is created with an incremented number, e.g. data-1235.zip. The script will be run sporadically throughout the week, but it's on a lab system where the user can't manually update the name to match what's on the server.
The server only has one zip file in that directory, so it's just a matter of matching the naming convention. There is, however, a "data.ini" file in the folder as well, so simply grabbing the first file by name wouldn't necessarily work. I've seen posts similar to this question using regex, but the counter is currently at 10,609 and I'd rather not rely on expansion into potentially thousands of calls, depending on access to modify the script in the coming years. I've been searching for something similar to "data-*.zip" but haven't had any luck.
The question was solved by switching commands and running
lftp https://download.companyname.com/product/data/ -e "mget data-*.zip; bye"
since lftp allows wildcards in the filename, unlike curl.
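If lftp isn't available, a rough curl-only sketch is to scrape the current name from the directory index first; this assumes the server exposes an HTML listing that contains the data-NNNN.zip name:
# Pull the current zip name out of the directory listing, then fetch it.
name=$(curl -s https://download.companyname.com/product/data/ | grep -o 'data-[0-9]*\.zip' | head -n 1)
curl -O "https://download.companyname.com/product/data/$name"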

WGET - how to download embedded PDFs that have a download button, from a text file URL list? Is it possible?

Happy New Years!
I wanted to see if anybody has ever successfully downloaded embedded PDF files from multiple URLs contained in a .txt file for a website.
For instance:
I tried several combinations of wget -i urllist.txt (which downloads all the HTML files perfectly); however, it doesn't also grab each HTML file's embedded PDF, which has a ?xxxxx slug on the end of the .pdf.
The exact example of this obstacle is the following:
I have placed all 2 pages of links from this dataset into url.txt:
https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/
One example URL within this dataset:
https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/70147-9.html
The embedded PDF link is the following:
https://cases.justia.com/washington/court-of-appeals-division-i/2014-70147-9.pdf?ts=1419887549
The PDF files are actually named like "2014-70147-9.pdf?ts=1419887549", i.e. .pdf?ts=xxxxxxxxxx, and the slug is different for each one.
The URL list contains 795 links. Does anyone have a successful method to download every .html in my url.txt while also downloading each accompanying .pdf?ts=xxxxxxxxxx file to go with it?
Thank you!
~ Brandon
Try using the following:
wget --level 1 --recursive --span-hosts --accept-regex 'https://law.justia.com/cases/washington/court-of-appeals-division-i/2014/.*html|https://cases.justia.com/washington/court-of-appeals-division-i/.*.pdf.*' --input-file=urllist.txt
Details about the options --level, --recursive, --span-hosts, --accept-regex, and --input-file can be found in the wget documentation at https://www.gnu.org/software/wget/manual/html_node/index.html.
You will also need to know how regular expressions work. You can start at https://www.grymoire.com/Unix/Regular.html
You are looking for a web-scraper. Be careful to not break any rules if you ever use one.
You could also process the content you have received through wget using some string manipulation in a bash script.
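For example, here is a rough sketch of that string-manipulation approach; it assumes the PDF links appear in the page source as absolute https://cases.justia.com/... URLs inside double quotes:
# Download each page, extract the embedded PDF URL(s), and fetch those too.
while read -r url; do
  wget -q "$url"
  page=$(basename "$url")
  grep -o 'https://cases\.justia\.com/[^"]*\.pdf[^"]*' "$page" | while read -r pdf; do
    wget -q "$pdf"
  done
done < urllist.txt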

Connect to an FTP server from a shell script and get the last 14 files from a folder

How do we connect to an FTP server from a shell script, pull the last 14 (or n) files modified by timestamp, and place them in a folder on the current host? I tried to use mget, but can we specify which files to get based on timestamp, and limit the number of files? Please advise. Thanks in advance.
You can define an ftp macro (macdef) in your ~/.netrc file to automate your login. In your script, fetch a file listing, sort it by date with awk or sort -k to build a list of the N files you want, then simply loop through the list and fire up ftp to fetch them.
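Here is a sketch of that listing-sort-fetch idea, using lftp instead of an ftp macro since lftp can sort the listing itself. The host, credentials, and paths are placeholders, and filenames are assumed to contain no spaces:
N=14
HOST=ftp.example.com
# Date-sorted listing, one name per line; take the newest N.
# (Add -r to cls if your lftp sorts in the opposite order.)
files=$(lftp -u user,pass "$HOST" -e "cd /remote/dir; cls -1 --sort=date; bye" | head -n "$N" | tr '\n' ' ')
# Fetch them all in a single session.
lftp -u user,pass "$HOST" -e "cd /remote/dir; lcd /local/dir; mget $files; bye"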
It might be easier to use curl, and it would probably be more portable to use something like Perl and Net::FTP.

Linux: WGET - scheme missing using -i option

I am trying to download multiple files from Yahoo Finance using wget.
To do that I used a Python script to generate a text file with all the URLs I need.
When downloading a single file (a CSV file) using the following command, everything goes OK:
wget ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv
However, when I add the -i option so that wget reads the URLs from the file instead of the command line, I get the error:
Invalid URL ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv: Scheme missing
The file that contains the URLs is a text file with a single URL on each line. The URLs are exactly like the one in the first example, just with different parameters.
Is there a way to correct this?
Thanks a lot for reading!!
To solve the problem I added the missing scheme (http://) to each link and wrapped it in double quotes; wget guesses a missing scheme only for URLs given on the command line, not for those read from a -i file. For example:
"http://ichart.finance.yahoo.com/table.csv?s=BIOM3.SA&a=00&b=5&c=1900&d=04&e=21&f=2013&g=d&ignore=.csv"

"Silent" Printing in a Web Application

I'm working on a web application that needs to print silently, that is, without user involvement. What's the best way to accomplish this? It doesn't look like it can be done strictly with JavaScript, nor with Flash and/or AIR. The closest I've seen involves a Java applet.
I can understand why it would be a Bad Idea for just any website to be able to do this. This specific instance is for an internal application, and it's perfectly acceptable if the user needs to add the URL to a trusted-site list, install an add-on, etc.
Here's what you need to do to make Firefox print immediately, without showing the print preferences dialog box:
Type about:config in Firefox's location bar and hit Enter.
Right-click anywhere on the page and select New > Boolean.
Enter the preference name print.always_print_silent, click OK, and set its value to true.
I found that somewhere and it helped me.
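If you need to roll this out to several machines, the same preference can be appended to the profile's user.js from a shell; a sketch, with a placeholder profile path:
# Set the silent-print preference for a given Firefox profile.
echo 'user_pref("print.always_print_silent", true);' >> ~/.mozilla/firefox/PROFILE.default/user.js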
As @Axel wrote, Firefox has the print.always_print_silent option.
For Chrome, use the --kiosk-printing option to skip the Print Preview dialog:
Edit the shortcut you use to start Chrome and add "--kiosk-printing" then restart Chrome.
Note: if it doesn't work, it is most likely because you did not completely stop Chrome; logging out and back in will surely do the trick.
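For reference, the modified shortcut ends up launching Chrome something like this (a Linux-style command line; the page URL is a placeholder):
google-chrome --kiosk-printing http://intranet.example.com/print-page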
Here are two code samples you can try:
1:
<script>
function Print() {
    alert("THUD.. another tree bites the dust!")
    if (document.layers) {
        window.print();
    } else if (document.all) {
        WebBrowser1.ExecWB(6, 1);
        // use 6, 1 to prompt the print dialog or 6, 6 to omit it;
        // some websites also indicate that 6, 2 should be used to omit the box
        WebBrowser1.outerHTML = "";
    }
}
</script>
<object ID="WebBrowser1" WIDTH="0" HEIGHT="0"
        CLASSID="CLSID:8856F961-340A-11D0-A96B-00C04FD705A2">
</object>
2:
if (navigator.appName == "Microsoft Internet Explorer") {
    var PrintCommand = '<object ID="PrintCommandObject" WIDTH=0 HEIGHT=0 CLASSID="CLSID:8856F961-340A-11D0-A96B-00C04FD705A2"></object>';
    document.body.insertAdjacentHTML('beforeEnd', PrintCommand);
    PrintCommandObject.ExecWB(6, -1);
    PrintCommandObject.outerHTML = "";
} else {
    window.print();
}
You may need to add the site/page you are testing on to your local intranet zone.
We struggled with a similar problem. We needed to print checks to a check printer, labels to a label printer, and customer invoices to an invoice printer for the retail store embrasse-moi. We have dummy computers, Nooks, iPads, and iPhones with no printing capabilities. The invoice-printing feature was basically a silent print: a PDF is written to the server, and a shell script running locally retrieves and prints it.
We used the following for a perfect solution with minimal libraries:
Use TCPDF in PHP to create the PDF, store it on the server, and put it in a 'print queue' folder. Kudos to TCPDF: a bit difficult to learn, but SICK SICK SICK. Note we are printing 80 labels per page on Avery 5167 stock, with a bar code, with perfect accuracy. We have label, check, and invoice print queues; different folders, basically, for different printers.
Use the included shell script to connect to the server via FTP, download the PDF, delete the PDF off the server, send the PDF to the printer, and again, delete the PDF.
On a local computer attached to the printer, run the script in Terminal; obviously modify the printers and paths for your setup.
Because you always want this running, and because we use a Mac, create an 'app' using Automator: start Automator, put the script in a 'Run Shell Script' action, and save. Then add that app as a login item. See the AppleScript below the shell script if you want to see the 'output' window on the Mac.
BAM - works sick.
Here is the shell script
#!/bin/bash
# Get a remote directory Folder
# List the contents every second
# Copy the files to a local folder
# delete the file from server
# send the file to a printer
# delete the file
# compliments of embrasse-moi.com
clear # clear terminal window
echo "##########################################"
echo "Embrasse-Moi's Remote Print Queue Script"
echo "##########################################"
#Local Print Queue Directory
COPY_TO_DIRECTORY=/volumes/DATA/test/
echo "Local Directory: $COPY_TO_DIRECTORY"
#Printer
PRINTER='Brother_MFC_7820N'
echo "Printer Name: $PRINTER"
#FTP Info
USER="user"
PASS="pass"
HOST="ftp.yourserver.com"
#remote path
COPY_REMOTE_DIRECTORY_FILES=/path
echo "Remote Print Queue Directory: $HOST$COPY_REMOTE_DIRECTORY_FILES"
echo 'Entering Repeating Loop'
while true; do
#make the copy to directory if not exist
echo "Making Directory If it Does Not Exist"
mkdir -p $COPY_TO_DIRECTORY
cd $COPY_TO_DIRECTORY
######################### WGET ATTEMPTS ############################################
#NOTE wget will need to be installed
echo "NOT Using wget to retrieve remote files..."
# wget --tries=45 -o log --ftp-user=$USER --ftp-password=$PASS ftp://ftp.yourserver.com$COPY_REMOTE_DIRECTORY_FILES/*.pdf
######################### FTP ATTEMPTS ############################################
echo "NOT Using ftp to retrieve and delete remote files..."
#This seems to fail at mget, plus not sure how to delete file or loop through files
ftp -n $HOST <<END_SCRIPT
quote USER $USER
quote PASS $PASS
cd $COPY_REMOTE_DIRECTORY_FILES
ls
prompt
mget *
mdel *
END_SCRIPT
echo "Examining Files in $COPY_TO_DIRECTORY"
for f in "$COPY_TO_DIRECTORY"/*.pdf
do
  # take action on each file; $f stores the current file name
  [ -e "$f" ] || continue # skip the unexpanded glob when no PDFs exist
  #print
  echo "Printing File: $f To: $PRINTER"
  lpr -P "$PRINTER" "$f"
  # This will remove the file.....
  echo "Deleting File: $f"
  rm "$f"
done
echo "Script Complete... now repeat until killed..."
sleep 5
done
And here is the Automator script, if you want to see output; keep the app together with the shell script.
Choose the 'Run AppleScript' option:
on run {input, parameters}
    tell application "Finder" to get folder of (path to me) as Unicode text
    set workingDir to POSIX path of result
    tell application "Terminal"
        do script "sh " & "'" & workingDir & "script1.sh" & "'"
    end tell
    return input
end run
I know this is an older thread, but it's still the top Google result for 'silent printing', so I'll add my findings for the benefit of anyone coming across this now.
We had a similar issue with printing labels of various types to various printers for a stock system. It took some trial and error, but we got around it by having the system create a PDF of the labels, with the printer name and page quantities encoded in the PDF. All you then have to do is:
In IE, go to Internet Options >> Security >> Trusted Sites >> Sites
Clear 'Require server verification (https:) for all sites in this zone'
Add "http://[yoururl]"
and the PDF will print out automatically.
When we originally set this up we were using Chrome as the default browser, but in September 2015 Chrome dropped the ability to run NPAPI plugins. This meant that you could no longer select the Adobe PDF plugin as the default PDF handler, and the built-in PDF plugin does not handle silent printing :-(
It does still work in Internet Explorer (IE11 at the time of writing), but I've not tried any other browsers.
HTH
Cheers,
Nige
I wrote a Python TSR that polled the server every so often (it pulled its polling frequency from the server) and printed to a label printer. It was relatively nice.
Once it was written in Python, I ran py2exe on it, then the Inno Setup compiler, then put it on the intranet and had users install it.
It was not great, but it worked. Users would launch it in the morning, and the program would receive a kill switch from the server at night.
I have it working all day long using a simple JSP page and the Java PDF Renderer library (https://pdf-renderer.dev.java.net). This works because Java prints using the OS, not the browser. Supposedly "silent printing" is considered a browser vulnerability/exploit and was patched after IE 6, so good luck getting it to work via JavaScript or ActiveX. Maybe it's possible, but I couldn't get it to work without Java.
I have to be honest, I'm kind of thinking out loud here, but could it not be done with an applet of some sort (be it Java or whatever) that is granted trusted permissions, such as those within the intranet zone?
It may be worth investigating what permissions can be granted to each zone.
After some Googling, I think you definitely have a challenge; so far most of the articles I have seen involve printing to printers connected to the server.
If it's internal, would it be possible to route printing from the server to department/user printers or something?
If it is just an internal application, then you can avoid printing from the browser and send the printout directly from the server to the printer nearest the user.
I'm working on the same issue; here is what I've learned so far.
A.: You need to set up an IPP print server.
There are multiple print server implementations you can try:
A hardware IPP print server, like the D-Link DPR-1020 or similar; some printers have this functionality built in.
A Linux server with CUPS: http://www.howtoforge.com/ipp_based_print_server_cups
An XP Pro server with IIS: http://www.michaelphipps.com/ipp-print-server-windows-xp-solution
B.: You need to make your web app a client of this IPP server, so that it picks up, processes, and sends every user's print request to the print server.
PHP::PRINT::IPP is a PHP lib you may try (it's well tested against CUPS servers).
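If the web app's host has the CUPS client tools installed, even a plain shell call can submit a job to the IPP server; a minimal sketch, with placeholder server, queue, and file names:
# Send a queued PDF to a printer on a remote CUPS/IPP server.
lp -h printserver.example.com:631 -d label_printer /tmp/queue/job.pdf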
You should have a look at PrintNode. They provide a silent remote printing service for web applications. You install a piece of software on the desktop which syncs with their servers. You can then send print jobs using a JSON request and they are printed out instantly.