Currently I am struggling with mirroring a website using Wget.
Browsing the web I came out with the following command to mirror a complete website:
wget --mirror --convert-links --adjust-extension --backup-converted --page-requisites -e robots=off http://www.example.com
As expected, after running the command there is a folder called www.example.com containing all downloaded files. However, some background images are missing. Digging through the files and logs I found that wget seems to have a problem with quoted image URLs.
The website uses the following CSS to include a background image:
<div ... style="background-image: url("/path/to/image") ;..." ... />
Collecting the pages requisites wget parses the URL and tries to download the file,
http://www.example.com/"/path/to/image"
which obviously fails with an error 404:
--2018-01-08 18:04:00-- https://www.example.com/"/path/to/image"
Reusing existing connection to www.example.com:443.
HTTP request sent, awaiting response... 404 Not Found
2018-01-08 18:04:00 ERROR 404: Not Found
Unfortunately I cannot post the original domain for privacy reasons...
I already tried to find a solution on the web, but I did not manage to find the right keywords to search for, so as a last choice I must ask you for help.
Is there a way to tell Wget to ignore quotes inside URLs?
how can we access an Apache server using a Linux command, to retrieve file.
the file which is to be retrieved has been copied to a directory.
Without knowing more about what you are doing, I would suggest looking into wget or curl.
For example: to copy a file available on a web server via url to the current directory using the wget command:
wget http://www.example.com/path/to/file.txt
or, if you are accessing your web server by IP address:
wget http://192.168.1.1/path/to/file.txt
I have been trying to use Curl and wget to download file from Sharepoint. I am planning to make it as Script which runs automatically everyday and download the file from URL.
I tried using CURL with following command
curl -O --user Myusername:Mypassword https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default
But it gave me error about SSL connection. I got to know that there is some existing bug in CURL 7.35 So i downgraded it to 7.22. But still gives me same error.
I also tried using Wget
wget --user=Myusername --password=MyPassword --no-check-certificate https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default
But it still gives me error -- Unable to establish SSL connection
Can someone please let me know how i can accomplish my task
UPDATE
I was able to resolve the error in CURL. Below is the command that i gave
curl -O -L --sslv3 -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13" --user Myusername:Mypassword 'https://OurDomain.sharepoint.com/_%7BB21r-9CA2-345DEF%7D&file=IPS_cleaned.xlsx&action=default'
Now what it downloads is a file, which when i open it shows me Login page of Sharepoint. It does not download the actual excel file.
Any reason?
Another potential solution to this involves taking your sharepoint link and replacing the text after the '?' with download=1:
This:
https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?e=kRlVi
Becomes this:
https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?download=1
Now, you can just:
wget https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?download=1
*Note, this example used a single file and a link where anyone with the link could access the file (no credentials required)
Please use rclone
Download and install the latest one from https://rclone.org/downloads
First option: Use OneDrive to access SharePoint sites/personal folder. This option will help you to upload large files.
1.create rclone configurations using the rclone config command
2.Select New remote and give a name
3.Select cloud storage OneDrive
4.Leave client ID and secret as blank
5.Edit advanced config: n
6.Remote config: Use auto-config: y
7.Open the URL on the browser and give access to rclone
8.Select personal/shared site URL option
8a.Shared site URL option you have to give the site URL. ie; https://sharepoint.com/sites/SiteName
9.Select personal/Documents drive. Documents drive will show if you selected the shared site URL option in the 8th step
Save config and quit
And the configuration file contents will be like the following. If you selected the Personal option drive type will be personal.
[onedrive]
type = onedrive
token =
drive_id =
drive_type = documentLibrary
Second option: In this option, you can upload up to 2 GB-sized files.
1.create rclone configurations using rclone config command
2.Select New remote and give a name
3.Select cloud storage WebDAV
4.Give site URL, username and password
5.Save and quit
And the configuration file contents will be like the following. Password will be in an encrypted format.
vim /root/.config/rclone/rclone.conf
[sharepoint]
type = webdav
url = https://sharepoint.com/sites/SiteName/Documents
vendor = sharepoint
user =
pass =
Download a file from SharePoint.
rclone copy --ignore-times --ignore-size --verbose sharepoint:SourceFolder/file.txt DestFolder
Firefox plugin that captures the link with session ID etc.. and it provides a command you could paste in the console for curl or wget.
If anyone has a better suggestion please let me know.
It gives you a curl or wget command with headers, cookies and all, with a copy to clipboard button, right on the download dialogue.
Download URL: https://addons.mozilla.org/en-US/firefox/addon/cliget
Reference: https://superuser.com/questions/27243/how-to-find-out-the-real-download-url-on-download-sites-that-use-redirects/1239026#1239026
Struggled with the same issue myself, and had my not-so-automatic-but-man-so-convenient way, with a daily log-in.
logged into Sharepoint with a browser,
exported the cookie,
run the following command.
wget --cookies=on --load-cookies cookies.txt --keep-session-cookies --no-check-certificate -m https://yoursharepoint.com
And files were downloaded just fine.
For anyone using CURL to download a file on Sharepoint with an "Anyone with the link" download option. Below are the steps I had to follow to download. Essentially you have to use the cookie from the share link, and then download the file from a different download link they don't provide easily for you.
When sending the CURL command for the “share link” it returns a 302 message, a forward link, and a cookie. If we save that cookie and use it to hit a “download” link I am able to download the file. Essentially, Microsoft uses the initial “share link” to send the cookie to the browser, and then redirect to their “View File” website. On that website you need to use the cookie provided (authentication), and select your next function (On screen view, print, download, etc). When you click the download button you hit a different link. I was able to find this link by going to the "view page" website for the file/link, turning on developer tools, and watching the link the browser follows when hitting download. You can then replicate that link for each file. If we use that download link along with the cookie, we can download the file.
curl -i -c cookies.txt SHARE LINK
curl -o docsdownloaded.pdf -b cookies.txt DOWNLOAD LINK
Share Link Ex: https://tenant.sharepoint.com/:b:/s/Folder/EdNUf4xAVzFJgBoO0MqkfppR5tgobxLrmCnRqU4LFJQ?e=rOGNSD
Download Link Ex:https://tenant.sharepoint.com/sites/Folder/_layouts/15/download.aspx?SourceUrl=%2Fsites%2FFolder%2FShared%20Documents%2FGeneral%2FBig%2Dfile%2Epdf
Similar to the answer Zyglute gave, using cURL:
You can export your login cookie using the cookies.txt Chrome extension: https://chrome.google.com/webstore/detail/njabckikapfpffapmjgojcnbfjonfjfg
Then use the following code:
curl -b cookie.txt https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default
At some point your Sharepoint session will expire (not sure how long that takes), and you will need a new cookie file.
EDIT: If a malicious user gets a hold of your cookie.txt, they could get into your SharePoint account, so be sure to keep it safe.
Use wget adding &download=1 at the end of the link.
wget "<yourlink>&download=1"
it will be download with <yourlink> string as name, then just mv with the correct name after.
I have a problem while sending file from linux to SharePoint. Everything is fine if I am uploading to existing directory, I use this method:
curl --ntlm --user username:password --upload-file myfile.xls https://sharepointserver.com/sites/mysite/myfile.xls
Unfortunately problem arises when I point the target to non existing directory, like:
curl --ntlm --user username:password --upload-file myfile.xls https://sharepointserver.com/sites/mysite/nonexist/myfile.xls
I would like it to create all necessary directorie on the path. I've tried to use "--create-dirs" CURL option, but it doesn't work.
Any ideas how to achieve the goal? It doesn't have to be CURL actually, i can use different method available on linux.
As the name (CLIENT URL) suggests, you will not be able to create new directories on remote SERVERS involving http/https while uploading files.
For downloads involving http/https server, --create-dirs option is applicable only on local machines to create new directories (for instance, when you are downloading a content on to your local linux machine).
However, while using ftp/sftp to a server, you will be able to create new directories on the remote server.
Is it possible to download a file in say /home/... using wget to my local machine? I'm pretty newbish on the bash shell side so perhaps this is just a matter of using the options correctly. What I've gleaned is that something like this should work, but my test aren't downloading the file locally but placeing them within the folder i'm using wget in
root#mysite [/home/username/public_html/themes/themename/images]# wget -O "tester.png"
"http://www.mysite.com/themes/themename/images/previous.png"
--2011-09-08 14:28:49-- http://www.mysite.com/themes/themename/images/previous.png
Resolving www.mysite.com... 173.193.xxx.xxx
Connecting to www.mysite.com|173.193.xxx.xxx|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 352 [image/png]
Saving to: `tester.png'
100%[==============================================================================================>] 352 --.-K/s in 0s
2011-09-08 14:28:49 (84.3 MB/s) - `tester.png' saved [352/352]
Perhaps the above is a bad example but I can't seem how to figure out how to use wget (or some other command) to get something from a non web accessable directory (its a backup file) is wget the correct command for this?
wget uses the http (or ftp) protocol to transfer it's files, so no, you can't use it to transfer anything which is not availible through those services. What you should do is use scp. It uses ssh, and you can use it to get any file (which you have the permission to read, that is).
Say you want /home/myuser/test.file from the computer mycomp, and you want to save it as test.newext. Then you'd invoke it like this:
scp myuser#mycomp:/home/myuser/test.file test.newext
You can do a lot of other nifty stuff with scp so read the manual for more possibilities!
This belongs on superuser, but you want to use scp to copy the file to your local machine.
When a file isn't web accessible, you cant' get it with wget.