Download entire website with videos using wget - linux

I have been using wget to download websites, but I have run into a bit of trouble when a site embeds videos from YouTube, Vimeo or other hosts.
I can't seem to get rid of the ads either.
The website that I am trying to get at the moment is :
https://www.ctrlpaint.com
I only need it temporarily because I have to work at a place where there is no internet, so I don't want to go through the hassle of downloading all the videos from Vimeo.
Thanks for the help, let me know if you need more precision or if you want me to try anything.
I'm using gentoo.
The command I used was:
$ wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains website.org \
    --no-parent \
    website_to_download
It left me with the full website, but it still has to connect to the internet to play the videos.

That's because the videos are on a different host, I think.
This should work:
wget -H -r --level=1 -k -p --no-clobber https://www.ctrlpaint.com/
The -H option lets wget span to other hosts. That being said, the video host here is Vimeo, and when I tried it they detected the wget user agent and refused to actually send the video.
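If the refusal really is based only on the User-Agent header, wget can send a different one via --user-agent; a sketch (the string is just an example, and the host may still refuse for other reasons):
wget -H -r --level=1 -k -p --no-clobber \
    --user-agent="Mozilla/5.0 (X11; Linux x86_64)" \
    https://www.ctrlpaint.com/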
As an aside, this kind of thing is generally considered bad form, as the host you are mirroring has to pay for the bandwidth. (And it may in fact refuse to fulfill some requests, responding with a 429 Too Many Requests error.)

The reason the videos are not downloading is that each video is not a single file; it is served as a stream of multiple files, or chunks.
Websites like Vimeo or YouTube will most likely be using DASH or HLS, which are both forms of HTTP video streaming. This requires that you open the video with one of their players. After you make the initial request for the video, the server sends back a manifest file with a list of links to all the movie chunks. From there the player sends the subsequent requests for each chunk.
The server denies you access to the manifest or chunks when you use wget or curl because there are requirements and authentication steps needed to get access to those files. The player takes care of all of that, which is why you have to use one of their players.
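For illustration, the media playlist inside an HLS manifest (a .m3u8 file) looks roughly like this; the chunk names are made up:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
chunk-00001.ts
#EXTINF:6.0,
chunk-00002.ts
#EXT-X-ENDLIST
The player downloads each listed .ts chunk in turn, which is exactly the step a plain wget mirror never gets to.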
You are probably in need of an app that can download the YouTube or Vimeo videos for you. I'm pretty sure you can find some options out there.
Good luck!
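For example, a rough sketch using yt-dlp, one such downloader (the URL is just a placeholder for the Vimeo or YouTube link of a video):
yt-dlp 'https://vimeo.com/VIDEO_ID'
It fetches the manifest, requests the chunks, and stitches them back into a single video file you can keep offline.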

Related

Upload file on Linux (CLI) to Dropbox (via bash/sh)?

I need to save (and overwrite) a file to my Dropbox account via cron (hourly). The file needs to be stored in a predefined location (which is shared with some other users).
I have seen the possibility of creating a Dropbox App, but that creates its own Dropbox folder.
I also looked at Dropbox Saver, but that seems to be for browsers.
I was thinking (hoping) of something super lightweight, along the lines of curl, so I don't need to install libraries. Just a simple sh script would be awesome. I only need to PUT the file (overwrite it), no need to read (GET) it back.
I was going through the Dropbox developer API documentation, but kind of got lost.
Does anybody have a good hint?
First, since you need to access an existing shared folder, register a "Dropbox API" app with "Full Dropbox" access:
https://www.dropbox.com/developers/apps/create
Then, get an access token for your account for your app. The easiest way is to use the "Generate" button on your app's page, where you'll be sent after you create the app. It's also accessible via the App Console.
Then, you can upload to a specified path via curl as shown in this example:
This uploads a file from the local path matrices.txt in the current folder to /Homework/math/Matrices.txt in the Dropbox account, and returns the metadata for the uploaded file:
echo "some content here" > matrices.txt
curl -X POST https://content.dropboxapi.com/2/files/upload \
--header "Authorization: Bearer <ACCESS_TOKEN>" \
--header "Dropbox-API-Arg: {\"path\": \"/Homework/math/Matrices.txt\"}" \
--header "Content-Type: application/octet-stream" \
--data-binary @matrices.txt
<ACCESS_TOKEN> should be replaced with the OAuth 2 access token.
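Since the original question asks for a simple sh script to run hourly from cron, here is a minimal sketch wrapping the same call; the token and both paths are placeholders, and the "overwrite" mode from the v2 files/upload endpoint is used so each run replaces the previous copy:
#!/bin/sh
# upload_to_dropbox.sh - minimal sketch; fill in your own token and paths
ACCESS_TOKEN="<ACCESS_TOKEN>"
LOCAL_FILE="/path/to/local/report.csv"
REMOTE_PATH="/Shared/report.csv"
# the Dropbox-API-Arg header carries the target path and the overwrite mode as JSON
curl -s -X POST https://content.dropboxapi.com/2/files/upload \
    --header "Authorization: Bearer $ACCESS_TOKEN" \
    --header "Dropbox-API-Arg: {\"path\": \"$REMOTE_PATH\", \"mode\": \"overwrite\"}" \
    --header "Content-Type: application/octet-stream" \
    --data-binary @"$LOCAL_FILE"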
@Greg's answer also works, but it seems like a long chore.
I used Dropbox's official command-line interface here: https://github.com/dropbox/dbxcli.
As of the date of posting, it works fine and provides lots of helpful commands for downloading and uploading.
Another solution I just tried is a bash utility called Dropbox-Uploader.
After configuration through the same steps as above (app creation and token generation), you can just do: ./dropbox_uploader.sh upload mylocal_file my_remote_file, which I find pretty convenient.
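To cover the hourly requirement from the question, a crontab entry along these lines would do it (the paths are illustrative):
0 * * * * /home/user/Dropbox-Uploader/dropbox_uploader.sh upload /path/to/local_file /remote_file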

How to get realtime updates for post edits from the Instagram API when subscribed to a tag?

I've subscribed to realtime updates for a specific tag via the Instagram API using this code (as provided in the API docs):
curl -F 'client_id=CLIENT_ID' \
-F 'client_secret=CLIENT_SECRET' \
-F 'object=tag' \
-F 'aspect=media' \
-F 'verify_token=MY_SECURITY_TOKEN' \
-F 'object_id=TAG_NAME' \
-F 'callback_url=MY_CALLBACK_URL' \
https://api.instagram.com/v1/subscriptions/
This is working well, Instagram calls MY_CALLBACK_URL whenever there is a new post with the tag TAG_NAME.
My callback script fetches and stores all the data from Instagram in my local database so I don't have to fetch everything each time somebody visits my site. The problem is that I don't get a notification when a post is edited or deleted, so the data in my local DB will often be outdated.
To solve that I suppose I could ...
... set up a real time subscription for every single post I get (which doesn't sound like a good idea for obvious reasons)
... not keep a local copy of the data and instead fetch everything from Instagram every time somebody visits my site (which would probably push the API limits pretty quick)
What's the best practice here?
I think it's a bit of a grey area regarding storing data. I set up an identical system with the real-time API that stored image URLs in a MySQL database.
Then, client-side, I use the jQuery imagesLoaded library before showing images on the page to determine whether they still exist or not. It's a bit crude, but it works.
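A third option, sketched below, is to periodically re-fetch the posts you have already stored and reconcile them with your local rows. This assumes the same legacy v1 endpoints used above; MEDIA_ID and ACCESS_TOKEN stand in for values you already have:
curl "https://api.instagram.com/v1/media/MEDIA_ID?access_token=ACCESS_TOKEN"
An error response for a stored media ID suggests the post was deleted, so drop the local row; otherwise compare the returned caption and metadata with your copy and update it if they differ.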

wget command in linux for downloading, but I don't want the server to know I downloaded something

I would like to know how to download data from a server without the server knowing about it, or at least without the server keeping any log of my download. Is there any way?
Please help.
On Linux, or on any system, if you want to download something from a server, you send a request to the server and the server sends a response back. Practically every server keeps at least a log of the actions it performs and the responses it generates. Even a minimal server that flushes user history will most likely still keep some record of its clients' actions, although exactly how much depends on the server.
So, to be clear, the server will store some data or log about the download process so that it can keep track of your download and serve you better as a client.
There is nothing wget-specific about this. From linux.about.com's page about wget:
wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
wget is non-interactive, meaning that it can work in the background, while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting wget finish the work. By contrast, most of the Web browsers require constant user's presence, which can be a great hindrance when transferring a lot of data.
wget can follow links in HTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, wget respects the Robot Exclusion Standard (/robots.txt). wget can be instructed to convert the links in downloaded HTML files to the local files for offline viewing.
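To tie that description to a concrete command, a background recursive retrieval could look like this (example.com is a placeholder):
wget -b -o wget.log --recursive --convert-links https://example.com/
# -b detaches wget into the background and -o writes its progress to wget.log,
# so you can log out and let it finish; the server still logs every request it serves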

How to Send Data to an sFTP server? and how to upload/download data to it?

I am completely new to SFTP (SSH File Transfer Protocol) servers and would like to know how to send data to one.
Imagine I have set up an SFTP server; could someone provide me with the pseudocode (as I am not sure what specifics I'm required to give) for sending a .zip file to it using a Linux box on the command line?
Also, could you provide me with the pseudocode that would be needed to extract that same data from that server once it has been uploaded?
Could I please ask that any code supplied be heavily commented (as I really want to understand this!)
Please be gentle with your comments, I am VERY new to all of this. I imagine I will have missed out something key that someone will need to know. If any additional information is required, please let me know and I will of course supply it.
Thanks in advance. I really will appreciate any help/advice!
For a GUI, you need an SFTP client like FileZilla; there are a lot of free ones.
Linux also has an sftp command you can use from the shell.
From your client, you can use curl to upload and/or download files to/from your sftp server.
To upload a file:
curl -T /name/of/local/file/to/upload -u username:password sftp://hostname.com/directory/to/upload/file/to
To download a file:
curl -u username:password sftp://hostname.com/name/of/remote/file/to/download -o /name/of/local/directory/to/download/file/to
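Since the question asks specifically about a .zip file and heavily commented commands, here is a sketch; hostname.com, username, and the paths are all placeholders:
# upload archive.zip from the current directory to the server's /uploads directory
curl -T archive.zip -u username:password sftp://hostname.com/uploads/archive.zip
# later: download the same archive back into the current directory...
curl -u username:password sftp://hostname.com/uploads/archive.zip -o archive.zip
# ...and extract its contents into ./extracted (unzip runs locally; SFTP only moves
# files around, the server does not extract anything for you)
unzip archive.zip -d extracted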

Send message from linux terminal to some web server

I am trying to figure out how I can post a message to an HTTP server from the Linux shell. What I want is for the shell to post the message, and then I can write a small PHP program to reroute the message to its intended recipient based on the contents and the sender. I can't seem to find a command to do this in Linux. I would really like to stick to a built-in utility.
If there is a better framework that you can think of, please let me know.
Thanks
The man page for wget has some examples, e.g.
wget --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://server.com/auth.php
curl and wget can be used for performing HTTP requests from the shell.
You may want to use some sort of authentication and encryption mechanism to avoid abuse of the URL.
If you want to stick with built-in tools, use wget and refer to this SO post about posting data with wget: How to get past the login page with Wget?
You're going to have to send your data in the POST data section and format it in your server-side PHP script.
You can use curl for this purpose. Have a look at the --data* and --form options in the manpage.
This is what curl is good at.
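For instance, reusing the server.com example from above (the field names and the upload.php path are just illustrative):
# URL-encoded POST body
curl --data 'user=foo&password=bar' http://server.com/auth.php
# multipart/form-data POST, e.g. with an attached file
curl --form 'message=hello' --form 'attachment=@report.txt' http://server.com/upload.php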
--post-data did not work for me because the server reported "405 Method Not Allowed".
You can instead use wget as follows to send some data to the HTTP server as query parameters:
wget 'http://server.com/auth?name=foo&password=bar'
