Unable to use wget for downloading GitHub search results - Linux

I am trying to use wget to download GitHub code search results into a logfile.
I've been using the following command:
wget -o logfile -r -l 2 https://github.com/search?l=Dockerfile&q=openjdk&type=Code&utf8=%E2%9C%93
I do, however, get a robots.txt file that says the following:
# If you would like to crawl GitHub contact us at support@github.com.
# We also provide an extensive API: https://developer.github.com/
Do I need some sort of permission from GitHub for this?
Can someone help?

I think the message is pretty clear: you're trying to crawl the GitHub site and they don't like that.
They advise you to use the (GraphQL) API instead.
The v3 API is still REST, though, so you could do something like:
wget --output-document search-results.json --user <YOUR_GITHUB_ID> \
"https://api.github.com/search/code?q=openjdk+language:Dockerfile"

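If you need more than the first page of results, the same endpoint accepts per_page and page parameters. A sketch with curl (the token is a placeholder; the code search endpoint requires authentication):
curl -s -u <YOUR_GITHUB_ID>:<YOUR_TOKEN> \
"https://api.github.com/search/code?q=openjdk+language:Dockerfile&per_page=100&page=1" \
--output search-results-page1.json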

Cannot wget from UniProt, what does the sslv3 error mean?

I am fairly new to bioinformatics. I have a script to download sequence data from UniProt, but when I use the wget command I am getting an error message for some of the weblinks:
OpenSSL: error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure
Unable to establish SSL connection.
So, some of the sequence data downloads (3 entries in total) and then I get this error message for each of the others.
I am using a VirtualBox VM running Ubuntu, and so I am using a bash script.
I have tried installing an updated version of wget, but I get a message saying the latest version is already installed. Has anyone got any suggestions?
As requested, the script is as follows:
# Read the list of UniProt IDs (one per line) and fetch each FASTA entry
VAR=$(cat uniprot_id.txt)
URL="https://www.uniprot.org/uniprot/"
for i in ${VAR}
do
    echo "(o) Downloading UniProt entry: ${i}"
    wget "${URL}${i}.fasta"
    echo "(o) Done downloading ${i}"
    # Append the entry to the combined MUSCLE input, then remove the single file
    cat "${i}.fasta" >> muscle_input.fasta
    rm "${i}.fasta"
done
To confirm: the weblinks that I create using this script are all valid, so they open up the sequence data I require when I click them.
I also had trouble with this. Would you be able to share what changes you made to the SSL configuration, please?
In the meantime, because I trust the URL domain, I was able to work around the issue with this line:
wget "${URL}${i}.fasta" --no-check-certificate
I think I need to update the domain certificates somewhere, but I'm still working on that.
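If you would rather not pass --no-check-certificate, two things worth trying (a sketch, not from the original thread): refresh the CA certificates inside the Ubuntu guest, and, if your wget is new enough to support it, pin a modern TLS version:
sudo apt-get install --reinstall ca-certificates
wget --secure-protocol=TLSv1_2 "${URL}${i}.fasta"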

How can I copy an image from a server to local using NodeJS or the command line

I'm using NodeJS and I need to save images hosted on Facebook to my local server. Currently I am executing a command line call using exec to get this working, something like
wget https://example.com/image.jpg -O ic_launcher.jpg
it works, but when I have a complex URL like
wget https://scontent.xx.fbcdn.net/v/t34.0-12/20206198_1998942567005064_567078929_n.jpg?_nc_ad=z-m&oh=e92a33eb810eeb12f199a567cdaf035d&oe=5972D7E3 -O ic_launcher.jpg
it doesn't work because of the & that is in the URL. How can I solve this problem? Thank you!
After I added quotes around the URL it worked. Neat little trick, I didn't know it. Well, now I do.
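For reference, the fix is simply to wrap the URL in quotes so the shell does not treat the & as a command separator:
wget "https://scontent.xx.fbcdn.net/v/t34.0-12/20206198_1998942567005064_567078929_n.jpg?_nc_ad=z-m&oh=e92a33eb810eeb12f199a567cdaf035d&oe=5972D7E3" -O ic_launcher.jpg
The same applies when you build the command string you pass to exec in NodeJS.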

Docker 1.6 and Registry 2.0

Has anyone successfully tried the search command with Docker 1.6 and the new registry 2.0?
I've set mine up behind Nginx with SSL, and so far it is working fine. I can push and pull images without problems. But when I try to search for them, all of the following commands give a 404 response:
curl -k -s -X GET https://username:password@my-docker-registry.com/v1/search
404 page not found
curl -k -s -X GET https://username:password@my-docker-registry.com/v2/search
404 page not found
root@ip-10-232-0-191:~# docker search username:password@my-docker-registry.com/hello-world
FATA[0000] Invalid repository name (admin:admin), only [a-z0-9-_.] are allowed
root@ip-10-232-0-191:~# docker search my-docker-registry.com/hello-world
FATA[0000] Error response from daemon: Unexpected status code 404
I wanted to ask if anyone has any ideas why, and what the correct way is to use the Docker client to search the registry for images.
Looking at the v2.0 API documentation, do they simply not support a search function? It seems a bit strange to omit such functionality.
At least something works :)
root@ip-10-232-0-191:~# curl -k -s -X GET https://username:password@my-docker-registry.com/v2/hello-world/tags/list
{"name":"hello-world","tags":["latest"]}
To date, the search API is missing from registry v2.0.1, and the issue is under discussion here. I believe the search API is intended to land in v2.1.
EDIT: the /v2/_catalog endpoint is available in distribution/registry:master
Before the new registry API:
If you are using REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY, you may list the contents of that directory:
user@host:~# tree $REGISTRY_FS_ROOTDIR/docker/registry/v2/repositories -L 2
***/docker/registry/v2/repositories
└── repository1
└── image1
This may be useful for making a quick web UI you can call to do this, or if you have SSH access to the host storing the repositories:
ssh -T user@host -p <port> tree $REGISTRY_FS_ROOTDIR/docker/registry/ -L 2
Do look at the compose example, which deploys both the v1 and v2 registries behind an Nginx reverse proxy.
The latest version of Docker Registry, available from https://github.com/docker/distribution, supports the Catalog API (/v2/_catalog). This adds the capability to list the repositories in the registry.
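For example, against a registry set up with basic auth behind Nginx as in the question, listing the repositories would look something like this (the ?n=100 page-size parameter is optional):
curl -k -s -u username:password "https://my-docker-registry.com/v2/_catalog?n=100"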
If interested, you can try the Docker image registry CLI I built to make it easy to use the search features in the new Docker Registry v2 distribution: https://github.com/vivekjuneja/docker_registry_cli
If you're on Windows, here's a PowerShell script to query /v2/_catalog with basic HTTP auth:
https://gist.github.com/so0k/b59382ea7fd959cf7040
FYI, to use this you have to docker pull distribution/registry:master instead of docker pull registry:2. The registry:2 image version is currently 2.0.1, which does not come with the catalog endpoint.

Download file/folder from SharePoint using curl/wget automatically

I have been trying to use curl and wget to download a file from SharePoint. I am planning to make it a script that runs automatically every day and downloads the file from the URL.
I tried using curl with the following command:
curl -O --user Myusername:Mypassword https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default
But it gave me an error about the SSL connection. I found out that there is an existing bug in curl 7.35, so I downgraded it to 7.22, but it still gives me the same error.
I also tried using wget:
wget --user=Myusername --password=MyPassword --no-check-certificate https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default
But it still gives me the error: Unable to establish SSL connection.
Can someone please let me know how I can accomplish my task?
UPDATE
I was able to resolve the error in curl. Below is the command that I used:
curl -O -L --sslv3 -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13" --user Myusername:Mypassword 'https://OurDomain.sharepoint.com/_%7BB21r-9CA2-345DEF%7D&file=IPS_cleaned.xlsx&action=default'
Now what it downloads is a file which, when I open it, shows me the SharePoint login page. It does not download the actual Excel file.
Any reason?
Another potential solution to this involves taking your SharePoint link and replacing the text after the '?' with download=1:
This:
https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?e=kRlVi
Becomes this:
https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?download=1
Now, you can just:
wget https://my.sharepoint.com/:u:/g/XXX/XXXX-bunchofRandomText?download=1
*Note: this example used a single file and a link where anyone with the link could access the file (no credentials required).
Please use rclone
Download and install the latest one from https://rclone.org/downloads
First option: Use OneDrive to access SharePoint sites/your personal folder. This option will help you upload large files.
1. Create the rclone configuration using the rclone config command.
2. Select New remote and give it a name.
3. Select the cloud storage type OneDrive.
4. Leave the client ID and secret blank.
5. Edit advanced config: n
6. Remote config: Use auto-config: y
7. Open the URL in the browser and give rclone access.
8. Select the personal/shared site URL option.
8a. For the shared site URL option you have to give the site URL, i.e. https://sharepoint.com/sites/SiteName
9. Select the personal/Documents drive. The Documents drive will show if you selected the shared site URL option in step 8.
Save the config and quit.
And the configuration file contents will be like the following. If you selected the Personal option, the drive type will be personal.
[onedrive]
type = onedrive
token =
drive_id =
drive_type = documentLibrary
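Once the remote is set up, a download from the document library might look like this (the folder and file names here are just placeholders):
rclone copy onedrive:General/report.xlsx ./downloads --verbose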
Second option: With this option, you can upload files up to 2 GB in size.
1. Create the rclone configuration using the rclone config command.
2. Select New remote and give it a name.
3. Select the cloud storage type WebDAV.
4. Give the site URL, username and password.
5. Save and quit.
And the configuration file contents will be like the following. The password will be stored in an encrypted format.
vim /root/.config/rclone/rclone.conf
[sharepoint]
type = webdav
url = https://sharepoint.com/sites/SiteName/Documents
vendor = sharepoint
user =
pass =
Download a file from SharePoint.
rclone copy --ignore-times --ignore-size --verbose sharepoint:SourceFolder/file.txt DestFolder
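If you are unsure of the exact remote path, you can list it first with the same remote name, for example:
rclone lsf sharepoint:SourceFolder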
cliget is a Firefox plugin that captures the link with the session ID etc. and provides a command you can paste into the console for curl or wget. It gives you a curl or wget command with headers, cookies and all, with a copy-to-clipboard button, right in the download dialogue. If anyone has a better suggestion, please let me know.
Download URL: https://addons.mozilla.org/en-US/firefox/addon/cliget
Reference: https://superuser.com/questions/27243/how-to-find-out-the-real-download-url-on-download-sites-that-use-redirects/1239026#1239026
I struggled with the same issue myself, and had my not-so-automatic-but-man-so-convenient way, with a daily log-in:
I logged into SharePoint with a browser,
exported the cookies,
and ran the following command:
wget --cookies=on --load-cookies cookies.txt --keep-session-cookies --no-check-certificate -m https://yoursharepoint.com
And the files were downloaded just fine.
For anyone using curl to download a file from SharePoint with an "Anyone with the link" download option, below are the steps I had to follow. Essentially you have to use the cookie from the share link, and then download the file from a different download link they don't readily expose.
When sending the curl command for the “share link”, it returns a 302 message, a forward link, and a cookie. If we save that cookie and use it to hit a “download” link, we are able to download the file. Essentially, Microsoft uses the initial “share link” to send the cookie to the browser and then redirect to their “View File” website. On that website you need to use the cookie provided (authentication) and select your next action (on-screen view, print, download, etc.). When you click the download button you hit a different link.
I was able to find this link by going to the “view page” website for the file/link, turning on developer tools, and watching the link the browser follows when hitting download. You can then replicate that link for each file. If we use that download link along with the cookie, we can download the file.
curl -i -c cookies.txt SHARE LINK
curl -o docsdownloaded.pdf -b cookies.txt DOWNLOAD LINK
Share Link Ex: https://tenant.sharepoint.com/:b:/s/Folder/EdNUf4xAVzFJgBoO0MqkfppR5tgobxLrmCnRqU4LFJQ?e=rOGNSD
Download Link Ex:https://tenant.sharepoint.com/sites/Folder/_layouts/15/download.aspx?SourceUrl=%2Fsites%2FFolder%2FShared%20Documents%2FGeneral%2FBig%2Dfile%2Epdf
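Putting the two calls together with the example links above (your tenant, site, and SourceUrl will differ):
curl -i -c cookies.txt "https://tenant.sharepoint.com/:b:/s/Folder/EdNUf4xAVzFJgBoO0MqkfppR5tgobxLrmCnRqU4LFJQ?e=rOGNSD"
curl -o Big-file.pdf -b cookies.txt "https://tenant.sharepoint.com/sites/Folder/_layouts/15/download.aspx?SourceUrl=%2Fsites%2FFolder%2FShared%20Documents%2FGeneral%2FBig%2Dfile%2Epdf"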
Similar to the answer Zyglute gave, using cURL:
You can export your login cookie using the cookies.txt Chrome extension: https://chrome.google.com/webstore/detail/njabckikapfpffapmjgojcnbfjonfjfg
Then use the following code:
curl -b cookie.txt "https://OurDomain.sharepoint.com/_XXX&file=IPS_cleaned.xlsx&action=default"
At some point your Sharepoint session will expire (not sure how long that takes), and you will need a new cookie file.
EDIT: If a malicious user gets a hold of your cookie.txt, they could get into your SharePoint account, so be sure to keep it safe.
Use wget, adding &download=1 at the end of the link:
wget "<yourlink>&download=1"
It will be downloaded with the <yourlink> string as its name; just mv it to the correct name afterwards.
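Alternatively, you can skip the mv and name the output directly with -O (the file name here is just an example):
wget "<yourlink>&download=1" -O myfile.xlsx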

File path for a Cron Job

Hi, I want to run a cron job to call a PHP script on my server. I am using cPanel from my web host and these are the options:
Minute:
Hour:
Day:
Month:
Weekday:
Command:
I am really struggling to point the command to my file. I am using this line: /home/abbeysof/public_html/adi/cron/daily.php but I am getting this error:
/bin/sh: /home/abbeysof/public_html/adi/cron/daily.php: Permission denied
I asked my web host for help and this is the response:
If you use cpanel to create it, it will fill in the path for you. Typically /home/username/public_html/etc
Can anyone please offer some advice?
Advice 1: use the wget command. wget runs the PHP script exactly as if it was called from the web, so the PHP environment is exactly the same as when calling the file from the web; this makes it easier to debug your script.
wget -O - http://yourdomain.com/adi/cron/daily.php >/dev/null 2>&1
The cron job has to be created by going into the cPanel cron jobs menu. I don't understand from your host's answer whether you have this clear.
And advice 2: change web hosting; try this one, they don't leave you alone.
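If you manage the crontab directly rather than through cPanel's separate fields, the full entry for advice 1, run daily at 3 a.m. for example, would look something like:
0 3 * * * wget -O - -q http://yourdomain.com/adi/cron/daily.php >/dev/null 2>&1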
Sorry, I don't know anything about cpanel, but it sounds like:
if you created the file daily.php, then you need to change the permissions on it
if they created the file, then there's a bug in their creation routine.
Good luck!
Try this one:
/usr/bin/php -q /home/yourCpanelUsername/public_html/filename.php
For some cPanel setups it might be like this:
/usr/local/bin/php -q /home/yourCpanelUsername/public_html/filename.php
Sounds like you need to make /home/abbeysof/public_html/adi/cron/daily.php executable.
This link might help you:
https://www.inmotionhosting.com/support/edu/cpanel/how-to-run-a-cron-job
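If you do go the route of executing the file directly (rather than passing it to the PHP binary as in the other answer), it needs the execute bit and a shebang; a sketch:
chmod +x /home/abbeysof/public_html/adi/cron/daily.php
and the first line of daily.php should be something like #!/usr/bin/php (or /usr/local/bin/php, matching the paths mentioned above).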
There is a difference in how you give the command depending on whether you are using a VPS or shared hosting.
You may need to use the cPanel-Cron user agent along with your URL:
curl --user-agent cPanel-Cron http://example.com/cron.php
