wget a raw file from Github from a private repo - linux

I am trying to download a raw file from a private GitHub project using wget. When the project is public it is very simple.
For a public repo: this is my repo URL (you don't have to click on it to answer this question):
https://github.com/samirtendulkar/profile_rest_api/blob/master/deploy/server_setup.sh
I click Raw.
After I click Raw, my URL looks like this:
https://raw.githubusercontent.com/samirtendulkar/profile_rest_api/master/deploy/server_setup.sh (notice that only the word "raw" is added to the URL)
which is awesome. I then do
ubuntu@ip-172-31-39-47:~$ wget https://raw.githubusercontent.com/samirtendulkar/profile_rest_api/master/deploy/server_setup.sh
When I do ls, it shows that the file has been downloaded:
ubuntu@ip-172-31-39-47:~$ ls
'server_setup.sh'
For a private repo, the raw file URL comes with a token.
https://github.com/samirtendulkar/my_project/blob/master/deploy/server_setup.sh
So far so good. Now when I click Raw, my URL changes and has a token in it along with the "raw" prefix:
https://raw.githubusercontent.com/samirtendulkar/my_project/master/deploy/server_setup.sh?token=AkSv7SycSHacUNlSEZamo6hpMAI6ZhsLks5b4uFuwA%3D%3D
The URL has this extra parameter: ?token=AkSv7SycSHacUNlSEZamo6hpMAI6ZhsLks5b4uFuwA%3D%3D
My wget does not work. How do I fix this issue? By the way, when I say it does not work, I mean that instead of ls showing
ubuntu@ip-172-31-39-47:~$ ls
'server_setup.sh'
it shows something else, which prevents me from running further commands like
ubuntu@ip-172-31-39-47:~$ chmod +x server_setup.sh
and
ubuntu@ip-172-31-39-47:~$ sudo ./server_setup.sh
which I need to get the project onto AWS.

The token comes from the Personal Access Tokens section, which you can find in GitHub's settings.
You can create one and pick the first option, "repo", to give the token access to your private repos.
The following line solved my problem of not being able to download the file.
Hope this will help:
wget --header 'Authorization: token PERSONAL_ACCESS_TOKEN_HERE' https://raw.githubusercontent.com/repoOwner/repoName/master/folder/filename
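If you prefer not to paste the token directly on the command line, the same header works with the token read from an environment variable (a minimal sketch; GITHUB_TOKEN is just an illustrative variable name, and double quotes are needed so the shell expands it):
export GITHUB_TOKEN=PERSONAL_ACCESS_TOKEN_HERE
wget --header "Authorization: token ${GITHUB_TOKEN}" https://raw.githubusercontent.com/repoOwner/repoName/master/folder/filename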

You can use wget's -O option when you're downloading just one file at a time:
wget -O server_setup.sh https://raw.githubusercontent.com/samirtendulkar/my_project/master/deploy/server_setup.sh?token=AkSv7SycSHacUNlSEZamo6hpMAI6ZhsLks5b4uFuwA%3D%3D
The downside is that you have to know the output file name, but I think that's OK if I understand your question correctly.
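One note worth adding (my own assumption, not something from the question): the token part of the URL contains characters the shell may treat specially, so it is safest to quote the whole URL, for example:
wget -O server_setup.sh "https://raw.githubusercontent.com/samirtendulkar/my_project/master/deploy/server_setup.sh?token=YOUR_TOKEN_HERE"
where YOUR_TOKEN_HERE stands for the token GitHub appends when you click Raw.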

Related

Updating a file for a quick-pull using github cli

Currently in the github UI, a user can edit a file and create a new branch in a single action. This can also be done through the github api using something like this:
curl 'https://github.com/<my_org>/<my_repo>/tree-save/master/<path_to_file>' \
-H 'content-type: application/x-www-form-urlencoded' \
--data-raw 'authenticity_token=<generated_token>&filename=<filename>&new_filename=<filename>&content_changed=true&value=<new_contents_of_file>&message=Updated+file+in+my+repo&placeholder_message=Update+<filename>&description=&commit-choice=quick-pull&target_branch=<new_branch_name>&quick_pull=master&guidance_task=&commit=<target_commit_checksum>&same_repo=1&pr='
What I would like to be able to do is perform the same action using the github cli* (gh). I have tried using the following commands:
gh api <my_org>/<my_repo>/tree-save/master/<path_to_file> -F "filename=<filename>" -F ...
and
gh api repos/<my_org>/<my_repo>/contents/<path_to_file> -F "filename=<filename>" -F ...
For both cases (and many variations on these options), I'm getting a 404** back. Any ideas what I'm doing wrong? Does the github cli even allow the functionality allowed in the above curl?
* For those curious, I want to use the CLI because of how it handles auth and its statelessness. I can't generate a token to use, like in the curl above. And, due to multiple issues, I also can't clone the repo locally.
** I'm able to retrieve the file just fine using the simple GET command (the second command above without the '-F' flags)
After reading documentation, and then verifying by altering credentials, it appears to be a permissions issue. Evidently, for security reasons, if a token is used that does not meet the required permissions, a 404 is returned instead of a 403.
Interesting that I can still use the curl above through the browser. So, now I need to figure out why the gh cli token does not have the same permissions as my user.
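For example, a quick way to see what the CLI's token is allowed to do (a sketch, assuming a reasonably recent gh release) is:
gh auth status
and, if the repo scope is missing, to request it with:
gh auth refresh --scopes repo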

gitlab API to download archive file in git gives bad file but good when called from local machine

I'm trying to retrieve a build file using the gitlab API. This file was created and stored as an artifact from an upstream pipeline. Running
curl -o download --location --header 'PRIVATE-TOKEN:{MY_API_TOKEN}' https://gitlab.foo.com/api/v4/projects/{PROJECT_ID}/jobs/artifacts/{REF_BRANCH}/download?job={JOB_NAME}
on my local machine gives me a proper build file once I run unzip download. However in the runner, the same command returns a much smaller file which I can't unzip. I've checked that the environment variables that are passed in the runner are right.
job in .gitlab-ci.yml
deploy_production_environment:
  stage: deploy_prod
  image:
    name: banst/awscli
  script:
    - apk --no-cache add curl
    - apk add unzip
    - echo $JOB_ID
    - echo $FE_BUILD_TOKEN
    - echo "https://gitlab.foo.com/api/v4/projects/${PROJECT_ID}/jobs/artifacts/${CI_COMMIT_REF_NAME}/download?job=build_prod"
    - aws configure set region us-east-1
    - "curl -o download --location --header 'PRIVATE-TOKEN:${FE_BUILD_TOKEN}' https://gitlab.foo.com/api/v4/projects/${PROJECT_ID}/jobs/artifacts/${CI_COMMIT_REF_NAME}/download?job=build_prod"
    - ls -l
    - unzip download
    - aws s3 cp build s3://$S3_BUCKET_PROD --recursive
gitlab job output vs. output from my local terminal: (screenshots not reproduced here)
Why does the API call from inside the runner consistently result in this much smaller (corrupted?) file while the same call pulls the zip file down correctly on my local machine?
The first check to do when curl brings back a "small" file is to read its content.
Often, the file is not so much corrupted as it includes a text-based error message, which can give a clue as to the actual issue.
Adding -v to the curl command can also help illustrate the issue during the curl process (when executed in the context of the GitLab job).
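For example, a minimal check inside the job, assuming the file was saved as download like in the script above (the file utility may need apk add file on an Alpine-based image):
file download
head -c 300 download
If the body is actually a short HTML or JSON error message rather than a zip archive, it will show up right away.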
Thank you to VonC for the debugging help, recommending the -v flag for the curl command. It turns out that the single quotes around 'PRIVATE-TOKEN:${FE_BUILD_TOKEN}' prevented the variable from being expanded to its correct string value, which was causing a 401 error. Removing the single quotes did the trick.
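In other words, the variable has to sit outside single quotes so the shell can expand it; double quotes (or no quotes at all, as above) work. A sketch of the corrected line inside the script: section, using the same URL and variables as the job above:
- curl -o download --location --header "PRIVATE-TOKEN:${FE_BUILD_TOKEN}" "https://gitlab.foo.com/api/v4/projects/${PROJECT_ID}/jobs/artifacts/${CI_COMMIT_REF_NAME}/download?job=build_prod"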

Shell script wget download from S3 - Forbidden error

I am trying to download a file from Amazon's S3 using a shell script and the wget command. The file in question has public permissions, and I am able to download it using a standard browser. So far this is what I have in the script:
wget --no-check-certificate -P /tmp/soDownloads https://s3-eu-west-1.amazonaws.com/myBucket/myFolder/myFile.so
cp /tmp/soDownloads/myFile.so /home/hadoop/lib/native
The problem is a bit odd to me. While I am able to download the file directly from the terminal (just typing the wget command), an error pops up when I try to execute the shell script that contains the very same command line (the script is run with sh myScript.sh).
--2014-06-26 07:33:57-- https://s3-eu-west-1.amazonaws.com/myBucket/myFolder/myFile.so%0D
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... XX.XXX.XX.XX
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|XX.XXX.XX.XX|:443... connected.
WARNING: cannot verify s3-eu-west-1.amazonaws.com's certificate, issued by ‘/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 403 Forbidden
2014-06-26 07:33:57 ERROR 403: Forbidden.
Now, I am aware this can just be a beginner error on my side, but I am not able to detect any misspelling or error of any kind. I would appreciate any help you can provide to solve this issue.
As a note, I am running the script on an EC2 instance provided by Amazon's Elastic MapReduce framework, in case that has something to do with the issue.
I suspect that the editor you used to write that script has left you a little "gift."
The command line isn't the same. Look closely:
--2014-06-26 07:33:57-- ... myFolder/myFile.so%0D
^^^ what's this about?
That's urlencoding for ASCII CR, decimal 13 hex 0x0D. You have an embedded carriage return character in the script that shouldn't be there, and wget is seeing it as the last character in the URL, and sending it to S3.
Using the less utility to view the file, or an editor like vi, this stray character might show up as ^M... or, if they're all over the file, when you open it with vi, you should see this at the bottom of the screen:
"foo" [dos] 1L, 5C
^^^^^
If you see that, then inside vi...
:set ff=unix[enter]
:x[enter]
...will convert the line endings, and save the file in what should be a usable format, if this is really the problem you're having.
If you're editing files on Windows, you'll want to use an editor that understands how to save files with Unix line endings (newline only), not carriage returns.
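If you would rather check and fix this from the command line instead of inside vi, something along these lines works on most Linux systems (a sketch; dos2unix may need to be installed separately):
cat -v myScript.sh | head     # stray carriage returns show up as ^M at the ends of lines
sed -i 's/\r$//' myScript.sh  # strip the trailing CRs in place
dos2unix myScript.sh          # or use dos2unix, which does the same thing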

How to get subfolders and files using gitlab api

I am using the gitlab API to get the files and folders, and succeeded,
but I can only get the top-level directory names, not their subfolders and files.
So, how can I get the full tree of my repository?
Please let me know.
Thanks in advance,
Mallikarjuna
According to the API, we can use
GET /projects/:id/repository/tree
to list files and directories in a project. But in this way we can only get the files and directories at the top level of the repo, plus the contents of a single top-level directory via the path param.
If you want to get the contents of script/js/components, for example, you can use
GET /projects/:id/repository/tree?path=script/js/components
REST API
You can use the recursive option to get the full tree using /projects/:id/repository/tree?recursive=true
For example: https://your_gitlab_host/api/v4/projects/:id/repository/tree?recursive=true&per_page=100
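For a private project you also have to pass a token, and the results are paginated, so a full call might look like this (a sketch; the host, project id, and token are placeholders):
curl --header "PRIVATE-TOKEN: <your_access_token>" "https://your_gitlab_host/api/v4/projects/<project_id>/repository/tree?recursive=true&per_page=100&page=1"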
GraphQL API
You can also use the recently released GitLab GraphQL API to get the tree in a recursive way:
{
  project(fullPath: "project_name_here") {
    repository {
      tree(ref: "master", recursive: true) {
        blobs {
          nodes {
            name
            type
            flatPath
          }
        }
      }
    }
  }
}
You can go to the following URL: https://$gitlab_url/-/graphql-explorer and paste the above query.
The GraphQL endpoint is a POST to "https://$gitlab_url/api/graphql".
An example using curl & jq :
gitlab_url=<your gitlab host>
access_token=<your access token>
project_name=<your project name>
branch=master
curl -s -H "Authorization: Bearer $access_token" \
-H "Content-Type:application/json" \
-d '{
"query": "{ project(fullPath: \"'$project_name'\") { repository { tree(ref: \"'$branch'\", recursive: true){ blobs{ nodes { name type flatPath }}}}}}"
}' "https://$gitlab_url/api/graphql" | jq '.'
You should URL-encode the full path of the file. For example, let's assume the path to the file in your repository is javascript/header.js.
Then you could use:
curl --head --header "PRIVATE-TOKEN: <your_access_token>" "https://<your_gitlab_host>/api/v4/projects/<project_id>/repository/files/javascript%2Fheader%2Ejs"
Of course, as mentioned in other responses, you have missed the path attribute of the gitlab repositories API which lets you browse the file hierarchy.
In addition, for simplicity, the python gitlab project exposes it through the projects API. Example:
# list the content of the root directory for the default branch
items = project.repository_tree()
# list the content of a subdirectory on a specific branch
items = project.repository_tree(path='docs', ref='branch1')
For getting the whole tree with sub-directories and files, you can set a parameter called "recursive" to true.
By default it's false.
API: {gitlab_url}/api/v4/projects/{project_id}/repository/tree?recursive=true
Thanks!

wget and curl somehow modifying bencode file when downloading

Okay, so I have a bit of a weird problem going on that I'm not entirely sure how to explain... Basically, I am trying to decode a bencoded file (a .torrent file). Now, I have tried 4 or 5 different scripts I have found via Google and S.O. with no luck (I get returns like "this is not a dictionary", or output errors from the same).
Now I am downloading the .torrent file like so
wget http://link_to.torrent file
//and have also tried with curl like so
curl -C - -O http://link_to.torrent
and am concluding that something is happening to the file when I download it in this way.
The reason I think so is that I found a site which will decode a .torrent file you upload online and display the info contained in it. However, when I download a .torrent file not by clicking on the link through a browser but instead by using one of the methods described above, that site does not work on it either.
So, has anyone experienced a similar problem using one of these methods and found a solution, or can anyone explain why this is happening?
I can't find much online about it, nor do I know of a workaround that I can use for my server.
Update:
Okay, as was suggested by @coder543, I compared the file size of the download through the browser vs. wget. They are not the same size: downloading with wget results in a smaller file size, so clearly the problem is with wget & curl, not something else... ideas?
Update 2:
Okay, so I have tried this a few times now and I am narrowing down the problem a little bit; it only seems to occur on torcache and torrage links. Links from other sites seem to work properly, or as expected... so here are some links and my results from the three different methods:
*** different sizes ***
http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
wget -> 7345 , curl -> 7345 , browser download -> 7376
*** same size ***
http://isohunt.com/torrent_details/224634397/south+park?tab=summary
wget -> 7491 , curl -> 7491 , browser download -> 7491
*** different sizes ***
http://torcache.net/torrent/B00BA420568DA54A90456AEE90CAE7A28535FACE.torrent?title=[kickass.to]the.simpsons.s24e12.hdtv.x264.lol.eztv
wget -> 4890 , curl-> 4890 , browser download -> 4985
*** same size ***
http://h33t.com/download.php?id=cc1ad62bbe7b68401fe6ca0fbaa76c4ed022b221&f=Game%20of%20Thrones%20S03E10%20576p%20HDTV%20x264-DGN%20%7B1337x%7D.torrent
wget-> 30632 , curl -> 30632 , browser download -> 30632
*** same size ***
http://dl7.torrentreactor.net/download.php?id=9499345&name=ubuntu-13.04-desktop-i386.iso
wget-> 32324, curl -> 32324, browser download -> 32324
*** different sizes ***
http://torrage.com/torrent/D7497C2215C9448D9EB421A969453537621E0962.torrent
wget -> 7856 , curl -> 7556 , browser download -> 7888
So it seems to work well on some sites, but not on sites which rely on torcache.net and torrage.com to supply files. Now, it would be nice if I could just use other sites not relying directly on the caches; however, I am working with the bitsnoop API (which pulls all its data from torrage.com, so that's not really an option). Anyway, if anyone has any idea how to solve this problem, or steps to take toward finding a solution, it would be greatly appreciated!
Even if anyone can just reproduce the results, it would be appreciated!
... My server is 12.04 LTS on 64-bit architecture and the laptop I tried the actual download comparison on is the same
For the file retrieved using the command line tools I get:
$ file 6760F0232086AFE6880C974645DE8105FF032706.torrent
6760F0232086AFE6880C974645DE8105FF032706.torrent: gzip compressed data, from Unix
And sure enough, decompressing using gunzip will produce the correct output.
Looking into what the server sends gives an interesting clue:
$ wget -S http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
--2013-06-14 00:53:37-- http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
Resolving torrage.com... 192.121.86.94
Connecting to torrage.com|192.121.86.94|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Connection: keep-alive
Content-Encoding: gzip
So the server does report it's sending gzip compressed data, but wget and curl ignore this.
curl has a --compressed switch which will correctly uncompress the data for you. This should be safe to use even for uncompressed files; it just tells the HTTP server that the client supports compression, and curl looks at the received headers to see whether it actually needs to decompress the response or not.
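So, for these links, letting curl handle the decompression (or piping wget's output through gunzip) should give the same bytes as the browser download. A sketch using the torrage URL from the question, assuming the server keeps sending Content-Encoding: gzip as shown above:
curl --compressed -O http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent
wget -O - http://torrage.com/torrent/6760F0232086AFE6880C974645DE8105FF032706.torrent | gunzip > 6760F0232086AFE6880C974645DE8105FF032706.torrent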
