Github API to download a zipball/tarball which includes LFS files - github-api

I am retrieving a tarball from Github using the v3 API, i.e. https://api.github.com/repos/my-account/my-project/tarball/my-ref.
However, this project uses Git LFS for some files, and the resulting archive doesn't contain those files, only the LFS pointer:
version https://git-lfs.github.com/spec/v1
oid sha256:fc03a2eadf6ac4872de8ad96a865ed05c45c416c5ad40c9efa3c4bcbe5d0dc9e
size 1284
What can I do in order to get an archive having the LFS links replaced by the real file content?

The Git LFS API documentation shows how and where to make requests.
In your case, assuming you know the OID you're looking for (it's stored in the pointer), you should:
POST https://github.com/your-account/your-repo.git/info/lfs/objects/batch (with the Accept and Content-Type headers set to application/vnd.git-lfs+json) with something like:
{
  "operation": "download",
  "objects": [
    {
      "oid": "fc03a2eadf6ac4872de8ad96a865ed05c45c416c5ad40c9efa3c4bcbe5d0dc9e",
      "size": 1284
    }
  ]
}
You may be able to omit the size part - it's not clearly specified. You can also request several OIDs together in one batch request.
The response will contain a download link for each blob that exists, or an error entry for each one that doesn't (the response as a whole always returns 200 as long as you are authenticated).
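For illustration, here is a minimal sketch of that batch call in Node (18+, ESM so top-level await works); the account/repo names and TOKEN are placeholders, and the endpoint/headers follow the Git LFS batch API spec:

// Ask the LFS batch API for download URLs of the pointed-to objects.
const res = await fetch(
  'https://github.com/your-account/your-repo.git/info/lfs/objects/batch',
  {
    method: 'POST',
    headers: {
      'Accept': 'application/vnd.git-lfs+json',
      'Content-Type': 'application/vnd.git-lfs+json',
      // For private repos: HTTP basic auth with a personal access token.
      'Authorization': 'Basic ' + Buffer.from('your-account:TOKEN').toString('base64'),
    },
    body: JSON.stringify({
      operation: 'download',
      objects: [
        { oid: 'fc03a2eadf6ac4872de8ad96a865ed05c45c416c5ad40c9efa3c4bcbe5d0dc9e', size: 1284 },
      ],
    }),
  }
);
const data = await res.json();
// Each successful entry carries a download link for the blob.
console.log(data.objects[0].actions.download.href);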

Related

How to assign tags to every file and then filter by these tags in GCS?

I'm using Google Cloud Storage to store 100-150 files, mostly pictures (<30MB) and a few videos (<100MB). The number of files should remain around what it is right now, so I'm not that worried about scalability. All of the files are in the same storage bucket.
My goal is to assign at least one tag (string) and a number of upvotes (integer) to every file.
I want to be able to:
Filter all of the files by their tag(s) and only retrieve the ones with said tag.
A file may have several tags, such as "halloween", "christmas" and "easter", and must be retrieved if any of its tags is specified
Then, sort the filtered files by their number of upvotes (descending order).
In other words, the end goal is to get a sorted list of the filtered files' URLs (to show on a website similar to Imgur).
Right now, I've come up with two (potential) ways to do that:
CUSTOM METADATA KEY
When adding a file to the bucket, add a custom key for the tags and another one for the number of upvotes. Then, when I want to retrieve all files with the "halloween" tag, I can:
Loop through every file and get its metadata (doc)
Filter and sort all of the files based on the metadata's keys
Get the URL of each file
Seems like this could cause some performance issues, but would be an easy structure.
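For illustration, a rough sketch of that loop with the Node client (assuming @google-cloud/storage, and assuming tags are stored as a comma-separated custom metadata value; the "tags" and "upvotes" key names are my own):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function filesWithTag(bucketName, tag) {
  const [files] = await storage.bucket(bucketName).getFiles();
  const matches = [];
  for (const file of files) {
    const [meta] = await file.getMetadata();
    const custom = meta.metadata || {}; // the custom key/value pairs
    const tags = (custom.tags || '').split(','); // assumed format: "halloween,christmas"
    if (tags.includes(tag)) {
      matches.push({
        url: `https://storage.googleapis.com/${bucketName}/${file.name}`,
        upvotes: Number(custom.upvotes || 0),
      });
    }
  }
  // sort by upvotes, descending
  return matches.sort((a, b) => b.upvotes - a.upvotes);
}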
INDEX FILE
Create an index file (in JSON). When adding a file to the bucket, also add the file's info to the index...
E.g.:
{
  "files": [
    {
      "filename": "file1",
      "path": "/path/to/file1",
      "tags": ["tag1", "tag2", "tag3"],
      "upvotes": 10
    },
    {
      "filename": "file2",
      "path": "/path/to/file2",
      "tags": ["tag4", "tag5"],
      "upvotes": 5
    },
    {
      "filename": "file3",
      "path": "/path/to/file3",
      "tags": [],
      "upvotes": 0
    }
  ]
}
Then, when I want to retrieve all files with the "halloween" tag, I can:
Parse the json file to get all the files with the "halloween" tag
Sort based on the upvotes key
Get the URL of each file
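For illustration, the filter-and-sort steps are only a few lines of Node once the index JSON above is parsed (index is assumed to hold the object shown earlier):

const halloween = index.files
  .filter((f) => f.tags.includes('halloween'))
  .sort((a, b) => b.upvotes - a.upvotes) // descending by upvotes
  .map((f) => f.path);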
I think this would be better performance-wise? I'm leaning towards this one right now...
But I don't love the idea of having an external index file.
Are both methods possible to implement? Is there a better way to do this? If not, which one would have the better performance? It doesn't need to be blazing fast; it's a school project.
Disclaimer: I'm pretty new to databases (especially GCS).
Thanks!

ExpressJS: How to cache on demand

I'm trying to build a REST API with express, sequelize (PostgreSQL dialect) and node.
Essentially I have two endpoints:
Method | Endpoint     | Desc.
GET    | /api/players | To get players info, including assets
POST   | /api/assets  | To create an asset
And there is a mechanism which updates a property (say price) of assets, over a cycle of 30 seconds.
Goal
I want to cache the results of GET /api/players, but I want some control over it: whenever a user creates an asset (using POST /api/assets), the next request to GET /api/players should return the updated data (including the property that updates every 30 seconds), and that result should stay cached until it gets updated in the next cycle.
Expected
The following should demonstrate it:
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      }
    ]
  }
]
POST /api/assets
JSON Request:
{
  "id": 2
}
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      },
      {
        "id": 2,
        "price": 7.99
      }
    ]
  }
]
What I have managed to do so far
I have made the routes, but GET /api/players has no cache mechanism and basically queries the database every time it is requested.
Some solutions I have found, but none seem to meet my scenario
apicache (https://www.youtube.com/watch?v=ZGymN8aFsv4&t=1360s): but I don't have a fixed duration, because a user can create an asset at any time.
Example implementation
I have seen a (kind of) similar implementation of what I desire in GitHub Actions workflows for caching, where you define a key, and unless the key has changed the same packages are reused instead of being installed every time (example: https://github.com/python-discord/quackstack/blob/6792fd5868f28573bb8f9565977df84e7ba50f42/.github/workflows/quackstack.yml#L39-L52).
Is there any package to do that? Then, while processing POST /api/assets, I could change the key in its handler (and also change it in the 30 seconds cycle), so that GET /api/players returns the updated result, and after that it returns the cached result until the key changes in the next cycle.
Note: If you have a solution, please try to stick with npm packages rather than something like Redis, unless it's the only/best solution.
Thanks in advance!
(P.S. I'm a beginner and this is my first question on SO.)
Typically caching is done with the help of Redis. Redis is an in-memory key-value store. You could handle the cache in the following manner:
In your handler for the POST operation, update/reset the cached entry for players.
In your handler for the GET operation, if Redis has the entry in cache, return it; otherwise run the query, add the entry to the cache, and return the data.
Alternatively, you could use Memcached.
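For illustration, a minimal sketch of that pattern with the node-redis client (v4 API); the Player/Asset sequelize models and the "players" key are assumptions:

const { createClient } = require('redis');
const redis = createClient();
redis.connect(); // kick off the connection at startup

// GET /api/players: serve from cache when present
app.get('/api/players', async (req, res) => {
  const cached = await redis.get('players');
  if (cached) return res.json(JSON.parse(cached));
  const players = await Player.findAll({ include: Asset });
  await redis.set('players', JSON.stringify(players));
  res.json(players);
});

// POST /api/assets: invalidate right after the write
app.post('/api/assets', async (req, res) => {
  const asset = await Asset.create(req.body);
  await redis.del('players'); // the next GET repopulates the cache
  res.json(asset);
});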
A bit late to this answer but I was looking for a similar solution. I found that the apicache library not only allows for caching for specified durations, but the cache can also be manually cleared.
apicache.clear([target]) - clears cache target (key or group), or entire cache if no value passed, returns new index.
Here is an example for your implementation:
const apicache = require('apicache')
const cache = apicache.middleware

// GET /api/players: tag the cached response with a "players" group
app.get('/api/players', cache('30 seconds'), function (req, res) {
  req.apicacheGroup = 'players'
  res.json(players)
})

// POST /api/assets: update assets, then clear the cache
app.post('/api/assets', function (req, res, next) {
  apicache.clear()
  // or only clear the specific players cache by using a parameter:
  // apicache.clear('players')
  res.send(response)
})

repos/{org}/{repo}/git/trees/{sha} to query files in repo returns element with type=commit - how to deal with that?

I have some code to retrieve a list of files from a repository (using REST API v3) and it worked great for many cases, but now I've hit a problem where it didn't work. Looking into this, I found that one of the elements of the response had this:
{
  "mode": "160000",
  "path": "folderA/folderB/folderC",
  "sha": "84419db012d987a1705eea28055b278c17411a93",
  "type": "commit"
}
If I look at that path using the browser, the folder is shown differently (screenshot omitted) - so after some confusion and embarrassment, I looked at things and concluded this must be a submodule (indeed it is mentioned in /.gitmodules).
I wonder how to best deal with that when I want to retrieve everything: currently my plan is to use the path entry to find a matching [submodule "path"] section, then retrieve the tree from that repo using the url from .gitmodules and the sha from the response I quoted. There is a path in .gitmodules, too - I guess that's just redundant? (Would be grateful for any other comments and suggestions. Maybe there is an easier approach to get things?)
Also, I did not find documentation about this - so is a submodule the only reason for having commit in the response, or could there be other cases to consider?
Update: it worked as I described! But I'm really concerned I might be hit by other unexpected items in the reply, so the question is really about documentation regarding the items in a response from querying the trees API.
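For what it's worth, a sketch of that plan in Node; the submodules map (path -> { owner, repo }, parsed from .gitmodules) is hypothetical:

// Walk a repo tree, recursing into submodules (entries with type === "commit").
async function listTree(owner, repo, sha, submodules, prefix = '') {
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/git/trees/${sha}?recursive=1`,
    { headers: { Accept: 'application/vnd.github+json' } }
  );
  const { tree } = await res.json();
  for (const entry of tree) {
    if (entry.type === 'commit') {
      // gitlink entry: entry.sha is the submodule commit pinned by this repo
      const sub = submodules[prefix + entry.path];
      if (sub) {
        await listTree(sub.owner, sub.repo, entry.sha, submodules, prefix + entry.path + '/');
      }
    } else {
      console.log(prefix + entry.path, entry.type);
    }
  }
}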

Rename a file with github api?

I thought that the update file method of the Github API could be used to rename a file (by providing the new path as parameter) but it does not seem to work.
The only way to rename is to delete the file and to create a similar one with the new name?
I thought that the update file method of the Github API could be used to rename a file (by providing the new path as parameter) but it does not seem to work.
There's no way to rename a file with a single request to the API.
The only way to rename is to delete the file and to create a similar one with the new name?
That's one way, but the downside is that you get two commits in the history (one for the delete, and one for the create).
A different way is to use the low-level Git API:
https://developer.github.com/v3/git/
With that, you can modify the tree entry containing the blob to list it under a different name, then create a new commit for that tree, and finally update the branch to point to that new commit. The whole process requires more API requests, but you get a single commit for the rename.
I found the article Renaming files using the GitHub api useful, but it didn't work for me completely: it was duplicating files.
Since deleting files is only possible through changing the tree, I came up with the following replacement for the tree in step 3 of that article:
{
  "base_tree": "{yourBaseTreeSHA}",
  "tree": [
    {
      "path": "archive/TF/service/DEV/service_SVCAAA03v3DEV.tf",
      "mode": "100644",
      "type": "blob",
      "sha": "{yourFileTreeSHA}"
    },
    {
      "path": "TF/service/DEV/service_SVCAAA03v3DEV.tf",
      "mode": "100644",
      "type": "blob",
      "sha": null
    }
  ]
}
and it really does the trick.
So to rename/move the file you need to make 5 calls to the GitHub API, but the result is awesome:
view of the commit on github
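For reference, the five calls sketched with @octokit/rest, run inside an async function (names and paths are placeholders; sha: null deletes the old path, per the GitHub trees API):

const { Octokit } = require('@octokit/rest');
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const owner = 'your-account', repo = 'your-repo';

// 1. Get the branch ref to find the latest commit
const { data: ref } = await octokit.git.getRef({ owner, repo, ref: 'heads/master' });
// 2. Get that commit to find its tree
const { data: commit } = await octokit.git.getCommit({ owner, repo, commit_sha: ref.object.sha });
// 3. Create a tree that adds the new path and deletes the old one
const { data: tree } = await octokit.git.createTree({
  owner, repo,
  base_tree: commit.tree.sha,
  tree: [
    { path: 'archive/TF/service/DEV/service_SVCAAA03v3DEV.tf', mode: '100644', type: 'blob', sha: '<yourFileBlobSHA>' },
    { path: 'TF/service/DEV/service_SVCAAA03v3DEV.tf', mode: '100644', type: 'blob', sha: null },
  ],
});
// 4. Create a commit pointing at the new tree
const { data: newCommit } = await octokit.git.createCommit({
  owner, repo, message: 'Move file to archive', tree: tree.sha, parents: [commit.sha],
});
// 5. Move the branch to the new commit
await octokit.git.updateRef({ owner, repo, ref: 'heads/master', sha: newCommit.sha });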
With the help of the following articles, I figured out how to rename a file with the GitHub API.
node package: github-api
Commit directly to GitHub via API with Octokit
Commit a file with the GitHub API
First, find and store the tree of the latest commit.
# Gem octokit.rb 4.2.0 and GitHub API v3
api = Octokit::Client.new(access_token: "")
ref = 'heads/master'
repo = 'repo/name'
master_ref = api.ref repo, ref
last_commit = api.commit(repo, master_ref[:object][:sha])
last_tree = api.tree(repo, last_commit[:sha], recursive: true)
Use "the harder way" described in the article Commit a file with the GitHub API to create a new tree. Then do the rename just like the Node.js version does, and create a new tree based on the changes below.
changed_tree = last_tree[:tree].map(&:to_hash).reject { |blob| blob[:type] == 'tree' }
changed_tree.each { |blob| blob[:path] = new_name if blob[:path] == old_name }
changed_tree.each { |blob| blob.delete(:url) && blob.delete(:size) }
new_tree = api.create_tree(repo, changed_tree)
Create a new commit then point the HEAD to it.
new_commit = api.create_commit(repo, "Rename #{File.basename(old_name)} to #{File.basename(new_name)}", new_tree[:sha], last_commit[:sha])
api.update_ref(repo, ref, new_commit.sha)
That's all.
For those who end up here looking for more options, there is a better way to rename a file/folder. Please refer to the link below:
Rename a file using GitHub api
It works for folder renames as well: specify the old and new folder paths using the same payload structure as in the sample request in the link above.

Is there any way to retrieve the name for a gist that github displays

When I browse to a gist on gist.github.com it displays a friendly name. E.g. for
https://gist.github.com/stuartleeks/1f4e07546db69b15ade2 it shows stuartleeks/baz
This seems to be determined by the first file that it shows in the list of files for the gist, but I was wondering whether there is any way to retrieve this via the API?
Not directly, but you can use the Gist API to get the JSON information associated with a gist, reusing the id from your url:
GET /gists/:id
In your case: https://api.github.com/gists/1f4e07546db69b15ade2
It includes:
"files": {
"baz": {
"filename": "baz",
and:
"owner": {
"login": "stuartleeks",
That should be enough to infer the name stuartleeks/baz.
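If it helps, a small sketch of that inference in Node (run inside an async function; the assumption that the display name comes from the first listed file is taken from the question above):

const res = await fetch('https://api.github.com/gists/1f4e07546db69b15ade2');
const gist = await res.json();
// Assumed: the friendly name combines the owner login and the first file shown.
const firstFile = Object.keys(gist.files)[0];
console.log(`${gist.owner.login}/${firstFile}`);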
