JGit: Is there a thread-safe way to add and update files?

The easy way to add or update files in JGit is like this:
git.add().addFilepattern(file).call()
But that assumes that the file exists in the Git working directory.
If I have a multi-threaded setup (using Scala and Akka), is there a way to work only on a bare repository, writing the data directly to JGit, avoiding having to first write the file in the working directory?
For getting the file, that seems to work with:
git.getRepository().open(objId).getBytes()
Is there something similar for adding or updating files?

"Add" is a high-level abstraction that places a file in the index. In a bare repository, you lack an index, so this is not a 1:1 correspondence between the functionality. Instead, you can create a file in a new commit. To do this, you would use an ObjectInserter to add objects to the repository (one per thread, please). Then you would:
Add the contents of the file to the repository, as a blob, by inserting its bytes (or providing an InputStream).
Create a tree that includes the new file, by using a TreeFormatter.
Create a commit that points to the tree, by using a CommitBuilder.
For example, to create a new commit (with no parents) that contains only your file:
// All classes used here are in org.eclipse.jgit.lib
ObjectInserter repoInserter = repository.newObjectInserter();
ObjectId blobId;
try
{
    // Add a blob to the repository
    blobId = repoInserter.insert(Constants.OBJ_BLOB, "Hello World!\n".getBytes());

    // Create a tree that contains the blob as file "hello.txt"
    TreeFormatter treeFormatter = new TreeFormatter();
    treeFormatter.append("hello.txt", FileMode.REGULAR_FILE, blobId);
    ObjectId treeId = treeFormatter.insertTo(repoInserter);

    // Create a commit that contains this tree
    CommitBuilder commit = new CommitBuilder();
    PersonIdent ident = new PersonIdent("Me", "me@example.com");
    commit.setCommitter(ident);
    commit.setAuthor(ident);
    commit.setMessage("This is a new commit!");
    commit.setTreeId(treeId);

    ObjectId commitId = repoInserter.insert(commit);
    repoInserter.flush();
}
finally
{
    repoInserter.release(); // close() in newer JGit versions
}
Now you can git checkout the commit id returned as commitId.

Related

"index not backed by a repository" error from libgit2 when writing tree from index for 'pull' commit

When I do index.write_tree() (index is an Index resulting from a merge using merge_commits), I get the error "Failed to write tree. the index file is not backed up by an existing repository". I have an existing repository. [It was bare in the original version of this post, but I have changed it to non-bare, and it is still not working.] What am I doing wrong?
More generally, I am trying to implement git pull (a fetch then a merge). In the catch-up case, I understand that after the fetch and the merge, I need to write out a commit. That is what I am trying to do. How do I do that?
Basically, my code is
let mut index = repo.merge_commits(&our_commit, &their_commit, Some(&MergeOptions::new()))?;
if !index.has_conflicts() {
    let new_tree_oid = index.write_tree()?; // error occurs here
    let new_tree = repo.find_tree(new_tree_oid)?;
    // ...
}
This is using the rust git2 crate, which wraps the libgit2 library.
You have an in-memory index; you don't have the repository's index. That distinction is what is preventing you from writing it.
You have a few options for dealing with an in-memory index:
You can convert it to a tree (and then you could create a commit that uses that tree), using git_index_write_tree_to. The write_tree_to function will let you specify the repository that you want to write to.
You can make it the repository's index, using git_repository_set_index.
Though I would question why you're using git_merge_commits instead of just git_merge, which should take care of all of this for you, including dealing with conflicts. If you're truly doing a git pull emulation, then you'll need to cope with conflicts anyway.

node-storage close a storage and save changes before creating a new storage

I am using node-storage in the following code to store a value in a file; however, when I create a new storage object, changes from another storage object are not yet saved. I need a way to save the changes before creating the new storage object.
Below is a program called code.js which I am running in the console like so: node code.js. If you run it, you will see that the first time it is run the key-value pair doesn't yet exist; however, it does exist the second time.
key = "key"
storage = require('node-storage')
const store1 = new storage("file")
const store2 = new storage("file")
store1.put(key,'val')
console.log(store2.get(key))
My motivation for this is that I want to be able to have a function called "set" which takes a key and a value and sets the key-value pair in a dictionary of values that is stored in a file. I want to be able to refer to this dictionary later, with for example a "get" function, and have the changes present.
I am thinking there might be a function called "save" or something similar that applies the changes to the file. Is there such a function or some other solution?
node-storage saves the changes in the dictionary to disk after every call to put or remove. This is not the issue.
Your problem is that the dictionary in store2 has not been updated with the new properties. node-storage only loads the file from disk when the object is first created.
My suggestion would be to only have one instance of storage per file.
However, if this is not possible, then you might want to consider updating store2's cache before you get the property. This can be done using:
store2.store = store2._load();
This may not be the best for performance, as _load loads the entire file from disk synchronously every time it is called, so try to limit its use.
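For example, here is a minimal sketch of the single-instance approach; the db.js module name and the set/get wrappers are illustrative, not part of node-storage:
// db.js - one shared node-storage instance for the whole process
const Storage = require('node-storage');
const store = new Storage('file'); // same backing file as in the question

exports.set = function (key, value) {
  store.put(key, value); // node-storage persists to disk on every put
};

exports.get = function (key) {
  return store.get(key);
};
Because Node caches modules, every file that does require('./db') gets the same store instance, so a set is immediately visible to a later get without reloading the file from disk.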

Rename a file with the GitHub API?

I thought that the update file method of the Github API could be used to rename a file (by providing the new path as parameter) but it does not seem to work.
The only way to rename is to delete the file and to create a similar one with the new name?
I thought that the update file method of the Github API could be used to rename a file (by providing the new path as parameter) but it does not seem to work.
There's no way to rename a file with a single request to the API.
The only way to rename is to delete the file and to create a similar one with the new name?
That's one way, but the downside is that you get two commits in the history (one for the delete, and one for the create).
A different way is to use the low-level Git API:
https://developer.github.com/v3/git/
With that, you can modify the tree entry containing the blob to list it under a different name, then create a new commit for that tree, and finally update the branch to point to that new commit. The whole process requires more API requests, but you get a single commit for the rename.
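As a rough sketch of that flow in Node (not an official client; OWNER, REPO, YOUR_TOKEN, the branch and the two paths are placeholders, and it assumes Node 18+ for the global fetch):
// Sketch: rename oldPath -> newPath on a branch with a single commit, via the Git Data API.
// Error handling and truncated-tree handling are omitted.
const API = 'https://api.github.com/repos/OWNER/REPO';
const HEADERS = {
  Authorization: 'token YOUR_TOKEN',
  Accept: 'application/vnd.github.v3+json',
  'Content-Type': 'application/json',
};

async function renameFile(oldPath, newPath, branch, message) {
  const gh = async (path, options = {}) =>
    (await fetch(API + path, { headers: HEADERS, ...options })).json();

  // 1. Find the commit the branch points to, and that commit's tree.
  const ref = await gh(`/git/refs/heads/${branch}`);
  const headCommit = await gh(`/git/commits/${ref.object.sha}`);
  const tree = await gh(`/git/trees/${headCommit.tree.sha}?recursive=1`);
  const oldEntry = tree.tree.find((entry) => entry.path === oldPath);

  // 2. Create a new tree: the same blob under the new path, and the old path
  //    removed by setting its sha to null (the trick shown in the next answer).
  const newTree = await gh('/git/trees', {
    method: 'POST',
    body: JSON.stringify({
      base_tree: headCommit.tree.sha,
      tree: [
        { path: newPath, mode: oldEntry.mode, type: 'blob', sha: oldEntry.sha },
        { path: oldPath, mode: oldEntry.mode, type: 'blob', sha: null },
      ],
    }),
  });

  // 3. Create a commit for that tree and move the branch to it.
  const commit = await gh('/git/commits', {
    method: 'POST',
    body: JSON.stringify({ message, tree: newTree.sha, parents: [ref.object.sha] }),
  });
  await gh(`/git/refs/heads/${branch}`, {
    method: 'PATCH',
    body: JSON.stringify({ sha: commit.sha }),
  });
}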
I found this article useful: Renaming files using the GitHub api, but it didn't work completely for me: it was duplicating files.
Since deleting files is only possible by changing the tree, I came up with the following replacement for the tree in step 3 of that article:
{
  "base_tree": "{yourbaseTreeSHA}",
  "tree": [
    {
      "path": "archive/TF/service/DEV/service_SVCAAA03v3DEV.tf",
      "mode": "100644",
      "type": "blob",
      "sha": "{yourFileTreeSHA}"
    },
    {
      "path": "TF/service/DEV/service_SVCAAA03v3DEV.tf",
      "mode": "100644",
      "type": "blob",
      "sha": null
    }
  ]
}
and it really does the trick.
So to have the rename/move of the file done you need to make 5 calls to the GitHub API, but the result is awesome:
[view of the commit on GitHub]
With the help of the following articles, I figured out how to rename a file with the GitHub API.
node package: github-api
Commit directly to GitHub via API with Octokit
Commit a file with the GitHub API
First, find and store the tree of the latest commit.
# Gem octokit.rb 4.2.0 and GitHub API v3
api = Octokit::Client.new(access_token: "")
ref = 'heads/master'
repo = 'repo/name'
master_ref = api.ref repo, ref
last_commit = api.commit(repo, master_ref[:object][:sha])
last_tree = api.tree(repo, last_commit[:sha], recursive: true)
Use "The harder way" described in the article Commit a file with the GitHub API to create a new tree. Then do the rename just like the Node.js version does, and create a new tree based on the changes below.
changed_tree = last_tree[:tree].map(&:to_hash).reject { |blob| blob[:type] == 'tree' }
changed_tree.each { |blob| blob[:path] = new_name if blob[:path] == old_name }
changed_tree.each { |blob| blob.delete(:url) && blob.delete(:size) }
new_tree = api.create_tree(repo, changed_tree)
Create a new commit, then point HEAD to it.
new_commit = api.create_commit(repo, "Rename #{File.basename(old_name)} to #{File.basename(new_name)}", new_tree[:sha], last_commit[:sha])
api.update_ref(repo, ref, new_commit.sha)
That's all.
For those who end up here looking for more options, there is a better way to rename a file/folder. Please refer to the link below:
Rename a file using GitHub api
It works for folder renames as well. You would specify the old and new folder paths using the same payload structure as in the sample request in the link above.

Subscribe to new file(s) in directory in Puppet

I know I can sync a directory in Puppet:
file { 'sqls-store':
  path    => '/some/dir/',
  ensure  => directory,
  source  => "puppet:///modules/m1/db-updates",
  recurse => true,
  purge   => true,
}
So when new files are added, they are copied to '/some/dir/'. However, what I need is to perform some action for every new file. If I "subscribe" to such a resource, I don't get an array of the new files.
Currently I created external shell script which finds new files in that dir and executes action for each of them.
Naturally, I would prefer not to depend on external script. Is there a way to do that with Puppet?
Thanks!
The use case for that is applying changes to DB schema that are being made from time to time and should be applied to all clients managed by puppet. In the end it's mysql [args] < update.sql for every such file.
I'm not sure I would recommend having Puppet apply the DB changes for you.
For a small DB it may work, but for a real-world DB you want to be aware of when and how these kinds of changes get applied (ordering of the changes, temporary disk space adjustments, DB downtime, taking a backup before/after, reorgs, ...), and most of the time your application has to be adapted at the same time. You want more orchestration, and Puppet isn't good at orchestration.
Why not use a tool dedicated to this task, like:
Liquibase
Rails DB migrations and Capistrano
...
A poor man's solution would be to use the vcsrepo module and an exec to list the files modified since the last "apply".
I agree with mestachs: having Puppet deal with DB updates is not a great idea.
You can try some kind of define:
define mydangerousdbupdate($filename) {
  # $name is the resource title, provided automatically by Puppet
  file { "/some/dir/${filename}":
    ensure => present,
    source => "puppet:///modules/m1/db-updates/${filename}",
  }

  exec { "apply ${name}":
    command  => "/usr/bin/mysql [args] < /some/dir/${filename} > /some/dir/${filename}.log",
    creates  => "/some/dir/${filename}.log",
    provider => shell,  # the redirects need a shell
    require  => File["/some/dir/${filename}"],
  }
}
And then you can instantiate it with the different patches, in the preferred order:
mydangerousdbupdate { "first_change":
  filename => "first.sql",
} ->
mydangerousdbupdate { "second_change":
  filename => "second.sql",
}

MongoDB GridFS driver in NodeJS overwrites files with the same name

I have the following code (error checking removed to keep it concise) which uses the node-mongodb-native driver.
var fs = require('fs');
var mongo = require('mongodb').MongoClient;
var grid = require('mongodb').GridStore;
var url = 'mongodb://localhost:27017/mydatabase';
mongo.connect(url, function(err, db) {
  var gs = new grid(db, 'myfile.txt', 'w', {
    "metadata": {
      // metadata here
    }
  });
  gs.open(function(err, store) {
    gs.writeFile('~/myfile.txt', function(err, doc) {
      // 'req' comes from the surrounding request handler (omitted here)
      fs.unlink(req.files.save.path, function (err) {
        // error checking etc
      });
    });
  });
});
If I run that once it works fine and stores the file in GridFS.
Now, if I delete that file on my system and create a new one with the same name, but different contents, and run it through that code again, it uploads it. However, it seems to overwrite the file that is already stored in GridFS. _id stays the same, but md5 has been updated to the new value. So even though the file is different, because the name is the same it overwrites the current file in GridFS.
Is there a way to upload two files with the same name? If _id is unique, why does the driver overwrite the file based on file name alone?
I found a similar issue on GitHub, but I am using the latest version of the driver from npm and it does what I explained above.
Like a real filesystem, the filename becomes the logical key in GridFS for reading and writing. You cannot have two files with the same name.
You'll need to come up with a secondary index of some sort or a new generated file name.
For example, add a timestamp to the file name.
Or, create another collection that maps the generated file names in GridFS to whatever identifier it is that you need.
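For example, a minimal sketch of the timestamp approach, reusing the GridStore API from the question (the naming scheme and metadata field are just illustrations):
var mongo = require('mongodb').MongoClient;
var grid = require('mongodb').GridStore;

mongo.connect('mongodb://localhost:27017/mydatabase', function(err, db) {
  // Make the stored name unique so a second upload never collides with the first
  var storedName = 'myfile-' + Date.now() + '.txt';
  var gs = new grid(db, storedName, 'w', { metadata: { originalName: 'myfile.txt' } });
  gs.open(function(err, store) {
    gs.writeFile('~/myfile.txt', function(err, doc) {
      // Keep storedName (and the original name) in your own collection for later lookups
      db.close();
    });
  });
});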
To avoid having to create additional unique identifiers for your files you should omit the 'write mode' option. This will allow gridfs to create a new file even if it contains the exact same data.
'w' overwrites data, which is why you are overwriting the existing file.
http://mongodb.github.io/node-mongodb-native/api-generated/gridstore.html
