Identifying actual branch names of git commits - python-3.x

This is a simple git repository. I have labeled the commits with numbers for easy reference. The repo has the following branches:
master: 13 commits (1,2,3,4,5,6,7,8,9,10,11,12,13)
new_branch: 8 commits (1,2,3,4,5,6,14,15)
test-branch: 3 commits (1,2,3)
yet_another_branch: 14 commits (1,2,3,4,5,6,7,8,9,10,11,12,16,17)
Commits 5 and 6 belong to a pull request, so the blue portion containing 5 and 6 is not a branch.
Please note that commits 1 and 2 are considered part of all branches, but I want to consider all black commits as part of master. Similarly, for 'test-branch', I want to consider only commit 3 as part of the branch.
from git import Repo
git_url = "https://github.com/unimamun/test-repo.git"
repo_dir = "/mnt/hdd/aam/J2_Repos/test-repo/test-repo"
repo = Repo.clone_from(git_url, repo_dir)
# get all commits by branches
def get_commits(repo, ref_name):
    commits = []
    for commit in repo.iter_commits(rev=ref_name):
        commits.append(commit)
    return commits
print('\nCommits in Branches:')
for ref in repo.references:
    print(ref.name, ': ', str(len(get_commits(repo, ref.name))))
print('\nCommits in master:')
commits = list(repo.iter_commits('master'))
commits.reverse()  # oldest first
i = 0
for commit in commits:
    i += 1
    print(i, ': ', commit.hexsha)
    # to see parents of the commit
    #print('Parents: ', commit.parents)
From the above code, I have the following output:
Commits in Branches:
master : 13
origin/HEAD : 13
origin/master : 13
origin/new_branch : 8
origin/test-branch : 3
origin/yet_another_branch : 14
Commits in master:
1 : 694df9fee2f9c03a33979725e76a484bce1738a0
2 : c0fe1b76131b7fcb103f171fd93d85cda17b756c
3 : 0199ad335f65d52a2895a678a19e209e1e16a1a7
4 : dd0903259b0aadbf2d8fb00e566eee014264f7c0
5 : 7ed55c51e2527f47bc6344cd960ff5beb90cc65d
6 : d10f19c85fbc1c27b7719a2dc64989255697181d
7 : c41bdfaeae1f801776420ce161ca2555dffc5aad
8 : 56b5d6e1831a477c79e0fd336acc96ca266d5dea
9 : 6305a72d4e257ebe74b10ca538906f1eceb091bf
10 : 4c5d1ebe5f2f8168ee8bf4a969855821d04caf09
11 : 362bc52be00af3fb917196cf27a8ddc0bb8fd4ba
12 : 5a70a46394eb08b4b48f9eb05798048ca7269a9d
13 : f4a8bdd318b2678191d06616a55df26416a28363
I want the following output, so that 'master' is printed for every black dot in the figure and the other branch's name is printed for non-black commits (in this case, test-branch should be printed for the green commit 3):
Commits in master:
1 : 694df9fee2f9c03a33979725e76a484bce1738a0 master
2 : c0fe1b76131b7fcb103f171fd93d85cda17b756c master
3 : 0199ad335f65d52a2895a678a19e209e1e16a1a7 test-branch
4 : dd0903259b0aadbf2d8fb00e566eee014264f7c0 master
5 : 7ed55c51e2527f47bc6344cd960ff5beb90cc65d master
6 : d10f19c85fbc1c27b7719a2dc64989255697181d master
7 : c41bdfaeae1f801776420ce161ca2555dffc5aad master
8 : 56b5d6e1831a477c79e0fd336acc96ca266d5dea master
9 : 6305a72d4e257ebe74b10ca538906f1eceb091bf master
10 : 4c5d1ebe5f2f8168ee8bf4a969855821d04caf09 master
11 : 362bc52be00af3fb917196cf27a8ddc0bb8fd4ba master
12 : 5a70a46394eb08b4b48f9eb05798048ca7269a9d master
13 : f4a8bdd318b2678191d06616a55df26416a28363 master
I need to iterate from commit 1 to 13 and, along the way, determine which branch each commit belongs to. Thanks a lot.

As you note:
commit 1,2 are considered as part of all branches
That is, the set of reachable commits from any given branch, as determined by starting at the branch tip commit and working backwards through the Directed Acyclic Graph of commits, always includes commits 1 and 2.
but I want to consider all black colored commits as part of the master [branch]
In that case, start by finding the graph of all commits. As you probably know, a graph is defined as G = (V, E) where V is the set of all vertices and E is the set of all edges. Git stores the vertex and edge data together, in a commit: the commit's identity is its hash ID and its edges—outgoing arcs, really, since this is a directed graph—are its parent commit hash IDs.
Next, use the name you wish to designate as the "most important" branch (i.e., master) to find the hash ID of its tip commit. Assign this commit to the master set. Walk the reachable portion of the graph, starting from this commit, adding each commit to the set of commits in master.
Now, for each remaining branch—in some order, and this order will determine your results in many cases, so you may wish to use a topological sort—start at the tip of the branch and walk the reachable portion of the graph:
For any commit that is already assigned to some branch, ignore it—and you can immediately stop walking the graph at this point since all its predecessors will, by definition, be assigned to some branch.
The set of commits you reached during this walk is the set of commits you wish to claim "belong to" this branch.
There are multiple ways to implement this, including walking a subgraph determined by set-subtraction: simply subtract each branch's subgraph from the original G.
If it's more convenient—it may well be, since you won't have to find G—you can do this in the other direction: start with master and find reachable commits that are not in some set that's initially empty. Add each commit to the set, while listing them as "in master". Then iterate through the remaining branches: if a commit is in the set-so-far it has already been claimed, else it gets claimed by this branch. The problem with working this way is that you might pick some branch (feature-X) that contains all commits that are contained by some other branch (develop) before picking the smaller branch (develop): you cannot do a topological sort without the full graph.
Once you have done this for all branch tips, you have now assigned each reachable-from-a-branch-tip commit to a single branch (instead of, as Git does, assigning it to every branch from which it is reachable).
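The procedure above can be sketched in plain Python, which keeps it runnable without cloning anything. This is a minimal sketch under assumptions: the commit graph has already been loaded into a dict from commit id to its list of parent ids, the branch tips into a dict from branch name to tip id, and the names reachable and assign_branches are hypothetical. In place of a full topological sort it orders the non-master branches by how many commits each tip can reach; a branch contained in another (develop inside feature-X) always reaches strictly fewer commits, so it is claimed first. Note that with master walked first, any commit reachable from master, including one merged in from a side branch, is claimed by master.

```python
from collections import deque


def reachable(parents, tip):
    """Return the set of commits reachable from `tip` via parent links."""
    seen = set()
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return seen


def assign_branches(parents, tips, main="master"):
    """Give every commit reachable from a branch tip exactly one owning branch.

    parents: dict mapping commit id -> list of parent ids (the edges of G)
    tips:    dict mapping branch name -> tip commit id
    """
    owner = {}
    # main goes first; the rest are ordered by how many commits their tip can
    # reach, so a branch contained in another one is claimed before it.
    others = sorted((b for b in tips if b != main),
                    key=lambda b: (len(reachable(parents, tips[b])), b))
    for branch in [main] + others:
        stack = [tips[branch]]
        while stack:
            commit = stack.pop()
            if commit in owner:
                # Already claimed; its ancestors have been or are being
                # handled, so do not walk past it.
                continue
            owner[commit] = branch
            stack.extend(parents.get(commit, ()))
    return owner


if __name__ == "__main__":
    # Hypothetical toy graph: develop branches off c2, feature-X builds on develop.
    parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "d1": ["c2"], "f1": ["d1"]}
    tips = {"master": "c3", "feature-X": "f1", "develop": "d1"}
    for commit, branch in sorted(assign_branches(parents, tips).items()):
        print(commit, branch)
```

To feed it from GitPython, one option is to reuse the question's loop over repo.references: for every commit c yielded by repo.iter_commits(ref.name), store [p.hexsha for p in c.parents] under c.hexsha, and record ref.commit.hexsha as that branch's tip (origin/HEAD is only an alias and can be skipped).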
Note that there may exist commits in the Git graph that are not reachable from any branch tip (e.g., are reachable from a tag but not from a branch). If you dig into the internals of Git, you can find commits that are reachable only from reflog entries, or even some that are completely unreachable, discoverable only by iterating through the entire object key-value database. The latter is essentially what git gc does: walk the database to find all objects, then do a mark-and-sweep garbage collection operation, much like Lisp would do, retaining reachable objects and discarding the unreachable ones.
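The mark-and-sweep idea from the paragraph above can be illustrated with a hypothetical helper over the same kind of parent map: mark everything reachable from the branch tips, then report whatever was never marked (for example, a commit reachable only from a tag). How all_commits is collected is assumed to happen elsewhere; as noted above, real Git would have to enumerate its object database for that.

```python
def unreachable_commits(all_commits, parents, branch_tips):
    """Return commits that no branch tip can reach (a mark phase, then a sweep)."""
    marked = set()
    stack = list(branch_tips)
    while stack:                       # mark: walk parent links from every tip
        commit = stack.pop()
        if commit in marked:
            continue
        marked.add(commit)
        stack.extend(parents.get(commit, ()))
    return set(all_commits) - marked   # sweep: whatever was never marked
```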

You probably want to try the "--first-parent" option:
git log --oneline --first-parent master
Mathematically speaking, this is a graph, so at a merge point neither branch is supposed to be "more important" than the other. In practice, though, the question always arises: when performing a "merge" operation, one actually "brings" an external branch into the current one, and that current branch is therefore recorded as the first parent inside the commit object.
If you try this on the master branch of a large project such as the Linux kernel, you'll mostly land on merge points, with only a few changesets committed directly on the branch.
And if those direct changesets are precisely what you want to know about, you can additionally specify "--no-merges" to explicitly exclude merge points.
git log --oneline --first-parent --no-merges master
This, for instance, would exclude points 4 and 7 from your graph.
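What --first-parent and --no-merges do to the traversal can be modelled in a few lines of Python over a parent map in which each commit's parents are listed in order, first parent first. The function and the toy graph are hypothetical illustrations, not Git's implementation. With GitPython, the real traversal can likely be requested with repo.iter_commits('master', first_parent=True, no_merges=True), since keyword arguments are forwarded to git rev-list as options; treat that as an assumption to check against your GitPython version.

```python
def first_parent_log(parents, tip, no_merges=False):
    """Yield commit ids along the first-parent chain from `tip`, newest first.

    With no_merges=True, commits with more than one parent are left out of the
    output, mimicking --no-merges (the walk still passes through them).
    """
    commit = tip
    while commit is not None:
        commit_parents = parents.get(commit, [])
        if not (no_merges and len(commit_parents) > 1):
            yield commit
        # Follow only the first parent, i.e. the branch that was merged into.
        commit = commit_parents[0] if commit_parents else None


if __name__ == "__main__":
    # Hypothetical graph: "s" was made on a side branch and merged by "m".
    parents = {"a": [], "b": ["a"], "s": ["b"], "m": ["b", "s"], "c": ["m"]}
    print(list(first_parent_log(parents, "c")))                  # ['c', 'm', 'b', 'a']
    print(list(first_parent_log(parents, "c", no_merges=True)))  # ['c', 'b', 'a']
```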
Finally, to restrict a search to commits that belong only to a specific branch and are not inherited from master, use the ".." operator:
git log master..yourbranch
… would only show commits that are reachable from "yourbranch" but NOT from "master".
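In set terms, master..yourbranch is the set of commits reachable from yourbranch minus the set reachable from master. A minimal sketch over a hypothetical parent map; dot_dot is an illustrative name, not a Git or GitPython API.

```python
def reachable(parents, tip):
    """All commits reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def dot_dot(parents, base_tip, branch_tip):
    """Model of `base..branch`: reachable from branch_tip but not from base_tip."""
    return reachable(parents, branch_tip) - reachable(parents, base_tip)
```

For example, with master at "c" and a branch "g" forked from "b", dot_dot returns only the two branch-only commits, "f" and "g".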

Related

How exactly do I set the protected branches regex format to limit what can be pushed to the repo?

I'm trying to figure out how to use regex patterns in GitLab in order to:
Prevent any branch that does not follow the naming convention from being pushed up
Specify, for example, that only these name formats can be used and that at least one of them must be used: ^(bug)?(release)?(feature)?/.*\n
Has anyone done this with GitLab and can assist?
I tried 'Settings > Repository > Protected branches > Protect a branch wildcard' to do this, but it does not appear to work. I get 0 matches.
Protected branch rule patterns do not support regex. The wildcard (*) is the only supported metacharacter. If you want an effect similar to what you have in your regex ^(bug|release|feature).*, you will need to make multiple protected branch rules.
For example, you could make three rules:
bug*
feature*
release*
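As a quick sanity check that the three wildcard rules cover the same names as the regex, here is a small Python comparison. It approximates GitLab's wildcard with Python's fnmatch.fnmatchcase, which is an assumption: fnmatch's * also matches any characters (including /), but fnmatch additionally treats ? and [...] as special, whereas GitLab supports only *. The helper name matches_any_rule is hypothetical.

```python
import fnmatch
import re

# The three wildcard rules suggested above, and the regex they stand in for.
WILDCARD_RULES = ["bug*", "feature*", "release*"]
REGEX = re.compile(r"^(bug|release|feature).*")


def matches_any_rule(branch):
    """True if `branch` matches at least one wildcard rule (fnmatch approximation)."""
    return any(fnmatch.fnmatchcase(branch, rule) for rule in WILDCARD_RULES)


if __name__ == "__main__":
    for name in ["bug/123", "feature/login", "release/1.0", "hotfix/1", "main"]:
        print(name, matches_any_rule(name), bool(REGEX.match(name)))
```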

How to get all merged commits in git that includes a specific sentence?

I want to write a function that gets all merged commits on my master and checks for merged commits containing a specific sentence.
I have written this function, but it only gets the last commit that includes this sentence:
import subprocess
def get_commit_message():
    commit_message = subprocess.check_output(["git", "log", "-1", "--pretty=format:\'%B\'", "--grep", "THE REQUIRED CHANGES"], stderr=subprocess.STDOUT).decode("utf-8").split('\n')
    return commit_message
How can I find each and every merged commit message in master that contains "THE REQUIRED CHANGES", and not only the last one?
git log -1 ... will limit the output of git log to one single commit.
Drop the -1.
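A sketch of the function with -1 dropped, assuming you want every matching commit message reachable from master. It asks git to print a NUL byte after each message (%x00 in the pretty format) so multi-line messages can be split reliably; the function names are hypothetical. The optional --merges flag restricts the output to merge commits, in case "merged commits" means merge commits specifically.

```python
import subprocess


def split_messages(raw):
    """Split `git log --pretty=format:%B%x00` output into individual messages."""
    return [msg.strip() for msg in raw.split("\x00") if msg.strip()]


def get_commit_messages(pattern, ref="master", merges_only=False):
    """Return every commit message on `ref` whose text matches `pattern`."""
    cmd = ["git", "log", "--grep", pattern, "--pretty=format:%B%x00"]
    if merges_only:
        cmd.append("--merges")  # only commits with more than one parent
    cmd.append(ref)
    raw = subprocess.check_output(cmd, stderr=subprocess.STDOUT, text=True)
    return split_messages(raw)
```

Usage: get_commit_messages("THE REQUIRED CHANGES") returns a list with one string per matching commit, newest first.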

What does `only: -master` in gitlab-ci.yml match?

We have a .gitlab-ci.yml file containing lines like
a_task:
  only:
    - /^production\/mybranch.*$/
which are clearly meant to match the target git ref.
But we also have:
another_task:
  only:
    - master
My question is: does this "master" match a part of the git ref as well (so that a tag my-master-123 would match, too) or is it a symbolic thing?
The reason I am asking is that there is also:
third_task:
  only:
    - tags
That would have to be symbolic, right?
Which would mean that the syntax does e.g. not support a branch named tags, right?
Update
It looks like there are special keywords, tags being one of them.
So indeed that would mean that refs with those special names (external, pipelines, tags, triggers, ...) would not be supported.
From the docs:
only and except are two keywords that set a job policy to limit when jobs are created:
only defines the names of branches and tags the job runs for.
except defines the names of branches and tags the job does not run for.
Matching via regular expressions is supported, as in your first case, but it is not the default. only: master jobs will run for all refs named master.
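The rule described above (slash-delimited entries are regular expressions, plain strings must equal the whole ref name, and some words are keywords) can be captured in a simplified Python model. This is a hypothetical illustration, not GitLab's implementation: it models only the tags and branches keywords and refuses the others mentioned above (external, pipelines, triggers, ...), which depend on how the pipeline was started.

```python
import re

# Keywords that depend on the pipeline source; deliberately not modelled here.
UNMODELLED_KEYWORDS = {"external", "pipelines", "triggers"}


def only_entry_matches(entry, ref_name, ref_is_tag):
    """Simplified model of one `only:` entry; not GitLab's actual implementation."""
    if len(entry) > 1 and entry.startswith("/") and entry.endswith("/"):
        # /.../ entries are regular expressions searched against the ref name
        return re.search(entry[1:-1], ref_name) is not None
    if entry == "tags":
        return ref_is_tag        # keyword: any tag
    if entry == "branches":
        return not ref_is_tag    # keyword: any branch
    if entry in UNMODELLED_KEYWORDS:
        raise NotImplementedError(f"keyword {entry!r} is not modelled")
    # any other plain string must equal the whole ref name
    return entry == ref_name
```

Under this model, "master" does not match a tag such as my-master-123, and a branch literally named tags can never be selected by name, because the word is read as the keyword.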

How to count total lines changed by a specific author in Gitlab CE?

Is there a method (API) I can invoke that will count the lines changed by a specific author in GitLab CE?
For example:
author  added  deleted
xxx     100    10
And I don't want to pull all the code to my local machine, because the repository is very large.

Perforce - how to back-out changelist from master branch

I have following changelists in perforce:
1 - some work on //depot/A/file
2 - some work on //depot/A/file
3 - branching of //depot/A to //depot/B
4 .... - some work on //depot/A/file
And I want to back out changelist 2 on //depot/B.
I've tried the following:
p4 sync //depot/B/file#1
p4 edit //depot/B/file
p4 sync //depot/B/file#2
....
but an error occurred on the first line:
//depot/B/file#1 - no file(s) at that changelist number.
Is there any way to achieve this without submitting into the //depot/A branch?
Here's what I'd do:
p4 copy //depot/A/...#1 //depot/B/...
p4 submit
p4 merge //depot/A/...#2 //depot/B/...
p4 resolve -ay
p4 submit
p4 merge //depot/A/... //depot/B/...
p4 resolve -am
p4 resolve
p4 submit
You could potentially do this all within a single changelist as well, but it gets a little trickier then -- the above keeps it simple and leaves a history that is easy to follow (i.e. each revision is clearly "copied from this change," "ignored this change", or "merged these changes" rather than a single revision that mushes those actions all together).
You can't simply take out 2 from B because it came together from A as one change (1 & 2).
I think the only way to achieve this is:
roll back 3 on B (p4 edit //depot/B/file; p4 sync //depot/B/file#0; p4 submit //depot/B/file or p4 delete //depot/B/file; p4 submit //depot/B/file)
integrate 1 from A to B
integrate 4 from A to B
Having said that, this has two drawbacks:
if you ever want to re-integrate 2 from A to B in the future, P4 will be confused because it knows that it has already integrated 2 from A to B
if you want to integrate back from B to A, this will propagate the reversal of 2 on B back to A, which probably isn't what you want.
So, even though it's more elaborate, the only correct way to revert an integration is exactly what you don't want to do:
roll back 2 on A
integrate A to B
re-submit 2 on A
Based on your attempt to sync to //depot/B/file#1, I'm assuming the file did not previously exist on //depot/B/...?
If my assumption is correct, you'll want to delete the file:
p4 delete //depot/B/file
and submit it.
If my assumption is incorrect and your newly-branched file is #2 or higher, then:
p4 edit //depot/B/file#1
p4 resolve -ay //depot/B/file
p4 submit
