git p4 branches not exported - perforce

I'm trying to migrate some p4 depots to GitHub using the git-p4.py script, but I'm having trouble exporting branches.
p4 structure is like this:
//depot1/first_directory/second_directory/projects_directory [eg: project_AA] (multiple directories that I want to migrate to separate GitHub repositories)
//depot1/versions_directory/project_AA-or-BB-or-CC_xx_yy [eg: project_AA_01_01] (other directories that contain different release versions of the project directories and that should go into the GitHub repos project_aa, project_bb, etc.)
P4 branches exist, named like AA-version_branch, and have one or more mapping views.
I managed to get the tags, the commit history (without the commits from //depot1/versions_directory/project_AA-or-BB- etc.), and the content.
The main issue is that branches are detected when 'git p4 sync --detect-branches --verbose' is executed, but no branches end up being added to GitHub.
I executed 'git p4 clone --detect-branches --verbose //depot1#all' just to check what branches would be exported at the depot level, but the result was far from what I expected, e.g.:
remotes/p4/depot1/first_directory/second_directory/project_AA/directory_inside_project_AA
or remotes/p4/depot1/other_directory/tags/a_tag_or_label/project_BB/directory_inside_project_BB
None of the resulting branches are in the p4 branch list.
I tried the method described in #37607393, but without any result. I believe this is due to the Perforce repository structure.
Any info or ideas on what I should try next?

Related

Remove one specific empty git commit with command-line tool

For a nodejs command-line tool I add an empty commit to a repo and then want to remove it later.
Later I have at least 3 commits. The first one is a merge commit, the second one is the empty one I created, and the third one is likely from another, now-merged repo. Now that my tool has done its task, I want to remove the empty commit.
git rebase --onto emptyCommitID^ emptyCommitID
resulted in: fatal: Does not point to a valid commit 'emptyCommitID^'
(since the ID is correct, I assume the commit is considered invalid because it is empty)
git rebase --keep-base --onto thirdCommit^ headCommit
resulted in fatal: cannot combine '--keep-base' with '--onto'
Trying rebase -i HEAD~3 after the tool had done its main job resulted in: fatal: invalid upstream 'HEAD~3', which might be due to either the empty commit or the merged unrelated histories, I don't know.
I do not want to use git filter-branch --prune-empty, because the tool shall leave other potentially empty commits untouched.
(The tool is for merging repos with unrelated histories. I create the empty commit so that the merged files are staged instead of committed directly, which otherwise happens, even with the --no-commit flag set, in a just-initialized repo without prior commits.)
Maybe it is possible to solve this with git rebase --interactive, but I had the described problem with the invalid upstream, and I consider this very difficult to implement in a command-line tool that mostly uses exec to run its commands. Do you know a solution?
I think git rebase --onto emptyCommitID^ emptyCommitID should work.
fatal: Does not point to a valid commit 'emptyCommitID^' means that emptyCommitID has no parent. That contradicts your description, in which the second commit is the empty one and the first (merge) commit is its parent.
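To check whether the commit really has a parent, you can inspect it directly; this is just a sanity-check sketch, with emptyCommitID standing in for the actual hash:
git cat-file -p emptyCommitID
git log --oneline --graph --all
The first command prints the raw commit object (look for a "parent" line); the second shows where the commit sits in the overall history.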

Branching when there has been a branch with the same name that doesn't exist anymore - p4python

While testing my application using p4python I came across an interesting issue. A while ago I branched from a main stream directory to a testing directory. I did a revert on that branching since something was wrong with it, so the testing branch disappeared (revert and submit). After fixing the issue, I decided to branch again with the same name, but p4python said Can't populate target path when files already exist. That branch isn't there any more, so I don't understand why p4python would output such an error. This is the code I use for branching:
result = p4.run("populate", path +"#"+ changelist, destination)
So my question is: how can I branch again with the same name if the old branch with that name has been deleted?
The populate command only works for the specific case where you're creating a brand new branch; it doesn't handle any cases where you might potentially need to resolve the source against the target, so it will automatically fail if there are any files (even deleted ones) in the target.
If the branch was just for testing, you could obliterate it:
p4 obliterate -y destination/...
Or you could change your code to account for existing files:
p4.run("integrate", f"{path}#{changelist}", destination)
p4.run("resolve", "-as")
result = p4.run("submit", "-d",
f"integrated from {path}#{changelist} to {destination}")

How to get source branch from a pending cherrypick CL?

I am trying to find the source branch of a CL being cherry-picked.
I have the following scenario:
One CL contained the same changes in 3 branches: A, B, C. Someone cherry-picked it to branch D.
Obviously when they were doing the cherrypick process, they had to put in a source branch and a target branch, possibly as a branch mapping.
However, when another user is given the pending CL number, how can they work out which of the branches A, B or C was used for cherrypicking?
Where is the information about branch mapping stored? Is there any command in p4 to obtain it?
I need this information before the pending CL is submitted.
I have checked the Perforce documentation, but I haven't found anything helpful.
The p4 describe command shows only the target branch.
Use p4 resolved and/or p4 resolve -n to view the source of a pending integration.
If you're on another client, do p4 -H otherHost -c otherClient resolved to see resolved integrations for the owning client.
If the change is shelved, you can unshelve it (p4 unshelve -s CHANGE) and then run p4 resolved in your own client.
Note that this does not in itself tell you exactly what branch mapping was used (just the individual files), but in practice it's not usually hard to infer the branch mapping based on the paths of the individual files.
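As a sketch of the shelved-change case (CHANGE standing in for the actual pending changelist number), another user could run, in their own client:
p4 unshelve -s CHANGE
p4 resolved
(or p4 resolve -n, if the unshelved files still need resolving). The output lists, for each target file, the source path it is being integrated from, which should make it clear whether the source was branch A, B, or C.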

Git - max depth in gitignore/exclude

A short story: I have recently made a clean install of Arch Linux on my PC because my old install got very bloated with unnecessary packages and config directories. Now I want to keep my home directory clean and simple. I decided to use git to supervise every file and folder there, but I can't just exclude every log (or any other constantly updating dir/file) as it is too much of a hassle.
The idea is to include only the first level of files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/. For instance, include .config/foo/ but exclude its contents, i.e. .config/foo/*, so that when I uninstall a package I can check the git log to see which directories it created and remove them manually (if, of course, I won't use them anymore).
I tried to accomplish this by adding this to my .git/info/exclude
*/*
*/*/*
*/*/*/*
*/*/*/*/*
.local/share/*/*
.local/share/*/*/*
.local/share/*/*/*/*
.local/share/*/*/*/*/*
.config/*/*
.config/*/*/*
.config/*/*/*/*
.config/*/*/*/*/*
because I read that git needs a separate wildcard for every directory level. As you probably have already understood - it didn't work.
So, the question is: how can I monitor only the files and directories in $HOME/, $HOME/.config/, and $HOME/.local/share/ without monitoring their contents? Thanks!
TL;DR
What you'll want is to use .gitignore to specifically ignore certain files and subdirectories:
*/
!.config
!.config/*
.config/*/
!.local
!.local/*
.local/*/
To see how this works, and what it does (and doesn't do) for you, read the long version. (The !.config/* is almost certainly unnecessary; I put it in when I had * as part of not saving any top level files, which isn't quite what you asked for. The same holds for !.local/*. Without actually testing it, though, I'm not sure if .config/afile matches the .config rule.)
(But note that you probably do want to source-control additional .config files. I also recommend doing this an entirely different way, using symlinks for the .foorc type files—that's what I do.)
Long
There isn't any maximum depth, other than any system-imposed maximum (which varies depending on your OS). But there's a big problem here: Git doesn't store directories.[1]
What Git does store, underneath its top level storage item which is the commit, are files (which Git calls blobs), with associated path names. If you ask Git to extract commit #1234567..., Git looks inside that commit, finds the path names of the various blobs, and creates directories (new, empty ones) if and when necessary to hold the specific blobs (i.e., files) that Git is extracting from that commit with the names they have as stored in that commit.
This doesn't mean that your idea is doomed, just that you're starting with a misconception. Git won't save the directory .config at all, for instance. It will just save the file .config/Trolltech.conf, for instance. If Git has saved that file in some commit, and you git checkout that specific commit, Git will create a new, empty .config if required. If the directory already exists, Git won't do anything about that. In some cases, such as moving from a commit in which that file exists to one in which it does not, Git will remove the directory as well, but in some cases it won't, and you will need to use git clean -d to make Git really remove it (if that's possible, i.e., if it's empty).
Having saved that particular file, if Git is being instructed to ignore the subdirectory .config/git, Git may not save the file .config/git/ignore. This is where things get complicated. You need to understand how Git commits work, what the index is and how (to some extent) it works, and what Git does to work with, and maintain, a work-tree.
[1] Git does store tree entries, which could work as a flag by which to save empty directories, but other parts of Git combine in strange ways to make this whole concept fail.
Git is built around the concept of commits
As we noted above, what Git stores, fundamentally, is the commit. A commit is a complete, mostly-standalone snapshot of some set of files, which Git calls blobs. (This deliberately ignores submodules and symbolic links, but they're stored as blobs as well, using tree entries of a type that distinguishes them from plain files.) I say "mostly-standalone" because each commit records some number of parent commit hash IDs, though most commonly, just one. A commit that stores three parent hash IDs depends on those three parent commits' existence: a repository that's missing the three parents is somehow incomplete.[2] The parent linkage is not important for this particular application, but it's good to know how this works.
There is, though, one particularly difficult event in the life a commit: creating it. Once a commit is created, it is read-only. It has a unique hash ID, determined solely by the commit's content (including all its parent hash IDs). But what files go into a commit? This is the key question and is where .gitignore eventually comes into the picture.
[2] This is the essence of a shallow clone. A clone that is not shallow (and hence is complete) starts with the tip commits of each branch (and any tagged commits or annotated tag objects). These commits (or annotated tag objects) point back to earlier, ancestor, commits through their parent hash IDs. Since the repository is complete, those objects exist as well; they contain their parent hash IDs, and those commit objects exist, and so on. The whole process stops only when we reach some commit(s) that have no parent. Usually this is the first commit ever made, which obviously can't have a parent. Such a commit is called a root commit, and in any non-empty but complete repository, there is always at least one root commit.
The files in a new commit are set up in the index
Besides the repository itself—the repository being a database of Git objects, i.e., commits and blobs and the intermediate thing Git calls a tree (these store the files' names, among other data)—Git has this key data structure with three different names. It's variously called the index, the staging area, and the cache.
The index is normally pretty much invisible. There is one Git command, git ls-files, that can show you the contents of the index directly (git ls-files --stage, or even more verbosely, git ls-files --debug), but it's not really useful to end users. A good top-level description of the index, though, is that it's where you build your next commit.
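For illustration, here is a sketch of what git ls-files --stage prints, one entry per indexed file (the paths are made up for this example, and the hashes are replaced with placeholders):
100644 <blob hash> 0	.bashrc
100644 <blob hash> 0	.config/Trolltech.conf
i.e. the file mode, the blob's hash ID, the stage number (0 unless a merge is in progress), and the path.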
When you run git commit, Git takes every file that is currently in the index, in whatever form it currently has in the index, and makes a new commit out of that. Those are the files stored in the new commit. The new commit's author and committer are you; the time stamp is "now"; and the parent of the new commit is whatever commit you had checked out before; but the files—the blobs and their associated names—are entirely set by whatever is in the index.[3] Likewise, when you use git checkout to extract some particular commit, what Git does first is to copy that commit's files into the index.
Note that when you do make a new commit, that new commit becomes the current commit. When that happens, Git updates the current branch name—the branch you have checked out, such as master—so that it records the new commit. In fact, each branch name records just one hash ID. Git calls this the tip of the branch. As we saw in footnote 2 above, Git works backwards, starting from branch tips, to find all the commits contained within a branch. So making a new commit shoves the new commit's hash ID into the branch name table.
[3] Even if you use git commit -a or git commit <file>, Git really just copies files into the index—or sometimes, an (auxiliary) index—and builds the commit from that index.
The work-tree
All the files stored inside Git, both in the repository and in the index, are in a special, Git-only format. Few if any other programs on the computer can work with these files, so Git extracts each file into a usable version, where you can do work. This is your work-tree.
In general, every file that's in the current commit also appears in the work-tree. The current commit is, of course, the one you ran git checkout on. If you just ran git checkout master to check out the master branch, what you did in terms of current commit was to check out whatever commit the name master identifies: the tip commit of that branch.
As we mentioned above, all the files (blob objects) got copied into the index, at that point. Git was also able to use whatever was in the index to know what was in your work-tree before that point: for any file that was in the index (and hence in the work-tree) and now isn't in the index because of this checkout, Git should remove that file from the work-tree. And it does! For any file that Git has to replace in the index, or add to the index, Git should copy the index version to the work-tree—and it does.
What's in the index after the git checkout is exactly whatever blobs are (via any intermediate tree objects) in the commit you checked out. The work-tree versions of those files will match the index versions of those files, except that the work-tree versions are actually usable. The index versions of those files will match the commit's versions of those files—and in fact, they share the underlying storage, as the index stores just the path names and blob hash IDs.
Now, there may be files in the work-tree that Git doesn't know about. These files are, by definition, not in the index. These are untracked files. That is what an untracked file is, in Git: it's a file that's not in the index. There is nothing more to it.
(Well, you can remove a file from the index. Then it's not in the index, and hence untracked. That's not really anything more, but it's worth remembering.)
Ignoring untracked files
The problem with untracked files is that Git whines about them. :-) It's constantly griping at you, telling you that files A, B, and C are untracked. So this is where .gitignore comes in—but .gitignore is about the work-tree, and unlike commits, the work-tree does have directories.
You can list specific files in .gitignore. If those files are not in the index (are untracked), but are in the work-tree, Git would complain about them ... but then it sees that they're listed in .gitignore and shuts up.
You can also git add files en-masse, using git add . or git add --all. This has Git scan the work-tree for files, and upon finding them, git add each one to the index, to copy the work-tree version into the builds-the-next-commit index version. Clearly, if files A, B, and C are currently both untracked and ignored, though, Git shouldn't add them. So .gitignore also tells Git not to add existing untracked-and-ignored files to the index.
Existing files that are in the index are automatically tracked, so any en-masse git add that might potentially add those files, will add them, regardless of what's listed in .gitignore. In other words, adding a tracked file to .gitignore has no effect on it. Being in .gitignore only affects untracked files.
But that's files, not directories. This is where everything gets squirrelly. Files exist inside directories, in the normal file system (i.e., not in Git, but in the work-tree).
One of the big reasons Git has the index (and calls it the cache) is that looking at every file in a big file-tree tends to be extremely slow. Git can use the index to record information about all the tracked files, including information that speeds up en-masse git add --all style operations. That's fine for files that are in the index, but what about for whole subdirectories that (a) aren't in the index, so by definition they're untracked and (b) will be ignored, so they won't go into the index and will remain untracked?
Git can avoid scanning those subdirectories entirely. If .config/dir/ is going to be ignored, and Git has just come across the name .config/dir and it's a directory, why then, Git can just skip reading inside it. That's a lot faster than reading it and checking every file to see if it should be ignored.
When Git is scanning the work-tree, it starts at the top and reads the whole contents of the tree: all file names and all sub-directory names. It knows which are files and which are sub-directories, but it has not yet looked inside any of the subdirectories.
Now, Git checks all the files: are they in the index? If so, they're tracked: see if they should be updated. If not, they're untracked: see if Git should whine about them.
Next, Git checks all the sub-directories. For each sub-directory: are there any files for it that are in the index? If so, the sub-directory must be examined. But if not, is the sub-directory ignored? If so, don't even look inside it. Otherwise, look inside it, just as we would if there were files in the index.
Now, for each file or sub-directory, there can be one or more .gitignore entries. An entry ending with * matches files and directories. An entry ending with */ matches directories. An entry starting with ! means: explicitly not ignored.
So, suppose Git is scanning the top level and comes across the name .a, and it's a file. Git will look for any ignore entry matching .a. If there's an entry */, well, that doesn't match .a; so .a is added, unless there's a later entry overriding it. There isn't, so we add the file .a.
Next, Git encounters .adir, which is a directory. There are no .adir files in the index, so a scan isn't forced, so Git will check for an ignore entry matching .adir. Since */ is the only match, Git gets to ignore the directory. It will now not look inside .adir at all (unless and until you somehow add .adir/file to the index, which forces Git to read .adir to check whether .adir/file still exists).
When Git comes across .config (which is a directory), there's a */ that says to ignore it, but it's overridden by !.config which says not to ignore it. There's a .config/* but this is just .config-the-directory, not .config/something. So !.config is the last applicable entry, and Git must scan .config.
Sooner or later,[4] Git will look inside .config. It may find .config/afile; this matches !.config/*. The last entry that it matches tells Git that the file isn't ignored, so it will be added to the index. Then Git comes across .config/git, which is a directory. It matches !.config/*, then .config/*/; so it gets ignored. Git never looks inside .config/git at all.
This repeats for the rest of .config. There may be more .-files, which Git will process as usual, until Git comes across .local, which works just like .config here.
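If you want to verify which rule ends up ignoring (or not ignoring) a given path, git check-ignore -v reports the matching pattern and the file and line it came from. For instance, using the paths from the walkthrough above (a sketch; the exact output shape is from memory):
git check-ignore -v .adir/somefile
git check-ignore -v .config/git/config
git check-ignore -v .config/afile
The first two should report the */ and .config/*/ rules respectively; the last should print nothing and exit non-zero, since .config/afile is not ignored.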
As always, remember that this cannot affect any existing commits. Checking out any existing commit that has some file that violates the .gitignore rules here will cause Git to extract that file, creating its parent directory or directories if needed. Moving from that commit to one that lacks that same file, Git will remove the file, and if the directory containing it goes empty, usually[5] remove the directory as well.
[4] This is where depth-first vs breadth-first scan comes in. Git currently does ASCII-sorted, depth-first directory traversal (so it's actually "right now") because of the way Git organizes the index. It doesn't matter from our "what gets ignored and what doesn't" perspective, though.
[5] Every once in a while I see weird behavior here that convinces me that there must be some bugs in this. The occasional git clean -ndf to see what would be cleaned, perhaps followed by git clean -df to actually do the cleaning, is useful. But I can never reproduce it, and it's never important enough to try... :-)

How to let git check for updates on the master server?

I have very poor knowledge about git and would like to ask for help.
I have a linux(-only) application which shall only be "downloaded" (i.e. cloned) with git. On startup, the app shall ask the git "master server" (github) for whether there are updates.
Does git offer a command to check for whether there is an update (without really updating - only checking)? Furthermore, can my app read the return value of that command?
If you do not want to merge, you can just git fetch yourremote yourbranch, the remote and branch usually being origin and master. You could then parse the output of the command to see if new commits are actually present. You can refer to the latest fetched commit as either yourremote/yourbranch or possibly by the symref FETCH_HEAD.
Note: I was reminded that FETCH_HEAD refers to the last branch that was fetched. Hence, in general, you cannot rely on git fetch yourremote followed by FETCH_HEAD, since the former fetches all tracked branches, so the latter may not refer to yourbranch. Additionally:
you end up fetching more than strictly necessary;
also refer to Jefromi's answer for how to view, but not actually download, the changes;
the following are not necessarily the most compact formats, just readable examples.
That being said, here are some options for checking for updates of a remote branch, which we will denote with yourremote/yourbranch:
0. Handling errors in the following operations:
0.1 If you attempt to git fetch yourremote, and git gives you an error like
conq: repository does not exist.
that probably means you don't have that remote-string defined. Check your defined remote-strings with git remote --verbose, then git remote add yourremote yourremoteURI as needed.
0.2 If git gives you an error like
fatal: ambiguous argument 'yourremote/yourbranch': unknown revision or path not in the working tree.
that probably means you don't have yourremote/yourbranch locally. I'll leave it to someone more knowledgeable to explain what it means to have something remote locally :-) but will say here only that you should be able to fix that error with
git fetch yourremote
after which you should be able to repeat your desired command successfully. (Provided you have defined git remote yourremote correctly: see previous item.)
1. If you need detailed information, git show yourremote/yourbranch and compare it to the current git show yourbranch
2. If you only want to see the differences, git diff yourbranch yourremote/yourbranch
3. If you prefer to make comparisons on the hash only, compare git rev-parse yourremote/yourbranch to git rev-parse yourbranch (a minimal script sketch follows this list)
4. If you want to use the log to backtrack what happened, you can do something like git log --pretty=oneline yourremote/yourbranch...yourbranch (note use of three dots).
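A minimal shell sketch of option 3, assuming the remote is origin and the branch is master (adjust the names to your setup):
git fetch origin
if [ "$(git rev-parse master)" = "$(git rev-parse origin/master)" ]; then
    echo "up to date"
else
    echo "update available"
fi
The app can read this output (or turn the echo lines into distinct exit codes) to decide whether to tell the user an update is available.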
If you really don't want to actually use bandwidth and fetch new commits, but just check whether there is anything to fetch, you can use:
git fetch --dry-run [remote]
where [remote] defaults to origin. You'll have to parse the output, though, which looks something like this:
From git://git.kernel.org/pub/scm/git/git
2e49dab..7f41b6b master -> origin/master
so it's really much easier to just fetch everything (git fetch [remote]), and then look at the diff/log e.g. between master and [remote]/master.
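As a concrete sketch of that last suggestion (again assuming origin and master):
git fetch origin
git log --oneline master..origin/master
If the log prints nothing, there is nothing new to pull; any lines it does print are the commits the local branch is behind by.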
I'd say git fetch is a potential solution. It only updates the remote-tracking refs, not your working code. In cases of large commit sets, this would involve a download of compressed files/info, so it may be more than you want, but it is the most useful download you can do.
