Branching & Merging strategy - Linux kernel - linux

Background
According to the Linux GitHub page, in the last month 500+ authors have pushed 1946 commits to all branches, excluding merges.
On master, 11,000+ files have changed, with 549K additions and 308K deletions.
Roughly 800K lines of code have been touched in one month by some 600 developers.
These developers are globally distributed and work independently; they are not managed by a single manager.
And the Linux kernel still works!
Workflow
Below are the two workflows that I came across:
Trunk-based development (TBD):
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/5yVy4.png
Gitflow workflow:
[![enter image description here][2]][2]
[2]: https://i.stack.imgur.com/KRcIZ.png
What workflow does a Linux kernel developer follow? Based on that workflow, what is the merging strategy used to keep the master branch free of conflicts?
Before pushing changes from a local repo (e.g. a laptop), do you pre-verify the commit against some baseline, to catch VCS or non-VCS conflicts?
In what cases does a Linux kernel developer tag/label a commit?

Page 123 of Pro Git tells me that the Linux kernel uses the "Dictator and Lieutenants Workflow" (see the "Distributed Workflows" section of Pro Git).
This way the verification of commits is split into different stages, and I assume each blob is reviewed multiple times. Since I am not a Linux kernel developer myself, I do not know the guidelines for how commits should be tagged.
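To make the dictator-and-lieutenants idea concrete, here is a minimal sketch of how a lieutenant might collect contributor work and ask for it to be pulled upstream. The remote names, URLs and branch names are invented for illustration and are not the real kernel.org layout:

```sh
# Lieutenant: fetch a contributor's tree and merge the reviewed work
git remote add contributor https://example.com/contributor/linux.git   # hypothetical URL
git fetch contributor
git checkout -b subsystem-next origin/master
git merge --no-ff contributor/feature-x      # patches are reviewed before this merge

# Publish the subsystem branch where the "dictator" can reach it
git push public subsystem-next               # "public" = the lieutenant's public repo

# Generate a pull-request summary (diffstat + shortlog) to email upstream
git request-pull origin/master https://example.com/lieutenant/linux.git subsystem-next

# Dictator: pull the lieutenant's branch into the reference tree and tag the result
git pull https://example.com/lieutenant/linux.git subsystem-next
git tag -s v5.x-rc1 -m "Linux 5.x-rc1"       # a signed tag marks a release candidate
```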

Related

Managing large quantity of files between two systems

We have a large repository of files that we want to keep in sync between one central location and multiple remote locations. Currently, this is being done using rsync, but it's a slow process mainly because of how long it takes to determine the changes.
My current thought is to find a VCS-like solution where instead of having to check all of the files, we can check the diffs between revisions to determine what gets sent over the wire. My biggest concern, however, is that we'd have to re-sync all of the files that are currently in-sync, which is a significant effort. I've been told that the current repository is about 0.5 TB and consists of a variety of files of different sizes. I understand that an initial commit will most likely take a significant amount of time, but I'd rather avoid a full re-sync between the clusters if possible.
One thing I did look at briefly is git-annex, but my first concern is that it may not like dealing with thousands of files. Also, one thing I didn't see is what would happen if the file already exists on both systems. If I create a repo using git-annex on the central system and then set up repos on the remote clusters, will pushing from central to a remote repo cause it to sync all of the files?
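For concreteness, the kind of setup I have in mind would look roughly like this; the paths, hostnames and repo descriptions are made up, and I don't yet know whether the last step behaves the way I hope:

```sh
# Central system: turn the existing file tree into a git-annex repository
cd /data/central-repo                # hypothetical path to the existing files
git init
git annex init "central"
git annex add .                      # checksums everything; the initial pass will be slow
git commit -m "initial import"

# Remote cluster: clone the metadata, then fetch content
git clone ssh://central.example.com/data/central-repo /data/mirror
cd /data/mirror
git annex init "cluster-1"
git annex get .                      # does this re-transfer files that already exist here?
```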
If anyone has alternative solutions/ideas, I'd love to see them.
Thanks.

What is the fastest way to keep a subversion repository in sync for migrating to a new environment?

I'm attempting to migrate large Subversion repositories from a system with Subversion 1.6 to one with Subversion 1.7. For the final sync I could take downtime and simply use rsync, but I didn't choose that route because I wanted the new repositories to be created/upgraded with version 1.7.
So I chose to use svnsync because it can pick up where it left off and because the repositories are not mounted on the same device. The only problem is for large repositories svnsync copy-revprops takes FOREVER, as it goes through every revision over again.
So my question is: can I do one of the following?
- Reduce the time it takes to do svnsync copy-revprops
- Skip copy-revprops entirely
- Use a faster method of incrementally syncing repositories from one version of Subversion to another
According to http://svnbook.red-bean.com/en/1.7/svn.ref.svnsync.c.copy-revprops.html:

> Because Subversion revision properties can be changed at any time, it's possible that the properties for some revision might be changed after that revision has already been synchronized to another repository. Because the svnsync synchronize command operates only on the range of revisions that have not yet been synchronized, it won't notice a revision property change outside that range. Left as is, this causes a deviation in the values of that revision's properties between the source and mirror repositories. svnsync copy-revprops is the answer to this problem. Use it to resynchronize the revision properties for a particular revision or range of revisions.
This means that you can choose to skip copy-revprops if the revision properties have not been changing. Even if you're not sure, there is no point in running copy-revprops multiple times: one pass at the very end of all the syncs will suffice.
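As a rough illustration of what that final pass could look like (the repository URLs are placeholders):

```sh
# One-time mirror setup (already done in this scenario)
svnsync initialize file:///srv/svn-1.7/repo http://old-server/svn/repo

# Incremental catch-up: only copies revisions not yet mirrored
svnsync synchronize file:///srv/svn-1.7/repo

# Once, right before cutover: refresh revision properties in case any
# were edited after their revisions had already been mirrored
svnsync copy-revprops file:///srv/svn-1.7/repo 0:HEAD
```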

Perforce: How do files get stored with branching?

A very basic question about branching and duplicating resources. I have had discussions like this due to the size of our main branch, but that aside, it is good to know how this really works.
Consider the problem of branching dozens of Gb.
What happens when you create a branch of this massive amount of information?
I am reading the official docs here and here, but I am still confused about how the files are stored for each branch on the server.
Say a file A.txt exists in main branch.
When creating the branch (Xbranch), and considering that A.txt won't have changes, will the Perforce server duplicate A.txt (one copy keeping the main changes and another for Xbranch)?
For a massive amount of data this matters, because it would mean duplicating dozens of GB. So how does this really work?
Some notes in addition to Bryan Pendleton's answer (and the questions from it)
To really check your understanding of what is going on, it is good to experiment with a test repository containing a small number of files, create checkpoints after each major action, and then compare the checkpoints to see which database rows were actually written (as well as having a look at the archive files that the server maintains). This is very quick and easy to set up. You will notice that every branched file generates records in db.integed, db.rev, db.revcx and db.revhx, let alone any in db.have.
You also need to be aware of which server version you are using as the behavior has been enhanced over time. Check the output of "p4 help obliterate":
> Obliterate is aware of lazy copies made when 'p4 integrate' creates a branch, and does not remove copies that are still in use. Because of this, obliterating files does not guarantee that the corresponding files in the archive will be removed.
Some other points:
- The default flags for "p4 integrate" to create branches copied the files down to the client workspace and then copied them back to the server with the submit. This took time, depending on how many files there were and how big they were. It has long been possible to avoid this using the -v (virtual) flag, which just creates the appropriate rows on the server and avoids updating the client workspace, and is usually hugely faster. The slight downside is that you have to sync the files afterwards in order to work on them.
- Newer releases of Perforce have the "p4 populate" command, which does the same as an "integrate -v" but also does not require the target files to be mapped into the current client workspace. This avoids the dreaded "no target file(s) in client view" error which many beginners have struggled with! [In P4V this is the "Branch files..." command on the right-click menu, rather than "Merge/Integrate..."] See the sketch after this list.
- Streams have made branching a lot slicker and easier in many ways, and are well worth reading up on and playing with. (The only potential flies in the ointment are the flat two-level naming hierarchy and the potential challenges in migrating existing branches with existing relationships into streams.)
- Task streams are pretty nifty and save lots of space on the server.
- Obliterate has had an interesting -b flag for a few releases, which lets you quickly and easily remove unchanged branch files, effectively retro-creating a task stream. This can potentially save millions of database rows in larger installations with lots of branching.
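A minimal sketch of the two branching approaches mentioned above (the depot paths and descriptions are made up):

```sh
# Classic approach: integrate with -v so nothing is copied through the workspace
p4 integrate -v //depot/main/... //depot/xbranch/...
p4 submit -d "Create xbranch as a lazy copy of main"

# Newer approach: populate branches and submits in one server-side step, and the
# target does not need to be mapped in the client view
p4 populate -d "Create xbranch from main" //depot/main/... //depot/xbranch/...

# Sync the new branch down later, only when you actually need to work on it
p4 sync //depot/xbranch/...
```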
In general, branching a file does not create a copy of the file's contents; instead, the Perforce server just writes an additional database record describing the new revision, but shares the single copy of the file's contents.
Perforce refers to these as "lazy copies"; you can learn more about them here: http://answers.perforce.com/articles/KB_Article/How-to-Identify-a-Lazy-Copy-of-a-File
One exception is if you use the "+S" filetype modifier, as in this case each branch will have its own copy of the content, so that the +S semantics can be performed properly on each branch independently.

Implementing an update/upgrade system for embedded Linux devices

I have an application that runs on an embedded Linux device and every now and then changes are made to the software and occasionally also to the root file system or even the installed kernel.
In the current update system, the contents of the old application directory are simply deleted and the new files are copied over them. When changes to the root file system have been made, the new files are delivered as part of the update and simply copied over the old ones.
Now, there are several problems with the current approach and I am looking for ways to improve the situation:
The root file system of the target that is used to create file system images is not versioned (I don't think we even have the original rootfs).
The rootfs files that go into the update are manually selected (instead of a diff)
The update continually grows, and that has become a pain. There is now a split between update and upgrade, where the upgrade contains the larger rootfs changes.
I have the impression that the consistency checks in an update are rather fragile if at all implemented.
Requirements are:
The application update package should not be too large and it must also be able to change the root file system in the case modifications have been made.
An upgrade can be much larger and only contains the stuff that goes into the root file system (like new libraries, kernel, etc.). An update can require an upgrade to have been installed.
Could the upgrade contain the whole root file system and simply do a dd on the flash drive of the target?
Creating the update/upgrade packages should be as automatic as possible.
I absolutely need some way to do versioning of the root file system. This has to be done in such a way that I can compute some sort of diff from it, which can be used to update the rootfs of the target device.
I already looked into Subversion since we use that for our source code but that is inappropriate for the Linux root file system (file permissions, special files, etc.).
I've now created some shell scripts that give me something similar to an svn diff, but I would really like to know whether a working and tested solution for this already exists.
Using such diffs, I guess an upgrade would then simply become a package that contains incremental updates based on a known root file system state.
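For illustration, here is roughly how such a rootfs diff could be approximated with rsync; the snapshot paths are placeholders and this is only a sketch of the idea, not the scripts mentioned above:

```sh
#!/bin/sh
set -e
OLD=/srv/rootfs-snapshots/v1.0       # placeholder: previously shipped rootfs
NEW=/srv/rootfs-snapshots/v1.1       # placeholder: rootfs for the new release

# Dry-run rsync lists what would change without copying anything;
# -aHAX keeps permissions, hard links, ACLs and xattrs in the comparison.
rsync -aHAXn --delete --itemize-changes "$NEW/" "$OLD/" > itemized.txt

# Split the itemized output into changed and deleted paths
# (naive: assumes there are no spaces in file names).
awk '$1 ~ /^[<>ch.]/  {print $2}' itemized.txt > changed-files.txt
awk '$1 == "*deleting" {print $2}' itemized.txt > deleted-files.txt

# Pack only the changed files plus the deletion list into the update package
mkdir -p delta
rsync -aHAX --files-from=changed-files.txt "$NEW/" delta/
cp deleted-files.txt delta/
tar -C delta -czf rootfs-update.tar.gz .
```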
What are your thoughts and ideas on this? How would you implement such a system? I prefer a simple solution that can be implemented in not too much time.
I believe you are looking at the problem the wrong way: any update which is non-atomic (e.g. dd'ing a file system image, replacing files in a directory) is broken by design. If the power goes off in the middle of an update the system is a brick, and for embedded systems power can go off in the middle of an upgrade.
I have written a white paper on how to correctly do upgrade/update on embedded Linux systems [1]. It was presented at OLS. You can find the paper here: https://www.kernel.org/doc/ols/2005/ols2005v1-pages-21-36.pdf
[1] Ben-Yossef, Gilad. "Building Murphy-compatible embedded Linux systems." Linux Symposium. 2005.
I absolutely agree that an update must be atomic. I have recently started an open source project whose goal is to provide a safe and flexible way to do software management, with both local and remote updates. I know my answer comes very late, but maybe it can help you on your next projects.
You can find sources for "swupdate" (the name of the project) at github.com/sbabic/swupdate.
Stefano
Currently, quite a few open source embedded Linux update tools are emerging, each with a different focus.
Another one that is worth being mentioned is RAUC, which focuses on handling safe and atomic installations of signed update bundles on your target while being really flexible in the way you adapt it to your application and environment. The sources are on GitHub: https://github.com/rauc/rauc
In general, you can find a good overview and comparison of current update solutions on the Yocto Project wiki page about system updates:
https://wiki.yoctoproject.org/wiki/System_Update
Atomicity is critical for embedded devices; power loss is one of the reasons highlighted, but there can be others, such as hardware or network issues.
Atomicity is perhaps a bit misunderstood; this is a definition I use in the context of updaters:
- An update is always either completed fully, or not at all
- No software component besides the updater ever sees a half-installed update
Full image update with a dual A/B partition layout is the simplest and most proven way to achieve this.
For embedded Linux there are several software components that you might want to update, and different designs to choose from; there is a newer paper on this available here: https://mender.io/resources/Software%20Updates.pdf (the file has since moved to https://mender.io/resources/guides-and-whitepapers/_resources/Software%2520Updates.pdf).
If you are working with the Yocto Project you might be interested in Mender.io, the open source project I am working on. It consists of a client and a server, and the goal is to make it much faster and easier to integrate an updater into an existing environment, without needing to redesign too much or spend time on custom/homegrown coding. It will also allow you to manage updates centrally with the server.
You can journal an update and divide your update flash into two slots. A power failure always returns you to the currently executing slot. The last step is to modify the journal value. It is not atomic, and yet there is no way to make it brick, even if it fails at the moment of writing the journal flags. There is no such thing as an atomic update. Ever. I have never seen one in my life. iPhone, Android, my network switch: none of them are atomic. If you don't have enough room to do that kind of design, then fix the design.
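As a rough sketch of the dual-slot journal idea described above; the device nodes, the boot-environment variable name and the image path are assumptions for illustration, not taken from any particular product:

```sh
#!/bin/sh
set -e
# Assumed layout: slot A = /dev/mmcblk0p2, slot B = /dev/mmcblk0p3,
# and a U-Boot environment variable "bootslot" selects which slot boots.

CURRENT=$(fw_printenv -n bootslot)              # from u-boot-tools
if [ "$CURRENT" = "A" ]; then
    TARGET=B; TARGET_DEV=/dev/mmcblk0p3
else
    TARGET=A; TARGET_DEV=/dev/mmcblk0p2
fi

# 1. Write the new image to the inactive slot; the running system is untouched.
dd if=/tmp/rootfs.img of="$TARGET_DEV" bs=1M conv=fsync

# 2. Only after the image is fully written, flip the "journal value": a single
#    environment write decides which slot boots next time.
fw_setenv bootslot "$TARGET"

# A power cut before step 2 keeps booting the old slot; a power cut after step 2
# boots the new one. (A real design also wants verification and a watchdog/rollback.)
reboot
```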

Git/Linux: What is a good strategy for maintaining a Linux kernel with patches from multiple Git repositories?

I am maintaining a custom Linux kernel which is comprised of merged changes from a variety of sources. This is for an embedded system.
The chip vendor we are working with releases a board support package as a set of changes against a mainline kernel (2.6.31). I have since made changes to support our custom hardware, and have also merged in the stable (2.6.31.y) kernel releases. I've also merged in bug fixes for a specific file system driver that we use, sometimes before the changes make it to the mainline kernel.
I haven't been very systematic about how I have managed the various contributing sources and my own changes. If the change was large I tended to merge; if it was small I tended to rebase the third-party changeset onto my own. Generally speaking merge conflicts are rare, since most of my work affects drivers that are not in the mainline kernel anyway.
I'm wondering if there is a better way to manage all of this. One concern is that my changes are mixed in with merges. The history might look something like:
2.6.31 + board support package + my changes (1) + 2.6.31.12 changes + my changes (2) + file system driver update + my changes (3) + 2.6.31.14 changes + my changes(4) + ....
It worries me a bit that my changes are mixed in, sometimes on the other side of merges. Is there a better way to do this? In particular, is there a way to do this that will make life easier when I switch to a newer kernel?
I don't think it will be easy to clean up your current set-up, unless your history is reasonably short, but I would suggest this:
- Set up a master repository which has remotes set up for each of the other places that your code comes from: the mainline kernel, patches, ...
- Keep a separate branch specifically for the updates from your driver supplier.
- When you fetch, the updates will not mess with your branches.
- When you are ready to merge, merge into some kind of "release" branch. The idea is to keep each source separate from the others, except when it needs to be merged in. Base your changes off of this branch, merging/rebasing as necessary.
Here's a quick diagram which I hope is helpful:
    mainline-\----------\-------------------------------\
              \          \             /you---\---/-/    \
               \release----\-/---/----/-/--------\-/  /  --\-----
        patches-----------------/---/                /   /
                                                    /   /
     driver-------------------------/--------------/
With so many branches, it is difficult to diagram effectively, but I hope that gives you an idea of what I mean. release holds the official code for your system, you holds your own modifications, driver holds patches from the driver supplier, patches holds patches from some other repo, and mainline is the mainline kernel. Everything gets merged into release, and you base your changes off of release but only interact by merging in each direction, not making changes directly to release.
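A rough sketch of that layout in git commands; the vendor/driver remote URLs and branch names are placeholders:

```sh
# One integration repository with a remote per upstream source
git remote add mainline git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git remote add vendor   https://example.com/vendor/bsp.git        # hypothetical BSP repo
git remote add fsdriver https://example.com/fsdriver/linux.git    # hypothetical driver repo
git fetch --all

# Long-lived branches: one per source, plus "release" and "you"
git branch vendor-bsp vendor/2.6.31-bsp           # hypothetical branch name
git branch release    mainline/linux-2.6.31.y
git checkout -b you release

# Integrating an update: merge each source into release, then release into you
git checkout release
git merge mainline/linux-2.6.31.y
git merge vendor-bsp
git checkout you
git merge release
```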
I think the generally accepted best policy is
(a) patch sets
(b) topic branches
Topic branches are essentially the same, but are regularly rebased onto mainline. TopGit is a well-known tool that makes handling topic branches 'easier' if you have a lot of them. Otherwise: plan ahead and limit the number of branches.
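For example, a single topic branch kept rebased on top of the integration base might look like this (branch names are illustrative):

```sh
# Create a topic branch for one logical set of changes on top of the current base
git checkout -b topic/board-support release

# ...commit the patches for this topic...

# When the base moves (new stable release, new BSP drop), replay the topic on top
git checkout release
git merge mainline/linux-2.6.31.y
git checkout topic/board-support
git rebase release

# Export the topic as a clean patch set when needed
git format-patch release..topic/board-support
```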

Resources