Perforce: recommended way to work on a task while keeping a stable state

Recently I started working at a new place and my company uses Perforce as their source control.
From what I saw, my team members work on a task and, when they finish it, they just submit the files. If they need to stop working on the current task and move to another one, they shelve the files and switch to the other task. This means there is one shelf per task, but there is no way to view local history of the things that were done as part of the task. This results in large changes at every submit, and it is not possible to easily return to a known-working state within the current task.
In the past I worked with git. With git, I would commit very often and could easily view the history of the changes I had made, even in the short term.
For example, before renaming a variable I would always commit, and then if something got messed up I would just revert without even trying to dive into debugging what went wrong. Likewise, when developing a feature and getting the basics working, I would commit so that I could easily return to that state.
What I started to do is manually copy the files I am working on into a local git repo, commit them there, and then copy them back to Perforce before submitting. This is definitely a bad idea.
I am aware that git and Perforce are fundamentally different, and I want to know the recommended way to work on a large task in Perforce without accidentally destroying my work during development.
I am working on a gigantic project, and working with git-p4 and syncing all the changes is impossible. I also looked at Perforce equivalent of git local commit, but that still means a shelf is saved for each state I want to keep, which is not very convenient.

The standard way of doing this in Perforce is to use a branch; unlike shelves, branches maintain full version history of everything. There are three basic approaches to consider:
Create a single personal dev branch that you stabilize all your changes in before merging them back to the mainline branch. After everything has been stabilized and merged from your dev branch to the mainline, your dev branch can be updated to the latest mainline state and you have a "clean" basis for your next batch of changes.
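A minimal sketch of that first flow with classic branches (the depot paths, branch layout, and descriptions here are made up for illustration):
    # one-time: branch the mainline into a personal dev branch
    p4 integrate //depot/main/... //depot/dev/alice/...
    p4 submit -d "Create personal dev branch"
    # work on the dev branch, submitting small changes as often as you like;
    # when it is stable, merge back to the mainline
    p4 integrate //depot/dev/alice/... //depot/main/...
    p4 resolve -am
    p4 submit -d "Merge stabilized work from dev branch"
    # refresh the dev branch from the latest mainline for the next batch of work
    p4 integrate //depot/main/... //depot/dev/alice/...
    p4 resolve -am
    p4 submit -d "Update dev branch from mainline"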
Create a new task branch (or task stream if you're using streams) per major change. This can be a bit heavyweight if you have a lot of files in a "classic" branching model, but if you're using task streams, you can unload the task stream after the task is done to prune the unchanged branched files from the active depot history (similar to what happens when you squash a branch in git).
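With streams, the task-stream version of that might look roughly like this (the stream depot and stream names are invented, so treat this as a sketch rather than a recipe):
    # create a task stream parented on the mainline stream (opens the stream spec form)
    p4 stream -t task -P //Streams/main //Streams/alice-task-1234
    # populate it from its parent and switch an existing stream workspace onto it
    p4 populate -r -S //Streams/alice-task-1234 -d "Populate task stream"
    p4 client -s -S //Streams/alice-task-1234
    p4 sync
    # ...submit as many intermediate changes on the task stream as you like...
    # once the finished work has been merged/copied back to the parent,
    # unload the task stream to prune it from the active depot history
    p4 unload -s //Streams/alice-task-1234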
Create a personal server via p4 clone. This is the most git-like model -- you instantiate an entire new server on your local machine, where you can create all the branches you want with no impact on the shared server. When your changes are ready, you p4 push them back to the shared server. p4 fetch is used to pull newer changes from the shared server and "rebase" your local changes onto them if required.
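A sketch of the personal-server model (the server address and depot path are placeholders):
    # seed a personal micro-server from part of the shared server
    mkdir ~/project-local && cd ~/project-local
    p4 clone -p perforce.example.com:1666 -f //depot/project/...
    # work against the local server: submit as often as you like
    p4 edit parser.c
    p4 submit -d "Checkpoint: rename variable before refactor"
    # bring down newer changes from the shared server when needed
    p4 fetch
    # when the task is done, publish the accumulated local history
    p4 push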
I've usually opted for the first approach; the main potential drawback to it is that if you have multiple major destabilizing changes in flight at the same time, it's hard to isolate them from each other in a single branch, but in practice this is a situation that hasn't come up very often for me.

Related

How to work concurrently with terraform on the same "workspace"?

Assume the following scenario. There's one remote workspace that needs to be configured with Terraform. Since there are multiple subtasks, different people are working on them concurrently. There's a single git repo and the devops team is using a branching approach. There's also a remote state configured for Terraform. Now, consider the following:
devops1 introduces some changes and applies them, then issues a PR
devops2, who is working on a different task, cannot apply/test his/her changes because the remote state has already been modified by devops1 and those changes are not yet in the code locally (the PR has not been accepted and merged yet)
How can this scenario be addressed, other than by having separate workspaces? What else comes to my mind is that all the changes to the remote state should be reverted before issuing a PR and applied again after it's merged.
Anything else?

How to find the changelist creation time in a perforce sandbox

In my sandbox S, I have created a changelist X and it gets submitted to Perforce as Y. From Y, I want to get the exact creation time of X, i.e. the first time this changelist was created.
The unit of versioning in Perforce is the submitted changelist; there is not generally a detailed record of everything that happened in the workspace prior to the submit, including edits made to the changelist while it was in a pending state. (If you want more fine-grained versioning, submit more fine-grained changelists.)
That said, if you're willing to do the work, you can parse this information out of the server journal files (which are primarily used for server recovery rather than end-user consumption, but since they represent a plaintext record of every database transaction you can mine a LOT of data out of them if you've got access and a good understanding of the server database schema). Look for modifications to the db.change table; each one is timestamped. If you need to know when files were opened prior to the creation of the changelist, those updates are in db.working.
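As a starting point, something along these lines on the server machine (the journal file names and the changelist number are placeholders, and the exact record layout varies by server version):
    # scan the live and rotated journals for db.change rows touching change 12345
    cd "$P4ROOT"
    grep '@db.change@' journal journal.* 2>/dev/null | grep ' 12345 '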

Does GIT STASH persist even after a computer shutdown? [duplicate]

I've read about using git stash to save work on a particular branch when needing to work on another, but my question is: do those saved changes only stay for a particular session, or do they remain until they are explicitly destroyed (even after rebooting the computer), so they can be recovered later?
The root of the problem is:
I have a computer with me at work which I develop on, and which cannot access the internet. Thus, I cannot push changes to git remotely. I would need to save them temporarily, shut down my computer, and push them when I get home. Is this possible?
Yes, the stash is persisted to disk, and thus survives reboot.
git doesn't retain any content in-memory (or in an alternate fragile state, such as unlinked files) between command invocations; doing so would require an out-of-process daemon or other component that isn't presently included -- thus, substantial extra complexity for no significant gain.
That said, given the workflow you've described, I don't see why you'd need to use the stash day-to-day when working disconnected. Just commit your changes locally, and push (without using --force) when connected. Depending on your team's workflow, it may be appropriate to rebase onto the current state of the branch, or to merge down new changes before pushing. Ask your team's dev lead which approach they prefer, if explicit workflow documentation local to your company or project isn't available.
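A minimal sketch of that offline-then-online flow (the branch and remote names are just placeholders for whatever your project uses):
    # while offline: commit locally as often as you like
    git add -A
    git commit -m "WIP: parser refactor"
    # back online: update from the remote first, then push (no --force needed)
    git fetch origin
    git rebase origin/main     # or: git merge origin/main, per your team's convention
    git push origin main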

How to move a perforce depot between two different servers such that revision history is copied but user info and workspaces are not?

I need to copy a depot from one Perforce server to another. The file revision history needs to stay intact, but the user information and workspace information must not be copied to the new server.
I've tried a standard checkpoint creation and restore procedure, but if there are users or workspaces with the same name on both servers, the source server's data will overwrite that info on the destination server. This is pretty bad if those user accounts and workspaces do not have exactly identical details.
The goal of this sort of operation is to allow two separate, disconnected groups to view a versioned source tree with revision history. Updates would be single-directional, with one group developing and one just viewing. Each group's network is completely enclosed, with no outside connections of any kind.
Any ideas would be appreciated; I've been busting my brains on this one for a while.
EDIT:
Ultimately my solution was to install an intermediate Perforce server on the same machine as my source server. Using that, I could do a standard backup/restore from the source server to the intermediate server, delete all the unwanted metadata on the intermediate server, and then back up from the intermediate server to the final destination server. Pretty complicated, but it got the job done, and it can all be done programmatically in Windows PowerShell.
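In command form, that round trip looks roughly like this (the server roots and checkpoint numbers are placeholders):
    # checkpoint the source server and replay it into the intermediate server
    p4d -r /p4/source -jc
    p4d -r /p4/intermediate -jr /p4/source/checkpoint.42
    # delete the unwanted users/workspaces/groups on the intermediate server,
    # then checkpoint it and replay that into the destination server
    p4d -r /p4/intermediate -jc
    p4d -r /p4/destination -jr /p4/intermediate/checkpoint.1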
There are a few ways, but I think you are going about this one the hard way.
Continue to do what you are doing, but delete db.user, db.view (I think) and db.group. Then, when you start the Perforce server, it will recreate these, but they will be empty, which will make it hard for anyone to log in, so you'll have to create the users/groups again. I'm not sure if you can take those db files from another server and copy them in; I've never tried that.
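If you go that route, the mechanical part might look like this (the root path is a placeholder; these are only the files the answer above names, so keep copies rather than deleting outright):
    # with p4d stopped on the destination server
    cd /p4/destination/root
    mkdir -p /tmp/removed-metadata
    mv db.user db.group db.view /tmp/removed-metadata/
    # restart p4d; the missing tables are recreated empty on startup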
The MUCH easier way: make a replica. http://www.perforce.com/perforce/r10.2/manuals/p4sag/10_replication.html Make sure you look at the p4d -M flag so that it's a read-only replica. I assume you have a USB drive or something to move between networks, so you can issue a p4 pull onto the USB drive, move the drive, and then either run the server off the USB or issue another p4 pull to a final server. I've never tried this, but with some work it should be possible; you'll have to run a server off the USB to issue the final p4 pull.
You could take a look at Perforce Git Fusion and make some git clones.
You could also look at remote depots. Basically, you create a new depot on your destination server and point it at a depot on your source server. This works if you have a fast connection between the two servers. Protections are handled by the destination server as to who has access to that new depot, and the source server can be set up to share it out as read-only to the destination server. Here is some info:
http://answers.perforce.com/articles/KB_Article/Creating-A-Remote-Depot
Just make sure you test it during a slow period, as it can slow down the destination server. I tried it from two remote locations, both on the east coast of the US, and it was acceptable but not too useful. If both servers are in the same building it would be fine.
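For reference, a remote depot is just a depot spec whose Type is set to remote; a sketch of what the spec on the destination server might look like (the depot name, address and map are made up):
    # p4 depot sourcecode   -- then fill in the form along these lines
    Depot:       sourcecode
    Owner:       admin
    Type:        remote
    Address:     source-server.example.com:1666
    Map:         //depot/...
    Description:
            Read-only window onto the source server's depot.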

What's the best way to keep multiple Linux servers synced?

I have several different locations in a fairly wide area, each with a Linux server storing company data. This data changes every day in different ways at each different location. I need a way to keep this data up-to-date and synced between all these locations.
For example:
In one location someone places a set of images on their local server. In another location, someone else places a group of documents on their local server. A third location adds a handful of both images and documents to their server. In two other locations, no changes are made to their local servers at all. By the next morning, I need the servers at all five locations to have all those images and documents.
My first instinct is to use rsync and a cron job to do the syncing overnight (1 a.m. to 6 a.m. or so), when none of the bandwidth at our locations is being used. It seems to me that it would work best to have one server be the "central" server, pulling in all the files from the other servers first and then pushing those changes back out to each remote server. Or is there another, better way to perform this function?
The way I do it (on Debian/Ubuntu boxes):
Use dpkg --get-selections to get your installed packages
Use dpkg --set-selections to install those packages from the list created
Use a source control solution to manage the configuration files. I use git in a centralized fashion, but subversion could be used just as easily.
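A rough sketch of the package-list part (the paths are arbitrary; the configuration files themselves live in the git/subversion repo mentioned above):
    dpkg --get-selections > packages.list          # on the machine whose state you want to capture
    sudo dpkg --set-selections < packages.list     # on the machine being brought in line
    sudo apt-get dselect-upgrade                   # install/remove packages to match the selections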
An alternative, if rsync isn't the best solution for you, is Unison. Unison works under Windows and has some features for handling the case where there are changes on both sides (not necessarily needing to pick one server as the primary, as you've suggested).
Depending on how complex the task is, either may work.
One thing you could (theoretically) do is create a script using Python or something and the inotify kernel feature (through the pyinotify package, for example).
You can run the script, which registers to receive events on certain trees. Your script could then watch directories, and then update all the other servers as things change on each one.
For example, if someone uploads spreadsheet.doc to the server, the script sees it instantly; if the document doesn't get modified or deleted within, say, 5 minutes, the script could copy it to the other servers (e.g. through rsync).
A system like this could theoretically implement a sort of limited 'filesystem replication' from one machine to another. Kind of a neat idea, but you'd probably have to code it yourself.
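As a rough illustration of the same idea, using inotifywait from inotify-tools instead of pyinotify (the hostnames and paths are invented):
    # watch the tree and copy files to the peers once they have been closed for writing
    inotifywait -m -r -e close_write --format '%w%f' /srv/share |
    while read -r path; do
        sleep 300    # crude stand-in for the "not modified for 5 minutes" check
        for peer in server2 server3 server4; do
            rsync -az "$path" "$peer:$path"
        done
    done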
AFAIK, rsync is your best choice; it supports partial file updates among a variety of other features. Once set up, it is very reliable. You can even set up the cron job with timestamped log files to track what is updated in each run.
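For example, a crontab entry on the central server along these lines (the host and paths are invented; note that % must be escaped in crontab):
    0 1 * * * rsync -az site1:/srv/data/ /srv/data/ >> /var/log/sync-site1-$(date +\%Y\%m\%d).log 2>&1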
I don't know how practical this is, but a source control system might work here. At some point during the day (perhaps each hour?), a cron job runs a commit, and overnight each machine runs a checkout. You could run into issues with a long commit not being done when a checkout needs to run, but essentially the same thing could be done with rsync.
I guess what I'm thinking is that a central server would make your sync operation easier - conflicts can be handled once on central, then pushed out to the other machines.
rsync would be your best choice. But you need to carefully consider how you are going to resolve conflicts between updates to the same data on different sites. If site-1 has updated 'customers.doc' and site-2 has a different update to the same file, how are you going to resolve it?
I have to agree with Matt McMinn; especially since it's company data, I'd use source control and, depending on the rate of change, run it more often.
I think the central clearinghouse is a good idea.
It depends on the following:
* How many servers/computers need to be synced?
** If there are too many servers, using rsync becomes a problem.
** Either you use threads and sync to multiple servers at the same time, or you sync them one after the other. In the first case you are looking at high load on the source machine; in the latter case, at inconsistent data across the servers (in a cluster) at a given point in time.
* The size of the folders that need to be synced, and how often they change.
** If the data is huge, rsync will take time.
* The number of files.
** If the number of files is large, and especially if they are small files, rsync will again take a lot of time.
So it all depends on the scenario: whether to use rsync, NFS, or version control.
If there are few servers and only a small amount of data, it makes sense to run rsync every hour. You can also package the content into RPMs if the data changes only occasionally.
With the information provided, IMO version control will suit you best. Rsync/scp might give problems if two people upload different files with the same name, and NFS over multiple locations needs to be architected with perfection.
Why not have one or more repositories that everyone just commits to? All you need to do is keep the repositories in sync. If the data is huge and updates are frequent, your repository server will need a good amount of RAM and a good I/O subsystem.
