How to store large amounts of text and retain versioning?

How to store large amounts of text and retain versioning? - text

Any tips on how to store large amounts of text, such as programming code. So I need to retain the tabs, spaces, etc?
Also how could i keep versions like say someone edits one line, i can see the changes that have been made?

That's what various revision control systems are for.
Any of git, cvs, rcs, subversion and a host of others will work.

I agree with other posters that you probably want to use what has already been done. Sometimes rolling your own can be fun.
You could write a wrapper for the command line diff utils. Each user could have their own config to choose their preferred editor. The script would made a copy of the file, so you would have an orig and new. When done editing the script would kick off a diff and store that to disk and delete the original backup of the file. This way, you would only store the latest versions plus all diffs so you can revert back and also see the changes.
I would keep a log of all diffs created and tag them in a csv with the userid of the person who modified the file and the timestamp of the modification.

Related

shell script to create backup file when creating new file in particular directory

Recently I was asked the following question in an interview.
Suppose I try to create a new file named myfile.txt in the /home/pavan directory.
It should automatically create myfileCopy.txt in the same directory.
A.txt then it automatically creates ACopy.txt,
B.txt then BCopy.txt in the same directory.
How can this be done using a script? I may know that this script should run in crontab.
Please don't use inotify-tools.

Can you explain why you want to do?
Tools like VIM can create a backup copy of a file you're working on automatically. Other tools like Dropbox (which works on Linux, Windows, and Mac) can version files, so it backs up all the copies of the file for the last 30 days.
You could do something by creating aliases to the tools you use for creating these file. You edit a file with the tools you tend to use, and the alias could create a copy before invoking a tool.
Otherwise, your choice is to use crontab to occasionally make backups.
Addendum
let me explain suppose i have directory /home/pavan now i create the file myfile.txt in that directory , immediately now i should automatically generate myfileCopy.txt file in the same folder
paven
There's no easy user tool that could do that. In fact, the way you stated it, it's not clear exactly what you want to do and why. Backups are done for two reasons:
To save an older version of the file in case I need to undo recent changes. In your scenario, I'm simply saving a new unchanged file.
To save a file in case of disaster. I want that file to be located elsewhere: On a different computer, maybe in a different physical location, or at least not on the same disk drive as my current file. In your case, you're making the backup in the same directory.
Tools like VIM can be set to automatically backup a file you're editing. This satisfy reason #1 stated above: To get back an older revision of the file. EMACs could create an infinite series of backups.
Tools like Dropbox create a backup of your file in a different location across the aether. This satisfies reason #2 which will keep the file incase of a disaster. Dropbox also versions files you save which also is reason #1.
Version control tools can also do both, if I remember to commit my changes. They store all changes in my file (reason #1) and can store this on a server in a remote location (reason #2).
I was thinking of crontab, but what would I backup? Backup any file that had been modified (reason #1), but that doesn't make too much sense if I'm storing it in the same directory. All I would have are duplicate copies of files. It would make sense to backup the previous version, but how would I get a simple crontab to know this? Do you want to keep the older version of a file, or only the original copy?
The only real way to do this is at the system level with tools that layer over the disk IO calls. For example, at one location, we used Netapps to create a $HOME/.snapshot directory that contained the way your directory looked every minute for an hour, every hour for a day, and every day for a month. If someone deleted a file or messed it up, there was a good chance that the version of the file exists somewhere in the $HOME/.snapshot directory.
On my Mac, I use a combination of Time Machine - which backs up the entire drive every hour, and gives me a snapshot of my drive that stretches back over a year and a half) and Dropbox which keeps my files stored in the main Dropbox server somewhere. I've been saved many times by that combination.
I now understand that this was an interview question. I'm not sure what was the position. Did the questioner want you to come up with a system wide way of implementing this, like a network tech position, or was this one of those brain leaks that someone comes up with at the spur of the moment when they interview someone, but were too drunk the night before to go over what they should really ask the applicant?
Did they want a whole discussion on what backups are for, and why backing up a file immediately upon creation in the same directory is a stupid idea non-optimal solution, or were they attempting to solve an issue that came up, but aren't technical enough to understand the real issue?

Vim editing diff file, updating diff ranges automatically

Suppose I generate diff of my project before commit, let say using svn. Having one diff for all files in project is a very nice way to review changes before committing them. However sometimes I wish to edit those changes without reopening each file, simple edit the diff and have it reapplied. So I have made such key map(I have setup svn diff to use unified format with fair amount of surrounding lines):
map scde :w! tmpdiff<cr>:!svn revert -R .<cr>:!patch -p0 <tmpdiff<cr>
It works, but only partially; you can edit added lines, but if you mark lines as removed or want to add another line you get some trouble since specified diff ranges do not match with actual text present in the diff. One can update them for simple changes like adding a line, however it is tedious and quickly gets complicated if you make more sophisticated changes. Is there a way to edit diff in so that range would automatically update correctly? I have found that emacs has some diff mode for this(however I have not tried it), however I was unable to find solution for my needs using vim. Maye there someone can give some suggestions?

take a look at rediff. It automatically fixes the offsets within a patch file.

Interactive program to selectively exclude parts of a diff file

Is there a program (preferably available on Cygwin) which I can use to "filter" a diff file interactively? i.e. I want something like git interactive add, except that I want to operate on a diff file. I have already discovered filterdiff, but I don't think that it supports interactive editing, only inclusion/exclusion of hunks based on a pre-defined search criteria.
My usage scenario: I have a patch in MQ, which I would like to split up per the tutorial here: https://www.mercurial-scm.org/wiki/MqTutorial#Split_a_patch_into_multiple_patches. In order to do so, I have to edit a patch file so that it includes only the (many) changes I want, and doing this manually with a text editor is kind of a pain.
Thanks!

You can probably get what you want using the record extension: https://www.mercurial-scm.org/wiki/RecordExtension
Apply the patch (but don't commit it) and then selectively commit chunk by chunk using record. You could do that with or without mq in the works.

Emacs’ diff-mode has commands to (un)apply or delete individual hunks of a diff. It also allows editing the hunks (keeping the hunk headers up to date automatically), and it has a hunk-split command that is a bit more powerful than that in git add -p.

What's the proper way to refactor a single file into multiple files and maintain version history in Perforce?

I have a file which has gotten too large, and should be refactored into two smaller files. What's the best way to do this in Perforce such that the relationship to the original file is maintained?
I'm adding two new files, and deleting the original in this case, but I would expect there to be some general solution to this problem.
I think the simplest case would be to add one new file which contains a subset of the content of the original, and delete that content from the original file, but leave it in place (it's trivial to delete it later anyway).
It would be nicest if the operation could be done in a single changelist to avoid any checkins which would break the build.

This can't be done in a single checking, but it can certainly be done without "breaking the build". Let's say you want to split bigFile.cs into smallFile1.cs and smallFile2.cs. First integrate the big file into the two little files and submit them.
p4 integrate bigFile.cs smallFile1.cs
p4 integrate bigFile.cs smallFile2.cs
p4 submit
You now have two extra files in your project directory, but they're doing no harm. Now check out smallFile1.cs and smallFile2.cs, and your project file(s). Add references to the smaller files, remove the reference to the big, edit the small files accordingly, etc. Finally, mark bigFile.cs for delete and submit your changes.
You have now split your big file into two smaller files and the smaller files' history will show you their big file origins.

You can use the integrate command.
When you've made changes to a file
that need to be propagated to another
file, start the process with p4
integrate.
The simplist form of the command would be
p4 integrate fromFile toFile
I would therefore perform the following tasks:
run p4 integrate with toFile being the new file and fromFile being the large, original file
p4 submit
p4 edit both fromFile and toFile to be the smaller versions of the original.
p4 submit
With this method, your file history information is kept in tact for all future revisions of the files.

This actually can be done with a single checkin. The steps are as follows:
integrate from the source file to both destination files as both raven and Scott Saad suggest
before submitting the new files, do a p4 edit on both files
make changes
p4 submit
The complete file history shows up in the revision graph and time-lapse views. The only disadvantage that I can see to skipping the intermediate submit is that the action type changes from 'integrate' to 'add'. Because of that, other people might not realize there is more to the file history.
I think I slightly prefer the two-checkin process.

perforce: create a local backup of current pendinglist

in perforce, i have a pending list with some changed files. now i want to revert to the base, but without loosing my changes, so i want to back them up somewhere. like saving the DIFFs of each file. at a later time, i want to restore those changes and continue my work.
is this possible? if so, how?
thanks!

there is no need for external tools at all, assuming you are on a unix machine (or have a proper cygwin setup under Windows, haven't tested it.) The only caveat is that Perforce's p4 diff produces an output that is slightly incompatible with patch, therefore you need it to point to your unix diff-command. In your client-root you can do
P4DIFF=/usr/bin/diff p4 diff -du > pending-changes.patch
optional (if you want to revert the open files from the command-line, otherwise use p4v):
p4 revert `p4 opened|awk -F\# '{print $1}'`
Later you would open the files for edit (can be automated by extracting the affected files from the patchfile pending-changes.patch and then:
patch < pending-patches
Depending on your path-layout in your client-root, you have to use the -p#num option of patch to get the patch applied cleanly.

You should be able to do a shelve. It's a way of saving a changelist for future editing. The link below is a Python add-in for Perforce that implements shelve. Also, I know that Practical Perforce has a couple of ways to shelve current changes without an external script. I don't have the book in front of me but I'll try to update this question tonight when I do.
http://public.perforce.com/wiki/P4_Shelve

Linked to from P4Shelve, is P4tar which looks very useful, and does the operations on the client, rather than branching on the server.
Certainly I'll be looking into doing similar things soon.

See also this question:
Sending my changelist to someone else without checking in.
It's basically the same thing.

Create a branch of these files in some appropriate location
Check out the branch versions of the files you have edited
Copy the edited files over from the trunk and submit them
Revert the files on the trunk
Now you've got those "diffs" you wanted safely archived. When your ready to apply those changes later on, just integrate them back into the trunk.
This is what the Python script, that Brett mentioned, does. This is the way to do it manually without any special tools.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string