How does one delete a stream depot in Perforce?

Basically, what I want to do is delete a depot from the server. I have access to both P4Admin and P4Helix.
Now I understand that I must first delete all workspaces, then the streams, and only then the depot. But I am looking for a workaround to this process (personally I find it tedious; what if someone left the company and won't cooperate?). Is there one?
Can I force-delete a depot and its streams without going to every single user who may have used this depot and asking them to delete their workspaces for it?

First, obliterate all the files in the depot.
Next, delete all the clients of all the streams in the depot.
Next, delete all the streams in the depot.
Last, delete the depot.
This sort of thing is always easier to do via the CLI than P4Admin, IMO. Here are the four command lines that do those four things:
p4 obliterate -y //yourdepot/...
p4 -F "clients -S %stream%" streams //yourdepot/... | p4 -F "client -df %domainName%" -x - run | p4 -x - run
p4 -F "stream -d %stream%" streams //yourdepot/... | p4 -x - run
p4 depot -d yourdepot
Note that you need admin/super access to run most of these commands. p4 client -df is what allows you to force the deletion of a client that you don't own.
You may also need to run the third command (the one that deletes the streams) a couple of times, since child streams need to be deleted before their parents. You could write a clever script that does a graph search and then deletes the streams in a bottom-up order, but it's a lot easier to just brute force it by running multiple passes, since each time through you'll take out at least one entire layer of the hierarchy. :)
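If you'd rather script that repeated pass, a rough sketch (untested, and reusing the same %stream% formatting field as the command above) might be:
# Bounded number of passes; stop early once no streams remain.
for pass in 1 2 3 4 5; do
  [ -z "$(p4 streams //yourdepot/...)" ] && break
  p4 -F "stream -d %stream%" streams //yourdepot/... | p4 -x - run
done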
(I agree it'd be nice if there were more of a one-shot way of doing this, but the command line at least makes it easy to plumb queries together so you don't have to hunt each thing down manually.)

Related

gsutil copy with multithread doesn't finish copying all files

We have around 650 GB of data on Google Compute Engine.
We need to move it to a Coldline bucket in Cloud Storage, so the best option we could find is to copy it with gsutil in parallel mode.
The files range from a few kilobytes to 10 MB max, and there are a few million of them.
The command we used is
gsutil -m cp -r userFiles/ gs://removed-websites/
On the first run it copied around 200 GB and stopped with this error:
| [972.2k/972.2k files][207.9 GiB/207.9 GiB] 100% Done 29.4 MiB/s ETA 00:00:00
Operation completed over 972.2k objects/207.9 GiB.
CommandException: 1 file/object could not be transferred.
On the second run it got to almost the same place and stopped again.
How can we copy these files successfully?
Also, the buckets that contain the partial data are not being removed after we delete them. The console just says "preparing to delete" and nothing happens; we waited more than 4 hours. Is there any way to remove those buckets?
Answering your first question, I can propose several options, all of them based on splitting the data and uploading it in small portions.
You can try a distributed upload from several machines.
https://cloud.google.com/storage/docs/gsutil/commands/cp#copying-tofrom-subdirectories-distributing-transfers-across-machines
In this case you split the data into safe chunks, say 50 GB each, and upload them from several machines in parallel. The drawback is that it requires extra machines.
You can still try such a split upload on a single machine, but you then need some splitting mechanism that uploads the files in chunks rather than all at once. That way, if something fails, you only have to re-upload that chunk, and it is also easier to localize where a failure happened.
As for deleting the buckets: use the same technique. Divide the data into chunks and delete it chunk by chunk. Or you can try to remove the whole project, if that suits your situation.
Update 1
So, I checked the gsutil interface and it supports glob syntax. You can match, for example, 200 folders with a glob pattern and launch the command 150 times (this will upload 200 x 150 = 30,000 folders).
You can combine this approach with the -m option; this is partly what your script already did, but it might work faster. It works for folder names as well as file names.
If you provide examples of the folder and file names, it would be easier to propose an appropriate glob pattern.
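For illustration only, a rough sketch of such a chunked upload; the prefixes here are made up and would have to match how your folders under userFiles/ are actually named:
#!/bin/bash
# Upload one prefix group at a time so a failed run only needs that chunk re-sent.
for prefix in a b c d e f g h; do
  gsutil -m cp -r "userFiles/${prefix}"* gs://removed-websites/ \
    || echo "chunk ${prefix} failed" >> failed_chunks.log
done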
It could be that you are affected by gsutil issue 464. This happens when you run multiple gsutil instances concurrently with the -m option. Apparently these instances share a state directory, which causes weird behavior.
One of the workarounds is to add parameters: -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24.
E.g.:
gsutil -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24 -m cp -r gs://my-bucket .
I've just run into the same issue, and it turns out that it was caused by the cp command running into an uncopyable file (in my case, a broken symlink) and aborting.
The problem is that if you're running a massively parallel copy with -m, the broken file may not be immediately obvious. To figure out which one it is, try a dry run with rsync -n instead:
gsutil -m rsync -n -r userFiles/ gs://removed-websites/
This will clearly flag the broken file and abort, and you can fix or delete it and try again. Alternatively, if you're not interested in symlinks, just use the -e option and they'll be ignored entirely.
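For example, building on the original command (gsutil cp's -e flag skips symlinks):
gsutil -m cp -e -r userFiles/ gs://removed-websites/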

Ansible: copying one unique file to each server in a group

I have a series of numbered files, each to be processed separately by one server. Each split file was made using the Linux split command and then xz-compressed to save transfer time.
split_001 split_002 split_003 ... split_030
How can I push these files out to a group of 30 servers with Ansible? It does not matter which server gets which file, as long as each server gets a single unique file.
I had been using a bash script, but I am looking for a better solution, hopefully with Ansible. I then plan to run a shell task that uses an at command to kick off the several hours or days of computation.
scp -oStrictHostKeyChecking=no bt_5869_001.xz usr13@<ip>:/data/
scp -oStrictHostKeyChecking=no bt_5869_002.xz usr13@<ip>:/data/
scp -oStrictHostKeyChecking=no bt_5869_003.xz usr13@<ip>:/data/
...
http://docs.ansible.com/ansible/copy_module.html
# copy file but iterate through each of the split files
- copy: src=/mine/split_001.xz dest=/data/split_001.xz
- copy: src=/mine/compute dest=/data/ owner=root mode=0755
- copy: src=/mine/start.sh dest=/data/ owner=root mode=0755
- shell: xz -d *.xz
- shell: at -f /data/start.sh now
For example:
tasks:
  - set_fact:
      # +1 because the split files are numbered 001..030 while the host index starts at 0
      padded_host_index: "{{ '{0:03d}'.format(play_hosts.index(inventory_hostname) + 1) }}"
  - copy: src=/mine/split_{{ padded_host_index }}.xz dest=/data/
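You would then run the play against the 30-host group as usual; the inventory and playbook names below are just placeholders:
ansible-playbook -i hosts distribute_splits.yml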
You can do this with Ansible. However, this seems like the wrong general approach to me.
You have a number of jobs. You need them each to be processed, and you don't care which server processes which job as long as they only process each job once (and ideally do the whole batch as efficiently as possible). This is precisely the situation a distributed queueing system is designed to work in.
You'll have workers running on each server and one master node (which may run on one of the servers) that knows about all of the workers. When you need to add tasks to get done, you queue them up with the master, and the master distributes them out to workers as they become available - so you don't have to worry about having an equal number of servers as jobs.
There are many, many options for this, including beanstalkd, Celery, Gearman, and SQS. You'll have to do the legwork to find out which one works best for your situation. But this is definitely the architecture best suited to your problem.

How to delete a feature branch (or any branch) in Perforce?

I'm a Perforce newbie and I'm just starting to familiarize myself with Perforce's branching functionality. One thing I do not understand is how to delete a feature branch after I'm done working with it and the changes have been merged back into the mainline, the way you would delete a feature branch in Git.
Can you delete branches in Perforce, or do they remain there permanently?
If it's a task stream (which is what I'd recommend for a short-lived "feature branch" type stream), you probably want to "unload" it:
p4 unload -s //depot/task_stream
This is basically like deleting the stream with "p4 stream -d", except that you can get it back later if you want to. As with "p4 stream -d", it also doesn't get rid of all of the files in the stream; the ones that you modified stay in the depot (so that you can follow the merge records back to the original submits if you want to), but all the unmodified files are unloaded (whereas with "stream -d" they're gone and there isn't any convenient record of what exact version they matched in the parent -- you can reconstruct it after the fact but it's harder). Using "p4 reload" brings the task stream back to life.
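For example, a sketch using the same hypothetical stream path as above:
p4 reload -s //depot/task_stream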
If it's a normal stream and/or you want to get rid of it forever including the original changes in its depot path, you need to be an administrator (submitted changes in Perforce are generally considered Very Important and immutable unless you're an admin) and use the "obliterate" command, followed by deleting the stream spec:
p4 obliterate -y //depot/your_stream/...
p4 stream -d //depot/your_stream
Given your description I'd definitely recommend using task streams for features and "unloading" them when you're done.
If you're not using streams at all, the standard practice with branches is to either just leave them when you're done with them, or to reuse them (i.e. have an ongoing development branch that you merge into the mainline as you complete each feature). You can obliterate a branch (as described above in the stream example), but since this requires admin permissions it's not typical.

How to copy multiple files simultaneously using scp

I would like to copy multiple files simultaneously to speed up my process. I currently use the following:
scp -r root@xxx.xxx.xx.xx:/var/www/example/example.example.com .
but it only copies one file at a time. I have a 100 Mbps fibre connection, so I have the bandwidth to copy a lot more at the same time. Please help.
You can use background tasks with the wait command.
The wait command ensures that all the background tasks are completed before processing the next line, i.e. echo will be executed only after the scp commands for all three nodes have completed.
#!/bin/bash
scp -i anuruddha.pem myfile1.tar centos@192.168.30.79:/tmp &
scp -i anuruddha.pem myfile2.tar centos@192.168.30.80:/tmp &
scp -i anuruddha.pem myfile.tar centos@192.168.30.81:/tmp &
wait
echo "SCP completed"
SSH can do so-called "multiplexing": multiple sessions over a single connection to one server. That can be one way to get what you want; look up keywords like "ControlMaster".
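A minimal sketch of the relevant ~/.ssh/config settings (the host alias and socket path here are placeholders):
Host example-server
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m
With this in place, the first ssh/scp to the host opens a master connection and later sessions reuse it instead of paying the connection setup cost again.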
The second way is to use more connections and send each job to the background:
for file in file1 file2 file3 ; do
scp $file server:/tmp/ &
done
But this answers your question of how to copy multiple files simultaneously. To speed things up, you can use weaker encryption (rc4 etc.), and don't forget that the bottleneck can be your hard drive, since scp doesn't implicitly limit the transfer speed.
The last option is to use rsync; in some cases it can be a lot faster than scp...
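For example, a sketch reusing the path from the question (rsync over SSH; add -z only if the data compresses well):
rsync -av -e ssh root@xxx.xxx.xx.xx:/var/www/example/example.example.com .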
I am not sure if this helps you, but I generally archive the files at the source (compression is not required; just archiving is sufficient), download the archive, and extract it. This speeds up the process significantly.
Before archiving it took > 8 hours to download 1GB
After archiving it took < 8 minutes to do the same
You can use parallel-scp (AKA pscp): http://manpages.ubuntu.com/manpages/natty/man1/parallel-scp.1.html
With this tool, you can copy a file (or files) to multiple hosts simultaneously.
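A rough usage sketch (the hosts.txt file listing the target machines and the file name are assumptions):
parallel-scp -h hosts.txt example.tar.gz /tmp/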
100mbit Ethernet is pretty slow, actually. You can expect 8 MiB/s in theory. In practice, you usually get between 4-6 MiB/s at best.
That said, you won't see a speed increase if you run multiple sessions in parallel. You can try it yourself, simply run two parallel SCP sessions copying two large files. My guess is that you won't see a noticeable speedup. The reasons for this are:
The slowest component on the network path between the two computers determines the max. speed.
Other people might be accessing example.com at the same time, reducing the bandwidth that it can give you
100mbit Ethernet requires pretty big gaps between two consecutive network packets. GBit Ethernet is much better in this regard.
Solutions:
Compress the data before sending it over the wire
Use a tool like rsync (which uses SSH under the hood) to copy only the files which have changed since the last time you ran the command.
Creating a lot of small files takes a lot of time. Try to create an archive of all the files on the remote side and send that as a single archive.
The last suggestion can be done like this:
ssh root@xxx "cd /var/www/example ; tar cf - example.example.com" > example.com.tar
or with compression:
ssh root@xxx "cd /var/www/example ; tar czf - example.example.com" > example.com.tar.gz
Note: bzip2 compresses better but slower. That's why I use gzip (z) for tasks like this.
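On the receiving side you would then unpack the archive, or (a variation not in the original answer) stream it straight into a local tar and skip the intermediate file:
tar xzf example.com.tar.gz
ssh root@xxx "cd /var/www/example ; tar czf - example.example.com" | tar xzf -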
If you specify multiple files, scp will download them sequentially:
scp -r root@xxx.xxx.xx.xx:/var/www/example/file1 root@xxx.xxx.xx.xx:/var/www/example/file2 .
Alternatively, if you want the files to be downloaded in parallel, then use multiple invocations of scp, putting each in the background.
#! /usr/bin/env bash
scp root@xxx.xxx.xx.xx:/var/www/example/file1 . &
scp root@xxx.xxx.xx.xx:/var/www/example/file2 . &
wait  # block until both background transfers finish

Syncing two files when one is still being written to

I have an application (video stream capture) which constantly writes its data to a single file. The application typically runs for several hours, creating a ~1 gigabyte file. Soon (within several seconds) after it quits, I'd like to have 2 copies of the file it was writing: say, one in /mnt/disk1 and another in /mnt/disk2 (the latter is a USB flash drive with a FAT32 filesystem).
I don't really like the idea of modifying the application to write 2 copies simultaneously, so I thought of:
Application starts and begins to write the file (let's call it /mnt/disk1/file.mkv)
Some utility starts, copies what's already there in /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
After getting initial sync state, it continues to follow a written file in a manner like tail -f does, copying everything it gets from /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
Several hours pass
Application quits, we stop our syncing utility
Afterwards, we run a quick rsync /mnt/disk1/file.mkv /mnt/disk2/file.mkv just to make sure they're the same. If they are already identical, it should just do a quick check and quit fairly soon.
What is the best approach for syncing the 2 files, preferably using simple Linux shell utilities? Maybe I could use some clever trick with FUSE / an md device / tee / tail -f?
Solution
The best possible solution for my case seems to be
mencoder ... -o >(
tee /mnt/disk1/file.mkv |
tee /mnt/disk2/file.mkv |
mplayer -
)
This one uses bash/zsh-specific magic called "process substitution", eliminating the need to create named pipes manually with mkfifo, and it displays what's being encoded as a bonus :)
Hmmm... the file is not usable while it's being written, so why don't you "trick" your program into writing through a pipe/FIFO and use a second, very simple program to create the 2 copies?
This way, you have your two copies as soon as the original process ends.
Read the manual page on tee(1).
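A rough sketch of that FIFO approach (the FIFO path is made up, and the capture application would have to be pointed at it instead of the real output file):
mkfifo /tmp/capture.fifo
# tee reads from the FIFO and writes both copies; start it before the capture begins
tee /mnt/disk1/file.mkv /mnt/disk2/file.mkv < /tmp/capture.fifo > /dev/null &
# ...then launch the capture application with /tmp/capture.fifo as its output file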
