Can multiple users commit to SVN simultaneously? Is it thread safe?

The application must be designed in such a way that it supports multiple users committing to the SVN repository at the same time. I'm done with the application and the related work; however, I'm stuck on this multi-user part. How can I achieve this? I saw somewhere that you have to instantiate a separate SVNRepository driver for every thread, which suggests it's not thread safe... or maybe I'm getting the whole thing wrong. Any help on this issue is appreciated. Thanks.
I got the above info (the part in italics) from here.

I'm an SVNKit developer, so let me explain how things work.
The SVNRepository class represents one SVN connection with its own credentials. It is not thread-safe, which means you can only perform sequential operations on it. See this article for more details:
http://vcs.atspace.co.uk/2012/09/21/are-svnkit-methods-reenterable/
So if your application tries to create several commits at the same time, you should use several independent SVNRepository instances. The good news is that no special synchronization code is required; all synchronization is performed on the server side. More good news: when a commit for a certain SVNRepository object is finished or cancelled, you can reuse the connection to start another commit. But note that if you use the http protocol, you can't reuse the same connection to commit on behalf of another user, even if you change the credentials for the connection (SVNRepository#setAuthenticationManager).
To create a commit without a working copy, use SVNRepository#getCommitEditor, which starts the commit transaction. To stop the transaction, use either ISVNCommitEditor#closeEdit or ISVNCommitEditor#abortEdit; you can't perform other operations on the SVNRepository instance until the commit transaction is finished.
The ISVNCommitEditor instance should describe your virtual working copy: it tells SVNKit about your current knowledge of the latest working copy state. If that description doesn't correspond to the real latest state, you get a "File or directory is out of date; try updating" error.
http://vcs.atspace.co.uk/2012/07/20/subversion-remote-api-committing-without-working-copy/
You can use -1 instead of the real revision in ISVNEditor#openFile/openDir to disable these checks, but that can cause another problem: you could overwrite changes without knowing about them.
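For reference, here is a minimal sketch of the commit-editor workflow described above (in Java, since SVNKit is a Java library). It is not the author's code; the repository URL, credentials, file name, and contents are placeholders, and error handling is reduced to aborting the edit:

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;

    import org.tmatesoft.svn.core.SVNCommitInfo;
    import org.tmatesoft.svn.core.SVNURL;
    import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
    import org.tmatesoft.svn.core.io.ISVNEditor;
    import org.tmatesoft.svn.core.io.SVNRepository;
    import org.tmatesoft.svn.core.io.SVNRepositoryFactory;
    import org.tmatesoft.svn.core.io.diff.SVNDeltaGenerator;
    import org.tmatesoft.svn.core.wc.SVNWCUtil;

    public class CommitWithoutWorkingCopy {
        public static void main(String[] args) throws Exception {
            DAVRepositoryFactory.setup(); // enable http/https access

            // One SVNRepository per thread and per user -- do not share instances across threads.
            SVNRepository repository = SVNRepositoryFactory.create(
                    SVNURL.parseURIEncoded("http://svn.example.com/repos/project"));
            repository.setAuthenticationManager(
                    SVNWCUtil.createDefaultAuthenticationManager("user", "password"));

            ISVNEditor editor = repository.getCommitEditor("Add greeting.txt", null);
            try {
                editor.openRoot(-1);                      // -1 skips the out-of-date check
                editor.addFile("greeting.txt", null, -1); // new file, no copy source
                editor.applyTextDelta("greeting.txt", null);
                SVNDeltaGenerator deltaGenerator = new SVNDeltaGenerator();
                String checksum = deltaGenerator.sendDelta("greeting.txt",
                        new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)), editor, true);
                editor.closeFile("greeting.txt", checksum);
                editor.closeDir();                        // close the root directory
                SVNCommitInfo info = editor.closeEdit();  // finishes the commit transaction
                System.out.println("Committed r" + info.getNewRevision());
            } catch (Exception e) {
                editor.abortEdit();                       // abandon the transaction on failure
                throw e;
            }
        }
    }

To commit on behalf of several users concurrently, each thread would build its own SVNRepository instance like the one above, with its own authentication manager.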
Another option is to commit using real working copies and real changes on the filesystem (using SvnOperationFactory#createCommit). But even in that case, have a look at the first link to learn which objects can and can't be reused across threads.
Hope this helps; if you have other questions, feel free to ask on the SVNKit mailing list.


Perforce change-submit trigger to run script on client

I figured I'd post here, after posting on SuperUser, since I want to get input from software developers who might have encountered this scenario before!
I would like to initiate a series of validation steps on the client side on files opened within a changelist before allowing the changelist to be submitted.
For example, I wish to ensure that if a file is opened for add, edit, or remove as part of a changelist, a particular related file will be treated appropriately based on a matrix of conditions for that corresponding file:
Corresponding file being opened for add/edit/remove
Corresponding file existing on disk vs. not existing on disk
Corresponding file existing in depot vs. not existing in depot
Corresponding file having been changed vs. not having been changed relative to depot file
These validation steps must be initiated before the submit is accepted by the Perforce server. Furthermore, the validation must be performed on the client side since I must be able to reconcile offline work with the copies on clients' disks.
Environment:
Perforce 2017.2 server
MacOS and Windows computers submitting to different branches
Investigative Avenues Already Covered
Initial design was a strictly client-side custom tool, but this is not ideal since this would be a change of the flow that users are familiar with, and I would also have to implement a custom GUI.
Among other approaches, I considered creating triggers in 2017.2; however, even if I were to use a change-content trigger with all the changelist files available on the server, I would not be able to properly perform the validation and remediation steps.
Another possibility would be using a change-submit trigger and to use the trigger script variables in 2017.2 to get the client's IP, hostname, client's current working directory, etc. so that you could run a script on the server to try to connect remotely to the client's computer. However, running any script on the client's computer and in particular operating on their local workspace would require credentials that most likely will not be made available.
I would love to use a change-submit trigger on the Perforce server to initiate a script/bundled executable on the client's computer to perform p4 operations on their workspace to complete the validation steps. However, references that I've found (albeit from years ago) indicate that this is not possible:
https://stackoverflow.com/a/16061840
https://perforce-user.perforce.narkive.com/rkYjcQ69/p4-client-side-submit-triggers
Updating files with a Perforce trigger before submit
Thank you for reading and in advance for your help!
running any script on the client's computer and in particular operating on their local workspace would require credentials that most likely will not be made available.
This is the crux of it -- the Perforce server is not allowed to send the client arbitrary code to execute. If you want that type of functionality, you'd have to punch your own security hole in the client (and then come up with your own way of making sure it's not misused), and it sounds like you've already been down that road and decided it's not worth it.
Initial design was a strictly client-side custom tool, but this is not ideal since this would be a change of the flow that users are familiar with, and I would also have to implement a custom GUI.
My recommendation would be to start with that approach and then look for ways to decrease friction. For example, you could use a change-submit trigger to detect whether the user skipped the custom workflow (perhaps by having the custom tool put a token in the change description for the trigger to validate), and then give them an error message that puts them back on track, like "Please run Tools > Change Validator, or contact wanda@yourdomain.com for help".
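As a rough illustration of that idea (a sketch only: the trigger name, token format, paths, and helper tool are all hypothetical), a change-submit trigger could read the in-flight change spec with the p4 command line and reject the submit when the token is missing:

    #!/usr/bin/env python3
    # Hypothetical change-submit trigger: reject submits whose description lacks
    # the token written by the client-side validation tool.
    # Example trigger table entry (server side):
    #   check-token change-submit //depot/... "python3 /p4/triggers/check_token.py %change%"
    import re
    import subprocess
    import sys

    TOKEN_RE = re.compile(r"\[validated:[0-9a-f]{8}\]")  # token format is made up for this sketch

    def main(change):
        # Read the pending change spec; its Description field should carry the token.
        spec = subprocess.run(["p4", "change", "-o", change],
                              capture_output=True, text=True, check=True).stdout
        if TOKEN_RE.search(spec):
            return 0
        print("Please run Tools > Change Validator before submitting, "
              "or contact wanda@yourdomain.com for help.")
        return 1  # a non-zero exit rejects the submit and shows the message to the user

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1]))

This doesn't make the validation tamper-proof (a user could paste the token by hand); it just nudges people back into the client-side workflow.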

Can I check via script if my NotesView is corrupt?

Last week I worked on an incident where it was stated that a Java agent had caused a server to terminate. The first diagnosis was that objects were probably not being recycled sufficiently.
After some testing (and bringing down the server multiple times :-) ), I noticed in my Notes client that the view was corrupt.
I could have avoided this if I had been able to check whether a view is OK or not.
For a database I can check if it exists.
For a view I can check if it exists.
But can I also check whether a view is in good condition, or is only a client (Notes, Admin) capable of doing this?
I wish there were a programmatic way, Patrick. The fixup task (load fixup -C) is one of the surefire ways to get details of corruption, but it's not helpful to you in this scenario.

Lotus notes running or not? using registry

I have an application written around the Lotus Notes client. I want to check whether Lotus Notes is running before starting my application, so that I can skip asking the user for a password if "Don't prompt for password from other notes-based programs" is checked.
One method is to get all the running processes and look for the nlnotes.exe and notes2.exe processes to confirm.
Is there any other method to achieve the same?
To be more specific, I want to know whether any registry entries are made to indicate that Notes is currently running. We can't open two instances of the Notes client, which made me think IBM might have used a registry entry to check for a running instance.
Kindly correct me if I'm wrong.
The registry would not be a good place for info like that, because if the client crashed the registry data would need to be cleaned up. The same is true for lock files. So while I can't say for sure, I believe IBM detects whether the client is already running by looking for in-memory objects - e.g., shared memory sections, mutexes, etc. Using Process Explorer, I see several shared memory sections associated with the Notes processes. One likely candidate is a section called -LTSCS-22275429-MEM9, but I don't know how that name is generated, if it ever changes with reinstall, reboot, etc. It would take a fair amount of experimentation to determine that - and then of course one would have to figure out how to write the code to detect it, but that's my best guess as to how it's done.
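If you do fall back to the process check mentioned in the question, a minimal sketch (assuming Python with the third-party psutil package is available to your application; otherwise the same idea applies with whatever process-enumeration API you have) might look like:

    import psutil  # third-party: pip install psutil

    NOTES_PROCESSES = {"nlnotes.exe", "notes2.exe"}

    def notes_is_running() -> bool:
        """Return True if a Lotus Notes client process appears to be running."""
        for proc in psutil.process_iter(["name"]):
            try:
                if (proc.info["name"] or "").lower() in NOTES_PROCESSES:
                    return True
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue  # process vanished or is inaccessible; skip it
        return False

    if __name__ == "__main__":
        print("Notes running:", notes_is_running())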

Process text files ftp'ed into a set of directories in a hosted server

The situation is as follows:
A series of remote workstations collect field data and FTP it to a server. The data is sent as a CSV file, which is stored in a unique directory for each workstation on the FTP server.
Each workstation sends a new update every 10 minutes, causing the previous data to be overwritten. We would like to somehow concatenate or store this data automatically. The workstations' processing power is limited and cannot be extended, as they are embedded systems.
One suggestion was to run a cron job on the FTP server; however, the terms of service only allow cron jobs at 30-minute intervals, since it's shared hosting. Given the number of workstations uploading and the 10-minute interval between uploads, the cron job's 30-minute limit between runs looks like it might be a problem.
Is there any other approach that might be suggested? The available server-side scripting languages are Perl, PHP, and Python.
Upgrading to a dedicated server might be necessary, but I'd still like to get input on how to solve this problem in the most elegant manner.
Most modern Linux systems support inotify, which lets your process know when the contents of a directory have changed, so you don't even need to poll.
Edit: With regard to the comment below from Mark Baker:
"Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files."
That will happen with the inotify watch you set on the directory level - the way to make sure you then don't pick up the partial file is to set a further inotify watch on the new file and look for the IN_CLOSE event so that you know the file has been written to completely.
Once your process has seen this, you can delete the inotify watch on this new file, and process it at your leisure.
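A minimal sketch of that idea in Python with pyinotify (the upload path is a placeholder; watching the directory for IN_CLOSE_WRITE events is a slight shortcut that covers the common case of a file being written once and then closed, without needing a per-file watch):

    # Process each upload only after the writer has closed it.
    import pyinotify

    UPLOAD_ROOT = "/home/ftp/stations"  # placeholder path

    class UploadHandler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            # Fired when a file opened for writing is closed, i.e. the upload finished.
            print("upload complete:", event.pathname)
            # ...append event.pathname to the per-station archive here...

    wm = pyinotify.WatchManager()
    wm.add_watch(UPLOAD_ROOT, pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
    pyinotify.Notifier(wm, UploadHandler()).loop()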
You might consider a persistent daemon that keeps polling the target directories:
    # grab_lockfile(), new_files(), and process_new_files() are yours to write.
    use strict;
    use warnings;

    grab_lockfile() or exit;   # another instance already holds the lock

    while (1) {
        if ( new_files() ) {
            process_new_files();
        }
        sleep 60;
    }
Then your cron job can just try to start the daemon every 30 minutes. If the daemon can't grab the lockfile, it just dies, so there's no worry about multiple daemons running.
Another approach to consider would be to submit the files via HTTP POST and then process them via a CGI. This way, you guarantee that they've been dealt with properly at the time of submission.
The 30-minute limitation is pretty silly, really. Starting processes in Linux is not an expensive operation, so if all you're doing is checking for new files, there's no good reason not to do it more often than that. We have cron jobs that run every minute and they don't have any noticeable effect on performance. However, I realise it's not your rule, and if you're going to stick with that hosting provider you don't have a choice.
You'll need a long-running daemon of some kind. The easy way is to just poll regularly, and that's probably what I'd do. Inotify, which notifies you as soon as a file is created, is a better option.
You can use inotify from Perl with Linux::Inotify, or from Python with pyinotify.
Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files.
With polling it's less likely you'll see partial files, but it will happen eventually and will be a nasty hard-to-reproduce bug when it does happen, so better to deal with the problem now.
If you're looking to stay with your existing FTP server setup, then I'd advise using something like inotify or a daemonized process to watch the upload directories. If you're OK with moving to a different FTP server, you might take a look at pyftpdlib, which is a Python FTP server library.
I've been part of the pyftpdlib dev team for a while, and one of the more common requests was for a way to "process" files once they've finished uploading. Because of that, we created an on_file_received() callback method that's triggered on completion of an upload (see issue #79 on our issue tracker for details).
If you're comfortable in Python, then it might work out well for you to run pyftpdlib as your FTP server and run your processing code from the callback method. Note that pyftpdlib is asynchronous and not multi-threaded, so your callback method must not block. If you need to run long-running tasks, I would recommend using a separate Python process or thread for the actual processing work.
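A minimal sketch of that setup (the user name, password, paths, and port are placeholders; see the pyftpdlib docs for the full configuration options):

    from pyftpdlib.authorizers import DummyAuthorizer
    from pyftpdlib.handlers import FTPHandler
    from pyftpdlib.servers import FTPServer

    class StationHandler(FTPHandler):
        def on_file_received(self, file):
            # 'file' is the absolute path of the completed upload; keep this quick
            # (or hand it off to another process), since the server is asynchronous.
            print("received:", file)

    authorizer = DummyAuthorizer()
    authorizer.add_user("station1", "secret", "/srv/ftp/station1", perm="elradfmw")

    StationHandler.authorizer = authorizer
    FTPServer(("0.0.0.0", 2121), StationHandler).serve_forever()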

What's the best way to keep multiple Linux servers synced?

I have several different locations in a fairly wide area, each with a Linux server storing company data. This data changes every day in different ways at each different location. I need a way to keep this data up-to-date and synced between all these locations.
For example:
In one location someone places a set of images on their local server. In another location, someone else places a group of documents on their local server. A third location adds a handful of both images and documents to their server. In two other locations, no changes are made to their local servers at all. By the next morning, I need the servers at all five locations to have all those images and documents.
My first instinct is to use rsync and a cron job to do the syncing overnight (1 a.m. to 6 a.m. or so), when none of the bandwidth at our locations is being used. It seems to me that it would work best to have one server be the "central" server, pulling in all the files from the other servers first, and then pushing those changes back out to each remote server. Or is there another, better way to perform this function?
The way I do it (on Debian/Ubuntu boxes):
Use dpkg --get-selections to get your installed packages
Use dpkg --set-selections to install those packages from the list created
Use a source control solution to manage the configuration files. I use git in a centralized fashion, but subversion could be used just as easily.
An alternative if rsync isn't the best solution for you is Unison. Unison works under Windows and it has some features for handling when there are changes on both sides (not necessarily needing to pick one server as the primary, as you've suggested).
Depending on how complex the task is, either may work.
One thing you could (theoretically) do is create a script using Python or something and the inotify kernel feature (through the pyinotify package, for example).
You can run the script, which registers to receive events on certain trees. Your script could then watch directories, and then update all the other servers as things change on each one.
For example, if someone uploads spreadsheet.doc to the server, the script sees it instantly; if the document doesn't get modified or deleted within, say, 5 minutes, the script could copy it to the other servers (e.g. through rsync).
A system like this could theoretically implement a sort of limited 'filesystem replication' from one machine to another. Kind of a neat idea, but you'd probably have to code it yourself.
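A rough sketch of the "settle, then replicate" part (peer hostnames, paths, and the settle window are placeholders; detecting the change in the first place is assumed to be done with pyinotify, as described above):

    import os
    import subprocess
    import time

    PEERS = ["server2.example.com", "server3.example.com"]  # placeholder hostnames
    SETTLE_SECONDS = 5 * 60

    def replicate_when_quiet(path, dest_dir="/srv/shared/"):
        # Wait until the file hasn't been modified for SETTLE_SECONDS, then push it.
        while time.time() - os.path.getmtime(path) < SETTLE_SECONDS:
            time.sleep(30)
        for host in PEERS:
            subprocess.run(["rsync", "-az", path, f"{host}:{dest_dir}"], check=True)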
AFAIK, rsync is your best choice; it supports partial file updates among a variety of other features. Once set up, it is very reliable. You can even set up the cron job with timestamped log files to track what is updated in each run.
I don't know how practical this is, but a source control system might work here. At some point (perhaps each hour?) during the day, a cron job runs a commit, and overnight, each machine runs a checkout. You could run into issues with a long commit not being done when a checkout needs to run, and essentially the same thing could be done with rsync.
I guess what I'm thinking is that a central server would make your sync operation easier - conflicts can be handled once on central, then pushed out to the other machines.
rsync would be your best choice. But you need to carefully consider how you are going to resolve conflicts between updates to the same data on different sites. If site-1 has updated 'customers.doc' and site-2 has a different update to the same file, how are you going to resolve it?
I have to agree with Matt McMinn; especially since it's company data, I'd use source control and, depending on the rate of change, run it more often.
I think the central clearinghouse is a good idea.
It depends on the following:
* How many servers/computers need to be synced?
** If there are too many servers, using rsync becomes a problem.
** Either you use threads and sync to multiple servers at the same time, or you sync them one after the other. In the first case you are looking at high load on the source machine; in the latter case, inconsistent data on the servers (in a cluster) at a given point in time.
* The size of the folders that need to be synced, and how often they change.
** If the data is huge, rsync will take time.
* The number of files.
** If the number of files is large, and especially if they are small files, rsync will again take a lot of time.
So it all depends on the scenario: rsync, NFS, or version control.
If there are only a few servers and just a small amount of data, then it makes sense to run rsync every hour. You can also package the content into an RPM if the data changes only occasionally.
With the information provided, IMO version control will suit you best. Rsync/scp might give problems if two people upload different files with the same name, and NFS over multiple locations needs to be architected to perfection.
Why not have one or more central repositories that everyone commits to? Then all you need to do is keep the repositories in sync. If the data is huge and updates are frequent, your repository server will need a good amount of RAM and a good I/O subsystem.
