If I want to transfer data using RPC or component technology, but the size of the data can be very big, how should I deal with this situation?
For example, I want to transfer a file to a remote machine as a parameter, but I don't want to load the whole file into memory for the transfer. How should I do this?
I think you should consider a file transfer solution, something like establishing an FTP connection in the background and making the operations that are supposed to work on the file data wait until the transfer completes. You should also take care of the correctness of the transferred data, with checksumming for instance. Another solution is mounting the remote directory containing the files as a local volume, or even setting up a distributed file system if you have all the files in one place and you are running Linux.
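A rough sketch of that first approach, assuming shell access on both ends (the host name, credentials, paths and the processing command are all placeholders):

# compute a checksum next to the file before sending it
sha256sum bigfile.bin > bigfile.bin.sha256

# upload data and checksum over plain FTP (curl supports FTP uploads)
curl -T bigfile.bin ftp://user:pass@remote.example.com/incoming/
curl -T bigfile.bin.sha256 ftp://user:pass@remote.example.com/incoming/

# on the remote side, start processing only once the checksum verifies
ssh user@remote.example.com 'cd /srv/incoming && sha256sum -c bigfile.bin.sha256 && ./process bigfile.bin'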
Let me answer my own question.
The answer is MTOM (Message Transmission Optimization Mechanism); make sure the framework you are using supports it.
So this is a bit of a question of theory as well as a specific (temporary) use case.
Two servers are to be kept in sync with each other: one on-site, the other an off-site backup.
However, the off-site server should have the data duplicated and accessible if need be (not just storing archive images of server1).
server1 and server2 are connected over the internet via a VPN connection.
server1 uses ZFS RAID 10.
server2 uses ext4 RAID 5 (temporary setup, will be replaced in the future with ZFS, at which point this use case vanishes).
Can you take a ZFS snapshot on server1, send it to server2 and have it unpacked/applied to the RAID 5 array, essentially duplicating server1 via incremental snapshots?
I know that there are other tools for duplicating filesystems, but I was wondering whether snapshots can be used with a non-ZFS filesystem. (The documentation leads me to believe this is not possible, but I do not know enough about it.)
Yes, there are two theoretical options. Both use async replication so will have a nonzero RPO (although from your description that seems acceptable to some extent):
Use zfs send to create a stream on the source system, and then use some tool that can understand the contents of that stream and translate to POSIX filesystem primitives on the receiving system.
Take a snapshot on the source system and then use an FS-agnostic tool to copy stuff from that snapshot over.
The first one has the benefit of being the most performant option, because ZFS knows what parts of its pool have been changed and only has to look at / send those parts. However, I don’t know of any tool that can actually do this. (Prototypes have been built at ZFS developer hackathons, but there is not a big audience for this type of tool so they’ve never been made production quality AFAIK.)
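For reference, the source side of that option is trivial; it is only the receiving end that is missing. In the sketch below the pool/dataset names are made up and zfs-stream-to-posix stands for the hypothetical translator that would have to exist on server2:

# snapshots taken on server1 at two points in time
zfs snapshot tank/data@repl-1
zfs snapshot tank/data@repl-2

# full stream for the first run, incremental (-i) streams afterwards;
# the incremental stream only contains the blocks changed between the two snapshots
zfs send tank/data@repl-1 | ssh server2 zfs-stream-to-posix /mnt/raid5/data
zfs send -i tank/data@repl-1 tank/data@repl-2 | ssh server2 zfs-stream-to-posix /mnt/raid5/data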
The second one is less performant because it will have to inspect the data to see what changed, but it has the benefit that tools exist: although you may have to fight with it a little, you can use rsync for this (a sketch follows the list below). Also, its RPO might be higher since transferring the data will take a bit longer. The slightly tricky parts will be:
Writing its metadata to a writable part of the pool on the source side, since the snapshot you’re copying will be read-only. (Look in the .zfs/ directory in the root of the filesystem you want to copy to find a readable copy of the snapshot.)
Making the failover target not have intermediate state if the source system dies during an rsync run. Hopefully your target filer has the ability to snapshot before you start an rsync run, so that you can roll back to the “last good state” if the run fails. Otherwise, hopefully your data / application can tolerate some inconsistencies. (Or maybe there’s an rsync option that does this that I haven’t used before.)
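A minimal sketch of that second option, assuming the dataset is tank/data mounted at /tank/data, the target directory on server2 is /mnt/raid5/data, and the target happens to sit on an LVM volume that can be snapshotted (all names are placeholders):

# 1. freeze a consistent view of the source data
zfs snapshot tank/data@rsync-run

# 2. snapshot the target first so a half-finished run can be rolled back
ssh server2 lvcreate -s -n data-pre-rsync -L 10G /dev/vg0/data

# 3. copy from the read-only snapshot exposed under .zfs/snapshot/
rsync -aHAX --delete /tank/data/.zfs/snapshot/rsync-run/ server2:/mnt/raid5/data/

# 4. drop the source snapshot once the run has succeeded
zfs destroy tank/data@rsync-run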
I have two scenarios.
Scenario 1: Machine A contains 1000 documents, organized as folders. This folder on machine A is mounted on machine B. I process the documents within these folders on machine B and store the output in the mounted path on machine B.
Scenario 2: The documents on machine A are copied directly to machine B and processed there.
Scenario 2 is much faster than Scenario 1. I guess it's because no data transfer is happening over the network between the two machines during processing. Is there a way I can use mounting and still achieve better performance?
Did you try enabling a cache? For NFS, see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/fscachenfs.html. CIFS should have caching enabled by default (unless you disabled it).
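A sketch of what enabling FS-Cache for the NFS mount looks like on machine B, following the linked guide (the export and mount point names are examples):

# install and start the local cache daemon
yum install cachefilesd
service cachefilesd start        # or: systemctl enable --now cachefilesd

# mount the export with the fsc option so repeated reads come from local disk
mount -t nfs -o fsc machineA:/export/documents /mnt/documents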
The other option would be to use something like Windows' Offline Files, which copies files and folders between client and server in the background, so you don't need to deal with it yourself. The only thing I've found for Linux is OFS.
But the performance depends on the size of the files and whether you read them randomly or sequentially. For instance, when I am encoding videos, I access the file directly over the network from my NFS server, because copying it first would take about as long as reading and writing it anyway. This way no additional time is "wasted" before the encoding starts, as the application can encode the stream as it comes in from the network.
So for large files you might want to change the algorithm to a sequential read. Small files, on the other hand, which are copied within seconds, can also be synced between server and client using rsync, BitTorrent Sync, Dropbox or one of the hundreds of other tools, and this is actually quite commonly done.
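For the small-file case, a periodic pull-process-push cycle with rsync could be as simple as the following (hosts, paths and the processing command are examples):

# pull the documents to local disk, process them locally, push the results back
rsync -az machineA:/export/documents/ /data/documents/
run-processing /data/documents/ /data/results/    # placeholder for your processing step
rsync -az /data/results/ machineA:/export/results/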
I would like to create something like a "file honeypot" on Windows.
The problem I would like to answer is this:
I need to detect that a file is being accessed (malware wants to read the file to send it over the internet) so I can react to it. But I do not know exactly how to tackle this.
I could periodically test the file, but I do not like this solution: I would prefer something event-driven, without the need to bother the processor every few milliseconds. It could work, though, if the file is huge enough that it cannot be read completely between checks.
I could open the file exclusively myself and somehow detect when something else tries to access it, but I have no idea how to do this.
Any ideas on how to resolve this effectively? Maybe creating a specialized driver could help, but I have little experience with that.
Thanks
Tracking (and possibly preventing) filesystem access on Windows is accomplished using filesystem filter drivers. But you must be aware that kernel-mode code (rootkits etc) can bypass the filter driver stack and send the request directly to the filesystem. In this case only the filesystem driver itself can log or intercept access.
I'm going to assume that what you're writing is a relatively simple honeypot. The integrity of the system on which you're running has not been compromised, there is no rootkit or filter driver installation by malware and there is no process running that can implement avoidance or anti-avoidance measures.
The most likely scenario I can think of is that a server process running on the computer is subject to some kind of external control which would allow files containing sensitive data to be read remotely. It could be a web server, a mail server, an FTP server or something else but I assume nothing else on the computer has been compromised. And the task at hand is to watch particular files and see if anything is reading them.
With these assumptions, a file system watcher will not help. It can monitor parts of the system for the creation of new files or the modification or deletion of existing ones, but as far as I know it cannot watch for read access.
The only event-driven mechanism I am aware of is a filter driver. This is a specialised piece of driver software that can be inserted into the driver chain and monitor access to files. With the constraints above, it is a reliable solution to the problem at the cost of being quite hard to write.
If a polling mechanism is sufficient then I can see two avenues. One is to try to lock the file exclusively, which will fail if it is open. This is easy, but slow.
The other is to monitor the open file handles. I know it can be done because I know programs that do it, but I can't tell you how without some research.
If my assumptions are wrong, please edit your question and provide additional information.
I have a separate server that processes the media uploaded to my main, web-facing server. For now I upload files to it using FTP, but the problem is that, to ensure the files are done uploading, I have a timeout running, which adds a delay to the overall processing time. I can't seem to get it to wait less than 5 seconds and still reliably pick up the media, and this delay is no longer acceptable. So:
Is there a better way to implement this cleanly? I've considered sticking with FTP and sending another file after the initial upload to indicate it's done, but then there are two uploads for every upload, which is expensive. Another option I've considered is implementing a custom server that would just take a Content-Length header, do some authentication, and then receive the file and kick off the processing as soon as it's ready. Socket programming doesn't seem too intimidating, but I have some worries about sending binary files and different formats; is this a valid concern? Also, are there any other protocols out there I could implement to do this, rather than reinvent the wheel? Something like FTP, but with a little verification.
I'd be glad for any pointers or tips you can share, thanks!
I suggest you use rsync. It runs over ssh, will move entire directories/hierarchies of files, and does incremental copies; in short, everything you could possibly want.
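A hedged sketch of how that could replace the FTP-plus-timeout dance (the host names, paths and processing command are made up). The trick is to upload into a staging directory and only mv the file, which is atomic on the same filesystem, into the directory the processor watches once the transfer has finished:

# copy to a staging area; --partial lets an interrupted transfer resume
rsync -az --partial /srv/uploads/video.mp4 mediaserver:/staging/

# the move is atomic, so the processor never sees a half-written file
ssh mediaserver 'mv /staging/video.mp4 /incoming/ && /usr/local/bin/process-media /incoming/video.mp4'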
Hi, this is a question and it may be redundant, but I have a hunch there is a tool for this, or there should be (and if there isn't I might just make it), or maybe I am barking up the wrong tree, in which case correct my thinking.
My problem is this: I am looking for a way to migrate large virtual disks off a server once a week over an internet connection of only moderate speed, with a solution that can be throttled for bandwidth because the internet connection is always in use.
I thought about it and the problem is familiar: large files that can be moved, throttled, and that can easily survive disconnection and reconnection. The only solution I am familiar with that handles all of this perfectly is torrents.
Is there a way to automatically create torrents and automatically "send" them to a client's download list remotely? I am working with a Windows Hyper-V host, but I use only Linux for the guests, and I could easily cook up a guest to do the copying, so consider it a Windows or Linux problem.
PS: the VHDs are "offline" copies of guest servers by the time I am moving them; consider them merely 20-30 GB dumb files.
PPS: I'd rather avoid spending money
BitTorrent is an excellent choice, as it handles both incremental updates and automatic resumption after connection loss very well.
To create a .torrent file automatically, use the btmakemetainfo script found in the original BitTorrent package, or one from the numerous rewrites (BitTornado, ...); all that matters is that it's scriptable. You should take care to set the "private" flag in the .torrent file, which disables DHT.
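If btmakemetainfo is not at hand, mktorrent is another scriptable tool that does the same job (the tracker URL and file name below are placeholders); its -p switch sets the private flag that disables DHT:

mktorrent -p -a http://tracker.example.com:6969/announce -o weekly-guest1.torrent /exports/guest1.vhd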
You will need to find a tracker that allows you to track files with arbitrary hashes (because you do not know them in advance); you can either use an existing open tracker or set up your own, but you should take care to limit the allowed client IP ranges appropriately.
This reduces the problem to transferring the .torrent files -- I usually use rsync via ssh from a cronjob for that.
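That last step can be a one-line cron job on the source host, assuming the torrent client at the remote site picks up .torrent files from a watch directory (paths and user are examples):

# /etc/cron.d/push-torrents: push freshly created .torrent files to the
# remote client's watch directory once an hour
0 * * * *  backup  rsync -az /exports/torrents/ offsite:/var/lib/torrent-client/watch/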
For point-to-point transfers, torrent is an expensive use of bandwidth. For 1:n transfers it is great, as the distribution of load allows each client's upload bandwidth to be shared with other clients, so the bandwidth cost is amortised and everyone gains...
It sounds like you have only one client, in which case I would look at a different solution...
wget allows for throttling and can resume transfers where they left off, provided the FTP/HTTP server supports resuming... That is what I would use.
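For example (the URL and rate are placeholders):

# -c resumes a partial download, --limit-rate caps the bandwidth used
wget -c --limit-rate=500k ftp://onsite.example.com/exports/guest1.vhd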
You can use rsync for that (http://linux.die.net/man/1/rsync). Search for the --partial option in the man page and that should do the trick: when a transfer is interrupted, the unfinished result (file or directory) is kept. I was not 100% sure whether it works over a remote shell like ssh when you send from local to a remote location (never checked that), but it should work with an rsync daemon on the remote side.
You can also use it to sync two local storage locations.
rsync --partial [-r for directories] source destination
Edit: I have since confirmed that it also works over ssh transport (the statement I was unsure about above).
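Since the connection also needs throttling, --bwlimit can be combined with --partial; the host and paths below are examples, and the rate is in KiB/s:

rsync --partial --bwlimit=500 -e ssh /exports/guest1.vhd offsite:/backups/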