Does BitTorrent Sync share my data? - bittorrent

I have been interested in using BitTorrent Sync to synchronize my files. My question is: Is my encrypted torrent data being shared through 3rd party computers or only the computers I register? For example, if I install Sync on 3 machines, will all of that sync'd data only ever stay on those 3 machines and no data - encrypted or not - ever travels to other computers with Sync installed.

While Sync uses BitTorrent technology, it creates swarm only from your machines. Particularly in your case it will create a swarm from your 3 machines, that nobody could join, and will use p2p to effectively send data between them.
PS. I am involved in Sync development.

This is from bittorrent website:
The data is transferred in pieces from each of the syncing devices,
and BitTorrent Sync chooses the optimal algorithm to make sure you
have a maximum download and upload speed during the process.
EDIT: there is also a statement in unofficial faq:
Will me devices still sync when switched off?
BitTorrent Sync is not a cloud storage solution like
SkyDrive/DropBox/GDrive, etc - it doesn't sync your files to a "cloud",
therefore for one device to sync with another, both devices need to be online
First statement does not tell anything about communication with devices of other users.
Last statement could hint, that there is no way to get to your data without
having at least one of your own syncing machines turned on.

Related

Creating a file server in Azure

Our company has an on-prem file server that I'd like to move to the cloud. I followed these directions and was successfully able to map a drive on my local work computer to connect to an Azure File Share. Our company has about 20 locations, ~5 TB of data (mostly "office" type of files) in total, and about 500 users accessing them.
There are two issues I would like to improve but I'm not sure how:
There's somewhat of a lag when opening files. Other than increasing our office's internet speed, is there anything to be done to make it faster? Would some kind of site-to-site VPN help? Would adding some type of server or VM in the "middle" (maybe one per location?) that would perhaps somehow cache the files reduce the lag?
Also, we have and use an Office 365 subscription. What's the easiest way to use our existing AD structure to transfer over the NTFS permissions that are currently in place?
I Googled around and found a bunch of companies advertising their services, notable among them was Talon Storage. But it seems like something that could be done without hiring a company. What I'm hoping for is a DIY direction to optimally solve these issues. Perhaps there's a standard or commonly recommended solution for such issues. Any guidance would be greatly appreciated.
L-A-T-E-N-C-Y. The number one enemy for any cloud-based file server attempt. It ranges from annoying to down right unusable, depending on how far you are from the Azure datacenter of choice.
Imagine a poor soul trying to "stream" a large 20-meg Excel file with 20 references to external files. What used to take maybe 8 seconds on-prem will now take 40 in the cloud (on a good day). It's game over for productivity. Your marketing department that sometimes used to cut video in iMovie over the network? Those days are over.
I understand this is not the answer you were after, but it's the crude reality.
Do not panic, there are solutions, here's a good one - https://azure.microsoft.com/en-us/services/storsimple/
I'm sure you wanted to get rid of boxes not buy more, but it is what it is.

Good distributed general purpose filesystem in my case?

I've been researching the idea of using distributed file system along with my dedicated servers instead of going with Amazon S3 and the results are nothing but massive headaches!
My project have the following characteristics/requirements:
User files are stored in dedicated servers. Each file is stored in 2 separate machines, located in different data centers (150-200 miles away from each other)
I'm using Amazon RDS to host the associated mysql database (*). It's fairly compact (only hold IDs/files metadata)
Files/data is around 50TB. Naturally, data does change and will definitely grow with time
My question is: is there a good general-purpose, distributed parallel fault-tolerant file system that have the following characteristics:
Stable & reasonably fast (upload/download)
Fairly easy to setup & maintain
Handle data storage so that I only have to care about removing/adding new servers if the need arise (ie. add new servers to the filesystem's server pool by editing a simple config, or something like that)
I've read about OpenStack, GlusterFS, MogileFS, XtreemFS, etc...but the more I read, the more I get confused!
(*) Yes, I realize the contradiction. Cost-wise it does make sense to host the database on RDS. But storing (up to) 50TB of users files on amazon is way too expensive compared to using dedicated servers (provided it's good enough).
PS. my app isn't live yet, so I'm open to suggestion if someone have a good idea that fits well in my case.
EDIT I'm not trying to make a S3 clone, I just need to use an existing hosting infrastructure to build small-scale cloud solution, my question is about finding the right distributed file system to handle/automate this.
We recently switched from an expensive storage solution to the opensource Lizardfs for our Distributed storage solution. It is quite simple to set up and scale once your understand the basic concept.
Check out https://docs.lizardfs.com/introduction.html#architecture for a quick overview. But forget about shadow master en meta loggers for now. What you need to know is that there are
a master: that regulates the traffic (make sure that has enough cpu)
chunkservers: which actually store the data. Use any kind of off the shelf hardware with a bunch of harddisks attached.
Clients: which are just simple mount points. So you can get a giant 50TB mount if you want. The master will tell the client where to find/store the files. The actual data is being transfered straight from the client->chunkserver and back.
You can add as many chunkservers as you want, the master will automatically try to balance your storage usage across them. Adding storage is a matter of adding harddrives, or adding servers. They don't have to be actual bare metal machines, but that is probably the cheapest.
There are 2 amazing features in lizardfs that allow georeplication.
Goals (see https://docs.lizardfs.com/adminguide/replication.html#standard-goals): How important are files to you. You can define, on a file level/folder level how many times a file needs to be replicated. Do you want 2 copies 3? 10? You could define a goal of 2 copies for old files that are simply there for archiving purposes. And define a goal of 4 copies on SSD drives for all new files.
Those same goals can also be used to do georeplication. You define that your data has to be stored it least two different locations by labeling your chunkservers accordingly. (e.g. DC1 and DC2)
Rack awareness (see https://docs.lizardfs.com/adminguide/advanced_configuration.html#configuring-rack-awareness-network-topology): you basically define IP ranges to teach the system how your network looks like. This way, clients will try to serve files from the closest server.
The ease of setting it up is what sold lizardfs for me. I've heard very good things about Ceph, but setting it up is another matter...
What worried me at first was how proven the technology is/was. So I spent quite a lot of research on figuring out who uses it.
Orange Poland (A large telecom provider) is one of the users.
And Cloudweavers/opennebula actualy built a business around it selling complete solutions.
Won't it take more than one person a few months a year to manage these servers? That will cost some $, then you have the cost of hosting the data yourself, then you have the added huge cost that the business / system you are building is not obviously scalable? In addition any likely investor will be turned away by a complex home grown data hosting system. How will you ensure integrity/security on par with Amazon? Your max savings per year look like $30,000 or so.
You could save money by doing a de-duplicated storage system where you just store all the unique chunks of data - also see rsync. Don't know how redundant your data is though.
I recommend LizardFS and GfarmFS.
IMHO Ceph is a major disappointment and so is XtreemFS.

Considerations regarding a p2p social network

While the are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution, where data remains local on member's systems. Think of the project as an address book, which automagically updates contact's data as soon a a contact changes its coordinates. This base idea might get extended later on...
Updates will be transferred using public/private key cryptography using a central host. The sole role of the host is to be a store and forward intermediate. Private keys remain private on each member's system.
If two client are both online and a p2p connection could be established, the clients could transfer data telegrams without the central host.
Thus, sender and receiver will be the only parties which are able create authentic messages.
Questions:
Do exist certain protocols which I should adopt?
Are there any security concerns I should keep in mind?
Do exist certain services which should be integrated or used somehow?
More technically:
Use e.g. Amazon or Google provided services?
Or better use a raw web-server? If yes: Why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project developed 2008/09: http://www.lifesocial.org/.
The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like "p2p" - in most P2P protocols, the only requirement for central servers is discovery - you're using store & forward.
As a quick proof of concept, I'd set up an email server, and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days, rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of my idea.
The second issue is that the nature of a social network is that it's a network. Your design may require you to store more than the data of the two direct contacts - you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on - you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem....
The paper about Safebook might be interesting for you.
Also you could take a look at other distributed OSN and see what they are doing.
None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.
I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a sockets communication script & server. I also plan to remove the central server from the equation as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friend's keys.
Each profile could be accessed via a hashed string of someone's public key. My social network relies on nodes and pods. Pods are computers which have their ports open to the network. They help with relaying traffic as most firewalls block incoming socket requests. Nodes store information and share it with other nodes. Each node will get a directory of active pods which may be used to relay their traffic.
The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.
Some thoughts about it:
protocols to use: you could think exactly on P2P programs and their design
security concerns: privacy. Take a great care to not open doors: a whole system can get compromised 'cause you have opened some door.
services: you could integrate with the regular social networks through their APIs
People will have to install a program in their computers and remeber to open it everytime, like any P2P client. Leaving everything on a web-server has a smaller footprint / necessity of user action.
Somehow you'll need a centralized server to manage the searches. You can't just broadcast the internet to find friends. Or you'll have to rely uppon email requests to add somenone, and to do that you'll need to know the email in advance.
The fewer friends /contacts use your program, the fewer ones will want to use it, since it won't have contact information available.
I see that your server will be a store and forward, so the update problem is solved.

Distributing a bundle of files across an extranet

I want to be able to distribute bundles of files, about 500 MB per bundle, to all machines on a corporate "extranet" (which is basically a few LANs connected using various private mechanisms, including leased lines and VPN).
The total number of hosts is roughly 100, and the goal is to get a copy of the bundle from one host onto all the other hosts reliably, quickly, and efficiently. One important issue is that some hosts are grouped together on single fast LANs in which case the network I/O should be done once from one group to the next and then within each group between all the peers. This is as opposed to a strict central server system where multiple hosts might each fetch the same bundle over a slow link, rather than once via the slow link and then between each other quickly.
A new bundle will be produced every few days, and occasionally old bundles will be deleted (but that problem can be solved separately).
The machines in question happen to run recent Linuxes, but bonus points will go to solutions which are at least somewhat cross-platform (in which case the bundle might differ per platform but maybe the same mechanism can be used).
That's pretty much it. I'm not opposed to writing some code to handle this, but it would be preferable if it were one of bash, Python, Ruby, Lua, C, or C++.
I think all these problems have been solved by modern research into p2p networking and well packaged into nice forms. A bit of script and bit torrent should solve these problems. torrent clients exist for all modern OSs, then a script on each machine to check a location for a new torrent file, start the DL, then delete the old bundle once the DL has finished.
What about rsync?
I'm going to suggest you use compie's idea of rysnc to copy the files in which case you can use a scripting language of your choice.
On the propagating system you will need a script containing some form of representation of the hosts and a matrix between them weighted with the speed. You then need to calculate a minimum spanning tree from that information. From that, you can then send messages to the systems to which you intend to propagate detailing the MST and the bundle to fetch, whereby that script/daemon begins transfer. That host then contacts the hosts over the fastest links...
You could implement it in bash - python might be better or a custom C daemon.
When you update the network you'll need to update the matrix based on latest information.
See: Prim's Algorithm.

what is the purpose of Offline mode in a web browser?

What is actually the purpose of offline mode. does it somehow manages the caching system of the browser. or it just skips the host resolution part, dont know for what.
what probably the designer had in mind, when he developed the offline mode feature.
It's something that most of us wouldn't use very often, but it was useful in the past when we used dial-up modems (and especially for those of us who had time limited plans). You could connect to the internet, go to a website, quickly browse through everything you thought could be useful, disconnect, go to offline mode and then browse through the site as though you were connected (for everything was cached).
I didn't realise browsers still had them (I can't remember the last time I went to the File menu in Firefox though).
I think it's something from the old days, where connections were mostly dial-up (read: slow, and not always connected), and browsers relied on caching material off-line, so if you're not connected, you'd see the cached version in an off-line mode.
I guess it still applies to countries that still don't have DSL everywhere.
You might use it to find out what happens to people who are using your web application if their internet connection is flaky or drops out. You can also use it if you're only interested in accessing local files, not websites.
It doesn't try to access the network, but instead fetches all content from the cache.
in developed countries (asia especially), internet are rare and expensive (there are quota or time based) so offline is very good feature to have for them.
Sometimes you are out and you need to check something on web .. you can use that offline cache.
I wonder if it will became more used in the west these days as people with net-books and other ultra portable devices travel round lots? While wi-fi hot spots and 3g data connections are popular they are still not universal.
I know when I designed my html/javascript open source chess clock, I deliberately designed it to work off-line very easily, so I could just download it onto my netbook and meet my chess friends in cafes without having to worry if the cafe had a good wifi signal or not.
Now, where's the "Blatant Plug" tag?
It helps you to may be browse through a site fast & with a large cache size set , U go to offline mode & read the pages with your Time based Internet connection switched off.
As has been stated by David Johnstone, I used to use the Offline Mode during dial-up days for reading articles/pages from cache when connection to the internet was disconnected.
As my experience, normally offline servers are using while online servers are down or accessibility problem with our web server via internet, meanwhile we are using our offline server to get the informations. Then once the web server is up then we can synchronic the datas from offline server to online server.

Resources