In this series I take a look at file synchronization tools like Syncthing, Tahoe-LAFS, Keybase, IPFS and Perkeep (previously Camlistore) and compare them to Dropbox.
Obviously my conclusions are shaped by my circumstances, and your mileage may vary. In particular, I am mostly interested in desktop and mobile syncing of work documents, photos and private data, and to a lesser degree larger files like app binaries and video.
However, one feature I am really looking for is the ability to sync subsets of files to untrusted machines.
Recently my Dropbox has become pretty bloated… With a decent fibre internet connection and 1TB of space, I'd got used to just chucking files into ~/Dropbox and forgetting about them. However, this cavalier attitude was bound to hit some limits, and over the last few weeks I've started to find them.
The first limit I hit was that my work laptop has a 500GB SSD, which is blazing fast but not very big. I use Docker and Vagrant for testing, so the ~/Dropbox folder has to compete with a stack of container images for what remains of that 500GB after the operating system takes its share.
Dropbox selective sync seemed to stave off that problem for a while, but today I hit another snag. I booted my Windows 10 machine after some months of disuse, and Dropbox is now taking hours to catch up with tens of GB of files. And wow, I had forgotten about the pain of Windows Update!
So today, after an hour or so and 4 or 5 reboots, Windows 10 was finally patched. However, once it was finished, all it did was hand over the baton to Dropbox, which is still chugging away syncing now. Unfortunately it still hasn't synced the files I am actually interested in. As a result of this inconvenience, I decided to look at some alternatives.
Note: I am sure that box.net and Google Drive are viable alternatives (and I have a substantial amount of stuff on Google Drive and S3), but I wanted to focus on open source tools first and foremost. That said, I expect that several of the tools I have looked at can use Box, Google Drive, S3, etc. for cloud storage. I am also interested in tools that provide an additional layer of features, such as encryption, image editing/viewing, and versioning.
So to begin with…
Dropbox
Dropbox is not open source, but it's the standard against which other tools have to be compared. I don't want to be installing some other app and keeping another file hierarchy in my home directory unless it provides some substantial benefit over the big dog.
Despite my current complaints, it's hard to overstate how good Dropbox is. Before Dropbox, there were technologies that could replicate its features, but none that I was aware of that tied everything together and worked seamlessly everywhere.
So let's look at the features which we take for granted with Dropbox:
Multiplatform support
The idea of relatively pain-free cross-platform applications is a fairly new one. I am sure you have been able to get an rsync client running on Windows for years, but I never tried it. The problem with multiplatform support for open source tools (on an equal basis, i.e. not like poor neglected Flash on Linux, or the pain of Skype on anything other than Windows) is that it is expensive in terms of development time, and it requires extensive testing. However, I work with Linux and Windows on a daily basis, and I need my files to be available on both.
Automatic version history (and deleted files recovery)
Dropbox has saved me several times when I was a bit too casual with the delete button or made a bunch of changes without committing them to git. Automatic version history is a safety net which I don't often use, but when I do, the feature pays for itself many times over.
Real files, there on your hard drive.
I guess this could be described as local file syncing, as opposed to some sort of NAS mount, FUSE filesystem or hybrid with caching, but Dropbox is squarely about having copies of the files locally.
There are pros and cons to these approaches, which I will cover later. (I am not considering vanilla NFS, Samba or Windows file sharing, as they seem unsuitable for exposing on the internet.)
In the past, I've experimented with mounting WebDAV and SSH FUSE filesystems over the internet to give me access to the same sort of files that I now keep in Dropbox. These systems have the benefit of being able to punch through firewalls and proxies via a single port, and they are relatively easy to secure. However, they had a tendency to hang applications that didn't expect transient I/O failures caused by a laggy connection, and they were painful at best.
Having real local copies of your files, right there in your home directory, makes things incredibly friction-free. Once you are synced, it's virtually invisible.
Multiple, distributed backups.
With Dropbox, not only do I have copies of each file on my desktop, my laptop and my Windows box, but Dropbox has copies too, along with every historical version of those files going back some time.
While it's not completely impossible to think of a situation in which all my personal machines with copies of my Dropbox were destroyed, I can't think of many where Dropbox disappears as well. However, I can imagine scenarios in which a malicious agent systematically deleted or otherwise made unavailable all my data, for example via a ransomware-style encryption attack. After all, if I can delete my Dropbox data permanently, then so can someone posing as me.
Luckily, some of the tools I have been looking at have solutions to these risks built in, so I will cover those later.
No specific platform lock-in
This is kind of an incidental benefit, but I think it’s worth mentioning.
The simple nature of Dropbox syncing means that it's fairly trivial to migrate to another platform. I have a full, unencrypted, uncompressed copy of all my data sitting here on my desktop. If I wanted to share those files with another provider, it would be very easy to do so.
Downsides of the Dropbox model
Some of the positives of the Dropbox model are also potential negatives.
Security
As it is a proprietary service, you don't have any visibility over the chain of security applied to your data. Once you upload it, you should probably assume that Dropbox employees can access it, as can law enforcement, anyone who manages to break into Dropbox's security systems, and anyone who is able to impersonate you. (That said, Dropbox has clearly put some thought into this; the notifications when someone logs in from a new PC are reassuring.)
The trade-off between security and convenience falls heavily on the side of convenience with Dropbox.
The other issue is that Dropbox encourages you to keep a single directory of presumably important documents sitting in an obvious place. If you want to encrypt that data, it's up to you to provide that protection. On the positive side, it's fairly trivial to encrypt the home directory in Windows, Fedora and Ubuntu, and this is generally the default for new builds.
Lack of scalability
With selective sync, it's possible to limit how much data is synced to each of your devices. However, you generally still end up with a complete copy of all your data on each device. It's all well and good Dropbox giving you 1TB of space, but syncing that much data takes time.
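To put "syncing that data takes time" into rough numbers, here is a back-of-the-envelope sketch of how long a full 1TB sync would take at a few link speeds. The speeds are assumptions picked for illustration (not my actual connection), and the calculation is idealised: it ignores protocol overhead, file hashing, client-side throttling and the many small files that slow real syncs down, so treat the results as a lower bound.

```python
def transfer_time_hours(size_bytes: float, link_mbit_per_s: float) -> float:
    """Idealised transfer time in hours.

    Assumes the link is saturated for the whole transfer; ignores protocol
    overhead, hashing and any throttling the sync client applies.
    """
    bits = size_bytes * 8                              # bytes -> bits
    seconds = bits / (link_mbit_per_s * 1_000_000)     # Mbit/s -> bit/s
    return seconds / 3600


ONE_TB = 1_000_000_000_000  # decimal terabyte, as storage plans usually quote it

# Assumed, illustrative link speeds in Mbit/s
for mbit in (50, 100, 1000):
    print(f"{mbit:>5} Mbit/s: {transfer_time_hours(ONE_TB, mbit):6.1f} h")
```

That works out to roughly 44, 22 and 2.2 hours respectively, and that is per device, before any of the real-world overheads kick in.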
Finally…
It’s not open source!!
So… in the next part of this series, I will be looking at Perkeep.