Wednesday, September 26, 2012

Review: The Long Earth


The Long Earth by Terry Pratchett

My rating: 4 of 5 stars



A very engaging book. I'm afraid to say much about it though, out of fear of spoiling the story. I will say that I found the literary and cinematic references amusing.




Thursday, September 13, 2012

Review: The Vegetarian Myth: Food, Justice, and Sustainability


The Vegetarian Myth: Food, Justice, and Sustainability by Lierre Keith

My rating: 3 of 5 stars



This author is obviously angry at being fooled into believing that a vegan lifestyle would be better for her and the planet, and the book is an impassioned argument against such beliefs. Her arguments against the three main arguments for vegetarianism (ecological, moral, and health) also appear quite strong, although I suspect there is some handwaving at times where the science gets beyond her. In particular, I noted the use of "chemfear" (the belief that "if you can't pronounce it it must not be good for you") in a few places, and there are other spots where I'm not convinced that she's connected all the dots. But there's enough here to at least make one question the merits of the positions she rails against, which are often held with a religious fervor.

Vegans will hate this book, as will many vegetarians, as she calls them childish and ignorant. (I suspect this accounts for many of the "1 star" ratings I'm seeing.) But, sadly, she is right on both counts, certainly with respect to vegans and also with respect to many vegetarians. And while I think her closing recommendations are problematically impractical for many people, she admits that she doesn't have all the answers. But at least she is putting the questions on the table. Unfortunately, the closing of the book includes an excessively aggressive indictment of liberals, the American left, and men, which will tend to put off people who do not share her beliefs in radicalism and feminism. Bad bridge-building there that mars an otherwise very good book.




Tuesday, September 11, 2012

Review: Fields of Fire


Fields of Fire by James Webb

My rating: 4 of 5 stars



I picked this book up because Rachel Maddow mentioned it in her book, Drift: The Unmooring of American Military Power. Normally I'm not that fond of war novels, but this one definitely held my interest. Intensely emotional at times, but overall not a difficult read.




Wednesday, September 05, 2012

Review: Twilight of the Elites: America After Meritocracy


Twilight of the Elites: America After Meritocracy by Christopher Hayes

My rating: 5 of 5 stars



A sobering look at the way meritocracy fails everyone except, of course, the rich, who are often not those of the highest merit. This is one of the few books I've read recently that has led me to jot down ideas for things that need future investigation. One in particular has to do with the conservative mindset (as identified by Joss and others) and indeed whether we have the relationship between conservatism and wealth backwards; that is, conservatives become plutocrats because they are psychologically structured (by genetics or upbringing) to pursue the ouroboros of endless acquisition, rather than the wealthy simply tending to conservatism because it makes rational sense to do so.

I also saw clear connections between this book and Rachel Maddow's Drift and Lawrence Lessig's Republic, Lost: How Money Corrupts Congress -- and a Plan to Stop It, but that's not all that surprising given that Hayes works with Maddow and was a fellow at Harvard under Lessig's sponsorship. I would recommend all three books, since really they all address the same core problem, from different points of view: the ways in which our government's leaders become detached from the people they govern.

I borrowed this from the library to read, but I think I'm going to need to buy a copy because I want to write notes in it (something I rarely ever do) and I can't do that with a library copy.




Thursday, September 15, 2011

Thoughts on Bitcasa

Bitcasa has been getting a lot of attention in my Google Plus circles the past week or so; I suspect this is because it was in the running for TechCrunch's Disrupt prize (but ultimately lost to something called "Shaker", which appears to me to be some sort of virtual bar simulation). Bitcasa claims to offer "infinite" storage on desktop computers for a fixed monthly fee. I've yet to see any solid technical information on how they're doing this, but it seems to me that they're planning to do this by storing your data in the cloud and using the local hard drive as a cache.

There's nothing earthshattering about this; I set up Tivoli Hierarchical Storage Manager to store infrequently used files on lower-priority storage media (either cheap NAS arrays or tape) four years ago with my last employer, and the technology was fairly mature then. Putting the HSM server, data stores, or both in the cloud was possible even then and should be even more so now that cloud services are far more mature than they were then. So while there are obviously issues to sort out, this isn't a big reach technically.

More interesting to me is how they plan to provide "infinite" storage. Given the promise of infinite storage, most users will never delete anything; my experience is that users don't delete files until they run out of space, and if they really do have "infinite" storage that won't ever happen. The rule of thumb I recall from storage planning 101 is that storage requirements double every 18 months. According to Matthew Komorowski, the cost of storage drops by half every 14 months, so their cost to provide that doubled storage should slowly decline over time, but that margin is fairly thin and may not be sufficient to cover the exponentially growing complexity of their storage infrastructure over time. They'll also have to cope with ever-increasing amounts of data transit, but I can't find good information just now on the trend there, in part because transit pricing is still very complicated.
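To put rough numbers on that margin, here is a minimal sketch that combines the two rules of thumb above. It assumes both trends are smooth exponentials and ignores transit, staffing, and infrastructure complexity, so it is an illustration rather than a model of Bitcasa's actual costs; the function names are my own.

```python
# Assumed rules of thumb from the paragraph above: a user's data doubles
# every 18 months, and the cost per gigabyte halves every 14 months.
DATA_DOUBLING_MONTHS = 18.0
COST_HALVING_MONTHS = 14.0


def relative_cost(months: float) -> float:
    """Cost of storing one user's data after `months`, relative to month 0."""
    data_growth = 2.0 ** (months / DATA_DOUBLING_MONTHS)
    price_decline = 2.0 ** (-months / COST_HALVING_MONTHS)
    return data_growth * price_decline


def net_halving_months() -> float:
    """Months needed for the per-user storage cost to fall by half."""
    return 1.0 / (1.0 / COST_HALVING_MONTHS - 1.0 / DATA_DOUBLING_MONTHS)


for years in (1, 2, 5, 10):
    print(f"after {years:2d} years: {relative_cost(12 * years):.2f}x the initial cost")
print(f"net cost halves every {net_halving_months():.0f} months")
```

Under these assumptions the per-user cost falls by only about 12 percent a year and takes 63 months to halve, which is the thin margin I mean.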

Also interesting is that Bitcasa appears to be claiming that they will use deduplication to reduce the amount of data transferred from clients to the server. This is, itself, not surprising. The surprising thing is that they also claim that they will be using interclient deduplication; that is, if you have a file that another client has already transferred, they won't actually transfer the file. I think they're overselling the savings from interclient deduplication, though. I may not be typical, but the bulk of the file data on my systems seems to fall into a few categories: camera raws for the photos I've taken; datasets I've compiled for various bulk data analyses (e.g. census data, topographic elevation maps, the FCC licensee database); virtual machine images; and savefiles from various games I play. The camera raw files (over 10,000 photographs at around 10 megs each) are obviously unique to me, and as I rarely share the raws the opportunity for deduplication gain there is essentially nil. As to the datasets, they themselves are duplicative (most of them are downloaded from government sources), of course, but the derived files that I've created from the source datasets are unique to me and are often larger than the source data. So, again, only limited gain opportunity there. Most of my virtual machine images are unique to me, as I've built them from the bottom up myself. And obviously the saved games are unique to me. If I were to sign up my laptop and its 220 GB hard drive (actually larger than that but I haven't gotten around to resizing the main partition from when I reimaged the drive onto a newer, larger drive after a drive crash a couple months ago, so Windows still thinks it's a 220 GB drive) with Bitcasa, they'd probably end up having to serve me somewhere around 170 to 200 GB of storage, depending mainly on how well it compresses. (Much of the data on my machine is already compressed.)
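For readers who haven't seen it, interclient deduplication can be sketched roughly as follows. This is a toy illustration and not Bitcasa's design, which I haven't seen documented: the DedupStore class, the client_store function, and the choice of SHA-256 are all my own assumptions.

```python
import hashlib


class DedupStore:
    """Toy server-side store keyed by the SHA-256 digest of file contents."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.bytes_uploaded = 0  # bytes that actually crossed the wire

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.bytes_uploaded += len(data)
        return digest


def client_store(store: DedupStore, data: bytes) -> str:
    """Send the digest first; upload the bytes only if the server lacks them."""
    digest = hashlib.sha256(data).hexdigest()
    if not store.has(digest):
        store.upload(data)
    return digest


store = DedupStore()
shared_file = b"same installer everyone downloaded" * 1000
my_raw_photo = b"pixels only my camera produced" * 1000

client_store(store, shared_file)   # first client uploads the bytes
client_store(store, shared_file)   # second client sends only the digest
client_store(store, my_raw_photo)  # unique content always uploads in full
print(store.bytes_uploaded, len(shared_file) + len(my_raw_photo))
```

The shared file costs one upload no matter how many clients hold it, while anything unique to a single client saves nothing. Note that the sketch trusts the digest completely: it never compares the bytes themselves.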

Even my music (what there is of it; I keep a fairly small collection of about 20 gigabytes) doesn't dedup well. I know; I've tried to dedup my music catalog several times over the past decade plus and my experience is that "identical" songs are often not identical at the bit level; the song might be the same but the metatags differ in some manner that makes them not bit-identical. Or the songs might be compressed with different bit rates or even different algorithms; I have several albums that I've ripped as many as five times over the years with different rippers. Even if you rip the same song twice from the same disc with the same settings on the same ripper it still might end up with a different bitstream, if the disc has a wobbly bit on it.
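A quick way to see why tag differences defeat whole-file deduplication: the sketch below builds two stand-in "rips" with the same audio payload and a one-byte difference in the tag block, then hashes both. The byte layout is hypothetical (it is not a real MP3 or ID3 structure), and SHA-256 is just my assumed hash.

```python
import hashlib

# Hypothetical stand-ins for two rips of the same track: identical audio
# payload, different tag block. Not a real MP3 or ID3 layout.
audio = bytes(range(256)) * 64
rip_a = b"TITLE=Song;RIPPER=A;" + audio
rip_b = b"TITLE=Song;RIPPER=B;" + audio

digest_a = hashlib.sha256(rip_a).hexdigest()
digest_b = hashlib.sha256(rip_b).hexdigest()

print("same length: ", len(rip_a) == len(rip_b))
print("same digest: ", digest_a == digest_b)
print("same audio:  ", rip_a.endswith(audio) and rip_b.endswith(audio))
```

The two files are the same length and carry the same audio, but the digests differ, so a store keyed on whole-file hashes treats them as unrelated files.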

If Bitcasa assumes that most of its clients will be "typical" computer users, most of whose data is "stuff downloaded from the Internet", then I suppose they can expect significant deduplication gain, especially for videos, music, and above all executables and libraries (nearly everyone on a given version of Windows will have the same NTOSKRNL.EXE, for example, although in general OS files cannot be made nonresident anyway without affecting the ability of the computer to boot). The problem I think they're going to run into is that many of their early adopters are not going to be like that. Instead, they're going to be people like us: content creators far more than content consumers, whose computers are all filled to the brim with stuff we've created ourselves out of nothing, unlike anything else out there in the universe.

Then there's the whole issue of getting that 220 GB of data on my machine to their servers. It took my computer nearly 40 days to complete its initial Carbonite image, and that's without backing up any of the executables. I have some large files that I keep around for fairly infrequent use; if Bitcasa decides to offline one of those and I end up needing it, I might be facing a fairly long stall while it fetches the file from their server. Or, if I'm using the computer off the net (something I often do), then I'm hosed, and if I'm on a tether (which is also fairly frequent) then I could be facing a download of a gig file (or larger) over Verizon 3G at about 200 kbps. Good thing Verizon gives me unlimited 3G data!
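The arithmetic behind those numbers is simple enough to sketch. I'm assuming decimal units (1 GB = 10^9 bytes, 1 kbps = 1000 bits per second) and treating the Carbonite image as the full 220 GB, which overstates it somewhat since executables were excluded; the function names are my own.

```python
GB = 1e9          # bytes, decimal gigabyte (assumption)
DAY = 86_400      # seconds


def transfer_seconds(size_bytes: float, rate_bits_per_second: float) -> float:
    """Time to move `size_bytes` over a link of the given sustained rate."""
    return size_bytes * 8 / rate_bits_per_second


def average_rate_bps(size_bytes: float, seconds: float) -> float:
    """Sustained rate implied by moving `size_bytes` in `seconds`."""
    return size_bytes * 8 / seconds


# Upper bound on the average upload rate of the initial Carbonite image.
upload_bps = average_rate_bps(220 * GB, 40 * DAY)
print(f"implied upload rate: {upload_bps / 1000:.0f} kbps")

# Fetching a 1 GB offlined file over a 200 kbps tether.
hours = transfer_seconds(1 * GB, 200_000) / 3600
print(f"1 GB at 200 kbps: {hours:.1f} hours")
```

That works out to an average of roughly 509 kbps for the initial upload, and about 11 hours to pull a single one-gigabyte file back over the tether.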

I also wonder how Bitcasa will interact with applications that do random access to files, such as database apps and virtual machine hosts. I use both of these on a fairly regular basis, and I think it might get ugly if Bitcasa wants to offline one of my VHDs or MDFs (or even my Outlook OST or my Quickbooks company file). If they are planning to use reparse points on Windows the way Tivoli HSM does, files that will need to be accessed randomly, or which need advanced locking semantics, will have to be fully demigrated before they can be used at all.

In addition to all this, the use of cryptographic hashes to detect duplicates is risky. There's always the chance of hash collision. Yes, I know, the odds of that are very small with even a decent-sized hash. But an event with even very low odds will happen with some regularity if there are enough chances for it to occur, which is why we can detect the 21 cm hyperfine hydrogen spin transition at 1420.405752 MHz: this transition occurs with a probability of something like one in a billion, but we can detect it fairly easily because there are billions upon billions of hydrogen atoms in the universe. With enough clients, eventually there's going to be a hash collision, which will ultimately result in replacing one customer's file with some totally different file belonging to some other customer. Worse yet, this event is undetectable to the service provider. (Kudos to the folks at SpiderOak for pointing this out, along with other security and legal concerns that interclient deduplication presents.)
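How small "very small" is depends heavily on the digest size and on how many distinct files the service ends up holding. The sketch below uses the standard birthday-bound approximation, assuming an ideal hash with uniformly distributed output. I don't know which hash Bitcasa uses, and the file counts are hypothetical.

```python
import math


def collision_probability(num_items: float, hash_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes an ideal hash whose outputs are uniformly distributed.
    """
    exponent = -num_items * (num_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Hypothetical population: one trillion distinct files across all customers.
for bits in (64, 128, 160, 256):
    p = collision_probability(1e12, bits)
    print(f"{bits:3d}-bit digest, 1e12 files: p = {p:.3g}")
```

With a 64-bit digest a collision at that scale is a near certainty, while at 128 bits and above the estimate drops to vanishingly small values. The risk never reaches zero, and a service that trusts the digest alone has no way to notice when it happens.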

So while I think the idea is interesting, I think they're going to face some pretty serious issues in both the short term (customer experience not matching customer expectation, especially with respect to broadband speed limitations) and the long term (storage costs growing faster than they anticipate). Ought to be interesting to see how it plays out, though. I think it's virtually certain that they'll drop the "infinite" before too long.

(Disclaimers: Large portions of this post were previously posted by me on Google Plus. I am not affiliated with any of the entities named in this post.)