On distributed databases and distributed ledgers
November 29, 2016
Why can’t companies wanting to share business logic and data just install a distributed database? What is the essential difference between a distributed database and a distributed ledger?
Last month, I shared the thinking that led to the design of Corda, which we at R3 will be open sourcing on November 30, and Mike Hearn and I were interviewed by Brian and Meher of Epicenter last week. We’ve been delighted by the response and are looking forward to working with those who seek to build on Corda, help influence its direction or contribute to its development and maturation. There’s a lot of work ahead of us!
But one or two observers have asked a really good question. They asked me: “Aren’t you just reimplementing a distributed database?!”
The question is legitimate: if you strip away the key assumptions underpinning systems like Bitcoin and Ethereum, are you actually left with anything? What is actually different between a distributed ledger platform such as Corda and a traditional distributed database?
The answer lies in the definition I gave in my last blog post, and it is utterly crucial since it defines an entirely new category of data management system:
“Distributed ledgers — or decentralised databases — are systems that enable parties who don’t fully trust each other to form and maintain consensus about the existence, status and evolution of a set of shared facts”
“Parties who don’t fully trust each other” is at the heart of this. To see why, let’s compare distributed databases and Corda.
Comparing Corda to a distributed database
In a distributed database, we often have multiple nodes that cooperate to maintain a consistent view for their users. The nodes may cooperate to maintain partitions of the overall dataset or they may cooperate to maintain consistent replicas, but the principle is the same: a group of computers, invariably under the control of a single organisation, cooperate to maintain their state. These nodes trust each other. The trust boundary is between the distributed database system as a whole and its users. Each node in the system trusts the data that it receives from its peers, and nodes are trusted to look after the data they have received from their peers. You can think of the threat model as all the nodes shouting in unison: “it’s us against the world!”
This diagram is a stylised representation of a distributed database:
In a distributed database, nodes cooperate to maintain a consistent view that they present to the outside world; they cooperate to maintain rigorous access control and they validate information they receive from the outside world.
So it’s no surprise that distributed databases are invariably operated by a single entity: the nodes of the system assume the other nodes are “just as diligent” as them: they freely share information with each other and take information from each other on trust. A distributed database operated by mutually distrusting entities is almost a contradiction in terms.
And, of course, if you have a business problem where you are happy to rely on a central operator to maintain your records — as you sometimes can in finance it should be said — then a distributed database will do just fine: let the central operator run it for you. But if you need to maintain your own records, in synchrony with your peers, this architecture simply won’t do.
And there are huge numbers of situations where we need to maintain accurate, shared records with our counterparts. Indeed, a vast amount of the cost and inefficiency in today’s financial markets stems from the fact that it has been so difficult to achieve this. Until now.
Corda helps parties collaborate to maintain shared data without fully trusting each other
Corda is designed to allow parties to collaborate with their peers to maintain shared records, without having to trust each other fully. So Corda faces a very different world to a distributed database.
A Corda node cannot assume the data it receives from a peer is valid: the peer is probably operated by a completely different entity and even if they know who that entity is, it’s still extremely prudent to verify the information. Moreover, if a Corda node sends data to another node, it must assume that node might print it all in an advert on the front page of the New York Times.
The trust boundaries, the red curves in the diagram, are drawn in a completely different place!
In Corda, nodes are operated by different organisations and do NOT trust each other; but the outcome is still a consistent view of data.
To repeat, because this distinction is utterly fundamental: nodes of a distributed database trust each other and collaborate with each other to present a consistent, secure face to the rest of the world. By contrast, Corda nodes cannot trust each other and so must independently verify the data they receive from each other and share only data they are happy to see broadly shared.
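To make the contrast concrete, here is a minimal sketch in Python. It is not Corda code and does not use Corda's API: the names Transfer, TrustingReplica and VerifyingNode are hypothetical, invented purely for illustration, and the validity rule (a positive amount that the sender can cover) is an assumed stand-in for whatever rules the parties have agreed. The only point is where the check happens: the replica takes its peer's word, while the ledger-style node re-checks the update itself before accepting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A proposed update to a shared record (hypothetical, not a Corda type)."""
    sender: str
    receiver: str
    amount: int


def _apply(balances, transfer):
    """Move the amount from sender to receiver in the given balance table."""
    balances[transfer.sender] = balances.get(transfer.sender, 0) - transfer.amount
    balances[transfer.receiver] = balances.get(transfer.receiver, 0) + transfer.amount


class TrustingReplica:
    """Distributed-database style: peers sit inside the trust boundary."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def receive(self, transfer):
        # Assumes the peer already validated the update, so it is applied as-is.
        _apply(self.balances, transfer)
        return True


class VerifyingNode:
    """Distributed-ledger style: every peer sits outside the trust boundary."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def is_valid(self, transfer):
        # Assumed rule for this sketch: positive amount, and sender can cover it.
        return (transfer.amount > 0
                and self.balances.get(transfer.sender, 0) >= transfer.amount)

    def receive(self, transfer):
        # Independently verify before accepting anything from a peer.
        if not self.is_valid(transfer):
            return False
        _apply(self.balances, transfer)
        return True


if __name__ == "__main__":
    overdraft = Transfer("alice", "bob", 500)
    replica = TrustingReplica({"alice": 100, "bob": 0})
    node = VerifyingNode({"alice": 100, "bob": 0})
    print(replica.receive(overdraft), replica.balances)
    print(node.receive(overdraft), node.balances)
```

Given the same invalid update from a peer, the trusting replica ends up with a corrupted record while the verifying node rejects it and keeps its state unchanged. A real system would also need signatures, agreement on ordering and control over who sees what, none of which this sketch attempts.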
And so we call Corda a distributed ledger, to distinguish it from distributed databases. A distributed ledger that is designed painstakingly for the needs of commercial entities.
Put more simply: you can’t build the applications we envisage for Corda with traditional database technology. And that’s what makes this new field so exciting.