Shared Nothing vs. Shared Disk Architectures: An Independent View

The Shared Nothing Architecture is a relatively old pattern that has had a resurgence of late in data storage technologies, particularly in the NoSQL, Data Warehousing and Big Data spaces. As architectures go, it has some pretty interesting performance trade-offs when compared to the more common approach of simply sharing a disk array (known as Shared Disk). This article compares and contrasts the two.

Shared Disk and Shared Nothing

Shared nothing is a simple idea. Data is partitioned in some manner and spread across a set of machines. This means that each machine has sole access to, and hence sole responsibility for, the data it holds. It does not share responsibility with other machines. So data is completely segregated, with each node having total autonomy over its particular subset.
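
As a rough illustration, here is a minimal Python sketch of that routing idea: every client hashes a key to the single node that owns it. The node names and the modulo scheme are invented for the example, not taken from any particular product.

```python
# Minimal sketch of shared-nothing key routing: each key maps to exactly
# one owner node, so no other node ever touches that key's data.
# Node names and the modulo scheme are illustrative assumptions.
import hashlib

NODES = ["node-1", "node-2", "node-3"]

def owner(key: str) -> str:
    """Deterministically map a key to the single node that owns it."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

print(owner("user:42"))   # every client computes the same owner
```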

By comparison, shared disk is essentially the opposite: all data is accessible from all cluster nodes. Any machine can read or write any portion of data it wishes. See the figures below.

Understanding the Trade-offs for Writing

When persisting data in a shared disk architecture, writes can be performed against any node. If nodes 1 and 2 both attempt to write a tuple then, to ensure consistency with other nodes, the management system must either use a disk-based lock table or communicate its intention to lock the tuple to the other nodes in the cluster. Both methods present scalability issues: adding more nodes either increases contention on the lock table or increases the number of nodes over which lock agreement must be reached.

To explain this a little further, consider the case described by the diagram below. The clustered shared disk database contains a record with PK = 1 and data = foo. For efficiency both nodes have cached local copies of record 1 in memory. A client then tries to update record 1 so that ‘foo’ becomes ‘bar’. To do this in a consistent manner the DBMS must take a distributed lock on all nodes that may have cached record 1. Such distributed locks become slower and slower as you increase the number of machines in the cluster, and as a result they can impede the scalability of the writing process.
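
To make the cost concrete, here is a hedged Python sketch in which a writer must obtain agreement from every peer that might be caching the record before it can safely write. The Peer class and its simulated latency are invented for illustration; real systems (Oracle RAC's Cache Fusion, for example) are far more sophisticated.

```python
# Sketch of why shared-disk writes slow down as the cluster grows: before
# mutating record 1, the writer must lock/invalidate every peer that may
# hold a cached copy. The RPC layer here is simulated, not a real API.
import time

class Peer:
    def __init__(self, name, latency_s=0.002):
        self.name, self.latency_s = name, latency_s
        self.cache = {1: "foo"}

    def grant_lock(self, record_id):
        time.sleep(self.latency_s)          # simulated network round trip
        self.cache.pop(record_id, None)     # invalidate stale cached copy
        return True

def distributed_write(record_id, value, peers):
    # Cost grows with cluster size: one round trip per caching peer.
    for peer in peers:
        assert peer.grant_lock(record_id)
    return value  # safe to write 'bar' once all peers have agreed

peers = [Peer(f"node-{i}") for i in range(2, 10)]
start = time.time()
distributed_write(1, "bar", peers)
print(f"lock agreement across {len(peers)} peers took {time.time()-start:.3f}s")
```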

The other mechanism, locking explicitly on disk, is rarely done in practice as caching is so fundamental to performance.

Shared nothing, however, does not suffer from the same distributed locking problem. Assuming the client is directed to the correct node (that is to say, a client writing ‘A’, in the figure above, directs that write at Node 1), the write can flow straight through to disk, with any lock mediation performed in memory. This is because only one machine has ownership of any single piece of data; hence, by definition, there only ever needs to be one lock.

Thus shared nothing can scale linearly from a write perspective without increasing the overhead of locking data items, because each node has sole responsibility for the data it owns.

However shared nothing will still have to execute a distributed lock for transactional writes that span data on multiple nodes (i.e. a distributed two-phase commit). These are not as large an impediment to scalability as the caching problem above, as they span only the nodes involved in the transaction (as opposed to the caching case, which spans all nodes), but they add a scalability limit nonetheless (and they are also likely to be quite slow when compared to the shared disk case).
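
A toy sketch of that two-phase commit, with invented class and method names, may help show why the cost is bounded: only the participants actually touched by the transaction exchange prepare/commit messages, not the whole cluster.

```python
# Toy two-phase commit over only the nodes the transaction touches.
class Participant:
    def __init__(self, name):
        self.name, self.staged = name, {}

    def prepare(self, key, value):
        self.staged[key] = value        # stage the write, vote yes
        return True

    def commit(self, key):
        print(f"{self.name} committed {key}={self.staged.pop(key)}")

    def abort(self, key):
        self.staged.pop(key, None)

def two_phase_commit(writes):
    # Phase 1 (prepare): every participant in the transaction must vote yes.
    if not all(p.prepare(k, v) for p, k, v in writes):
        for p, k, _ in writes:
            p.abort(k)
        return False
    # Phase 2 (commit): unanimous yes, so everyone commits.
    for p, k, _ in writes:
        p.commit(k)
    return True

# Only the two nodes holding the affected shards participate; the rest of
# the cluster is untouched (unlike the shared disk cache-locking case).
two_phase_commit([(Participant("node-1"), "k1", "a"),
                  (Participant("node-2"), "k2", "b")])
```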

So shared nothing is great for systems needing high-throughput writes, if you can shard your data and steer clear of transactions that span different shards. The trick is to find the right partitioning strategy; for instance, you might partition data for an online banking system such that all aspects of a user’s account are on the same machine. If the data set can be partitioned in such a way that distributed transactions are avoided, then linear scalability, at least for key-based reads and writes, is at your fingertips.
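
The banking example might look something like this minimal sketch (the entity shapes and the shard_key function are hypothetical): deriving the shard key from the account id keeps all of a customer's rows on one node, so writes stay single-shard.

```python
# Sketch of the partitioning strategy in the banking example: derive the
# shard key from the owning account id so that an account's balance,
# statements and so on all land on the same node, avoiding distributed
# transactions. Entity names are illustrative assumptions.
def shard_key(entity: dict) -> str:
    # Every entity type carries the owning account id, so all of a
    # customer's rows hash to the same partition.
    return f"account:{entity['account_id']}"

balance   = {"type": "balance",   "account_id": 42, "amount": 100}
statement = {"type": "statement", "account_id": 42, "month": "2009-11"}
assert shard_key(balance) == shard_key(statement)   # same node, no 2PC
```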

The counter, from the shared disk camp, is that they can use partitioning too. Just because the disk is shared does not mean that data can’t be partitioned logically with different nodes servicing different partitions. There is much truth to this, assuming you can set up your architecture so that write requests are routed to the correct machine, as this tactic will reduce the amount of lock (or block) shipping taking place (and is exactly how you optimise databases like Oracle RAC).

Put another way – a shared disk implementation can be configured in a shared nothing mode. The difference then is just the physical placement of data: shared disk is always network attached in some way, never local. So whilst remote disks can provide comparatively high throughput and good random IO performance, they often do so at greater monetary cost.

Shared Disk Architectures are write-limited where multiple writer nodes must coordinate their locks around the cluster. Shared Nothing Architectures are write-limited where writes span multiple partitions, necessitating a distributed two-phase commit.

Considering the Retrieval of Data

The retrieval of data is a very different story, with different trade-offs for each of these two approaches. Looking first at shared disk, we find two significant drawbacks:

The first is the potential for resource starvation, most notably disk contention on the SAN/NAS drives. Shared disk means exactly that: all machines share the same disk array, and to some extent the same interconnect. Fortunately disk contention in a large shared disk system can be alleviated by partitioning. Data within the shared disk subsystem is often partitioned by its usage pattern (usually by moving tables onto different sections of the backing disk array). The problem with this approach is that it is manual: the data must be physically partitioned in advance.

The second issue is that caching is less efficient. Each machine in a shared disk system is likely to become involved with (and hence need to cache) the whole dataset. This reduces the efficiency of the cache, as cache misses become more likely. This is in stark contrast to the shared nothing approach, where each machine only needs to cache the subset of the data that it owns. Thus caching can be far more effective in a shared nothing system.
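
A back-of-envelope calculation illustrates the point. The figures below are made up, and the model assumes a uniformly accessed dataset, but the shape of the result holds:

```python
# Illustrative caching arithmetic (invented numbers, uniform-access
# assumption). With shared disk every node caches against the full
# dataset; with shared nothing each node caches only its own slice.
dataset_gb = 1000
cache_gb   = 50      # RAM cache per node
nodes      = 10

shared_disk_coverage    = cache_gb / dataset_gb            # 5% per node
shared_nothing_coverage = cache_gb / (dataset_gb / nodes)  # 50% per node

print(f"shared disk:    {shared_disk_coverage:.0%} of addressable data cached per node")
print(f"shared nothing: {shared_nothing_coverage:.0%} of addressable data cached per node")
```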

Shared nothing is not without its flaws though. SN works brilliantly if the query is self-sufficient – if each node can complete its ‘portion’ of the processing without needing data from any other node. However there will inevitably arise use cases where data from multiple nodes must be brought together or joined in some way. The implication is often that data which may not be included in the final result must be shipped from one machine to another. This need to ship data between machines to ‘join’ can have a significant effect on overall query performance.

So the reality is that the number of queries requiring data shipping will depend on both the use case and the partitioning strategy. There are many cases where joins can be eliminated altogether by using aggregates – in commercial search engines, for example. However, for many general business use cases – ones with large, related fact tables, for example – some data shipping is often inevitable. As a result many shared nothing solutions recommend the use of fast 10GE networks.
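
For illustration, here is a toy, in-process version of the "shuffle" that a distributed join implies. In a real cluster each bucket below would be rows shipped over the network; the table shapes are invented.

```python
# Sketch of why cross-node joins force data shipping: when two tables are
# partitioned on different keys, both sides must be re-hashed on the join
# key so that matching rows co-locate on one node.
from collections import defaultdict

def shuffle(rows, key, num_nodes):
    # Re-partition rows by the join key; in a real cluster each bucket
    # below represents rows shipped over the network to another machine.
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_nodes].append(row)
    return buckets

orders    = [{"order_id": 1, "cust_id": 7}, {"order_id": 2, "cust_id": 9}]
customers = [{"cust_id": 7, "name": "alice"}, {"cust_id": 9, "name": "bob"}]

# Both sides are re-hashed on cust_id so matching rows end up co-located;
# the shipped intermediate rows may never appear in the final result.
shuffled_orders    = shuffle(orders, "cust_id", 4)
shuffled_customers = shuffle(customers, "cust_id", 4)
```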

Finally we should comment on concurrency. Many key-value stores use sharding and SN to provide very high levels of concurrency. This is achieved by routing user requests, via the sharding key, to the single machine that holds the required data. This pattern is very efficient, and the result is stores that provide extremely high levels of read and write throughput over a large, concurrent user base. This pattern is used heavily in large web applications via NoSQL.

What we must note is that the scalability of this pattern is only available for key-based access. It does not apply to more general processing in a shared nothing system. Any request that does not explicitly use the primary key must be broadcast to all machines (partitions). This places a limit on scalability for queries that do not use the primary key. As a result many shared nothing systems that support more general query workloads show similar levels of concurrency to single-node systems (5-10 is not uncommon).

So this is important enough to restate: shared nothing is only linearly scalable for key-based access. The use of secondary indexes always results in every node being consulted. This limits scalability, certainly in terms of the number of concurrent requests that can be serviced. This is one of the reasons many distributed key-value stores stick to the very simple K-V contract.
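
A small sketch contrasts the two access paths; the Partition class below is invented for the example.

```python
# Key-based access touches one partition; a secondary-attribute query must
# scatter to every partition and gather the partial results, consuming
# capacity on all N nodes instead of one.
class Partition:
    """One shared-nothing node's local, autonomous slice of the data."""
    def __init__(self):
        self.rows = {}
    def put(self, row):
        self.rows[row["id"]] = row
    def get(self, key):
        return self.rows.get(key)
    def scan(self, pred):
        return [r for r in self.rows.values() if pred(r)]

nodes = [Partition() for _ in range(4)]
for i in range(100):
    nodes[i % 4].put({"id": i, "city": "NYC" if i % 2 else "LDN"})

one_node_hit   = nodes[42 % 4].get(42)               # key access: 1 node
scatter_gather = [r for n in nodes                   # secondary: all 4 nodes
                  for r in n.scan(lambda r: r["city"] == "NYC")]
```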

The retort is that shared nothing reduces the amount of data stored per machine. Thus total data volumes can be higher or, conversely, each query will be faster as the average dataset per query is reduced. This is why SN is favoured for Big Data systems like HBase, MapReduce, Cassandra, etc.

Reads in Shared Disk Architectures can suffer from resource starvation issues and less efficient caching as the cluster scales. Shared Nothing Architectures have the potential for far more scale, but this can be hampered by queries that must hit all machines. Query speed can also be affected if non-result (intermediary) data sets must be shipped cross-machine.

Complexity at Scale

A possibly less obvious benefit of shared nothing is to do with complexity at scale. Put simply, because of its autonomous nature, each node in a SN system has a relatively simple contract. Its concerns are encapsulated in its own data partition. This means the software to manage failure can be simpler, behaving with little or no knowledge of its wider role in the cluster.

In contrast, shared disk systems are fully open to the influence of other nodes. These couplings take the form of locks, with timeouts and relatively complex failure semantics. If we consider a failure in a shared disk system, the failed node is likely to have locks out on the underlying shared disk structure. These locks implicitly affect the other processing nodes, and the system must go through a process of discovering the failed node and its locks, and then releasing them or letting them time out.

The shared nothing system only has to detect the failure and promote the backup node (or similar, depending on the failure strategy of the system). In fairness these problems are well understood, but they are often still misapplied and always bring, IMHO, a little more complexity to bear. Certainly the complexity of these issues seems to grow with the number of nodes and the heterogeneity of the deployment. This means SN often works best for very large installations.
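
As a hedged sketch of that simpler failure story (the partition map and promotion logic below are invented stand-ins for a real membership protocol):

```python
# Shared-nothing failover: detect the dead owner, promote its replica.
# Note there is no shared lock table to inspect, break or time out,
# unlike the shared disk recovery procedure described above.
def handle_failure(partitions, failed_node):
    for partition in partitions:
        if partition["owner"] == failed_node:
            partition["owner"] = partition["replica"]   # promote the backup
            partition["replica"] = None                 # re-replicate later

partitions = [
    {"id": 0, "owner": "node-1", "replica": "node-2"},
    {"id": 1, "owner": "node-2", "replica": "node-3"},
]
handle_failure(partitions, "node-1")
print(partitions[0])   # {'id': 0, 'owner': 'node-2', 'replica': None}
```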

So Which Should You Use?

If you are Google or Amazon then the simple, autonomous SN model will likely be attractive. Key-value based approaches will give you the concurrency you need to serve millions of users. The brute force, divide and conquer approach to data processing will provide the grunt needed to sift through datasets that require hundreds of machines to process.

If you are a business system that is unlikely to need more than two or three servers, then the complexities of partitioning a complex domain model efficiently may outweigh the benefits. This is why many business databases, such as those provided by Oracle and IBM, tend to favour shared disk – particularly considering that a shared disk model can be partitioned to simulate at least some of the benefits of the shared nothing approach.

Often the choice is made for you by the implementation, and there may be other features besides the physical architecture that attract you to a certain product. Certainly shared nothing, as an approach, is increasing in popularity: most of the NoSQL space is shared nothing. However many NoSQL stores have blended models that include both sharding and replication as first-class primitives, which complicates the picture.

Hadoop also provides a blended model. HDFS is really a type of shared disk, but the execution model it uses is shared nothing: computation is routed to the nodes where the data lies, wherever possible. Composite models such as this can be attractive as they provide benefits from both approaches: a shared storage subsystem which spans the various machines in the cluster, combined with a programming model that treats data and processing as shared nothing, with each node assuming an autonomous, local data subset. This provides the benefits of shared nothing’s scaling-through-autonomy but with the power to break from the model where needed. Clever!
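
A minimal sketch of that locality-aware routing idea follows. The block_locations map and schedule function are invented illustrations, not the real HDFS or MapReduce APIs.

```python
# Hadoop-style locality scheduling over a shared filesystem: the storage
# layer knows which machines hold replicas of each block, and the
# scheduler prefers to run a task where its input already lives.
block_locations = {          # hypothetical block -> replica-holder map
    "part-0000": ["node-1", "node-3"],
    "part-0001": ["node-2"],
}

def schedule(block, free_nodes):
    # Prefer a free node that already holds a replica of the block.
    local = [n for n in block_locations[block] if n in free_nodes]
    if local:
        return local[0]      # data-local: executes shared-nothing style
    return free_nodes[0]     # otherwise ship the block across the network

print(schedule("part-0000", ["node-3", "node-2"]))   # -> node-3
```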

So whilst the concept of shared nothing vs shared disk is relatively simple, there is a whole host of other factors that differentiate the data technologies of today. This is just one classifier. But it is a useful one, at least from the perspective of understanding how these different systems work under the hood.

Further Reading

There are a number of good papers on the subject. The illustrious Michael Stonebraker was one of the early SN evangelists, back in the early ’80s. His paper The Case for Shared Nothing still makes good reading, even if it does skip some issues.

Also worth reading is Shared-Disk vs. Shared Nothing by the makers of ScaleDB – a Shared Disk database. This paper makes the case for Shared Disk and enumerates the downsides of Shared Nothing.

The last paper presents the opposite view. How to Build a High Performance Data Warehouse is well written, mapping the pros and cons of each architecture. However, don’t be sucked in by the academic URL: the authors are all affiliated with Vertica, a commercial implementation from the Stonebraker camp, and the paper noticeably favours a Shared Nothing columnar architecture like the one used by Vertica. Nevertheless it’s a good read.

Finally there is a good section in Architecture of a Database System.

See also Elements of Scale and Are Databases a Thing of the Past?

Posted on November 24th, 2009 in Analysis and Opinion


  1. Andrew Wilson December 3rd, 2009
    9:20 GMT

    Nice, lots of good background reading. Thanks, A.

  2. Hal December 9th, 2009
    13:38 GMT

    Good precis of the issues. Nice bit of cynicism at the end to warn that “academic” paper is indeed marketing material. PS: Didn’t realise the term Shared Nothing was as old as that – good to know that there is Nothing new in this world.

  3. Jeff Darcy January 4th, 2011
    14:13 GMT

    Good article. Thanks.

    Besides performance/scalability, another difference between shared-disk and shared-nothing has to do with how failures are handled. In a shared-disk model, failure of a node requires a complex recovery/failover procedure in which locks are broken, ownership of storage transferred, etc. This is fairly well understood technology today, but – having been around since we were all inventing that technology together – I continue to see it misapplied. In a shared-nothing model, storage and node failures can be handled the same way. Precisely because any given piece of raw storage is “stranded” on one node, systems built around this paradigm tend to involve replication “further out” from the disks, so the requesting node simply uses another replica regardless of which type of failure it was. It’s one of those cases IMO where a worse problem leads to a better solution, because the problem (half solved by the multi-controller RAID hardware that shared-disk types use and shared-nothing types eschew) can’t be ignored.

    Another interesting angle here, since you work on Coherence, is how some of these same principles apply to more memory-centric stores. Even with SCI and IB available, memory is essentially shared-nothing so it makes sense if it and disk can be handled in consistent ways instead of having two different systems with sharply different performance and failure-handling characteristics.

  4. ben January 5th, 2011
    17:47 GMT

    An interesting comment Jeff. I hadn’t really considered the differences with respect to failure semantics. From a Coherence perspective it’s a fair bit simpler, as it’s all shared nothing and there is no disk (on my current project we use messaging as a system of record, which is of course disk based, but it’s fire and forget). Your very point is the beauty of Coherence and the reason that a lot of folks are pushing all-memory solutions. These guys are leading the way IMO (if you haven’t seen it already): http://tinyurl.com/2c8vnul. However, as the use of SSD increases we may see the differences you allude to come back again!

  5. Simon Griffiths January 9th, 2011
    2:07 GMT

    Ben, an excellent precis of the problem. My expertise is in the large scale DW and I’d like to add a few comments.

    One of the interesting aspects of data warehousing is that in many cases, data is only ever added to the database and rows should not be updated. This is often a guiding principle and applies to both fact-based and dimensional data. This is often implemented through the use of effective dates. In that context, the requirement to ‘lock’ rows for update is substantially reduced. In this situation the main limiting factor that you have identified is substantially reduced in its effect for SD. How about the following as a suggested amendment:

    Shared Disk Architectures are write limited should they require locks that must be coordinated across the cluster. Shared Nothing Architectures are limited should they require a distributed two phase commit.

    In fact there is then a close relationship between the two limiting factors – essentially both systems are limited by the requirement to co-ordinate data change across the cluster. In the case of SN, the co-ordination is required to ensure two or more changes are co-ordinated, in the case of SD the co-ordination is required to co-ordinate competing write requests.

    This observation is also relevant when considering partitioning of data in a SD system. If data is only ever being added, then data-loading becomes straightforward to parallelise, up to the level of a load process per data partition, and the SD system becomes the equivalent of a SN system for the load process.

    When considering the case of complex queries, the second limitation of SN you mention (data shipping) becomes the main limiting factor. Again, in a DW use case, complex queries that include many tens of joins are the norm – queries with 20-30 joins are not unusual. Thus the output of each join action will probably not be partitioned on the same key as the next join action, and so cross-node shipping of data becomes the norm rather than the exception. The bottleneck for query performance then becomes the network between the nodes. Interestingly, this gives such queries a performance profile that can degrade as the number of nodes in a SN cluster increases – for 2 nodes, 50% of the data has to be shipped; for 10 nodes, that becomes 90%. I have seen situations where scaling the cluster on SN is near linear until the network is saturated; after that, performance will be flat on extending the cluster. However, as congestion on the network increases, performance will then degrade, and so the addition of a node to an SN cluster can result in slower query times.

    A final point I would add is that a major factor affecting the performance of many implementations of SN in data warehousing is data skew. Where the partitioning (or sharding) is determined by the value of a data item, then if that data item has anything other than a symmetric distribution it’s likely that the partitions will have unequal sizes and thus unequal scan times. This is not mitigated by hashing (as hashing changes data location but not distribution) but is mitigated by a round-robin partitioning algorithm.

  6. Dominique De Vito July 6th, 2011
    14:10 GMT

    Good article. Thanks.

    Let’s also mention denormalization here, which is one method used in SN architectures to enable all related data to be held on the same node.

  7. ben July 7th, 2011
    7:26 GMT

    That’s very true Dominique and a point close to my own heart. At RBS we’ve put a lot of effort into the problem of balancing replication and partitioning with the result being the Connected Replication pattern discussed here. The pattern splits data into a star schema and denormalises only the Dimensions. The trick being that the absolute minimum amount of Dimension data is replicated around the grid by constantly tracking what Dimensions are connected to Facts. I’ll update the post to reflect your point. Thanks

  8. Victory September 4th, 2023
    6:44 GMT

    Reading DDIA, and got directed to this article. Thanks for this writeup.
