About the Limit of a SQL Azure Database

by Ciprian Jichici 19. June 2010 00:25

In the past few months I received tons of questions about the SQL Azure database limit. There is an ongoing debate revolving around this size limitation with some arguing is not that bad and some saying it’s a complete showstopper. But before getting into the details let’s have a quick review of the facts. Initially, Microsoft limited the SQL Azure database to 10 GB, which meant that once you hit the limit you will receive error 40544 and you will not be able to perform anymore operations like inserting data, updating data, or creating new objects such as tables, stored procedures, views, and functions. To make things clear, the maximum size of the database refers exclusively to objects that you create and have control over. It does not take into account content from the master database, replicas, and the transaction log. At TechEd 2010 US, Microsoft announced an increase of the limit which now stands at 50 GB.

Well, that’s the current situation. As it is with many things in IT there is no free lunch here and there is also the usual compromise to be dealt with. On one hand there are high expectations from customers based on the fundamental promises of cloud computing: availability, scalability, and optimized cost of operations. There is a common expectation that one will be able to use the power of the cloud to manage efficiently high amounts of data. Currently, the 50 GB limit falls quite short of meeting these expectations. On the other hand, there are technical limitations which lead to the current size limit. In other words, there is a price to be paid in order to get the cloud advantages.

Let me elaborate a bit on this. One of the key features of SQL Azure is its capability to store 3 different copies of your database at any given point in time. While this is not used (at least not for now) for load balancing purposes, it gives you a very strong approach in terms of robustness. Suppose the primary copy fails, your requests will be automatically redirected to the second copy while in the background, SQL Azure will recreate the third copy offline. Once you understand this mechanism it becomes quite clear that it conflicts heads-on with the database size. With this approach, you will need to work with a maximum database size that guarantees a reasonable amount of time needed to rebuild copies, even in the worst case (the first two instances fail at short intervals of time) so that the risk of having all three copies failing while the new copies are still being rebuilt is minimized. It’s quite clear that despite the recent increase (from 10 GB to 50 GB) we cannot expect spectacular increases in the future, unless the model is changed.

It seems to me the next step in scaling the SQL Azure database size relies on introducing a new logical level which works transparently with X GB chunks in the underlying layer. What I am talking about is some kind of “logical” database that is broken transparently by the fabric in chunks. As a matter of fact this is exactly what Microsoft is recommending today as a best practice for those who want to use SQL Azure for large databases (the concept is commonly referred as sharding). The difference is that today we need to develop our own logic in order to achieve this.

As a matter of fact, Microsoft has a few compelling arguments in favor of the existing limit:

  • Application migration to the cloud will start most probably with smaller, departmental applications which are unlikely to require storage in excess of 50 GB
  • Most large applications to live in the cloud will be based on multi-tenant architectures which are build from the ground up with sharding in mind
  • The advantage of having three different copies of your data maintained by the cloud make it worth to re-architect your solution with sharding in mind

While I mostly agree with the arguments presented above, I still think that Microsoft needs to address the following two issues:

  • Release the promised patterns & practices guidance that addresses the challenges of architectures based on sharding (it has been referred many times in official talks but it’s no where to be seen)
  • Build into SQL Azure that extra logical layer that will allow us to move as high as we need in terms on database size while enjoying the advantages of the SQL Azure fabric

Until then, we need to plan and assess carefully before we decide to move our relational data into the cloud.

Tags: , ,

Azure | Architecture | SQL Server

Comments are closed