About the Limit of a SQL Azure Database

by Ciprian Jichici 19. June 2010 00:25

In the past few months I received tons of questions about the SQL Azure database limit. There is an ongoing debate revolving around this size limitation with some arguing is not that bad and some saying it’s a complete showstopper. But before getting into the details let’s have a quick review of the facts. Initially, Microsoft limited the SQL Azure database to 10 GB, which meant that once you hit the limit you will receive error 40544 and you will not be able to perform anymore operations like inserting data, updating data, or creating new objects such as tables, stored procedures, views, and functions. To make things clear, the maximum size of the database refers exclusively to objects that you create and have control over. It does not take into account content from the master database, replicas, and the transaction log. At TechEd 2010 US, Microsoft announced an increase of the limit which now stands at 50 GB.

Well, that’s the current situation. As it is with many things in IT there is no free lunch here and there is also the usual compromise to be dealt with. On one hand there are high expectations from customers based on the fundamental promises of cloud computing: availability, scalability, and optimized cost of operations. There is a common expectation that one will be able to use the power of the cloud to manage efficiently high amounts of data. Currently, the 50 GB limit falls quite short of meeting these expectations. On the other hand, there are technical limitations which lead to the current size limit. In other words, there is a price to be paid in order to get the cloud advantages.

Let me elaborate a bit on this. One of the key features of SQL Azure is its capability to store 3 different copies of your database at any given point in time. While this is not used (at least not for now) for load balancing purposes, it gives you a very strong approach in terms of robustness. Suppose the primary copy fails, your requests will be automatically redirected to the second copy while in the background, SQL Azure will recreate the third copy offline. Once you understand this mechanism it becomes quite clear that it conflicts heads-on with the database size. With this approach, you will need to work with a maximum database size that guarantees a reasonable amount of time needed to rebuild copies, even in the worst case (the first two instances fail at short intervals of time) so that the risk of having all three copies failing while the new copies are still being rebuilt is minimized. It’s quite clear that despite the recent increase (from 10 GB to 50 GB) we cannot expect spectacular increases in the future, unless the model is changed.

It seems to me the next step in scaling the SQL Azure database size relies on introducing a new logical level which works transparently with X GB chunks in the underlying layer. What I am talking about is some kind of “logical” database that is broken transparently by the fabric in chunks. As a matter of fact this is exactly what Microsoft is recommending today as a best practice for those who want to use SQL Azure for large databases (the concept is commonly referred as sharding). The difference is that today we need to develop our own logic in order to achieve this.

As a matter of fact, Microsoft has a few compelling arguments in favor of the existing limit:

  • Application migration to the cloud will start most probably with smaller, departmental applications which are unlikely to require storage in excess of 50 GB
  • Most large applications to live in the cloud will be based on multi-tenant architectures which are build from the ground up with sharding in mind
  • The advantage of having three different copies of your data maintained by the cloud make it worth to re-architect your solution with sharding in mind

While I mostly agree with the arguments presented above, I still think that Microsoft needs to address the following two issues:

  • Release the promised patterns & practices guidance that addresses the challenges of architectures based on sharding (it has been referred many times in official talks but it’s no where to be seen)
  • Build into SQL Azure that extra logical layer that will allow us to move as high as we need in terms on database size while enjoying the advantages of the SQL Azure fabric

Until then, we need to plan and assess carefully before we decide to move our relational data into the cloud.

Tags: , ,

Azure | Architecture | SQL Server

To Master Data, You Must

by Ciprian Jichici 19. February 2010 22:48

Although it might sound like one of Master Yoda’s wise sayings, it is in fact one of the realities of modern IT. In fact, it’s more than a reality, it’s a significant challenge for most CIOs today. Years and years of disconnected technology investments are driving many organizations to a state where data duplication becomes a harsh fact of life. Simple questions like “where is you customer data located?” have in many cases multiple answers. This is not necessarily a bad thing, as long as there is a clear methodology to manage data. Unfortunately, most of the time there is no such thing in the organization. The fact that “where is your customer data located?” has multiple answers is worsened by the fact that “which application manages your customer data?” has multiple answers. To make matters even more complicated, most of the applications referred in the response will claim they own the “customer”.

MDM (Master Data Management) is emerging as a potential way to address the challenges mentioned above. Unfortunately, many organizations believe that by simply implementing a MDM solution all of these challenges will be addressed. That’s a big (and potentially costly) mistake. In most cases, along with the MDM solution itself, you also need a serious shift in the organization’s data management mindset. Failure to drive the mindset change will inevitably lead to the failure of the MDM solution implementation. Such a mindset change should result in at least the following:

  • A data governance program that spans (preferably) the entire organization
  • A master data model
  • Changes in applications that interact with master data
  • Coherent maintenance processes

Obviously, these are major changes for any organization, and they should be supported by the appropriate toolset. SQL Server 2008 R2 brings for the first time to the Microsoft Application Platform such a toolset, in the form of SQL Server 2008 R2 Master Data Services (MDS). Basically, MDS provides you the infrastructure to implement a master data hub for the central management of master data entities and hierarchies. This infrastructure provides services like management portal, versioning, data-centric workflows, extensible business rules, and role-based security.

For the first time on the Microsoft Application Platform we have the necessary tools to morph the vision of unified, enterprise-wide master data management into tangible reality. All we need is the ability to drive the change in mindset.

To find out more about this promising technology I encourage you to go to the SQL Server 2008 R2 Master Data Servicessite.

Tags: ,

SQL Server | Architecture | Microsoft Application Platform

SQL Server 2008 Take Two

by Ciprian Jichici 18. January 2010 14:19

I admit, this might not be the best name for this post, but as I'm writing it the Golden Globe Awards event is unfolding, so I hope my cinematography bias is excused. In case you're still wondering, this is about SQL Server 2008 R2. "Take Two" might not be the best choice simply because SQL Server 2008 R2 is not an attempt to fix issues with SQL Server 2008. As a matter of fact, I think SQL Server 2008 was a smashing hit. What do you do when you have such a winner at hand? It's simple: you build on it. And it sure looks like Microsoft is going to do this with R2.

But enough with the digression, let's get back to the core of the subject. Which are the major new things to expect from SQL Server 2008 R2?

Reporting Services is one area where we'll see significant improvements. To name just a few of them: new constructs in the expression language, new layout features, better data visualization, shared reporting components, and deeper SharePoint integration. There's also going to be a new Report Builder and a new rendering extension that enables us to consume reports as ATOM feeds (see my previous post about ATOM feeds).

Data modeling and analysis will benefit a lot from the new PowerPivot technology. Despite the fact that it’s available mainly as an Excel 2010 extension, PowerPivot is at its heart nothing more and nothing less than an in-memory multidimensional cube based on Analysis Services technology. And it has its own language (see my previous post). But the thing that will make PowerPivot a real winner is its availability in SharePoint 2010 via Excel Services, part of the SQL Server 2008 R2 SharePoint 2010 Insights.

Operations gets its fair share through the new functionalities related to Applications and Multi-Server Management. Concepts like data-tier applications, server groups, and control points are some of the most important things in this area.

Master Data Services enables you to implement master data management solutions (more about this in a future, dedicated post).

Finally, the StreamInsight technology firmly puts SQL Server on the map of Complex Event Processing platforms. The best definition for StreamInsight is one that I’ve recently heard saying that instead of throwing the query to the data, it throws the data to the query. On a more technical note it’s a technology that enables near-zero latency queries on the data while “in flight”. To make things more difficult, the data involves multiple and concurrent event sources generating data at high rates.

Well, that’s about it. It's quite easy to notice that Business Intelligence is the prevailing theme here. While all the new features are important and quite powerful, I think the ones related to BI are the most important. Since this post is already quite long, I’ll save the more detailed talk about BI in SQL Server 2008 R2 for future posts.

Tags:

SQL Server

Yet Another Way To Get Your Data

by Ciprian Jichici 13. January 2010 09:13

One of the new rendering extensions available in Reporting Services in SQL Server 2008 R2 is the one that renders reports in the ATOM data feed format. This is an excellent new and standardized way to get the raw data from a report (supposing for instance that you do not need the formatted version) when your only point of connection is the reporting service itself. The ATOM data feed can be consumed by virtually any application that can handle data feeds. What's really cool here is that the new ATOM functionality is nothing more (from an external point of view) than a rendering extension which makes it blend in a very natural way into the overall Reporting Services landscape.

One particularly interesting example of ATOM feed consumer for Reporting Services is the new PowerPivot technology available in Excel and SharePoint 2010. Obviously, it's not the only one. As I mentioned above, anything that understands data feeds can benefit from this new functionality.

So… how do you get the ATOM feed? It’s as simple as using an URL access string like:

http://<server>/ReportServer?/<path>&rs:Command=Render&rs:Format=ATOM&rc:DataFeed=<datafeedname>

Read more about getting data feeds from reports here.

Tags: ,

Microsoft Application Platform | SQL Server | Office System | PowerPivot