Some Tips for Working With Big Data Models

http://www.sqlservercentral.com/articles/model/72275/

By Mauro Pichiliani, 2011/01/25

As a SQL Server DBA, I do a lot of consulting for different clients that have problems with their databases. Among the many database scenarios I have encountered, the one that probably affects my work the most is dealing with large and complex database models that were custom built to fulfill data storage requirements.

The origin of these models is not uncommon these days: as new requirements come up, developers and other professionals amend existing objects, create new tables, relationships, and columns, change datatypes, and so on. On top of that, as the business grows, the stored data tends to grow too, which makes maintenance tasks on the model, the data, the objects, and the database in general more complex. So, in order to be prepared to deal with large and complex database models, I have come up with some tips over the years that prepare the ground and guide me through what I have to do to meet whatever database needs the client that hired me has.

The main goal of this article is to present some tips to help professionals who need to work with the complex, big, and hard-to-understand database models that anyone may come across some day. Typically, these database models store thousands of gigabytes in many related objects that can be used by hundreds of concurrent users. So having some kind of "to-do" list for dealing with this sort of database can be beneficial not only for solving immediate problems but also for preparing the ground for future requirements and maintenance tasks. Without further delay, these are my tips for those who have to quickly understand a big database model without being its creator:

1) Identify the biggest, the largest, and the most used tables in the model

In my experience, every big database model has at least one main table that contains a high percentage of the data stored in the database. By "big table" I mean a table with a large amount of data in it (i.e. rows), and by "large table" I mean a table that contains many columns with different datatypes. In most cases these big and large tables are the ones most accessed and used by the applications, so it is very helpful to know and recognize these objects, because chances are that you will end up working with them. Having a way to differentiate these tables from the other objects in a diagram is a plus that can save time when a quick look at the model is needed.
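The distinction between "big" and "large" tables can be illustrated with a minimal Python sketch that ranks tables by row count and by column count. The table names and figures are invented sample data, and `TableStats` and `rank_tables` are hypothetical helpers, not part of any tool; in SQL Server the underlying numbers could be collected from catalog views such as sys.tables, sys.columns, and sys.partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical summary of one table, collected beforehand from the catalog."""
    name: str
    row_count: int
    column_count: int


def rank_tables(stats, top=3):
    """Return (biggest, largest): table names ranked by rows and by columns."""
    by_rows = sorted(stats, key=lambda t: t.row_count, reverse=True)
    by_columns = sorted(stats, key=lambda t: t.column_count, reverse=True)
    biggest = [t.name for t in by_rows[:top]]
    largest = [t.name for t in by_columns[:top]]
    return biggest, largest


# Invented sample figures, for illustration only.
sample = [
    TableStats("Invoice", 90_000_000, 25),
    TableStats("Customer", 2_000_000, 140),
    TableStats("Country", 250, 4),
    TableStats("InvoiceItem", 400_000_000, 12),
]

biggest, largest = rank_tables(sample, top=2)
print("Biggest (rows):", biggest)
print("Largest (columns):", largest)
```

The two rankings rarely coincide, which is why it may be worth marking both kinds of table differently in the diagram.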

2) Use a source control, versioning or change management tool

Nowadays there are many tools that offer features to track changes over time in the database model. New columns, relationships, changes in datatypes, and other minor modifications like these are common in a big database model that is constantly evolving to adapt to new business requirements or change requests. The use of source control, versioning, or change management tools may require the generation of a script containing the definition of the database objects. Alternatively, the model's version control may be integrated with existing IDEs such as Microsoft Visual Studio .NET with Team Foundation Server. Either way, the argument that DBAs do not need to use source control tools because they do not program or build applications is no longer acceptable.

3) Know how to print the complete or partial database model

Ah, the printed database model! Decorating thousands of cubicles and walls in IT departments all over the world, these relics become ubiquitous in development environments as the data model gets bigger. As I have learned from the many teams I have worked with, there is always someone who asks for a printed version of some object or even the entire database model. So, the idea here is to have a simple, quick, and easy way to print a table, or a set of tables and their relationships. Options to generate PDF files, to print the database model on separate pages (followed by the not-so-important and time-consuming task of correctly taping them together), and to use different paper colors to identify versions and other model metadata are among the practices that come in handy for a DBA working with a team of developers.

4) Identify the most used complementary objects (stored procedures, triggers, functions, indexes)

Since a database is not composed only of tables and their relationships, it is very pragmatic to know the other objects that access, control, and manipulate the main tables mentioned in the first item of this article. I have found that big data models implemented in SQL Server usually employ many stored procedures, views, functions, triggers, and other objects that are as important as the tables that store the data. Knowing the logic behind the main lines of code that make up these objects can save time while debugging, decrease the effort required for tuning, and put the DBA in a very comfortable position when there is a need to modify the data.

5) Have a way to see the database in separate layers: with and without relationships, with and without indexes, with and without constraints, etc.

The market is flooded with CASE modeling tools that can help professionals deal with big logical or physical database models. If, however, these models could be built iteratively and incrementally, in the same way as an image is composed from layers, the DBA would have a powerful way to visualize specific details without all the complexity of a big model. For instance, imagine an ERP (Enterprise Resource Planning) database model that contains tables for the ERP system and some tables created for the integration with an existing intranet system. It would be very valuable if the DBA could simply hide the tables that came from the intranet integration, thus seeing only the information needed for a task that involves only the ERP tables, and vice versa. This type of model layering is a very clever way to separate and isolate the parts of the model that are specific to a task, which keeps the model's complexity from spreading.

6) Use colored rectangles to group related tables from the same subsystem

Another approach that can be employed to separate and isolate parts of the model without modifying it is to use drawing elements. Colored rectangles are almost a standard in modeling when there is a need to separate subsystems. This technique is easy and simple, and it not only groups tables and relationships but also improves the readability and the documentation of the model. Of course, comments on the model are very useful when they can describe in a few words information that might otherwise take hours of study to dig out from direct analysis of the model.

7) Enumerate the correct order to insert, update and remove the data in specific tables in order to respect the relationships

This item is a bit tricky because the correct order to insert, update, and remove data in the model's tables is programmed in the application or inside a database object such as a stored procedure. The tricky part is that the modifications are mixed with and shaped by business rules and, sometimes, it is almost impossible to separate them. However, if the DBA can accomplish this separation and create a script that correctly inserts, updates, or removes data in the tables in the same way as the application does, he/she adds to an arsenal of scripts that can save several hours during an emergency support call. Please don't get me wrong here: I'm not suggesting that the DBA or any other professional should directly modify the table's data and bypass the application in order to quickly solve an issue. What I'm suggesting is that knowledge of how the tables interact with each other for a particular operation or task can be very helpful and should be in the DBA's toolset.
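The relationship side of this ordering (not the business rules) can be sketched as a topological sort of the foreign-key dependencies. The following Python example is only an illustration under stated assumptions: the schema is invented, `insert_order` and `delete_order` are hypothetical helper names, and the foreign keys are assumed to have been read from the catalog beforehand. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def insert_order(foreign_keys):
    """Return table names so that every parent table precedes its children.

    foreign_keys maps a child table to the set of parent tables it references.
    Raises graphlib.CycleError if the relationships form a cycle.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


def delete_order(foreign_keys):
    """Deletes run in the opposite direction: children first, parents last."""
    return list(reversed(insert_order(foreign_keys)))


# Invented sample schema: child table -> parent tables it references.
fks = {
    "Customer": set(),
    "Product": set(),
    "Invoice": {"Customer"},
    "InvoiceItem": {"Invoice", "Product"},
}

order = insert_order(fks)
print("Insert order:", order)
print("Delete order:", delete_order(fks))
```

Self-referencing or circular foreign keys make the sort fail with a `CycleError`; such tables need separate handling, for example inserting the rows first and filling in the referencing column afterwards.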

8) Always have a way to search for the table's name, columns, datatypes, nullability, description and other attributes

Complex models may contain hundreds or even thousands of database objects, so it is wise to have a simple way to search them and obtain their metadata. Almost always, this means knowing exactly which system catalog view or table to look in when the DBA is searching for something. Good CASE modeling tools also provide options and features for basic and extended searches of the items, even while they are still in the logical modeling diagram.
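As a small sketch of such a search, the Python example below filters a list of column descriptions by free text, datatype, and nullability. `ColumnInfo` and `search_columns` are hypothetical names and the catalog content is invented; in SQL Server the records could be loaded from a view such as INFORMATION_SCHEMA.COLUMNS, which is assumed here rather than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical record describing one column of the model."""
    table: str
    column: str
    datatype: str
    nullable: bool
    description: str = ""


def search_columns(columns, text=None, datatype=None, nullable=None):
    """Filter column metadata; a criterion left as None is ignored."""
    needle = text.lower() if text else None
    found = []
    for col in columns:
        # Free text is matched against table name, column name and description.
        haystack = f"{col.table} {col.column} {col.description}".lower()
        if needle is not None and needle not in haystack:
            continue
        if datatype is not None and col.datatype.lower() != datatype.lower():
            continue
        if nullable is not None and col.nullable != nullable:
            continue
        found.append(col)
    return found


# Invented sample metadata, for illustration only.
catalog = [
    ColumnInfo("Customer", "CustomerId", "int", False, "Surrogate key"),
    ColumnInfo("Customer", "Email", "varchar", True, "Contact e-mail"),
    ColumnInfo("Invoice", "CustomerId", "int", False, "Owner of the invoice"),
    ColumnInfo("Invoice", "Notes", "varchar", True),
]

id_hits = search_columns(catalog, text="customerid")
nullable_varchars = search_columns(catalog, datatype="varchar", nullable=True)
print([(c.table, c.column) for c in id_hits])
print([(c.table, c.column) for c in nullable_varchars])
```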

9) Have a script that generates the current database (all objects) with a fraction of its size (10% is ok)

This item is more a tip from personal experience than a must-have tool. On several occasions I needed to create a new database environment for test/production/homologation (or for any combination of these) from an existing database model. These situations require that all the objects be created on a different server (sometimes a virtualized one) with all the complexity of the model but with only a fraction of its data. The 10% mark, more or less, seems to be an acceptable amount of data to start with because there must be instances of certain entities (products, customers, employees, application settings, etc.) in the database for the application to start. The script needed to create all the objects is trivial and can be generated automatically by a tool provided by the database, but working out the correct order in which to insert the data may be very complex, as discussed in item 7.
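One way to keep a reduced copy consistent is to sample the parent rows first and then keep only the child rows that reference a sampled parent. The Python sketch below shows that idea for a single parent/child pair; it is an assumption about how the subset might be chosen, not a description of any particular tool, and `sample_with_children` and the data are invented for illustration.

```python
import random


def sample_with_children(parents, children, fraction=0.10, seed=42):
    """Keep about `fraction` of the parent rows plus the child rows that reference them.

    parents:  dict mapping parent key -> parent row
    children: list of (parent_key, child_row) pairs
    """
    if not parents:
        return {}, []
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep_count = max(1, round(len(parents) * fraction))
    kept_keys = set(rng.sample(sorted(parents), keep_count))
    kept_parents = {key: parents[key] for key in kept_keys}
    # A child row survives only if its parent survived, so no foreign key is orphaned.
    kept_children = [(key, row) for key, row in children if key in kept_keys]
    return kept_parents, kept_children


# Invented data: 100 customers with 5 invoices each.
customers = {i: f"Customer {i}" for i in range(1, 101)}
invoices = [(1 + n % 100, f"Invoice {n}") for n in range(500)]

kept_customers, kept_invoices = sample_with_children(customers, invoices)
print(len(kept_customers), "customers and", len(kept_invoices), "invoices kept")
```

In a real model the same filtering has to be repeated down every chain of relationships, in the insert order discussed in item 7, and small lookup tables such as application settings are usually copied whole.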

10) Keep an updated list of the permissions for the most common and used database objects (in order to know very quickly what a specific user can and cannot do with the objects)

Perhaps the most common administrative task of a DBA is managing permissions. The DCL (Data Control Language) commands and the graphical user interfaces of database administration tools provide an adequate and simplified way of granting, revoking, and denying permissions on objects to users. So, a DBA who works with a complex model has to be aware of the permissions in place for the objects and the users. Separating permissions into roles can be very helpful, but knowing how to quickly point out a role with a specific set of permissions on the tables of one subsystem of the model is a must-have skill that any DBA should possess. Again, simple, summarized, and grouped documentation of the users' permissions on the main tables and objects can have a great impact on daily administrative tasks, especially if the database organizes users' permissions and object access in an unusual way.
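A summarized permission list of this kind can be sketched in Python as a grouping of (principal, object, permission, state) rows. The role names and entries below are invented, `summarize_permissions` and `is_allowed` are hypothetical helpers, and the check is deliberately simplified: it treats a DENY as overriding a GRANT and ignores role membership, ownership, and inherited permissions. In SQL Server the rows could be read from sys.database_permissions and sys.database_principals.

```python
from collections import defaultdict


def summarize_permissions(entries):
    """Group (principal, object, permission, state) rows by principal and object.

    state is "GRANT" or "DENY".
    Returns {principal: {object: {"GRANT": set, "DENY": set}}}.
    """
    summary = defaultdict(lambda: defaultdict(lambda: {"GRANT": set(), "DENY": set()}))
    for principal, obj, permission, state in entries:
        summary[principal][obj][state].add(permission)
    return summary


def is_allowed(summary, principal, obj, permission):
    """Simplified check: allowed only if granted and not denied."""
    cell = summary.get(principal, {}).get(obj)
    if cell is None:
        return False
    return permission in cell["GRANT"] and permission not in cell["DENY"]


# Invented sample entries, for illustration only.
entries = [
    ("sales_role", "Invoice", "SELECT", "GRANT"),
    ("sales_role", "Invoice", "INSERT", "GRANT"),
    ("sales_role", "Invoice", "DELETE", "DENY"),
    ("report_role", "Invoice", "SELECT", "GRANT"),
]

summary = summarize_permissions(entries)
for principal, objects in sorted(summary.items()):
    for obj, cell in sorted(objects.items()):
        print(principal, obj, "GRANT:", sorted(cell["GRANT"]), "DENY:", sorted(cell["DENY"]))
```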

11) Know how to predict and estimate sizes of specific objects in order to forecast the database's growth or shrinkage.

Forecasting based on real data and on statistical models built from the data distribution can be very helpful when there is a need to create a report suggesting the allocation and/or relocation of database resources that affect hardware and link costs. In my consulting experience I have found more professionals acting on guesswork than I would like to admit when they need to justify the allocation/reallocation of database resources. Here the tip is simple: always base your suggestions on statistics, metrics, and real needs and demands, so that your argument is solid rather than built on guesses and poorly predicted values with no concrete statistical basis.
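As one simple example of a forecast grounded in measurements, the Python sketch below fits a straight line to past size readings with ordinary least squares and extrapolates it. Linear growth is an assumption made here for illustration; real tables may grow seasonally or in steps, in which case a different model is needed. The monthly figures are invented, and `fit_line` and `forecast` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(xs, ys, future_x):
    """Extrapolate the fitted line to future_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept


# Invented measurements: table size in GB at the end of months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
size_gb = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, size_gb)
predicted = forecast(months, size_gb, 12)
print(f"Growth of about {slope:.1f} GB per month; month 12 estimate: {predicted:.1f} GB")
```

Keeping the raw measurements alongside the fitted values makes it easy to show in a report how well the model matched the past before trusting it for the future.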

12) Show in the database model which objects have partitioning options, if they are compressed and the filegroups they belong to.

SQL Server and other commercial products have many options for compression, partitioning, and internally separating the data stored inside a table. When doing maintenance tasks it is important to know which tables are compressed, which have partitioning enabled, how this partitioning is implemented, and where the data is physically stored (i.e. filegroups and data files). However, the data model and the ER diagram do not have a specific notation to represent this type of information, so it is up to the DBA to figure out how to describe and document these options in the ER diagram. Comments, different colors, geometric figures, or even a small Excel-style table with sample data showing how the partitioning separates the data are suggestions that can make the model richer and make the DBA aware of the partitioning options just by looking at the diagram.

13) In OLAP (Online Analytical Processing) models centralize the fact table and have a way to link the description of the main hierarchy, levels, members, and grains for each dimension table.

OLAP data models, implemented with the star schema, the snowflake schema, or a mix of the two, tend to be simpler than complex OLTP data models. One of the reasons is that more planning goes into the elaboration of the relationships and the entities than in OLTP models. Also, the definitions of the dimensions, levels, hierarchies, and grains create a logical structure in the data that is easily visualized after an inspection of the tables' details. Nevertheless, poor documentation of the characteristics of the dimensions and their relationships with the fact data makes the model harder to understand. One simple idea to make an OLAP data model easier to understand is to center the fact table in the diagram in such a way that this table stands out from the others and can be clearly identified at first sight. Decorating the model with plenty of grouped information about the structure and metadata of the dimensions is also a good practice that can save a lot of time otherwise spent guessing how the data is supposed to be shown to the user when browsing a cube's data.

Conclusion

Having explained these tips, I would point out that most of them focus on documentation of the database model and its objects. Of course, the main effort in programming goes into developing and implementing new requirements and changing existing features. However, the IT community, and especially DBAs, must reinforce the role of documentation not only for source code or deliverable software such as an application, but also for artifacts that are important throughout the development cycle.

By following the tips presented in this article, DBAs and other professionals can be better prepared when they must understand and work with large and complex data models. This scenario is not uncommon as a business grows, and being able to count on a few tips and best practices can really help when one has to deal with data in a complex and large database.

Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved.
