Optimizing SQL Indexes
Using Indexes Properly
SQL Server can be an extremely fast and scalable database engine when it is used correctly. Using SQL Server correctly requires careful forward planning and a good understanding of the indexing system. Good indexes can increase SQL performance 10 fold. So getting index structure right is very important.
Indexes are brilliant at well, indexing large amounts of data for quick access when needed. But indexes become pretty much useless with text and large strings. Data types such as int, smallint, tinyint, char and bit are perfect types to be using in indexes, as these are well optimized and are easily searched. Where as varchars and text columns should never be included in indexes, as the text itself can be scanned by an index to get its values and row identifiers.
With this in mind, when you create your indexes, you should mainly be indexing the columns you are performing WHERE or JOIN queries on. If one or more of these columns are varchars then try and keep them as small as possible. A 50 character varchar can decrease index performance by up to 2000% and one million rows can take 30 seconds to return, where as the same varchar limited to 8 characters can return over one million rows within a second.
Version of SQL 2005 and above have a nice feature on the indexing system where you can “include” columns instead of having them on the actual index key. These are helpful as they allow you to index these columns at the leaf level, without increasing the size of the index keys and reducing performance. As a general rule, any columns you are performing WHERE queuries on, should in the key index columns (the normal ones). Any columns you are performing joins on are best suited in the included columns. This will increase the speed of WHERE queries and also speed up JOIN’s by using the included columns at the leaf level row of the index.
To illustrate this, take these example two tables and the query below which is used to retrieve data from them.
|1||Trampoline||230||Kids||A Fun thing for kids|
|2||Bike||75||Kids||Another Fun thing for kids|
|4||Sandpit||40||Kids||Sandpit for sand|
|5||Light Bulb||5||Housing||Something to light the room up|
SELECT Products.ProductName, Products.Price, Offers.OfferPrice FROM Products INNER JOIN Offers ON Offers.ProductName = Products.ProductName WHERE Products.Price < 120 AND Products.ProductCategory = 'Kids'
This is probably a query you would never use but is just for an example. Based on the query, we should do the following with our index. The index should index the columns: ProductCategory and Price. The included columns should include ProductName. Our indexed columns are indexed because we filter on these columns, so have high seek rates, in which an index is perfect for. Our included column will still technically be indexed for our join query, but is not as heavily used.
Think of an index as a ball pool. You have 100 red balls, 100 blue balls, 200 yellow balls and 100 green balls. You want to filter your query on the colour of balls. Now our table has a column called ballColour, and we have it set as a 50 character varchar. This may seem fine, but our index takes this varchar(50) column and reserves 50 characers for that column per row. You can imagine how annoying that can be for SQL to look through 50 characters for the string “green”.
Now if we change our table to optimise it, we can see how much our index can change, in both size and performance. If we change our ballColour column to a varchar(8), this should be able to hold most names of colour and 8 is a good size as there are no overheads as we are using 8 bytes. If we rebuild our index, the index will now only reserve 8 characters per row for our ballColour column. So lets do a bit of maths…
If we have 500 balls (100 red, 200 yellow, 100 blue and 100 green) and we take our ballColour column as varchar(50), that means our index was storing 500 x 50bytes of data just for our ballColour column. Thats 25,000 bytes or 24.4Kbytes. That might not sound like much, but when your database has more columns and more rows, this can quickly multiply. Now once we have changed our ballColour column to a varchar(8) we only use 500 x 8 bytes. Thats only 4,000 bytes or 3.9Kbytes. So we have exactly the same data, in 84% less space. SQL Server will love you for this, and in return will give you millions of results in seconds.
To further optimize your SQL indexes and tables. Pay close attention to integer columns. There are a number of different interger you can use, but only use what you need. If you are using a bigint, you are using 8bytes per row. But if your maximum value will fit in a smallint, you will save 6 bytes of space per row, you have more than halved your data space for that column, which in turn will increase your index performance. The smaller data footprint your table and index has, the quicker SQL will perform.
There are some instances were bigint, large varchars and text columns have to be used. Therefore the only thing you can do next, is increase the power of the server. More or better CPU’s, more RAM and better performing hard disks. Another route is optimizing your T-SQL queries themselves, as bad T-SQL writing can be the cause of many performance problems.
One useful datatype to keep in mind is the datetime2 type. A regular datetime column will use 8bytes, where as a datetime2 column will use 6 – 8 bytes. You could be saving 2 bytes per column, which as you know by now, is a big saving on large tables and indexes. If you want to go even further, and date/time accuracy is not important to you, use a smalldatetime data type which will save you another 2 – 4 bytes as this only requires 4 bytes, but is only accurate to one minute.
Also, remember the difference between varchar and char. Varchar is a variable character string, meaning it can range from 0 to the maximum value set. So even if your string is only 5 characters long and your max length is 50, you will still effectively be using 50 bytes for that column. Where as char is for a fixed string length. If you know all your strings are going to be 5 characters and no more, no less, this will then only use 5 bytes of space per column, and you will have no un-used space caused by whitespace.
Generally, a high performing SQL database is produced with good forward planning and an understanding of efficient data types. Don’t index something you don’t need. Don’t return something you don’t need. But most of all, don’t have empty space in your tables and indexes where you can prevent it.
Below are a list of data types you can use, and you can see how much space each one uses and you can see which is best for efficieny.
|Data Type||Minimum Length||Maximum Length||Data Space|
|char(n)||1 Characters||8,000 Characters||n bytes|
|varchar(n)||1 Characters||8,000 Characters||n bytes|
|varchar(max)||0 Characters||1,073,741,824 Characters||upto 1,073,741,824 bytes|
|text||0 Characters||2Gb of text data||up to 2GB|
|nchar(n)||1 Characters||4,000 Characters||n bytes|
|nvarchar(n)||1 Characters||4,000 Characters||n bytes|
|nvarchar(max)||0 Characters||536,870,912 Characters||upto 536,870,912 bytes|
|ntext||0 Characters||2GB of text data||upto 2GB|
|bit||NULL||1 or 0||1 bit|
|binary(n)||1 bytes||8,000 bytes||n bytes|
|varbinary(n)||0 bytes||8,000 bytes||n bytes|
|varbinary(max)||0 bytes||2GB||upto 2GB|
|image||0 bytes||2GB||upto 2GB|
|float(n)||24 only||53 only||4 - 8 bytes|
|time||N/A||N/A||3 - 5 bytes|