Have you heard and understand why processing a ‘select * from any table’ can be negative on the performance impact? While I was doing some demo prep for my upcoming PASS Summit Pre-Conference I saw this behavior at its worst.
Before we jump into why this is such a negative impact on your performance I should tell you about my test environment. I have a table with 4.6 million records in it. The table has the following structure:
CREATE TABLE [dbo].[Crimes](
[ID] [varchar](500) NULL,
[CaseNumber] [varchar](500) NULL,
[Date] [varchar](500) NULL,
[Block] [varchar](500) NULL,
[IUCR] [varchar](500) NULL,
[PrimaryType] [varchar](500) NULL,
[Description] [varchar](1000) NULL,
[LocationDescription] [varchar](500) NULL,
[Arrest] [varchar](500) NULL,
[Domestic] [varchar](500) NULL,
[Beat] [varchar](500) NULL,
[District] [varchar](500) NULL,
[Ward] [varchar](500) NULL,
[CommunityArea] [varchar](500) NULL,
[FBICode] [varchar](500) NULL,
[XCoordinate] [varchar](500) NULL,
[YCoordinate] [varchar](500) NULL,
[Year] [varchar](500) NULL,
[UpdatedOn] [varchar](500) NULL,
[Latitude] [varchar](500) NULL,
[Longitude] [varchar](500) NULL,
[Location] [varchar](500) NULL,
[BeatInt] [int] NULL
) ON [PRIMARY]
(One thing that you may have already noticed is all the varchar columns. The reason all the data types are variable characters is due to the poor source file I was working from. Data that I would expect to be all numeric had some text in it.)
I then added an index to improve query performance.
Create Index beat on crimes (beat)
If I want to get all the crimes in a beat, I can do this in a couple ways. First I can try the Select * method. The ‘*’ indicates I want all the columns returned.
Select * from Crimes
where beat like '1533'
My test execution took 17 seconds. When I changed the query to return only the specific columns that I needed by replacing the * with the specific column names:
Select PrimaryType,Description from Crimes
where beat like '1533'
The results were returned in 11 seconds rather than the 17 seconds (36% improvement on performance). One reason for this is simply the amount of data that needs to be read from storage is trimmed down. Both queries returned approximately 27 thousand rows.
Let us consider one more test. This time we will explicitly list out each of the columns in our select clause. This is more or less doing the exact same thing as Select *.
Select
[ID] ,[CaseNumber],[Date] ,[Block] ,[IUCR] ,[PrimaryType] ,[Description],[LocationDescription] ,[Arrest] ,
[Domestic] ,[Beat] ,[District] ,[Ward] ,
[CommunityArea] ,[FBICode] ,[XCoordinate] ,[YCoordinate] ,[Year] ,
[UpdatedOn] ,[Latitude] ,[Longitude] ,[Location] ,[BeatInt]
from Crimes
where beat like '1533'
When each of the columns are listed out it takes just as long as it does if I were to do a Select *. Even though two queries are comparable, I prefer to see the columns listed out. One of the reasons has to do with future compatibility, if sometime down the road a column gets added to the table then I don’t have to worry if I am sending more data to an application front end then in needs.
The cause of the performance increase is simple, there is less data to read from the storage and less data that needs to be transferred to the client. When retrieving data from SQL Server use the query to limit your results to just the data you need and the overall performance should be better than if you were to bring back all the data.
These examples have drastic differences. It is easy to see how when you are looking at return row set of 27,000 records how performance can be improved. Other areas you will see improvement in performance by removing the ‘Select *’, (that we have not talked about in this post) include:
- Less data to be transferred over internal networks.
- Less data to load into applications
- Less data to send over the internet
This is one of many small adjustments which can make a big difference in how you’re SQL Server Performs.