I am redesigning a customer database and one of the new pieces of information I would like to store along with the standard address fields (Street, City, etc.) is the geographic location of the address. The only use case I have in mind is to allow users to map the coordinates on Google maps when the address cannot otherwise be found, which often happens when the area is newly developed, or is in a remote/rural location.
My first inclination was to store latitude and longitude as decimal values, but then I remembered that SQL Server 2008 R2 has a
geography data type. I have absolutely no experience using
geography, and from my initial research, it looks to be overkill for my scenario.
For example, to work with latitude and longitude stored as
decimal(7,4), I can do this:
insert into Geotest(Latitude, Longitude) values (47.6475, -122.1393) select Latitude, Longitude from Geotest
geography, I would do this:
insert into Geotest(Geolocation) values (geography::Point(47.6475, -122.1393, 4326)) select Geolocation.Lat, Geolocation.Long from Geotest
Although it's not that much more complicated, why add complexity if I don't have to?
Before I abandon the idea of using
geography, is there anything I should consider? Would it be faster to search for a location using a spatial index vs. indexing the Latitude and Longitude fields? Are there advantages to using
geography that I am not aware of? Or, on the flip side, are there caveats that I should know about which would discourage me from using
@Erik Philips brought up the ability to do proximity searches with
geography, which is very cool.
On the other hand, a quick test is showing that a simple
select to get the latitude and longitude is significantly slower when using
geography (details below). , and a comment on the accepted answer to another SO question on
geography has me leery:
@SaphuA You're welcome. As a sidenote be VERY carefull of using a
spatial index on a nullable GEOGRAPHY datatype column. There are some
serious performance issue, so make that GEOGRAPHY column non-nullable
even if you have to remodel your schema. – Tomas Jun 18 at 11:18
All in all, weighing the likelihood of doing proximity searches vs. the trade-off in performance and complexity, I've decided to forgo the use of
geography in this case.
Details of the test I ran:
I created two tables, one using
geography and another using
decimal(9,6) for latitude and longitude:
CREATE TABLE [dbo].[GeographyTest] ( [RowId] [int] IDENTITY(1,1) NOT NULL, [Location] [geography] NOT NULL, CONSTRAINT [PK_GeographyTest] PRIMARY KEY CLUSTERED ( [RowId] ASC ) ) CREATE TABLE [dbo].[LatLongTest] ( [RowId] [int] IDENTITY(1,1) NOT NULL, [Latitude] [decimal](9, 6) NULL, [Longitude] [decimal](9, 6) NULL, CONSTRAINT [PK_LatLongTest] PRIMARY KEY CLUSTERED ([RowId] ASC) )
and inserted a single row using the same latitude and longitude values into each table:
insert into GeographyTest(Location) values (geography::Point(47.6475, -122.1393, 4326)) insert into LatLongTest(Latitude, Longitude) values (47.6475, -122.1393)
Finally, running the following code shows that, on my machine, selecting the latitude and longitude is approximately 5 times slower when using
declare @lat float, @long float, @d datetime2, @repCount int, @trialCount int, @geographyDuration int, @latlongDuration int, @trials int = 3, @reps int = 100000 create table #results ( GeographyDuration int, LatLongDuration int ) set @trialCount = 0 while @trialCount < @trials begin set @repCount = 0 set @d = sysdatetime() while @repCount < @reps begin select @lat = Location.Lat, @long = Location.Long from GeographyTest where RowId = 1 set @repCount = @repCount + 1 end set @geographyDuration = datediff(ms, @d, sysdatetime()) set @repCount = 0 set @d = sysdatetime() while @repCount < @reps begin select @lat = Latitude, @long = Longitude from LatLongTest where RowId = 1 set @repCount = @repCount + 1 end set @latlongDuration = datediff(ms, @d, sysdatetime()) insert into #results values(@geographyDuration, @latlongDuration) set @trialCount = @trialCount + 1 end select * from #results select avg(GeographyDuration) as AvgGeographyDuration, avg(LatLongDuration) as AvgLatLongDuration from #results drop table #results
GeographyDuration LatLongDuration ----------------- --------------- 5146 1020 5143 1016 5169 1030 AvgGeographyDuration AvgLatLongDuration -------------------- ------------------ 5152 1022
What was more surprising is that even when no rows are selected, for example selecting where
RowId = 2, which doesn't exist,
geography was still slower:
GeographyDuration LatLongDuration ----------------- --------------- 1607 948 1610 946 1607 947 AvgGeographyDuration AvgLatLongDuration -------------------- ------------------ 1608 947