Techniques for unique, correct and fast geo-queries II
You may remember my last post on this topic ended with a question. Just to recap, we have a table with a lot of rows that have geographical coordinates and we want to find a random subset that lies in a given map window. The problem with the query demonstrated last was that there are rows with the exact same coordinates and for those coordinates the query would always return the same row (namely the one with the highest intid
).
I posted a question on Stackoverflow and Tom H. came up with a solution that was 90% there. The full query looks like this:
with cte as ( select intid, row_number() over ( partition by geoLat, geoLng order by newid() ) as row_num, count(intid) over (partition by geoLat, geoLng) as TotalCount from documents d where d.geoLat < @maxLat and d.geoLat > @minLat and ( ((@maxLng > @minLng) and (d.geoLng < @maxLng and d.geoLng > @minLng)) or ((@maxLng < @minLng) and ((d.geoLng > @minLng and d.geoLng < 180) or (d.geoLng > -180 and d.geoLng < @maxLng)) ) ) ) select top (@maxcount) dd.*, cte.intid, rand(cte.intid) from cte,documents dd where row_num = 1 + floor(rand() * TotalCount) and cte.intid = dd.intid order by newid()
The query uses Common Table Expressions, which I’ve dabbled in before. Looking at the execution plan makes my head hurt, but it’s at least as fast as the original version. See the new version in action at the TEDBot site.
Leave a Reply