Geospatial Structures
Three approaches to index location data for find things near me queries -- each with different trade-offs.
Three approaches to index location data for “find things near me” queries — each with different trade-offs.
Geohash
- Encodes lat/lng as a base-32 string — nearby locations share a common prefix
- Range query = find all geohashes with same N-character prefix + 8 neighbors
Quadtree
- Recursively divide space into 4 quadrants
- Only subdivide when a quadrant has too many points (adaptive)
Comparison Table
| Geohash | Quadtree | R-Tree | |
|---|---|---|---|
| Index type | String prefix | Tree | Tree of rectangles |
| Edge handling | ❌ Poor (neighbor lookup needed) | ✅ Good | ✅ Good |
| Shard by region | ✅ Easy | 🔶 Medium | 🔶 Complex |
| Range queries | 🔶 OK | ✅ Good | ✅ Best |
| Dynamic updates | ✅ Easy | ✅ OK | 🔶 Rebalancing needed |
| Used in | Uber, Redis GEO | Google Maps, games | PostGIS, SQLite |
When to Use ✅ / Avoid ❌
✅ Geohash — simple “nearby” queries, distributed caching of location data ✅ Quadtree — adaptive density, game worlds, map rendering ✅ R-Tree — complex polygon/range queries, GIS systems ❌ Any of these — if you just need city-level lookup (a country→city hash map is enough)
Real Systems
| System | How Used |
| ———————- | ——————————————- |
| Uber | Driver-rider matching by geohash cell |
| Yelp / Google Maps | “Near me” POI search |
| Redis | GEOADD / GEORADIUS (geohash internally) |
| PostGIS | R-Tree spatial indexing |
| Pokémon GO | Quadtree for spawn zones |
Implementation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
def geohash_encode(lat, lon, precision=6):
"""
Encode latitude and longitude to geohash string.
Uses bit interleaving: lon bit, lat bit, lon bit, lat bit, ...
Then groups 5 bits and maps to base32 characters.
Args:
lat: latitude (-90 to 90)
lon: longitude (-180 to 180)
precision: number of characters (higher = more precise)
Returns:
geohash string
"""
lat_range = [-90.0, 90.0]
lon_range = [-180.0, 180.0]
geohash = []
bits = 0
bit = 0
ch = 0
# Generate bits by alternating lon/lat bisection
while len(geohash) < precision:
# Longitude bit
mid = (lon_range[0] + lon_range[1]) / 2
if lon > mid:
ch |= (1 << (4 - bit)) # Set bit in current character
lon_range[0] = mid
else:
lon_range[1] = mid
bit += 1
# Latitude bit
mid = (lat_range[0] + lat_range[1]) / 2
if lat > mid:
ch |= (1 << (4 - bit))
lat_range[0] = mid
else:
lat_range[1] = mid
bit += 1
# After 5 bits, convert to base32 character
if bit == 5:
geohash.append("0123456789bcdefghjkmnpqrstuvwxyz"[ch])
bit = 0
ch = 0
return "".join(geohash)
def geohash_decode(geohash):
"""
Decode a geohash string back to lat/lon with error margins.
Returns:
(lat, lon, lat_err, lon_err)
"""
lat_range = [-90.0, 90.0]
lon_range = [-180.0, 180.0]
is_lon = True # Start with longitude bit
for char in geohash:
# Convert base32 character to 5-bit integer
idx = "0123456789bcdefghjkmnpqrstuvwxyz".index(char)
# Process each of the 5 bits
for i in range(4, -1, -1):
bit = (idx >> i) & 1
if is_lon:
mid = (lon_range[0] + lon_range[1]) / 2
if bit:
lon_range[0] = mid
else:
lon_range[1] = mid
else:
mid = (lat_range[0] + lat_range[1]) / 2
if bit:
lat_range[0] = mid
else:
lat_range[1] = mid
is_lon = not is_lon # Alternate between lon and lat
lat = (lat_range[0] + lat_range[1]) / 2
lon = (lon_range[0] + lon_range[1]) / 2
lat_err = (lat_range[1] - lat_range[0]) / 2
lon_err = (lon_range[1] - lon_range[0]) / 2
return (lat, lon, lat_err, lon_err)
def get_neighbors(geohash):
"""
Return all 8 neighboring geohash cells (N, NE, E, SE, S, SW, W, NW).
Neighbor lookup tables based on geohash cell positioning.
"""
neighbors = {
'right': {
'even': "bc01fg45238967deuvhjyznpkmstqrwx",
'odd': "p0r21436x8zb9dcf5h7kjnmqesgutwvy"
},
'left': {
'even': "238967debc01fg45kmstqrwxuvhjyznp",
'odd': "14365h7k9dcfesgujnmqp0r2twvyx8zb"
},
'top': {
'even': "p0r21436x8zb9dcf5h7kjnmqesgutwvy",
'odd': "bc01fg45238967deuvhjyznpkmstqrwx"
},
'bottom': {
'even': "14365h7k9dcfesgujnmqp0r2twvyx8zb",
'odd': "238967debc01fg45kmstqrwxuvhjyznp"
}
}
border = {
'right': {'even': "bcfguvyz", 'odd': "prxz"},
'left': {'even': "0145hjnp", 'odd': "028b"},
'top': {'even': "prxz", 'odd': "bcfguvyz"},
'bottom': {'even': "028b", 'odd': "0145hjnp"}
}
last_char = geohash[-1]
parent = geohash[:-1]
parity = 'odd' if len(geohash) % 2 else 'even'
result = {}
# North, South, East, West
for direction in ['right', 'left', 'top', 'bottom']:
if last_char in border[direction][parity]:
result[direction] = neighbors[direction][parity][ord(last_char) - ord('0')] + parent
else:
result[direction] = neighbors[direction][parity][ord(last_char) - ord('0')] + parent
# Diagonals (NE, NW, SE, SW)
result['ne'] = get_neighbors(result['top'])['right']
result['nw'] = get_neighbors(result['top'])['left']
result['se'] = get_neighbors(result['bottom'])['right']
result['sw'] = get_neighbors(result['bottom'])['left']
return result
# Example usage:
if __name__ == "__main__":
# Encode Eiffel Tower location
lat, lon = 48.858, 2.294
geohash = geohash_encode(lat, lon, precision=7)
print(f"Geohash: {geohash}") # e.g., "u09t..."
# Decode back
decoded_lat, decoded_lon, lat_err, lon_err = geohash_decode(geohash)
print(f"Decoded: {decoded_lat:.3f}, {decoded_lon:.3f} ± ({lat_err:.3f}, {lon_err:.3f})")
Algorithm Flow
GEOHASH ENCODE (lat=48.858, lon=2.294, precision=4 chars):
PROXIMITY QUERY with geohash cells:
Uber (H3 Hexagonal Grid)
Uber moved from geohash to their open-source H3 system (hexagonal hierarchical grid). Hexagons have uniform neighbor distance (all 6 neighbors equidistant), unlike geohash rectangles that vary in shape. H3 has 16 resolutions; Uber uses resolution 7 (~5.16 km²) for surge pricing zones and resolution 9 (~0.1 km²) for driver dispatch. Each cell has a 64-bit ID encodable as a string.
Redis GEOADD / GEOSEARCH
Redis stores geo-points in a sorted set using a 52-bit geohash as the score. GEOSEARCH key FROMMEMBER point BYRADIUS 1 km ASC uses ZRANGEBYSCORE on the geohash range, then filters by actual Haversine distance. Precision: 52-bit geohash gives ~0.6m accuracy. Pure Redis, no PostGIS dependency needed.
PostGIS (R-Tree via GiST index)
PostGIS uses an R-Tree index (implemented via PostgreSQL’s GiST framework) for geometry columns. ST_DWithin(geom, ST_MakePoint(lon,lat)::geography, 1000) uses the index to find candidate geometries, then applies exact distance calculation. Handles polygons, lines, and points — not just points like geohash.
Google S2 Geometry Library
S2 projects Earth onto 6 cube faces, then uses a Hilbert space-filling curve to map 2D to 1D. Results in cells with more uniform area than geohash and better edge properties. Used internally at Google Maps, Google Earth, and by Pokémon GO for spawn point indexing. Each S2 cell has a 64-bit ID at 30 levels of precision.
Elasticsearch geo_point / geo_shape
geo_point fields use a BKD tree (k-d tree variant) for range queries and a geohash grid for aggregations (geohash_grid aggregation). geo_shape for polygon intersection uses a PrefixTree (quad-tree or geohash grid) tiled at configurable precision.
References
- 📄 Guttman 1984 — R-Trees: A Dynamic Index Structure for Spatial Searching — the original R-Tree paper
- 🎥 ByteByteGo — Design a Location-Based Service — geohash and quadtree in system design
- 🔗 Niemeyer’s Geohash Reference Implementation — original geohash site + interactive map
- 🔗 Uber H3 Documentation — hexagonal grid system with code examples
- 🔗 Google S2 Geometry Overview — S2 cell system used in Google Maps
- 📖 Wikipedia: Geohash — encoding details and precision table