Post

Geospatial Structures

Three approaches to index location data for find things near me queries -- each with different trade-offs.

Geospatial Structures

Three approaches to index location data for “find things near me” queries — each with different trade-offs.

Geohash

  • Encodes lat/lng as a base-32 string — nearby locations share a common prefix
  • Range query = find all geohashes with same N-character prefix + 8 neighbors

Quadtree

  • Recursively divide space into 4 quadrants
  • Only subdivide when a quadrant has too many points (adaptive)

Comparison Table

  Geohash Quadtree R-Tree
Index type String prefix Tree Tree of rectangles
Edge handling ❌ Poor (neighbor lookup needed) ✅ Good ✅ Good
Shard by region ✅ Easy 🔶 Medium 🔶 Complex
Range queries 🔶 OK ✅ Good ✅ Best
Dynamic updates ✅ Easy ✅ OK 🔶 Rebalancing needed
Used in Uber, Redis GEO Google Maps, games PostGIS, SQLite

When to Use ✅ / Avoid ❌

✅ Geohash — simple “nearby” queries, distributed caching of location data ✅ Quadtree — adaptive density, game worlds, map rendering ✅ R-Tree — complex polygon/range queries, GIS systems ❌ Any of these — if you just need city-level lookup (a country→city hash map is enough)

Real Systems

| System | How Used | | ———————- | ——————————————- | | Uber | Driver-rider matching by geohash cell | | Yelp / Google Maps | “Near me” POI search | | Redis | GEOADD / GEORADIUS (geohash internally) | | PostGIS | R-Tree spatial indexing | | Pokémon GO | Quadtree for spawn zones |

Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
def geohash_encode(lat, lon, precision=6):
    """
    Encode latitude and longitude to geohash string.
    Uses bit interleaving: lon bit, lat bit, lon bit, lat bit, ...
    Then groups 5 bits and maps to base32 characters.

    Args:
        lat: latitude (-90 to 90)
        lon: longitude (-180 to 180)
        precision: number of characters (higher = more precise)

    Returns:
        geohash string
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    geohash = []
    bits = 0
    bit = 0
    ch = 0

    # Generate bits by alternating lon/lat bisection
    while len(geohash) < precision:
        # Longitude bit
        mid = (lon_range[0] + lon_range[1]) / 2
        if lon > mid:
            ch |= (1 << (4 - bit))  # Set bit in current character
            lon_range[0] = mid
        else:
            lon_range[1] = mid
        bit += 1

        # Latitude bit
        mid = (lat_range[0] + lat_range[1]) / 2
        if lat > mid:
            ch |= (1 << (4 - bit))
            lat_range[0] = mid
        else:
            lat_range[1] = mid
        bit += 1

        # After 5 bits, convert to base32 character
        if bit == 5:
            geohash.append("0123456789bcdefghjkmnpqrstuvwxyz"[ch])
            bit = 0
            ch = 0

    return "".join(geohash)

def geohash_decode(geohash):
    """
    Decode a geohash string back to lat/lon with error margins.

    Returns:
        (lat, lon, lat_err, lon_err)
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon = True  # Start with longitude bit

    for char in geohash:
        # Convert base32 character to 5-bit integer
        idx = "0123456789bcdefghjkmnpqrstuvwxyz".index(char)

        # Process each of the 5 bits
        for i in range(4, -1, -1):
            bit = (idx >> i) & 1

            if is_lon:
                mid = (lon_range[0] + lon_range[1]) / 2
                if bit:
                    lon_range[0] = mid
                else:
                    lon_range[1] = mid
            else:
                mid = (lat_range[0] + lat_range[1]) / 2
                if bit:
                    lat_range[0] = mid
                else:
                    lat_range[1] = mid

            is_lon = not is_lon  # Alternate between lon and lat

    lat = (lat_range[0] + lat_range[1]) / 2
    lon = (lon_range[0] + lon_range[1]) / 2
    lat_err = (lat_range[1] - lat_range[0]) / 2
    lon_err = (lon_range[1] - lon_range[0]) / 2

    return (lat, lon, lat_err, lon_err)

def get_neighbors(geohash):
    """
    Return all 8 neighboring geohash cells (N, NE, E, SE, S, SW, W, NW).
    Neighbor lookup tables based on geohash cell positioning.
    """
    neighbors = {
        'right': {
            'even': "bc01fg45238967deuvhjyznpkmstqrwx",
            'odd': "p0r21436x8zb9dcf5h7kjnmqesgutwvy"
        },
        'left': {
            'even': "238967debc01fg45kmstqrwxuvhjyznp",
            'odd': "14365h7k9dcfesgujnmqp0r2twvyx8zb"
        },
        'top': {
            'even': "p0r21436x8zb9dcf5h7kjnmqesgutwvy",
            'odd': "bc01fg45238967deuvhjyznpkmstqrwx"
        },
        'bottom': {
            'even': "14365h7k9dcfesgujnmqp0r2twvyx8zb",
            'odd': "238967debc01fg45kmstqrwxuvhjyznp"
        }
    }

    border = {
        'right': {'even': "bcfguvyz", 'odd': "prxz"},
        'left': {'even': "0145hjnp", 'odd': "028b"},
        'top': {'even': "prxz", 'odd': "bcfguvyz"},
        'bottom': {'even': "028b", 'odd': "0145hjnp"}
    }

    last_char = geohash[-1]
    parent = geohash[:-1]
    parity = 'odd' if len(geohash) % 2 else 'even'

    result = {}

    # North, South, East, West
    for direction in ['right', 'left', 'top', 'bottom']:
        if last_char in border[direction][parity]:
            result[direction] = neighbors[direction][parity][ord(last_char) - ord('0')] + parent
        else:
            result[direction] = neighbors[direction][parity][ord(last_char) - ord('0')] + parent

    # Diagonals (NE, NW, SE, SW)
    result['ne'] = get_neighbors(result['top'])['right']
    result['nw'] = get_neighbors(result['top'])['left']
    result['se'] = get_neighbors(result['bottom'])['right']
    result['sw'] = get_neighbors(result['bottom'])['left']

    return result

# Example usage:
if __name__ == "__main__":
    # Encode Eiffel Tower location
    lat, lon = 48.858, 2.294
    geohash = geohash_encode(lat, lon, precision=7)
    print(f"Geohash: {geohash}")  # e.g., "u09t..."

    # Decode back
    decoded_lat, decoded_lon, lat_err, lon_err = geohash_decode(geohash)
    print(f"Decoded: {decoded_lat:.3f}, {decoded_lon:.3f} ± ({lat_err:.3f}, {lon_err:.3f})")

Algorithm Flow

GEOHASH ENCODE (lat=48.858, lon=2.294, precision=4 chars):

PROXIMITY QUERY with geohash cells:

Uber (H3 Hexagonal Grid)

Uber moved from geohash to their open-source H3 system (hexagonal hierarchical grid). Hexagons have uniform neighbor distance (all 6 neighbors equidistant), unlike geohash rectangles that vary in shape. H3 has 16 resolutions; Uber uses resolution 7 (~5.16 km²) for surge pricing zones and resolution 9 (~0.1 km²) for driver dispatch. Each cell has a 64-bit ID encodable as a string.

Redis GEOADD / GEOSEARCH

Redis stores geo-points in a sorted set using a 52-bit geohash as the score. GEOSEARCH key FROMMEMBER point BYRADIUS 1 km ASC uses ZRANGEBYSCORE on the geohash range, then filters by actual Haversine distance. Precision: 52-bit geohash gives ~0.6m accuracy. Pure Redis, no PostGIS dependency needed.

PostGIS (R-Tree via GiST index)

PostGIS uses an R-Tree index (implemented via PostgreSQL’s GiST framework) for geometry columns. ST_DWithin(geom, ST_MakePoint(lon,lat)::geography, 1000) uses the index to find candidate geometries, then applies exact distance calculation. Handles polygons, lines, and points — not just points like geohash.

Google S2 Geometry Library

S2 projects Earth onto 6 cube faces, then uses a Hilbert space-filling curve to map 2D to 1D. Results in cells with more uniform area than geohash and better edge properties. Used internally at Google Maps, Google Earth, and by Pokémon GO for spawn point indexing. Each S2 cell has a 64-bit ID at 30 levels of precision.

Elasticsearch geo_point / geo_shape

geo_point fields use a BKD tree (k-d tree variant) for range queries and a geohash grid for aggregations (geohash_grid aggregation). geo_shape for polygon intersection uses a PrefixTree (quad-tree or geohash grid) tiled at configurable precision.

References

This post is licensed under CC BY 4.0 by the author.