This is a continuation of our Loading OpenStreetMap data in PostGIS tutorial. In this tutorial we will build a tile cache of the Massachusetts data we loaded in Part 1 and then render it in OpenLayers. We will be using the Mapnik toolkit to do this.
In the PostgreSQL database, we had to add these two columns because the Mapnik scripts were looking for them and they did not exist in our dump:
ALTER TABLE planet_osm_polygon ADD COLUMN "addr:housename" text;
ALTER TABLE planet_osm_point ADD COLUMN "addr:housename" text;
The Mapnik scripts use deprecated PostGIS functions such as Extent, AsBinary, and SetSRID, which were removed in PostGIS 2.0. If you are testing with PostGIS 2.0, you will sadly need to run legacy.sql for now to get those functions back. legacy.sql is installed in the share/contrib/postgis-2.0 folder.
If you installed Mapnik with the HOTOSM package, add its Python site-packages folder to your PYTHONPATH:
;C:\Program Files\HOTOSM\python\2.5\site-packages
and then verify the install by running the bundled demo:
cd C:\Program Files\HOTOSM\demo\python
c:\python26\python rundemo.py
You may also want to take a look at http://weait.com/content/build-your-own-openstreetmap-server if some of what we say doesn't make sense to you.
Once you have Mapnik installed, you want to download the Python scripts for generating OpenStreetMap tiles, which you can get from the OpenStreetMap Subversion repository at http://svn.openstreetmap.org/applications/rendering/mapnik/, and then copy them into a folder, say osm_mapnik.
For ease of use, we have exported these out of the Subversion repository and made the latest available (as of May 18th, 2011) at http://www.bostongis.com/downloads/osm_mapnik_r26011.zip
Note: If you have an svn client you can download the folder yourself with the command:
svn co http://svn.openstreetmap.org/applications/rendering/mapnik/
cd osm_mapnik   #this is the folder you extracted the scripts to
wget http://tile.openstreetmap.org/world_boundaries-spherical.tgz
tar xvzf world_boundaries-spherical.tgz
wget http://tile.openstreetmap.org/processed_p.tar.bz2
tar xvjf processed_p.tar.bz2 -C world_boundaries
wget http://tile.openstreetmap.org/shoreline_300.tar.bz2
tar xjf shoreline_300.tar.bz2 -C world_boundaries
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/110m-admin-0-boundary-lines.zip
unzip 110m-admin-0-boundary-lines.zip -d world_boundaries
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/10m-populated-places.zip
unzip 10m-populated-places.zip -d world_boundaries
Before we begin, we need to figure out the extent of our tiles. We do that with the SQL statement below. (Sadly, there is data in the OSM MA dataset that far exceeds the extent of Massachusetts, so I opted to use a US states table I had lying around instead of taking ST_Extent of any of the OSM tables.)
SELECT ST_Extent(the_geom) As wgs_84,
    ST_Extent(ST_Transform(the_geom, 900913)) As web_merc
FROM tiger.states
WHERE stusps = 'MA';
BOX(-73.508142 41.187053,-69.858861 42.88679)                                 --wgs_84
BOX(-8182888.93659964 5011826.70961612,-7776652.83391808 5265667.90278576)   --web_merc
A little bit about the osm.xml. This file contains all the styling rules for generating the tiles and serves as a template for building your final styling .xml. It defines the icons to use, how to draw the lines, and so forth. If you are not happy with the default Mapnik styling and want to highlight certain features differently, you can manually edit this XML file and resave it under, say, osm_custom.xml before you run generate_xml.py, and use that wherever we use osm.xml.
Note: If you are on Windows and want to put the files on a drive other than C, do a mkdir D:/osm_tiles or whatever folder you want.
mkdir /osm_tiles
generate_xml.py osm.xml /osm_tiles/ma_osm.xml --dbname osm --user postgres --port 5432 --extent -8182889,5011827,-7776653,5265668 --accept-none
Next we edit the generate_image.py file and change the ll to -73.508142,41.187053,-69.858861,42.88679, and then at the command line run:
generate_image.py
Confirm that the generated image.png looks like Massachusetts. It will be fairly high resolution (around 8 MB) and look something like this:
Next resave generate_tiles.py as generate_tiles_ma.py. Edit generate_tiles_ma.py and, in the render_tiles section way at the bottom, change it to:
bbox = (-73.508142,41.187053,-69.858861,42.88679)
render_tiles(bbox, mapfile, tile_dir, 6, 8, "Massachusetts")

minZoom = 10
maxZoom = 17
bbox = (-73.508142,41.187053,-69.858861,42.88679)
render_tiles(bbox, mapfile, tile_dir, minZoom, maxZoom)
set MAPNIK_MAP_FILE=/osm_tiles/ma_osm.xml
set MAPNIK_TILE_DIR=/osm_tiles
set HOME=/osm_tiles
generate_tiles_ma.py
If you just want to generate tiles for Boston:
Resave generate_tiles.py as generate_tiles_boston.py and change the tiles section to:
bbox = (-71.2,42.23,-70.99, 42.4)
render_tiles(bbox, mapfile, tile_dir, 10, 17, "Boston")
set MAPNIK_MAP_FILE=/osm_boston_tiles/ma_osm.xml
set MAPNIK_TILE_DIR=/osm_boston_tiles
set HOME=/osm_boston_tiles
generate_tiles_boston.py
In Part 3 of our series, we will show you how to use your own custom tiles in OpenLayers.
For this exercise, we will download Massachusetts OSM data and then load it into our PostGIS spatially enabled PostgreSQL database. The OSM data contains roads, points of interest, building footprints, administrative boundaries, addresses, and too many other things to itemize. Note that much of the data provided in OSM for Massachusetts is provided by our very own Massachusetts Office of Geographic Information (MassGIS) as well as contributions from people like you. Now on with the show.
Since we will be loading with the --hstore switch, first enable the hstore extension in your osm database:
CREATE EXTENSION hstore;
In order to load OpenStreetMap .OSM XML files, you will need osm2pgsql which you can find out more about at http://wiki.openstreetmap.org/wiki/Osm2pgsql. There are compiled binaries available for many Linux variants, Windows, and Mac OSX.
If you are on Windows, go here: http://wiki.openstreetmap.org/wiki/Osm2pgsql#Windows_XP. If you plan to build map tiles with the data later, we recommend the HOTOSM package, which installs osm2pgsql as well as Mapnik and Osmosis. These will be useful for generating tiles.
Note: If you plan to set up a mapping tile server with OSM data later, check out Dane Springmeyer's Mapnik tutorials: http://www.dbsgeo.com/.
Once you have downloaded the Massachusetts extract, load it with a command along these lines:
osm2pgsql massachusetts.osm.bz2 -d osm -U postgres -P 5432 -S default.style --hstore
If you want to load additional states, use the --append switch. So, for example, if I wanted to load neighboring states like New Hampshire, I would download New Hampshire and then follow with this command:
osm2pgsql new_hampshire.osm.bz2 --append -d osm -U postgres -P 5432 -S default.style --hstore
The osm2pgsql load will output something like this:

osm2pgsql SVN version 0.69-21289M

Using projection SRS 900913 (Spherical Mercator)
Setting up table: planet_osm_point
NOTICE:  table "planet_osm_point" does not exist, skipping
NOTICE:  table "planet_osm_point_tmp" does not exist, skipping
Setting up table: planet_osm_line
NOTICE:  table "planet_osm_line" does not exist, skipping
NOTICE:  table "planet_osm_line_tmp" does not exist, skipping
Setting up table: planet_osm_polygon
NOTICE:  table "planet_osm_polygon" does not exist, skipping
NOTICE:  table "planet_osm_polygon_tmp" does not exist, skipping
Setting up table: planet_osm_roads
NOTICE:  table "planet_osm_roads" does not exist, skipping
NOTICE:  table "planet_osm_roads_tmp" does not exist, skipping
Mid: Ram, scale=100

!! You are running this on 32bit system, so at most
!! 3GB of RAM can be used. If you encounter unexpected
!! exceptions during import, you should try running in slim
!! mode using parameter -s.

Reading in file: massachusetts.osm
Processing: Node(10082k) Way(621k) Relation(2k)
Node stats: total(10082538), max(1202366398)
Way stats: total(621446), max(104206285)
Relation stats: total(2846), max(1463423)
Writing way(621k)
Writing rel(2k)
Committing transaction for planet_osm_point
Sorting data and creating indexes for planet_osm_point
Completed planet_osm_point
Committing transaction for planet_osm_line
Sorting data and creating indexes for planet_osm_line
Completed planet_osm_line
Committing transaction for planet_osm_polygon
Sorting data and creating indexes for planet_osm_polygon
Completed planet_osm_polygon
Committing transaction for planet_osm_roads
Sorting data and creating indexes for planet_osm_roads
Completed planet_osm_roads
If your data loaded, you should see new tables (planet_osm_point, planet_osm_line, planet_osm_polygon, and planet_osm_roads), all with a column called way that holds the PostGIS geometry and another column called tags that holds the hstore key/value pairs.
The way column holds the PostGIS geometry in the spherical web Mercator projection (or a different projection if you used the reprojection switch). NOTE that while spherical Mercator is fine for web map display, it is terrible for measuring distances, areas, or anything else that has to do with measurement.
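To illustrate the measurement point, here is a minimal sketch (assuming the planet_osm_line table produced by the load above; the highway filter and LIMIT are arbitrary). Lengths taken directly from the 900913 geometry are inflated at Massachusetts latitudes, so transform to a planar SRS in meters, such as Massachusetts State Plane (SRID 26986), when you need real distances.

-- Compare the misleading Mercator "meters" with true meters
SELECT osm_id, name,
    ST_Length(way) As len_mercator_units,                        -- inflated pseudo-meters
    ST_Length(ST_Transform(way, 26986)) As len_meters_stateplane -- true meters (MA State Plane)
FROM planet_osm_line
WHERE highway = 'motorway'
LIMIT 5;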
Some data is available only in the hstore tags column and not in any of the regular columns, though the more commonly used tags you will also find as dedicated columns in the data. With that said, we will index our hstore tags columns with these SQL commands:
CREATE INDEX idx_planet_osm_point_tags ON planet_osm_point USING gist(tags);
CREATE INDEX idx_planet_osm_polygon_tags ON planet_osm_polygon USING gist(tags);
CREATE INDEX idx_planet_osm_line_tags ON planet_osm_line USING gist(tags);
Now for a simple query to pull all sushi places. Sadly it seems the sushi offering is not very complete:
SELECT name, ST_AsText(ST_Transform(way,4326)) As pt_lonlattext -- tags
FROM planet_osm_point
WHERE tags @> 'cuisine=>sushi'::hstore;

-- Result --
    name     |         pt_lonlattext
-------------+-------------------------------
 Moshi Moshi | POINT(-72.6285103 42.3202165)
 Mr Sushi    | POINT(-71.1553199 42.4162195)
A query like the following, which lists distinct amenities and their source URLs, can be written one of two ways. Using the hstore tags column (the second query) is faster since it is indexed.
SELECT DISTINCT amenity, tags->'source_url' As source
FROM planet_osm_point
WHERE amenity > ''
ORDER BY amenity;
SELECT DISTINCT tags->'amenity' As amenity, tags->'source_url' As source
FROM planet_osm_point
WHERE tags ? 'amenity'
ORDER BY tags->'amenity';

          amenity           |                source
----------------------------+---------------------------------------
 :
 bus_station                |
 cafe                       |
 campsite                   |
 :                          |
 cinema                     |
 City Hall                  | http://mass.gov/mgis/townhalls.htm
 Clinic                     |
 college                    |
 :
 library                    | http://mass.gov/mgis/libraries.htm
 :
OpenLayers is an open source client-side JavaScript/AJAX framework for overlaying various mapping services. It supports various mapping APIs such as Google, Yahoo, Microsoft Virtual Earth, OGC WMS, OGC WFS, ka-Map, text layers, and markers, to name a few. The nice thing about it being a pure client-side implementation is that you can drive it with any server language such as ASP.NET, PHP, or Perl, and for simple maps you can embed it directly in a plain HTML file. There are minimal requirements on the web server if you are using publicly available or subscription layers.
In the next couple of sections, we'll test drive OpenLayers by overlaying various layers from Boston.
We will be using the 2.2 API which you can download from http://openlayers.org/download/OpenLayers-2.2.zip
Copy the contents of http://www.bostongis.com/demos/OpenLayers/example1.htm to your blank file.
In the example above, which you can see from the link, we have drawn a Microsoft Virtual Earth map layer and 3 WMS layers: Boston Neighborhood Boundary, Boston Mainstreets Boundary, and Boston Water Taxi stops. Two of our WMS layers come from a UMN MapServer dishing out two layers at once, and our Water Taxi layer is being dished out by the MassGIS GeoServer - http://lyceum.massgis.state.ma.us/wiki/doku.php?id=history:home .
First, for any third-party mapping services you will be using, you need to include their libraries in addition to the OpenLayers library file, and these should be included before your OpenLayers include. The code to do that is on lines 11 and 12 of the htm file and looks like this:
11: <script src='http://dev.virtualearth.net/mapcontrol/v3/mapcontrol.js'></script> 12: <script src="ol22/OpenLayers.js"></script>
Alternatively, you can use the latest release OpenLayers.js directly from the OpenLayers site by replacing line 12 with:
12: <script src="http://openlayers.org/api/OpenLayers.js"></script>
Next we create a blank div with id=map that has the dimensions we want and position it where we want on the page. Our map will live there:
16: <div id="map" style="width: 400px; height: 400px"></div>
Next we write the JavaScript code to create the map, load it into our div, and add the layers:
18: var map = new OpenLayers.Map(document.getElementById("map"));
19: var dndwmsurl = "http://dndmaps.cityofboston.gov/mapserv/scripts/mapserv410/mapserv410.exe?map=\\mapserv\\dndwms\\dndbasepg.map&"
20:
21: map.addLayer(new OpenLayers.Layer.VirtualEarth("VE"));
22:
23: /**Note we don't have to specify an SRS, Service or Request for WMS layers below.
24: OpenLayers will ask for the projection based on our base layer, EPSG:4326, Service: WMS, Request: GetMap.
25: We chose image/gif because IE6 and below doesn't natively support transparency for png without a hack. **/
26: wmstaxi = new OpenLayers.Layer.WMS("MASSGIS Boston Taxi Stops", "http://giswebservices.massgis.state.ma.us/geoserver/wms",
27:     {layers: "massgis:GISDATA.WATERTAXISTOPS_PT", transparent: "true", format: "image/gif"},
28:     {tileSize: new OpenLayers.Size(400,400), buffer: 1 })
29: map.addLayer(wmstaxi)
30:
31: var wmsbos = new OpenLayers.Layer.WMS("Boston Neigborhoods and Mainstreets", dndwmsurl,
32:     {layers: "neighborhoods,mainstreets", transparent:"true", format: "image/gif"},
33:     {tileSize: new OpenLayers.Size(400,400), buffer: 1 });
34:
35: map.addLayer(wmsbos);
A couple of notes about the above code that may not be entirely obvious
The layer switcher is the little + sign on the right side that shows you the active layers and allows you to deactivate or reactivate a layer.
37: var boston = new OpenLayers.LonLat(-71.0891380310059,42.3123226165771);
38: map.setCenter(boston, 10);
39: map.addControl(new OpenLayers.Control.LayerSwitcher());
In this tutorial we will just show some example snippets of using OpenLayers that we have found most useful. We will be assuming OpenLayers 2.4 and above.
If you do not want a layer to display when the map first loads, set its visibility option to false when you define the layer:

26: wmstaxi = new OpenLayers.Layer.WMS("MASSGIS Boston Taxi Stops", "http://giswebservices.massgis.state.ma.us/geoserver/wms",
27:     {layers: "massgis:GISDATA.WATERTAXISTOPS_PT", transparent: "true", format: "image/gif"},
28:     {tileSize: new OpenLayers.Size(400,400), buffer: 1, visibility: false })
If you want to turn off a layer after initialization, use the layer's setVisibility method, for example mylayer.setVisibility(false). This is useful if, say, you do not want to use the built-in layer switcher but instead want sets of layers to display or hide based on other conditions.
Within an OpenLayers map, you can have only one layer selected as the base layer. This layer controls the projection of the other layers, among other things. In the layer switcher, you will see the base layer options marked at the top in a category section called Base Layer. Each option has a radio button instead of a checkbox to prevent you from selecting more than one. By default, the first base layer added to the map becomes the active base layer.
Certain layers have to be base layers because they are not transparent and there are other things you can't control about them, such as projection. For example, commercial layers such as Google, Yahoo, and Virtual Earth are simply assumed to be base layers, so you don't have to explicitly declare them as such, and in fact you can't force them not to be.
A WMS layer, on the other hand, can be used as either a base layer or an overlay layer. If you don't explicitly state that it is a base layer, it is assumed to be an overlay layer. So how do you force a layer to be a base layer, you ask? Simple: set the isBaseLayer property of the layer as shown below.
map.addLayer(new OpenLayers.Layer.WMS.Untiled("MassGIS Boundary",
    "http://giswebservices.massgis.state.ma.us/geoserver/wms",
    {'layers': "massgis:GISDATA.BOUNDARY_ARC", 'transparent': "true", 'format': "image/png"},
    {'isBaseLayer': true}));
There are currently two kinds of layers for which you have the option of tiling or not tiling: WMS layers and MapServer layers. Each is exposed as a separate layer class: WMS, WMS.Untiled, MapServer, MapServer.Untiled. In what circumstances would you want to tile, and in what circumstances would you not? To answer this question, it's useful to think about what happens when you tile.
When maps are tiled, the map viewport is broken out into several quadrants, which tries to approximate what you have set for tileSize but often falls short. So for each paint of the screen, multiple requests, sometimes on the order of 60 per layer, are made. You can imagine this can be a considerable strain on a weak server, and it is especially silly when what you are asking for is a light boundary file of, say, Boston. In the untiled case, only one request is made per layer, but that request is usually on the order of 4 times larger than the viewport to allow for smoother panning, at the cost of a slightly slower load time. Outlined below are the cases where we've found tiling or not tiling useful. Some are a matter of taste and some are a matter of practicality.
An example of a tiled layer is shown in our visibility discussion. An example of an untiled layer is shown in our Base Layer discussion.
In this third Part of our OpenStreetMap series we will demonstrate how to use the osm tiles we built in Part 2: Building Tiles with PostGIS OpenStreetMap data.
This first part is optional. Note that when you built the tiles, as you got to deeper and deeper levels there were a lot more tiles, and even worse, a lot of those damn tiles were empty. I personally hate carrying around thousands of files of nothingness. It takes a long time to upload nothing. So what I do first is delete those nothing tiles, of which there seem to be more than real tiles, before I upload them to my server. If you are on Windows, the easiest way to do that is with a command-line VBScript. Here is a little script to get rid of nothing. Before you do this, set aside one blank tile to use when you get a 404 request. This will only work on Windows; I haven't figured out the equivalent command for Linux yet. Simply save the contents of this code to a file called delete_osm_blank_tiles.vbs and edit the last line to point to your osm map tiles cache. Then run the script at a command line with cscript delete_osm_blank_tiles.vbs.
Sub RecursiveDeleteBlankOSMImages(ByVal strDirectory)
    Dim objFolder, objSubFolder, objFile
    Dim strExt
    Dim strExtensionsToDelete
    Dim lngSize
    lngSize = 104 ' The threshold in bytes (this is 104 bytes - anything less seems to be a blank image)
    strExtensionsToDelete = "png"
    Set objFolder = objFSO.GetFolder(strDirectory)
    For Each objFile in objFolder.Files
        For Each strExt in Split(Ucase(strExtensionsToDelete),",")
            If Right(Ucase(objFile.Path), Len(strExt)+1) = "." & strExt AND objFile.Size < lngSize Then
                wscript.echo "Deleting:" & objFile.Path
                objFile.Delete
            End If
        Next
    Next
    For Each objSubFolder in objFolder.SubFolders
        RecursiveDeleteBlankOSMImages objSubFolder.Path
    Next
End Sub

Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
RecursiveDeleteBlankOSMImages "C:\osm_matiles\"
Next we create a JavaScript file (saved as js/OpenStreetMapLocal.js, which the page below references) that defines a custom OpenLayers OSM layer class pointing at our local tile cache, along with a fallback image for missing tiles:

OpenLayers.Util.OSMLocal = {};

/**
 * Constant: MISSING_TILE_URL
 * {String} URL of image to display for missing tiles
 */
OpenLayers.Util.OSMLocal.MISSING_TILE_URL = "http://localhost/images/404.png";

/**
 * Function: onImageLoadError
 */
OpenLayers.Util.onImageLoadError = function() {
    this.src = OpenLayers.Util.OSMLocal.MISSING_TILE_URL;
};

/**
 * Class: OpenLayers.Layer.OSM.LocalMassachusettsMapnik
 *
 * Inherits from:
 *  - <OpenLayers.Layer.OSM>
 */
OpenLayers.Layer.OSM.LocalMassachusettsMapnik = OpenLayers.Class(OpenLayers.Layer.OSM, {
    /**
     * Constructor: OpenLayers.Layer.OSM.LocalMassachusettsMapnik
     *
     * Parameters:
     * name - {String}
     * options - {Object} Hashtable of extra options to tag onto the layer
     */
    initialize: function(name, options) {
        var url = [
            "http://localhost/osm_matiles/${z}/${x}/${y}.png",
            "http://localhost/osm_matiles/${z}/${x}/${y}.png",
            "http://localhost/osm_matiles/${z}/${x}/${y}.png"
        ];
        options = OpenLayers.Util.extend({
            numZoomLevels: 19,
            buffer: 0,
            transitionEffect: "resize"
        }, options);
        var newArguments = [name, url, options];
        OpenLayers.Layer.OSM.prototype.initialize.apply(this, newArguments);
    },

    CLASS_NAME: "OpenLayers.Layer.OSM.LocalMassachusettsMapnik"
});
Now you link in this file as you would the regular OpenStreetMap file and use this new class as you would any other OpenStreetMap OpenLayers layer class. You can even create versions with different themes if you want, by creating a different set of tiles and theming them differently. That's it, folks. Here is a complete version just containing Massachusetts and neighboring state tiles, with a complementary town boundary layer dished out by our local MassGIS government agency.
<html><title>OpenStreetMap with Local Tile Cache</title>
<script src="http://www.openlayers.org/api/OpenLayers.js"></script>
<script src="js/OpenStreetMapLocal.js"></script>
<script>
    var lat = 42.06651; //POINT(-71.7169041387889 42.0665181226572)
    var lon = -71.71690;
    var zoom = 5;
    var map; //complex object of type OpenLayers.Map
    var ls = new OpenLayers.Control.LayerSwitcher();
    var mgisurls = ["http://giswebservices.massgis.state.ma.us/geoserver/wms"];
    var prjwgs84 = new OpenLayers.Projection("EPSG:4326");

    function init() {
        //var boundsms = new OpenLayers.Bounds(33869, 772032, 335919, 959719) //EPSG:26986
        var boundswm = new OpenLayers.Bounds(-8182919, 5035362.5, -7784059.5, 5304535) //EPSG:900913 Web Mercator (native projection of tiles)
        map = new OpenLayers.Map("map", {
            controls: [
                new OpenLayers.Control.Navigation({'mouseWheelOptions': {interval: 100}}),
                new OpenLayers.Control.PanZoomBar(), ls,
                new OpenLayers.Control.Attribution(),
                new OpenLayers.Control.MousePosition()
            ],
            maxExtent: boundswm,
            maxResolution: 700,
            numZoomLevels: 18,
            units: 'm',
            projection: new OpenLayers.Projection("EPSG:900913"),
            displayProjection: new OpenLayers.Projection("EPSG:4326")
        });

        var lyrMapnik = new OpenLayers.Layer.OSM.LocalMassachusettsMapnik("OSM Mapnik Massachusetts", {'isBaseLayer': true});

        var lyrTowns = new OpenLayers.Layer.WMS("Towns", mgisurls,
            {'layers': "massgis:GISDATA.TOWNS_POLYM", 'styles': "", 'transparent': "true", 'FORMAT': "image/png"},
            {'isBaseLayer': false,
             'attribution': '<br /><a href="http://www.mass.gov/mgis/">MASSGIS (MA ITD)</a>',
             'visibility': false, 'buffer': 1,
             'singleTile': false, 'tileSize': new OpenLayers.Size(240,150)});

        map.addLayers([lyrMapnik, lyrTowns]);

        if (!map.getCenter()) {
            var lonLat = new OpenLayers.LonLat(lon, lat).transform(new OpenLayers.Projection("EPSG:4326"), map.getProjectionObject());
            map.zoomToExtent(boundswm);
            ls.maximizeControl();
        }
    }
</script>
<body>
    <div style="width:880; height:550" id="map"></div>
    <script type="text/javascript" defer="true">
        init();
    </script>
</body>
</html>
To see it in action go to http://www.bostongis.com/demos/OpenLayers/mapnik_localosm.htm
We gave a presentation on writing PostGIS spatial queries at OSCon 2009 last week. Slides will be available on the site, but you can download them here as well.
Some of the sample data we were using you can grab here.
PostGIS 1.4 is finally out as well, and hopefully should be hitting your favorite distros soon if it hasn't already. For Windows users, we have compiled binaries at http://postgis.refractions.net/download/windows/ for 8.2, 8.3, and 8.4, and hope to have the stack builder installs sometime next week.
We gave a lecture at PGCon 2009. Below are the links and the generated data set. Some of the examples use features in PostgreSQL 8.4 and PostGIS 1.4, but most should be applicable to earlier versions. The data generation part does use PostgreSQL 8.4 WITH RECURSIVE construct, but not in a terribly useful way.
Lots of great presentations at PGCon2009 -- the slides should be up shortly for all the presentations.
For display we use OpenJUMP. A Flash movie of OpenJUMP rendering the autogenerated town is available.
Windows users: PostGIS 1.4 Windows binaries are available now at http://postgis.refractions.net/download/windows/; a stack builder installer will be available shortly. Please note nothing has changed in the binaries between 1.4RC2 and 1.4 official except for the version number stamped on them when you run:
SELECT postgis_full_version()
If you want to play with the generated data set - do the following
ALTER DATABASE pgcon2009 SET search_path=assets, public;
For this exercise we are going to demonstrate how to use OSM2PO and an OSM pbf to create a routing network. Much of this is borrowed from Anita Graser's tutorial OSM2PO QuickStart
For our exercise, we love using the Teczno Metro Extracts. These are great and, as of this writing, were last updated in April 2013. They are separate OSM feeds for major world cities and are packaged in various formats: standard OSM XML, OSM PBF (compressed OSM format), and shapefiles. We'll be using Boston for this tutorial, but you can swap Boston with your favorite metro destination.
These instructions are the same for all OSes, provided your PostGIS build was compiled with raster support (you don't get the extension feature without raster, sorry) and you are using the latest development version of pgRouting (1.07+), which now includes CREATE EXTENSION support.
In an existing database or new db run
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
OSM2PO is both a routing engine and an OSM-to-pgRouting loader written in Java, so it's cross-platform, assuming you have a Java runtime installed on your computer.
java -Xmx512m -jar osm2po-core-4.7.7-signed.jar prefix=hh tileSize=x http://osm-extracted-metros.s3.amazonaws.com/boston.osm.pbf
If Java is not in your OS path, you will have to use the full path instead of just java; e.g. on Windows it would be something like C:\Program Files (x86)\Java\jre7\bin\java.
When it finishes, OSM2PO also starts a built-in web service, which you can check at:
http://localhost:8888/Osm2poService
You should now have an SQL file in the hh folder called: hh_2po_4pgr.sql
Load this file with psql. Note this file will generally be too big to load with pgAdmin, so you need to use psql. If you are on Windows you can create a batch script with something like this:
set PSQL="C:\Program Files\PostgreSQL\9.2\bin\psql"
set PGPORT=5432
set PGHOST=localhost
set PGPASSWORD=something
cd hh
%PSQL% -U postgres -d mydb -q -f "hh_2po_4pgr.sql"
pause
and then execute the batch script by Right Click -> Run As Administrator
For Linux, the settings are much the same, except you use ${PSQL} instead of %PSQL% and drop the set keyword (and export the PG* variables so psql can see them). A Unix/Linux sh script would look something like the one below. If psql is not in your path, you might have to give the full path to the psql executable.
PSQL="psql"
export PGPORT=5432
export PGHOST=localhost
export PGPASSWORD=something
cd hh
${PSQL} -U postgres -d mydb -q -f "hh_2po_4pgr.sql"
Create a view to get the data in the structure wanted by astar_sp_delta:
CREATE OR REPLACE VIEW vw_boston_routing AS
SELECT id As gid, osm_id, osm_name, osm_meta, osm_source_id, osm_target_id,
    clazz, flags, source, target, km, kmh, cost, cost As length, reverse_cost,
    x1, y1, x2, y2, geom_way As the_geom
FROM hh_2po_4pgr;
Okay now for the fun part, putting pgRouting to work.
Let's compute the shortest path using the source 131904 and target 17500 we had pictured. The function that takes advantage of the spatial index is called astar_sp_delta. The alternative, shortest_path_astar (which took about 1 second), takes an SQL statement, but for this example it is about 4 times slower since it doesn't utilize the geometry.
The 0.1 is in units of the spatial reference (in this case degrees) and is the amount by which to expand the bounding box that contains both the source and target vertices.
The astar_sp_delta function returns a table consisting of the columns (id, gid, the_geom), which you can then join back with your input data to get additional elements. The queries below took about 250 ms on my 32-bit PostgreSQL 9.2 on Windows 7.
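In case you are wondering how to pick source and target vertex ids for your own start and end points, one rough approach is to find the edge nearest a lon/lat and use its source (or target) id. A sketch, assuming the hh_2po_4pgr table loaded above (the coordinates are made up; plug in a point near your origin):

-- Find the nearest edge to a lon/lat point and grab its vertex ids
SELECT source, target, osm_name
FROM hh_2po_4pgr
ORDER BY ST_Distance(geom_way, ST_SetSRID(ST_Point(-71.06, 42.36), 4326))
LIMIT 1;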
If you want to map the route, you'll need the geometry, as shown in this query:
SELECT r.id, s.gid, s.osm_name, s.cost, s.km, s.kmh, r.the_geom
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;
If you just want the edges output and don't need to render it, leave out the geometry column.
SELECT r.id, s.osm_name
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;
 id |            osm_name
----+---------------------------------
  1 | Kent Street
  2 | Kent Street
  3 | Beacon Street
  :
 19 | Cambridge Street
 20 | Cambridge Street
  :
 68 | Merrimac Street
  :
 83 | John F. Fitzgerald Surface Road
 84 | State Street
(84 rows)
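If all you want is a summary figure rather than the edge list, you can aggregate over the same join. A minimal sketch (same view and vertices as above) totaling the route length via the km column:

SELECT sum(s.km) As total_km
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;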
A common problem encountered in GIS is the Nearest Neighbor problem. In a nutshell the problem is to find the x number of nearest neighbors given a geometry and n geometries of data.
The nearest neighbor problem crops up in other disciplines as well, except that in those disciplines the unit of measurement is not spatial distance but some other distance metric. For this particular talk, I will focus on the GIS nature of the problem and specifically on solving it in PostGIS.
The easiest and most naive way to do this is to do an exhaustive search of the whole space comparing the distances against the geometry of interest and sorting them. This solution while intuitive and easy to write is the slowest way of doing this.
If you were to write such a simple solution in PostgreSQL, then it would take something of the form shown below. The below gives you the nearest 5 neighbors from a reference row with gid = 1.
SELECT g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid = 1 and g1.gid <> g2.gid
ORDER BY ST_Distance(g1.the_geom, g2.the_geom)
LIMIT 5
Now, for moderate to large datasets, this exercise becomes painfully slow because distance is not an indexable function; it involves a relation between two entities.
If you knew something about your data, for example that there are no neighbors past 300 units away from your geometry of interest, or that beyond that point you couldn't care less what lies there, then you could significantly improve the speed of this query by writing it like this:
SELECT g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid = 1 and g1.gid <> g2.gid
    AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
ORDER BY ST_Distance(g1.the_geom, g2.the_geom)
LIMIT 5
If you need to get the nearest neighbor for all records in a table, but you only need the first nearest neighbor for each, then you can use PostgreSQL's distinctive DISTINCT ON syntax, which would look something like this:
SELECT DISTINCT ON(g1.gid) g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
ORDER BY g1.gid, ST_Distance(g1.the_geom, g2.the_geom)
If you need to run this for more than one record in sometable, but want a crosstab-like query that lists the n nearest neighbors in one row with neighbors listed as n1, n2, n3, etc., then you can take advantage of PostgreSQL's unique array offerings, which we detail in SQL Idiom Design Patterns. The code below is excerpted from that article.
SELECT m.building_name, m.nn_precincts[1] As nn_precinct1,
    m.nn_precincts[2] As nn_precinct2, m.nn_precincts[3] As nn_precinct3
FROM (SELECT b.building_name, b.the_geom,
        ARRAY(SELECT p.precint_name
              FROM police_precints p
              WHERE ST_DWithin(b.the_geom, p.the_geom, 5000)
              ORDER BY ST_Distance(b.the_geom, p.the_geom)
              LIMIT 3) As nn_precincts
    FROM buildings b) m
ORDER BY m.building_name;
If you need to run this for more than one record in sometable, you can either put it in a loop and run it for each record of interest, or create an SQL function that returns a table for each record and use it as shown below.
CREATE OR REPLACE FUNCTION fnsometable_nn(gid integer, geom1 geometry,
    distguess double precision, numnn integer)
  RETURNS SETOF sometable AS
$BODY$
    SELECT g2.*
    FROM sometable As g2
    WHERE $1 <> g2.gid AND expand($2, $3) && g2.the_geom
    ORDER BY distance(g2.the_geom, $2)
    LIMIT $4
$BODY$
  LANGUAGE 'sql' VOLATILE;
To use the function to run for the first 100 records in sometable and return the 5 closest neighbors for each of those records, you would do this
SELECT g1.gid, g1.description,
    (fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).gid As nn_gid,
    (fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).description As nn_description,
    distance((fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
A lot of people are probably thinking "HUH" at this point. I will now wave my hand and leave it as an exercise for people to figure out why this works. I will say this much. I am using a feature in PostgreSQL which from my readings is an accident of design and only works with functions written in SQL and C language. This accident will probably be later corrected in future versions so don't expect this to work moving forward.
Returning tables in the SELECT part of an SQL statement is super non-standard and, should I say, bizarre and inelegant, but it is currently the only way to do it in PostgreSQL when you have a set-returning function whose inputs depend on values from a related table. This is because PostgreSQL currently does not support subselects and set-returning functions in the FROM clause that dynamically change based on inputs from other table fields. In fact most databases do not support this.
Unfortunately PostgreSQL doesn't support a predicate similar to the CROSS APPLY predicate that SQL Server 2005 has. I think Oracle does by pretty much just leaving out the CROSS APPLY predicate. CROSS APPLY is more elegant looking I think in general and also more self-documenting in the sense that when you see the clause - you know what is being done.
SQL Server, which I use very often, introduced this feature in the SQL Server 2005 offering, and in a few cases it's a life-saver.
I think the next release of PostgreSQL will probably support a feature similar to this. Just to demonstrate how powerful this feature can be: if we had the CROSS APPLY feature, we could do away with our intermediate set-returning function and simply write our query like this.
SELECT g1.gid, g1.description, nn.gid As nn_gid,
    nn.description As nn_description,
    distance(nn.the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
    CROSS APPLY
    (SELECT g2.*
     FROM sometable As g2
     WHERE g1.gid <> g2.gid AND expand(g1.the_geom, 300) && g2.the_geom
     ORDER BY distance(g2.the_geom, g1.the_geom)
     LIMIT 5) As nn
This would alleviate the need to create a new nn function for each table or kind of query we are formulating.
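For what it's worth, PostgreSQL did eventually grow an equivalent: the LATERAL join added in PostgreSQL 9.3. A sketch of the same query using LATERAL (same hypothetical sometable, with ST_DWithin/ST_Distance swapped in for the older expand/distance calls):

SELECT g1.gid, g1.description, nn.gid As nn_gid,
    nn.description As nn_description,
    ST_Distance(nn.the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
    CROSS JOIN LATERAL
    (SELECT g2.*
     FROM sometable As g2
     WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
     ORDER BY ST_Distance(g2.the_geom, g1.the_geom)
     LIMIT 5) As nn;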
It's always interesting to see how different databases tackle the same issues. Oracle Locator and Oracle Spatial, for example, have SDO_NN and SDO_NN_DISTANCE functions which magically do all of this for you, but I haven't used them so I am not sure of their speed or caveats. One thing I did notice from reading is that SDO_NN doesn't deal with WHERE clauses too well.
After some heavy brainstorming, I have come up with a faster and more generic solution to calculating nearest neighbors than my previous solutions. For the gory details on how I arrived at this solution - please check out the following blog entries Boston GIS Blog entries on Nearest Neighbor Search.
In the sections that follow I will briefly describe the key parts of the solution, the strengths of the solution, the caveats of the solution, and some general use cases.
The solution is composed of a new PostgreSQL type, and 2 PostgreSQL functions. It involves a sequence of expand boxes that fan out from the bounding box of a geometry to as far out as you want. Below are the descriptions of each part.
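To make the expand-box idea concrete before getting into the individual parts, here is a tiny sketch (just the mechanism, not the packaged function itself) that generates three successively larger search boxes around one geometry of the hypothetical sometable used earlier:

-- Successive search boxes fanning out from one geometry's bounding box
SELECT n As ring, expand(g.the_geom, 100 * n) As search_box
FROM (SELECT the_geom FROM sometable WHERE gid = 1) As g,
     generate_series(1, 3) As n;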
The code can be downloaded here http://www.bostongis.com/downloads/pgis_nn.txt. I will also be posting this to the PostGIS Wiki once I clean it up a bit more.
My objective in creating this solution is to give PostGIS the same expressiveness that Oracle's SDO_NN and SDO_NN_DISTANCE functions provide Oracle. Unlike the Oracle solution, this solution is not implemented as an Operator, but instead as a set returning function.
Features are outlined below
All the data we will be using is in SRID: 26986. For a refresher course on loading data and setting up indexes, check Part 1: Getting Started With PostGIS: An almost Idiot's Guide
SELECT g1.gid As gid_ref, f.name, g1.nn_dist/1609.344 As dist_miles, g1.nn_gid
FROM (SELECT b.gid,
        (pgis_fn_nn(b.the_geom, 1000000, 1, 1000, 'fire_stations', 'true', 'gid', 'the_geom')).*
      FROM (SELECT * FROM buildings LIMIT 1000) b) As g1, fire_stations f
WHERE f.gid = g1.nn_gid
ORDER BY g1.nn_dist DESC
--Total query runtime: 687 ms. 296 rows retrieved.
SELECT g1.gid As gid_ref, g1.street, g1.fraddl, g1.toaddl, f.name,
    g1.nn_dist As dist_meters, g1.nn_gid
FROM (SELECT b.*,
        (pgis_fn_nn(b.the_geom, 1000000, 2, 1000, 'fire_stations', 'true', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl = '02109') b) As g1, fire_stations f
WHERE f.gid = g1.nn_gid
ORDER BY g1.street, g1.fraddl, g1.gid, g1.nn_dist
--Total query runtime: 2656 ms. 296 rows retrieved.
SELECT g1.gid As gid_ref, g1.street, g1.fraddl, g1.toaddl,
    area(b.the_geom) As bldg_area, g1.nn_dist As dist_meters, g1.nn_gid
FROM (SELECT s.*,
        (pgis_fn_nn(s.the_geom, 1000000, 2, 1000, 'buildings', 'area(the_geom) > 5000', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl = '02109') s) As g1, buildings b
WHERE b.gid = g1.nn_gid
ORDER BY g1.street, g1.fraddl, g1.gid, g1.nn_dist
Now for this, I think we can safely assume that no street we care about should be further than 30,000 meters from its nearest school and hospital so we will limit our search to 30,000 meters and dice up our box into 10 expand boxes - so boxes expand at a rate of 3000 meters out per expansion.
--Total query runtime: 5516 ms (~5.5 secs). 291 rows retrieved.
SELECT emer.gid As gid_ref, emer.street, emer.fraddl, emer.toaddl,
    hospitals.name As emer_name, emer.nn_dist As emer_dist_meters,
    schools.name As sch_name, sch.nn_dist As sch_dist_meters
FROM (SELECT b.gid, b.street, b.fraddl, b.toaddl,
        (pgis_fn_nn(b.the_geom, 30000, 1, 10, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl IN('02109', '02110')) As b) As emer
    INNER JOIN (SELECT b.gid,
            (pgis_fn_nn(b.the_geom, 30000, 1, 10, 'schools', 'true', 'gid', 'the_geom')).*
        FROM (SELECT * FROM census2000tiger_arc WHERE zipl IN('02109', '02110')) As b) As sch
        ON emer.gid = sch.gid
    INNER JOIN hospitals ON hospitals.gid = emer.nn_gid
    INNER JOIN schools ON schools.gid = sch.nn_gid
ORDER BY emer.street, emer.fraddl, emer.gid
--Total query runtime: 2969 ms. 75 rows retrieved.
SELECT sch.gid As gid_ref, sch.name As sch_name
FROM schools sch
    LEFT JOIN (SELECT b.gid,
            (pgis_fn_nn(b.the_geom, 3000, 1, 2, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
        FROM (SELECT * FROM schools
              WHERE schools.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)) b) As g1
        ON sch.gid = g1.gid
WHERE g1.gid IS NULL
    AND sch.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
ORDER BY sch.name
-- Total query runtime: 63 ms. 75 rows retrieved.
SELECT sch.gid As gid_ref, sch.name As sch_name
FROM schools sch
    LEFT JOIN (SELECT b.gid
        FROM schools b, hospitals h
        WHERE b.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
            AND h.er_status = 'Y'
            AND expand(b.the_geom, 3000) && h.the_geom
            AND distance(b.the_geom, h.the_geom) <= 3000) g1
        ON sch.gid = g1.gid
WHERE g1.gid IS NULL
    AND sch.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
ORDER BY sch.name
For this exercise, we will create 2 new fields in our schools table called nearest_hospital and nearest_hospital_gid and update the fields with the closest hospital to that school. We will set our maximum distance search to 100,000 meters carved out into 10 10,000 meter expansions.
-- Query returned successfully: 2613 rows affected, 20625 ms (~1/3 minute) execution time.
ALTER TABLE schools ADD nearest_hospital character varying(255);
ALTER TABLE schools ADD nearest_hospital_gid integer;

UPDATE schools
SET nearest_hospital = h.name, nearest_hospital_gid = h.gid
FROM (SELECT s.gid As sch_gid,
        (pgis_fn_nn(s.the_geom, 100000, 1, 10, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
    FROM schools As s) As nn_emer
    INNER JOIN hospitals h ON nn_emer.nn_gid = h.gid
WHERE schools.gid = nn_emer.sch_gid
Later on we realize we need an alternate hospital in case the first hospital is full or a traffic situation keeps us from getting there, so we create a new set of fields to hold the second closest hospital. This particular exercise demonstrates a where clause that depends on additional attribute data from the reference record.
-- Query returned successfully: 2613 rows affected, 23032 ms (~1/3 minute) execution time.
ALTER TABLE schools ADD nearest_hospital_2 character varying(255);
ALTER TABLE schools ADD nearest_hospital_gid_2 integer;

UPDATE schools
SET nearest_hospital_2 = h.name, nearest_hospital_gid_2 = h.gid
FROM (SELECT s.gid As sch_gid,
        (pgis_fn_nn(s.the_geom, 100000, 1, 10, 'hospitals',
            'er_status=''Y'' AND gid <> ' || CAST(s.nearest_hospital_gid As varchar(20)),
            'gid', 'the_geom')).*
    FROM schools As s) As nn_emer
    INNER JOIN hospitals h ON nn_emer.nn_gid = h.gid
WHERE schools.gid = nn_emer.sch_gid
    AND nearest_hospital_gid IS NOT NULL
PostGIS is an open source, freely available, and fairly OGC compliant spatial database extender for the PostgreSQL Database Management System. In a nutshell it adds spatial functions such as distance, area, union, and intersection, as well as specialty geometry, geography, and raster data types, to the database. PostGIS is very similar in functionality to SQL Server spatial support, ESRI ArcSDE, Oracle Spatial, and the DB2 spatial extender, except it has more functionality and generally better performance than all of those (and it won't bankrupt you). The latest release version now comes packaged with the PostgreSQL DBMS installs as an optional add-on. As of this writing PostGIS 3.3.2 is the latest stable release and PostGIS 3.4.0 will be coming out within the next year. See what's new in PostGIS 3.3 and new in the upcoming PostGIS 3.4. Notable features of PostGIS:
The PostGIS Windows bundle also includes cool PostGIS related extensions that augment your spatial enjoyment. In the bundle you will also find:
We will assume a windows environment for this tutorial, but most of the tutorial will apply to other supported platforms such as Linux, Unix, BSD, Mac etc. We will also be using Massachusetts/Boston data for these examples. For desktop users, the EnterpriseDB one-click installer exists as well for Mac/OSX and Linux desktops, so you should be able to follow along without too much fuss.
We will not go into too much detail here since the install wizard (at least the windows one) is pretty good. Below are the basic steps.
You can upgrade from 2.+ to 3.+ without having to back up and restore your db; after you've installed the latest PostGIS, running the following will do the trick:
ALTER EXTENSION postgis UPDATE;
For versions of PostGIS from 2.5.3+ you can use SELECT postgis_extensions_upgrade(); to upgrade postgis, postgis_raster, postgis_topology, postgis_sfcgal, and postgis_tiger_geocoder.
SELECT postgis_extensions_upgrade();
There is a postgis sampler option, which creates a regular spatial db with all the extensions installed. The dumper/loader command-line and GUI tools in the PostgreSQL bin folder will get overwritten by the last PostGIS you installed, so be careful. Generally speaking, PostGIS 3.1 should work just fine everywhere you were using PostGIS before. If you are upgrading from PostGIS < 2.2, you may need to install the legacy_minimal.sql or legacy.sql scripts packaged in the PostgreSQL share/contrib/postgis-3.1 folder.
CREATE EXTENSION postgis;
The create spatial database checkbox is optional, and we generally uncheck it. It creates a spatial database for you to experiment with and has all the extensions packaged with PostGIS Bundle pre-installed.
ALTER SYSTEM (for 9.4+)
For those of you who want to try experimental builds: we have experimental Windows builds made as part of our continuous integration process whenever anything changes in the PostGIS code base. These can be downloaded from http://postgis.net/windows_downloads.
You can use something like psql or the pgAdmin query window to create a database and spatially enable your database. This is the way to go if you have only a terminal interface to your db or you have a butt load of extensions you want to enable. Your psql steps would look something like this:
To install a bunch of extensions, just open up the pgAdmin SQL Query window (which we'll cover shortly) or psql and run this including only the extensions you want.
CREATE DATABASE gisdb;
\connect gisdb;
-- Enable PostGIS (includes raster)
CREATE EXTENSION postgis;
-- Enable Topology
CREATE EXTENSION postgis_topology;
-- Enable PostGIS Advanced 3D
-- and other geoprocessing algorithms
CREATE EXTENSION postgis_sfcgal;
-- fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;
-- rule based standardizer
CREATE EXTENSION address_standardizer;
-- example rule data set
CREATE EXTENSION address_standardizer_data_us;
-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;
-- routing functionality
CREATE EXTENSION pgrouting;
-- spatial foreign data wrappers
CREATE EXTENSION ogr_fdw;
-- LIDAR support
CREATE EXTENSION pointcloud;
-- LIDAR Point cloud patches to geometry type cases
CREATE EXTENSION pointcloud_postgis;
-- Uber h3 hexagon indexing scheme for PostGIS 3.3.2+ bundles
CREATE EXTENSION h3;
-- converts between h3 index representations
-- and postgis geometry/geography
CREATE EXTENSION h3_postgis;
PostgreSQL comes packaged with a fairly decent admin tool called pgAdmin. If you are a newbie or not sure which extensions you want, it's best just to use that tool to create a new database and look at the menu of extensions available to you.
On windows PgAdmin is under Start->Programs->PostgreSQL 14->PgAdmin 4
You can also run pgAdmin on a machine other than the server by installing it separately, downloading it from https://pgadmin.org.
In the pg_hba.conf file, change the line host all all 127.0.0.1/32 md5 to the following (and similarly for the IPv6 line):
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
The ::1/128 is usually the controlling one and is what localhost resolves to in IPV6 so you'll want to set this one.
This will allow any person logging in locally to the computer that PostgreSQL is installed on to access all databases without a password. (127.0.0.1/32 means localhost only; 32 is the bit mask.) Note you can add additional lines to this file or remove lines to allow or block certain IP ranges. The lines highest in the file take precedence. So, for example, if you wanted to allow all users access as long as they successfully authenticate with an md5 password, then you can add the line
host all all 0.0.0.0/0 md5
To install the adminpack extension, switch to the postgres database and run this command in the SQL window of pgAdmin or using psql:
CREATE EXTENSION adminpack;
(NOTE: you can also use the extensions gui of PgAdmin to install in the postgres db)
Verify you have the newest PostGIS with this query:
SELECT postgis_full_version();
Should output something of the form
POSTGIS="3.3.2 3.3.2" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" SFCGAL="SFCGAL 1.4.1, CGAL 5.3, BOOST 1.78.0" PROJ="7.2.1" GDAL="GDAL 3.4.3, released 2022/04/22" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER
Now we have a nice fully functional GIS database with no spatial data. So to do some neat stuff, we need to get some data to play with.
Download data from the MassGIS site. For this simple exercise just download Towns. Extract the files into some folder; we will only be using the _POLY files for this exercise.
NOTE: Someone asked how you extract the file if you are on a linux box.
If you are on Linux/Unix, I find the exercise even easier. If you have wget handy, you can do the below to download the file after you have cd'ed into the folder you want to put it in.
wget http://wsgw.mass.gov/data/gispub/shape/state/boundaries.zip
Now to extract it simply do the following from a shell prompt
unzip boundaries.zip
You will notice one of the files it extracts is called BOUNDARY_POLY.prj. A .prj is often included with ESRI shape files and tells you the projection of the data. We'll need to match this descriptive projection to an SRID (the id field of a spatial ref record in the spatial_ref_sys table) if we ever want to reproject our data.
SELECT srid, srtext, proj4text
FROM spatial_ref_sys
WHERE srtext ILIKE '%Massachusetts%';
The easiest data to load into PostGIS is ESRI shape data since PostGIS comes packaged with a nice command line tool called shp2pgsql which converts ESRI shape files into PostGIS specific SQL statements that can then be loaded into a PostGIS database.
This file is located in the PostgreSQL bin folder, whose default location on Windows is Program Files/PostgreSQL/15/bin.
Since these files are buried so deep, they are a bit annoying to navigate to. To create a self-contained toolkit you can carry with you anywhere, copy the following files from the bin folder into, say, c:\pgutils:
comerr32.dll krb5_32.dll libeay32.dll libiconv-2.dll libintl-2.dll libpq.dll pgsql2shp.exe psql.exe pg_dump.exe pg_restore.exe shp2pgsql.exe ssleay32.dll libproj-9.dll geos_c.dll geos
Note: The GUI loader is packaged as a self-contained postgisgui folder in the bin of your PostgreSQL install. If you prefer the GUI interface, you can copy that folder and run the shp2pgsql-gui.exe file from anywhere even an external file network path.
c:\pgutils\shp2pgsql -s 26986 BOUNDARY_POLY towns > towns.sql
psql -d gisdb -h localhost -U postgres -f towns.sql
One thing you can do with the shp2pgsql command line version packaged with PostGIS 2.0-2.2 that you can't do with the GUI is a spatial transformation from one coordinate system to another. So with the command line, we can transform to 4326 (WGS 84) and load to a geography type with a single command. Hopefully we'll see this in the GUI in a future release.
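If you already loaded through the GUI and still want a lon/lat copy, one workaround (a sketch, not a loader feature; the column names follow the towns table loaded above and the MultiPolygon type is assumed from the polygon load) is to reproject after the fact with plain SQL:

-- Add a WGS 84 lon/lat geometry column alongside the loaded 26986 one
ALTER TABLE towns ADD COLUMN geom_4326 geometry(MultiPolygon, 4326);
UPDATE towns SET geom_4326 = ST_Multi(ST_Transform(geom, 4326));
CREATE INDEX idx_towns_geom_4326 ON towns USING gist(geom_4326);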
Table indexes are very important for speeding up the processing of most queries. There are also downsides to indexes, namely the following:
Given the above, it is often times tricky to have a good balance. There are a couple general rules of thumb to go by that will help you a long way.
The most common queries we will be doing on this table are spatial queries and queries by the town field, so we will create two indexes on these fields. NOTE: The loader has an option to create the spatial index, which we took advantage of, so the spatial index here is not strictly necessary; we present it just so that if you ever need to create a spatial index, for example for CSV-loaded data, you know how.
CREATE INDEX idx_towns_geom ON towns USING gist(geom);
CREATE INDEX idx_towns_town ON towns USING btree(town);
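After a bulk load and index creation, it also helps to refresh the planner statistics so the new indexes get used effectively; for example:

VACUUM ANALYZE towns;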
Go back into pgAdmin and refresh your view. Verify that you have a towns table now.
SELECT ST_Extent(geom) FROM towns WHERE town = 'BOSTON';
SELECT ST_Area(ST_Union(geom)) FROM towns WHERE town = 'BOSTON';
If you are a GIS newbie, I highly recommend using QGIS. QGIS has ability to view PostGIS data both geometry and raster directly, do simple filters on it, is free, is cross-platform (Linux, Windows, MacOSX,Unix) and is the least threatening of all the GIS Viewers I have seen out there for people new to GIS.
PostGIS is an open source, freely available, and fairly OGC compliant spatial database extender for the PostgreSQL Database Management System. In a nutshell it adds spatial functions such as distance, area, union, intersection, and specialty geometry data types to the database. PostGIS is very similar in functionality to SQL Server 2008 spatial support, ESRI ArcSDE, Oracle Spatial, and the DB2 spatial extender. The latest release version now comes packaged with the PostgreSQL DBMS installs as an optional add-on. As of this writing PostGIS 2.0.0 is the latest stable release. Notable enhancements in this release:
CREATE EXTENSION postgis
ALTER EXTENSION postgis ..
Note for Vista Users Because of the new added security in Vista, you may run into issues installing PostgreSQL. Please refer to the Windows Vista gotchas http://trac.osgeo.org/postgis/wiki/UsersWikiWinVista in the PostGIS wiki if you run into issues.
The create spatial database checkbox is optional, and we generally uncheck it. It creates a spatial database for you to experiment with, but the template_postgis20 is always created for windows. If you are running PostgreSQL 9.1+, we recommend you don't even bother with the template database and just use CREATE EXTENSION postgis in the database of your choosing or use PgAdmin Extensions install feature which we will cover in this tutorial.
For those of you who want to try experimental builds: we have experimental Windows builds made weekly, or as frequently as anything interesting happens in the PostGIS code base. These can be downloaded from http://postgis.net/windows_downloads. For those who want to test out 2.0.1SVN, just replace the postgis-2.0.dll in the PostgreSQL lib folder with the one in the zip and run the respective postgis upgrade sql file in share/contrib/postgis-2.0, or if you are using extensions:
ALTER EXTENSION postgis UPDATE TO "2.0.1SVN"
PostgreSQL comes packaged with a fairly decent admin tool called PgAdmin III. If you are a newbie, it's best just to use that tool to create a new database.
On Windows the file is located at C:\Program Files\PostgreSQL\9.0\share\contrib\adminpack.sql for versions of PostgreSQL below 9.1. It is located in the share folder of Linux installs as well. To install it, switch to the postgres database and run the adminpack.sql script in that database.
If you are running 9.1, however, it doesn't matter which OS you are running with -- just connect to your postgres database and run this command in the sql window of PgAdmin or using psql: CREATE EXTENSION adminpack;
UPDATE: The remaining steps in this section are not needed if you chose template_postgis20 for your new database. However, if you are trying to spatially enable an existing database or you didn't get the template_postgis20 option, do the remaining steps.
For PostgreSQL 9.1 -- expand your database and you should see an Extensions section; right-mouse click on it and select the New Extension option.
Note the two options -- postgis and postgis_topology -- ain't that a beaut. This will work even if your PostgreSQL server is not on Windows. Select postgis (this includes geometry, geography, and raster support). Then, if you want to use topology, repeat the step picking postgis_topology. Also note the Definition tab, which allows you to install postgis in a non-default schema and will also give you a painless way to upgrade to PostGIS 2.0.1 when that comes out, by changing the version number in the drop down.
If you are unlucky enough to be using PostGIS on pre-9.1 and also don't have a template database, things are a bit tougher. The pre-PostgreSQL 9.1 way: go to Tools -> Query tool in PgAdmin III and browse to the PostGIS install scripts in the PostgreSQL contrib folder (on Windows the default install location is Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0). You'll need to run all these scripts (see the psql sketch after the list):
postgis.sql
rtpostgis.sql
spatial_ref_sys.sql
topology.sql (if you want topology support)
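If you prefer the command line to the PgAdmin query tool, a rough psql sketch would be as follows (the database name mygisdb and the script path are assumptions -- adjust both to your install):

psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\postgis.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\rtpostgis.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\spatial_ref_sys.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\topology.sql"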
As of PgAdmin III 1.10, the Plugin icon has PSQL as an option, and if you responded Yes to the Overwrite my plugins.ini prompt (for 9.1 you won't get a prompt since these files are stored separately), you should get an additional PostGIS Shapefile and DBF Loader option. This option becomes un-greyed when you select a database, and when you launch it, it passes the database credentials in for you and opens a PSQL connection.
NOTE: Someone asked how you extract the file if you are on a linux box. ---FOR LINUX USERS ---
wget http://wsgw.mass.gov/data/gispub/shape/state/townssurvey_shp.exe
unzip townssurvey_shp.exe
NOTE: As of PostGIS 1.5.0, the Windows build is packaged with a shp2pgsql graphical user interface. You can also download the GUI separately if you are using a lower version of PostGIS, or if you want to load data from a separate workstation that doesn't have PostgreSQL installed; the standalone download works with any version of PostGIS from 1.2 to 2.0.0 and can be enabled as a plug-in in PgAdmin III. Check out our screencast on configuring shp2pgsql-gui as a PgAdmin III plug-in and using it, or our write-up on registering it.
You will notice one of the files it extracts is called TOWNS_POLY.prj. A .prj is often included with ESRI shape files and tells you the projection of the data. We'll need to match this descriptive projection to an SRID (the id field of a spatial ref record in the spatial_ref_sys table) if we ever want to reproject our data.
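If you ever need to hunt down that SRID yourself, one approach (a sketch, using PostGIS) is to search the spatial_ref_sys table by name and eyeball the matches:

SELECT srid, srtext FROM spatial_ref_sys WHERE srtext LIKE '%Massachusetts%';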
This file is located in the PostgreSQL bin folder, which on Windows defaults to Program Files/PostgreSQL/8.4/bin
comerr32.dll
krb5_32.dll
libeay32.dll
libiconv-2.dll
libintl-2.dll
libpq.dll
pgsql2shp.exe
psql.exe
pg_dump.exe
pg_restore.exe
shp2pgsql.exe
ssleay32.dll
c:\pgutils\shp2pgsql -s 26986 TOWNS_POLY towns > towns.sql
One thing you can do with the shp2pgsql command-line loader packaged with PostGIS 2.0 that you can't do with the GUI is a spatial transformation from one coordinate system to another. So with the command line, we can transform to 4326 (WGS 84) and load to the geography type with a single command. Hopefully we'll see this in the GUI in PostGIS 2.0.1 or 2.1.0.
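As a hedged sketch of what that one-step reproject-and-load might look like with the 2.0 command-line loader (the output table name towns_geog is made up for illustration; check shp2pgsql -? on your install to confirm the -s from:to reprojection and -G geography flags behave this way for you):

c:\pgutils\shp2pgsql -s 26986:4326 -G TOWNS_POLY towns_geog > towns_geog.sql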
The most common queries we will run against this table are spatial queries and queries by the town field, so we will create 2 indexes on these fields. NOTE: The loader has an option to create the spatial index, which we took advantage of, so the spatial index statement is not necessary here; keep it in mind for cases where you forgot to enable the index option in the loader.
For PostGIS installations of 1.2.2 and above, the preferred function names start with ST_
Old syntax, pre PostGIS 1.2.2 -- this will not work in PostGIS 1.4+. If you have old code like this, change it to the above syntax. We have crossed out the code below to demonstrate that it is BAD:
SELECT Extent(geom) from towns where town = 'BOSTON';SELECT Area(GeomUnion(geom)) FROM towns where town = 'BOSTON';
Most functions in new PostGIS installs just gained an ST_ prefix, except for GeomUnion, which became ST_Union. The other difference is that the relational functions with ST_ now automagically use index operators, whereas with the ones without ST_ you need to add an explicit && call.
Example: a.geom && b.geom AND Intersects(a.geom,b.geom) can simply be written as ST_Intersects(a.geom, b.geom)
a.geom && b.geom AND Intersects(a.geom,b.geom)
ST_Intersects(a.geom, b.geom)
If the above gives you an error such as mixed SRIDs, most likely you are running 1.3.2 postgis which was very defective. Upgrade to 1.3.3 at your next opportunity. To verify your install - SELECT postgis_full_version();
If you are a GIS newbie, I highly recommend using Quantum GIS. Quantum GIS can view PostGIS data directly, lets you apply simple filters to it, is free, is cross-platform (Linux, Windows, Mac OSX, Unix), and is the least threatening of all the GIS viewers I have seen for people new to GIS.
One of the greatest things about spatial relational databases is that they bring GIS to a new level by letting you apply the expressive, declarative SQL language to the spatial domain. With a spatial relational database, you can easily answer questions such as "What is the average household income of a neighborhood block?" and "What political district does X reside in?". This new animal that marries SQL with GIS is called Simple Features for SQL (SFSQL). In essence, SFSQL introduces a new set of functions and aggregate functions to the SQL language.
In this next section we'll go over some common queries utilizing the data we downloaded in Part 1. Although we are using PostGIS for this exercise, most spatial relational databases, such as Oracle Spatial, ArcSDE, and DB2 Spatial Extender, have similar syntax.
As noted in Part 1, the data we downloaded is in NAD 83 Meters MA State Plane. What if we needed the data in NAD 83 feet? To accomplish this, we would need to transform our spatial data to the new coordinate system with the transform function. If you look in the spatial_ref_sys table you'll notice that srid = 2249 is the srid of the NAD 83 ft MA State Plane. Our query would look something like this:
SELECT town, ST_Transform(geom, 2249) as geom_nad83ft FROM towns
In order to get the latitude and longitude of our data, we need our coordinate reference system to be in some form of longlat. To begin with, we first transform our data from NAD 83 Meters Massachusetts State Plane to some variant of longlat -- the closest match is NAD 83 North American Datum (srid = 4269). Then we find the centroid, and then the x and y coordinates of that.
SELECT town, ST_X(ST_Centroid(ST_Transform(geom, 4269))) as longitude,
ST_Y(ST_Centroid(ST_Transform(geom, 4269))) as latitude FROM towns
Spatial aggregate functions are much like regular SQL aggregate functions such as AVG, SUM, COUNT in that they work with GROUP BY and HAVING predicates and collapse multiple records into a single record. If you are unfamiliar with the above terms - take a look at Summarizing data with SQL (Structured Query Language)
The extent function is an aggregate function that gives you the bounding box of a set of geometries. It is especially useful for determining the bounding rectangle of the area you are trying to map. For example, if we wanted to find the bounding box of the Boston area in NAD 83 feet, we would do something like this:
SELECT town, ST_Extent(ST_Transform(geom, 2249)) as the_extent FROM towns WHERE town = 'BOSTON' GROUP BY town
The ST_Union function is an aggregate function that takes a set of geometries and unions them together into a single geometry. In versions prior to PostGIS 1.2 this was called geomunion, and the old name was completely removed in PostGIS 2.0. For our towns data, we may have multiple records per town. To get a single geometry that represents the total region of a town, we could use ST_Union like the example below.
select town, ST_Union(geom) as thegeom from towns group by town;
It is important to note that while the above query gives you one record per town, our original plain vanilla query below returns as many records per town as there are geometries for that town:
select town, geom as thegeom from towns;
To get a visual sense of what all these different queries look like, you can dump the above outputs out as an ESRI shapefile using the pgsql2shp tool and view it in a shape viewer such as Quantum GIS, or use the query plugin tools to output directly from the database.
pgsql2shp -f myshp -h myserver -u apguser -P apgpassword -g thegeom mygisdb "select town, ST_Union(geom) as thegeom from towns group by town"
One caveat: the shape dumper utility can only dump out fields of type geometry. So, for example, to dump out a box type such as what is returned by the extent function, you'll need to cast the output to a geometry, something like:
SELECT town, ST_SetSRID(ST_Extent(ST_Transform(geom, 2249))::geometry,2249) as theextent FROM towns WHERE town = 'BOSTON' GROUP BY town
One common question asked of a spatial database is how far one object is from another. PostGIS, like similar spatial databases, has a distance function (ST_Distance) which will tell you the minimum distance between one geometry and another.
An example of such a use is below. The query will tell you how far each zip in your zipcode table is from zip 02109. In this case we are comparing the field geom_meter, which we have in NAD 83 State Plane meters. Distance is always measured in the units of your spatial reference system.
SELECT z.zipcode As zip1, z2.zipcode As zip2, ST_Distance(z.geom_meter,z2.geom_meter) As thedistance FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109'
If the above geometries were polygons, then we would be getting the minimal distance from any point on each polygon to the other. For point geometries there is no distinction between a min distance and other distances.
If we are not dealing with just points, then there are all kinds of distances we could be asking for. For example, if we wanted the average distance between two polygons, we would want the distance between their centroids. An average distance comparison would then look like:
SELECT z.zipcode As zip1, z2.zipcode As zip2, ST_Distance(ST_Centroid(z.geom_meter),ST_Centroid(z2.geom_meter)) As averagedistance FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109'
An often asked question is "What zipcodes are within x of my zip?". To answer that, one would think that simply checking distance < x would suffice. It would, except it would be slow, since it doesn't take advantage of PostGIS indexes, and if the geometries being compared are fairly complex, the search can be exhaustive. To write the query in the most efficient way, we use the ST_DWithin function, like so:
SELECT z.zipcode FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109' AND ST_DWithin(z2.geom_meter, z.geom_meter, 3000)
The ST_DWithin takes the bounding box of a geometry and expands it out X in all directions and then does a more exhaustive distance check on the actual geometries. So in this case we would be expanding our 02109 zip geometry out 3000 meters.
Internally it uses the && operator, which is the bounding-box intersects operator: it returns true if two geometries' bounding boxes intersect and false if they do not. This is much faster, especially if you have indexes on your geometries, because it utilizes the bounding box index.
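To see roughly what ST_DWithin is saving you from writing, here is an illustrative long-hand equivalent that pairs ST_Expand with the && bounding-box filter and an exact distance check (a sketch only -- ST_DWithin remains the recommended form):

SELECT z.zipcode
FROM zipcode z, zipcode z2
WHERE z2.zipcode = '02109'
	AND ST_Expand(z2.geom_meter, 3000) && z.geom_meter
	AND ST_Distance(z2.geom_meter, z.geom_meter) < 3000;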
Often you will receive data in a non-spatial form such as comma delimited data with latitude and longitude fields. To take full advantage of PostGIS spatial abilities, you will want to create geometry fields in your new table and update that field using the longitude latitude fields you have available.
General note: all the command statements that follow should be run from the PgAdmin III Tools -> Query Tool or any other PostgreSQL administrative tool you have available. If you are a command line freak, you can use the psql command line tool packaged with PostgreSQL.
For this exercise, we will use US zip code tabulation areas instead of just Boston data. The techniques here will apply to pretty much any data you get.
First step is to download the data from US Census. http://www.census.gov/geo/www/gazetteer/places2k.html
PostgreSQL comes with a COPY command that allows you to import data from a delimited text file. Since the ZCTA data is provided in fixed-width format, we can't import it easily without first converting it to a delimited format, such as the default tab-delimited format that COPY works with. Similarly, for data in other formats such as DBF, you'll either want to convert it to delimited text using tools such as Excel, use a third-party tool that converts from one format to another, or use one of my favorite tools, Microsoft Access, which allows you to link tables or do a straight import and export to any ODBC-compliant database such as PostgreSQL.
For fixed-width data such as the ZCTAs, we demonstrate a more generic way of importing fixed-width data into PostgreSQL that doesn't require any additional tools beyond what is packaged with PostgreSQL.
First you will need to create the table in Postgres. You want to make sure the order of the fields is in the same order as the data you are importing.
CREATE TABLE zctas
(
	state char(2),
	zcta char(5),
	junk varchar(100),
	population_tot int8,
	housing_tot int8,
	water_area_meter float8,
	land_area_meter float8,
	water_area_mile float8,
	land_area_mile float8,
	latitude float8,
	longitude float8
) WITHOUT OIDS;
For this part of the exercise, I'm going to use Microsoft Excel because it has a nice wizard for dealing with fixed-width files and a lot of Windows users already have it. If you open the zcta file in Excel, it should launch the Text Import Wizard. MS Access has a similarly nice wizard and can deal with files larger than Excel's 65,000-odd row limitation. Note there are countless ways to do this step, so I'm not going to bother going over the others. For non-MS Office users, other office suites such as OpenOffice have similar functionality.
Now copy the data into the table using the COPY command. Note the COPY command runs under the PostgreSQL service, so the file location must be specified relative to the server.
COPY zctas FROM 'C:/Downloads/GISData/zcta5.tab';
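If your converted file ended up comma-separated with a header row rather than tab-delimited (an assumption about how you exported it), COPY can handle that too, e.g.:

COPY zctas FROM 'C:/Downloads/GISData/zcta5.csv' WITH CSV HEADER;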
To create the Geometry field, use the AddGeometryColumn opengis function. This will add a geometry field to the specified table as well as adding a record to the geometry_columns meta table and creating useful constraints on the new field. A summary of the function can be found here http://postgis.refractions.net/docs/ch06.html#id2526109.
SELECT AddGeometryColumn( 'public', 'zctas', 'thepoint_lonlat', 4269, 'POINT', 2 );
The above code will create a geometry column named thepoint_lonlat in the table zctas that validates to make sure the inputs are 2-dimensional points in SRID 4269 (NAD 83 longlat).
UPDATE zctas SET thepoint_lonlat = ST_SetSRID(ST_Point(longitude, latitude), 4269);
The above code will generate a PostGIS point geometry object of spatial reference SRID 4269.
There are a couple of things I would like to point out that may not be immediately clear to people not familiar with PostgreSQL or PostGIS.
The above is great if you want your geometry in longlat spatial reference system. In many cases, longlat is not terribly useful. For example if you want to do distance queries with your data, you don't want your distance returned back in longlat. You want it in a metric that you normally measure things in.
In the code below, we will create a new geometry field that holds points in the WGS 84 North Meter reference system and then updates that field accordingly.
SELECT AddGeometryColumn( 'public', 'zctas', 'thepoint_meter', 32661, 'POINT', 2 );
UPDATE zctas SET thepoint_meter = ST_Transform(ST_SetSRID(ST_MakePoint(longitude, latitude), 4269), 32661);
Please note in earlier versions of this article I had suggested doing the above with
UPDATE zctas SET thepoint_meter = transform(PointFromText('POINT(' || longitude || ' ' || latitude || ')',4269),32661) ;
On further experimentation, it turns out that PointFromText is perhaps the slowest way of doing this in PostGIS. Using a combination of ST_SetSRID and ST_Point is on the order of 7 to 10 times faster, at least for PostGIS 1.2 and above. ST_GeomFromText comes in second (replace ST_PointFromText with ST_GeomFromText) at about 1.5 to 3 times slower than ST_SetSRID with ST_Point. See about PointFromText for why PointFromText is probably slower. With ST_GeomFromText there appear to be some caching effects, so on first run with similar datasets ST_GeomFromText starts out about 3 to 4 times slower than the ST_MakePoint (ST_Point) approach and then catches up to about 1.5 times slower. I tested this on a dataset of about 150,000 records: roughly 26 secs for ST_PointFromText fairly consistently, 10.7 secs for the first run of ST_GeomFromText then 4.1 secs for each consecutive run, and 3.5 secs fairly consistently for ST_SetSRID with ST_MakePoint (on a dual Xeon 2.8 GHz, Windows 2003 32-bit with 2 GB RAM).
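For reference, the ST_GeomFromText variant mentioned above would look something like this (same table and columns as before; shown only for comparison):

UPDATE zctas
	SET thepoint_meter = ST_Transform(ST_GeomFromText('POINT(' || longitude || ' ' || latitude || ')', 4269), 32661);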
One of the number one reasons for poor query performance is lack of attention to indexes. Putting in an index can make as much as a 100 fold difference in query speed depending on how many records you have in the table. For large updates and imports, you should put your indexes in after the load, because while indexes help query speed, updates against indexed fields can be very slow because they need to create index records for the updated/inserted data. In the below, we will be putting in GIST indexes against our spatial fields.
CREATE INDEX idx_zctas_thepoint_lonlat ON zctas USING GIST (thepoint_lonlat);
CREATE INDEX idx_zctas_thepoint_meter ON zctas USING GIST (thepoint_meter);
ALTER TABLE zctas ALTER COLUMN thepoint_meter SET NOT NULL;
CLUSTER idx_zctas_thepoint_meter ON zctas;
VACUUM ANALYZE zctas;
In the above after we create the indexes, we put in a constraint to not allow nulls in the thepoint_meter field. The not null constraint is required for clustering since as of now, clustering is not allowed on gist indexes that have null values. Next we cluster on this index. Clustering basically physically reorders the table in the order of the index. In general spatial queries are much slower than attribute based queries, so if you do a fair amount of spatial queries especially bounding box queries, you get a significant gain.
Please note, for those familiar with CLUSTER in SQL Server: CLUSTER in PostgreSQL means pretty much the same thing, except that in PostgreSQL (at least for versions below 8.3; not sure about 8.3 and above), if you cluster and then add or update data, the system does not keep your new or updated data ordered according to the cluster the way SQL Server does. The upside is that there is no additional penalty in production with clustering as you have in SQL Server. The downside is that you must set up a job of some sort to maintain the cluster for tables that are constantly updated.
In the above, we vacuum analyze the table to ensure that the index statistics are updated for our table.
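Tying back to the cluster-maintenance note above: if you do set up a job to keep a heavily updated table physically ordered, it can be as simple as the sketch below, since PostgreSQL remembers which index a table was last clustered on.

CLUSTER zctas;
VACUUM ANALYZE zctas;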
R is both a language and an environment for doing statistical analysis. R is available as free software under the GPL. For those familiar with environments such as S, MatLab, and SAS, R serves the same purpose. It has powerful constructs for manipulating arrays, packages for importing data from various data sources such as relational databases, CSV and DBF files, and spreadsheets, plus an environment for doing graphical plotting and outputting the results to screen, printer, or file.
For more information on R check out R Project for Statistical Computing.
PL/R is a PostgreSQL language extension that allows you to write PostgreSQL functions and aggregate functions in the R statistical computing language.
With the R language you can write such things as an aggregate median function, which doesn't exist natively in PostgreSQL and exists natively in only a few relational databases I can think of (e.g. Oracle, and even there it didn't appear until version 10).
Another popular use of R is for building Voronoi diagrams using the Delaunay Triangulation and Dirichlet (Voronoi) Tessellation (deldir) package from the Comprehensive R Archive Network (CRAN). When you combine this with PostGIS you have an extremely powerful environment for doing such things as nearest neighbor searches and facility planning.
In the past, PL/R was only supported on PostgreSQL Unix/Linux/Mac OSX environments. That has since changed, and PL/R can now run on Windows PostgreSQL installs too. For most of this exercise we will focus on the Windows install, but will provide links to instructions for Linux/Unix/Mac OSX installs.
In order to use PLR, you must first have the R-Language environment installed on the server you have PostgreSQL on. In the next couple of sections, we'll provide step by step instructions.
update.packages()
Now that you have both PostgreSQL and R installed, you are ready to install the PL/R procedural language for PostgreSQL.
In order to start using PL/R in a database, you need to load the help functions in the database. To do so do the following.
For users running PostgreSQL 9.1+, install by typing in SQL window:
CREATE EXTENSION plr;
If you are running PostgreSQL 9.0 or lower, you have to install using the plr.sql file. Choose File -> Open -> path/to/PostgreSQL/8.4/contrib/plr.sql (NOTE: on Windows the default location is C:\Program Files\PostgreSQL\8.4\contrib\plr.sql).
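Equivalently, you can run the same script from the command line; a sketch, assuming the default Windows path and a database called yourdb:

psql -U postgres -d yourdb -f "C:\Program Files\PostgreSQL\8.4\contrib\plr.sql"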
Next run the following commands from PgAdmin III or psql to test out R:
SELECT * FROM plr_environ(); SELECT load_r_typenames(); SELECT * FROM r_typenames(); SELECT plr_array_accum('{23,35}', 42);
Next try to create a helper function (this was copied from http://www.joeconway.com/plr/doc/plr-pgsql-support-funcs.html) and test with the following:
CREATE OR REPLACE FUNCTION plr_array (text, text) RETURNS text[] AS '$libdir/plr','plr_array' LANGUAGE 'C' WITH (isstrict); select plr_array('hello','world');
Below is a link for creating a median aggregate function. This basically creates a stub aggregate function that calls the median function in R: http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html NOTE: I ran into a problem here installing median from the plr-aggregate-funcs via PgAdmin; it gave an R parse error when trying to use the function. I had to install the median function by removing all the carriage returns (\r\n), so put the whole median function body on a single line like below to be safe. Evidently when copying from IE, IE puts in carriage returns instead of Unix line breaks. When creating PL/R functions, make sure to use Unix line breaks instead of Windows carriage returns, by using an editor such as Notepad++ that lets you specify Unix line breaks.
create or replace function r_median(_float8) returns float as 'median(arg1)' language 'plr';

CREATE AGGREGATE median (
	sfunc = plr_array_accum,
	basetype = float8,
	stype = _float8,
	finalfunc = r_median
);

create table foo(f0 int, f1 text, f2 float8);
insert into foo values(1,'cat1',1.21);
insert into foo values(2,'cat1',1.24);
insert into foo values(3,'cat1',1.18);
insert into foo values(4,'cat1',1.26);
insert into foo values(5,'cat1',1.15);
insert into foo values(6,'cat2',1.15);
insert into foo values(7,'cat2',1.26);
insert into foo values(8,'cat2',1.32);
insert into foo values(9,'cat2',1.30);

select f1, median(f2) from foo group by f1 order by f1;
In the next part of this series, we will cover using PL/R in conjunction with PostGIS.
In this tutorial we will explore using PostGIS and PL/R together, with some examples we will quickly run through.
If you missed part one of our series and are not familiar with R or PL/R, please read Getting started with PostGIS (http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01) and PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide (http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01).
Next create a new PostgreSQL database using the template_postgis database as template and load in the PLR support functions as described in the above article. For this particular exercise, we will assume the database is called testplrpostgis.
We will be working with census data from MassGIS:
Click "Download This Layer" - census2000blockgroups_poly.exe from http://www.mass.gov/mgis/cen2000_blockgroups.htm
Get the Massachusetts Town Polygon geometries (Townsurvey - ESRI Shapefiles) from http://www.mass.gov/mgis/townssurvey.htm
Get the Massachusetts Census Blocks from http://www.mass.gov/mgis/cen2000_blocks.htm
Get the Massachusetts Schools layer from http://www.mass.gov/mgis/schools.htm
Extract the files into a folder (run the exe or right-click and extract on Windows; for Linux/Unix/MacOS just extract with unzip, StuffIt, or a similar package).
Now load the shapefiles into your test database by running shp2pgsql and psql at your OS command line as shown below. Note for non-Windows users, or if you installed PostgreSQL in a non-standard directory: change the path "C:\Program Files\PostgreSQL\8.2\bin\" to yours. Also replace the items in italics with those that match your environment.
"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blockgroups_poly cens2000bgpoly > cens2000bgpoly.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000bgpoly.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000bgpoly_the_geom ON cens2000bgpoly USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\TOWNSSURVEY_POLY towns > towns.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f towns.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_the_geom ON towns USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_town ON towns USING btree(town)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\SCHOOLS_PT schools > schools.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f schools.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_schools_the_geom ON towns USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blocks_poly cens2000blocks > cens2000blocks.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000blocks.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000blocks_the_geom ON cens2000blocks USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "vacuum analyze"
In this example we will find, for each Massachusetts town, the median population of a town census blockgroup, the total population across all block groups within the town, the average population of a town census blockgroup, and the count of block groups within each town's boundary. Note: we have to do a geomunion here because the towns layer has multiple records per town; unioning consolidates them so we have a single multipolygon for each town.
SELECT t.town, count(bg.gid) as total_bg, avg(bg.total_pop) as avg_bg_pop,
	median(bg.total_pop) as median_pop, sum(bg.total_pop) as totalpop
FROM cens2000bgpoly bg,
	(SELECT towns.town, geomunion(the_geom) as the_geom FROM towns GROUP BY towns.town) t
WHERE bg.the_geom && t.the_geom AND within(bg.the_geom, t.the_geom)
GROUP BY t.town
ORDER BY t.town
CREATE TABLE townsingle AS
	SELECT MAX(gid) as gid, towns.town, geomunion(the_geom) as the_geom
	FROM towns GROUP BY towns.town;
CREATE INDEX idx_townsingle_the_geom ON townsingle USING GIST(the_geom);
CREATE UNIQUE INDEX idx_townsingle_gid ON townsingle USING btree(gid);
CREATE UNIQUE INDEX idx_townsingle_town ON townsingle USING btree(town);
Before we continue, we need to expand our R environment by installing the deldir package, which contains the Voronoi functionality. To do so, follow these steps.
We will be using the deldir package. I listed some other packages here because they looked interesting, but we won't have time to explore them in much detail in this exercise.
chooseCRANmirror()
utils:::menuInstallPkgs()
If the above doesn't work try
available.packages()
Alternatively, for example if you ran available.packages(), you can install packages individually with install.packages('deldir'), where in this case deldir is the name of the package we wish to install.
install.packages('deldir')
R offers some neat features as far as help goes that you can access straight from the R GUI console. Below are some useful commands to find out more about a package or drill in at examples.
help(package=deldir)
help.search('median')
demo(package = .packages(all.available = TRUE))
Example: library(gstat) followed by demo(block)
In this example we will use a voronoi PL/R implementation provided by Mike Leahy in the PostGIS users newsgroup - http://postgis.refractions.net/pipermail/postgis-users/2007-June/016102.html
The function and Voronoi type code is repeated below. Run it in PgAdmin or psql. Note for Windows users: if you are copying this, make sure to paste it into Notepad, save the file, and then open it in PgAdmin or psql. This will strip the carriage returns and leave just line breaks; the R parser chokes on carriage returns.
--This function uses the deldir library in R to generate voronoi polygons for an input set of points in a PostGIS table.
--Requirements:
--	R-2.5.0 with deldir-0.0-5 installed
--	PostgreSQL-8.2.x with PostGIS-1.x and PL/R-8.2.0.4 installed
--Usage: select * from voronoi('table','point-column','id-column');
--Where:
--	'table' is the table or query containing the points to be used for generating the voronoi polygons,
--	'point-column' is a single 'POINT' PostGIS geometry type (each point must be unique)
--	'id-column' is a unique identifying integer for each of the originating points (e.g., 'gid')
--Output: returns a recordset of the custom type 'voronoi', which contains the id of the
--	originating point, and a polygon geometry

create type voronoi as (id integer, polygon geometry);
create or replace function voronoi(text,text,text) returns setof voronoi as '
	library(deldir)

	# select the point x/y coordinates into a data frame...
	points <- pg.spi.exec(sprintf("select x(%2$s) as x, y(%2$s) as y from %1$s;",arg1,arg2))

	# calculate an approprate buffer distance (~10%):
	buffer = ((abs(max(points$x)-min(points$x))+abs(max(points$y)-min(points$y)))/2)*(0.10)

	# get EWKB for the overall buffer of the convex hull for all points:
	buffer <- pg.spi.exec(sprintf("select buffer(convexhull(geomunion(%2$s)),%3$.6f) as ewkb from %1$s;",arg1,arg2,buffer))

	# the following use of deldir uses high precision and digits to prevent slivers between the output polygons, and uses
	# a relatively large bounding box with four dummy points included to ensure that points in the peripheral areas of the
	# dataset are appropriately enveloped by their corresponding polygons:
	voro = deldir(points$x, points$y, digits=22, frac=0.00000000000000000000000001,list(ndx=2,ndy=2), rw=c(min(points$x)-abs(min(points$x)-max(points$x)), max(points$x)+abs(min(points$x)-max(points$x)), min(points$y)-abs(min(points$y)-max(points$y)), max(points$y)+abs(min(points$y)-max(points$y))))

	tiles = tile.list(voro)
	poly = array()
	id = array()
	p = 1

	for (i in 1:length(tiles)) {
		tile = tiles[[i]]

		curpoly = "POLYGON(("

		for (j in 1:length(tile$x)) {
			curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]])
		}

		curpoly = sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])

		# this bit will find the original point that corresponds to the current polygon, along with its id and the SRID used for the
		# point geometry (presumably this is the same for all points)...this will also filter out the extra polygons created for the
		# four dummy points, as they will not return a result from this query:
		ipoint <- pg.spi.exec(sprintf("select %3$s as id, intersection(''SRID=''||srid(%2$s)||'';%4$s'',''%5$s'') as polygon from %1$s where intersects(%2$s,''SRID=''||srid(%2$s)||'';%4$s'');",arg1,arg2,arg3,curpoly,buffer$ewkb[1]))

		if (length(ipoint) > 0) {
			poly[[p]] <- ipoint$polygon[1]
			id[[p]] <- ipoint$id[1]
			p = (p + 1)
		}
	}

	return(data.frame(id,poly))
' language 'plr';
After you have installed the above Voronoi function, test it out with the following code:
CREATE TABLE schools_voroni(gid int);
SELECT AddGeometryColumn('', 'schools_voroni', 'the_geom', 26986, 'POLYGON', 2);

INSERT INTO schools_voroni(gid, the_geom)
SELECT v.id, v.polygon
FROM voronoi('(SELECT gid, the_geom FROM schools WHERE town = ''BOSTON'' AND grades LIKE ''%3%'' AND type = ''PUB'') as bs', 'bs.the_geom', 'bs.gid') As v;

ALTER TABLE schools_voroni ADD CONSTRAINT pk_schools_voroni PRIMARY KEY(gid);
Below is a map of the public elementary schools in Boston overlaid on the Voronoi polygons of those schools.
At this moment, some people may be thinking: nice polygons, but what is this supposed to tell me about Boston public elementary schools? The short, simplistic answer is that the Voronoi polygon of a school tells you, theoretically, what area a school serves, all else being equal. To find out more about this interesting topic, check out Voronoi links and tidbits. For example, we assume people prefer to send their kids to schools near them. With this information and using census blocks, we can theoretically figure out which areas are underserved based on the population of elementary school kids in each Voronoi polygon relative to the number of schools serving that area. So, a query something like the following (NOTE: I only had total population, not elementary school population, so this is already theoretically flawed):
CREATE TABLE vorhighdens As
SELECT v.thegid, sum(pop_2000)/v.schcount As vpop, v.the_geom
FROM (SELECT MAX(gid) as thegid, COUNT(gid) As schcount, the_geom, area(the_geom) as the_area
	FROM schools_voroni GROUP BY the_geom, area(the_geom)) As v
	INNER JOIN cens2000blocks cb ON v.the_geom && cb.the_geom AND Within(cb.the_geom, v.the_geom)
GROUP BY v.the_geom, v.schcount, v.thegid
ORDER BY vpop DESC
LIMIT 5
The darkened areas are where, based on my simple model, more elementary schools need to be built. Please NOTE this model is gravely flawed for many reasons.
GDAL stands for Geospatial Data Abstraction Library and is a popular open source GIS library originally developed and maintained by Frank Warmerdam with contributions from many. It is used in GIS for processing various types of GIS datasources - both Vector and Raster data. The GDAL part generally refers to RASTER functionality while the complementary OGR refers to Vector data. The RASTER functionality is very useful in analysis such as analysing Weather satellite raster data and Digital Elevation Models (DEMS) or converting between RASTER formats.
RGDAL is an R-statistical package that makes these library functions available from the R statistical environment (both RASTER and Vector functions) - thus making them available within PostgreSQL PL-R environment. In this exercise we shall install and play with this library first from R and then from within PostgreSQL.
For this exercise, we assume you have already installed PostgreSQL and PL/R. If you haven’t already, read our PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide - http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01
RGDAL relies on the sp package, which contains the SpatialGridDataFrame class among other things, so you'll need to install that first. To install sp, do the following:
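Assuming your server has internet access, the simplest route from the R console is install.packages, the same approach we used for deldir earlier (a sketch):

install.packages("sp")
library(sp)   # quick sanity check that the package loads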
Launch the R GUI. A lot of these exercises are similar to ones documented in the RGDAL help file. Now let's test drive the library. Locate a small image on one of your drives; for example, Windows users can copy an image to the root of C on the server or something of that sort, and for Linux the usual /usr/local/whatever will work fine. Then do something like:
library(rgdal)
TestLogo <- readGDAL("C:/test.jpg")
TestLogo
The above reads the JPEG into the TestLogo variable, and typing TestLogo will spit out all sorts of details about the JPEG file. That will look something like this:
test.jpg has GDAL driver JPEG
and has 9 rows and 9 columns
Closing GDAL dataset handle 0x048f1230...  destroyed ... done.
> TestLogo
Object of class SpatialGridDataFrame
Object of class SpatialGrid
Grid topology:
  cellcentre.offset cellsize cells.dim
x               0.5        1         9
y               0.5        1         9
SpatialPoints:
        x   y
 [1,] 0.5 8.5
 [2,] 1.5 8.5
 [3,] 2.5 8.5
 [4,] 3.5 8.5
 [5,] 4.5 8.5
 [6,] 5.5 8.5
 [7,] 6.5 8.5
 [8,] 7.5 8.5
  :     :   :
Coordinate Reference System (CRS) arguments: NA
Data summary:
     band1            band2           band3
 Min.   : 40.00   Min.   : 59.0   Min.   :117.0
 1st Qu.: 51.00   1st Qu.: 71.0   1st Qu.:124.0
 Median : 52.00   Median : 72.0   Median :125.0
 Mean   : 86.99   Mean   :103.7   Mean   :147.6
 3rd Qu.: 61.00   3rd Qu.: 79.0   3rd Qu.:132.0
 Max.   :255.00   Max.   :255.0   Max.   :255.0
To clean up and destroy the TestLogo variable:
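A minimal cleanup sketch, assuming you are finished with the test image:

rm(TestLogo)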
Now let's try this with some real geospatial data.
Download Digital Elevation Model (DEM) International sample from http://www.mapmart.com/DEM/DEM.htm and extract files into a folder.
The R code below will open the CustomArea file, start at position y=8, x=6 of the file, and show 5 rows and 3 columns on the R graphics device.
custarea <- GDAL.open("/path/to/CustomAreaSRTM90.dem")
displayDataset(custarea, offset=c(6,8), region.dim=c(5,3), reduction=1, band=1)
To show the whole file you would do
displayDataset(custarea,offset=c(0,0), region.dim=dim(custarea), reduction=1, band=1)
If you had multiple color bands and you wanted to display all bands, then you would do
displayDataset(custarea,offset=c(0,0), region.dim=dim(custarea), reduction=1, band=1:3)
When we are done with the file, we should close it to clean up loose-ends
GDAL.close(custarea)
Here are a couple of useful exercises that you would commonly do on a DEM or GeoTiff
custdf <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem")
CustomAreaSRTM90.dem has GDAL driver USGSDEM and has 3525 rows and 6275 columns
There are two ways to do this, shown below. The first is slower because it reads the whole file into memory; the second is faster because it just opens the file, but the file then needs to be closed when you are done.
custarea <- readGDAL("/path/to/CustomAreaSRTM90.dem") proj4string(custarea)
custarea <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem", read.only=TRUE) proj4string(custarea) close(custarea)
custarea <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem", read.only=TRUE) bbox(custarea) close(custarea)
GDALinfo("/path/to/CustomAreaSRTM90.dem") cust_summary <- GDALinfo("/path/to/CustomAreaSRTM90.dem") cust_summary["rows"] cust_summary["ll.x"] cust_summary["ll.y"]
custarea <- GDAL.open("/path/to/CustomAreaSRTM90.dem")
custarea_gt <- getRasterTable(custarea, band = 1, offset = c(0,0), region.dim = c(10,10))
GDAL.close(custarea)
custarea_gt
The result of the above looks something like this
           x        y band1
1  -127.4554 50.23167   438
2  -127.4546 50.23167   410
3  -127.4537 50.23167   382
4  -127.4529 50.23167   356
5  -127.4521 50.23167   329
6  -127.4512 50.23167   306
7  -127.4504 50.23167   288
8  -127.4496 50.23167   272
9  -127.4487 50.23167   262
10 -127.4479 50.23167   257
11 -127.4554 50.23084   399
12 -127.4546 50.23084   378
13 -127.4537 50.23084   355
14 -127.4529 50.23084   333
15 -127.4521 50.23084   310
16 -127.4512 50.23084   290
 :         :        :     :
custarea_gt["band1"]
Now switch over to PostgreSQL and use PgAdmin or psql to write the function. Below is a simple PL/R function that loads the rgdal library and reads the metadata from a file passed in.
CREATE or REPLACE FUNCTION getTileInfo(atilefile varchar, element varchar) returns text AS
$$
	library(rgdal)
	tile_summary <- GDALinfo(atilefile)
	return(tile_summary[element])
$$ language 'plr';
Here is how we can use the above function from within PostgreSQL; it returns the requested piece of metadata from the file we pass as the first argument, based on the element name we pass as the second. Keep in mind the path is relative to the PostgreSQL server, since R is installed on the server, and that this function works for any GDAL-supported file format such as DEM, GeoTiff, JPEG, etc.
SELECT getTileInfo('/path/to/CustomAreaSRTM90.dem', 'll.x') As x_pos,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'll.y') As y_pos,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'rows') As num_rows,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'columns') As num_cols
SpatiaLite is an SQLite database engine with Spatial functions added. You can think of it as a spatial extender for SQLite database engine which is similar in concept to what PostGIS does for the PostgreSQL Object-Relational Database.
For those unfamiliar with SQLite: it is an adorable little single-file relational database engine that runs equally well on Windows, Linux, Unix, and Mac and is easily embeddable in larger apps. It is free and open source -- public domain, so there are no restrictions on commercial use or embedding in other applications -- and I think it is superb as a mini-replicator. We use it in some of our projects for syncing data back and forth between a mainstream server database such as PostgreSQL, SQL Server, or MySQL and an offline client database that needs to sync differential data back and forth to the mother ship.
What is special about it is as follows:
SpatiaLite sweetens this little database by allowing you to store geometries and query them with spatial functions similar to what you would find in PostgreSQL/PostGIS, Microsoft SQL Server 2008, MySQL, IBM DB2, or Oracle Locator/Spatial. In terms of its model, it is closest in syntax to PostGIS -- in fact it is modeled after PostGIS and piggy-backs on GEOS and Proj.4 just as PostGIS does. The functionality you will see is pretty much the same set of functions you get in PostGIS, with the same names minus the aggregates and a few other things, except that you strip off the ST_ and add in a bbox filter, so PostGIS users should feel very much at home.
Some key features of SpatiaLite
As far as licensing goes, SpatiaLite does not have the same license as SQLite; it is available under multiple licenses, including the Mozilla Public License and the GPLv3.
In this little exercise, we shall get you up and running quickly with this cute database engine.
In terms of tutorials, here is a recent one by the MapFish people: SpatiaLite in 5 minutes. This article will be similar, except we shall go through the same kinds of exercises we did for SQL Server 2008 and PostGIS as a compare and contrast.
Installation is simple -- just download, extract, and run the GUI.
The GUI has the engine that runs the database embedded in it, similar in concept to the way the Microsoft Access application embeds the engine that controls an MS Access MDB. So to get started, launch the GUI.
Download data from the MassGIS site.
For this simple exercise just download Towns
SELECT * FROM spatial_ref_sys WHERE ref_sys_name LIKE '%Massachusetts%' ORDER BY srid
The easiest way to load data into SpatiaLite is with the ESRI shapefile loader packaged with the SpatiaLite GUI.
There are a couple of things I would like to point out that the shape file loader auto-magically does for you.
SELECT * from geometry_columns;
CREATE TRIGGER gtu_towns_the_geom BEFORE UPDATE ON towns
FOR EACH ROW
BEGIN
	SELECT RAISE(ROLLBACK, '''towns.the_geom'' violates Geometry constraint [geom-type not allowed]')
	WHERE (SELECT type FROM geometry_columns
		WHERE f_table_name = 'towns' AND f_geometry_column = 'the_geom'
		AND (type = GeometryType(NEW.the_geom) OR type = GeometryAliasType(NEW.the_geom))) IS NULL;
END
The most common queries we will run against this table are spatial queries and queries by the town field, so we will create 2 indexes on these fields.
In the tree view that shows the tables, expand towns and right mouse click on the the_geom field and then choose Build Spatial Index. You will see the little icon change on the column once that is done.
The SpatiaLite GUI doesn't seem to have a right-click feature for other indexes besides spatial, so to create an index on the town field do the following:
CREATE INDEX idx_town ON towns(town); vacuum towns;
Go back into the SpatiaLite GUI and try out some queries:
SELECT MIN(idx.xmin) As bxmin, MIN(idx.ymin) As bymin, MAX(idx.xmax) As bxmax, MAX(idx.ymax) As bymax
FROM towns As t INNER JOIN idx_towns_the_geom As idx ON t.rowid = idx.pkid
WHERE t.town = 'BOSTON';
-- returns: bxmin: 225473.6094  bymin: 886444.6250  bxmax: 252005.5781  bymax: 905284.0000

SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
-- returns: 128251224.3644

--reproject to Massachusetts state plane feet
SELECT town, astext(centroid(transform(the_geom,2249)))
FROM towns WHERE town > '' ORDER BY town LIMIT 2;
-- returns:
-- ABINGTON  POINT(802961.762366 2868485.109604)
-- ACTON     POINT(672950.258529 3001540.967504)

--to see what indices its using
EXPLAIN QUERY PLAN SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
--or the painful steps
EXPLAIN SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
Example -- here is an exercise we did in SQL Server 2008:
Here we arbitrarily take the first point that defines a polygon in Boston and ask which town POLYGON/MULTIPOLYGON geometries are within 1 mile of this point; we also want to know the exact distances, with results ordered by distance. The speed of this was surprisingly good -- it finished in under a second and returned 3 rows.
--I was hoping this would use a spatial index but it doesn't
SELECT t.town, Distance(ref.point1, t.the_geom)/0.3048 As dist_ft, Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t INNER JOIN
	(SELECT PointN(Boundary(the_geom),1) As point1 FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
	ON MbrIntersects(BuildCircleMbr(X(ref.point1), Y(ref.point1), 1609.344), t.the_geom)
WHERE Distance(ref.point1, t.the_geom) < 1609.344
ORDER BY Distance(ref.point1, t.the_geom);

--this seems to perform just as well -- would need a larger set to do a real test
SELECT t.town, Distance(ref.point1, t.the_geom)/0.3048 As dist_ft, Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t INNER JOIN
	(SELECT PointN(Boundary(the_geom),1) As point1 FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
	ON (Distance(ref.point1, t.the_geom) < 1609.344)
ORDER BY Distance(ref.point1, t.the_geom);
If you are a GIS newbie, I highly recommend using Quantum GIS. The latest binary distributions of Quantum GIS (as of this writing, QGIS 1.4+) have the SpatiaLite driver built in.
You can also use the latest spatialite-gis, a minimalist tool that has a viewer and shapefile importer packaged in. The latest version is 1.0 alpha, with binaries available for Windows, Linux, and Mac OSX.
The OGR toolkit is a subset of GDAL project. GDAL is available on many operating systems including Linux, Unix, Mac, and Windows. You can find binaries at GDAL Binaries. It is also included as part of the QGIS Desktop. GDAL comes packaged with several command line tools. The ones we find most useful are:
If you want to use GDAL on a desktop, the easiest way to get started is to install QGIS Desktop which has installers available for Linux, Windows, and Mac. If you are on a server, then you should install GDAL via your OS Packager. Windows users who don't want QGIS (or want a non-install required set of binaries) can use one of the GISInternals binaries which includes GDAL and MapServer.
For the rest of this article, we will assume you are using GDAL via the QGIS desktop.
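A quick way to confirm the GDAL command line tools are on your path, and to see which format drivers your particular build supports, is the sketch below:

ogr2ogr --version
ogr2ogr --formats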
If you want a comprehensive listing of the options offered by ogr2ogr or ogrinfo, run the following from your command-line shell (the FW Tools Shell, for example).
ogr2ogr --help ogrinfo --help
ogr2ogr -f "ESRI Shapefile" mydata.shp mydata.tab
ogr2ogr -f "PostgreSQL" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab
Note: for the above, you can leave out the host if it's localhost, and the user and password if you have your authentication set to trust.
In the below example, we don't want OGR to create a table called mytable; we instead want to call the table something different, like newtablename. To do so we use the -nln option, as shown below.
ogr2ogr -f "PostgreSQL" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab -nln newtablename
Sometimes OGR does not output the right projection, particularly with units of feet, with data that has no projection info, or when the projection information can't easily be translated to your system. Sometimes OGR can't match the projection to one in your spatial_ref_sys table, so it creates a new entry in that table. In these cases you have to tell OGR what the output projection is. You do this with the -a_srs flag, as shown below.
ogr2ogr -f "PostgreSQL" -a_srs "EPSG:2249" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab
In the above example I told ogr2ogr to assume the source/output projection is Massachusetts Mainland US Ft. Note: all spatial reference systems can be found in the spatial_ref_sys table of PostGIS or the Data/gcs.csv file of your FW Tools install.
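If you are unsure which EPSG code to hand to -a_srs, you can look it up in the PostGIS spatial_ref_sys table; for example, for the code used above:

SELECT srid, srtext FROM spatial_ref_sys WHERE srid = 2249;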
The pgsql2shp and shp2pgsql are usually the best tools for converting back and forth between PostGIS and ESRI for 2 main reasons.
If you really want to use Ogr2Ogr for this kind of conversion, below is the standard way to do it
ogr2ogr -f "ESRI Shapefile" mydata.shp PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" "mytable"
Sometimes you have more than one geometry field in a table, and an ESRI shapefile can only support one geometry field per shape. You may also want only a subset of the data. In these cases, you need to select which geometry field to use. The most flexible way to do this is the -sql option, which accepts any SQL statement.
ogr2ogr -f "ESRI Shapefile" mydata.shp PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" -sql "SELECT name, the_geom FROM neighborhoods"
ogr2ogr -f "KML" neighborhoods.kml PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" -sql "select gid, name, the_geom from neighborhoods"
ogr2ogr -f "KML" neighborhoods.kml PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" neighborhoods -dsco NameField=name
One way in which ogr2ogr excels over the pgsql2shp tool is that ogr2ogr can export multiple tables at once. This is pretty handy for sharing your PostGIS data with others who do not have a PostGIS database.
The code below will export all your postgis tables out into a folder called mydatadump in ESRI shape (shp) format.
ogr2ogr -f "ESRI Shapefile" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword"
The code below will export all your postgis tables out into a folder called mydatadump in MapInfo .tab format.
ogr2ogr -f "MapInfo File" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword"
Now most of the time you probably only want to output a subset of your postgis tables rather than all your tables. This code exports only the neighborhoods and parcels tables to a folder called mydatadump in ESRI shapefile format
ogr2ogr -f "ESRI Shapefile" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" neighborhood parcels
Topologically Integrated Geographic Encoding and Referencing (TIGER) is the US Census Bureau's proprietary format for exporting US Census geographic and statistical data. Starting in 2007, they will be using ESRI Shapefile (SHP) as their official export format, so this section may be a bit obsolete for the upcoming versions.
To get the files for your location - you can browse their archive at http://www.census.gov/geo/www/tiger/index.html
ogrinfo TGR25025.RTI
The TIGER files contain a set of layers, so unlike the other outputs we have done, we will specify a folder to dump all the layers into.
ogr2ogr -f "ESRI Shapefile" masuffolk TGR25025.RTI
ogr2ogr -f "ESRI Shapefile" sufcomp.shp TGR25025.RT1 layer CompleteChain
In the above, we are asking for just the CompleteChain layer and outputting it to a new file called sufcomp.shp. Note it will output the shp plus the corresponding shx, dbf, and prj files.
The conversion follows a similar path to ESRI Shape
The below will create a folder masuf and output all the layers into that folder and give each a tab file extension
ogr2ogr -f "MapInfo File" masuf TGR25025.RT1
ogr2ogr -f "MapInfo File" sufcomp.tab TGR25025.RT1 layer CompleteChain
ogr2ogr -update -append -f "PostgreSQL" PG:"host=myserver user=myusername dbname=mydbname password=mypassword" TGR25025.RT1 layer CompleteChain -nln masuf -a_srs "EPSG:4269"
Note: in the above we needed the -update -append options because ogr2ogr will otherwise try to create a folder when given a name with no extension, which in this context translates to trying to create a new database.
We also put in -nln masuf to prevent ogr2ogr from creating the table with the name CompleteChain.
Lastly we put in EPSG:4269 to tell OGR to just assume the TIGER data is in NAD 83 long lat. Without this it creates an entry in the spatial_ref_sys table that is equivalent to the already existing, well-known 4269.
FWTools, at least for Windows, has the ESRI geodatabase driver (referred to as PGeo, for Personal Geodatabase) baked into the binary. This makes it relatively simple to export data out of the Personal Geodatabase format into other formats such as PostgreSQL or ESRI shapefile.
If you want a sample geo database to play with, try the MassGIS annotation personal geo database http://www.mass.gov/mgis/ftpgnm.htm
ogrinfo C:\GISData\Geonames.mdb
INFO: Open of `c:\GISData\Geonames.mdb' using driver `PGeo' successful. 1: GEONAMES_ANNO_PLACES 2: GEONAMES_ANNO_HYDRO 3: GEONAMES_ANNO_HYPSO
To import all feature classes and assign a particular spatial reference system:
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=someuser dbname=somedb password=somepassword port=5432" C:\GISData\Geonames.mdb -a_srs EPSG:26986
Import one feature class, reproject it, and rename the geometry column. This example brings in a feature class that is in NAD 83 Massachusetts state plane meters, transforms it to NAD 83 long lat, renames the output table ma_hydro, and names the geometry column the_geom_4269.
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=someuser dbname=somedb" C:\Data\Geonames.mdb GEONAMES_ANNO_HYDRO -a_srs EPSG: 26986 -t_srs EPSG:4269 -nln ma_hydro -lco GEOMETRY_NAME=the_geom_4269
MySQL 4 and 5 support spatial objects to a limited extent. The functionality is not as rich as PostGIS, but for basic mapping work and limited spatial analysis, and if you already have MySQL installed, it may suit your needs just fine. Below is a snippet of code that will import a shape file called world_adm0.shp into a MySQL database and rename it world. NOTE: If you have a virgin MySQL database with no geometry_columns or spatial_ref_sys tables, they will be created automatically.
ogr2ogr -f "MySQL" MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" -nln "world" -a_srs "EPSG:4326" path/to/world_adm0.shp
While OGR was designed primarily to transform data between different spatial datasources, it is a little known fact that it can also be used to import non-spatial datasources such as Dbase files and CSV files.
In the below example, we will demonstrate importing a Dbase file into PostgreSQL using OGR2OGR
ogr2ogr -f "PostgreSQL" PG:"host=myserver user=myusername dbname=mydbname password=mypassword" sometable.dbf -nln "sometable"
Microsoft SQL Server 2008 is the first version of SQL Server to have built-in functionality for doing geographic spatial queries.
This tutorial is similar to our Part 1: Getting Started with PostGIS: An almost Idiot's Guide, but written to provide a quick primer for SQL Server 2008 users and also as a parallel exercise mirroring the two camps. Think of it as a big welcome to the new kid on the block who is an old family friend.
We will assume a Windows environment for this tutorial since SQL Server only runs on Windows, preferably Windows XP and above (we are not sure if it works on Windows 2000). All our examples use Microsoft SQL Server 2008 Express, which is a free version of SQL Server 2008 allowed for both non-hosted commercial and private use. Please note that the spatial functionality in the SQL Server 2008 Express family is just as good as in the Standard and Enterprise versions; the limitations are on database size, mirroring, partitioning and some other minor things that are not spatial specific. SQL Server 2008 Standard, Web and Enterprise run only on server operating systems, while SQL Server 2008 Express runs on both servers and workstations.
SQL Server 2008 Express comes in 3 flavors:
For the exercise below, we will assume you have downloaded at a minimum SQL Server 2008 Express with Tools, or that you already have SQL Server 2008 Express with Management Studio installed.
Some gotchas before we start:
Once SQL Server 2008 express is installed with management tools do the following
First off we have to install a loader. You can use the freely available SQL Server 2008 Spatial Tools from http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx, which comes with both an ESRI shape loader and a SQL Server spatial viewer. To use it, simply download and extract.
WARNING: One of our friends noted that the SharpGIS loader comes up with a very suboptimal spatial grid index that will throw off queries relying on a spatial index. This is particularly an issue for data loaded into the planar geometry data type. As a result, SQL Server queries may be unusually slow and you may get a bad impression of SQL Server's performance. This will be fixed in a later version of the loader.
Download data from the MassGIS site. For this simple exercise, just download Towns with Coast. Extract the file into some folder. We will only be using the _POLY files for these exercises.
First of all, what is Planar? Planar is a class of spatial reference systems that projects a round earth onto a flat model. SQL Server 2008 supports both a Geometry (planar) and a Geography (round-earth) model. For data brought in as Geometry, SQL Server 2008 does not do any validation to ensure the spatial reference ID is in the sys.spatial_reference_systems table; in fact, SQL Server 2008 only contains spherical spatial reference systems in that meta table. So if you want to bring data in as planar, as long as all your data is in the same planar projection, you should be fine. SQL Server 2008 has no mechanism for transforming data from one planar projection to another.
Those who are familiar with the PostGIS equivalent exercise of this know that MassGIS data is in Massachusetts state plane Meters (Spatial_Reference_ID = 26986 which is a planar projection) so bringing it in as Geometry works fine, but trying to push it into Geodetic we shall find is a little trickier.
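As a quick illustration of that point, the statement below (a minimal sketch using a made-up state plane coordinate) happily builds a Geometry instance tagged with SRID 26986 even though 26986 appears nowhere in sys.spatial_reference_systems:
SELECT geometry::STGeomFromText('POINT(236000 900000)', 26986).STAsText();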
Now let's move some data:
What good is spatial data if you can't drop your jaw at its fantastic beauty? So let's look at what this animal looks like:
SELECT * FROM towns_planar WHERE town = 'BOSTON'
SELECT TOP 1 geom.STCentroid().STAsText() FROM towns_planar WHERE town = 'BOSTON'
SELECT town, geom.STIntersection(buf.aBuffer) As newgeom
FROM towns_planar
  INNER JOIN (SELECT TOP 1 geom.STCentroid().STBuffer(1000) As aBuffer
    FROM towns_planar WHERE town = 'BOSTON') As buf
  ON (towns_planar.geom.STIntersects(buf.aBuffer) = 1)
WHERE geom.STIsValid() = 1
If you have data measured in degrees, e.g. WGS 84 long lat (4326) or NAD 83 long lat (4269, the standard TIGER US Census format), bringing your data in as geodetic is simple since 4326 and 4269 are already listed in sys.spatial_reference_systems. A simple query confirms that:
SELECT * FROM sys.spatial_reference_systems WHERE spatial_reference_id IN(4269,4326);
To do so, you simply follow the prior steps but choose Geography (Spherical) instead.
But what if we want to bring planar data such as the MassGIS towns in as Geodetic? Then you need to first transform the data, which SQL Server has no mechanism for, and then bring it in. We shall go over this in part 2.
In the first part we covered bringing in the Mass towns data as planar geometry, but we were stuck because we need to transform the data to a degree-based spatial reference system (in particular one listed in sys.spatial_reference_systems) to use the Geography data type, and SQL Server 2008 has no mechanism to transform that data. Now we shall demonstrate how to transform the data and import it in a supported spatial reference system.
As mentioned before SQL Server 2008 has no mechanism for doing spatial transformations, so what to do?
OGR/GDAL is a free Open Source GIS toolkit that is very useful for data loading and doing spatial transformations, among other things. It's probably the easiest to use of all the tools.
cd C:\data\gisdata
ogr2ogr -f "ESRI Shapefile" -a_srs "EPSG:26986" -t_srs "EPSG:4326" towns_geodetic.shp TOWNSSURVEY_POLY.shp
Now that we have our data imported, launch SQLSpatial.exe as we did before and run these queries.
The query below fails because Geography does not support STCentroid - you get an error that STCentroid is not found for the geography type.
SELECT TOP 1 geom.STCentroid().STAsText() FROM towns_geodetic WHERE town = 'BOSTON'
So the first thing we learn from the above exercise is that sometimes planar is still needed: while Geography can cover a larger region, it is lacking many of the functions available in the regular Geometry type. Refer to SQL Server 2008 PostGIS MySQL Compare for a comparison of function/method differences between SQL Server 2008 Geography and Geometry.
One important feature that Geography (geodetic) has over Geometry is the ability to measure distances in meters using spherical coordinates and spanning large regions. Isaac Kunen touches on this a little in his blog entry Nearest Neighbors. In fact, doing distance queries and finding neighbors is probably the number one reason why most people will want to use spatial queries. With this one feature, one can answer questions such as
Of course questions like these could be answered before, but are answered a bit more trivially with a spatially enabled database and are extremely difficult to answer if you are trying to find shortest distances between 2 objects that are not points.
Note we know the distance is in meters because the spatial_reference_systems table tells us so.
SELECT unit_of_measure from sys.spatial_reference_systems WHERE spatial_reference_id = 4326;
Most of the spatial reference systems defined in this sys table are in meters, except for a few weird ones in Clarke's foot, Survey foot, and German metre.
Here we are going to run this in SQL Server 2008 Studio since we don't have any map data to view and we want to take advantage of SQL Server 2008 Studio's show plan features. Keep in mind that, just as in all OGC-compliant spatial databases, the STDistance function returns the minimum distance between 2 geometries. So if you are comparing a polygon to a polygon, it is the distance between the closest points of the two polygons.
Below is a slightly different query from what we used in planar (it can be used equally well in planar). Here we arbitrarily take the first point that defines a polygon in Boston and ask which town POLYGON/MULTIPOLYGON geometries are within 1 mile of this point; we also want the exact distances, with results ordered by distance.
SELECT t.town, ref.point1.STDistance(t.geom)/0.3048 As dist_ft
FROM towns_geodetic As t
  INNER JOIN (SELECT TOP 1 geom.STPointN(1) As point1
    FROM towns_geodetic WHERE town = 'BOSTON') As ref
  ON ref.point1.STDistance(t.geom) < 1609.344
ORDER BY ref.point1.STDistance(t.geom);

town    dist_ft
BOSTON  0
BOSTON  140.31019135227
BOSTON  211.728831986735
DEDHAM  2616.66222586371
DEDHAM  2616.73216967261
MILTON  3501.37051762325
Now try clicking the "Include Actual Execution Plan" (or Ctrl-M for short) and hitting the Execute for the above query.
You should see something like this which will give you a rough idea of where your processing time is going.
Shown above is a very small fragment of the plan used. From this we learn that our query is using a spatial index (this is good), but there is a warning on it, not so good. Usually when you see a little warning like that, it means your planner statistics are either non-existent or out of date. If you right click on it and view details, it will tell you more about the warning. This query is already lightning fast, so we won't worry about this minor issue. In our next part, we shall delve into larger sets of data with more sophisticated queries, where speed will be an issue and we'll want to squeeze out every inch of speed we can.
So I shall leave you with these words of wisdom as far as Query Plans go and these apply for most databases and not just spatial queries. We'll experiment with these rules of thumb in the next section.
Those who read our PostGIS series may be wondering why the examples we chose were not the same as what we did for PostGIS. Sadly SQL Server 2008 does not have any built-in spatial aggregates such as those you would find in PostGIS (e.g. ST_Union, ST_Collect, ST_Extent, ST_MakeLine etc.). Not to fret because Isaac Kunen and his compadres have been filling in the gaps in SQL Server 2008 spatial support. These gaps are filled in with a tool kit called SQL Server Spatial Tools and can be downloaded from Code Plex at http://www.codeplex.com/sqlspatialtools, not to be confused with Morten's SQL Spatial Tools we covered in the first tutorial.
Keep in mind that SQL Server 2008 Spatial Tools is a moving target with new functionality constantly added. As of this writing it has the following packaged goodies, listed in our order of importance:
In this exercise we shall demonstrate how to compile the latest build, install it, and give brief examples of using it. As of this writing, the latest version is Change Set 15818, released on September 12, 2008.
Note: you can skip the compile step if you are satisfied with not having the latest and greatest, and just download the original August compiled release. I would encourage you to be brave though, since the newer version contains a lot of improvements including the GeographyCollectionAggregate.
Now after we have compiled, the assembly file will be located in the SpatialTools\bin\Release folder and will be called SQLSpatialTools.dll. Before we can start using it, we'll need to load this assembly and the function stubs into our database.
USE []
USE testspatial
create assembly SQLSpatialTools from ''
create assembly SQLSpatialTools from 'C:\SQLSpatialTools.dll'
SELECT town, dbo.GeographyUnionAggregate(geom) As geom
INTO towns_singular_geodetic
FROM towns_geodetic
GROUP BY town;
--Creates 351 rows; compare to SELECT COUNT(*) FROM towns_geodetic; which gives us 1241 rows.
SELECT t.town
FROM (SELECT geom FROM towns_singular_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_singular_geodetic As t
  ON boston.geom.STIntersects(t.geom) = 1
ORDER BY t.town
If we run the Actual Execution Plan, we see that SQL Server tells us in very subtle terms that we are idiots. If only that message about the missing index were in red instead of green, I just might have thought there was a problem :).
Well, looking at our query we see a WHERE based on town, and we are obviously doing spatial calculations that can benefit from a spatial index. A spatial index in SQL Server 2008 always relies on a clustered index, so let's make town a clustered primary key (since we now have one town per record) and add a spatial index.
Gotchas
Run the below in SQL Server 2008 Studio/Express.
ALTER TABLE dbo.towns_singular_geodetic ALTER COLUMN town varchar(50) NOT NULL
GO
ALTER TABLE [dbo].[towns_singular_geodetic] ADD PRIMARY KEY CLUSTERED (town ASC)
GO
ALTER TABLE dbo.towns_singular_geodetic ALTER COLUMN geom geography NOT NULL
GO
CREATE SPATIAL INDEX [geom_sidx] ON [dbo].[towns_singular_geodetic]
(
  [geom]
) USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
  CELLS_PER_OBJECT = 16);
After all that work, rerunning our query shows we are using the clustered index now, but no spatial index, and speed is not improved.
Sadly, it seems that to get better performance (in fact down to less than 1 sec), we have to put in a hint to tell it to use the spatial index.
SELECT t.town
FROM (SELECT geom FROM towns_singular_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_singular_geodetic As t WITH(INDEX(geom_sidx))
  ON boston.geom.STIntersects(t.geom) = 1
ORDER BY t.town
A snapshot of part of the show plan is here and we see that mystical warning flag telling us we have no stats for that spatial index:
So we have a workaround, but the workaround to me is very unappealing, perhaps because I come from the old school of thought that databases should know more about the state of the data than I do and the ideal solution is not to give the database hints, but figure out why the database doesn't think it needs to use the spatial index. I can go into a whole diatribe of why HINTS are bad, but I'll try to refrain from doing so since I'll just bore people with my incessant whining and start some mini holy wars along the way.
As my big brother likes saying -- When a fine piece of American rocketry doesn't work right, you better hope you have a Russian Cosmonaut hanging around and laughing hysterically. It took me a while to get the full thrust of that joke and appreciate my brother's twisted sense of humor.
What does all the above nonsense mean? In a nutshell: sophisticated machinery is nice, but don't let the complexity of that machinery shield you from the fundamentals. Admittedly it's hard, and I thank my lucky stars that I was brought up in a time before computers and programming got too complicated for a little kid to tear apart and successfully put back together. There is nothing as exhilarating for a nerd kid as being able to tear apart his/her toys and realizing "Yes I have the power to fix this myself".
There are a couple of things that most relational databases share with each other regardless of how simple or complex they are
With all these great gadgets and wizards, things work and the world is good except when things don't work and then you look up at that Russian Cosmonaut hanging above you in his modest machine with his pint of warm vodka in hand laughing hysterically. Why is he laughing hysterically? Because ironically he has never seen anything quite like this gorgeous machine you have which looks especially magical in his drunken state, but even in his drunken stupor he recognizes your problem, but you, fully sober, and having used this machine for so long are still scratching your head. His machine is not quite so complicated and he has seen the problem less disguised before. He has fewer magical wizards to summon to help out so he relies more on his hard-earned cultivated understanding of fundamentals of how things are supposed to work to guide him. He is one with his machine and you are not because in wizard you trusted, but the wizard seems to be doing very strange things lately.
So what's the problem here? Why must SQL Server be told to use the spatial index sometimes, though not all the time? Well, it is no surprise: spatial is a bit of a new animal to SQL Server and the planner is probably a bit puzzled by this new beast. I thought perhaps it's because there are no statistics for the spatial index, and in fact you can't create statistics for the spatial index. If you try to do this:
CREATE STATISTICS [geom_towns_singular_geodetic] ON [dbo].[towns_singular_geodetic]([geom]) GO
You get this:
Msg 1978, Level 16, State 1, Line 1
Column 'geom' in table 'dbo.towns_singular_geodetic' is of a type that is invalid for use as a key column in an index or statistics.
Even after trying to force statistics with UPDATE STATISTICS, still no statistics appeared for the spatial index. One day the statistics magically appeared for both towns_singular_geodetic and towns_geodetic (SQL Server is known to create statistics behind the scenes when the mood strikes it, like all good wizards), but they were for some mystical tables called sys.extended..index... that contained cells, grids and stuff - I couldn't see these things, I couldn't select from them, but I could create statistics on them too, just like the wizard can. I tried figuring out how this happened by dropping the index and re-adding it, and as expected the statistics went away, and I'm sure they will come back.
UPDATE STATISTICS towns_singular_geodetic
Contrasting and comparing PostgreSQL and SQL Server: PostgreSQL lets you fiddle with its planner cost coefficients, and for 8.3+ you can fiddle down to individual functions, but you can't HINT - no hinting is allowed. SQL Server allows you to hint all you want but no fiddling with its coefficients (that is the wizard's realm). So what to do? When you feel like HINTING go for SQL Server, and if you are in the mood to write big ass equations go for PostgreSQL :). Well, truth be known, in most cases you neither have to HINT nor write big ass equations, except when the planner seems to be fumbling.
Note: In the other scenario of Part 2, SQL Server gladly used the spatial index without any hints and even without stats, but this time it does not. Why not? Because sometimes you don't need to look at statistics to know it is better to use something - your rule of thumb just drastically prefers one over the other. If you have millions of people in a crowd and you are looking for a person with a blue-checkered shirt, you can guess without looking at the distribution that an index by shirt color would probably help. However, if you have only 2 people in a crowd and you are looking for someone with a blue-checkered shirt, it is probably faster to just scan the crowd than to pull out your index and cross-reference it with the crowd.
This exercise was a bit of a roundabout. If I really wanted to know which towns are adjacent to Boston, I don't need to do a spatial union at all; I could run the query below, which magically uses the spatial index without any hinting and returns the same 13 records at the same under-1-second speed.
SELECT t.town
FROM (SELECT geom FROM towns_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_geodetic As t
  ON boston.geom.STIntersects(t.geom) = 1
GROUP BY t.town
ORDER BY t.town
So what does this tell us? Possibly the large size of each spatial geometry in the unioned version and the relatively small size of the table are making SQL Server think: bah, scanning the table is cheaper. So the cost it assigns to using the spatial index may be over-estimated. Perhaps this will be improved in Service Pack 1.
SQL Server 2008 comes with a couple of spatial helper stored procedures to help us diagnose problems. Below are some sample uses.
DECLARE @qg geography
SELECT @qg = geom FROM towns_singular_geodetic WHERE town = 'BOSTON'
EXEC sp_help_spatial_geography_index 'towns_singular_geodetic', 'geom_sidx', 1, @qg
One of the very great things about the UMN Mapserver web mapping system is that it can support numerous kinds of datasources. In this brief excerpt we provide examples of how to specify the more common data sources used for layers. The examples below are for Mapserver 4.6, but for the most part are applicable to earlier versions.
File locations for file based datasources such as ESRI Shape and MapInfo tab files are defined relative to the SHAPEPATH attribute or as absolute paths. For example the beginning declaration of your .map file might look something like the below
MAP
  #
  # Start of map file
  #
  NAME MYMAP
  EXTENT 732193.725550 2904132.702662 799614.090681 2971466.288170
  SIZE 500 500
  SHAPEPATH "c:\mydata\"
  :
  :
The most common kind of data used in UMN Mapserver is the ESRI shapefile, which has a .shp extension. For this kind of datasource you simply specify the location of the file without even specifying the extension. Below is a sample declaration of a polygon layer that uses a shape file.
LAYER
  NAME buildings
  TYPE POLYGON
  STATUS DEFAULT
  DATA buildings
  PROJECTION
    "init=epsg:2249"
  END
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
Many datasources are available to Mapserver via the GDAL/OGR driver. MapInfo is one of those datasources. The example below shows what a MapInfo layer definition looks like. Note that the tab file specified should be placed in the folder denoted by SHAPEPATH at the top of the map file.
LAYER
  NAME buildings
  STATUS DEFAULT
  MINSCALE 7000
  CONNECTIONTYPE OGR
  CONNECTION "buildings.tab"
  TYPE POLYGON
  PROJECTION
    "init=epsg:2249"
  END
  # -- MapInfo has projection information built in the tab file
  # -- so you can often auto read this information with the below
  #PROJECTION
  #  AUTO
  #END
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
Mapserver has a custom driver for the PostGIS spatial database. In order to use this, your mapserver cgi or mapscript must be compiled with the PostGIS driver. Below is what a postgis mapserver layer looks like.
LAYER
  CONNECTIONTYPE postgis
  NAME "buildings"
  CONNECTION "user=dbuser dbname=mydb host=myserver"
  # the_geom column is the name of a spatial geometry field in the table buildings
  DATA "the_geom from buildings"
  STATUS DEFAULT
  TYPE POLYGON
  # Note if you use a filter statement - this is basically like a where clause of the sql statement
  FILTER "storyhg > 2"
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
LAYER NAME "projects" CONNECTIONTYPE postgis CONNECTION "user=myloginuser dbname=mydbname host=mydbhost password=mypass" DATA "the_geom FROM (SELECT a.projid, a.projname, a.projtype, a.projyear, a.pid, parc.the_geom FROM projects a INNER JOIN parcels parc ON a.parcel_id = parc.pid WHERE a.projyear = 2007) as foo USING UNIQUE projid USING SRID=2249" STATUS OFF TYPE POLYGON CLASS NAME "Business Projects" EXPRESSION ('[projtype]' = 'Business') STYLE OUTLINECOLOR 204 153 51 WIDTH 3 END END CLASS NAME "Community Projects" EXPRESSION ('[projtype]' = 'Community') STYLE OUTLINECOLOR 204 0 0 WIDTH 3 END END PROJECTION "init=epsg:2249" END METADATA "wms_title" "Projects" "wfs_title" "Projects" gml_include_items "all" wms_include_items "all" END DUMP TRUE TOLERANCE 10 END
Mapserver has the ability to act as a WMS Server as well as a WMS Client. The WMS Client capabilities are accessed by defining WMS layers that connect to WMS servers. Below is an example of a WMS layer using the Microsoft Terraservices WMS Server.
LAYER NAME "msterraservicedoq" TYPE RASTER STATUS DEFAULT CONNECTION "http://terraservice.net/ogcmap.ashx?" CONNECTIONTYPE WMS MINSCALE 3000 MAXSCALE 20000 #DEBUG ON METADATA "wms_srs" "EPSG:26919" "wms_name" "doq" "wms_server_version" "1.1.1" "wms_format" "image/jpeg" "wms_style" "UTMGrid_Cyan" "wms_latlonboundingbox" "-71.19 42.23 -71 42.40" END END
In this example we show how to use Mapserver as a WMS client by utilizing Microsoft's TerraService WMS server. For more details about Microsoft's OGC WMS, check out the GetCapabilities of Microsoft TerraService.
This is a work in progress and demonstrates the ST_ConcaveHull function that will be available in PostGIS 2.0. The basic approach is that it first creates a convex hull of the geometry and then uses the ST_ClosestPoint function introduced in PostGIS 1.5 to cave in the hull, transforming it into a concave hull. That's the basic idea; the second argument is the target percent of the convex hull area. The routine recursively shrinks the hull until it determines it can't reach the target percent or it drops below the target.
CREATE TABLE s_test(geom geometry); INSERT INTO s_test(geom) VALUES(ST_GeomFromText('MULTIPOINT( (120 -224), (185 -219), (190 -234), (200 -246), (212 -256), (227 -261), (242 -264), (257 -265), (272 -264), (287 -263),(302 -258), (315 -250), (323 -237), (321 -222), (308 -213), (293 -208), (278 -205), (263 -202), (248 -199), (233 -196), (218 -193), (203 -190), (188 -185), (173 -180), (160 -172), (148 -162), (138 -150), (133 -135), (132 -120), (136 -105), (146 -92), (158 -82), (171 -74), (186 -69), (201 -65), (216 -62), (231 -60), (246 -60), (261 -60), (276 -60), (291 -61), (306 -64), (321 -67), (336 -72), (349 -80), (362 -89), (372 -101), (379 -115), (382 -130), (314 -134), (309 -119), (298 -108), (284 -102), (269 -100), (254 -99), (239 -100), (224 -102), (209 -107), (197 -117), (200 -132), (213 -140), (228 -145), (243 -148), (258 -151), (273 -154), (288 -156), (303 -159), (318 -163), (333 -167), (347 -173), (361 -179), (373 -189), (383 -201), (389 -215), (391 -230), (390 -245), (384 -259), (374 -271), (363 -282), (349 -289), (335 -295), (320 -299), (305 -302), (290 -304), (275 -305), (259 -305), (243 -305), (228 -304), (213 -302), (198 -300), (183 -295), (169 -289), (155 -282), (143 -272), (133 -260), (126 -246), (136 -223), (152 -222), (168 -221), (365 -131), (348 -132), (331 -133), (251 -177), (183 -157), (342 -98), (247 -75), (274 -174), (360 -223), (192 -85), (330 -273), (210 -283), (326 -97), (177 -103), (315 -188), (175 -139), (366 -250), (321 -204), (344 -232), (331 -113), (162 -129), (272 -77), (292 -192), (144 -244), (196 -272), (212 -89), (166 -236), (238 -167), (289 -282), (333 -187), (341 -249), (164 -113), (238 -283), (344 -265), (176 -248), (312 -273), (299 -85), (154 -261), (265 -287), (359 -111), (160 -150), (212 -169), (351 -199), (160 -98), (228 -77), (376 -224), (148 -120), (174 -272), (194 -100), (292 -173), (341 -212), (369 -209), (189 -258), (198 -159), (275 -190), (322 -82))') ) ; SELECT ST_AsBinary( geom ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.99) ) FROM s_test;
SELECT ST_AsBinary( ST_ConcaveHull(geom,0.90) ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(geom,0.90,true) ) FROM s_test;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.80) ) FROM stumper;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM stumper;
CREATE TABLE pointwave(gid serial primary key, geom geometry);
INSERT INTO pointwave(geom)
SELECT ST_Point(i*random(), j*random()) As the_geom
FROM generate_series(1,100) As i CROSS JOIN generate_series(-20,20) As j;
SELECT geom FROM pointwave;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.80 ) ) FROM pointwave;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.50 ) ) FROM pointwave;
SELECT ST_AsBinary(ST_ConvexHull(ST_Collect(geom))) FROM stumper;
SELECT ST_AsBinary(geom) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.99 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.90 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.90,true ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.70 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.70,true ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConvexHull( ST_Collect(geom) ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_Union(geom) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.99) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.90) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.70) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConvexHull(ST_Union(the_geom)) ) FROM neighborhoods;
SELECT ST_AsBinary( geom ) FROM mbta_busroutes ;
Note: In some cases the 99% target (such as this example) gives a better and much faster result than some of the lower compressions. This is because of GEOS topological errors we run into that force the algorithm to fall back on the convex hull for certain regions. So as a general rule, try the 99% first: it is generally much, much faster than lower percentages, for fairly huge patches of empty area it still returns an areal percentage much less than the convex hull, and it is also less likely to run into topological issues.
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.99) ) FROM mbta_busroutes ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.90) ) FROM mbta_busroutes ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.70) ) FROM mbta_busroutes;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.70, true) ) FROM mbta_busroutes;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM mbta_busroutes ;
ST_Dump is a function that takes a geometry and returns a set of the PostGIS geometry_dump structure. geometry_dump is composed of 2 properties: geom and path. geom is a geometry and path is a multi-dimensional integer array that gives the exact location of the geom within the MULTI or GEOMETRYCOLLECTION geometry. For MULTI geometries, the path is one dimensional, looking like {1}. For single geometries, the path is an empty array. For a GEOMETRYCOLLECTION it has multiple array elements depending on the complexity of the collection.
ST_Dump comes in very handy for expanding a MULTI geometry into single geometries. Below is an example that expands a set of MULTIPOLYGONs into single POLYGONs.
SELECT somefield1, somefield2, (ST_Dump(the_geom)).geom As the_geom FROM sometable
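To see the path property described above, here is a minimal sketch that dumps a two-part MULTIPOLYGON built from WKT; each output row carries a one-element path such as {1} or {2}.
SELECT (gdump).path As path, ST_AsText((gdump).geom) As wkt
FROM (SELECT ST_Dump(ST_GeomFromText('MULTIPOLYGON(((0 0,1 0,1 1,0 1,0 0)),((2 2,3 2,3 3,2 3,2 2)))')) As gdump) As f;
--returns 2 rows: path {1} with the first polygon, path {2} with the second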
It comes in very handy in conjunction with ST_Collect. ST_Collect will return a MULTI geometry if you give it single geometries, but will return a GEOMETRYCOLLECTION if you feed it MULTI geometries. In general ST_Collect is faster than ST_Union. To take advantage of the speed and simplicity of ST_Collect without generating GeometryCollections, you can expand your MULTIPOLYGONs, MULTILINESTRINGs, and MULTIPOINTs into single geometries and then recollect them according to the grouping you desire. If you want to regroup a set of MULTI geometries into a new grouping, you can do something like the below.
SELECT stusps, ST_Multi(ST_Collect(f.the_geom)) as singlegeom FROM (SELECT stusps, (ST_Dump(the_geom)).geom As the_geom FROM somestatetable ) As f GROUP BY stusps
In this quick exercise, we will explore the following PostGIS OGC functions: Extent, Expand, Buffer, Distance
Extent is an aggregate function, meaning that it is used just as you would use SQL sum, average, min, and max, oftentimes in conjunction with GROUP BY to consolidate a set of rows into one. Pre-1.2.2 it returns a 2-dimensional bounding box object (BBOX2D) that encompasses the set of geometries you are consolidating.
Unlike most functions in PostGIS, it returns a PostGIS BBOX object instead of a geometry object. In some cases it may be necessary to cast the resulting value to a PostGIS geometry, for example if you need to do operations that work on projections. Since a bounding box object does not contain projection information, it is best to use the SetSRID function rather than simply casting it to a geometry if you need projection information. SetSRID will automagically convert a BBOX to a geometry and then stuff in the SRID specified in the function call.
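For example, relying on that automatic BBOX-to-geometry conversion, a minimal sketch using our neighborhoods table (stored in SRID 2249) would be:
SELECT setSRID(extent(the_geom), 2249) As the_extent_geom FROM neighborhoods;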
Extent has a sibling called Extent3d which is also an aggregate function and is exactly like Extent except it returns a 3-dimensional bounding box (BBOX3D).
Starting around version 1.2.2, Extent and Extent3d will be deprecated in favor of ST_Extent. ST_Extent will return a BOX3D object.
Expand returns a geometry object that is the bounding box of a given geometry expanded in all directions by the specified number of units. Unlike Extent, it is not an aggregate function. It is often used in conjunction with the distance function to do proximity searches, because it is less expensive than the distance function alone.
The reason why expand combined with distance is much faster than distance alone is that expand can utilize GiST indexes, since it does comparisons between boxes, and therefore reduces the set of geometries that the distance function needs to check.
Note: in versions of PostGIS after 1.2, there is a new function called ST_DWithin, which utilizes indexes and is simpler to write than the expand, &&, distance combination.
ST_Distance will be the preferred name in version 1.2.2 and up
The following statements return equivalent results, but the one using expand is much faster, especially when the geometries and commonly used attributes are indexed.
Find all buildings located within 100 meters of Roslindale.

Using distance and expand (time with indexed geometries: 14.5 seconds - returns 8432 records):

SELECT b.the_geom_nad83m
FROM neighborhoods n, buildings b
WHERE n.name = 'Roslindale'
  AND expand(n.thegeom_meter, 100) && b.thegeom_meter
  AND distance(n.thegeom_meter, b.thegeom_meter) < 100

Using distance alone (time with indexed geometries: 8.7 minutes - returns 8432 records):

SELECT b.the_geom_nad83m
FROM neighborhoods n, buildings b
WHERE n.name = 'Roslindale'
  AND distance(n.thegeom_meter, b.thegeom_meter) < 100
We will write the above using the new ST_DWithin to demonstrate how much easier it is.
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and ST_DWithin(n.thegeom_meter, b.thegeom_meter, 100)
ST_Within(A,B) returns true if geometry A is within B. There is an important distinction between Within and ST_Within: ST_Within does an implicit A && B call to utilize indexes, whereas Within and _ST_Within do not.
1.3.1 and above do
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and ST_Within(b.thegeom_meter, n.thegeom_meter)
Pre 1.3.1 do
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and b.thegeom_meter && n.thegeom_meter AND Within(b.thegeom_meter, n.thegeom_meter)
Buffer returns a geometry object that is the radial expansion of a geometry, expanded by the specified number of units. Calculations are in the units of the spatial reference system of the geometry. The optional third parameter sets the number of segments used to approximate a quarter circle (it defaults to 8 if the third argument is not provided). This is a much more involved process than the expand function because it needs to look at every point of a geometry, whereas expand only looks at the bounding box of a geometry.
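To illustrate the third argument, here is a minimal sketch that buffers the same point with the default 8 segments per quarter circle and then with only 2, giving a smoother versus a coarser approximation of a circle:
SELECT ST_AsText(buffer(ST_GeomFromText('POINT(0 0)'), 100)) As smooth_buf;
SELECT ST_AsText(buffer(ST_GeomFromText('POINT(0 0)'), 100, 2)) As coarse_buf;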
Buffer can also be used to correct invalid geometries by smoothing out self-intersections. It doesn't work for all invalid geometries, but it works for some. For example, the code below will correct invalid neighborhood geometries that can be corrected. Note that here I am also wrapping the result in multi() since in this case buffer returns a polygon and our table geometries are stored as multi-polygons. If your geometry column is a POLYGON rather than a MULTIPOLYGON, you can leave out the multi() part.
UPDATE neighborhoods SET the_geom = multi(buffer(the_geom, 0.0)) WHERE isvalid(the_geom) = false AND isvalid(buffer(the_geom, 0.0)) = true
In this section we provide a graphical representation of what the operations Buffer, Expand, Extent look like when applied to geometries.
SELECT the_geom, name FROM neighborhoods WHERE name IN('Hyde Park','Roxbury')
SELECT expand(the_geom, 500) as geom, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
SELECT extent(the_geom) as thebbox, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
SELECT buffer(the_geom,500) as geom, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
The ST_GeomFromText OGC function is used to convert a Well Known Text (WKT) representation of a geometry into a PostGIS geometry. Aliases: ST_GeometryFromText. SQL-MM spec equivalents: ST_GeomFromText, ST_WKTToSQL.
This example demonstrates using a WKT representation of a Street in NAD 83 LongLat format and inserting it into a PostGIS geometry field.
INSERT INTO streets(street_name, the_geom) SELECT 'some street', ST_GeomFromText('LINESTRING(-70.729212 42.373848,-70.67569 42.375098)',4269)
The ST_GeomFromText function or an equivalent exists in other spatial databases. For example, in MySQL 5 and above the function has the same name and uses the same syntax as above. In Oracle 10g Spatial and Locator, you would use the SDO_GEOMETRY function, so the above in Oracle would look something like the statement below. Note that it uses an Oracle function called SDO_CS.MAP_EPSG_SRID_TO_ORACLE(), because Oracle Spatial Reference System IDs (SRIDs) are usually not the same as the commonly used European Petroleum Survey Group (EPSG) standard codes. PostGIS and MySQL SRIDs are generally the same as the EPSG IDs, so that call is not necessary there.
INSERT INTO streets(street_name, the_geom) SELECT 'some street', SDO_GEOMETRY('LINESTRING(-70.729212 42.373848,-70.67569 42.375098)', SDO_CS.MAP_EPSG_SRID_TO_ORACLE(4269)) FROM DUAL
INSERT INTO boston_buildings(name, the_geom) VALUES('some name', ST_GeomFromText('MULTIPOLYGON(((235670.354215375 894016.780856,235668.324215375 894025.050856,235681.154215375 894028.210856,235683.184215375 894019.940856,235670.354215375 894016.780856)))', 2805) )
ST_Intersects is a function that takes two geometries and returns true if any part of those geometries is shared between the two. In PostGIS versions before 1.3, you would use the following syntax to utilize indexes:

SELECT a.somefield1, b.somefield2 FROM a INNER JOIN b ON (a.the_geom && b.the_geom AND intersects(a.the_geom, b.the_geom))

In versions from 1.3 forward, the && indexable operator is automatically included in the definition of ST_Intersects, so you can simply write:

SELECT a.somefield1, b.somefield2 FROM a INNER JOIN b ON ST_Intersects(a.the_geom, b.the_geom)
The functions ST_Intersection and Intersection are complements to ST_Intersects. What they return is the portion of geometry A and geometry B that is shared between the two geometries. If the two geometries do not intersect, then what is returned is an empty GEOMETRYCOLLECTION object.
NOTE: PostGIS versions 1.2.2 and up you should use ST_Intersection as Intersection is deprecated. The two functions are more or less equivalent though.
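A quick way to see the empty GEOMETRYCOLLECTION behavior is a minimal sketch with two geometries that clearly do not touch:
SELECT ST_AsText(ST_Intersection(ST_GeomFromText('POINT(0 0)'), ST_GeomFromText('LINESTRING(2 0, 0 2)')));
--returns GEOMETRYCOLLECTION EMPTY in the PostGIS versions covered here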
In this section we provide a graphical representation of what the Intersection looks like. We will use an example of a building and a parcel where the building does not completely sit inside the parcel.
SELECT b.the_geom As bgeom, p.the_geom As pgeom, ST_Intersection(b.the_geom, p.the_geom) As intersect_bp
FROM buildings b INNER JOIN parcels p ON ST_Intersects(b.the_geom, p.the_geom)
WHERE ST_Overlaps(b.the_geom, p.the_geom)
LIMIT 1;
ST_Makeline is an aggregate function that takes a sequence of points and strings them together to make a line. In versions of PostGIS prior to 1.2.2, this was called MakeLine. For prior versions - replace ST_MakeLine with MakeLine.
In this example we have a table of gps point snapshots from our marathon practice, one for each day broken out by time. We want to convert the points for each day into a single record containing the line path of our run for that day. NOTE: For this example the trip_datetime field is of type timestamp, so it records the date and time the gps position was recorded. CAST(trip_datetime As date) (the ANSI SQL standard) or the PostgreSQL-specific shorthand trip_datetime::date strips off the time part so we are just left with the day. We do a subselect with an ORDER BY to force the points into time order so that the points of the line follow the path of our run across time.
SELECT ST_MakeLine(the_point) As the_route, bp.trip_date
FROM (SELECT the_point, CAST(trip_datetime As date) As trip_date
  FROM boston_marathon_practice
  ORDER BY trip_datetime) bp
GROUP BY bp.trip_date;
In the previous example, since our grouping field is in the same order as the datetime, we only needed to order by one field in our subselect. If your groupings are not in the same order as your line order, then you need to add an order for each field you plan to group by. If you don't, your resulting points may get shuffled. In the simplified example below, we have parcel centroids and we want to make line segments that join them such that for each unique street and zip we have one line segment. For extra measure we want our final set to include the start number and end number for this segment.
This is a fairly simplistic example. In reality you would probably need to do a little bit more, because street segments can have the same names in Boston even within the same zip. We are also assuming the line segment drawing should follow the order of the street numbers and that our street numbers are numeric.
SELECT p.street, p.zip, ST_MakeLine(p.the_point) As streetsegment,
  MIN(st_num) As st_num_start, MAX(st_num) As st_num_end
FROM (SELECT street, zip, centroid(the_geom) As the_point, st_num
  FROM parcels
  ORDER BY zip, street, st_num) p
GROUP BY p.zip, p.street
There are numerous ways of creating point geometries in PostGIS. We have covered these ways in other snippets. Check out the following links for other examples.
ST_MakePoint is perhaps the fastest way of creating a PostGIS point geometry; in tests I have done it can be as much as 5 to 10 times faster. The downside of ST_MakePoint is that it is not defined in the OGC spec, so it's not quite as portable across other spatial databases as ST_GeomFromText and ST_PointFromText.
ST_MakePoint is used in the form ST_MakePoint(x, y), which for pretty much all spatial reference systems means ST_MakePoint(longitude, latitude).
ST_MakePoint used alone creates a non-transformable geometry (one that has an SRID of -1), so ST_MakePoint is usually used in conjunction with ST_SetSRID to create a point geometry with spatial reference information. The examples below, with their EWKT output, demonstrate the difference.
SELECT ST_MakePoint(-71.09405383923, 42.3151215523721) as the_point, AsEWKT(ST_MakePoint(-71.09405383923, 42.3151215523721)) as ewkt_rep
SELECT SETSRID(ST_MakePoint(-71.09405383923, 42.3151215523721),4326) as the_point, ST_AsEWKT(ST_SetSRID(ST_MakePoint(-71.09405383923, 42.3151215523721),4326)) as ewkt_rep
Note: Beginning with 1.2.1+, the names with the ST_ prefix are preferred over the non-ST prefixed ones to conform to newer OGC standards. In future releases the non-ST names will be deprecated and eventually removed from PostGIS.
ST_PointFromText (previously known as PointFromText), ST_LineFromText (previously known as LineFromText), and the other <Geometry Type>FromText functions are very similar to the all-purpose ST_GeomFromText, except that ST_PointFromText only converts Point WKT representations to a geometry object and returns null if another kind of geometry is passed in. ST_LineFromText similarly only converts WKT representations of lines to geometry and returns null if another WKT representation is passed in.
If you look at the current underlying implementation of these functions in PostGIS (1.2.1 and below), you will observe that they in fact call ST_GeomFromText, but if the geometry type of the generated geometry is not a Point or a LineString respectively, then the functions return null.
The M*FromText, ST_M*FromText functions create Multi geometry types from the WKT representation of those types.
Since these functions do have some added overhead (not much, but some), it makes sense to just use ST_GeomFromText unless you have a mixed bag of WKT geometries coming in and only wish to selectively insert, say, the Points or the Lines, or unless you want the plain readability of making it clear that you are indeed inserting Points or Lines or MultiLineStrings etc.
UPDATE points_of_interest SET thepoint_lonlat = PointFromText('POINT(' || longitude || ' ' || latitude || ')',4326)
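For the Multi variants, usage follows the same pattern. A minimal sketch using ST_MPointFromText with a made-up two-point multipoint:
SELECT ST_MPointFromText('MULTIPOINT(-71.06 42.28, -71.07 42.29)', 4326);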
Simplify reduces the complexity and weight of a geometry using the Douglas-Peucker algorithm. In general it reduces the number of vertices in a polygon, multipolygon, line, or multi-line geometry. It takes 2 arguments: the geometry and the tolerance. To demonstrate, we will apply the simplify function to Boston neighborhoods that are in SRID 2249 (NAD83 / Massachusetts Mainland (ftUS)) and compare the sizes of the shape files if we were to dump these out as ESRI shape files. We will also show pictorially how simplify changes the geometries.
SELECT the_geom FROM neighborhoods
SELECT simplify(the_geom,500) as simpgeom FROM neighborhoods
SELECT simplify(the_geom,1000) as simpgeom FROM neighborhoods
Note: When simplifying long lat data, you often get very strange results with simplify. It is best to first transform the data from long lat to some other coordinate system, then simplify, and then transform back to long lat. Something like:
SELECT transform(simplify(transform(the_geom, 2249), 500),4326) from neighborhoods
ST_Summary(geometry) gives you a brief summary of a geometry, telling you what type of geometry it is and how many simple geometries and rings it contains.
select ST_Summary(the_geom) from neighborhoods;
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 649 points
    ring 1 has 10 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 182 points
MultiPolygon[BS] with 2 elements
  Polygon[] with 1 rings
    ring 0 has 319 points
  Polygon[] with 1 rings
    ring 0 has 4 points
MultiPolygon[BS] with 2 elements
  Polygon[] with 1 rings
    ring 0 has 376 points
  Polygon[] with 1 rings
    ring 0 has 68 points
MultiPolygon[BS] with 3 elements
  Polygon[] with 1 rings
    ring 0 has 215 points
  Polygon[] with 2 rings
    ring 0 has 4 points
    ring 1 has 23 points
  Polygon[] with 1 rings
    ring 0 has 45 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 71 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 124 points
    ring 1 has 5 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 199 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 197 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 168 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 400 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 355 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 176 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 392 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 1965 points
    ring 1 has 29 points
The ST_Translate function takes any geometry (linestring, multilinestring, polygon, etc.) and returns a new geometry that is the original geometry moved by a vector defined by X, Y, Z. Note that the units of measurement are always the units of the spatial reference system of the geometry. There are two forms of ST_Translate: ST_Translate(geometry, X, Y, Z) and ST_Translate(geometry, X, Y).
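As a minimal illustration of the 2D form before the grid example below:
SELECT ST_AsText(ST_Translate(ST_GeomFromText('POINT(0 0)'), 5, 10));
--returns POINT(5 10)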
In the example below we use the translate function to create a grid that divides the extent of Boston into 200x200 rectangles (40,000 in all), by doing the following:
CREATE TABLE throwaway_grid(gid SERIAL PRIMARY KEY);
SELECT AddGeometryColumn('public', 'throwaway_grid', 'the_geom', 26986, 'POLYGON', 2);
INSERT INTO throwaway_grid(the_geom)
SELECT ST_translate(ref.boxrep, hor.n*width, ver.n*height) As slice
FROM (SELECT ST_xmin(ST_Extent(the_geom)) As xstart,
    ST_xmin(ST_Extent(the_geom)) As ystart,
    ST_SetSRID(CAST('BOX(33863.73046875 777606.3125,35348.73046875 778517.3125)' As box2d), 26986) As boxrep,
    ceiling((ST_xmax(ST_Extent(the_geom)) - ST_xmin(ST_Extent(the_geom)))/200) As width,
    ceiling((ST_ymax(ST_Extent(the_geom)) - ST_ymin(ST_Extent(the_geom)))/200) As height
  FROM towns) As ref,
  generate_series(1,200) As hor(n),
  generate_series(1,200) As ver(n);
CREATE INDEX idx_throwaway_grid_the_geom ON throwaway_grid USING gist(the_geom);
The resulting image looks like this. Take a look at Map Dicing to see an example of how you can use this in conjunction with ST_Intersection to dice up maps into smaller pieces.
The translate function can be used in conjunction with other functions such as rotate to do interesting things, like rotating a geometry about a point or creating ellipses, as described here.
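Here is a minimal sketch of the rotate-about-a-point idea, assuming the old non-ST rotate(geometry, radians) function (which rotates about the origin) and a hypothetical pivot at (100, 200): translate so the pivot sits at the origin, rotate, then translate back.
SELECT ST_Translate(rotate(ST_Translate(the_geom, -100, -200), pi()/4), 100, 200) As rotated_geom
FROM neighborhoods
WHERE name = 'Roslindale';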
This Document is available under the GNU Free Documentation License 1.2 http://www.gnu.org/copyleft/fdl.html & for download at the BostonGIS site http://www.bostongis.com