This is a continuation of our Loading OpenStreetMap data in PostGIS tutorial. In this tutorial we will build a tile cache of the Massachusetts data we loaded in Part 1 and then render it in OpenLayers. We will be using the Mapnik toolkit to do this.
In the PostgreSQL database, we had to add these two columns because the Mapnik scripts were looking for them and they did not exist in our dump:
ALTER TABLE planet_osm_polygon ADD COLUMN "addr:housename" text;
ALTER TABLE planet_osm_point ADD COLUMN "addr:housename" text;
The Mapnik scripts use deprecated PostGIS functions such as Extent, AsBinary, and SetSRID, which were removed in PostGIS 2.0. If you are testing with PostGIS 2.0, you will sadly need to run legacy.sql for now to get those functions back. legacy.sql is installed in the share/contrib/postgis-2.0 folder.
If you installed Mapnik with the HOTOSM package, add its Python site-packages folder to your PYTHONPATH:
;C:\Program Files\HOTOSM\python\2.5\site-packages
and then verify the install by running the bundled demo:
cd C:\Program Files\HOTOSM\demo\python
c:\python26\python rundemo.py
You may also want to take a look at http://weait.com/content/build-your-own-openstreetmap-server if some of what we say doesn't make sense to you.
Once you have Mapnik installed, you want to download the Python scripts for generating OpenStreetMap tiles, which you can get from the OpenStreetMap Subversion repository at http://svn.openstreetmap.org/applications/rendering/mapnik/, and then copy them into a folder, say osm_mapnik.
For ease of use, we have exported these out of the Subversion repository and made the latest available (as of May 18th, 2011) at http://www.bostongis.com/downloads/osm_mapnik_r26011.zip
Note: If you have an svn client you can download the folder yourself with the command:
svn co http://svn.openstreetmap.org/applications/rendering/mapnik/
cd osm_mapnik   #this is the folder you extracted the scripts to
wget http://tile.openstreetmap.org/world_boundaries-spherical.tgz
tar xvzf world_boundaries-spherical.tgz
wget http://tile.openstreetmap.org/processed_p.tar.bz2
tar xvjf processed_p.tar.bz2 -C world_boundaries
wget http://tile.openstreetmap.org/shoreline_300.tar.bz2
tar xjf shoreline_300.tar.bz2 -C world_boundaries
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/cultural/110m-admin-0-boundary-lines.zip
unzip 110m-admin-0-boundary-lines.zip -d world_boundaries
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/10m-populated-places.zip
unzip 10m-populated-places.zip -d world_boundaries
Before we begin, we need to figure out the extent of our tiles. We do that with the SQL statement below. (Sadly, there is data in the OSM MA dataset that far exceeds the extent of Massachusetts, so I opted to use a US states table I had lying around instead of taking ST_Extent of any of the OSM tables.)
SELECT ST_Extent(the_geom) As wgs_84,
    ST_Extent(ST_Transform(the_geom, 900913)) As web_merc
FROM tiger.states
WHERE stusps = 'MA';
BOX(-73.508142 41.187053,-69.858861 42.88679)                                 --wgs_84
BOX(-8182888.93659964 5011826.70961612,-7776652.83391808 5265667.90278576)   --web_merc
A little bit about the osm.xml. This file contains all the styling rules for generating the tiles and serves as a template for building your final styling .xml. It defines the icons to use, how to draw the lines, and so forth. If you are not happy with the default Mapnik styling and want to highlight certain features differently, you can manually edit this XML file and resave it under, say, osm_custom.xml before you run generate_xml.py, and use that wherever we use osm.xml.
Note: If you are on Windows and want to put the files on a drive other than C, do a mkdir D:/osm_tiles or whatever folder you want.
mkdir /osm_tiles
generate_xml.py osm.xml /osm_tiles/ma_osm.xml --dbname osm --user postgres --port 5432 --extent -8182889,5011827,-7776653,5265668 --accept-none
Next we edit the generate_image.py file and change the ll to -73.508142,41.187053,-69.858861,42.88679, and then at the command line run:
generate_image.py
Confirm that the generated image.png looks like Massachusetts. It will be fairly high resolution (around 8 MB) and look something like this:
Next resave generate_tiles.py as generate_tiles_ma.py. Edit generate_tiles_ma.py and, in the render_tiles section way at the bottom, change it to:
bbox = (-73.508142,41.187053,-69.858861,42.88679)
render_tiles(bbox, mapfile, tile_dir, 6, 8, "Massachusetts")

minZoom = 10
maxZoom = 17
bbox = (-73.508142,41.187053,-69.858861,42.88679)
render_tiles(bbox, mapfile, tile_dir, minZoom, maxZoom)
set MAPNIK_MAP_FILE=/osm_tiles/ma_osm.xml
set MAPNIK_TILE_DIR=/osm_tiles
set HOME=/osm_tiles
generate_tiles_ma.py
If you just want to generate tiles for Boston:
Resave generate_tiles.py as generate_tiles_boston.py and change the tiles section to:
bbox = (-71.2,42.23,-70.99, 42.4)
render_tiles(bbox, mapfile, tile_dir, 10, 17, "Boston")
set MAPNIK_MAP_FILE=/osm_boston_tiles/ma_osm.xml
set MAPNIK_TILE_DIR=/osm_boston_tiles
set HOME=/osm_boston_tiles
generate_tiles_boston.py
In Part 3 of our series, we will show you how to use your own custom tiles in OpenLayers.
For this exercise, we will download Massachusetts OSM data and then load it into our PostGIS spatially enabled PostgreSQL database. The OSM data contains roads, points of interest, building footprints, administrative boundaries, addresses, and too many other things to itemize. Note that much of the data provided in OSM for Massachusetts is provided by our very own Massachusetts Office of Geographic Information (MassGIS) as well as contributions from people like you. Now on with the show.
Since we will be loading with the --hstore switch, first enable the hstore extension in your osm database:
CREATE EXTENSION hstore;
In order to load OpenStreetMap .OSM XML files, you will need osm2pgsql which you can find out more about at http://wiki.openstreetmap.org/wiki/Osm2pgsql. There are compiled binaries available for many Linux variants, Windows, and Mac OSX.
If you are on Windows, go here: http://wiki.openstreetmap.org/wiki/Osm2pgsql#Windows_XP. If you plan to build map tiles with the data later, we recommend the HOTOSM package, which installs osm2pgsql as well as Mapnik and Osmosis. These will be useful for generating tiles.
Note: If you plan to set up a mapping tile server with OSM data later, check out Dane Springmeyer's Mapnik tutorials: http://www.dbsgeo.com/.
Once you have downloaded the Massachusetts extract, load it with a command along these lines:
osm2pgsql massachusetts.osm.bz2 -d osm -U postgres -P 5432 -S default.style --hstore
If you want to load additional states, use the --append switch. So, for example, if I wanted to load neighboring states like New Hampshire, I would download New Hampshire and then follow with this command:
osm2pgsql new_hampshire.osm.bz2 --append -d osm -U postgres -P 5432 -S default.style --hstore
The osm2pgsql load will output something like this:

osm2pgsql SVN version 0.69-21289M

Using projection SRS 900913 (Spherical Mercator)
Setting up table: planet_osm_point
NOTICE:  table "planet_osm_point" does not exist, skipping
NOTICE:  table "planet_osm_point_tmp" does not exist, skipping
Setting up table: planet_osm_line
NOTICE:  table "planet_osm_line" does not exist, skipping
NOTICE:  table "planet_osm_line_tmp" does not exist, skipping
Setting up table: planet_osm_polygon
NOTICE:  table "planet_osm_polygon" does not exist, skipping
NOTICE:  table "planet_osm_polygon_tmp" does not exist, skipping
Setting up table: planet_osm_roads
NOTICE:  table "planet_osm_roads" does not exist, skipping
NOTICE:  table "planet_osm_roads_tmp" does not exist, skipping
Mid: Ram, scale=100

!! You are running this on 32bit system, so at most
!! 3GB of RAM can be used. If you encounter unexpected
!! exceptions during import, you should try running in slim
!! mode using parameter -s.

Reading in file: massachusetts.osm
Processing: Node(10082k) Way(621k) Relation(2k)
Node stats: total(10082538), max(1202366398)
Way stats: total(621446), max(104206285)
Relation stats: total(2846), max(1463423)
Writing way(621k)
Writing rel(2k)
Committing transaction for planet_osm_point
Sorting data and creating indexes for planet_osm_point
Completed planet_osm_point
Committing transaction for planet_osm_line
Sorting data and creating indexes for planet_osm_line
Completed planet_osm_line
Committing transaction for planet_osm_polygon
Sorting data and creating indexes for planet_osm_polygon
Completed planet_osm_polygon
Committing transaction for planet_osm_roads
Sorting data and creating indexes for planet_osm_roads
Completed planet_osm_roads
If your data loaded, you should see new tables (planet_osm_point, planet_osm_line, planet_osm_polygon, and planet_osm_roads), all with a column called way that holds the PostGIS geometry and another column called tags that holds the hstore key/value pairs.
The way column holds the PostGIS geometry in the spherical web Mercator projection (or a different projection if you used the reprojection switch). NOTE that while spherical Mercator is fine for web map display, it is terrible for measuring distances, areas, or anything else that has to do with measurement.
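To illustrate the measurement point, here is a minimal sketch (assuming the planet_osm_line table produced by the load above; the highway filter and LIMIT are arbitrary). Lengths taken directly from the 900913 geometry are inflated at Massachusetts latitudes, so transform to a planar SRS in meters, such as Massachusetts State Plane (SRID 26986), when you need real distances.

-- Compare the misleading Mercator "meters" with true meters
SELECT osm_id, name,
    ST_Length(way) As len_mercator_units,                        -- inflated pseudo-meters
    ST_Length(ST_Transform(way, 26986)) As len_meters_stateplane -- true meters (MA State Plane)
FROM planet_osm_line
WHERE highway = 'motorway'
LIMIT 5;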
Some data is available only in the hstore tags column and not in any of the regular columns, though the more commonly used tags you will also find as dedicated columns in the data. With that said, we will index our hstore tags columns with these SQL commands:
CREATE INDEX idx_planet_osm_point_tags ON planet_osm_point USING gist(tags);
CREATE INDEX idx_planet_osm_polygon_tags ON planet_osm_polygon USING gist(tags);
CREATE INDEX idx_planet_osm_line_tags ON planet_osm_line USING gist(tags);
Now for a simple query to pull all sushi places. Sadly it seems the sushi offering is not very complete:
SELECT name, ST_AsText(ST_Transform(way,4326)) As pt_lonlattext -- tags
FROM planet_osm_point
WHERE tags @> 'cuisine=>sushi'::hstore;

-- Result --
    name     |         pt_lonlattext
-------------+-------------------------------
 Moshi Moshi | POINT(-72.6285103 42.3202165)
 Mr Sushi    | POINT(-71.1553199 42.4162195)
A query like the following, which lists distinct amenities and their source URLs, can be written one of two ways. Using the hstore tags column (the second query) is faster since it is indexed.
SELECT DISTINCT amenity, tags->'source_url' As source
FROM planet_osm_point
WHERE amenity > ''
ORDER BY amenity;
SELECT DISTINCT tags->'amenity' As amenity, tags->'source_url' As source
FROM planet_osm_point
WHERE tags ? 'amenity'
ORDER BY tags->'amenity';

          amenity           |                source
----------------------------+---------------------------------------
 :
 bus_station                |
 cafe                       |
 campsite                   |
 :                          |
 cinema                     |
 City Hall                  | http://mass.gov/mgis/townhalls.htm
 Clinic                     |
 college                    |
 :
 library                    | http://mass.gov/mgis/libraries.htm
 :
OpenLayers is an open source client-side JavaScript/AJAX framework for overlaying various mapping services. It supports various mapping APIs such as Google, Yahoo, Microsoft Virtual Earth, OGC WMS, OGC WFS, ka-Map, text layers, and markers, to name a few. The nice thing about it being a pure client-side implementation is that you can drive it with any server language such as ASP.NET, PHP, or Perl, and for simple maps you can embed it directly in a plain HTML file. There are minimal requirements on the web server if you are using publicly available or subscription layers.
In the next couple of sections, we'll test drive OpenLayers by overlaying various layers from Boston.
We will be using the 2.2 API which you can download from http://openlayers.org/download/OpenLayers-2.2.zip
Copy the contents of http://www.bostongis.com/demos/OpenLayers/example1.htm to your blank file.
In the example above, which you can see from the link, we have drawn a Microsoft Virtual Earth map layer and 3 WMS layers: Boston Neighborhood Boundary, Boston Mainstreets Boundary, and Boston Water Taxi stops. Two of our WMS layers come from a UMN MapServer dishing out two layers at once, and our Water Taxi layer is being dished out by the MassGIS GeoServer - http://lyceum.massgis.state.ma.us/wiki/doku.php?id=history:home .
First, for any third-party mapping services you will be using, you need to include their libraries in addition to the OpenLayers library file, and these should be included before your OpenLayers include. The code to do that is on lines 11 and 12 of the htm file and looks like this:
11: <script src='http://dev.virtualearth.net/mapcontrol/v3/mapcontrol.js'></script> 12: <script src="ol22/OpenLayers.js"></script>
Alternatively, you can use the latest release OpenLayers.js directly from the OpenLayers site by replacing line 12 with:
12: <script src="http://openlayers.org/api/OpenLayers.js"></script>
Next we create a blank div with id=map that has the dimensions we want and position it where we want on the page. Our map will live there:
16: <div id="map" style="width: 400px; height: 400px"></div>
Next we write the JavaScript code to create the map, load it into our div, and add the layers:
18: var map = new OpenLayers.Map(document.getElementById("map"));
19: var dndwmsurl = "http://dndmaps.cityofboston.gov/mapserv/scripts/mapserv410/mapserv410.exe?map=\\mapserv\\dndwms\\dndbasepg.map&"
20:
21: map.addLayer(new OpenLayers.Layer.VirtualEarth("VE"));
22:
23: /**Note we don't have to specify an SRS, Service or Request for WMS layers below.
24: OpenLayers will ask for the projection based on our base layer, EPSG:4326, Service: WMS, Request: GetMap.
25: We chose image/gif because IE6 and below doesn't natively support transparency for png without a hack. **/
26: wmstaxi = new OpenLayers.Layer.WMS("MASSGIS Boston Taxi Stops", "http://giswebservices.massgis.state.ma.us/geoserver/wms",
27:     {layers: "massgis:GISDATA.WATERTAXISTOPS_PT", transparent: "true", format: "image/gif"},
28:     {tileSize: new OpenLayers.Size(400,400), buffer: 1 })
29: map.addLayer(wmstaxi)
30:
31: var wmsbos = new OpenLayers.Layer.WMS("Boston Neigborhoods and Mainstreets", dndwmsurl,
32:     {layers: "neighborhoods,mainstreets", transparent:"true", format: "image/gif"},
33:     {tileSize: new OpenLayers.Size(400,400), buffer: 1 });
34:
35: map.addLayer(wmsbos);
A couple of notes about the above code that may not be entirely obvious
The layer switcher is the little + sign on the right side that shows you the active layers and allows you to deactivate or reactivate a layer.
37: var boston = new OpenLayers.LonLat(-71.0891380310059,42.3123226165771);
38: map.setCenter(boston, 10);
39: map.addControl(new OpenLayers.Control.LayerSwitcher());
In this tutorial we will just show some example snippets of using OpenLayers that we have found most useful. We will be assuming OpenLayers 2.4 and above.
If you do not want a layer to display when the map first loads, set its visibility option to false when you define the layer:

26: wmstaxi = new OpenLayers.Layer.WMS("MASSGIS Boston Taxi Stops", "http://giswebservices.massgis.state.ma.us/geoserver/wms",
27:     {layers: "massgis:GISDATA.WATERTAXISTOPS_PT", transparent: "true", format: "image/gif"},
28:     {tileSize: new OpenLayers.Size(400,400), buffer: 1, visibility: false })
If you want to turn off a layer after initialization, use the layer's setVisibility method, for example mylayer.setVisibility(false). This is useful if, say, you do not want to use the built-in layer switcher but instead want sets of layers to display or hide based on other conditions.
Within an OpenLayers map, you can have only one layer selected as the base layer. This layer controls the projection of the other layers, among other things. In the layer switcher, you will see the base layer options marked at the top in a category section called Base Layer. Each option has a radio button instead of a checkbox to prevent you from selecting more than one. By default, the first base layer added to the map becomes the active base layer.
Certain layers have to be base layers because they are not transparent and there are other things you can't control about them, such as projection. For example, commercial layers such as Google, Yahoo, and Virtual Earth are simply assumed to be base layers, so you don't have to explicitly declare them as such, and in fact you can't force them not to be.
A WMS layer, on the other hand, can be used as either a base layer or an overlay layer. If you don't explicitly state that it is a base layer, it is assumed to be an overlay layer. So how do you force a layer to be a base layer, you ask? Simple: set the isBaseLayer property of the layer as shown below.
map.addLayer(new OpenLayers.Layer.WMS.Untiled("MassGIS Boundary",
    "http://giswebservices.massgis.state.ma.us/geoserver/wms",
    {'layers': "massgis:GISDATA.BOUNDARY_ARC", 'transparent': "true", 'format': "image/png"},
    {'isBaseLayer': true}));
There are currently two kinds of layers for which you have the option of tiling or not tiling: WMS layers and MapServer layers. Each is exposed as a separate layer class: WMS, WMS.Untiled, MapServer, MapServer.Untiled. In what circumstances would you want to tile, and in what circumstances would you not? To answer this question, it's useful to think about what happens when you tile.
When maps are tiled, the map viewport is broken out into several quadrants, which tries to approximate what you have set for tileSize but often falls short. So for each paint of the screen, multiple requests, sometimes on the order of 60 per layer, are made. You can imagine this can be a considerable strain on a weak server, and it is especially silly when what you are asking for is a light boundary file of, say, Boston. In the untiled case, only one request is made per layer, but that request is usually on the order of 4 times larger than the viewport to allow for smoother panning, at the cost of a slightly slower load time. Outlined below are the cases where we've found tiling or not tiling useful. Some are a matter of taste and some are a matter of practicality.
An example of a tiled layer is shown in our visibility discussion. An example of an untiled layer is shown in our Base Layer discussion.
In this third Part of our OpenStreetMap series we will demonstrate how to use the osm tiles we built in Part 2: Building Tiles with PostGIS OpenStreetMap data.
This first part is optional. Note that when you built the tiles, as you got to deeper and deeper levels there were a lot more tiles, and even worse, a lot of those damn tiles were empty. I personally hate carrying around thousands of files of nothingness. It takes a long time to upload nothing. So what I do first is delete those nothing tiles, of which there seem to be more than real tiles, before I upload them to my server. If you are on Windows, the easiest way to do that is with a command-line VBScript. Here is a little script to get rid of nothing. Before you do this, set aside one blank tile to use when you get a 404 request. This will only work on Windows; I haven't figured out the equivalent command for Linux yet. Simply save the contents of this code to a file called delete_osm_blank_tiles.vbs and edit the last line to point to your osm map tiles cache. Then run the script at a command line with cscript delete_osm_blank_tiles.vbs.
Sub RecursiveDeleteBlankOSMImages(ByVal strDirectory)
    Dim objFolder, objSubFolder, objFile
    Dim strExt
    Dim strExtensionsToDelete
    Dim lngSize
    lngSize = 104 ' The threshold in bytes (this is 104 bytes - anything less seems to be a blank image)
    strExtensionsToDelete = "png"
    Set objFolder = objFSO.GetFolder(strDirectory)
    For Each objFile in objFolder.Files
        For Each strExt in Split(Ucase(strExtensionsToDelete),",")
            If Right(Ucase(objFile.Path), Len(strExt)+1) = "." & strExt AND objFile.Size < lngSize Then
                wscript.echo "Deleting:" & objFile.Path
                objFile.Delete
            End If
        Next
    Next
    For Each objSubFolder in objFolder.SubFolders
        RecursiveDeleteBlankOSMImages objSubFolder.Path
    Next
End Sub

Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
RecursiveDeleteBlankOSMImages "C:\osm_matiles\"
Next we create a JavaScript file (saved as js/OpenStreetMapLocal.js, which the page below references) that defines a custom OpenLayers OSM layer class pointing at our local tile cache, along with a fallback image for missing tiles:

OpenLayers.Util.OSMLocal = {};

/**
 * Constant: MISSING_TILE_URL
 * {String} URL of image to display for missing tiles
 */
OpenLayers.Util.OSMLocal.MISSING_TILE_URL = "http://localhost/images/404.png";

/**
 * Function: onImageLoadError
 */
OpenLayers.Util.onImageLoadError = function() {
    this.src = OpenLayers.Util.OSMLocal.MISSING_TILE_URL;
};

/**
 * Class: OpenLayers.Layer.OSM.LocalMassachusettsMapnik
 *
 * Inherits from:
 *  - <OpenLayers.Layer.OSM>
 */
OpenLayers.Layer.OSM.LocalMassachusettsMapnik = OpenLayers.Class(OpenLayers.Layer.OSM, {
    /**
     * Constructor: OpenLayers.Layer.OSM.LocalMassachusettsMapnik
     *
     * Parameters:
     * name - {String}
     * options - {Object} Hashtable of extra options to tag onto the layer
     */
    initialize: function(name, options) {
        var url = [
            "http://localhost/osm_matiles/${z}/${x}/${y}.png",
            "http://localhost/osm_matiles/${z}/${x}/${y}.png",
            "http://localhost/osm_matiles/${z}/${x}/${y}.png"
        ];
        options = OpenLayers.Util.extend({
            numZoomLevels: 19,
            buffer: 0,
            transitionEffect: "resize"
        }, options);
        var newArguments = [name, url, options];
        OpenLayers.Layer.OSM.prototype.initialize.apply(this, newArguments);
    },

    CLASS_NAME: "OpenLayers.Layer.OSM.LocalMassachusettsMapnik"
});
Now you link in this file as you would the regular OpenStreetMap file and use this new class as you would any other OpenStreetMap OpenLayers layer class. You can even create versions with different themes if you want, by creating a different set of tiles and theming them differently. That's it, folks. Here is a complete version just containing Massachusetts and neighboring state tiles, with a complementary town boundary layer dished out by our local MassGIS government agency.
<html><title>OpenStreetMap with Local Tile Cache</title>
<script src="http://www.openlayers.org/api/OpenLayers.js"></script>
<script src="js/OpenStreetMapLocal.js"></script>
<script>
    var lat = 42.06651; //POINT(-71.7169041387889 42.0665181226572)
    var lon = -71.71690;
    var zoom = 5;
    var map; //complex object of type OpenLayers.Map
    var ls = new OpenLayers.Control.LayerSwitcher();
    var mgisurls = ["http://giswebservices.massgis.state.ma.us/geoserver/wms"];
    var prjwgs84 = new OpenLayers.Projection("EPSG:4326");

    function init() {
        //var boundsms = new OpenLayers.Bounds(33869, 772032, 335919, 959719) //EPSG:26986
        var boundswm = new OpenLayers.Bounds(-8182919, 5035362.5, -7784059.5, 5304535) //EPSG:900913 Web Mercator (native projection of tiles)
        map = new OpenLayers.Map("map", {
            controls: [
                new OpenLayers.Control.Navigation({'mouseWheelOptions': {interval: 100}}),
                new OpenLayers.Control.PanZoomBar(), ls,
                new OpenLayers.Control.Attribution(),
                new OpenLayers.Control.MousePosition()
            ],
            maxExtent: boundswm,
            maxResolution: 700,
            numZoomLevels: 18,
            units: 'm',
            projection: new OpenLayers.Projection("EPSG:900913"),
            displayProjection: new OpenLayers.Projection("EPSG:4326")
        });

        var lyrMapnik = new OpenLayers.Layer.OSM.LocalMassachusettsMapnik("OSM Mapnik Massachusetts", {'isBaseLayer': true});

        var lyrTowns = new OpenLayers.Layer.WMS("Towns", mgisurls,
            {'layers': "massgis:GISDATA.TOWNS_POLYM", 'styles': "", 'transparent': "true", 'FORMAT': "image/png"},
            {'isBaseLayer': false,
             'attribution': '<br /><a href="http://www.mass.gov/mgis/">MASSGIS (MA ITD)</a>',
             'visibility': false, 'buffer': 1,
             'singleTile': false, 'tileSize': new OpenLayers.Size(240,150)});

        map.addLayers([lyrMapnik, lyrTowns]);

        if (!map.getCenter()) {
            var lonLat = new OpenLayers.LonLat(lon, lat).transform(new OpenLayers.Projection("EPSG:4326"), map.getProjectionObject());
            map.zoomToExtent(boundswm);
            ls.maximizeControl();
        }
    }
</script>
<body>
    <div style="width:880; height:550" id="map"></div>
    <script type="text/javascript" defer="true">
        init();
    </script>
</body>
</html>
To see it in action go to http://www.bostongis.com/demos/OpenLayers/mapnik_localosm.htm
We gave a presentation on writing PostGIS spatial queries at OSCon 2009 last week. Slides will be available on the site, but you can download them here as well.
Some of the sample data we were using you can grab here.
PostGIS 1.4 is finally out as well, and hopefully should be hitting your favorite distros soon if it hasn't already. For Windows users, we have compiled binaries at http://postgis.refractions.net/download/windows/ for 8.2, 8.3, and 8.4, and hope to have the stack builder installs sometime next week.
We gave a lecture at PGCon 2009. Below are the links and the generated data set. Some of the examples use features in PostgreSQL 8.4 and PostGIS 1.4, but most should be applicable to earlier versions. The data generation part does use PostgreSQL 8.4 WITH RECURSIVE construct, but not in a terribly useful way.
Lots of great presentations at PGCon2009 -- the slides should be up shortly for all the presentations.
For display we use OpenJUMP. A Flash movie of OpenJUMP rendering the autogenerated town is available.
Windows users: PostGIS 1.4 Windows binaries are available now at http://postgis.refractions.net/download/windows/; a stack builder installer will be available shortly. Please note nothing has changed in the binaries between 1.4RC2 and 1.4 official except for the version number stamped on them when you run:
SELECT postgis_full_version()
If you want to play with the generated data set - do the following
ALTER DATABASE pgcon2009 SET search_path=assets, public;
For this exercise we are going to demonstrate how to use OSM2PO and an OSM pbf to create a routing network. Much of this is borrowed from Anita Graser's tutorial OSM2PO QuickStart
For our exercise, we love using the Teczno Metro Extracts. These are great and, as of this writing, were last updated in April 2013. They are separate OSM feeds for major world cities and are packaged in various formats: standard OSM XML, OSM PBF (compressed OSM format), and shapefiles. We'll be using Boston for this tutorial, but you can swap Boston with your favorite metro destination.
These instructions are the same for all OSes, provided your PostGIS build was compiled with raster support (you don't get the extension feature without raster, sorry) and you are using the latest development version of pgRouting (1.07+), which now includes CREATE EXTENSION support.
In an existing database or new db run
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
OSM2PO is both a routing engine and an OSM-to-pgRouting loader written in Java, so it's cross-platform, assuming you have a Java runtime installed on your computer.
java -Xmx512m -jar osm2po-core-4.7.7-signed.jar prefix=hh tileSize=x http://osm-extracted-metros.s3.amazonaws.com/boston.osm.pbf
If Java is not in your OS path, you will have to use the full path instead of just java; e.g. on Windows it would be something like C:\Program Files (x86)\Java\jre7\bin\java.
When it finishes, OSM2PO also starts a built-in web service, which you can check at:
http://localhost:8888/Osm2poService
You should now have an SQL file in the hh folder called: hh_2po_4pgr.sql
Load this file with psql. Note this file will generally be too big to load with pgAdmin, so you need to use psql. If you are on Windows you can create a batch script with something like this:
set PSQL="C:\Program Files\PostgreSQL\9.2\bin\psql"
set PGPORT=5432
set PGHOST=localhost
set PGPASSWORD=something
cd hh
%PSQL% -U postgres -d mydb -q -f "hh_2po_4pgr.sql"
pause
and then execute the batch script by Right Click -> Run As Administrator
For Linux, the settings are much the same, except you use ${PSQL} instead of %PSQL% and drop the set keyword (and export the PG* variables so psql can see them). A Unix/Linux sh script would look something like the one below. If psql is not in your path, you might have to give the full path to the psql executable.
PSQL="psql"
export PGPORT=5432
export PGHOST=localhost
export PGPASSWORD=something
cd hh
${PSQL} -U postgres -d mydb -q -f "hh_2po_4pgr.sql"
Create a view to get the data in the structure wanted by astar_sp_delta:
CREATE OR REPLACE VIEW vw_boston_routing AS
SELECT id As gid, osm_id, osm_name, osm_meta, osm_source_id, osm_target_id,
    clazz, flags, source, target, km, kmh, cost, cost As length, reverse_cost,
    x1, y1, x2, y2, geom_way As the_geom
FROM hh_2po_4pgr;
Okay now for the fun part, putting pgRouting to work.
Let's compute the shortest path using the source 131904 and target 17500 we had pictured. The function that takes advantage of the spatial index is called astar_sp_delta. The alternative, shortest_path_astar (which took about 1 second), takes an SQL statement, but for this example it is about 4 times slower since it doesn't utilize the geometry.
The 0.1 is in units of the spatial reference (in this case degrees) and is the amount by which to expand the bounding box that contains both the source and target vertices.
The astar_sp_delta function returns a table consisting of the columns (id, gid, the_geom), which you can then join back with your input data to get additional elements. The queries below took about 250 ms on my 32-bit PostgreSQL 9.2 on Windows 7.
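In case you are wondering how to pick source and target vertex ids for your own start and end points, one rough approach is to find the edge nearest a lon/lat and use its source (or target) id. A sketch, assuming the hh_2po_4pgr table loaded above (the coordinates are made up; plug in a point near your origin):

-- Find the nearest edge to a lon/lat point and grab its vertex ids
SELECT source, target, osm_name
FROM hh_2po_4pgr
ORDER BY ST_Distance(geom_way, ST_SetSRID(ST_Point(-71.06, 42.36), 4326))
LIMIT 1;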
If you want to map the route, you'll need the geometry, as shown in this query:
SELECT r.id, s.gid, s.osm_name, s.cost, s.km, s.kmh, r.the_geom
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;
If you just want the edges output and don't need to render it, leave out the geometry column.
SELECT r.id, s.osm_name
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;
 id |            osm_name
----+---------------------------------
  1 | Kent Street
  2 | Kent Street
  3 | Beacon Street
  :
 19 | Cambridge Street
 20 | Cambridge Street
  :
 68 | Merrimac Street
  :
 83 | John F. Fitzgerald Surface Road
 84 | State Street
(84 rows)
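If all you want is a summary figure rather than the edge list, you can aggregate over the same join. A minimal sketch (same view and vertices as above) totaling the route length via the km column:

SELECT sum(s.km) As total_km
FROM astar_sp_delta('vw_boston_routing', 131904, 17500, 0.1) As r
    INNER JOIN vw_boston_routing As s ON r.gid = s.gid;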
A common problem encountered in GIS is the Nearest Neighbor problem. In a nutshell the problem is to find the x number of nearest neighbors given a geometry and n geometries of data.
The nearest neighbor problem crops up in other disciplines as well, except that in those disciplines the unit of measurement is not spatial distance but some other distance metric. For this particular talk, I will focus on the GIS nature of the problem and specifically on solving it in PostGIS.
The easiest and most naive way to do this is to do an exhaustive search of the whole space comparing the distances against the geometry of interest and sorting them. This solution while intuitive and easy to write is the slowest way of doing this.
If you were to write such a simple solution in PostgreSQL, then it would take something of the form shown below. The below gives you the nearest 5 neighbors from a reference row with gid = 1.
SELECT g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid = 1 and g1.gid <> g2.gid
ORDER BY ST_Distance(g1.the_geom, g2.the_geom)
LIMIT 5
Now, for moderate to large datasets, this exercise becomes painfully slow because distance is not an indexable function; it involves a relation between two entities.
If you knew something about your data, for example that there are no neighbors past 300 units away from your geometry of interest, or that beyond that point you couldn't care less what lies there, then you could significantly improve the speed of this query by writing it like this:
SELECT g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid = 1 and g1.gid <> g2.gid
    AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
ORDER BY ST_Distance(g1.the_geom, g2.the_geom)
LIMIT 5
If you need to get the nearest neighbor for all records in a table, but you only need the first nearest neighbor for each, then you can use PostgreSQL's distinctive DISTINCT ON syntax, which would look something like this:
SELECT DISTINCT ON(g1.gid) g1.gid As gref_gid, g1.description As gref_description,
    g2.gid As gnn_gid, g2.description As gnn_description
FROM sometable As g1, sometable As g2
WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
ORDER BY g1.gid, ST_Distance(g1.the_geom, g2.the_geom)
If you need to run this for more than one record in sometable, but want a crosstab-like query that lists the n nearest neighbors in one row with neighbors listed as n1, n2, n3, etc., then you can take advantage of PostgreSQL's unique array offerings, which we detail in SQL Idiom Design Patterns. The code below is excerpted from that article.
SELECT m.building_name, m.nn_precincts[1] As nn_precinct1,
    m.nn_precincts[2] As nn_precinct2, m.nn_precincts[3] As nn_precinct3
FROM (SELECT b.building_name, b.the_geom,
        ARRAY(SELECT p.precint_name
              FROM police_precints p
              WHERE ST_DWithin(b.the_geom, p.the_geom, 5000)
              ORDER BY ST_Distance(b.the_geom, p.the_geom)
              LIMIT 3) As nn_precincts
    FROM buildings b) m
ORDER BY m.building_name;
If you need to run this for more than one record in sometable, you can either put it in a loop and run it for each record of interest, or create an SQL function that returns a table for each record and use it as shown below.
CREATE OR REPLACE FUNCTION fnsometable_nn(gid integer, geom1 geometry,
    distguess double precision, numnn integer)
  RETURNS SETOF sometable AS
$BODY$
    SELECT g2.*
    FROM sometable As g2
    WHERE $1 <> g2.gid AND expand($2, $3) && g2.the_geom
    ORDER BY distance(g2.the_geom, $2)
    LIMIT $4
$BODY$
  LANGUAGE 'sql' VOLATILE;
To use the function to run for the first 100 records in sometable and return the 5 closest neighbors for each of those records, you would do this
SELECT g1.gid, g1.description,
    (fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).gid As nn_gid,
    (fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).description As nn_description,
    distance((fnsometable_nn(g1.gid, g1.the_geom, 300, 5)).the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
A lot of people are probably thinking "HUH" at this point. I will now wave my hand and leave it as an exercise for people to figure out why this works. I will say this much. I am using a feature in PostgreSQL which from my readings is an accident of design and only works with functions written in SQL and C language. This accident will probably be later corrected in future versions so don't expect this to work moving forward.
Returning tables in the SELECT part of an SQL statement is super non-standard and, should I say, bizarre and inelegant, but it is currently the only way to do it in PostgreSQL when you have a set-returning function whose inputs depend on values from a related table. This is because PostgreSQL currently does not support subselects and set-returning functions in the FROM clause that dynamically change based on inputs from other table fields. In fact most databases do not support this.
Unfortunately PostgreSQL doesn't support a predicate similar to the CROSS APPLY predicate that SQL Server 2005 has. I think Oracle does by pretty much just leaving out the CROSS APPLY predicate. CROSS APPLY is more elegant looking I think in general and also more self-documenting in the sense that when you see the clause - you know what is being done.
SQL Server, which I use very often, introduced this feature in the SQL Server 2005 offering, and in a few cases it's a life-saver.
I think the next release of PostgreSQL will probably support a feature similar to this. Just to demonstrate how powerful this feature can be: if we had the CROSS APPLY feature, we could do away with our intermediate set-returning function and simply write our query like this.
SELECT g1.gid, g1.description, nn.gid As nn_gid,
    nn.description As nn_description,
    distance(nn.the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
    CROSS APPLY
    (SELECT g2.*
     FROM sometable As g2
     WHERE g1.gid <> g2.gid AND expand(g1.the_geom, 300) && g2.the_geom
     ORDER BY distance(g2.the_geom, g1.the_geom)
     LIMIT 5) As nn
This would alleviate the need to create a new nn function for each table or kind of query we are formulating.
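For what it's worth, PostgreSQL did eventually grow an equivalent: the LATERAL join added in PostgreSQL 9.3. A sketch of the same query using LATERAL (same hypothetical sometable, with ST_DWithin/ST_Distance swapped in for the older expand/distance calls):

SELECT g1.gid, g1.description, nn.gid As nn_gid,
    nn.description As nn_description,
    ST_Distance(nn.the_geom, g1.the_geom) As dist
FROM (SELECT * FROM sometable ORDER BY gid LIMIT 100) g1
    CROSS JOIN LATERAL
    (SELECT g2.*
     FROM sometable As g2
     WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300)
     ORDER BY ST_Distance(g2.the_geom, g1.the_geom)
     LIMIT 5) As nn;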
It's always interesting to see how different databases tackle the same issues. Oracle Locator and Oracle Spatial, for example, have SDO_NN and SDO_NN_DISTANCE functions which magically do all of this for you, but I haven't used them so I am not sure of their speed or caveats. One thing I did notice from reading is that SDO_NN doesn't deal with WHERE clauses too well.
After some heavy brainstorming, I have come up with a faster and more generic solution to calculating nearest neighbors than my previous solutions. For the gory details on how I arrived at this solution - please check out the following blog entries Boston GIS Blog entries on Nearest Neighbor Search.
In the sections that follow I will briefly describe the key parts of the solution, the strengths of the solution, the caveats of the solution, and some general use cases.
The solution is composed of a new PostgreSQL type, and 2 PostgreSQL functions. It involves a sequence of expand boxes that fan out from the bounding box of a geometry to as far out as you want. Below are the descriptions of each part.
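To make the expand-box idea concrete before getting into the individual parts, here is a tiny sketch (just the mechanism, not the packaged function itself) that generates three successively larger search boxes around one geometry of the hypothetical sometable used earlier:

-- Successive search boxes fanning out from one geometry's bounding box
SELECT n As ring, expand(g.the_geom, 100 * n) As search_box
FROM (SELECT the_geom FROM sometable WHERE gid = 1) As g,
     generate_series(1, 3) As n;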
The code can be downloaded here http://www.bostongis.com/downloads/pgis_nn.txt. I will also be posting this to the PostGIS Wiki once I clean it up a bit more.
My objective in creating this solution is to give PostGIS the same expressiveness that Oracle's SDO_NN and SDO_NN_DISTANCE functions provide Oracle. Unlike the Oracle solution, this solution is not implemented as an Operator, but instead as a set returning function.
Features are outlined below
All the data we will be using is in SRID: 26986. For a refresher course on loading data and setting up indexes, check Part 1: Getting Started With PostGIS: An almost Idiot's Guide
SELECT g1.gid As gid_ref, f.name, g1.nn_dist/1609.344 As dist_miles, g1.nn_gid
FROM (SELECT b.gid,
        (pgis_fn_nn(b.the_geom, 1000000, 1, 1000, 'fire_stations', 'true', 'gid', 'the_geom')).*
      FROM (SELECT * FROM buildings LIMIT 1000) b) As g1, fire_stations f
WHERE f.gid = g1.nn_gid
ORDER BY g1.nn_dist DESC
--Total query runtime: 687 ms. 296 rows retrieved.
SELECT g1.gid As gid_ref, g1.street, g1.fraddl, g1.toaddl, f.name,
    g1.nn_dist As dist_meters, g1.nn_gid
FROM (SELECT b.*,
        (pgis_fn_nn(b.the_geom, 1000000, 2, 1000, 'fire_stations', 'true', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl = '02109') b) As g1, fire_stations f
WHERE f.gid = g1.nn_gid
ORDER BY g1.street, g1.fraddl, g1.gid, g1.nn_dist
--Total query runtime: 2656 ms. 296 rows retrieved.
SELECT g1.gid As gid_ref, g1.street, g1.fraddl, g1.toaddl,
    area(b.the_geom) As bldg_area, g1.nn_dist As dist_meters, g1.nn_gid
FROM (SELECT s.*,
        (pgis_fn_nn(s.the_geom, 1000000, 2, 1000, 'buildings', 'area(the_geom) > 5000', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl = '02109') s) As g1, buildings b
WHERE b.gid = g1.nn_gid
ORDER BY g1.street, g1.fraddl, g1.gid, g1.nn_dist
Now for this, I think we can safely assume that no street we care about should be further than 30,000 meters from its nearest school and hospital so we will limit our search to 30,000 meters and dice up our box into 10 expand boxes - so boxes expand at a rate of 3000 meters out per expansion.
--Total query runtime: 5516 ms (~5.5 secs). 291 rows retrieved.
SELECT emer.gid As gid_ref, emer.street, emer.fraddl, emer.toaddl,
    hospitals.name As emer_name, emer.nn_dist As emer_dist_meters,
    schools.name As sch_name, sch.nn_dist As sch_dist_meters
FROM (SELECT b.gid, b.street, b.fraddl, b.toaddl,
        (pgis_fn_nn(b.the_geom, 30000, 1, 10, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
      FROM (SELECT * FROM census2000tiger_arc WHERE zipl IN('02109', '02110')) As b) As emer
    INNER JOIN (SELECT b.gid,
            (pgis_fn_nn(b.the_geom, 30000, 1, 10, 'schools', 'true', 'gid', 'the_geom')).*
        FROM (SELECT * FROM census2000tiger_arc WHERE zipl IN('02109', '02110')) As b) As sch
        ON emer.gid = sch.gid
    INNER JOIN hospitals ON hospitals.gid = emer.nn_gid
    INNER JOIN schools ON schools.gid = sch.nn_gid
ORDER BY emer.street, emer.fraddl, emer.gid
--Total query runtime: 2969 ms. 75 rows retrieved.
SELECT sch.gid As gid_ref, sch.name As sch_name
FROM schools sch
    LEFT JOIN (SELECT b.gid,
            (pgis_fn_nn(b.the_geom, 3000, 1, 2, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
        FROM (SELECT * FROM schools
              WHERE schools.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)) b) As g1
        ON sch.gid = g1.gid
WHERE g1.gid IS NULL
    AND sch.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
ORDER BY sch.name
-- Total query runtime: 63 ms. 75 rows retrieved.
SELECT sch.gid As gid_ref, sch.name As sch_name
FROM schools sch
    LEFT JOIN (SELECT b.gid
        FROM schools b, hospitals h
        WHERE b.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
            AND h.er_status = 'Y'
            AND expand(b.the_geom, 3000) && h.the_geom
            AND distance(b.the_geom, h.the_geom) <= 3000) g1
        ON sch.gid = g1.gid
WHERE g1.gid IS NULL
    AND sch.the_geom && GeomFromText('POLYGON((225446.34375 886435.5625,225446.34375 905270.8125,242278.640625 905270.8125,242278.640625 886435.5625,225446.34375 886435.5625))',26986)
ORDER BY sch.name
For this exercise, we will create 2 new fields in our schools table called nearest_hospital and nearest_hospital_gid and update the fields with the closest hospital to that school. We will set our maximum distance search to 100,000 meters carved out into 10 10,000 meter expansions.
-- Query returned successfully: 2613 rows affected, 20625 ms (~1/3 minute) execution time.
ALTER TABLE schools ADD nearest_hospital character varying(255);
ALTER TABLE schools ADD nearest_hospital_gid integer;

UPDATE schools
SET nearest_hospital = h.name, nearest_hospital_gid = h.gid
FROM (SELECT s.gid As sch_gid,
        (pgis_fn_nn(s.the_geom, 100000, 1, 10, 'hospitals', 'er_status=''Y''', 'gid', 'the_geom')).*
    FROM schools As s) As nn_emer
    INNER JOIN hospitals h ON nn_emer.nn_gid = h.gid
WHERE schools.gid = nn_emer.sch_gid
Later on we realize we need an alternate hospital in case the first hospital is full or a traffic situation keeps us from getting there, so we create a new set of fields to hold the second closest hospital. This particular exercise demonstrates a where clause that depends on additional attribute data from the reference record.
-- Query returned successfully: 2613 rows affected, 23032 ms (~1/3 minute) execution time.
ALTER TABLE schools ADD nearest_hospital_2 character varying(255);
ALTER TABLE schools ADD nearest_hospital_gid_2 integer;

UPDATE schools
SET nearest_hospital_2 = h.name, nearest_hospital_gid_2 = h.gid
FROM (SELECT s.gid As sch_gid,
        (pgis_fn_nn(s.the_geom, 100000, 1, 10, 'hospitals',
            'er_status=''Y'' AND gid <> ' || CAST(s.nearest_hospital_gid As varchar(20)),
            'gid', 'the_geom')).*
    FROM schools As s) As nn_emer
    INNER JOIN hospitals h ON nn_emer.nn_gid = h.gid
WHERE schools.gid = nn_emer.sch_gid
    AND nearest_hospital_gid IS NOT NULL
PostGIS is an open source, freely available, and fairly OGC compliant spatial database extender for the PostgreSQL Database Management System. In a nutshell it adds spatial functions such as distance, area, union, and intersection, as well as specialty geometry, geography, and raster data types, to the database. PostGIS is very similar in functionality to SQL Server spatial support, ESRI ArcSDE, Oracle Spatial, and the DB2 spatial extender, except it has more functionality and generally better performance than all of those (and it won't bankrupt you). The latest release version now comes packaged with the PostgreSQL DBMS installs as an optional add-on. As of this writing PostGIS 3.3.2 is the latest stable release and PostGIS 3.4.0 will be coming out within the next year. See what's new in PostGIS 3.3 and new in the upcoming PostGIS 3.4. Notable features of PostGIS:
The PostGIS Windows bundle also includes cool PostGIS related extensions that augment your spatial enjoyment. In the bundle you will also find:
We will assume a windows environment for this tutorial, but most of the tutorial will apply to other supported platforms such as Linux, Unix, BSD, Mac etc. We will also be using Massachusetts/Boston data for these examples. For desktop users, the EnterpriseDB one-click installer exists as well for Mac/OSX and Linux desktops, so you should be able to follow along without too much fuss.
We will not go into too much detail here since the install wizard (at least the windows one) is pretty good. Below are the basic steps.
You can upgrade from 2.+ to 3.+ without having to back up and restore your db; after you've installed the latest PostGIS, running the following will do the trick:
ALTER EXTENSION postgis UPDATE;
For versions of PostGIS from 2.5.3+ you can use SELECT postgis_extensions_upgrade(); to upgrade postgis, postgis_raster, postgis_topology, postgis_sfcgal, and postgis_tiger_geocoder.
SELECT postgis_extensions_upgrade();
There is a postgis sampler option, which creates a regular spatial db with all the extensions installed. The dumper/loader command-line and GUI tools in the PostgreSQL bin folder will get overwritten by the last PostGIS you installed, so be careful. Generally speaking, PostGIS 3.1 should work just fine everywhere you were using PostGIS before. If you are upgrading from PostGIS < 2.2, you may need to install the legacy_minimal.sql or legacy.sql scripts packaged in the PostgreSQL share/contrib/postgis-3.1 folder.
CREATE EXTENSION postgis;
The create spatial database checkbox is optional, and we generally uncheck it. It creates a spatial database for you to experiment with and has all the extensions packaged with PostGIS Bundle pre-installed.
ALTER SYSTEM (for 9.4+)
For those of you who want to try experimental builds: we have experimental Windows builds made as part of our continuous integration process whenever anything changes in the PostGIS code base. These can be downloaded from http://postgis.net/windows_downloads.
You can use something like psql or the pgAdmin query window to create a database and spatially enable your database. This is the way to go if you have only a terminal interface to your db or you have a butt load of extensions you want to enable. Your psql steps would look something like this:
To install a bunch of extensions, just open up the pgAdmin SQL Query window (which we'll cover shortly) or psql and run this including only the extensions you want.
CREATE DATABASE gisdb;
\connect gisdb;
-- Enable PostGIS (includes raster)
CREATE EXTENSION postgis;
-- Enable Topology
CREATE EXTENSION postgis_topology;
-- Enable PostGIS Advanced 3D
-- and other geoprocessing algorithms
CREATE EXTENSION postgis_sfcgal;
-- fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;
-- rule based standardizer
CREATE EXTENSION address_standardizer;
-- example rule data set
CREATE EXTENSION address_standardizer_data_us;
-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;
-- routing functionality
CREATE EXTENSION pgrouting;
-- spatial foreign data wrappers
CREATE EXTENSION ogr_fdw;
-- LIDAR support
CREATE EXTENSION pointcloud;
-- LIDAR Point cloud patches to geometry type cases
CREATE EXTENSION pointcloud_postgis;
-- Uber h3 hexagon indexing scheme for PostGIS 3.3.2+ bundles
CREATE EXTENSION h3;
-- converts between h3 index representations
-- and postgis geometry/geography
CREATE EXTENSION h3_postgis;
PostgreSQL comes packaged with a fairly decent admin tool called pgAdmin. If you are a newbie or not sure which extensions you want, it's best just to use that tool to create a new database and look at the menu of extensions available to you.
On windows PgAdmin is under Start->Programs->PostgreSQL 14->PgAdmin 4
You can also run pgAdmin on a machine other than the server by installing it separately, downloading it from https://pgadmin.org.
In the pg_hba.conf file, change the line host all all 127.0.0.1/32 md5 to the following (and similarly for the IPv6 line):
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
The ::1/128 is usually the controlling one and is what localhost resolves to in IPV6 so you'll want to set this one.
This will allow any person logging in locally to the computer that PostgreSQL is installed on to access all databases without a password. (127.0.0.1/32 means localhost only; 32 is the bit mask.) Note you can add additional lines to this file or remove lines to allow or block certain IP ranges. The lines highest in the file take precedence. So, for example, if you wanted to allow all users access as long as they successfully authenticate with an md5 password, then you can add the line
host all all 0.0.0.0/0 md5
To install the adminpack extension, switch to the postgres database and run this command in the SQL window of pgAdmin or using psql:
CREATE EXTENSION adminpack;
(NOTE: you can also use the extensions gui of PgAdmin to install in the postgres db)
Verify you have the newest PostGIS with this query:
SELECT postgis_full_version();
Should output something of the form
POSTGIS="3.3.2 3.3.2" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1" SFCGAL="SFCGAL 1.4.1, CGAL 5.3, BOOST 1.78.0" PROJ="7.2.1" GDAL="GDAL 3.4.3, released 2022/04/22" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER
Now we have a nice fully functional GIS database with no spatial data. So to do some neat stuff, we need to get some data to play with.
Download data from the MassGIS site. For this simple exercise just download Towns. Extract the files into some folder; we will only be using the _POLY files for this exercise.
NOTE: Someone asked how you extract the file if you are on a linux box.
If you are on Linux/Unix, I find the exercise even easier. If you have wget handy, you can do the below to download the file after you have cd'ed into the folder you want to put it in.
wget http://wsgw.mass.gov/data/gispub/shape/state/boundaries.zip
Now to extract it simply do the following from a shell prompt
unzip boundaries.zip
You will notice one of the files it extracts is called BOUNDARY_POLY.prj. A .prj is often included with ESRI shape files and tells you the projection of the data. We'll need to match this descriptive projection to an SRID (the id field of a spatial ref record in the spatial_ref_sys table) if we ever want to reproject our data.
SELECT srid, srtext, proj4text
FROM spatial_ref_sys
WHERE srtext ILIKE '%Massachusetts%';
The easiest data to load into PostGIS is ESRI shape data since PostGIS comes packaged with a nice command line tool called shp2pgsql which converts ESRI shape files into PostGIS specific SQL statements that can then be loaded into a PostGIS database.
This file is located in the PostgreSQL bin folder, whose default location on Windows is Program Files/PostgreSQL/15/bin.
Since these files are buried so deep, they are a bit annoying to navigate to. To create a self-contained toolkit you can carry with you anywhere, copy the following files from the bin folder into, say, c:\pgutils:
comerr32.dll krb5_32.dll libeay32.dll libiconv-2.dll libintl-2.dll libpq.dll pgsql2shp.exe psql.exe pg_dump.exe pg_restore.exe shp2pgsql.exe ssleay32.dll libproj-9.dll geos_c.dll geos
Note: The GUI loader is packaged as a self-contained postgisgui folder in the bin of your PostgreSQL install. If you prefer the GUI interface, you can copy that folder and run the shp2pgsql-gui.exe file from anywhere even an external file network path.
c:\pgutils\shp2pgsql -s 26986 BOUNDARY_POLY towns > towns.sql
psql -d gisdb -h localhost -U postgres -f towns.sql
One thing you can do with the shp2pgsql command line version packaged with PostGIS 2.0-2.2 that you can't do with the GUI is a spatial transformation from one coordinate system to another. So with the command line, we can transform to 4326 (WGS 84) and load to a geography type with a single command. Hopefully we'll see this in the GUI in a future release.
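If you already loaded through the GUI and still want a lon/lat copy, one workaround (a sketch, not a loader feature; the column names follow the towns table loaded above and the MultiPolygon type is assumed from the polygon load) is to reproject after the fact with plain SQL:

-- Add a WGS 84 lon/lat geometry column alongside the loaded 26986 one
ALTER TABLE towns ADD COLUMN geom_4326 geometry(MultiPolygon, 4326);
UPDATE towns SET geom_4326 = ST_Multi(ST_Transform(geom, 4326));
CREATE INDEX idx_towns_geom_4326 ON towns USING gist(geom_4326);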
Table indexes are very important for speeding up the processing of most queries. There are also downsides to indexes, namely the following:
Given the above, it is often times tricky to have a good balance. There are a couple general rules of thumb to go by that will help you a long way.
The most common queries we will be doing on this table are spatial queries and queries by the town field, so we will create two indexes on these fields. NOTE: The loader has an option to create the spatial index, which we took advantage of, so the spatial index here is not strictly necessary; we present it just so that if you ever need to create a spatial index, for example for CSV-loaded data, you know how.
CREATE INDEX idx_towns_geom ON towns USING gist(geom);
CREATE INDEX idx_towns_town ON towns USING btree(town);
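After a bulk load and index creation, it also helps to refresh the planner statistics so the new indexes get used effectively; for example:

VACUUM ANALYZE towns;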
Go back into pgAdmin and refresh your view. Verify that you have a towns table now.
SELECT ST_Extent(geom) FROM towns WHERE town = 'BOSTON';
SELECT ST_Area(ST_Union(geom)) FROM towns WHERE town = 'BOSTON';
If you are a GIS newbie, I highly recommend using QGIS. QGIS has ability to view PostGIS data both geometry and raster directly, do simple filters on it, is free, is cross-platform (Linux, Windows, MacOSX,Unix) and is the least threatening of all the GIS Viewers I have seen out there for people new to GIS.
PostGIS is an open source, freely available, and fairly OGC compliant spatial database extender for the PostgreSQL Database Management System. In a nutshell it adds spatial functions such as distance, area, union, intersection, and specialty geometry data types to the database. PostGIS is very similar in functionality to SQL Server 2008 spatial support, ESRI ArcSDE, Oracle Spatial, and the DB2 spatial extender. The latest release version now comes packaged with the PostgreSQL DBMS installs as an optional add-on. As of this writing PostGIS 2.0.0 is the latest stable release. Notable enhancements in this release:
CREATE EXTENSION postgis
ALTER EXTENSION postgis ..
Note for Vista Users Because of the new added security in Vista, you may run into issues installing PostgreSQL. Please refer to the Windows Vista gotchas http://trac.osgeo.org/postgis/wiki/UsersWikiWinVista in the PostGIS wiki if you run into issues.
The create spatial database checkbox is optional, and we generally uncheck it. It creates a spatial database for you to experiment with, but the template_postgis20 is always created for windows. If you are running PostgreSQL 9.1+, we recommend you don't even bother with the template database and just use CREATE EXTENSION postgis in the database of your choosing or use PgAdmin Extensions install feature which we will cover in this tutorial.
For those of you who want to try experimental builds: we have experimental Windows builds made weekly, or as frequently as anything interesting happens in the PostGIS code base. These can be downloaded from http://postgis.net/windows_downloads. For those who want to test out 2.0.1SVN, just replace the postgis-2.0.dll in the PostgreSQL lib folder with the one in the zip and run the respective postgis upgrade sql file in share/contrib/postgis-2.0, or if you are using extensions:
ALTER EXTENSION postgis UPDATE TO "2.0.1SVN"
PostgreSQL comes packaged with a fairly decent admin tool called PgAdmin III. If you are a newbie, it's best just to use that tool to create a new database.
On Windows the file is located at C:\Program Files\PostgreSQL\9.0\share\contrib\adminpack.sql for versions of PostgreSQL below 9.1. It is located in the share folder of Linux installs as well. To install it, switch to the postgres database and run the adminpack.sql script in that database.
If you are running 9.1, however, it doesn't matter which OS you are running with -- just connect to your postgres database and run this command in the sql window of PgAdmin or using psql: CREATE EXTENSION adminpack;
UPDATE: The remaining steps in this section are not needed if you chose template_postgis20 for your new database. However, if you are trying to spatially enable an existing database or you didn't get the template_postgis20 option, do the remaining steps.
For PostgreSQL 9.1 -- expand your database and you should see an Extensions section; right-mouse click on it and select the New Extension option.
Note the two options -- postgis and postgis_topology -- ain't that a beaut. This will work even if your PostgreSQL server is not on Windows. Select postgis (this includes geometry, geography, and raster support). Then, if you want to use topology, repeat the step picking postgis_topology. Also note the Definition tab, which allows you to install postgis in a non-default schema and will also give you a painless way to upgrade to PostGIS 2.0.1 when that comes out, by changing the version number in the drop down.
If you are unlucky enough to be using PostGIS on pre-9.1 and also don't have a template database, things are a bit tougher. The pre-PostgreSQL 9.1 way: go to Tools -> Query tool in PgAdmin III and browse to the PostGIS install scripts in the PostgreSQL contrib folder (on Windows the default install location is Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0). You'll need to run all these scripts (see the psql sketch after the list):
postgis.sql
rtpostgis.sql
spatial_ref_sys.sql
topology.sql (if you want topology support)
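If you prefer the command line to the PgAdmin query tool, a rough psql sketch would be as follows (the database name mygisdb and the script path are assumptions -- adjust both to your install):

psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\postgis.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\rtpostgis.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\spatial_ref_sys.sql"
psql -U postgres -d mygisdb -f "C:\Program Files\PostgreSQL\8.4\share\contrib\postgis-2.0\topology.sql"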
As of PgAdmin III 1.10, the Plugin icon has PSQL as an option, and if you responded Yes to the Overwrite my plugins.ini prompt (for 9.1 you won't get a prompt since these files are stored separately), you should get an additional PostGIS Shapefile and DBF Loader option. This option becomes un-greyed when you select a database, and when you launch it, it passes the database credentials in for you and opens a PSQL connection.
NOTE: Someone asked how you extract the file if you are on a linux box. ---FOR LINUX USERS ---
wget http://wsgw.mass.gov/data/gispub/shape/state/townssurvey_shp.exe
unzip townssurvey_shp.exe
NOTE: As of PostGIS 1.5.0, the Windows build is packaged with a shp2pgsql graphical user interface. You can also download the GUI separately if you are using a lower version of PostGIS, or if you want to load data from a separate workstation that doesn't have PostgreSQL installed; the standalone download works with any version of PostGIS from 1.2 to 2.0.0 and can be enabled as a plug-in in PgAdmin III. Check out our screencast on configuring shp2pgsql-gui as a PgAdmin III plug-in and using it, or our write-up on registering it.
You will notice one of the files it extracts is called TOWNS_POLY.prj. A .prj is often included with ESRI shape files and tells you the projection of the data. We'll need to match this descriptive projection to an SRID (the id field of a spatial ref record in the spatial_ref_sys table) if we ever want to reproject our data.
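If you ever need to hunt down that SRID yourself, one approach (a sketch, using PostGIS) is to search the spatial_ref_sys table by name and eyeball the matches:

SELECT srid, srtext FROM spatial_ref_sys WHERE srtext LIKE '%Massachusetts%';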
This file is located in the PostgreSQL bin folder, which on Windows defaults to Program Files/PostgreSQL/8.4/bin
comerr32.dll
krb5_32.dll
libeay32.dll
libiconv-2.dll
libintl-2.dll
libpq.dll
pgsql2shp.exe
psql.exe
pg_dump.exe
pg_restore.exe
shp2pgsql.exe
ssleay32.dll
c:\pgutils\shp2pgsql -s 26986 TOWNS_POLY towns > towns.sql
One thing you can do with the shp2pgsql command-line loader packaged with PostGIS 2.0 that you can't do with the GUI is a spatial transformation from one coordinate system to another. So with the command line, we can transform to 4326 (WGS 84) and load to the geography type with a single command. Hopefully we'll see this in the GUI in PostGIS 2.0.1 or 2.1.0.
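As a hedged sketch of what that one-step reproject-and-load might look like with the 2.0 command-line loader (the output table name towns_geog is made up for illustration; check shp2pgsql -? on your install to confirm the -s from:to reprojection and -G geography flags behave this way for you):

c:\pgutils\shp2pgsql -s 26986:4326 -G TOWNS_POLY towns_geog > towns_geog.sql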
The most common queries we will run against this table are spatial queries and queries by the town field, so we will create 2 indexes on these fields. NOTE: The loader has an option to create the spatial index, which we took advantage of, so the spatial index statement is not necessary here; keep it in mind for cases where you forgot to enable the index option in the loader.
For PostGIS installations of 1.2.2 and above, the preferred function names start with ST_
Old syntax, pre PostGIS 1.2.2 -- this will not work in PostGIS 1.4+. If you have old code like this, change it to the above syntax. We have crossed out the code below to demonstrate that it is BAD:
SELECT Extent(geom) from towns where town = 'BOSTON';SELECT Area(GeomUnion(geom)) FROM towns where town = 'BOSTON';
Most functions in new PostGIS installs just gained an ST_ prefix, except for GeomUnion, which became ST_Union. The other difference is that the relational functions with ST_ now automagically use index operators, whereas with the ones without ST_ you need to add an explicit && call.
Example: a.geom && b.geom AND Intersects(a.geom,b.geom) can simply be written as ST_Intersects(a.geom, b.geom)
a.geom && b.geom AND Intersects(a.geom,b.geom)
ST_Intersects(a.geom, b.geom)
If the above gives you an error such as mixed SRIDs, most likely you are running 1.3.2 postgis which was very defective. Upgrade to 1.3.3 at your next opportunity. To verify your install - SELECT postgis_full_version();
If you are a GIS newbie, I highly recommend using Quantum GIS. Quantum GIS can view PostGIS data directly, lets you apply simple filters to it, is free, is cross-platform (Linux, Windows, Mac OSX, Unix), and is the least threatening of all the GIS viewers I have seen for people new to GIS.
One of the greatest things about spatial relational databases is that they bring GIS to a new level by letting you apply the expressive, declarative SQL language to the spatial domain. With a spatial relational database, you can easily answer questions such as "What is the average household income of a neighborhood block?" and "What political district does X reside in?". This new animal that marries SQL with GIS is called Simple Features for SQL (SFSQL). In essence, SFSQL introduces a new set of functions and aggregate functions to the SQL language.
In this next section we'll go over some common queries utilizing the data we downloaded in Part 1. Although we are using PostGIS for this exercise, most spatial relational databases, such as Oracle Spatial, ArcSDE, and DB2 Spatial Extender, have similar syntax.
As noted in Part 1, the data we downloaded is in NAD 83 Meters MA State Plane. What if we needed the data in NAD 83 feet? To accomplish this, we would need to transform our spatial data to the new coordinate system with the transform function. If you look in the spatial_ref_sys table you'll notice that srid = 2249 is the srid of the NAD 83 ft MA State Plane. Our query would look something like this:
SELECT town, ST_Transform(geom, 2249) as geom_nad83ft FROM towns
In order to get the latitude and longitude of our data, we need our coordinate reference system to be in some form of longlat. To begin with, we first transform our data from NAD 83 Meters Massachusetts State Plane to some variant of longlat -- the closest match is NAD 83 North American Datum (srid = 4269). Then we find the centroid, and then the x and y coordinates of that.
SELECT town, ST_X(ST_Centroid(ST_Transform(geom, 4269))) as longitude,
ST_Y(ST_Centroid(ST_Transform(geom, 4269))) as latitude FROM towns
Spatial aggregate functions are much like regular SQL aggregate functions such as AVG, SUM, COUNT in that they work with GROUP BY and HAVING predicates and collapse multiple records into a single record. If you are unfamiliar with the above terms - take a look at Summarizing data with SQL (Structured Query Language)
The extent function is an aggregate function that gives you the bounding box of a set of geometries. It is especially useful for determining the bounding rectangle of the area you are trying to map. For example, if we wanted to find the bounding box of the Boston area in NAD 83 feet, we would do something like this:
SELECT town, ST_Extent(ST_Transform(geom, 2249)) as the_extent FROM towns WHERE town = 'BOSTON' GROUP BY town
The ST_Union function is an aggregate function that takes a set of geometries and unions them together into a single geometry. In versions prior to PostGIS 1.2 this was called geomunion, and the old name was completely removed in PostGIS 2.0. For our towns data, we may have multiple records per town. To get a single geometry that represents the total region of a town, we could use ST_Union like the example below.
select town, ST_Union(geom) as thegeom from towns group by town;
It is important to note that while the above query gives you one record per town, our original plain vanilla query below returns as many records per town as there are geometries for that town:
select town, geom as thegeom from towns;
To get a visual sense of what all these different queries look like, you can dump the above outputs out as an ESRI shapefile using the pgsql2shp tool and view it in a shape viewer such as Quantum GIS, or use the query plugin tools to output directly from the database.
pgsql2shp -f myshp -h myserver -u apguser -P apgpassword -g thegeom mygisdb "select town, ST_Union(geom) as thegeom from towns group by town"
One caveat: the shape dumper utility can only dump out fields of type geometry. So, for example, to dump out a box type such as what is returned by the extent function, you'll need to cast the output to a geometry, something like:
SELECT town, ST_SetSRID(ST_Extent(ST_Transform(geom, 2249))::geometry,2249) as theextent FROM towns WHERE town = 'BOSTON' GROUP BY town
One common question asked of a spatial database is how far one object is from another. PostGIS, like similar spatial databases, has a distance function (ST_Distance) which will tell you the minimum distance between one geometry and another.
An example of such a use is below. The query will tell you how far each zip in your zipcode table is from zip 02109. In this case we are comparing the field geom_meter, which we have in NAD 83 State Plane meters. Distance is always measured in the units of your spatial reference system.
SELECT z.zipcode As zip1, z2.zipcode As zip2, ST_Distance(z.geom_meter,z2.geom_meter) As thedistance FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109'
If the above geometries were polygons, then we would be getting the minimal distance from any point on each polygon to the other. For point geometries there is no distinction between a min distance and other distances.
If we are not dealing with just points, then there are all kinds of distances we could be asking for. For example, if we wanted the average distance between two polygons, we would want the distance between their centroids. An average distance comparison would then look like:
SELECT z.zipcode As zip1, z2.zipcode As zip2, ST_Distance(ST_Centroid(z.geom_meter),ST_Centroid(z2.geom_meter)) As averagedistance FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109'
An often asked question is "What zipcodes are within x of my zip?". To answer that, one would think that simply checking distance < x would suffice. It would, except it would be slow, since it doesn't take advantage of PostGIS indexes, and if the geometries being compared are fairly complex, the search can be exhaustive. To write the query in the most efficient way, we use the ST_DWithin function, like so:
SELECT z.zipcode FROM zipcode z, zipcode z2 WHERE z2.zipcode = '02109' AND ST_DWithin(z2.geom_meter, z.geom_meter, 3000)
The ST_DWithin takes the bounding box of a geometry and expands it out X in all directions and then does a more exhaustive distance check on the actual geometries. So in this case we would be expanding our 02109 zip geometry out 3000 meters.
Internally it uses the && operator, which is the bounding-box intersects operator: it returns true if two geometries' bounding boxes intersect and false if they do not. This is much faster, especially if you have indexes on your geometries, because it utilizes the bounding box index.
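To see roughly what ST_DWithin is saving you from writing, here is an illustrative long-hand equivalent that pairs ST_Expand with the && bounding-box filter and an exact distance check (a sketch only -- ST_DWithin remains the recommended form):

SELECT z.zipcode
FROM zipcode z, zipcode z2
WHERE z2.zipcode = '02109'
	AND ST_Expand(z2.geom_meter, 3000) && z.geom_meter
	AND ST_Distance(z2.geom_meter, z.geom_meter) < 3000;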
Often you will receive data in a non-spatial form such as comma delimited data with latitude and longitude fields. To take full advantage of PostGIS spatial abilities, you will want to create geometry fields in your new table and update that field using the longitude latitude fields you have available.
General note: all the command statements that follow should be run from the PgAdmin III Tools -> Query Tool or any other PostgreSQL administrative tool you have available. If you are a command line freak, you can use the psql command line tool packaged with PostgreSQL.
For this exercise, we will use US zip code tabulation areas instead of just Boston data. The techniques here will apply to pretty much any data you get.
First step is to download the data from US Census. http://www.census.gov/geo/www/gazetteer/places2k.html
PostgreSQL comes with a COPY command that allows you to import data from a delimited text file. Since the ZCTA data is provided in fixed-width format, we can't import it easily without first converting it to a delimited format, such as the default tab-delimited format that COPY works with. Similarly, for data in other formats such as DBF, you'll either want to convert it to delimited text using tools such as Excel, use a third-party tool that converts from one format to another, or use one of my favorite tools, Microsoft Access, which allows you to link tables or do a straight import and export to any ODBC-compliant database such as PostgreSQL.
For fixed-width data such as the ZCTAs, we demonstrate a more generic way of importing fixed-width data into PostgreSQL that doesn't require any additional tools beyond what is packaged with PostgreSQL.
First you will need to create the table in Postgres. You want to make sure the order of the fields is in the same order as the data you are importing.
CREATE TABLE zctas
(
	state char(2),
	zcta char(5),
	junk varchar(100),
	population_tot int8,
	housing_tot int8,
	water_area_meter float8,
	land_area_meter float8,
	water_area_mile float8,
	land_area_mile float8,
	latitude float8,
	longitude float8
) WITHOUT OIDS;
For this part of the exercise, I'm going to use Microsoft Excel because it has a nice wizard for dealing with fixed-width files and a lot of Windows users already have it. If you open the zcta file in Excel, it should launch the Text Import Wizard. MS Access has a similarly nice wizard and can deal with files larger than Excel's 65,000-odd row limitation. Note there are countless ways to do this step, so I'm not going to bother going over the others. For non-MS Office users, other office suites such as OpenOffice have similar functionality.
Now copy the data into the table using the COPY command. Note the COPY command runs under the PostgreSQL service, so the file location must be specified relative to the server.
COPY zctas FROM 'C:/Downloads/GISData/zcta5.tab';
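If your converted file ended up comma-separated with a header row rather than tab-delimited (an assumption about how you exported it), COPY can handle that too, e.g.:

COPY zctas FROM 'C:/Downloads/GISData/zcta5.csv' WITH CSV HEADER;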
To create the Geometry field, use the AddGeometryColumn opengis function. This will add a geometry field to the specified table as well as adding a record to the geometry_columns meta table and creating useful constraints on the new field. A summary of the function can be found here http://postgis.refractions.net/docs/ch06.html#id2526109.
SELECT AddGeometryColumn( 'public', 'zctas', 'thepoint_lonlat', 4269, 'POINT', 2 );
The above code will create a geometry column named thepoint_lonlat in the table zctas that validates to make sure the inputs are 2-dimensional points in SRID 4269 (NAD 83 longlat).
UPDATE zctas SET thepoint_lonlat = ST_SetSRID(ST_Point(longitude, latitude), 4269);
The above code will generate a PostGIS point geometry object of spatial reference SRID 4269.
There are a couple of things I would like to point out that may not be immediately clear to people not familiar with PostgreSQL or PostGIS.
The above is great if you want your geometry in longlat spatial reference system. In many cases, longlat is not terribly useful. For example if you want to do distance queries with your data, you don't want your distance returned back in longlat. You want it in a metric that you normally measure things in.
In the code below, we will create a new geometry field that holds points in the WGS 84 North Meter reference system and then updates that field accordingly.
SELECT AddGeometryColumn( 'public', 'zctas', 'thepoint_meter', 32661, 'POINT', 2 );
UPDATE zctas SET thepoint_meter = ST_Transform(ST_SetSRID(ST_MakePoint(longitude, latitude), 4269), 32661);
Please note in earlier versions of this article I had suggested doing the above with
UPDATE zctas SET thepoint_meter = transform(PointFromText('POINT(' || longitude || ' ' || latitude || ')',4269),32661) ;
On further experimentation, it turns out that PointFromText is perhaps the slowest way of doing this in PostGIS. Using a combination of ST_SetSRID and ST_Point is on the order of 7 to 10 times faster, at least for PostGIS 1.2 and above. ST_GeomFromText comes in second (replace ST_PointFromText with ST_GeomFromText) at about 1.5 to 3 times slower than ST_SetSRID with ST_Point. See about PointFromText for why PointFromText is probably slower. With ST_GeomFromText there appear to be some caching effects, so on first run with similar datasets ST_GeomFromText starts out about 3 to 4 times slower than the ST_MakePoint (ST_Point) approach and then catches up to about 1.5 times slower. I tested this on a dataset of about 150,000 records: roughly 26 secs for ST_PointFromText fairly consistently, 10.7 secs for the first run of ST_GeomFromText then 4.1 secs for each consecutive run, and 3.5 secs fairly consistently for ST_SetSRID with ST_MakePoint (on a dual Xeon 2.8 GHz, Windows 2003 32-bit with 2 GB RAM).
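For reference, the ST_GeomFromText variant mentioned above would look something like this (same table and columns as before; shown only for comparison):

UPDATE zctas
	SET thepoint_meter = ST_Transform(ST_GeomFromText('POINT(' || longitude || ' ' || latitude || ')', 4269), 32661);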
One of the number one reasons for poor query performance is lack of attention to indexes. Putting in an index can make as much as a 100 fold difference in query speed depending on how many records you have in the table. For large updates and imports, you should put your indexes in after the load, because while indexes help query speed, updates against indexed fields can be very slow because they need to create index records for the updated/inserted data. In the below, we will be putting in GIST indexes against our spatial fields.
CREATE INDEX idx_zctas_thepoint_lonlat ON zctas USING GIST (thepoint_lonlat);
CREATE INDEX idx_zctas_thepoint_meter ON zctas USING GIST (thepoint_meter);
ALTER TABLE zctas ALTER COLUMN thepoint_meter SET NOT NULL;
CLUSTER idx_zctas_thepoint_meter ON zctas;
VACUUM ANALYZE zctas;
In the above after we create the indexes, we put in a constraint to not allow nulls in the thepoint_meter field. The not null constraint is required for clustering since as of now, clustering is not allowed on gist indexes that have null values. Next we cluster on this index. Clustering basically physically reorders the table in the order of the index. In general spatial queries are much slower than attribute based queries, so if you do a fair amount of spatial queries especially bounding box queries, you get a significant gain.
Please note, for those familiar with CLUSTER in SQL Server: CLUSTER in PostgreSQL means pretty much the same thing, except that in PostgreSQL (at least for versions below 8.3; not sure about 8.3 and above), if you cluster and then add or update data, the system does not keep your new or updated data ordered according to the cluster the way SQL Server does. The upside is that there is no additional penalty in production with clustering as you have in SQL Server. The downside is that you must set up a job of some sort to maintain the cluster for tables that are constantly updated.
In the above, we vacuum analyze the table to ensure that the index statistics are updated for our table.
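Tying back to the cluster-maintenance note above: if you do set up a job to keep a heavily updated table physically ordered, it can be as simple as the sketch below, since PostgreSQL remembers which index a table was last clustered on.

CLUSTER zctas;
VACUUM ANALYZE zctas;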
R is both a language and an environment for doing statistical analysis. R is available as free software under the GPL. For those familiar with environments such as S, MatLab, and SAS, R serves the same purpose. It has powerful constructs for manipulating arrays, packages for importing data from various data sources such as relational databases, CSV and DBF files, and spreadsheets, plus an environment for doing graphical plotting and outputting the results to screen, printer, or file.
For more information on R check out R Project for Statistical Computing.
PL/R is a PostgreSQL language extension that allows you to write PostgreSQL functions and aggregate functions in the R statistical computing language.
With the R language you can write such things as an aggregate median function, which doesn't exist natively in PostgreSQL and exists natively in only a few relational databases I can think of (e.g. Oracle, and even there it didn't appear until version 10).
Another popular use of R is for building Voronoi diagrams using the Delaunay Triangulation and Dirichlet (Voronoi) Tessellation (deldir) package from the Comprehensive R Archive Network (CRAN). When you combine this with PostGIS you have an extremely powerful environment for doing such things as nearest neighbor searches and facility planning.
In the past, PL/R was only supported on PostgreSQL Unix/Linux/Mac OSX environments. That has since changed, and PL/R can now run on Windows PostgreSQL installs too. For most of this exercise we will focus on the Windows install, but will provide links to instructions for Linux/Unix/Mac OSX installs.
In order to use PLR, you must first have the R-Language environment installed on the server you have PostgreSQL on. In the next couple of sections, we'll provide step by step instructions.
update.packages()
Now that you have both PostgreSQL and R installed, you are ready to install the PL/R procedural language for PostgreSQL.
In order to start using PL/R in a database, you need to load the help functions in the database. To do so do the following.
For users running PostgreSQL 9.1+, install by typing in SQL window:
CREATE EXTENSION plr;
If you are running PostgreSQL 9.0 or lower, you have to install using the plr.sql file. Choose File -> Open -> path/to/PostgreSQL/8.4/contrib/plr.sql (NOTE: on Windows the default location is C:\Program Files\PostgreSQL\8.4\contrib\plr.sql).
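Equivalently, you can run the same script from the command line; a sketch, assuming the default Windows path and a database called yourdb:

psql -U postgres -d yourdb -f "C:\Program Files\PostgreSQL\8.4\contrib\plr.sql"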
Next run the following commands from PgAdmin III or psql to test out R:
SELECT * FROM plr_environ(); SELECT load_r_typenames(); SELECT * FROM r_typenames(); SELECT plr_array_accum('{23,35}', 42);
Next try to create a helper function (this was copied from http://www.joeconway.com/plr/doc/plr-pgsql-support-funcs.html) and test with the following:
CREATE OR REPLACE FUNCTION plr_array (text, text) RETURNS text[] AS '$libdir/plr','plr_array' LANGUAGE 'C' WITH (isstrict); select plr_array('hello','world');
Below is a link for creating a median aggregate function. This basically creates a stub aggregate function that calls the median function in R: http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html NOTE: I ran into a problem here installing median from the plr-aggregate-funcs via PgAdmin; it gave an R parse error when trying to use the function. I had to install the median function by removing all the carriage returns (\r\n), so put the whole median function body on a single line like below to be safe. Evidently when copying from IE, IE puts in carriage returns instead of Unix line breaks. When creating PL/R functions, make sure to use Unix line breaks instead of Windows carriage returns, by using an editor such as Notepad++ that lets you specify Unix line breaks.
create or replace function r_median(_float8) returns float as 'median(arg1)' language 'plr';

CREATE AGGREGATE median (
	sfunc = plr_array_accum,
	basetype = float8,
	stype = _float8,
	finalfunc = r_median
);

create table foo(f0 int, f1 text, f2 float8);
insert into foo values(1,'cat1',1.21);
insert into foo values(2,'cat1',1.24);
insert into foo values(3,'cat1',1.18);
insert into foo values(4,'cat1',1.26);
insert into foo values(5,'cat1',1.15);
insert into foo values(6,'cat2',1.15);
insert into foo values(7,'cat2',1.26);
insert into foo values(8,'cat2',1.32);
insert into foo values(9,'cat2',1.30);

select f1, median(f2) from foo group by f1 order by f1;
In the next part of this series, we will cover using PL/R in conjunction with PostGIS.
In this tutorial we will explore using PostGIS and PL/R together, with some examples we will quickly run through.
If you missed part one of our series and are not familiar with R or PL/R, please read Getting started with PostGIS (http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01) and PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide (http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01).
Next create a new PostgreSQL database using the template_postgis database as template and load in the PLR support functions as described in the above article. For this particular exercise, we will assume the database is called testplrpostgis.
We will be working with census data from MassGIS:
Click "Download This Layer" - census2000blockgroups_poly.exe from http://www.mass.gov/mgis/cen2000_blockgroups.htm
Get the Massachusetts Town Polygon geometries (Townsurvey - ESRI Shapefiles) from http://www.mass.gov/mgis/townssurvey.htm
Get the Massachusetts Census Blocks from http://www.mass.gov/mgis/cen2000_blocks.htm
Get the Massachusetts Schools layer from http://www.mass.gov/mgis/schools.htm
Extract the files into a folder (run the exe or right-click and extract on Windows; for Linux/Unix/MacOS just extract with unzip, StuffIt, or a similar package).
Now load the shapefiles into your test database by running shp2pgsql and psql at your OS command line as shown below. Note for non-Windows users, or if you installed PostgreSQL in a non-standard directory: change the path "C:\Program Files\PostgreSQL\8.2\bin\" to yours. Also replace the items in italics with those that match your environment.
"C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blockgroups_poly cens2000bgpoly > cens2000bgpoly.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000bgpoly.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000bgpoly_the_geom ON cens2000bgpoly USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\TOWNSSURVEY_POLY towns > towns.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f towns.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_the_geom ON towns USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_towns_town ON towns USING btree(town)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\SCHOOLS_PT schools > schools.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f schools.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_schools_the_geom ON towns USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\shp2pgsql" -s 26986 C:\Data\massgis\census2000blocks_poly cens2000blocks > cens2000blocks.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -f cens2000blocks.sql "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "CREATE INDEX idx_cens2000blocks_the_geom ON cens2000blocks USING GIST(the_geom)" "C:\Program Files\PostgreSQL\8.2\bin\psql" -h localhost -U postgres -d testplrpostgis -c "vacuum analyze"
In this example we will find, for each Massachusetts town, the median population of a town census blockgroup, the total population across all block groups within the town, the average population of a town census blockgroup, and the count of block groups within each town's boundary. Note: we have to do a geomunion here because the towns layer has multiple records per town; unioning consolidates them so we have a single multipolygon for each town.
SELECT t.town, count(bg.gid) as total_bg, avg(bg.total_pop) as avg_bg_pop,
	median(bg.total_pop) as median_pop, sum(bg.total_pop) as totalpop
FROM cens2000bgpoly bg,
	(SELECT towns.town, geomunion(the_geom) as the_geom FROM towns GROUP BY towns.town) t
WHERE bg.the_geom && t.the_geom AND within(bg.the_geom, t.the_geom)
GROUP BY t.town
ORDER BY t.town
CREATE TABLE townsingle AS
	SELECT MAX(gid) as gid, towns.town, geomunion(the_geom) as the_geom
	FROM towns GROUP BY towns.town;
CREATE INDEX idx_townsingle_the_geom ON townsingle USING GIST(the_geom);
CREATE UNIQUE INDEX idx_townsingle_gid ON townsingle USING btree(gid);
CREATE UNIQUE INDEX idx_townsingle_town ON townsingle USING btree(town);
Before we continue, we need to expand our R environment by installing the deldir package, which contains the Voronoi functionality. To do so, follow these steps.
We will be using the deldir package. I listed some other packages here because they looked interesting, but we won't have time to explore them in much detail in this exercise.
chooseCRANmirror()
utils:::menuInstallPkgs()
If the above doesn't work try
available.packages()
Alternatively, for example if you ran available.packages(), you can install packages individually with install.packages('deldir'), where in this case deldir is the name of the package we wish to install.
install.packages('deldir')
R offers some neat features as far as help goes that you can access straight from the R GUI console. Below are some useful commands to find out more about a package or drill in at examples.
help(package=deldir)
help.search('median')
demo(package = .packages(all.available = TRUE))
Example: library(gstat) followed by demo(block)
In this example we will use a voronoi PL/R implementation provided by Mike Leahy in the PostGIS users newsgroup - http://postgis.refractions.net/pipermail/postgis-users/2007-June/016102.html
The function and Voronoi type code is repeated below. Run it in PgAdmin or psql. Note for Windows users: if you are copying this, make sure to paste it into Notepad, save the file, and then open it in PgAdmin or psql. This will strip the carriage returns and leave just line breaks; the R parser chokes on carriage returns.
--This function uses the deldir library in R to generate voronoi polygons for an input set of points in a PostGIS table.
--Requirements:
--	R-2.5.0 with deldir-0.0-5 installed
--	PostgreSQL-8.2.x with PostGIS-1.x and PL/R-8.2.0.4 installed
--Usage: select * from voronoi('table','point-column','id-column');
--Where:
--	'table' is the table or query containing the points to be used for generating the voronoi polygons,
--	'point-column' is a single 'POINT' PostGIS geometry type (each point must be unique)
--	'id-column' is a unique identifying integer for each of the originating points (e.g., 'gid')
--Output: returns a recordset of the custom type 'voronoi', which contains the id of the
--	originating point, and a polygon geometry

create type voronoi as (id integer, polygon geometry);
create or replace function voronoi(text,text,text) returns setof voronoi as '
	library(deldir)

	# select the point x/y coordinates into a data frame...
	points <- pg.spi.exec(sprintf("select x(%2$s) as x, y(%2$s) as y from %1$s;",arg1,arg2))

	# calculate an approprate buffer distance (~10%):
	buffer = ((abs(max(points$x)-min(points$x))+abs(max(points$y)-min(points$y)))/2)*(0.10)

	# get EWKB for the overall buffer of the convex hull for all points:
	buffer <- pg.spi.exec(sprintf("select buffer(convexhull(geomunion(%2$s)),%3$.6f) as ewkb from %1$s;",arg1,arg2,buffer))

	# the following use of deldir uses high precision and digits to prevent slivers between the output polygons, and uses
	# a relatively large bounding box with four dummy points included to ensure that points in the peripheral areas of the
	# dataset are appropriately enveloped by their corresponding polygons:
	voro = deldir(points$x, points$y, digits=22, frac=0.00000000000000000000000001,list(ndx=2,ndy=2), rw=c(min(points$x)-abs(min(points$x)-max(points$x)), max(points$x)+abs(min(points$x)-max(points$x)), min(points$y)-abs(min(points$y)-max(points$y)), max(points$y)+abs(min(points$y)-max(points$y))))

	tiles = tile.list(voro)
	poly = array()
	id = array()
	p = 1

	for (i in 1:length(tiles)) {
		tile = tiles[[i]]

		curpoly = "POLYGON(("

		for (j in 1:length(tile$x)) {
			curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]])
		}

		curpoly = sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])

		# this bit will find the original point that corresponds to the current polygon, along with its id and the SRID used for the
		# point geometry (presumably this is the same for all points)...this will also filter out the extra polygons created for the
		# four dummy points, as they will not return a result from this query:
		ipoint <- pg.spi.exec(sprintf("select %3$s as id, intersection(''SRID=''||srid(%2$s)||'';%4$s'',''%5$s'') as polygon from %1$s where intersects(%2$s,''SRID=''||srid(%2$s)||'';%4$s'');",arg1,arg2,arg3,curpoly,buffer$ewkb[1]))

		if (length(ipoint) > 0) {
			poly[[p]] <- ipoint$polygon[1]
			id[[p]] <- ipoint$id[1]
			p = (p + 1)
		}
	}

	return(data.frame(id,poly))
' language 'plr';
After you have installed the above Voronoi function, test it out with the following code:
CREATE TABLE schools_voroni(gid int);
SELECT AddGeometryColumn('', 'schools_voroni', 'the_geom', 26986, 'POLYGON', 2);

INSERT INTO schools_voroni(gid, the_geom)
SELECT v.id, v.polygon
FROM voronoi('(SELECT gid, the_geom FROM schools WHERE town = ''BOSTON'' AND grades LIKE ''%3%'' AND type = ''PUB'') as bs', 'bs.the_geom', 'bs.gid') As v;

ALTER TABLE schools_voroni ADD CONSTRAINT pk_schools_voroni PRIMARY KEY(gid);
Below is a map of the public elementary schools in Boston overlaid on the Voronoi polygons of those schools.
At this moment, some people may be thinking: nice polygons, but what is this supposed to tell me about Boston public elementary schools? The short, simplistic answer is that the Voronoi polygon of a school tells you, theoretically, what area a school serves, all else being equal. To find out more about this interesting topic, check out Voronoi links and tidbits. For example, we assume people prefer to send their kids to schools near them. With this information and using census blocks, we can theoretically figure out which areas are underserved based on the population of elementary school kids in each Voronoi polygon relative to the number of schools serving that area. So, a query something like the following (NOTE: I only had total population, not elementary school population, so this is already theoretically flawed):
CREATE TABLE vorhighdens As
SELECT v.thegid, sum(pop_2000)/v.schcount As vpop, v.the_geom
FROM (SELECT MAX(gid) as thegid, COUNT(gid) As schcount, the_geom, area(the_geom) as the_area
	FROM schools_voroni GROUP BY the_geom, area(the_geom)) As v
	INNER JOIN cens2000blocks cb ON v.the_geom && cb.the_geom AND Within(cb.the_geom, v.the_geom)
GROUP BY v.the_geom, v.schcount, v.thegid
ORDER BY vpop DESC
LIMIT 5
The darkened areas are where, based on my simple model, more elementary schools need to be built. Please NOTE this model is gravely flawed for many reasons.
GDAL stands for Geospatial Data Abstraction Library and is a popular open source GIS library originally developed and maintained by Frank Warmerdam with contributions from many. It is used in GIS for processing various types of GIS datasources - both Vector and Raster data. The GDAL part generally refers to RASTER functionality while the complementary OGR refers to Vector data. The RASTER functionality is very useful in analysis such as analysing Weather satellite raster data and Digital Elevation Models (DEMS) or converting between RASTER formats.
RGDAL is an R-statistical package that makes these library functions available from the R statistical environment (both RASTER and Vector functions) - thus making them available within PostgreSQL PL-R environment. In this exercise we shall install and play with this library first from R and then from within PostgreSQL.
For this exercise, we assume you have already installed PostgreSQL and PL/R. If you haven’t already, read our PLR Part 1: Up and Running with PL/R (PLR) in PostgreSQL: An almost Idiot's Guide - http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01
RGDAL relies on the sp package, which contains the SpatialGridDataFrame class among other things, so you'll need to install that first. To install sp, do the following:
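Assuming your server has internet access, the simplest route from the R console is install.packages, the same approach we used for deldir earlier (a sketch):

install.packages("sp")
library(sp)   # quick sanity check that the package loads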
Launch the R GUI. A lot of these exercises are similar to ones documented in the RGDAL help file. Now let's test drive the library. Locate a small image on one of your drives; for example, Windows users can copy an image to the root of C on the server or something of that sort, and for Linux the usual /usr/local/whatever will work fine. Then do something like:
library(rgdal)
TestLogo <- readGDAL("C:/test.jpg")
TestLogo
The above reads the JPEG into the TestLogo variable, and typing TestLogo will spit out all sorts of details about the JPEG file. That will look something like this:
test.jpg has GDAL driver JPEG
and has 9 rows and 9 columns
Closing GDAL dataset handle 0x048f1230...  destroyed ... done.
> TestLogo
Object of class SpatialGridDataFrame
Object of class SpatialGrid
Grid topology:
  cellcentre.offset cellsize cells.dim
x               0.5        1         9
y               0.5        1         9
SpatialPoints:
        x   y
 [1,] 0.5 8.5
 [2,] 1.5 8.5
 [3,] 2.5 8.5
 [4,] 3.5 8.5
 [5,] 4.5 8.5
 [6,] 5.5 8.5
 [7,] 6.5 8.5
 [8,] 7.5 8.5
  :     :   :
Coordinate Reference System (CRS) arguments: NA
Data summary:
     band1            band2           band3
 Min.   : 40.00   Min.   : 59.0   Min.   :117.0
 1st Qu.: 51.00   1st Qu.: 71.0   1st Qu.:124.0
 Median : 52.00   Median : 72.0   Median :125.0
 Mean   : 86.99   Mean   :103.7   Mean   :147.6
 3rd Qu.: 61.00   3rd Qu.: 79.0   3rd Qu.:132.0
 Max.   :255.00   Max.   :255.0   Max.   :255.0
To clean up and destroy the TestLogo variable:
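A minimal cleanup sketch, assuming you are finished with the test image:

rm(TestLogo)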
Now let's try this with some real geospatial data.
Download Digital Elevation Model (DEM) International sample from http://www.mapmart.com/DEM/DEM.htm and extract files into a folder.
The R code below will open the CustomArea file, start at position y=8, x=6 of the file, and show 5 rows and 3 columns on the R graphics device.
custarea <- GDAL.open("/path/to/CustomAreaSRTM90.dem")
displayDataset(custarea, offset=c(6,8), region.dim=c(5,3), reduction=1, band=1)
To show the whole file you would do
displayDataset(custarea,offset=c(0,0), region.dim=dim(custarea), reduction=1, band=1)
If you had multiple color bands and you wanted to display all bands, then you would do
displayDataset(custarea,offset=c(0,0), region.dim=dim(custarea), reduction=1, band=1:3)
When we are done with the file, we should close it to clean up loose-ends
GDAL.close(custarea)
Here are a couple of useful exercises that you would commonly do on a DEM or GeoTiff
custdf <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem")
CustomAreaSRTM90.dem has GDAL driver USGSDEM and has 3525 rows and 6275 columns
There are two ways to do this, shown below. The first is slower because it reads the whole file into memory; the second is faster because it just opens the file, but the file then needs to be closed when you are done.
custarea <- readGDAL("/path/to/CustomAreaSRTM90.dem") proj4string(custarea)
custarea <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem", read.only=TRUE) proj4string(custarea) close(custarea)
custarea <- open.SpatialGDAL("/path/to/CustomAreaSRTM90.dem", read.only=TRUE) bbox(custarea) close(custarea)
GDALinfo("/path/to/CustomAreaSRTM90.dem") cust_summary <- GDALinfo("/path/to/CustomAreaSRTM90.dem") cust_summary["rows"] cust_summary["ll.x"] cust_summary["ll.y"]
custarea <- GDAL.open("/path/to/CustomAreaSRTM90.dem")
custarea_gt <- getRasterTable(custarea, band = 1, offset = c(0,0), region.dim = c(10,10))
GDAL.close(custarea)
custarea_gt
The result of the above looks something like this
           x        y band1
1  -127.4554 50.23167   438
2  -127.4546 50.23167   410
3  -127.4537 50.23167   382
4  -127.4529 50.23167   356
5  -127.4521 50.23167   329
6  -127.4512 50.23167   306
7  -127.4504 50.23167   288
8  -127.4496 50.23167   272
9  -127.4487 50.23167   262
10 -127.4479 50.23167   257
11 -127.4554 50.23084   399
12 -127.4546 50.23084   378
13 -127.4537 50.23084   355
14 -127.4529 50.23084   333
15 -127.4521 50.23084   310
16 -127.4512 50.23084   290
 :         :        :     :
custarea_gt["band1"]
Now switch over to PostgreSQL and use PgAdmin or psql to write the function. Below is a simple PL/R function that loads the rgdal library and reads the metadata from a file passed in.
CREATE or REPLACE FUNCTION getTileInfo(atilefile varchar, element varchar) returns text AS
$$
	library(rgdal)
	tile_summary <- GDALinfo(atilefile)
	return(tile_summary[element])
$$ language 'plr';
Here is how we can use the above function from within PostgreSQL; it returns the requested piece of metadata from the file we pass as the first argument, based on the element name we pass as the second. Keep in mind the path is relative to the PostgreSQL server, since R is installed on the server, and that this function works for any GDAL-supported file format such as DEM, GeoTiff, JPEG, etc.
SELECT getTileInfo('/path/to/CustomAreaSRTM90.dem', 'll.x') As x_pos,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'll.y') As y_pos,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'rows') As num_rows,
	getTileInfo('/path/to/CustomAreaSRTM90.dem', 'columns') As num_cols
SpatiaLite is an SQLite database engine with Spatial functions added. You can think of it as a spatial extender for SQLite database engine which is similar in concept to what PostGIS does for the PostgreSQL Object-Relational Database.
For those unfamiliar with SQLite: it is an adorable little single-file relational database engine that runs equally well on Windows, Linux, Unix, and Mac and is easily embeddable in larger apps. It is free and open source -- public domain, so there are no restrictions on commercial use or embedding in other applications -- and I think it is superb as a mini-replicator. We use it in some of our projects for syncing data back and forth between a mainstream server database such as PostgreSQL, SQL Server, or MySQL and an offline client database that needs to sync differential data back and forth to the mother ship.
What is special about it is as follows:
SpatiaLite sweetens this little database by allowing you to store geometries and query them with spatial functions similar to what you would find in PostgreSQL/PostGIS, Microsoft SQL Server 2008, MySQL, IBM DB2, or Oracle Locator/Spatial. In terms of its model, it is closest in syntax to PostGIS -- in fact it is modeled after PostGIS and piggy-backs on GEOS and Proj.4 just as PostGIS does. The functionality you will see is pretty much the same set of functions you get in PostGIS, with the same names minus the aggregates and a few other things, except that you strip off the ST_ and add in a bbox filter, so PostGIS users should feel very much at home.
Some key features of SpatiaLite
As far as licensing goes, SpatiaLite does not have the same license as SQLite; it is available under multiple licenses, including the Mozilla Public License and the GPLv3.
In this little exercise, we shall get you up and running quickly with this cute database engine.
In terms of tutorials, here is a recent one by the MapFish people: SpatiaLite in 5 minutes. This article will be similar, except we shall go through the same kinds of exercises we did for SQL Server 2008 and PostGIS as a compare and contrast.
Installation is simple -- just download, extract, and run the GUI.
The GUI has the engine that runs the database embedded in it, similar in concept to the way the Microsoft Access application embeds the engine that controls an MS Access MDB. So to get started, launch the GUI.
Download data from the MassGIS site.
For this simple exercise just download Towns
SELECT * FROM spatial_ref_sys WHERE ref_sys_name LIKE '%Massachusetts%' ORDER BY srid
The easiest way to load data into SpatiaLite is with the ESRI shapefile loader packaged with the SpatiaLite GUI.
There are a couple of things I would like to point out that the shape file loader auto-magically does for you.
SELECT * from geometry_columns;
CREATE TRIGGER gtu_towns_the_geom BEFORE UPDATE ON towns
FOR EACH ROW
BEGIN
	SELECT RAISE(ROLLBACK, '''towns.the_geom'' violates Geometry constraint [geom-type not allowed]')
	WHERE (SELECT type FROM geometry_columns
		WHERE f_table_name = 'towns' AND f_geometry_column = 'the_geom'
		AND (type = GeometryType(NEW.the_geom) OR type = GeometryAliasType(NEW.the_geom))) IS NULL;
END
The most common queries we will run against this table are spatial queries and queries by the town field, so we will create 2 indexes on these fields.
In the tree view that shows the tables, expand towns and right mouse click on the the_geom field and then choose Build Spatial Index. You will see the little icon change on the column once that is done.
The SpatiaLite GUI doesn't seem to have a right-click feature for other indexes besides spatial, so to create an index on the town field do the following:
CREATE INDEX idx_town ON towns(town); vacuum towns;
Go back into the SpatiaLite GUI and try out some queries:
SELECT MIN(idx.xmin) As bxmin, MIN(idx.ymin) As bymin, MAX(idx.xmax) As bxmax, MAX(idx.ymax) As bymax
FROM towns As t INNER JOIN idx_towns_the_geom As idx ON t.rowid = idx.pkid
WHERE t.town = 'BOSTON';
-- returns: bxmin: 225473.6094  bymin: 886444.6250  bxmax: 252005.5781  bymax: 905284.0000

SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
-- returns: 128251224.3644

--reproject to Massachusetts state plane feet
SELECT town, astext(centroid(transform(the_geom,2249)))
FROM towns WHERE town > '' ORDER BY town LIMIT 2;
-- returns:
-- ABINGTON  POINT(802961.762366 2868485.109604)
-- ACTON     POINT(672950.258529 3001540.967504)

--to see what indices its using
EXPLAIN QUERY PLAN SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
--or the painful steps
EXPLAIN SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
Example -- here is an exercise we did in SQL Server 2008:
Here we arbitrarily take the first point that defines a polygon in Boston and ask which town POLYGON/MULTIPOLYGON geometries are within 1 mile of this point; we also want to know the exact distances, with results ordered by distance. The speed of this was surprisingly good -- it finished in under a second and returned 3 rows.
--I was hoping this would use a spatial index but it doesn't
SELECT t.town, Distance(ref.point1, t.the_geom)/0.3048 As dist_ft, Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t INNER JOIN
	(SELECT PointN(Boundary(the_geom),1) As point1 FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
	ON MbrIntersects(BuildCircleMbr(X(ref.point1), Y(ref.point1), 1609.344), t.the_geom)
WHERE Distance(ref.point1, t.the_geom) < 1609.344
ORDER BY Distance(ref.point1, t.the_geom);

--this seems to perform just as well -- would need a larger set to do a real test
SELECT t.town, Distance(ref.point1, t.the_geom)/0.3048 As dist_ft, Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t INNER JOIN
	(SELECT PointN(Boundary(the_geom),1) As point1 FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
	ON (Distance(ref.point1, t.the_geom) < 1609.344)
ORDER BY Distance(ref.point1, t.the_geom);
If you are a GIS newbie, I highly recommend using Quantum GIS. The latest binary distributions of Quantum GIS (as of this writing, QGIS 1.4+) have the SpatiaLite driver built in.
You can also use the latest spatialite-gis, a minimalist tool that has a viewer and shapefile importer packaged in. The latest version is 1.0 alpha, with binaries available for Windows, Linux, and Mac OSX.
The OGR toolkit is a subset of GDAL project. GDAL is available on many operating systems including Linux, Unix, Mac, and Windows. You can find binaries at GDAL Binaries. It is also included as part of the QGIS Desktop. GDAL comes packaged with several command line tools. The ones we find most useful are:
If you want to use GDAL on a desktop, the easiest way to get started is to install QGIS Desktop which has installers available for Linux, Windows, and Mac. If you are on a server, then you should install GDAL via your OS Packager. Windows users who don't want QGIS (or want a non-install required set of binaries) can use one of the GISInternals binaries which includes GDAL and MapServer.
For the rest of this article, we will assume you are using GDAL via the QGIS desktop.
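A quick way to confirm the GDAL command line tools are on your path, and to see which format drivers your particular build supports, is the sketch below:

ogr2ogr --version
ogr2ogr --formats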
If you want a comprehensive listing of the options offered by ogr2ogr or ogrinfo, run the following from your command-line shell (the FW Tools Shell, for example).
ogr2ogr --help ogrinfo --help
ogr2ogr -f "ESRI Shapefile" mydata.shp mydata.tab
ogr2ogr -f "PostgreSQL" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab
Note: for the above, you can leave out the host if it's localhost, and the user and password if you have your authentication set to trust.
In the below example, we don't want OGR to create a table called mytable; we instead want to call the table something different, like newtablename. To do so we use the -nln option, as shown below.
ogr2ogr -f "PostgreSQL" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab -nln newtablename
Sometimes OGR does not output the right projection, particularly with units of feet, with data that has no projection info, or when the projection information can't easily be translated to your system. Sometimes OGR can't match the projection to one in your spatial_ref_sys table, so it creates a new entry in that table. In these cases you have to tell OGR what the output projection is. You do this with the -a_srs flag, as shown below.
ogr2ogr -f "PostgreSQL" -a_srs "EPSG:2249" PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" mytabfile.tab
In the above example I told ogr2ogr to assume the source/output projection is Massachusetts Mainland US Ft. Note: all spatial reference systems can be found in the spatial_ref_sys table of PostGIS or the Data/gcs.csv file of your FW Tools install.
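If you are unsure which EPSG code to hand to -a_srs, you can look it up in the PostGIS spatial_ref_sys table; for example, for the code used above:

SELECT srid, srtext FROM spatial_ref_sys WHERE srid = 2249;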
The pgsql2shp and shp2pgsql are usually the best tools for converting back and forth between PostGIS and ESRI for 2 main reasons.
If you really want to use Ogr2Ogr for this kind of conversion, below is the standard way to do it
ogr2ogr -f "ESRI Shapefile" mydata.shp PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" "mytable"
Sometimes you have more than one geometry field in a table, and an ESRI shapefile can only support one geometry field per shape. You may also want only a subset of the data. In these cases, you need to select which geometry field to use. The most flexible way to do this is the -sql option, which accepts any SQL statement.
ogr2ogr -f "ESRI Shapefile" mydata.shp PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" -sql "SELECT name, the_geom FROM neighborhoods"
ogr2ogr -f "KML" neighborhoods.kml PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" -sql "select gid, name, the_geom from neighborhoods"
ogr2ogr -f "KML" neighborhoods.kml PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" neighborhoods -dsco NameField=name
One way in which ogr2ogr excels over the pgsql2shp tool is that ogr2ogr can export multiple tables at once. This is pretty handy for sharing your PostGIS data with others who do not have a PostGIS database.
The code below will export all your postgis tables out into a folder called mydatadump in ESRI shape (shp) format.
ogr2ogr -f "ESRI Shapefile" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword"
The code below will export all your postgis tables out into a folder called mydatadump in MapInfo .tab format.
ogr2ogr -f "MapInfo File" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword"
Now most of the time you probably only want to output a subset of your postgis tables rather than all your tables. This code exports only the neighborhoods and parcels tables to a folder called mydatadump in ESRI shapefile format
ogr2ogr -f "ESRI Shapefile" mydatadump PG:"host=myhost user=myloginname dbname=mydbname password=mypassword" neighborhood parcels
Topologically Integrated Geographic Encoding and Referencing (TIGER) is the US Census Bureau's proprietary format for exporting US Census geographic and statistical data. Starting in 2007, they will be using ESRI Shapefile (SHP) as their official export format, so this section may be a bit obsolete for the upcoming versions.
To get the files for your location - you can browse their archive at http://www.census.gov/geo/www/tiger/index.html
ogrinfo TGR25025.RTI
The TIGER files contain a set of layers, so unlike the other outputs we have done, we will specify a folder to dump all the layers into.
ogr2ogr -f "ESRI Shapefile" masuffolk TGR25025.RTI
ogr2ogr -f "ESRI Shapefile" sufcomp.shp TGR25025.RT1 layer CompleteChain
In the above, we are asking for just the CompleteChain layer and outputting it to a new file called sufcomp.shp. Note it will output the shp plus the corresponding shx, dbf, and prj files.
The conversion follows a similar path to ESRI Shape
The below will create a folder masuf and output all the layers into that folder and give each a tab file extension
ogr2ogr -f "MapInfo File" masuf TGR25025.RT1
ogr2ogr -f "MapInfo File" sufcomp.tab TGR25025.RT1 layer CompleteChain
ogr2ogr -update -append -f "PostgreSQL" PG:"host=myserver user=myusername dbname=mydbname password=mypassword" TGR25025.RT1 layer CompleteChain -nln masuf -a_srs "EPSG:4269"
Note: in the above we needed the -update -append options because ogr2ogr will otherwise try to create a folder when given a name with no extension, which in this context translates to trying to create a new database.
We also put in -nln masuf to prevent ogr2ogr from creating the table with the name CompleteChain.
Lastly we put in EPSG:4269 to tell OGR to just assume the TIGER data is in NAD 83 long lat. Without this it creates an entry in the spatial_ref_sys table that is equivalent to the already existing, well-known 4269.
FWTools, at least for Windows, has the ESRI geodatabase driver (referred to as PGeo, for Personal Geodatabase) baked into the binary. This makes it relatively simple to export data out of the Personal Geodatabase format into other formats such as PostgreSQL or ESRI shapefile.
If you want a sample geo database to play with, try the MassGIS annotation personal geo database http://www.mass.gov/mgis/ftpgnm.htm
ogrinfo C:\GISData\Geonames.mdb
INFO: Open of `c:\GISData\Geonames.mdb' using driver `PGeo' successful. 1: GEONAMES_ANNO_PLACES 2: GEONAMES_ANNO_HYDRO 3: GEONAMES_ANNO_HYPSO
To import all feature classes and assign a particular spatial reference system:
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=someuser dbname=somedb password=somepassword port=5432" C:\GISData\Geonames.mdb -a_srs EPSG:26986
Import one feature class, reproject it, and rename the geometry column. This example brings in a feature class that is in NAD 83 Massachusetts state plane meters, transforms it to NAD 83 long lat, renames the output table ma_hydro, and names the geometry column the_geom_4269.
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=someuser dbname=somedb" C:\Data\Geonames.mdb GEONAMES_ANNO_HYDRO -a_srs EPSG: 26986 -t_srs EPSG:4269 -nln ma_hydro -lco GEOMETRY_NAME=the_geom_4269
MySQL 4 and 5 support spatial objects to a limited extent. The functionality is not as rich as PostGIS, but for basic mapping work and limited spatial analysis, and if you already have MySQL installed, it may suit your needs just fine. Below is a snippet of code that will import a shape file called world_adm0.shp into a MySQL database and rename it world. NOTE: If you have a virgin MySQL database with no geometry_columns or spatial_ref_sys tables, they will be created automatically.
ogr2ogr -f "MySQL" MYSQL:"mydb,host=myhost,user=mylogin,password=mypassword,port=3306" -nln "world" -a_srs "EPSG:4326" path/to/world_adm0.shp
While OGR was designed primarily to transform data between different spatial datasources, it is a little known fact that it can also be used to import non-spatial datasources such as Dbase files and CSV files.
In the below example, we will demonstrate importing a Dbase file into PostgreSQL using OGR2OGR
ogr2ogr -f "PostgreSQL" PG:"host=myserver user=myusername dbname=mydbname password=mypassword" sometable.dbf -nln "sometable"
Microsoft SQL Server 2008 is the first version of SQL Server to have built-in functionality for doing geographic spatial queries.
This tutorial is similar to our Part 1: Getting Started with PostGIS: An almost Idiot's Guide, but written to provide a quick primer for SQL Server 2008 users and also as a parallel exercise mirroring the two camps. Think of it as a big welcome to the new kid on the block who is an old family friend.
We will assume a Windows environment for this tutorial since SQL Server only runs on Windows, preferably Windows XP and above (we are not sure if it works on Windows 2000). All our examples use Microsoft SQL Server 2008 Express, which is a free version of SQL Server 2008 allowed for both non-hosted commercial and private use. Please note that the spatial functionality in the SQL Server 2008 Express family is just as good as in the Standard and Enterprise versions; the limitations are on database size, mirroring, partitioning and some other minor things that are not spatial specific. SQL Server 2008 Standard, Web and Enterprise run only on server operating systems, while SQL Server 2008 Express runs on both servers and workstations.
SQL Server 2008 Express comes in 3 flavors:
For the exercise below, we will assume you have downloaded at a minimum SQL Server 2008 Express with Tools, or that you already have SQL Server 2008 Express with Management Studio installed.
Some gotchas before we start:
Once SQL Server 2008 express is installed with management tools do the following
First off we have to install a loader. You can use the freely available SQL Server 2008 Spatial Tools from http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx, which comes with both an ESRI shape loader and a SQL Server spatial viewer. To use it, simply download and extract.
WARNING: One of our friends noted that the SharpGIS loader comes up with a very suboptimal spatial grid index that will throw off queries relying on a spatial index. This is particularly an issue for data loaded into the planar geometry data type. As a result, SQL Server queries may be unusually slow and you may get a bad impression of SQL Server's performance. This will be fixed in a later version of the loader.
Download data from the MassGIS site. For this simple exercise, just download Towns with Coast. Extract the file into some folder. We will only be using the _POLY files for these exercises.
First of all, what is Planar? Planar is a class of spatial reference systems that projects a round earth onto a flat model. SQL Server 2008 supports both a Geometry (planar) and a Geography (round-earth) model. For data brought in as Geometry, SQL Server 2008 does not do any validation to ensure the spatial reference ID is in the sys.spatial_reference_systems table; in fact, SQL Server 2008 only contains spherical spatial reference systems in that meta table. So if you want to bring data in as planar, as long as all your data is in the same planar projection, you should be fine. SQL Server 2008 has no mechanism for transforming data from one planar projection to another.
Those who are familiar with the PostGIS equivalent exercise of this know that MassGIS data is in Massachusetts state plane Meters (Spatial_Reference_ID = 26986 which is a planar projection) so bringing it in as Geometry works fine, but trying to push it into Geodetic we shall find is a little trickier.
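As a quick illustration of that point, the statement below (a minimal sketch using a made-up state plane coordinate) happily builds a Geometry instance tagged with SRID 26986 even though 26986 appears nowhere in sys.spatial_reference_systems:
SELECT geometry::STGeomFromText('POINT(236000 900000)', 26986).STAsText();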
Now let's move some data:
What good is spatial data if you can't drop your jaw at its fantastic beauty? So let's look at what this animal looks like:
SELECT * FROM towns_planar WHERE town = 'BOSTON'
SELECT TOP 1 geom.STCentroid().STAsText() FROM towns_planar WHERE town = 'BOSTON'
SELECT town, geom.STIntersection(buf.aBuffer) As newgeom
FROM towns_planar
  INNER JOIN (SELECT TOP 1 geom.STCentroid().STBuffer(1000) As aBuffer
    FROM towns_planar WHERE town = 'BOSTON') As buf
  ON (towns_planar.geom.STIntersects(buf.aBuffer) = 1)
WHERE geom.STIsValid() = 1
If you have data measured in degrees, e.g. WGS 84 long lat (4326) or NAD 83 long lat (4269, the standard TIGER US Census format), bringing your data in as geodetic is simple since 4326 and 4269 are already listed in sys.spatial_reference_systems. A simple query confirms that:
SELECT * FROM sys.spatial_reference_systems WHERE spatial_reference_id IN(4269,4326);
To do so, you simply follow the prior steps but choose Geography (Spherical) instead.
But what if we want to bring planar data such as the MassGIS towns in as Geodetic? Then you need to first transform the data, which SQL Server has no mechanism for, and then bring it in. We shall go over this in part 2.
In the first part we covered bringing in the Mass towns data as planar geometry, but we were stuck because we need to transform the data to a degree-based spatial reference system (in particular one listed in sys.spatial_reference_systems) to use the Geography data type, and SQL Server 2008 has no mechanism to transform that data. Now we shall demonstrate how to transform the data and import it in a supported spatial reference system.
As mentioned before SQL Server 2008 has no mechanism for doing spatial transformations, so what to do?
OGR/GDAL is a free Open Source GIS toolkit that is very useful for data loading and doing spatial transformations, among other things. It's probably the easiest to use of all the tools.
cd C:\data\gisdata
ogr2ogr -f "ESRI Shapefile" -a_srs "EPSG:26986" -t_srs "EPSG:4326" towns_geodetic.shp TOWNSSURVEY_POLY.shp
Now that we have our data imported, launch SQLSpatial.exe as we did before and run these queries.
The query below fails because Geography does not support STCentroid - you get an error that STCentroid is not found for the geography type.
SELECT TOP 1 geom.STCentroid().STAsText() FROM towns_geodetic WHERE town = 'BOSTON'
So the first thing we learn from the above exercise is that sometimes planar is still needed: while Geography can cover a larger region, it is lacking many of the functions available in the regular Geometry type. Refer to SQL Server 2008 PostGIS MySQL Compare for a comparison of function/method differences between SQL Server 2008 Geography and Geometry.
One important feature that Geography (geodetic) has over Geometry is the ability to measure distances in meters using spherical coordinates and spanning large regions. Isaac Kunen touches on this a little in his blog entry Nearest Neighbors. In fact, doing distance queries and finding neighbors is probably the number one reason why most people will want to use spatial queries. With this one feature, one can answer questions such as
Of course questions like these could be answered before, but are answered a bit more trivially with a spatially enabled database and are extremely difficult to answer if you are trying to find shortest distances between 2 objects that are not points.
Note we know the distance is in meters because the spatial_reference_systems table tells us so.
SELECT unit_of_measure from sys.spatial_reference_systems WHERE spatial_reference_id = 4326;
Most of the spatial reference systems defined in this sys table are in meters, except for a few weird ones in Clarke's foot, Survey foot, and German metre.
Here we are going to run this in SQL Server 2008 Studio since we don't have any map data to view and we want to take advantage of SQL Server 2008 Studio's show plan features. Keep in mind that, just as in all OGC-compliant spatial databases, the STDistance function returns the minimum distance between 2 geometries. So if you are comparing a polygon to a polygon, it is the distance between the closest points of the two polygons.
Below is a slightly different query from what we used in planar (it can be used equally well in planar). Here we arbitrarily take the first point that defines a polygon in Boston and ask which town POLYGON/MULTIPOLYGON geometries are within 1 mile of this point; we also want the exact distances, with results ordered by distance.
SELECT t.town, ref.point1.STDistance(t.geom)/0.3048 As dist_ft
FROM towns_geodetic As t
  INNER JOIN (SELECT TOP 1 geom.STPointN(1) As point1
    FROM towns_geodetic WHERE town = 'BOSTON') As ref
  ON ref.point1.STDistance(t.geom) < 1609.344
ORDER BY ref.point1.STDistance(t.geom);

town    dist_ft
BOSTON  0
BOSTON  140.31019135227
BOSTON  211.728831986735
DEDHAM  2616.66222586371
DEDHAM  2616.73216967261
MILTON  3501.37051762325
Now try clicking the "Include Actual Execution Plan" (or Ctrl-M for short) and hitting the Execute for the above query.
You should see something like this which will give you a rough idea of where your processing time is going.
Shown above is a very small fragment of the plan used. From this we learn that our query is using a spatial index (this is good), but there is a warning on it, not so good. Usually when you see a little warning like that, it means your planner statistics are either non-existent or out of date. If you right click on it and view details, it will tell you more about the warning. This query is already lightning fast, so we won't worry about this minor issue. In our next part, we shall delve into larger sets of data with more sophisticated queries, where speed will be an issue and we'll want to squeeze out every inch of speed we can.
So I shall leave you with these words of wisdom as far as Query Plans go and these apply for most databases and not just spatial queries. We'll experiment with these rules of thumb in the next section.
Those who read our PostGIS series may be wondering why the examples we chose were not the same as what we did for PostGIS. Sadly SQL Server 2008 does not have any built-in spatial aggregates such as those you would find in PostGIS (e.g. ST_Union, ST_Collect, ST_Extent, ST_MakeLine etc.). Not to fret because Isaac Kunen and his compadres have been filling in the gaps in SQL Server 2008 spatial support. These gaps are filled in with a tool kit called SQL Server Spatial Tools and can be downloaded from Code Plex at http://www.codeplex.com/sqlspatialtools, not to be confused with Morten's SQL Spatial Tools we covered in the first tutorial.
Keep in mind that SQL Server 2008 Spatial Tools is a moving target with new functionality constantly added. As of this writing it has the following packaged goodies, listed in our order of importance:
In this exercise we shall demonstrate how to compile the latest build, install it, and give brief examples of using it. As of this writing, the latest version is Change Set 15818, released on September 12, 2008.
Note: you can skip the compile step if you are satisfied with not having the latest and greatest, and just download the original August compiled release. I would encourage you to be brave though, since the newer version contains a lot of improvements including the GeographyCollectionAggregate.
Now after we have compiled, the assembly file will be located in the SpatialTools\bin\Release folder and will be called SQLSpatialTools.dll. Before we can start using it, we'll need to load this assembly and the function stubs into our database.
USE []
USE testspatial
create assembly SQLSpatialTools from ''
create assembly SQLSpatialTools from 'C:\SQLSpatialTools.dll'
SELECT town, dbo.GeographyUnionAggregate(geom) As geom
INTO towns_singular_geodetic
FROM towns_geodetic
GROUP BY town;
--Creates 351 rows; compare to SELECT COUNT(*) FROM towns_geodetic; which gives us 1241 rows.
SELECT t.town
FROM (SELECT geom FROM towns_singular_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_singular_geodetic As t
  ON boston.geom.STIntersects(t.geom) = 1
ORDER BY t.town
If we run the Actual Execution Plan, we see that SQL Server tells us in very subtle terms that we are idiots. If only that message about the missing index were in red instead of green, I just might have thought there was a problem :).
Well, looking at our query we see a WHERE based on town, and we are obviously doing spatial calculations that can benefit from a spatial index. A spatial index in SQL Server 2008 always relies on a clustered index, so let's make town a clustered primary key (since we now have one town per record) and add a spatial index.
Gotchas
Run the below in SQL Server 2008 Studio/Express.
ALTER TABLE dbo.towns_singular_geodetic ALTER COLUMN town varchar(50) NOT NULL
GO
ALTER TABLE [dbo].[towns_singular_geodetic] ADD PRIMARY KEY CLUSTERED (town ASC)
GO
ALTER TABLE dbo.towns_singular_geodetic ALTER COLUMN geom geography NOT NULL
GO
CREATE SPATIAL INDEX [geom_sidx] ON [dbo].[towns_singular_geodetic]
(
  [geom]
) USING GEOGRAPHY_GRID
WITH (GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
  CELLS_PER_OBJECT = 16);
After all that work, rerunning our query shows we are using the clustered index now, but no spatial index, and speed is not improved.
Sadly, it seems that to get better performance (in fact down to less than 1 sec), we have to put in a hint to tell it to use the spatial index.
SELECT t.town
FROM (SELECT geom FROM towns_singular_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_singular_geodetic As t WITH(INDEX(geom_sidx))
  ON boston.geom.STIntersects(t.geom) = 1
ORDER BY t.town
A snapshot of part of the show plan is here and we see that mystical warning flag telling us we have no stats for that spatial index:
So we have a workaround, but the workaround to me is very unappealing, perhaps because I come from the old school of thought that databases should know more about the state of the data than I do and the ideal solution is not to give the database hints, but figure out why the database doesn't think it needs to use the spatial index. I can go into a whole diatribe of why HINTS are bad, but I'll try to refrain from doing so since I'll just bore people with my incessant whining and start some mini holy wars along the way.
As my big brother likes saying -- When a fine piece of American rocketry doesn't work right, you better hope you have a Russian Cosmonaut hanging around and laughing hysterically. It took me a while to get the full thrust of that joke and appreciate my brother's twisted sense of humor.
What does all the above nonsense mean? In a nutshell: sophisticated machinery is nice, but don't let the complexity of that machinery shield you from the fundamentals. Admittedly it's hard, and I thank my lucky stars that I was brought up in a time before computers and programming got too complicated for a little kid to tear apart and successfully put back together. There is nothing as exhilarating for a nerd kid as being able to tear apart his/her toys and realizing "Yes I have the power to fix this myself".
There are a couple of things that most relational databases share with each other regardless of how simple or complex they are
With all these great gadgets and wizards, things work and the world is good except when things don't work and then you look up at that Russian Cosmonaut hanging above you in his modest machine with his pint of warm vodka in hand laughing hysterically. Why is he laughing hysterically? Because ironically he has never seen anything quite like this gorgeous machine you have which looks especially magical in his drunken state, but even in his drunken stupor he recognizes your problem, but you, fully sober, and having used this machine for so long are still scratching your head. His machine is not quite so complicated and he has seen the problem less disguised before. He has fewer magical wizards to summon to help out so he relies more on his hard-earned cultivated understanding of fundamentals of how things are supposed to work to guide him. He is one with his machine and you are not because in wizard you trusted, but the wizard seems to be doing very strange things lately.
So what's the problem here? Why must SQL Server be told to use the spatial index sometimes, though not all the time? Well, it is no surprise: spatial is a bit of a new animal to SQL Server and the planner is probably a bit puzzled by this new beast. I thought perhaps it's because there are no statistics for the spatial index, and in fact you can't create statistics for the spatial index. If you try to do this:
CREATE STATISTICS [geom_towns_singular_geodetic] ON [dbo].[towns_singular_geodetic]([geom]) GO
You get this:
Msg 1978, Level 16, State 1, Line 1
Column 'geom' in table 'dbo.towns_singular_geodetic' is of a type that is invalid for use as a key column in an index or statistics.
Even after trying to force statistics with UPDATE STATISTICS, still no statistics appeared for the spatial index. One day the statistics magically appeared for both towns_singular_geodetic and towns_geodetic (SQL Server is known to create statistics behind the scenes when the mood strikes it, like all good wizards), but they were for some mystical tables called sys.extended..index... that contained cells, grids and stuff - I couldn't see these things, I couldn't select from them, but I could create statistics on them too, just like the wizard can. I tried figuring out how this happened by dropping the index and re-adding it, and as expected the statistics went away, and I'm sure they will come back.
UPDATE STATISTICS towns_singular_geodetic
Contrasting and comparing PostgreSQL and SQL Server: PostgreSQL lets you fiddle with its planner cost coefficients, and for 8.3+ you can fiddle down to individual functions, but you can't HINT - no hinting is allowed. SQL Server allows you to hint all you want but no fiddling with its coefficients (that is the wizard's realm). So what to do? When you feel like HINTING go for SQL Server, and if you are in the mood to write big ass equations go for PostgreSQL :). Well, truth be known, in most cases you neither have to HINT nor write big ass equations, except when the planner seems to be fumbling.
Note: In the other scenario of Part 2, SQL Server gladly used the spatial index without any hints and even without stats, but this time it does not. Why not? Because sometimes you don't need to look at statistics to know it is better to use something - your rule of thumb just drastically prefers one over the other. If you have millions of people in a crowd and you are looking for a person with a blue-checkered shirt, you can guess without looking at the distribution that an index by shirt color would probably help. However, if you have only 2 people in a crowd and you are looking for someone with a blue-checkered shirt, it is probably faster to just scan the crowd than to pull out your index and cross-reference it with the crowd.
This exercise was a bit of a roundabout. If I really wanted to know which towns are adjacent to Boston, I don't need to do a spatial union at all; I could run the query below, which magically uses the spatial index without any hinting and returns the same 13 records at the same under-1-second speed.
SELECT t.town
FROM (SELECT geom FROM towns_geodetic WHERE town = 'BOSTON') As boston
  INNER JOIN towns_geodetic As t
  ON boston.geom.STIntersects(t.geom) = 1
GROUP BY t.town
ORDER BY t.town
So what does this tell us? Possibly the large size of each spatial geometry in the unioned version and the relatively small size of the table are making SQL Server think: bah, scanning the table is cheaper. So the cost it assigns to using the spatial index may be over-estimated. Perhaps this will be improved in Service Pack 1.
SQL Server 2008 comes with a couple of spatial helper stored procedures to help us diagnose problems. Below are some sample uses.
DECLARE @qg geography
SELECT @qg = geom FROM towns_singular_geodetic WHERE town = 'BOSTON'
EXEC sp_help_spatial_geography_index 'towns_singular_geodetic', 'geom_sidx', 1, @qg
One of the very great things about the UMN Mapserver web mapping system is that it can support numerous kinds of datasources. In this brief excerpt we provide examples of how to specify the more common data sources used for layers. The examples below are for Mapserver 4.6, but for the most part are applicable to earlier versions.
File locations for file based datasources such as ESRI Shape and MapInfo tab files are defined relative to the SHAPEPATH attribute or as absolute paths. For example the beginning declaration of your .map file might look something like the below
MAP
  #
  # Start of map file
  #
  NAME MYMAP
  EXTENT 732193.725550 2904132.702662 799614.090681 2971466.288170
  SIZE 500 500
  SHAPEPATH "c:\mydata\"
  :
  :
The most common kind of data used in UMN Mapserver is the ESRI shapefile, which has a .shp extension. For this kind of datasource you simply specify the location of the file without even specifying the extension. Below is a sample declaration of a polygon layer that uses a shape file.
LAYER
  NAME buildings
  TYPE POLYGON
  STATUS DEFAULT
  DATA buildings
  PROJECTION
    "init=epsg:2249"
  END
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
Many datasources are available to Mapserver via the GDAL/OGR driver. MapInfo is one of those datasources. The example below shows what a MapInfo layer definition looks like. Note that the tab file specified should be placed in the folder denoted by SHAPEPATH at the top of the map file.
LAYER
  NAME buildings
  STATUS DEFAULT
  MINSCALE 7000
  CONNECTIONTYPE OGR
  CONNECTION "buildings.tab"
  TYPE POLYGON
  PROJECTION
    "init=epsg:2249"
  END
  # -- MapInfo has projection information built in the tab file
  # -- so you can often auto read this information with the below
  #PROJECTION
  #  AUTO
  #END
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
Mapserver has a custom driver for the PostGIS spatial database. In order to use this, your mapserver cgi or mapscript must be compiled with the PostGIS driver. Below is what a postgis mapserver layer looks like.
LAYER
  CONNECTIONTYPE postgis
  NAME "buildings"
  CONNECTION "user=dbuser dbname=mydb host=myserver"
  # the_geom column is the name of a spatial geometry field in the table buildings
  DATA "the_geom from buildings"
  STATUS DEFAULT
  TYPE POLYGON
  # Note if you use a filter statement - this is basically like a where clause of the sql statement
  FILTER "storyhg > 2"
  CLASS
    OUTLINECOLOR 10 10 10
  END
END
LAYER NAME "projects" CONNECTIONTYPE postgis CONNECTION "user=myloginuser dbname=mydbname host=mydbhost password=mypass" DATA "the_geom FROM (SELECT a.projid, a.projname, a.projtype, a.projyear, a.pid, parc.the_geom FROM projects a INNER JOIN parcels parc ON a.parcel_id = parc.pid WHERE a.projyear = 2007) as foo USING UNIQUE projid USING SRID=2249" STATUS OFF TYPE POLYGON CLASS NAME "Business Projects" EXPRESSION ('[projtype]' = 'Business') STYLE OUTLINECOLOR 204 153 51 WIDTH 3 END END CLASS NAME "Community Projects" EXPRESSION ('[projtype]' = 'Community') STYLE OUTLINECOLOR 204 0 0 WIDTH 3 END END PROJECTION "init=epsg:2249" END METADATA "wms_title" "Projects" "wfs_title" "Projects" gml_include_items "all" wms_include_items "all" END DUMP TRUE TOLERANCE 10 END
Mapserver has the ability to act as a WMS Server as well as a WMS Client. The WMS Client capabilities are accessed by defining WMS layers that connect to WMS servers. Below is an example of a WMS layer using the Microsoft Terraservices WMS Server.
LAYER NAME "msterraservicedoq" TYPE RASTER STATUS DEFAULT CONNECTION "http://terraservice.net/ogcmap.ashx?" CONNECTIONTYPE WMS MINSCALE 3000 MAXSCALE 20000 #DEBUG ON METADATA "wms_srs" "EPSG:26919" "wms_name" "doq" "wms_server_version" "1.1.1" "wms_format" "image/jpeg" "wms_style" "UTMGrid_Cyan" "wms_latlonboundingbox" "-71.19 42.23 -71 42.40" END END
In this example we show how to use Mapserver as a WMS client by utilizing Microsoft's TerraService WMS server. For more details about Microsoft's OGC WMS, check out the GetCapabilities of Microsoft TerraService.
This is a work in progress and demonstrates the ST_ConcaveHull function that will be available in PostGIS 2.0. The basic approach is that it first creates a convex hull of the geometry and then uses the ST_ClosestPoint function introduced in PostGIS 1.5 to cave in the hull, transforming it into a concave hull. That's the basic idea; the second argument is the target percent of the convex hull area. The routine recursively shrinks the hull until it determines it can't reach the target percent or it drops below the target.
CREATE TABLE s_test(geom geometry); INSERT INTO s_test(geom) VALUES(ST_GeomFromText('MULTIPOINT( (120 -224), (185 -219), (190 -234), (200 -246), (212 -256), (227 -261), (242 -264), (257 -265), (272 -264), (287 -263),(302 -258), (315 -250), (323 -237), (321 -222), (308 -213), (293 -208), (278 -205), (263 -202), (248 -199), (233 -196), (218 -193), (203 -190), (188 -185), (173 -180), (160 -172), (148 -162), (138 -150), (133 -135), (132 -120), (136 -105), (146 -92), (158 -82), (171 -74), (186 -69), (201 -65), (216 -62), (231 -60), (246 -60), (261 -60), (276 -60), (291 -61), (306 -64), (321 -67), (336 -72), (349 -80), (362 -89), (372 -101), (379 -115), (382 -130), (314 -134), (309 -119), (298 -108), (284 -102), (269 -100), (254 -99), (239 -100), (224 -102), (209 -107), (197 -117), (200 -132), (213 -140), (228 -145), (243 -148), (258 -151), (273 -154), (288 -156), (303 -159), (318 -163), (333 -167), (347 -173), (361 -179), (373 -189), (383 -201), (389 -215), (391 -230), (390 -245), (384 -259), (374 -271), (363 -282), (349 -289), (335 -295), (320 -299), (305 -302), (290 -304), (275 -305), (259 -305), (243 -305), (228 -304), (213 -302), (198 -300), (183 -295), (169 -289), (155 -282), (143 -272), (133 -260), (126 -246), (136 -223), (152 -222), (168 -221), (365 -131), (348 -132), (331 -133), (251 -177), (183 -157), (342 -98), (247 -75), (274 -174), (360 -223), (192 -85), (330 -273), (210 -283), (326 -97), (177 -103), (315 -188), (175 -139), (366 -250), (321 -204), (344 -232), (331 -113), (162 -129), (272 -77), (292 -192), (144 -244), (196 -272), (212 -89), (166 -236), (238 -167), (289 -282), (333 -187), (341 -249), (164 -113), (238 -283), (344 -265), (176 -248), (312 -273), (299 -85), (154 -261), (265 -287), (359 -111), (160 -150), (212 -169), (351 -199), (160 -98), (228 -77), (376 -224), (148 -120), (174 -272), (194 -100), (292 -173), (341 -212), (369 -209), (189 -258), (198 -159), (275 -190), (322 -82))') ) ; SELECT ST_AsBinary( geom ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.99) ) FROM s_test;
SELECT ST_AsBinary( ST_ConcaveHull(geom,0.90) ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(geom,0.90,true) ) FROM s_test;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM s_test ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.80) ) FROM stumper;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM stumper;
CREATE TABLE pointwave(gid serial primary key, geom geometry);
INSERT INTO pointwave(geom)
SELECT ST_Point(i*random(), j*random()) As the_geom
FROM generate_series(1,100) As i CROSS JOIN generate_series(-20,20) As j;
SELECT geom FROM pointwave;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.80 ) ) FROM pointwave;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.50 ) ) FROM pointwave;
SELECT ST_AsBinary(ST_ConvexHull(ST_Collect(geom))) FROM stumper;
SELECT ST_AsBinary(geom) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.99 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.90 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.90,true ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.70 ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConcaveHull( ST_Collect(geom),0.70,true ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_AsBinary( ST_ConvexHull( ST_Collect(geom) ) ) FROM pointlocs WHERE geom IS NOT NULL;
SELECT ST_Union(geom) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.99) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.90) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Union(geom),0.70) ) FROM neighborhoods;
SELECT ST_AsBinary( ST_ConvexHull(ST_Union(the_geom)) ) FROM neighborhoods;
SELECT ST_AsBinary( geom ) FROM mbta_busroutes ;
Note: In some cases the 99% target (such as this example) gives a better and much faster result than some of the lower compressions. This is because of GEOS topological errors we run into that force the algorithm to fall back on the convex hull for certain regions. So as a general rule, try the 99% first: it is generally much, much faster than lower percentages, for fairly huge patches of empty area it still returns an areal percentage much less than the convex hull, and it is also less likely to run into topological issues.
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.99) ) FROM mbta_busroutes ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.90) ) FROM mbta_busroutes ;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.70) ) FROM mbta_busroutes;
SELECT ST_AsBinary( ST_ConcaveHull(ST_Collect(geom),0.70, true) ) FROM mbta_busroutes;
SELECT ST_AsBinary( ST_ConvexHull(ST_Collect(geom)) ) FROM mbta_busroutes ;
ST_Dump is a function that takes a geometry and returns a set of the PostGIS geometry_dump structure. geometry_dump is composed of 2 properties: geom and path. geom is a geometry and path is a multi-dimensional integer array that gives the exact location of the geom within the MULTI or GEOMETRYCOLLECTION geometry. For MULTI geometries, the path is one dimensional, looking like {1}. For single geometries, the path is an empty array. For a GEOMETRYCOLLECTION it has multiple array elements depending on the complexity of the collection.
ST_Dump comes in very handy for expanding a MULTI geometry into single geometries. Below is an example that expands a set of MULTIPOLYGONs into single POLYGONs.
SELECT somefield1, somefield2, (ST_Dump(the_geom)).geom As the_geom FROM sometable
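To see the path property described above, here is a minimal sketch that dumps a two-part MULTIPOLYGON built from WKT; each output row carries a one-element path such as {1} or {2}.
SELECT (gdump).path As path, ST_AsText((gdump).geom) As wkt
FROM (SELECT ST_Dump(ST_GeomFromText('MULTIPOLYGON(((0 0,1 0,1 1,0 1,0 0)),((2 2,3 2,3 3,2 3,2 2)))')) As gdump) As f;
--returns 2 rows: path {1} with the first polygon, path {2} with the second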
It comes in very handy in conjunction with ST_Collect. ST_Collect will return a MULTI geometry if you give it single geometries, but will return a GEOMETRYCOLLECTION if you feed it MULTI geometries. In general ST_Collect is faster than ST_Union. To take advantage of the speed and simplicity of ST_Collect without generating GeometryCollections, you can expand your MULTIPOLYGONs, MULTILINESTRINGs, and MULTIPOINTs into single geometries and then recollect them according to the grouping you desire. If you want to regroup a set of MULTI geometries into a new grouping, you can do something like the below.
SELECT stusps, ST_Multi(ST_Collect(f.the_geom)) as singlegeom FROM (SELECT stusps, (ST_Dump(the_geom)).geom As the_geom FROM somestatetable ) As f GROUP BY stusps
In this quick exercise, we will explore the following PostGIS OGC functions: Extent, Expand, Buffer, Distance
Extent is an aggregate function, meaning that it is used just as you would use SQL sum, average, min, and max, oftentimes in conjunction with GROUP BY to consolidate a set of rows into one. Pre-1.2.2 it returns a 2-dimensional bounding box object (BBOX2D) that encompasses the set of geometries you are consolidating.
Unlike most functions in PostGIS, it returns a PostGIS BBOX object instead of a geometry object. In some cases it may be necessary to cast the resulting value to a PostGIS geometry, for example if you need to do operations that work on projections. Since a bounding box object does not contain projection information, it is best to use the SetSRID function rather than simply casting it to a geometry if you need projection information. SetSRID will automagically convert a BBOX to a geometry and then stuff in the SRID specified in the function call.
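For example, relying on that automatic BBOX-to-geometry conversion, a minimal sketch using our neighborhoods table (stored in SRID 2249) would be:
SELECT setSRID(extent(the_geom), 2249) As the_extent_geom FROM neighborhoods;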
Extent has a sibling called Extent3d which is also an aggregate function and is exactly like Extent except it returns a 3-dimensional bounding box (BBOX3D).
Starting around version 1.2.2, Extent and Extent3d will be deprecated in favor of ST_Extent. ST_Extent will return a BOX3D object.
Expand returns a geometry object that is the bounding box of a given geometry expanded in all directions by the specified number of units. Unlike Extent, it is not an aggregate function. It is often used in conjunction with the distance function to do proximity searches, because it is less expensive than the distance function alone.
The reason why expand combined with distance is much faster than distance alone is that expand can utilize GiST indexes, since it does comparisons between boxes, and therefore reduces the set of geometries that the distance function needs to check.
Note: in versions of PostGIS after 1.2, there is a new function called ST_DWithin, which utilizes indexes and is simpler to write than the expand, &&, distance combination.
ST_Distance will be the preferred name in version 1.2.2 and up
The following statements return equivalent results, but the one using expand is much faster, especially when the geometries and commonly used attributes are indexed.
Find all buildings located within 100 meters of Roslindale.

Using distance and expand (time with indexed geometries: 14.5 seconds - returns 8432 records):

SELECT b.the_geom_nad83m
FROM neighborhoods n, buildings b
WHERE n.name = 'Roslindale'
  AND expand(n.thegeom_meter, 100) && b.thegeom_meter
  AND distance(n.thegeom_meter, b.thegeom_meter) < 100

Using distance alone (time with indexed geometries: 8.7 minutes - returns 8432 records):

SELECT b.the_geom_nad83m
FROM neighborhoods n, buildings b
WHERE n.name = 'Roslindale'
  AND distance(n.thegeom_meter, b.thegeom_meter) < 100
We will write the above using the new ST_DWithin to demonstrate how much easier it is.
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and ST_DWithin(n.thegeom_meter, b.thegeom_meter, 100)
ST_Within(A,B) returns true if geometry A is within B. There is an important distinction between Within and ST_Within: ST_Within does an implicit A && B call to utilize indexes, whereas Within and _ST_Within do not.
1.3.1 and above do
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and ST_Within(b.thegeom_meter, n.thegeom_meter)
Pre 1.3.1 do
SELECT b.the_geom_nad83m FROM neighborhoods n, buildings b WHERE n.name = 'Roslindale' and b.thegeom_meter && n.thegeom_meter AND Within(b.thegeom_meter, n.thegeom_meter)
Buffer returns a geometry object that is the radial expansion of a geometry, expanded by the specified number of units. Calculations are in the units of the spatial reference system of the geometry. The optional third parameter sets the number of segments used to approximate a quarter circle (it defaults to 8 if the third argument is not provided). This is a much more involved process than the expand function because it needs to look at every point of a geometry, whereas expand only looks at the bounding box of a geometry.
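To illustrate the third argument, here is a minimal sketch that buffers the same point with the default 8 segments per quarter circle and then with only 2, giving a smoother versus a coarser approximation of a circle:
SELECT ST_AsText(buffer(ST_GeomFromText('POINT(0 0)'), 100)) As smooth_buf;
SELECT ST_AsText(buffer(ST_GeomFromText('POINT(0 0)'), 100, 2)) As coarse_buf;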
Buffer can also be used to correct invalid geometries by smoothing out self-intersections. It doesn't work for all invalid geometries, but it works for some. For example, the code below will correct invalid neighborhood geometries that can be corrected. Note that here I am also wrapping the result in multi() since in this case buffer returns a polygon and our table geometries are stored as multi-polygons. If your geometry column is a POLYGON rather than a MULTIPOLYGON, you can leave out the multi() part.
UPDATE neighborhoods SET the_geom = multi(buffer(the_geom, 0.0)) WHERE isvalid(the_geom) = false AND isvalid(buffer(the_geom, 0.0)) = true
In this section we provide a graphical representation of what the operations Buffer, Expand, Extent look like when applied to geometries.
SELECT the_geom, name FROM neighborhoods WHERE name IN('Hyde Park','Roxbury')
SELECT expand(the_geom, 500) as geom, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
SELECT extent(the_geom) as thebbox, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
SELECT buffer(the_geom,500) as geom, name FROM neighborhoods WHERE name IN('Hyde Park', 'Roxbury')
The ST_GeomFromText OGC function is used to convert a Well Known Text (WKT) representation of a geometry into a PostGIS geometry. Aliases: ST_GeometryFromText. SQL-MM spec equivalents: ST_GeomFromText, ST_WKTToSQL.
This example demonstrates using a WKT representation of a Street in NAD 83 LongLat format and inserting it into a PostGIS geometry field.
INSERT INTO streets(street_name, the_geom) SELECT 'some street', ST_GeomFromText('LINESTRING(-70.729212 42.373848,-70.67569 42.375098)',4269)
The ST_GeomFromText function or an equivalent exists in other spatial databases. For example, in MySQL 5 and above the function has the same name and uses the same syntax as above. In Oracle 10g Spatial and Locator, you would use the SDO_GEOMETRY function, so the above in Oracle would look something like the statement below. Note that it uses an Oracle function called SDO_CS.MAP_EPSG_SRID_TO_ORACLE(), because Oracle Spatial Reference System IDs (SRIDs) are usually not the same as the commonly used European Petroleum Survey Group (EPSG) standard codes. PostGIS and MySQL SRIDs are generally the same as the EPSG IDs, so that call is not necessary there.
INSERT INTO streets(street_name, the_geom) SELECT 'some street', SDO_GEOMETRY('LINESTRING(-70.729212 42.373848,-70.67569 42.375098)', SDO_CS.MAP_EPSG_SRID_TO_ORACLE(4269)) FROM DUAL
INSERT INTO boston_buildings(name, the_geom) VALUES('some name', ST_GeomFromText('MULTIPOLYGON(((235670.354215375 894016.780856,235668.324215375 894025.050856,235681.154215375 894028.210856,235683.184215375 894019.940856,235670.354215375 894016.780856)))', 2805) )
ST_Intersects is a function that takes two geometries and returns true if any part of those geometries is shared between the two. In PostGIS versions before 1.3, you would use the following syntax to utilize indexes:

SELECT a.somefield1, b.somefield2 FROM a INNER JOIN b ON (a.the_geom && b.the_geom AND intersects(a.the_geom, b.the_geom))

In versions from 1.3 forward, the && indexable operator is automatically included in the definition of ST_Intersects, so you can simply write:

SELECT a.somefield1, b.somefield2 FROM a INNER JOIN b ON ST_Intersects(a.the_geom, b.the_geom)
The functions ST_Intersection and Intersection are complements to ST_Intersects. What they return is the portion of geometry A and geometry B that is shared between the two geometries. If the two geometries do not intersect, then what is returned is an empty GEOMETRYCOLLECTION object.
NOTE: PostGIS versions 1.2.2 and up you should use ST_Intersection as Intersection is deprecated. The two functions are more or less equivalent though.
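A quick way to see the empty GEOMETRYCOLLECTION behavior is a minimal sketch with two geometries that clearly do not touch:
SELECT ST_AsText(ST_Intersection(ST_GeomFromText('POINT(0 0)'), ST_GeomFromText('LINESTRING(2 0, 0 2)')));
--returns GEOMETRYCOLLECTION EMPTY in the PostGIS versions covered here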
In this section we provide a graphical representation of what the Intersection looks like. We will use an example of a building and a parcel where the building does not completely sit inside the parcel.
SELECT b.the_geom As bgeom, p.the_geom As pgeom, ST_Intersection(b.the_geom, p.the_geom) As intersect_bp
FROM buildings b INNER JOIN parcels p ON ST_Intersects(b.the_geom, p.the_geom)
WHERE ST_Overlaps(b.the_geom, p.the_geom)
LIMIT 1;
ST_Makeline is an aggregate function that takes a sequence of points and strings them together to make a line. In versions of PostGIS prior to 1.2.2, this was called MakeLine. For prior versions - replace ST_MakeLine with MakeLine.
In this example we have a table of gps point snapshots from our marathon practice, one for each day broken out by time. We want to convert the points for each day into a single record containing the line path of our run for that day. NOTE: For this example the trip_datetime field is of type timestamp, so it records the date and time the gps position was recorded. CAST(trip_datetime As date) (the ANSI SQL standard) or the PostgreSQL-specific shorthand trip_datetime::date strips off the time part so we are just left with the day. We do a subselect with an ORDER BY to force the points into time order so that the points of the line follow the path of our run across time.
SELECT ST_MakeLine(the_point) As the_route, bp.trip_date
FROM (SELECT the_point, CAST(trip_datetime As date) As trip_date
  FROM boston_marathon_practice
  ORDER BY trip_datetime) bp
GROUP BY bp.trip_date;
In the previous example, since our grouping field is in the same order as the datetime, we only needed to order by one field in our subselect. If your groupings are not in the same order as your line order, then you need to add an order for each field you plan to group by. If you don't, your resulting points may get shuffled. In the simplified example below, we have parcel centroids and we want to make line segments that join them such that for each unique street and zip we have one line segment. For extra measure we want our final set to include the start number and end number for this segment.
This is a fairly simplistic example. In reality you would probably need to do a little bit more, because street segments can have the same names in Boston even within the same zip. We are also assuming the line segment drawing should follow the order of the street numbers and that our street numbers are numeric.
SELECT p.street, p.zip, ST_MakeLine(p.the_point) As streetsegment,
  MIN(st_num) As st_num_start, MAX(st_num) As st_num_end
FROM (SELECT street, zip, centroid(the_geom) As the_point, st_num
  FROM parcels
  ORDER BY zip, street, st_num) p
GROUP BY p.zip, p.street
There are numerous ways of creating point geometries in PostGIS. We have covered these ways in other snippets. Check out the following links for other examples.
ST_MakePoint is perhaps the fastest way of creating a PostGIS point geometry; in tests I have done it can be as much as 5 to 10 times faster. The downside of ST_MakePoint is that it is not defined in the OGC spec, so it's not quite as portable across other spatial databases as ST_GeomFromText and ST_PointFromText.
ST_MakePoint is used in the form ST_MakePoint(x, y), which for pretty much all spatial reference systems means ST_MakePoint(longitude, latitude).
ST_MakePoint used alone creates a non-transformable geometry (one that has an SRID of -1), so ST_MakePoint is usually used in conjunction with ST_SetSRID to create a point geometry with spatial reference information. The examples below, with their EWKT output, demonstrate the difference.
SELECT ST_MakePoint(-71.09405383923, 42.3151215523721) as the_point, AsEWKT(ST_MakePoint(-71.09405383923, 42.3151215523721)) as ewkt_rep
SELECT SETSRID(ST_MakePoint(-71.09405383923, 42.3151215523721),4326) as the_point, ST_AsEWKT(ST_SetSRID(ST_MakePoint(-71.09405383923, 42.3151215523721),4326)) as ewkt_rep
Note: Beginning with 1.2.1+, the names with the ST_ prefix are preferred over the non-ST prefixed ones to conform to newer OGC standards. In future releases the non-ST names will be deprecated and eventually removed from PostGIS.
ST_PointFromText (previously known as PointFromText), ST_LineFromText (previously known as LineFromText), and the other <Geometry Type>FromText functions are very similar to the all-purpose ST_GeomFromText, except that ST_PointFromText only converts Point WKT representations to a geometry object and returns null if another kind of geometry is passed in. ST_LineFromText similarly only converts WKT representations of lines to geometry and returns null if another WKT representation is passed in.
If you look at the current underlying implementation of these functions in PostGIS (1.2.1 and below), you will observe that they in fact call ST_GeomFromText, but if the geometry type of the generated geometry is not a Point or a LineString respectively, then the functions return null.
The M*FromText, ST_M*FromText functions create Multi geometry types from the WKT representation of those types.
Since these functions do have some added overhead (not much, but some), it makes sense to just use ST_GeomFromText unless you have a mixed bag of WKT geometries coming in and only wish to selectively insert, say, the Points or the Lines, or unless you want the plain readability of making it clear that you are indeed inserting Points or Lines or MultiLineStrings etc.
UPDATE points_of_interest SET thepoint_lonlat = PointFromText('POINT(' || longitude || ' ' || latitude || ')',4326)
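For the Multi variants, usage follows the same pattern. A minimal sketch using ST_MPointFromText with a made-up two-point multipoint:
SELECT ST_MPointFromText('MULTIPOINT(-71.06 42.28, -71.07 42.29)', 4326);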
Simplify reduces the complexity and weight of a geometry using the Douglas-Peucker algorithm. In general it reduces the number of vertices in a polygon, multipolygon, line, or multi-line geometry. It takes 2 arguments: the geometry and the tolerance. To demonstrate, we will apply the simplify function to Boston neighborhoods that are in SRID 2249 (NAD83 / Massachusetts Mainland (ftUS)) and compare the sizes of the shape files if we were to dump these out as ESRI shape files. We will also show pictorially how simplify changes the geometries.
SELECT the_geom FROM neighborhoods
SELECT simplify(the_geom,500) as simpgeom FROM neighborhoods
SELECT simplify(the_geom,1000) as simpgeom FROM neighborhoods
Note: When simplifying long lat data, you often get very strange results with simplify. It is best to first transform the data from long lat to some other coordinate system, then simplify, and then transform back to long lat. Something like:
SELECT transform(simplify(transform(the_geom, 2249), 500),4326) from neighborhoods
ST_Summary(geometry) gives you a brief summary of a geometry, telling you what type of geometry it is and how many simple geometries and rings it contains.
select ST_Summary(the_geom) from neighborhoods;
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 649 points
    ring 1 has 10 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 182 points
MultiPolygon[BS] with 2 elements
  Polygon[] with 1 rings
    ring 0 has 319 points
  Polygon[] with 1 rings
    ring 0 has 4 points
MultiPolygon[BS] with 2 elements
  Polygon[] with 1 rings
    ring 0 has 376 points
  Polygon[] with 1 rings
    ring 0 has 68 points
MultiPolygon[BS] with 3 elements
  Polygon[] with 1 rings
    ring 0 has 215 points
  Polygon[] with 2 rings
    ring 0 has 4 points
    ring 1 has 23 points
  Polygon[] with 1 rings
    ring 0 has 45 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 71 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 124 points
    ring 1 has 5 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 199 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 197 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 168 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 400 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 355 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 176 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 1 rings
    ring 0 has 392 points
MultiPolygon[BS] with 1 elements
  Polygon[] with 2 rings
    ring 0 has 1965 points
    ring 1 has 29 points
The ST_Translate function takes any geometry (linestring, multilinestring, polygon, etc.) and returns a new geometry that is the original geometry moved by a vector defined by X, Y, Z. Note that the units of measurement are always the units of the spatial reference system of the geometry. There are two forms of ST_Translate: ST_Translate(geometry, X, Y, Z) and ST_Translate(geometry, X, Y).
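As a minimal illustration of the 2D form before the grid example below:
SELECT ST_AsText(ST_Translate(ST_GeomFromText('POINT(0 0)'), 5, 10));
--returns POINT(5 10)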
In the example below we use the translate function to create a grid that divides the extent of Boston into 200x200 rectangles (40,000 in all), by doing the following:
CREATE TABLE throwaway_grid(gid SERIAL PRIMARY KEY);
SELECT AddGeometryColumn('public', 'throwaway_grid', 'the_geom', 26986, 'POLYGON', 2);
INSERT INTO throwaway_grid(the_geom)
SELECT ST_translate(ref.boxrep, hor.n*width, ver.n*height) As slice
FROM (SELECT ST_xmin(ST_Extent(the_geom)) As xstart,
    ST_xmin(ST_Extent(the_geom)) As ystart,
    ST_SetSRID(CAST('BOX(33863.73046875 777606.3125,35348.73046875 778517.3125)' As box2d), 26986) As boxrep,
    ceiling((ST_xmax(ST_Extent(the_geom)) - ST_xmin(ST_Extent(the_geom)))/200) As width,
    ceiling((ST_ymax(ST_Extent(the_geom)) - ST_ymin(ST_Extent(the_geom)))/200) As height
  FROM towns) As ref,
  generate_series(1,200) As hor(n),
  generate_series(1,200) As ver(n);
CREATE INDEX idx_throwaway_grid_the_geom ON throwaway_grid USING gist(the_geom);
The resulting image looks like this. Take a look at Map Dicing to see an example of how you can use this in conjunction with ST_Intersection to dice up maps into smaller pieces.
The translate function can be used in conjunction with other functions such as rotate to do interesting things, like rotating a geometry about a point or creating ellipses, as described here.
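Here is a minimal sketch of the rotate-about-a-point idea, assuming the old non-ST rotate(geometry, radians) function (which rotates about the origin) and a hypothetical pivot at (100, 200): translate so the pivot sits at the origin, rotate, then translate back.
SELECT ST_Translate(rotate(ST_Translate(the_geom, -100, -200), pi()/4), 100, 200) As rotated_geom
FROM neighborhoods
WHERE name = 'Roslindale';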
This Document is available under the GNU Free Documentation License 1.2 http://www.gnu.org/copyleft/fdl.html & for download at the BostonGIS site http://www.bostongis.com