Part 1: Getting Started With PostGIS: An almost Idiot's Guide (PostGIS 2.0)
Printer Friendly
What Is PostGIS?
PostGIS is an open source, freely available, and
fairly OGC compliant spatial database extender for the PostgreSQL
Database Management System. In a nutshell it adds spatial functions
such as distance, area, union, intersection, and specialty geometry data types to the
database. PostGIS is very similar in functionality to SQL Server 2008 Spatial support, ESRI ArcSDE,
Oracle Spatial, and DB2 spatial extender. The latest release version
now comes packaged with the PostgreSQL DBMS installs as an optional
add-on. As of this writing PostGIS 2.0.0 is the latest stable release. Noteable enhancements in this release:
- now packages along new 3D measurement functions, 3D spatial index support and 3D surface area types
- seamless raster/vector analysis support including an easy to use command line raster database loader that supports various types and can load whole folders of raster files with one commandline statement, and really jazzy image export functions to output both raster and geometries as PNG/TIFF and other raster formats.
- SQL/MM topology support
- the graphical gui loader,which is packaged with the Windows Application Stack builder and some other desktop distros, now includes batch file uploading as well as exporting.
This feature can be enabled as a plugin in pgAdmin III
- Ability to install using
CREATE EXTENSION postgis , ALTER EXTENSION postgis .. if you are running 9.1.3
- KNN distance functionality if you are running PostgreSQL 9.1+
- It also includes as an extra: tiger geocoder with loader for 2010 Tiger data. You can expect to see this tiger geocoder
upgrade in 2.1.0 to load tiger 2011 data.
- The very first version of PostGIS to support PostgreSQL 64-bit on windows
We will assume a windows environment for this
tutorial, but most of the tutorial will apply to other supported
platforms such as Linux, Unix, BSD, Mac etc. We will also be using
Massachusetts/Boston data for these examples. For desktop users, the EnterpriseDB one-click installer exists as well for Mac/OSX and Linux desktops, so you should be able to follow along without too much fuss.
Installing PostgreSQL with PostGIS Functionality
We will not go into too much detail here since the
install wizard (at least the windows one) is pretty good. Below are the
basic steps.
Note for Vista Users Because of the new added security in Vista, you may run into issues installing PostgreSQL. Please refer to the Windows Vista gotchas http://trac.osgeo.org/postgis/wiki/UsersWikiWinVista in the PostGIS wiki if you run into issues.
- Download the install for your specific platform
from the PostgreSQL Binary Download ( http://www.postgresql.org/download/ ) . As of
this writing the latest version is PostgreSQL 9.1.3 and we will be assuming PostGIS 2.0. The minimum support PostgreSQL for PostGIS 2.0.0 is PostgreSQL 8.4 and to get the full
fancy smancy features like KNN distance indexable operators and ability to install with extensions, you need 9.1. So use 9.1, PLEASE, enough said.
The below options follow the
basic sequence of the postgresql windows installer.
- Launch exe to install PostgreSQL
- If you want to access this server from other than
the server itself. Check the "Accept connection on all addresses, not
just localhost". NOTE: You can change this later by editing the
postgresql.conf -> listen_addresses property and if you don't
like the default port of 5432 you can change this as well in the
postgresql.conf -> port property.
- For encoding UTF-8 is preferred because you can
convert to other encodings. SQL_ASCII was the default on
Windows before 8.3 and was later replaced with WIN1252. UTF-8 however is now supported well under Windows and generally the default on Linux/Unix.
- Once PostgreSQL is installed, launch Application Stack Builder from (Start->Programs->PostgreSQL 9.1->Applciation Stackbuilder and pick the version of PostgreSQL you want to install PostGIS on and the version of PostGIS to install. NOTE: PostGIS 1.5,2.0.0 can coexist on the same server so you can install both, in 2.0.0 (as far as windows is concerned, we changed the template to template_postgis20) so it won't require overwriting with the 1.* template_postgis database. The dumper,loader commandline and GUI tools in PostgreSQL bin folder will get overwritten by the last PostGIS you installed so be careful. Generally speaking PostGIS 2.0.0 should work just fine everywhere you were using PostGIS 1.5 before, but you may need install the legacy_minimal.sql or legacy.sql if you have old applications or tools that use old functions (removed in PostGIS 2.0)
- Navigate to spatial extensions and pick PostGIS 2.0. Download , install.
Please note for many install packages - particularly windows. When you choose PostGIS as an option, the system will create a template_postgis20 (or template_postgis) template database for you that has PostGIS functions included.
The create spatial database checkbox is optional, and we generally uncheck it. It creates a spatial database for you to experiment with, but the template_postgis20 is always created for windows.
If you are running PostgreSQL 9.1+, we recommend you don't even bother with the template database and just use CREATE EXTENSION postgis in the database of your choosing or use PgAdmin Extensions install feature which we will cover in this tutorial.
- -- Do you want to register the GDAL_DATA prompt is new for PostGIS 2.0. This is because in order to do operations that require raster transformations or other rater warping / clipping etc, PostGIS uses GDAL epsg files. The windows build, makes a local copy of these in the PostgreSQL install\gdal-data folder and saying yes will
automatically add an GDAL_DATA environment variable putting this path in for you. If you use GDAL already (or you are running both PostGIS 32-bit and 64-bit, chances are you already have this environment variable set and you may not want to overwrite it. PostGIS will work happily with an existing one, but just remember if you uninstall a PostGIS 2.0 or your GDAL, these functions may stop working and you'll need to repoint the environment variable.
For those of you who want to try experimental builds -- e.g. We have experimental Windows builds made weekly or as frequently as anything interesting happens in the PostGIS code base. These can be downloaded from http://postgis.net/windows_downloads. For those who want to test out 2.0.1SVN, just replace the postgis-2.0.dll in the PostgreSQL lib folder with the one in the zip and run the respective postgis upgrade sql file in the share/contrib/postgis-2.0 or if you are using extensions ALTER EXTENSION postgis UPDATE TO "2.0.1SVN" .
Creating a spatial database
PostgreSQL comes packaged with a fairly decent admin
tool called PgAdmin3. If you are a newbie, its best just to use that
tool to create a new database.
- On windows PgAdmin III is under
Start->Programs->PostgreSQL 9.1->PgAdmin III
- Login with the super user usually postgres and the
password you chose during install. If you forgot it, then go into
pg_hba.conf (just open it with an editor such as notepad or a
programmer editor). Set the line
host all all 127.0.0.1/32 md5
to
host all all 127.0.0.1/32 trust
If you are on a newer windows (say 2008 or Windows 7), you may see an additional option host all all ::1/128 trust
The ::1/128 is usually the controlling one and is what localhost resolves to in IPV6 so you'll want to set this one.
This will allow any person logging locally to the computer that
PostgreSQL is installed on to access all databases without a password.
(127.0.0.1/32) means localhost only (32 is the bit mask). Note you can
add additional lines to this file or remove lines to allow or block
certain ip ranges. The lines highest in the file take precedence.
So for example if you wanted to allow all users logging in access as long as they successfully authenticate with an md5 password, then you can add the line host all all 0.0.0.0/0 md5 . If it is below, you will still be able to connect locally without a password but non-local connections will need a valid username and password.
Note: - The newer versions of PgAdmin III (1.7 something on) allow editing Postgresql.conf and pg_hba.conf using the PgAdmin III tool. These are accessible from Tools->Server Configuration and provide a fairly nice table editor to work with. This feature is only available if you have installed the adminpack.sql (this is located in C:\Program Files\PostgreSQL\9.x\share\contrib) (Admin Pack) in the postgres database .
On windows the file is located in C:\Program Files\PostgreSQL\9.0\share\contribs\adminpack.sql for versions of PostgreSQL below 9.1. It is located in the share folder of Linux installs as well. To install it --- switch to postgres database and run the adminpack.sql script in that database.
If you are running 9.1, however, it doesn't matter which OS you are running with -- just connect to your postgres database
and run this command in the sql window of PgAdmin or using psql:
CREATE EXTENSION adminpack;
(NOTE: you can also use the extensions gui of PgAdmin to install in the postgres db)
- Now for the fun part - Create your database. Call
it gisdb or whatever you want. If you are running PostgreSQL below 9.1, choose the template database called template_postgis20. Chose this as a template.
For PostgreSQL 9.1+, we recommend you just use the default template
- Its generally a good idea to create a user too that
owns the database that way you don't need to use your superuser account
to access it.
UPDATE: - The remaining steps in this section are not needed if you chose template_postgis20 for your new database. However if you are trying to spatially enable an existing database or you didn't get the template_postgis20 option. Do the remaining steps.
For PostgreSQL 9.1 -- expand your database and you should see an extensions section , right-mouse click on that and select the New Extension option.
Note the two options -- postgis and postgis_topology -- ain't that a beaut. This will work even if your PostgreSQL server is not on Windows. . Select postgis (this includes geometry, geography, and raster support). Then
if you want to use topology, then repeat the step picking postgis_topology. Also note the Definition tab, which allows you to install postgis in the non-default schema
and will also allow you a painless way to upgrade to PostGIS 2.0.1 when that comes out by changing the version number from drop down.
If you are unlucky enough to be using PostGIS on pre-9.1 and also don't have a template database, things are a bit tougher. The pre-PostgreSQL 9.1 way --Next go to tools->Query tool in pgAdmin III
and browse to the postgresql install contrib postgis.sql file (on
Windows the default install is Program
files\Postgresql\8.4\share\contrib\postgis-2.0. You'll need to run all these scripts:
postgis.sql
rtpostgis.sql
spatial_ref_sys.sql
topology.sql (if you want topology support)
On the Query tool, make sure your gisdb is selected
and then run the above scripts click the green arrow. You'll get a bunch of notices - not to
be alarmed.
As of PgAdmin III 1.10 -- the Plugin Icon has PSQL as an option and if you responded to the message Yes to the Overwrite my plugins.ini (for 9.1 you won't get a prompt since these files are stored separately), you should get an additional PostGIS Shapefile and DBF Loader option. . This option should become ungreyed when you select a database and when you launch it, it will pass in the credentatials to the database for you and launch a PSQL connection.
If the Plugins green is disabled (and says No Plugins installed) then most likely you have another PgAdmin or PostgreSQL install getting in the way. An easy fix is to open up PgAdmin III go under File->Options and make sure your PG bin path is pointing at the locations of your PostgreSQL bin (e.g. C:\Program Files\PostgreSQL\8.4\bin) and click the ... to repoint it if it is not.
Loading GIS Data Into the Database
Now we have a nice fully functional GIS database with
no spatial data. So to do some neat stuff, we need to get some data to
play with.
Get the Data
Download data from the MassGIS site.
For this simple exercise just download Towns
Extract the files into some folder. We will only be using the _POLY
files for this exercise.
NOTE: Someone asked how you extract the file if you are on a linux box.
---FOR LINUX USERS ---
If you are on Linux/Unix, I find the exercise even easier.
If you are on linux or have Wget handy - you can do the below to download the file after you have cded into the folder you want to put it in.
wget http://wsgw.mass.gov/data/gispub/shape/state/townssurvey_shp.exe
Now to extract it simply do the following from a shell prompt
unzip townssurvey_shp.exe
---END FOR LINUX USERS ---
NOTE: As of PostGIS 1.5.0, the windows build is now packaged with a shp2pgsql Graphical User Interface. You can also download it separately if you are using a lower version of PostGIS or want to load data from a separate workstation that doesn't have PostgreSQL installed.download this separately and use with any version of PostGIS from 1.2 - 2.0.0 and enable it as a Plug-In in PgAdminIII . Check out our screencast on configurating the shp2pgsql-gui as a PgAdmin III plug-in and using it. or our write-up on registering it
Figure out SRID of the data
You will notice one of the files it extracts is called
TOWNS_POLY.prj. A .prj is often included with ESRI shape files and
tells you the projection of the data. We'll need to match this
descriptive projection to an SRID (the id field of a spatial ref record
in the spatial_ref_sys table) if we ever want to reproject our data.
- Open up the .prj file in a text editor. You'll see
something like NAD_1983_StatePlane_Massachusetts_Mainland_FIPS_2001
and UNIT["Meter",1.0]
- Open up your PgAdmin III query tool and type in the
following statement
select srid, srtext, proj4text from
spatial_ref_sys where srtext ILIKE '%Massachusetts%' And
then click the green arrow. This will bring up about 10 records.
- Note the srid of the closest match. In this case
its 26986. NOTE: srid is not just a PostGIS term.
It is an OGC standard so you will see SRID mentioned a lot in other
spatial databases, gis webservices and applications. Most of the common
spatial reference systems have globally defined numbers. So 26986
always maps to NAD83_StatePlane_Massachusetts_Mainland_FIPS_2001
Meters. Most if not all MassGIS data is in this particular projection.
Loading the Data
The easiest data to load into PostGIS is ESRI shape
data since PostGIS comes packaged with a nice command line tool called
shp2pgsql which converts ESRI shape files into PostGIS specific SQL
statements that can then be loaded into a PostGIS database.
This file is located in the PostGresql bin folder
which default location in Windows is Program
Files/PostGreSQL/8.4/bin
Make a PostGIS mini toolkit
Since these files are so embedded, it is a bit
annoying to navigate to. To create yourself a self-contained toolkit
you can carry with you anywhere, copy the following files from the bin
folder into say c:\pgutils:
comerr32.dll krb5_32.dll libeay32.dll libiconv-2.dll libintl-2.dll libpq.dll pgsql2shp.exe psql.exe pg_dump.exe pg_restore.exe shp2pgsql.exe ssleay32.dll
Note: The GUI loader is packaged as a self-contained postgisgui folder in the bin of your PostgreSQL install. If you prefer the GUI interface, you can copy that folder and run the shp2pgsql-gui.exe file from anywhere even an external file network path.
Load Towns data
- Open up a command prompt.
- Cd to the folder you extracted the towns data
- Run the following command:
c:\pgutils\shp2pgsql -s 26986 TOWNS_POLY towns > towns.sql
- Load into the database with this command:
psql -d gisdb -h localhost -U postgres -f towns.sql
If you are on another machine different from the server, you will need
to change localhost to the name of the server. Also you may get
prompted for a password. For the above I used the default superuser
postgres account, but its best to use a non-super user account.
- Alternatively you can use the gui to load the data and when you do, your screen will look something like this. Which is a little different from the PostGIS 1.5 loader, because it allows uploading multiple files at ones. To edit any of the
settings for each file, click into the cell and the cell will become editable.
- This particular dataset is only polygons. You can override the behavior of bringing in as multipolgons by clicking the Options button checking the Generate simple geometries ... .
One thing you can do with the shp2pgsql command line version pacakged with PostGIS 2.0, that you can't do with the GUI is to do a spatial transformation from one coordiante system to another. So witht eh command line, we can transform to 4326 (WGS 84) and load to geography type with a single command. Hoepfully we'll see this in the GUI in PostGIS 2.0.1 or 2.1.0.
Indexing the data
Table indexes are very important for speeding up the
processing of most queries. There is also a downside to indexes and
they are the following
- Indexes slow down the updating of indexed fields.
- Indexes take up space. You can think of an index as
another table with bookmarks to the first similar to an index to a book.
Given the above, it is often times tricky to have a
good balance. There are a couple general rules of thumb to go
by that will help you a long way.
- Never put indexes on fields that you will not use
as part of a where condition or join condition.
- Be cautious when putting index fields on heavily
updated fields. For example if you have a field that is
frequently updated and is frequently used for updating, you'll need to
do benchmark tests to make sure the index does not cause more damage in
update situations than it does for select query situations.
In general if the number of records you are updating at any
one time for a particular field is small, its safe to put in an index.
- Corrollary to 2. For bulk uploads of a
table - e.g. if you are loading a table from a shape, its best to put
the indexes in place after the data load because if an index
is in place, the system will be creating indexes as its loading which
could slow things down considerably.
- If you know a certain field is unique in a table, it
is best to use a unique or primary index. The reason for this is
that it tells the planner that once its found a match, there is no need
to look for another. It also prevents someone from accidentally
inserting a duplicate record as it will throw an error.
- For spatial indexes - use a gist index. A gist basically
stores the bounding box of the geometry as the index. For large complex
geometries unfortunately, this is not too terribly useful.
The most common queries we will be doing on this query
are spatial queries and queries by the town field. So we will create 2
indexes on these fields. NOTE: The loader has an option to create the spatial index which we took
advantage of, so the spatial index one is necessary aside form just to keep in mind or if you forgot to enable the index opton in the loader.
CREATE INDEX idx_towns_geom
ON towns
USING gist(geom);
CREATE INDEX idx_towns_town
ON towns
USING btree(town);
Querying Data
Go back into PgAdmin III and refresh your view. Verify
that you have a towns database now.
Test out the following queries from the query tool
For PostGIS installations of 1.2.2 and above, the preferred function names start with ST_
SELECT ST_Extent(geom) FROM towns WHERE town = 'BOSTON';
SELECT ST_Area(ST_Union(geom)) FROM towns WHERE town = 'BOSTON';
Old syntax pre PostGIS 1.2.2 - this will not work in PostGIS 1.4+. If you have old code like this -- change it to the above syntax. We have crossed out the below code to demonstrate it is BAD
SELECT Extent(geom) from towns where town = 'BOSTON';
SELECT Area(GeomUnion(geom)) FROM towns where town = 'BOSTON';
Most functions in new postgis installs just have an ST_ prefixed, except for GeomUnion which became ST_Union. The other difference is that relational operators with ST_ now automagically use index operators where as the ones without ST_ you need to do an additional && call.
Example:
a.geom && b.geom AND Intersects(a.geom,b.geom)
can simply be written as
ST_Intersects(a.geom, b.geom)
If the above gives you an error such as mixed SRIDs, most likely you are running 1.3.2 postgis which was very defective. Upgrade to 1.3.3 at your next opportunity. To verify your install -
SELECT postgis_full_version();
Viewing the Data
If you are a GIS newbie, I highly recommend using Quantum GIS. Quantum GIS has ability to view PostGIS data directly, do simple filters on it, is free, is cross-platform (Linux, Windows, MacOSX,Unix) and is the least threatening of all the GIS Viewers I have seen out there for people new to GIS.
Post Comments About Part 1: Getting Started With PostGIS: An almost Idiot's Guide
Post Comments About Part 1: Getting Started With PostGIS: An almost Idiot's Guide (PostGIS 2.0)
|