What is SpatiaLite?
SpatiaLite is an SQLite database engine with Spatial functions added. You can think of it
as a spatial extender for SQLite database engine which is similar in concept to what PostGIS does for the PostgreSQL Object-Relational Database.
For those unfamiliar with SQLite -- it is this little adorable single file relational database system that runs equally well on Windows, Linux, Unix, Mac and is easily embeddable in larger apps. Is Free and Open Soure -- Public domain so no restrictions for commercial and embedding in other applications and
I think is superb as a mini-replicator. We use it in some of our projects for synching data back and forth between a mainstream server database such as PostgreSQL, SQL Server, MySQL and a client
offline database that needs to synch differential data back and forth to the mother ship.
What is special about it is as follows:
- all the tables fit into a single file so easy for transport
- It has this cute little dblink like feature that allows you to mount external files as virtual tables and join with your SQLite data.
- PHP has built-in drivers for SQLite it, .NET has drivers
for it you can dump in your bin folder and get going, SharpMap (has a driver for the SpatiaLite version in the works) - thanks to Bill Dollins similar to what it has for PostGIS. (note the drivers unlike other databases contain the engine as well).
This is very important to understand since you want your driver to have the SpatiaLite enhancements.
- The SQLite core engine is a relatively tiny application that provides you more or less standard ANSI-SQL 92 syntax so with a basic abstraction layer, you can treat this as you would any other relational database.
- My favorite - it even supports SQL views
- It supports triggers
- Has basic ACID/Transactions -- BEGIN/COMMIT;
- Did I mention its cute, but don't let its cuteness deceive you of its potential. Seems to be lacking an animal mascot that embodies its cuteness.
SpatiaLite sweetens this little database by allowing you to store geometries and query them with spatial functions
similar to what you would find in PostgreSQL/PostGIS, Microsoft SQL Server 2008, MySQL, IBM DBII, Oracle Locator/Spatial. In terms of the model it uses,
it seems closest in syntax to PostGIS and in fact modeled after PostGIS and also piggy-backs on GEOS and Proj.4 like PostGIS. In fact the functionality you will see is pretty much the functions you get in PostGIS including their names minus the aggregates and some other things, except you just need to strip off the ST_ and add in a bbox filter, so PostGIS users should feel very much at home.
Some key features of SpatiaLite
- OGC functions similar to what you find implemented in PostGIS/GEOS. This uses GEOS.
- R-Tree Spatial Index if you use SQLite 3.6+, and rudimentary MBR index for lower SQLite
- Rudimentary support for curves
- Lots of MBR functions (Minimum bounding rectangle), similar to what MySQL 5.1 and below has, but also has real functions as described above for more exact calculations
- Unfortunately the current release seems to lack spatial aggregate functions that PostGIS has such as Extent, Union, collect.
- Spatial Transformation support using Proj.4 that hmm even SQL Server 2008 lacks
As far as licensing goes, SpatiaLite does not have the same License as SQLite. It is under 3 licenses Mozilla Public and GPLv3
In this little
exercise, we shall get you up and running quickly with this cute cute database engine.
In terms of tutorials -- here is a recent one by MapFish people -- SpatiaLite in 5 minutes.
This article will be similar except we shall go thru similar exercises we did for SQL Server 2008 and PostGIS as a contrast and compare.
Installing SpatiaLite
Installation is simple. -- just download and extract and run the GUI.
- Download
from the http://www.gaia-gis.it/spatialite/ : Download SpatiaLite GUI statically linked (choose your platform)
- under Useful spatial scripts download
http://www.gaia-gis.it/spatialite/binaries.html
- Extract the spatialite-gui. Note you can use spatialite-gis to view and import tables, but the spatialite-gui gives you ability to do free SQL queries, but not as good viewing features, while the spatialite-gis can create a new database, import and can show you a whole map, but has limited filtering. We'll cover spatialite-gis graphical tool in the next tutorial.
Creating a spatial database
The GUI has imbedded in it the engine that runs the database similar in concept to the way Microsoft Access application has embedded the engine to control an MS Access MDB. So to get started, launch the GUI.
- Launch the GUI
- From menu choose --> files -> Create a new SQLiteDb
- Call the file boston.sqlite
- Choose the little icon with tooltip "Execute SQL Script" and choose init_spatialite-2.2.sql
- Choose Latin if prompted for encoding
- Right mouse-click on the database and hit refresh and you should see three tables. If you right click on the spatial_ref_sys table and click edit rows,
Your screen should look like this:
Loading GIS Data Into the Database
Now we have a nice fully functional GIS database with
no spatial data. So to do some neat stuff, we need to get some data to
play with.
Get the Data
Download data from the MassGIS site.
NOTE: MAssGIS has restructured their site quite a bit. When we wrote this article it was on a file located at ftp://data.massgis.state.ma.us/pub/shape/state/towns.exe. We've updated this article to link to the newer comparable file
For this simple exercise just download Towns
Extract the file into some folder. We will only be using the _POLY
files for this exercise.
Figure out SRID of the data
You will notice one of the files it extracts is called
TOWNS_POLY.prj. A .prj is often included with ESRI shape files and
tells you the projection of the data. We'll need to match this
descriptive projection to an SRID (the id field of a spatial ref record
in the spatial_ref_sys table) if we ever want to reproject our data.
Loading the Data
The easiest data to load into SpatiaLite is to use the ESRI shape
data loader packaged with the SpatiaLite gui.
Load Towns data
- Open up the SpatiaLite gui if you don't already have it open
- From the SpatiaLite gui --> File -> Load Shape file and browse for TOWNS_POLY.shp
and type in 26986 for SRID. Type in the_geom for field name. Windows Latin1 is selected by default
and is fine for Massachusetts data and most data sources. Type in towns for table name.
Your screen should look like this.
and then click okay. It should state 631 records created and you should see a new table called towns.
Note that SQLite is not case sensitive (so similar in concept to SQL Server in normal state), so its okay to use mixed case.
There are a couple of things I would like to point out that the shape file loader auto-magically does for you.
- It adds an entry to geometry_columns table -- if you type in --
SELECT * from geometry_columns;
in the
Query window and click the Execute SQL Statement icon to the right of the window, you'll see one record listed in the query result and that the type field
says its a MULTIPOLYGON.
- It adds triggers to the TOWNS_POLY table that prevent data that is not of right SRID or geometry from being added. If you expand the TOWNS_POLY table
on the left - you'll see these. These serve the same purpose as the geometry contraints in PostGIS. If you show the source of the triggers, one of them looks something like this:
So you see it looks back on the geometry_columns table to ensure what is in that table matches what someone tries to insert.
CREATE TRIGGER gtu_towns_the_geom BEFORE UPDATE ON towns
FOR EACH ROW BEGIN
SELECT RAISE(ROLLBACK,
'''towns.the_geom'' violates Geometry constraint [geom-type not allowed]')
WHERE (SELECT type FROM geometry_columns
WHERE f_table_name = 'towns' AND f_geometry_column = 'the_geom'
AND (type = GeometryType(NEW.the_geom)
OR type = GeometryAliasType(NEW.the_geom))) IS NULL;
END
Indexing the data
Table indexes are very important for speeding up the
processing of most queries. There is also a downside to indexes and
they are the following
- Indexes slow down the updating of indexed fields.
- Indexes take up space. You can think of an index as
another table with bookmarks to the first similar to an index to a book.
Given the above, it is often times tricky to have a
good balance. There are a couple general rules of thumb to go
by that will help you a long way.
- Never put indexes on fields that you will not use
as part of a where condition or join condition.
- Be cautious when putting index fields on heavily
updated fields. For example if you have a field that is
frequently updated and is frequently used for updating, you'll need to
do benchmark tests to make sure the index does not cause more damage in
update situations than it does for select query situations.
In general if the number of records you are updating at any
one time for a particular field is small, its safe to put in an index.
- Corrollary to 2. For bulk uploads of a
table - e.g. if you are loading a table from a shape, its best to put
the indexes in place after the data load because if an index
is in place, the system will be creating indexes as its loading which
could slow things down considerably.
- If you know a certain field is unique in a table, it
is best to use a unique or primary index. The reason for this is
that it tells the planner that once its found a match, there is no need
to look for another. It also prevents someone from accidentally
inserting a duplicate record as it will throw an error.
- For spatial indexes -an R-Tree index. An R-Tree basically
stores the bounding box of the geometry as the index. For large complex
geometries unfortunately, this is not too terribly useful. SpatiaLite 3.6+ as mentioned has R-Tree indexes
The most common queries we will be doing on this query
are spatial queries and queries by the town field. So we will create 2
indexes on these fields.
In the tree view that shows the tables, expand towns and right mouse click on the the_geom
field and then choose Build Spatial Index. You will see the little icon change on the column once that is done.
The SpatiaLite GUI doesn't seem to have a right-click feature for other indexes besides spatial, so to create an index on the town field do the following:
CREATE INDEX idx_town ON towns(town);
vacuum towns;
Querying Data
Go back into SpatiaLite gui
Test out the following queries from the query tool:
-- this is similar to the extent function we did in PostGIS but makes up for the fact that there is no extent function in SpatiaLite. The Rtree index
is implemented as a virtual table in SQLite and each record has the bbox min,max settings. So to simulate extent, we simply
join by the ROWID and pkid of the two tables.
SELECT MIN(idx.xmin) As bxmin,MIN(idx.ymin) As bymin, MAX(idx.xmax) As bxmax,
MAX(idx.ymax) As bymax
FROM towns As t INNER JOIN idx_towns_the_geom As idx
ON t.rowid = idx.pkid
WHERE t.town = 'BOSTON';
returns -- bxmin: 225473.6094 bymin: 886444.6250 bxmax: 252005.5781 bymax: 905284.0000
SELECT SUM(Area(the_geom)) FROM towns where town = 'BOSTON';
returns 128251224.3644
--reproject to Massachusetts state plane feet
SELECT town, astext(centroid(transform(the_geom,2249)))
FROM towns
WHERE town > ''
ORDER BY town LIMIT 2; returns
ABINGTON POINT(802961.762366 2868485.109604)
ACTON POINT(672950.258529 3001540.967504);
--to see what indices its using
EXPLAIN QUERY PLAN
SELECT SUM(Area(the_geom))
FROM towns where town = 'BOSTON';
--or the painful steps
EXPLAIN
SELECT SUM(Area(the_geom))
FROM towns where town = 'BOSTON';
Example:
--Here is an exercise we did in SQL Server 2008 --
Here we arbitrarily take the first point that defines a polygon in Boston
and ask what town POLYGON/MULTIPOLYGON geometries are within 1 mile of this point
and we also want to know the exact distances and results ordered by distance. The speed of this was
surprisingly good and finished in under a second and returned 3 rows.
--I was hoping this would use a spatial index but it doesn't
SELECT t.town, Distance(ref.point1,t.the_geom)/0.3048 As dist_ft,
Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t
INNER JOIN (
SELECT PointN(Boundary(the_geom),1) As point1
FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
ON MbrIntersects(BuildCircleMbr(X(ref.point1), Y(ref.point1),1609.344), t.the_geom)
WHERE Distance(ref.point1, t.the_geom) < 1609.344
ORDER BY Distance(ref.point1, t.the_geom);
--this seems to perform just as well -- would need a larger set to do a real test
SELECT t.town, Distance(ref.point1,t.the_geom)/0.3048 As dist_ft,
Distance(ref.point1, t.the_geom) As dist_m
FROM towns As t
INNER JOIN (
SELECT PointN(Boundary(the_geom),1) As point1
FROM towns WHERE town = 'BOSTON' LIMIT 1) As ref
ON ( Distance(ref.point1, t.the_geom) < 1609.344)
ORDER BY Distance(ref.point1, t.the_geom);
Viewing the Data
If you are a GIS newbie, I highly recommend using Quantum GIS.
The latest binary distributions of QuantumGIS (as of this writing QGIS 1.4+) now have the SpatiaLite driver built in.
You can also use the latest Spatialite-gis minimalist, which has viewer and shapefile importer packaged in. Latest version is 1.0 Alpha and binaries available for Windows, Linux, and Mac OSX.
Post Comments About Part 1: Getting Started with SpatiaLite: An almost Idiot's Guide