Intro to OGR, Part I: Exploring Data

OGR is a really incredible tool for manipulating spatial data, but it can seem a little intimidating at first. I've created this tutorial series in hopes that it might help someone get past that intimidation and discover a few of OGR's many fantastic features.

In this tutorial, we're going to create a web map of parks in Austin, TX. We'll be using OGR and publicly-available data. At the end, we'll host our map for free on CartoDB. This should provide an introduction to using the open-source OGR toolkit to read and modify spatial data files.

Getting Started

This tutorial will work for both Windows & Linux users. Although aimed at anyone, this post assumes some basic knowledge of spatial data (e.g., geometry types and the idea of projections). And since OGR is a command-line tool, you'll spend a lot of your time typing into a terminal. Don't be scared! It'll be fun, I promise.

Here's what you'll need:

  • GDAL/OGR should be installed and on your system's PATH. If you're using Ubuntu, you can start here: Installing GDAL/OGR for Ubuntu.
  • QGIS. If you're using Ubuntu, see this post. You can also follow along in the GIS desktop utility of your choice. If you have & prefer to use ArcGIS Desktop, I won't stop you.
  • Dataset: download the "City of Austin Parks" shapefile (.zip)

To get started, unzip the dataset you just downloaded, and put the resulting shapefile in a new working directory.

Check out the Data

Take a few minutes to familiarize yourself with the shapefile: for this tutorial, we'll first look at the layer visually, and then explore the same data using OGR.

Visually (in QGIS)

  • Launch your GIS desktop application. Here's what the layer looks like in QGIS:
  • Add the dataset to your workspace, and look over the layout and layer extents.
  • Look at the attribute table, and check the number of features (267). Also note the fields. For our final map we want to show only developed parks -- looks like PARK_STATU is the field to remember for that.

From the Command Line (with OGR)

Now we're going to use a function that comes built-in to the OGR toolkit: ogrinfo. To start, open a terminal & change to your working directory (where you've placed the Austin Parks shapefile).

$ cd ~/gis/austin_parks

Get the Number of Features

To get the most out of OGR's tools, you'll often need to give a SQL query to a command with the -sql flag. In the example below, we're going to use ogrinfo on the city_of_austin_parks.shp shapefile to count the number of features. (Since this is a shapefile, there's only one layer, which has the same name as the file.)

Do this:

$ ogrinfo city_of_austin_parks.shp -sql "SELECT COUNT(*) FROM city_of_austin_parks"

To get this:

INFO: Open of `city_of_austin_parks.shp'
          using driver `ESRI Shapefile' successful.
Layer name: city_of_austin_parks
Geometry: Polygon
Feature Count: 1
Layer SRS WKT:
(unknown)
COUNT_*: Integer (0.0)
OGRFeature(city_of_austin_parks):0
  COUNT_* (Integer) = 267

OGR automatically detected that this is an "ESRI Shapefile" (in some less-common cases, you may need to explicitly tell OGR the file type). The geometry type, as we already know, is polygonal. And, as we saw in our GIS application a few minutes ago, there are 267 features (records) in this layer. Notice that the layer's spatial reference (SRS) is "unknown": since our SQL query didn't actually select any features (it just counted said features), ogrinfo had no geometries to look at and likewise, was unable to detect a spatial reference. Don't worry, we'll soon fix that.

More Useful Information About the Layer

This time we want ogrinfo to actually look at real geometries, and give us more detailed information about this layer. Using the -so flag will tell ogrinfo to return those details as a "summary only": we'll see information about the features, but we won't actually see the features themselves (with 267 features, that would be a long list to look at!).

Since we want ogrinfo to look at all the geometries here, we're going to SELECT everything from the layer.

Do this:

$ ogrinfo -so city_of_austin_parks.shp -sql "SELECT * FROM city_of_austin_parks"

To get this:

INFO: Open of `city_of_austin_parks.shp'
          using driver `ESRI Shapefile' successful.

Layer name: city_of_austin_parks
Geometry: Polygon
Feature Count: 267
Extent: (3060414.510114, 10021121.749968) - (3167828.500010, 10161094.720022)
Layer SRS WKT:
PROJCS["NAD_1983_StatePlane_Texas_Central_FIPS_4203_Feet",
        GEOGCS["GCS_North_American_1983",
                DATUM["North_American_Datum_1983",
                        SPHEROID["GRS_1980",6378137.0,298.257222101]],
                PRIMEM["Greenwich",0.0],
                UNIT["Degree",0.0174532925199433]],
        PROJECTION["Lambert_Conformal_Conic_2SP"],
        PARAMETER["False_Easting",2296583.333333333],
        PARAMETER["False_Northing",9842500.0],
        PARAMETER["Central_Meridian",-100.3333333333333],
        PARAMETER["Standard_Parallel_1",30.11666666666667],
        PARAMETER["Standard_Parallel_2",31.88333333333333],
        PARAMETER["Latitude_Of_Origin",29.66666666666667],
        UNIT["Foot_US",0.3048006096012192]]
PARK_ID: Integer (10.0)
PARK_NAME: String (100.0)
PARK_TYPE: String (30.0)
PARK_ADDRE: String (100.0)
PARK_STATU: String (30.0)
POLYGON_PA: Integer (5.0)
ACRES: Real (19.8)
CREATED_BY: String (10.0)
CREATED_DA: Date (10.0)
MODIFIED_B: String (10.0)
MODIFIED_D: Date (10.0)
PARK_LABEL: String (100.0)
SHAPE_AREA: Real (19.11)
SHAPE_LEN: Real (19.11)

Cool, right?

This time, ogrinfo was able to detect the spatial reference of this layer, and returns the Well Known Text (WKT) format to us. We see that this shapefile is originally in a Texas State Plane coordinate system, and the layer's extents are given in that system's unit of measurement (US feet). Below the projection information is a list of fields. Each field name is followed by that field's type (e.g., Integer) and max length (e.g., 10.0). From our earlier visual inspection, we remember that the field PARK_STATU is one we definitely want to use later. But what is in the other fields?

Look at a Single Feature

What if we want to inspect the content of just one feature? Doing that could give us a better idea of what all those fields hold.

This time, we'll use a -q flag to quiet some of the information we already know (e.g.,spatial reference & feature count). As before, we'll select all geometries from this layer, but this time using the -fid flag to tell ogrinfo we only want to see results for the first feature in our layer.

(if you wanted to get fancy you could instead use a WHERE clause in your SQL query to do the same or more - e.g., "SELECT * FROM city_of_austin_parks WHERE fid IN (1,3)" to return the first and third features).

Do this:

$ ogrinfo -q city_of_austin_parks.shp -sql "SELECT * FROM city_of_austin_parks" -fid 1

To get this:

Layer name: city_of_austin_parks
OGRFeature(city_of_austin_parks):1
  PARK_ID (Integer) = 104
  PARK_NAME (String) = Brush Square Triangle
  PARK_TYPE (String) = Planting Strips/Triangles
  PARK_ADDRE (String) = 4621 Lake Vw
  PARK_STATU (String) = Undeveloped
  POLYGON_PA (Integer) = 1
  ACRES (Real) =          0.73052860
  CREATED_BY (String) = (null)
  CREATED_DA (Date) = (null)
  MODIFIED_B (String) = ahardy
  MODIFIED_D (Date) = 2009/05/01
  PARK_LABEL (String) = Brush Square Triangle
  SHAPE_AREA (Real) =   31821.82560110000
  SHAPE_LEN (Real) =     734.51993732000
  POLYGON ((3106098.239940226078033 10093798.339851051568985,3106113.222849890589714 10093848.298412546515465,3106151.089900150895119 10093973.729920208454132,3106155.100062727928162 10093983.899847373366356,3106161.109893232584 10093993.039920955896378,3106168.850035235285759 10094000.760049879550934,3106177.99995131790638 10094006.729854211211205,3106188.180049061775208 10094010.720003709197044,3106198.959883153438568 10094012.539881959557533,3106209.889979392290115 10094012.119935303926468,3106220.49986632168293 10094009.479848712682724,3106230.340069726109505 10094004.729858219623566,3106274.828169733285904 10093976.966790378093719,3106335.08986023068428 10093939.359910219907761,3106341.970095813274384 10093932.290042459964752,3106346.169890567660332 10093923.370112791657448,3106347.240098401904106 10093913.140146374702454,3106345.050142154097557 10093903.960046634078026,3106339.859863817691803 10093895.589984625577927,3106332.220115318894386 10093889.360010206699371,3106233.553645983338356 10093839.04679062962532,3106160.889421224594116 10093801.992730885744095,3106123.35012623667717 10093782.850052624940872,3106116.520087391138077 10093780.649925798177719,3106109.400022894144058 10093781.51016029715538,3106103.299969479441643 10093785.269995301961899,3106099.340003654360771 10093791.249970212578773,3106098.239940226078033 10093798.339851051568985))

Using that -q flag to suppress a lot of extra output makes it easier to see the contents of the first feature. We'll come back to these fields in Part II.

Conclusion

That's the end of part 1! If you're feeling super confident, use the ogrinfo documentation to discover even more possibilities.

When you're ready, continue with Intro to OGR Part II: Creating New Data.

By @Sara Safavi