Skip to content

plotting polygon shapefiles

markconnolly edited this page Sep 29, 2010 · 24 revisions

Plot shapefiles

Goal

Use ggplot2 to plot polygons contained in a shapefile.

Packages

require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
gpclibPermit() # required for fortity method

Introduction

A shapefile is a group of files containing feature geometry and feature attribute data, as specified by Environmental Systems Research Institute, Inc. (ESRI).  In this example, we are interested in a shapefile that contains the geometry type polygons.  The example shapefile is eco_l3_ut, which contains the ecoregions of Utah.  The group files comprises

  • eco_l3_ut.shp
  • eco_l3_ut.dbf
  • eco_l3_ut.shx
  • eco_l3_ut.prj
  • eco_l3_ut.sbn 
  • eco_l3_ut.sbx

The .shp is the main file and contains feature geometry.  The .dbf file contains the attributes of the feature.  The elements of the two files are linked by their offsets in the file: the first geometric feature (offset 0 in the shp) has its attributes in the first attribute tuple (offset 0 in the dbf file).  There is a one-to-one relationship between geometry tuples and attribute tuples.  This structure will be key when preparing shapefile contents for ggplot2.

The .prj file is the projection of the geometry components.  Using the projection file is often critical when dealing with multiple shapefiles having different projections.  rgdal will use this file to set projection information when creating a spatial object from a shapefile (assuming the .prj file exists or the CRS string is given).  The information can be used to re-project spatial objects as needed.  See the rgdal and sp documentation for details.

Preparation

The shapefile contents are loaded as a spatial object and transformed for presentation by ggplot2.

utah = readOGR(dsn=".", # expects current direct
               layer="eco_l3_ut") # OGR name for shapefile
  

The utah object is an instance of class sp::SpatialPolygonsDataFrame, which holds polygons with attributes.  Any spatial manipulation (such as re-projection) is performed on this object.

ggplot2 will render the polygons using geom_polygon, which expects a standard data frame containing polygon verticies and attribute data.  The information contained in utah must be transformed into such a standard data frame.

utah slots are

  • bbox
  • data
  • plotOrder
  • polygons
  • proj4string

The slots of primary interest are data (the geometry attributes) and polygons (the geometry features).  The attributes and features are related to each other by order, so the relationship is implicit.  The attributes need to be keyed explicitly to be joined to the polygon elements.

utah@data$id = rownames(utah@data)

The polygons need to be flattened, enriched, and tagged for joining to the attributes.  The ggplot2::fortify.SpatialPolygonsDataFrame method accomplishes this.

utah.polys = fortify.SpatialPolygonsDataFrame(utah, region="id")

region selects the feature attribute name to tag each row in the new data frame.  Each geometry feature offset gets the value from the feature attribute at the same offset.  In this case, it is utah@data$id.  The attribute utah.polys$id contains the mapped value.  If a different attribute were used (region="LEVEL3"), utah.polys$id would contain the corresponding value of utah@data$LEVEL3.

The full result of the method is a data frame containing all of the polygon verticies.  Each polygon in utah@polygons may be made up of more than one polygon.  The vertices of these multipart polygons also appear in the data frame.

utah.polys attributes include spatial coordinates (long and lat), group (coordinates having the same group belongs to the same polygon), and id (each id identifies an attribute tuple). 

Another of the remaining attributes is hole.  A hole is a polygon that represents a cut out of the polygon it is contained by.  The attribute hole is a logical attribute, and tuples in the utah.polys data frame are either hole==FALSE or hole==TRUE.  Holes may cause plotting problems.

With a shared attribute of id, the polygon coordinates can be joined with their atteibutes to complete preparation.

utah.df = join(utah.polys, utah@data, by="id")

Plotting

ggplot(utah.df) + 
  aes(long,lat,group=group,fill=LEVEL3_NAM) + 
  geom_polygon() +
  geom_path(color="white") +
  coord_equal() +
  scale_fill_brewer("Utah Ecoregion")

Drawing sequence problems

ggplot2 draws the plots in the order of the id.  In this example, the sequence is happily an order that results in a correctly rendered plot. 

Note: The ggplot2 wiki is no longer maintained, please use the ggplot2 website instead!

Clone this wiki locally