Skip to content

plotting polygon shapefiles

markconnolly edited this page Sep 30, 2010 · 24 revisions

Plot shapefiles

Goal

Use ggplot2 to plot polygons contained in a shapefile.

Packages

require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
gpclibPermit() # required for fortify method

Introduction

A shapefile is a group of files containing feature geometry and feature attribute data, as specified by Environmental Systems Research Institute, Inc. (ESRI).  In this example, we are interested in a shapefile that contains the geometry type polygons.  The example shapefile is eco_l3_ut, which contains the ecoregions of Utah.  The group files comprises

  • eco_l3_ut.shp
  • eco_l3_ut.dbf
  • eco_l3_ut.shx
  • eco_l3_ut.prj
  • eco_l3_ut.sbn 
  • eco_l3_ut.sbx

The .shp is the main file and contains feature geometry.  The .dbf file contains the attributes of the feature.  The elements of the two files are linked by their offsets in the file: the first geometric feature (offset 0 in the shp) has its attributes in the first attribute tuple (offset 0 in the dbf file).  There is a one-to-one relationship between geometry tuples and attribute tuples.  This structure will be key when preparing shapefile contents for ggplot2.

The .prj file is the projection of the geometry components.  Using the projection file is often critical when dealing with multiple shapefiles having different projections.  rgdal will use this file to set projection information when creating a spatial object from a shapefile (assuming the .prj file exists or the CRS string is given).  The information can be used to re-project spatial objects as needed.  See the rgdal and sp documentation for details.

Preparation

The shapefile contents are loaded as a spatial object and transformed for presentation by ggplot2.

utah = readOGR(dsn=".", layer="eco_l3_ut")
utah@data$id = rownames(utah@data)
utah.points = fortify.SpatialPolygonsDataFrame(utah, region="id")
utah.df = join(utah.points, utah@data, by="id")

Notes

Object utah is an instance of class sp::SpatialPolygonsDataFrame, which holds polygons with attributes.  Any spatial manipulation (such as re-projection) is performed on this object.

ggplot2 will render the polygons using geom_polygon, which expects a standard data frame containing polygon verticies and attribute data.  The information contained in utah must be transformed into such a standard data frame.

utah slots are

  • bbox
  • data
  • plotOrder
  • polygons
  • proj4string

The slots of primary interest are utah@data (the geometry attributes) and utah@polygons (the geometry features).  The attributes and features are related to each other by order, so the relationship is implicit.  Relationship attributes need to be explicit so that the geometry attributes can be joined with the geometry features.

utah@data$id = rownames(utah@data)

explicitly identifies attribute rows by the .dbf offset

utah.points = fortify.SpatialPolygonsDataFrame(utah, region="id")

melts the polygons into points, tags each point with the id value of the corresponding attribute row, and tags each point with values from the polygon from which the point was derived.

region selects the feature attribute name to tag each row in the new data frame.  Each geometry feature offset gets the value from the feature attribute at the same offset.  In this case, it is utah@data$id.  The attribute utah.points$id contains the mapped value.  If a different attribute were used (region="LEVEL3"), utah.points$id would contain the corresponding value of utah@data$LEVEL3.

utah.points attributes include spatial coordinates (long and lat), group (coordinates having the same group belongs to the same polygon), and id (each id identifies a feature attribute tuple). 

Another of the remaining attributes is hole.  Each polygon in utah@polygons is defined by one or more polygons. When defined by multiple polygons, the sub-polygons may be disjunct islands or holes. A hole part should remove geometry from its utah@polgons instance.  In utah.points is created and individual polygons are melted, the hole state is retained, but this information is not used by ggplot2. Depending on the rendering sequence of the polygons, holes may overlay and obscure other polygons. That is, they are not treated as holes but as non-hole polygons.

utah.df = join(utah.points, utah@data, by="id")

joins the points to their corresponding attributes and finalizes the data preparation.

Plotting

ggplot(utah.df) + 
  aes(long,lat,group=group,fill=LEVEL3_NAM) + 
  geom_polygon() +
  geom_path(color="white") +
  coord_equal() +
  scale_fill_brewer("Utah Ecoregion")

Problems

utah.points = fortify.SpatialPolygonsDataFrame(utah, region="LEVEL3")
names(utah.points)[which(names(utah.points)=="id")] = "LEVEL3"
utah.df = join(utah.points, utah@data, by="LEVEL3")
ggplot(utah.df) + 
  aes(long,lat,group=group,fill=LEVEL3_NAM) + 
  geom_polygon() +
  geom_path(color="white") +
  coord_equal() +
  scale_fill_brewer("Utah Ecoregion")

In this plot, one of the polygons in the ecoregion Wasatch and Uinta Mountains is obscured by a hole polygon from ecoregion Colorado Plateaus. In the original SpatialFramesDataFrame, we got lucky in that Wasatch and Uinta Mountains were defined as three separate polygon instances in utah@polygons rather than as a single instance of multiple polygons. We also were lucky that the Wasatch and Uinta Mountains polygon inside Colorado Plateaus appears after the Colorado Plateaus polygon. Using id as the region identifier maintained this order, and the plot was properly rendered. If the shapefile had been produced in another way, we might not have been so lucky and have ended up with an obscuring hole. The bottom line is plots need to be checked for correctness and some work-around might be needed.

Workaround: forced layering of geom_plot

Force-order the plot to insure obscured polygons are moved above their obscurer.
utah.points = fortify.SpatialPolygonsDataFrame(utah, region="LEVEL3")
names(utah.points)[which(names(utah.points)=="id")] = "LEVEL3"
utah.df = join(utah.points, utah@data, by="LEVEL3")
ggplot(utah.df) + 
  aes(long,lat,group=group,fill=LEVEL3_NAM) + 
  geom_polygon(data=subset(utah.df,LEVEL3!=19)) +
  geom_polygon(data=subset(utah.df,LEVEL3==19)) +
  geom_path(color="white") +
  coord_equal() +
  scale_fill_brewer("Utah Ecoregion")

Note: The ggplot2 wiki is no longer maintained, please use the ggplot2 website instead!

Clone this wiki locally