── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ purrr::map() masks maps::map()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Baja California
Learning Objectives
This topics is the first
Describe the importance of Ellipsoids & Datum in spatial data.
Use both sf & ggplot in visualizing point data.
Be able to transform point data from one projection to another.
Ellipsoids
Unless you are in PHYS 101, the earth is not a perfect sphere (😉). It is an irregularly shaped object that we need to be able to characterize if we are going to develop a system of placing points onto it and doing things such as measuring distance, finding watersheds, or defining boundaries.
There has been a long history of ellipsoid research, all of which has been sought to increase our ability to map and move across the earth. The following table gives some historical and contemporary ellipsoids.
Ellipsoid
Equatorial Radius (m)
Polar Radius (m)
Used
Maupertuis (1738)
6,397,300
6,363,806.283
France
Plessis (1817)
6,376,523.0
6,355,862.9333
France
Everest (1830)
6,377,299.365
6,356,098.359
India
Everest 1830 Modified (1967)
6,377,304.063
6,356,103.0390
West Malaysia & Singapore
Everest 1830 (1967 Definition)
6,377,298.556
6,356,097.550
Brunei & East Malaysia
Airy (1830)
6,377,563.396
6,356,256.909
Britain
Bessel (1841)
6,377,397.155
6,356,078.963
Europe, Japan
Clarke (1866)
6,378,206.4
6,356,583.8
North America
Clarke (1878)
6,378,190
6,356,456
North America
Clarke (1880)
6,378,249.145
6,356,514.870
France, Africa
Helmert (1906)
6,378,200
6,356,818.17
Egypt
Hayford (1910)
6,378,388
6,356,911.946
USA
International (1924)
6,378,388
6,356,911.946
Europe
Krassovsky (1940)
6,378,245
6,356,863.019
USSR, Russia, Romania
WGS66 (1966)
6,378,145
6,356,759.769
USA/DoD
Australian National (1966)
6,378,160
6,356,774.719
Australia
New International (1967)
6,378,157.5
6,356,772.2
GRS-67 (1967)
6,378,160
6,356,774.516
South American (1969)
6,378,160
6,356,774.719
South America
WGS-72 (1972)
6,378,135
6,356,750.52
USA/DoD
GRS-80 (1979)
6,378,137
6,356,752.3141
Global ITRS
WGS-84 (1984)
6,378,137
6,356,752.3142
Global GPS
IERS (1989)
6,378,136
6,356,751.302
IERS (2003)
6,378,136.6
6,356,751.9
The most common ones you will probably run across include GRS80/NAD83 (derived from satellite measurements of the distance of the surface to the core of the planet ) and WGS-84 (an ellipsoid based upon GPS).
Example Data
To examine the differences between ellipsoids, let’s load in some data first. Here are some point data that can be interpreted as polygons and represent the lower 48 states of the US.
states <-map_data("state")head( states )
long lat group order region subregion
1 -87.46201 30.38968 1 1 alabama <NA>
2 -87.48493 30.37249 1 2 alabama <NA>
3 -87.52503 30.37249 1 3 alabama <NA>
4 -87.53076 30.33239 1 4 alabama <NA>
5 -87.57087 30.32665 1 5 alabama <NA>
6 -87.58806 30.32665 1 6 alabama <NA>
Each row is a point that is associated with a group (in this case the state) and is plot in a specific order (to make the outline of the state). There are 15,537 points required to make the plot, with the following 49 regions.
Fortunately for us, our old friend ggplot has a bit of magic that can do this kind of plotting for us.
library( ggplot2 )ggplot( states, aes( x = long, y = lat,group = group ) ) +geom_polygon( fill ="lightgray", color ="black", lwd =0.25) +theme_void() -> p
Azimuth Projections
An Azimuth Projection is one that is formed by a 2-dimensional plane that is tangential to the surface of the earth at example one point. This point may be polar (north or south pole) or oblique (e.g., over Richmond, Virginia).
Azequidistant
We can apply different ellipsoids to the map when we plot it by adjusting the coordinate space it is plot within using the coord_map() modification. For a whole list of available projections, see ?mapproject.
p +coord_map( "azequalarea")
Cylindrical Projection
A cylindrical projection is one where a cylinder is wrapped around the earth creating straight lines for all parallel away from the equator.
Cylindrical Projection
p +coord_map("cylindrical")
Conic Projections
Conic projections are symmetric around the prime meridian and all parallels are segments of conecntric circles.
Conic Projection
p +coord_map( "conic", lat0 =30)
Datum
Once we have an ellipsoid model to work with we must define a DATUM type that will represent the coordiante system used. Two common DATUM types include:
Longitude & Latitude - The East/West & North/South position on the surface of the earth.
Prime Meridian (0° Longitude) passes thorugh the Royal Observatory in Greenwich England, with positive values of longitude to the east and negative to the west.
Equator (0° Latitude) and is defined as the point on the planet where both northern and southern hemisphers have equal amounts of day and night at the equinox (Sept. 21 & March 21).
Universal Trans Mercator - A division of the earth into 60 zones (~6°longitude each, labeled 01 - 60) and 20 bands each of which is ~8° latitude (labeled C-X excluding I & O with A & B dividing up Antartica). See image here.
Coordinates include Zone & band designation as well as coordinates in Easting and Northing (planar coordinates within the zone) measured in meters.
Richmond, Virginia: 18S 282051 4156899
⚠️
You must set both the ellipsoid and datum to be EXACTLY THE SAME for all of your data before you can do any work with it. If they are not on the same lumpy bumpy planet or in the same coordinate system, you will be screwed (that is a technical term).
Defining Projections
Projections are a combination of the underlying elipse as well as the definition of the datum. There are literally thousands of recognized projections, each of which has to be able to be sufficiently defined such that we can convert from one recognized projection to another2.
One of the repositories for these projections can be found at epsg.io.
epsg.io
The EPSG Geodetic Parameter Dataset (also known as the EPGS3 registry) is a publc registry of datums, spatial reference systems, and earth elipsoids. Each item is assigned a specific EPSG code
The precise definitions of projections come in several different formats, some of which include:
If you work with ESRI software, when you have a projected shapefile, one of the several files that go with that shapefile is the .prj file which will contain the ESRI WKT below.
Always let the underlying software make any changes to projection information. There is a lot of difficult conversions that need to happen to reproject from one CRS to another.
🌎
Point Data
To start off, we will load in some data from the bark beetle, Araptus attenuatus, a Sonoran Desert endemic parasite that lives within the plant Euphorbia lomelii.
Araptus attenuatus
Euphorbia lomelii
As part of some work that we have done on these species, we have looked at the relationship between habitat suitability and sex ratio bias. The life history for this beetle is such that males will establish home by burrowing in the senescing tissues of the host plant. Once established, feamles are attracted via phermones.
url <-"https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/Araptus_Disperal_Bias.csv"read_csv( url ) |>select( Site, Longitude, Latitude, everything() ) |>arrange( Latitude ) -> data
Rows: 31 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Site
dbl (8): Males, Females, Suitability, MFRatio, GenVarArapat, GenVarEuphli, L...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary( data )
Site Longitude Latitude Males
Length:31 Min. :-114.3 Min. :23.29 Min. : 9.00
Class :character 1st Qu.:-113.1 1st Qu.:24.95 1st Qu.:16.00
Mode :character Median :-112.0 Median :26.64 Median :21.00
Mean :-112.0 Mean :26.44 Mean :25.68
3rd Qu.:-110.8 3rd Qu.:27.78 3rd Qu.:31.50
Max. :-109.3 Max. :29.33 Max. :64.00
Females Suitability MFRatio GenVarArapat
Min. : 5.00 Min. :0.0563 Min. :0.5938 Min. :0.0500
1st Qu.:15.50 1st Qu.:0.2732 1st Qu.:0.8778 1st Qu.:0.1392
Median :21.00 Median :0.3975 Median :1.1200 Median :0.2002
Mean :23.52 Mean :0.4276 Mean :1.1598 Mean :0.2006
3rd Qu.:29.00 3rd Qu.:0.5442 3rd Qu.:1.3618 3rd Qu.:0.2592
Max. :63.00 Max. :0.9019 Max. :2.2000 Max. :0.3379
GenVarEuphli
Min. :0.0500
1st Qu.:0.1777
Median :0.2171
Mean :0.2203
3rd Qu.:0.2517
Max. :0.5122
We can plot these points using the normal ggplot functions (map overlays below).
Make sure to click on one of the markers and see the popup information that we added en route via that mutate and some HTML code.
In this case, we are using the latitude and longitude as numerical values of which the leaflet library is able to intrepret properly. However, just like we did for date-like data, we can convert these into a geographically relevant data type that knows a lot about geospatial processes rather than keeping it as a numeric value that we “assume” will work properly. For this we will use the sf library.
sf Objects
Simple Features (hereafter abbreviated as sf) are an open standard developed by the Open Geospatial Consortium (OGC). They define the following basic types:
POINT
LINESTRING
POLYGON
MULTIPOINT
MULTILINESTRING
MULTIPOLYGON
GEOMETRYCOLLECTION
Each of these basic types can be represented within a single column of a data.frame. To do this, we need to tell the conversion function st_as_sf() which columns to consider as the datum and which ellipsoid to use.
library( sf )data |>st_as_sf( coords=c("Longitude","Latitude"),crs =4326 ) -> datahead( data )
Geometry set for 1 feature
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -111.3417 ymin: 26.37741 xmax: -111.3417 ymax: 26.37741
Geodetic CRS: WGS 84
POINT (-111.3417 26.37741)
and the area enclosed by all the points (for various units).
library( units )
udunits database from /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/units/share/udunits/udunits2.xml
hull |>st_area() |>set_units( km^2 )
122130.5 [km^2]
Reprojecting
In addition to the operations above, properly created sf objects can easily be projected from one CRS into another (epsg 6372 is a common projection covering Mexico based upon the GRS80 elipsoid and the latest ITRF2008 datum standard based on the meter)4.
Again, do this first to all your data to make sure it is put into a proper projection (and most of your headaches will disappear).
Plotting sf Objects
Analogous to the duality between built-in R plotting and ggplot approaches, we can use either of these frameworks to plot sf objects.
As built-in objects, a sf data set that has a geometry coordinate is intrinsically linked to all the other data columns. If we plot the entire data frame, we see that for each non-geometry data column, we create an individual plot.
plot( data )
The data with the data.frame can be accessed as normal.
plot( data$Suitability )
But if we plot it using the square brackets and names of dat columns, we can link the geometry column to it and plot it as a spatial representation of those data (and adorn it with the normal plot() upgrades accordingly).
plot( data["Suitability"], pch=16, cex=2)
Perhaps not surprisingly, ggplot() also works the same way, however, the geospatial coordiantes for the plot aare taken care of using geom_sf() and you are left with definining which of the data columns you want to put into the plot as a component of the aes() definition.
ggplot( data ) +geom_sf_text( aes(label=Site) ) +theme_void() +coord_map()
Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
give correct results for longitude/latitude data
Map Overlays
We can go out and grab a map background to overlay these plots onto, giving more context (similiarly to how we did this above using leaflet).
The map_data() function is part of ggplot() and produces a data.frame object where each row is a coordinate used to draw polygons (see narrative on polygons for more information).
map_data("world") |>filter( region =="Mexico") -> maphead( map )
long lat group order region subregion
1 -91.68369 18.67734 970 60731 Mexico Isla del Carmen
2 -91.79614 18.65420 970 60732 Mexico Isla del Carmen
3 -91.81612 18.67588 970 60733 Mexico Isla del Carmen
4 -91.58911 18.77803 970 60734 Mexico Isla del Carmen
5 -91.55029 18.77368 970 60735 Mexico Isla del Carmen
6 -91.53672 18.76001 970 60736 Mexico Isla del Carmen
In ggplot there is a geom_polygon() function that takes these series of coordiantes and draws them properly over which we can lay down the sf object.
However, notice that the boundary boxes for the data and the map are vastly different (the underlying map is much bigger).
cbind( Data =st_bbox( data ), Map =c(min(map$long), min(map$lat), max(map$long), max(map$lat) ) )
In a normal ggplot() display we could use + xlim() + ylim() but since we are combining both geom_polygon and geom_sf, we are required to do this in the coord_sf() function to make it work correctly5.