sf::st_as_sf(data.table::rbindllist(<list_of_sf>))
is a fast
alternative to do.call(rbind, <list_of_sf>) for
binding lists of simple features objects (sf)
to single sf. However, there are some pitfalls when
rbindlist() is applied to lists of
sf. These are explored, and some workarounds are
presented.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUErbindlist() and
do.call(rbind, )
bbox
We create a list of sf objects:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)
list_of_sf <- lapply(seq_len(nrow(nc)), function(x) nc[x, ])Binding such a list of sf with
do.call(rbind, ) to a single sf object works
easily (as long as the list is rather small):
nc_rbind <- do.call(rbind, list_of_sf)And will return the expected object of the class sf:
all.equal(nc_rbind, nc)
#> [1] TRUEWhat’s different when rbindlist() is used instead of
do.call(rbind, )?
First of all, rbindlist() on its own does not bind a
list of sf to an object of the class
sf:
Thus, subsequently the return of rbindlist() needs to be
converted into an object of the the class sf:
Is the return of the pipe rbindlist() %>% st_as_sf()
equivalent to the one of do.call(rbind, )?
all.equal(nc_rbindlist, nc)
#> [1] "target is data.table, current is sf"No, not really! The pipe-output has inherited the class
data.table:
Will omitting the class data.table eliminate the
difference?
class(nc_rbindlist) <- c("sf", "data.frame")
all.equal(nc_rbindlist, nc)
#> [1] "Component \"geom\": Attributes: < Component \"bbox\": Mean relative difference: 0.05380136 >"There’s also an issue with bbox of the geometry
column:
Instead of returning the correct bbox for the whole
geometry column, data.table::rbindlist() simply copies the
bbox of the first listed sf
object:
A simple
trick solves this issue inherent to sf-objects compiled
with data.table::rbindlist():
sf_former_dt <- sf_former_dt[1:nrow(sf_former_dt), ]
# or, preferable in a programming context:
sf_former_dt <- sf_former_dt[seq_len(nrow(sf_former_dt)), ]We apply the trick …
… check if it worked:
all.equal(nc_rbindlist, nc)
#> [1] TRUElists containing sf-objects with different
CRS are treated quite differently by do.call(rbind, ) and
rbindlist() %>% st_as_sf(). To demonstrate that, we
create such a list:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)
nc_3857 <- st_transform(nc, 3857)
list_of_sf_with_unequal_crs <- list(nc, nc_3857)rbind() does not bind sf-objects with
unequal CRS and throws a corresponding error-message:
do.call(rbind, list_of_sf_with_unequal_crs)
#> Error: arguments have different crsWhereas rbindlist() doesn’t seem to be affected:
The returned object of the classes sf and
data.table has a CRS which is the same as the one of first
listed sf-object:
If we reverse the sequence of elements of the list of
sf-objects with unequal CRS the newly first placed
sf-object dictates the CRS of the by
rbindlist() %>% st_as_sf bound single
sf-object:
sf_dt <- list_of_sf_with_unequal_crs %>% rev() %>% rbindlist() %>% st_as_sf()
st_crs(sf_dt)$epsg
#> [1] 3857
st_crs(sf_dt) == st_crs(nc_3857)
#> [1] TRUEThe maloperation of rbindlist() regarding
lists of sf-objects with different CRS can’t
be handled after the fact. Thus, feeding such a list to
rbindlist() must be avoided. Hence prior checking if a
list of sf contains more than 1 CRS is
crucial:
list(nc, nc_3857) %>% lapply(st_crs) %>% n_distinct()
#> [1] 2geometry_types
A list with sf-objects having geometry
columns of different geometry_types:
l_diff_geom_type <- list(nc[c(17, 56), ], st_cast(nc[4, ], "POLYGON", warn = FALSE))
sapply(l_diff_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
#> [1] MULTIPOLYGON POLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLErbind() can easily stack sf-objects with
geometry columns of different geometry_types and the
returned geometry_type is GEOMETRY:
sf_do.call <- do.call(rbind, l_diff_geom_type)
st_geometry_type(sf_do.call, by_geometry = FALSE)
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLEBut rbindlist() can’t handle a list
containing different geometry_types:
rbindlist(l_diff_geom_type)
#> Error in rbindlist(l_diff_geom_type): Class attribute on column 15 of item 2 does not match with column 15 of item 1. In order to make such a list
workable for rbindlist(), the geometry_types
have to be homogenized:
l_homogenized_geom_type <- lapply(l_diff_geom_type, st_cast, to = "GEOMETRY", warn = FALSE)
sapply(l_homogenized_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLEOnce the listed sf-objects have the same
geometry_type, rbindlist() can stack them:
sf_rbindlist <- rbindlist(l_homogenized_geom_type) %>% st_as_sf()
st_geometry_type(sf_rbindlist, by_geometry = FALSE)
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLENote that the above demonstrated way of
homogenizing geometry_types isn’t very time efficient for
large lists of sf-objects. Luckily there’s a
faster alternative:
l <- l_diff_geom_type # for the sake of code readability, copy list as short-named object
for (i in seq_along(l)) {
class(l[[i]][[attr(l[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}We check if the two homogenizing methods are equivalent:
all.equal(l_homogenized_geom_type, l)
#> [1] TRUEThe use of data.table::rbindlist() as a fast alternative
to do.call(rbind, ) for binding lists of
sf-objects to a single sf-object is justified
in situations when do.call(rbind, ) is too slow. It does,
however, require appropriate pre- and post-treatment:
list of sf-objects contains
more than 1 distinct CRS:
list_of_sf %>% lapply(st_crs) %>% n_distinct()If so, CRS-unification needs to be done before moving on.
?st_transform
?st_crslisted sf-objects have
different geometry_types:If so, homogenize them:
list_of_sf <- lapply(list_of_sf, st_cast, to = "GEOMETRY", warn = FALSE)
# or more advisable because faster (for large lists):
for (i in seq_along(list_of_sf)) {
class(list_of_sf[[i]][[attr(list_of_sf[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}list of sf-objects to a single
sf-object (which is also a data.table):bbox:
sf_object <- sf_object[1:nrow(sf_object), ]
# or, preferable in a programming context:
sf_object <- sf_object[seq_len(nrow(sf_object)), ]data.table:Note that those five steps are integrated in the function
sfhelpers::st_rbindlist() enabling a painless and fast
conversion of a list of sf objects to a single
sf object. Moreover st_rbindlist() has
argument options capable of handling lists of
sf objects that differ by the names and positions of their
geometry and other attribute columns.
?sfhelpers::st_rbindlist