sf::st_as_sf(data.table::rbindllist(<list_of_sf>))
is a fast
alternative to do.call(rbind, <list_of_sf>)
for
binding list
s of simple features objects (sf
)
to single sf
. However, there are some pitfalls when
rbindlist()
is applied to list
s of
sf
. These are explored, and some workarounds are
presented.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
rbindlist()
and
do.call(rbind, )
bbox
We create a list
of sf
objects:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)
list_of_sf <- lapply(seq_len(nrow(nc)), function(x) nc[x, ])
Binding such a list
of sf
with
do.call(rbind, )
to a single sf
object works
easily (as long as the list
is rather small):
nc_rbind <- do.call(rbind, list_of_sf)
And will return the expected object of the class sf
:
all.equal(nc_rbind, nc)
#> [1] TRUE
What’s different when rbindlist()
is used instead of
do.call(rbind, )
?
First of all, rbindlist()
on its own does not bind a
list
of sf
to an object of the class
sf
:
Thus, subsequently the return of rbindlist()
needs to be
converted into an object of the the class sf
:
Is the return of the pipe rbindlist() %>% st_as_sf()
equivalent to the one of do.call(rbind, )
?
all.equal(nc_rbindlist, nc)
#> [1] "target is data.table, current is sf"
No, not really! The pipe-output has inherited the class
data.table
:
Will omitting the class data.table
eliminate the
difference?
class(nc_rbindlist) <- c("sf", "data.frame")
all.equal(nc_rbindlist, nc)
#> [1] "Component \"geom\": Attributes: < Component \"bbox\": Mean relative difference: 0.05380136 >"
There’s also an issue with bbox
of the geometry
column:
Instead of returning the correct bbox
for the whole
geometry column, data.table::rbindlist()
simply copies the
bbox
of the first list
ed sf
object:
A simple
trick solves this issue inherent to sf
-objects compiled
with data.table::rbindlist()
:
sf_former_dt <- sf_former_dt[1:nrow(sf_former_dt), ]
# or, preferable in a programming context:
sf_former_dt <- sf_former_dt[seq_len(nrow(sf_former_dt)), ]
We apply the trick …
… check if it worked:
all.equal(nc_rbindlist, nc)
#> [1] TRUE
list
s containing sf
-objects with different
CRS are treated quite differently by do.call(rbind, )
and
rbindlist() %>% st_as_sf()
. To demonstrate that, we
create such a list
:
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE)
nc_3857 <- st_transform(nc, 3857)
list_of_sf_with_unequal_crs <- list(nc, nc_3857)
rbind()
does not bind sf
-objects with
unequal CRS and throws a corresponding error-message:
do.call(rbind, list_of_sf_with_unequal_crs)
#> Error: arguments have different crs
Whereas rbindlist()
doesn’t seem to be affected:
The returned object of the classes sf
and
data.table
has a CRS which is the same as the one of first
list
ed sf
-object:
If we reverse the sequence of elements of the list
of
sf
-objects with unequal CRS the newly first placed
sf
-object dictates the CRS of the by
rbindlist() %>% st_as_sf
bound single
sf
-object:
sf_dt <- list_of_sf_with_unequal_crs %>% rev() %>% rbindlist() %>% st_as_sf()
st_crs(sf_dt)$epsg
#> [1] 3857
st_crs(sf_dt) == st_crs(nc_3857)
#> [1] TRUE
The maloperation of rbindlist()
regarding
list
s of sf
-objects with different CRS can’t
be handled after the fact. Thus, feeding such a list
to
rbindlist()
must be avoided. Hence prior checking if a
list
of sf
contains more than 1 CRS is
crucial:
list(nc, nc_3857) %>% lapply(st_crs) %>% n_distinct()
#> [1] 2
geometry_type
s
A list
with sf
-objects having geometry
columns of different geometry_type
s:
l_diff_geom_type <- list(nc[c(17, 56), ], st_cast(nc[4, ], "POLYGON", warn = FALSE))
sapply(l_diff_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
#> [1] MULTIPOLYGON POLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
rbind()
can easily stack sf
-objects with
geometry columns of different geometry_type
s and the
returned geometry_type
is GEOMETRY
:
sf_do.call <- do.call(rbind, l_diff_geom_type)
st_geometry_type(sf_do.call, by_geometry = FALSE)
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
But rbindlist()
can’t handle a list
containing different geometry_type
s:
rbindlist(l_diff_geom_type)
#> Error in rbindlist(l_diff_geom_type): Class attribute on column 15 of item 2 does not match with column 15 of item 1.
In order to make such a list
workable for rbindlist()
, the geometry_type
s
have to be homogenized:
l_homogenized_geom_type <- lapply(l_diff_geom_type, st_cast, to = "GEOMETRY", warn = FALSE)
sapply(l_homogenized_geom_type, st_geometry_type, by_geometry = FALSE) %>% unique()
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
Once the list
ed sf
-objects have the same
geometry_type
, rbindlist()
can stack them:
sf_rbindlist <- rbindlist(l_homogenized_geom_type) %>% st_as_sf()
st_geometry_type(sf_rbindlist, by_geometry = FALSE)
#> [1] GEOMETRY
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
Note that the above demonstrated way of
homogenizing geometry_type
s isn’t very time efficient for
large list
s of sf
-objects. Luckily there’s a
faster alternative:
l <- l_diff_geom_type # for the sake of code readability, copy list as short-named object
for (i in seq_along(l)) {
class(l[[i]][[attr(l[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}
We check if the two homogenizing methods are equivalent:
all.equal(l_homogenized_geom_type, l)
#> [1] TRUE
The use of data.table::rbindlist()
as a fast alternative
to do.call(rbind, )
for binding list
s of
sf
-objects to a single sf
-object is justified
in situations when do.call(rbind, )
is too slow. It does,
however, require appropriate pre- and post-treatment:
list
of sf
-objects contains
more than 1 distinct CRS:
list_of_sf %>% lapply(st_crs) %>% n_distinct()
If so, CRS-unification needs to be done before moving on.
?st_transform
?st_crs
list
ed sf
-objects have
different geometry_type
s:If so, homogenize them:
list_of_sf <- lapply(list_of_sf, st_cast, to = "GEOMETRY", warn = FALSE)
# or more advisable because faster (for large lists):
for (i in seq_along(list_of_sf)) {
class(list_of_sf[[i]][[attr(list_of_sf[[i]], "sf_column")]])[[1]] <- "sfc_GEOMETRY"
}
list
of sf
-objects to a single
sf
-object (which is also a data.table
):bbox
:
sf_object <- sf_object[1:nrow(sf_object), ]
# or, preferable in a programming context:
sf_object <- sf_object[seq_len(nrow(sf_object)), ]
data.table
:Note that those five steps are integrated in the function
sfhelpers::st_rbindlist()
enabling a painless and fast
conversion of a list
of sf
objects to a single
sf
object. Moreover st_rbindlist()
has
argument options capable of handling list
s of
sf
objects that differ by the names and positions of their
geometry and other attribute columns.
?sfhelpers::st_rbindlist