Tutorial
The workflow for processing point-cloud data depends on the goals of the analysis. Here, we give a few examples of tasks that could be accomplished with PointClouds.jl.
- Finding & loading LAS data
- Accessing point attributes
- Coordinates
- Accessing & filtering points
- In-memory processing
- Updating & writing LAS data
Finding & loading LAS data
Let’s look at the surroundings of the White House for this example. The Wikipedia page lists its coordinates as 38°53′52″N 77°02′11″W, and following that link gives us the decimal coordinates:
julia> white_house = (38.897778, -77.036389)
(38.897778, -77.036389)
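The conversion from degrees/minutes/seconds to decimal degrees can also be done directly with a small helper (our own definition, not part of PointClouds.jl):

```julia
# Convert degrees/minutes/seconds to decimal degrees. Pass `sign = -1`
# for southern latitudes and western longitudes.
dms(d, m, s; sign = 1) = sign * (d + m / 60 + s / 3600)

white_house = (dms(38, 53, 52), dms(77, 2, 11; sign = -1))
# ≈ (38.897778, -77.036389) after rounding to six digits
```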
We can now search for point-cloud data for that location. First, we have to load the package:
julia> using PointClouds
Then we can use gettiles to query ScienceBase for data from the USGS 3D Elevation Program (3DEP):
julia> tiles = gettiles(white_house)
1-element Vector{PointClouds.DataSources.PointCloudTile}:
ScienceBase(6413c497d34eb496d1ce956e): USGS Lidar Point Cloud Sandy_Supplemental_NCR_VA_MD_DC_QL2_LiDAR 18SUJ322306
julia> extrema(tiles[1])
((38.8894538536233, -77.0469626451088), (38.903264326206, -77.0292899951434))
Unfortunately, there are multiple conventions for the order of coordinates. The gettiles function accepts coordinate tuples in (lat, lon) order, which is the EPSG:4326 standard order for WGS 84 coordinates, even though this does not follow the typical $(x, y)$ order. It is recommended to use the keyword arguments lat & lon to avoid ambiguities. 3D point-cloud data handled by PointClouds.jl should always have coordinates in $(x, y, z)$-order.
We can load the data of this tile by passing it to the LAS constructor. This will download the data to the local cache folder if it has not been downloaded already.
julia> las = LAS(tiles[1])
16,107,898-point LAZ (v1.2, PDRF 1, 01 Jun 2015)
Source ID => 65535
Project ID => AEB2BAA1-2BEF-41FC-B9BB-BDA288E8D77B
System ID => ""
Software ID => "GeoCue LAS Updater"
X-Coordinates => 322500.0 … 323999.99
Y-Coordinates => 4.3065e6 … 4.30799999e6
Z-Coordinates => -88.88 … 767.73
Return-Counts => [1 => 13,783,924, 2 => 1,993,048, 3 => 310,661, 4 => 19,622, 5 => 643]
Extra Data => [0x00, 0x00, 0xdd, 0xcc]
Variable-Length Records
=> LASF_Projection[34735] "GeoTiff Projection Keys" (200 bytes)
=> LASF_Projection[34736] "GeoTiff double parameters" (80 bytes)
=> LASF_Projection[34737] "GeoTiff ASCII parameters" (217 bytes)
Here we can see an overview of the data contained within this LAZ file. The header properties can be accessed with “dot syntax”, although this may return the “raw data” as it is stored within the LAS header:
julia> las.source_id
0xffff
julia> las.coord_min, las.coord_max
((322500.0, 4.3065e6, -88.88), (323999.99, 4.30799999e6, 767.73))
Accessing point attributes
The LAS behaves as a collection of PointRecords, similar to a Vector.
julia> length(las)
16107898
julia> las[1]
PointRecord{1}(X = 32250320, Y = 430799725, Z = 5799, intensity = 0.00134279, return = 1/2, classification = 17, flags = [right-to-left], scan angle = +17°, GPS time = 8.23926e7, user data = 1, source ID = 8997)
We see that the points have a number of attributes, the exact subset of which depends on the “point data record format” (PDRF), which is “1” in this example. Each attribute has a function to access it, which can be applied to points or (slightly more efficiently) to the LAS:
julia> intensity(las[1])
0.0013427939269092851
julia> intensity(las, 1)
0.0013427939269092851
julia> intensity(las, 1:10)
10-element Vector{Float64}:
0.0013427939269092851
0.0001373311970702678
9.155413138017853e-5
0.00039673456931410697
0.0006103608758678569
0.00045777065690089265
0.0008545052262149996
0.000976577401388571
0.0014648661020828565
0.0020752269779507134
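The magnitude of these values suggests that the raw 16-bit intensity is normalized into [0, 1] by dividing by 65535; for instance, 88/65535 reproduces the first value above. This is an inference from the output, not documented API:

```julia
# Guessed normalization: raw UInt16 intensity divided by typemax(UInt16).
intensity_from_raw(raw) = raw / typemax(UInt16)

intensity_from_raw(88)  # matches intensity(las, 1) above
```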
See “Point data records and their attributes” for details.
Coordinates
The x-, y-, and z-coordinates of a single point cannot be accessed in quite the same way:
julia> coordinates(las[1])
ERROR: MethodError: no method matching coordinates(::PointClouds.IO.PointRecord1{0})
[...]
This is because the LAS format stores coordinates as integer values that rely on a global scaling defined in the LAS header. We can access these “raw” coordinate values by explicitly asking for them:
julia> coordinates(Int, las[1])
(32250320, 430799725, 5799)
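Each real coordinate is reconstructed as raw * scale + offset, with the scale and offset taken from the LAS header. A standalone sketch, where scale = 0.01 and offset = 0.0 are assumptions consistent with the values shown for this file:

```julia
# Rescale raw Int32 coordinate values to real-world units.
# scale = 0.01 and offset = 0.0 are assumed for this particular file.
unscale(raw; scale = 0.01, offset = 0.0) = raw * scale + offset

unscale.((32250320, 430799725, 5799))  # ≈ (322503.2, 4.30799725e6, 57.99)
```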
To get “real” coordinates, we can pass the LAS and the index as separate arguments:
julia> coordinates(las, 1)
(322503.2, 4.30799725e6, 57.99)
We see here that these coordinates are very different from the coordinates of our original query. This is because this LAS file is not using WGS 84 as its coordinate reference system (CRS). We can look at the CRS data within the LAS:
julia> getcrs(las)
(GTModelType = 0x0001, GTCitation = "PCS Name = NAD_1983_UTM_Zone_18N", GeodeticCRS = EPSG:4269, GeodeticCitation = "GCS Name = GCS_North_American_1983|Datum = North_American_1983|Ellipsoid = GRS_1980|Primem = Greenwich", GeodeticDatum = EPSG:6269, PrimeMeridian = EPSG:8901, GeogAngularUnits = EPSG:9102, GeogAngularUnitSize = 0.0, Ellipsoid = EPSG:7019, EllipsoidSemiMajorAxis = 500000.0, EllipsoidInvFlattening = 0.0, PrimeMeridianLongitude = -75.0, ProjectedCRS = EPSG:26918, ProjectedCitation = "NAD83 / UTM zone 18N|projection: Transverse Mercator", ProjMethod = 0x0001, ProjLinearUnits = EPSG:9001, ProjLinearUnitSize = 0.9996, ProjNatOriginLat = 1.0, ProjFalseEasting = 6.378137e6, ProjFalseNorthing = 298.2572221010042, ProjCenterLong = 0.0, ProjScaleAtNatOrigin = 0.017453292519943278, VerticalCitation = "NAVD88 - Geoid12A (Meters)", VerticalUnits = EPSG:9001)
This is the “raw” CRS data that is stored within the LAS, in this case using the CRS format defined in the GeoTIFF standard. We may be able to determine manually that this is meant to describe NAD83/UTM zone 18N coordinates (EPSG:26918), but PointClouds.jl can also interpret the data for us. This is done by transforming it to the well-known text (WKT) representation defined in the OpenGIS® Coordinate Transformation Service Standard, which in turn can be interpreted by the PROJ library.
For example, we can use coordinates to look at the $(x, y, z)$ values of the first point in the WGS 84 coordinates (EPSG:4326) that we used for our initial query:
julia> coordinates(las, 1; crs = "EPSG:4326")
(-77.04692505523937, 38.902938366348046, 57.99)
We see that these are indeed close to the coordinates of our query.
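To quantify “close”, a small haversine helper (pure Julia, not part of PointClouds.jl) gives the approximate ground distance between our query and the first point. Note that EPSG:4326 coordinates come back in $(x, y, z)$-order, i.e. with the longitude first:

```julia
# Great-circle (haversine) distance in meters between two (lat, lon)
# points given in degrees, using a spherical Earth of radius 6371 km.
function haversine((lat1, lon1), (lat2, lon2); radius = 6_371_000)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ, Δλ = deg2rad(lat2 - lat1), deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    2 * radius * asin(sqrt(a))
end

# First point vs. our original query: roughly a kilometer apart,
# well within the extent of the tile.
haversine((38.902938366348046, -77.04692505523937), (38.897778, -77.036389))
```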
Accessing & filtering points
If we do not want to work with the full set of 16M points, we can look at a subset of those points:
julia> subset = las[1:10000]
10,000-point LAZ (v1.2, PDRF 1, 01 Jun 2015)
Source ID => 65535
Project ID => AEB2BAA1-2BEF-41FC-B9BB-BDA288E8D77B
System ID => ""
Software ID => "GeoCue LAS Updater"
X-Coordinates => 322500.0 … 322552.47000000003
Y-Coordinates => 4.30797428e6 … 4.30799999e6
Z-Coordinates => 15.8 … 723.47
Return-Counts => [1 => 9,068, 2 => 906, 3 => 26]
Extra Data => [0x00, 0x00, 0xdd, 0xcc]
Variable-Length Records
=> LASF_Projection[34735] "GeoTiff Projection Keys" (200 bytes)
=> LASF_Projection[34736] "GeoTiff double parameters" (80 bytes)
=> LASF_Projection[34737] "GeoTiff ASCII parameters" (217 bytes)
We can see that the header fields with the coordinate ranges and return counts have been updated. We can compute further statistics by looping over the points:
julia> cls = Dict()
Dict{Any, Any}()
julia> foreach(c -> cls[c] = get(cls, c, 0) + 1, classification(Int, pt) for pt in subset);
julia> sort(collect(cls))
4-element Vector{Pair{Any, Any}}:
1 => 2401
2 => 1394
17 => 4285
18 => 1920
We see that this point sample contains points marked as unclassified (1) and ground (2) according to the ASPRS standard point classes, plus a large number of points with classes 17 and 18. These classes were later defined as bridge deck (17) and high noise (18) for the PDRFs 6–10, but since these points are stored in PDRF 1, the use of these classes technically does not conform to the LAS specification and we cannot be sure of their interpretation without further information. If we dig up the delivery report (PDF) of this data collection, we see that these classes were used for overlap default (17) and overlap ground (18).
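For reference, the ASPRS standard classes for PDRFs 0–5 can be kept in a lookup table. The entries below are paraphrased from the LAS specification; treat this as a convenience for exploration, not an authoritative copy:

```julia
# ASPRS standard point classes for PDRFs 0-5 (classes 10, 11, and 13-31
# are reserved in this legacy table).
const LEGACY_CLASSES = Dict(
    0 => "created, never classified",
    1 => "unclassified",
    2 => "ground",
    3 => "low vegetation",
    4 => "medium vegetation",
    5 => "high vegetation",
    6 => "building",
    7 => "low point (noise)",
    8 => "model key-point",
    9 => "water",
    12 => "overlap points",
)

get(LEGACY_CLASSES, 17, "non-standard")  # → "non-standard"
```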
We could decide to remove all non-standard points from our sample:
julia> subset = filter(pt -> classification(pt) <= 12, subset)
3,795-point LAZ (v1.2, PDRF 1, 01 Jun 2015)
Source ID => 65535
Project ID => AEB2BAA1-2BEF-41FC-B9BB-BDA288E8D77B
System ID => ""
Software ID => "GeoCue LAS Updater"
X-Coordinates => 322500.0 … 322552.47000000003
Y-Coordinates => 4.30797429e6 … 4.30799997e6
Z-Coordinates => 15.8 … 723.47
Return-Counts => [1 => 3,408, 2 => 372, 3 => 15]
Extra Data => [0x00, 0x00, 0xdd, 0xcc]
Variable-Length Records
=> LASF_Projection[34735] "GeoTiff Projection Keys" (200 bytes)
=> LASF_Projection[34736] "GeoTiff double parameters" (80 bytes)
=> LASF_Projection[34737] "GeoTiff ASCII parameters" (217 bytes)
In-memory processing
Let’s say we want to classify some of these unclassified points. We can load the data we need for this into memory, as described in “Loading point data”:
julia> pts = PointCloud(subset; attributes = (:class => classification, ));
This loads the coordinates plus the attributes we have specified:
julia> propertynames(pts)
(:x, :y, :z, :class)
The coordinates are no longer encoded as integers and can be used directly:
julia> typeof(pts.x)
Vector{Float64} (alias for Array{Float64, 1})
julia> pts.x[1:4]
4-element Vector{Float64}:
322500.52
322500.42
322501.14
322500.66000000003
We now try to update the classifications based on the five closest neighbors of each point. For all unclassified points, we check whether a majority of the neighbors share the same class, and if so, we set the class to the result of this “neighborhood vote”:
julia> count(==(1), pts.class)
2401
julia> pts.class = apply(pts, :class; neighbors = 5) do classes
self, rest = Iterators.peel(classes)
self == 1 || return self
n, c = last(sort!([count(==(c), rest) => c for c in unique(rest)]))
n >= 3 ? c : self
end;
julia> count(==(1), pts.class)
2280
We see that there are only 2280 unclassified points left, as opposed to the original 2401.
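The voting rule inside the do-block can be exercised in isolation. Here is a self-contained copy (pure Julia, no PointClouds.jl required), where the first element of classes is the point itself and the rest are its neighbors:

```julia
# Reclassify a point from its neighborhood: keep already-classified points,
# and give an unclassified point (class 1) the most common class of its
# neighbors if at least three of the five neighbors agree.
function vote(classes)
    self, rest = Iterators.peel(classes)
    self == 1 || return self  # only reclassify unclassified points
    n, c = last(sort!([count(==(c), rest) => c for c in unique(rest)]))
    n >= 3 ? c : self
end

vote([1, 2, 2, 2, 1, 2])  # → 2 (four of five neighbors are ground)
vote([1, 2, 2, 3, 3, 1])  # → 1 (no class reaches three votes)
```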
Updating & writing LAS data
Now we can combine the new classes with the original LAS data. This still does not read the whole point-cloud data into memory and instead creates a lazy representation of the points that combines the new classification with the original data.
julia> reclassified = update(subset, (classification = pts.class, ))
3,795-point LAZ (v1.2, PDRF 1, 01 Jun 2015)
Source ID => 65535
Project ID => AEB2BAA1-2BEF-41FC-B9BB-BDA288E8D77B
System ID => ""
Software ID => "GeoCue LAS Updater"
X-Coordinates => 322500.0 … 322552.47000000003
Y-Coordinates => 4.30797429e6 … 4.30799997e6
Z-Coordinates => 15.8 … 723.47
Return-Counts => [1 => 3,408, 2 => 372, 3 => 15]
Extra Data => [0x00, 0x00, 0xdd, 0xcc]
Variable-Length Records
=> LASF_Projection[34735] "GeoTiff Projection Keys" (200 bytes)
=> LASF_Projection[34736] "GeoTiff double parameters" (80 bytes)
=> LASF_Projection[34737] "GeoTiff ASCII parameters" (217 bytes)
Finally, we can write this data to a new LAS file:
julia> path = mktempdir();
julia> write(joinpath(path, "reclassified-subset.las"), reclassified)