Scala, Geotoolkit and UTM Zone 10N gridded data (Part 1)

by 1/26/2012 02:00:00 PM 0 comments

Overview

I am currently working on some detailed GIS analysis. It's very cool stuff; I'm analyzing about 4 million undersea video annotations collected using VARS (which I also wrote). The problem that I keep encountering again and again is coordinate transforms and data formats.

Why are coordinate transforms a problem? Well, I like to analyze data using UTM grids. UTM has an advantage for analysis in that pixels represent a uniform area, for example, a single pixel might represent a  25mx25m area. Geographic grids do not usually have uniform pixel sizes, which makes inter-pixel comparisons difficult (e.g. I found 20 critters in pixel A and 20 in pixel B, but in geographic coordinate systems pixel A and B represent different surface areas, so I can't say that they have the same density of organisms). The problem is that all the positions of our ROVs and annotations are in geographic coordinates (i.e. latitude and longitude) so a transform from geographic coordinates to projected coordinates (UTM) is required. Since I'm writing automated software to generate thousands (actually hundreds of thousands) of data products, I need software libraries that I can use for the transforms. And since I'm doing a huge number of analysis the software has to be fast. The speed issue alone rules out ESRI products; they supply python as a scripting language. As great as python is, it just isn't fast enough for what I need. Instead, I'm working on the JVM (which is pretty darn fast). So in the examples below I'll be using Scala to illustrate how to do the analysis.

The second issue, data formats, is one of those insidious annoyances that just pervades the GIS world. For one, there's a lot of different formats. For another, one format does not rule them all, so it's tough to find a grid format that's broadly usable in different software applications. For this analysis, I'll be using GMT's 'GRD' format. Why? For one, it's NetCDF, so there's a zillion libraries to read it in any number of a zillion programming languages. I do most of my analysis on the JVM, so I can use NetCDF-Java there. In addition, I often do ad-hoc analysis in Matlab, so I can use nctoolbox (which I also wrote) to work with NetCDF there. The other reason that GRD is nice is that the CDL is very simple to understand. Here's an example CDL:

netcdf ETOPO1_Bed_g_gmt4 {
dimensions:
 x = 21601 ;
 y = 10801 ;
variables:
 double x(x) ;
  x:long_name = "Longitude" ;
  x:actual_range = -180., 180. ;
  x:units = "degrees" ;
 double y(y) ;
  y:long_name = "Latitude" ;
  y:actual_range = -90., 90. ;
  y:units = "degrees" ;
 int z(y, x) ;
  z:long_name = "z" ;
  z:_FillValue = -2147483648 ;
  z:actual_range = -10898., 8271. ;

// global attributes:
  :Conventions = "COARDS/CF-1.0" ;
  :title = "ETOPO1_Bed_g_gmt4.grd" ;
  :history = "grdreformat ETOPO1_Bed_g_gdal.grd ETOPO1_Bed_g_gmt4.grd=ni" ;
  :GMT_version = "4.4.0" ;
  :node_offset = 0 ;
}

The Setup

For this analysis, I'll need to include the NetCDF-Java libraries as well as Geotoolkit libraries. Geotoolkit is an open source Java library for developing geospatial applications. I use it because: I can actually get it to work, it's NetCDF support is decent, and the lead developer,  Martin Desruisseaux, is excellent about responding to questions. There are several Geotoolkit modules that need to be included, they're listed in the pom.xml example below. At the time of this writing, the most recent released version of Geotoolkit, 3.19, has a bug that prevents it from working with my grd files. However, the 3.x-SNAPSHOT release seems to work just fine.

I'm using Maven for this project, so I thought it would be handy to post the pom.xml for reference. Yes, it's long. Yes, it's XML. Yes, I know you hate Maven and prefer Ant/Sbt/Gradle/Buildr/Make. Get over it. I'm posting the pom because it shows the dependencies needed to run the code samples below.


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.mbari</groupId>
    <artifactId>geotools</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>geotools</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <geotoolkit.version>3.x-SNAPSHOT</geotoolkit.version>
        <scala.version>2.9.1</scala.version>
    </properties>

    <dependencies>


        <!-- GIS Referencing libraries -->
        <dependency>
            <groupId>org.geotoolkit</groupId>
            <artifactId>geotk-referencing</artifactId>
            <version>${geotoolkit.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotoolkit</groupId>
            <artifactId>geotk-epsg</artifactId>
            <version>${geotoolkit.version}</version>
        </dependency>

        <!-- JDBC Driver required for GIS referencing libraries -->
        <dependency>
         <groupId>org.apache.derby</groupId>
         <artifactId>derbyclient</artifactId>
         <version>10.8.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.derby</groupId>
            <artifactId>derbynet</artifactId>
            <version>10.8.2.2</version>
        </dependency>


        <!-- GIS Coverage -->
        <dependency>
            <groupId>org.geotoolkit</groupId>
            <artifactId>geotk-coverage</artifactId>
            <version>${geotoolkit.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotoolkit</groupId>
            <artifactId>geotk-coverageio</artifactId>
            <version>${geotoolkit.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotoolkit</groupId>
            <artifactId>geotk-coverageio-netcdf</artifactId>
            <version>${geotoolkit.version}</version>
        </dependency>

        <!-- Scala dependencies -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>jline</artifactId>
            <version>${scala.version}</version>
            <scope>runtime</scope>
        </dependency>

        <!-- Logger via SLF4J -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.6.4</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.0.0</version>
            <scope>test</scope>
        </dependency>

        <!-- Ubiquitous JUnit dependency -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <repositories>
        <repository>
            <id>maven.geotoolkit.org/</id>
            <name>Geotoolkit repository</name>
            <url>http://maven.geotoolkit.org/</url>
        </repository>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                    <source>1.6</source>
                    <target>1.6</target>
                    <optimize>true</optimize>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <scalaVersion>${scala.version}</scalaVersion>
                            <encoding>${project.build.sourceEncoding}</encoding>
                            <charset>${project.build.sourceEncoding}</charset>
                            <args>
                                <arg>-optimise</arg>
                                <arg>-unchecked</arg>
                                <arg>-deprecation</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.9</version>
                <configuration>
                    <parallel>classes</parallel>
                    <argLine>-Xmx1024m</argLine>
                    <encoding>${project.build.sourceEncoding}</encoding>
                    <forkMode>once</forkMode>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Reading a Grid

First, I need to read the gridded data so I can work with it. The CDL for my dataset is:

netcdf BIGGRID_UTM50 {
dimensions:
 x = 5083 ;
 y = 4067 ;
variables:
 double x(x) ;
  x:long_name = "Easting (meters)" ;
  x:actual_range = 422985.277598382, 677085.277598383 ;
 double y(y) ;
  y:long_name = "Northing (meters)" ;
  y:actual_range = 3915514.90677777, 4118814.90677777 ;
 float z(y, x) ;
  z:long_name = "Topography (m)" ;
  z:actual_range = -4032.438f, 964.f ;

// global attributes:
  :Conventions = "COARDS/CF-1.0" ;
  :description = "\tProjection: UTM10N\n",
   "\tGrid created by: GMTGridReader.scala" ;
  :CreationDate = "20110214T193537Z" ;
}

The Scala code to read the file is:

import java.io.File
import org.geotoolkit.coverage.io.CoverageIO

val file = new File("BIGGRID_UTM50.grd")
// read returns GridCoverage, but really it's a GridCoverage2D
val coverage = CoverageIO.read(file).asInstanceOf[GridCoverage2D]

Unfortunately, the coordinate reference system, or CRS, isn't correct. How do I know that? You can examine the CRS like so:

println(coverage.getCoordinateReferenceSystem)
// NetCDF:"x y"

We need to force it to be UTM Zone 10N. If you're using a different zone, you can look up the CRS code at http://www.epsg-registry.org/. Here's how I'm forcing the zone:

import org.geotoolkit.coverage.CoverageFactoryFinder
import org.geotoolkit.referencing.CRS


// Be warned. This fires up an Apache Derby database in the background
val utmZone10N = CRS.decode("EPSG:32610")
val gridCoverageFactory = CoverageFactoryFinder.getGridCoverageFactory(null)
val sourceGeometrey = coverage.getGridGeometry
val utmCoverage = gridCoverageFactory.create(
            coverage.getName,               // The grid coverage name
            coverage.getRenderedImage,      // The pixel values
            utmZone10N,                     // The coordinate reference system
            sourceGeometrey.getGridToCRS(), // The transform from pixel to geodetic coordinates
            coverage.getSampleDimensions,   // Information about each band
            null,                           // The sources (Not relevant here)
            null)                           // Optional properties


// View CRS
println(utmCoverage.getCoordinateReferenceSystem)

/*
PROJCS["WGS 84 / UTM zone 10N", 
  GEOGCS["WGS 84", 
    DATUM["WGS84", 
      SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]], 
      AUTHORITY["EPSG","6326"]], 
    PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]], 
    UNIT["degree", 0.017453292519943295], 
    AXIS["Geodetic latitude", NORTH], 
    AXIS["Geodetic longitude", EAST], 
    AUTHORITY["EPSG","4326"]], 
  PROJECTION["Transverse Mercator", AUTHORITY["EPSG","9807"]], 
  PARAMETER["central_meridian", -123.0], 
  PARAMETER["latitude_of_origin", 0.0], 
  PARAMETER["scale_factor", 0.9996], 
  PARAMETER["false_easting", 500000.0], 
  PARAMETER["false_northing", 0.0], 
  UNIT["metre", 1.0], 
  AXIS["Easting", EAST], 
  AXIS["Northing", NORTH], 
  AUTHORITY["EPSG","32610"]]
*/

Coming in Part 2

In part 2, I'll walk through accessing data values from the grid.

hohonuuli

Developer

Cras justo odio, dapibus ac facilisis in, egestas eget quam. Curabitur blandit tempus porttitor. Vivamus sagittis lacus vel augue laoreet rutrum faucibus dolor auctor.

0 comments: