Reading EXIF image metadata using Apache commons-imaging and Scala

by 2/04/2015 02:03:00 PM 4 comments
I'm currently working on a project that has to watermark a few hundred thousand images and add EXIF metadata to them. For various reasons, JVM-based languages work best for this problem at work, so I've spent some time learning Apache's commons-imaging library, since it supports reading and writing EXIF metadata in Java. Using exiftool as 'truth', I wrote the required metadata first with exiftool and then with commons-imaging, and compared the results. I ran into a few issues related to TIFF metadata directories that I needed to debug. In order to do this, I created a Scala script that dumps out EXIF info along with the tag id and directory id. To get this example to work, you will need to do the following:
  1. Install SBT. On Macs, you can use Homebrew to install it.
  2. Create a shell script named scalas with the following contents:
    #!/usr/bin/env bash
    # This script calls SBT and executes a scala script that requires external dependencies
    # The build details are added as a comment at the top of the script
    # You can set SBT_HOME or hard code the path to the directory where SBT is installed.
    # If you installed SBT using homebrew then add this to your .bashrc file:
    # export SBT_HOME='/usr/local/opt/sbt/libexec'
    java -Dsbt.main.class=sbt.ScriptMain -Dsbt.boot.directory=$HOME/.sbt/boot -jar $SBT_HOME/sbt-launch.jar "$@"
Using scalas, we can run scripts with SBT settings passed in. This allows us to link to third-party jars from our scripts without having to hard-code paths. Now create a file named readexif and add the following:
#!/usr/bin/env scalas

// Dump out exif metadata
//
// Usage: readexif filename

/***
scalaVersion := "2.11.5"

libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-imaging" % "1.0-SNAPSHOT")

resolvers in ThisBuild ++= Seq(Resolver.mavenLocal,
    "mbari-maven-repository" at "",
    "Apache Development Snapshot Repository" at "")
*/

import java.io.File
import org.apache.commons.imaging.common.GenericImageMetadata.GenericImageMetadataItem
import org.apache.commons.imaging.common.ImageMetadata.ImageMetadataItem
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata.TiffMetadataItem
import org.apache.commons.imaging.Imaging
import scala.collection.JavaConverters._

val file = new File(args(0))
val metadata = Imaging.getMetadata(file)

if (metadata == null) {
  println("\tNo image metadata was found")
} else {
  println(s"  -- ALL EXIF info (using ${metadata.getClass.getName})")
  println("  (tag    -- directory )")
  metadata match {
    case j: JpegImageMetadata => dump(j)
    case other => println(s"  Unsupported metadata type: ${other.getClass.getName}")
  }
}

// Print every metadata item found in the JPEG
def dump(j: JpegImageMetadata): Unit = {
  j.getItems.asScala.foreach {
    case g: GenericImageMetadataItem => printItem(g)
    case i => println(s"      $i")
  }
}

// Print one item, prefixed with its tag id and directory type when it is a TIFF field
def printItem(i: GenericImageMetadataItem): Unit = {
  val s = s"${i.getKeyword} : ${i.getText}"
  val tagString = i match {
    case tmi: TiffMetadataItem =>
      val tiffField = tmi.getTiffField
      val tag = tiffField.getTag
      val directoryType = tiffField.getDirectoryType
      f"  (0x${tag}%04x -- 0x${directoryType}%08x)  $s"
    case _ => s
  }
  println(tagString)
}

Here's example output from readexif somefile.jpg:
  -- ALL EXIF info (using org.apache.commons.imaging.formats.jpeg.JpegImageMetadata)
  (tag    -- directory )
  (0x8298 -- 0x00000000)  Copyright : 'Copyright 2003 Monterey Bay Aquarium Research Institute'
  (0x8769 -- 0x00000000)  ExifOffset : 108
  (0x8825 -- 0x00000000)  GPSInfo : 192
  (0x882a -- 0xfffffffe)  TimeZoneOffset : 0
  (0x9003 -- 0xfffffffe)  DateTimeOriginal : '2014:01:28 10:11:12'
  (0x9004 -- 0xfffffffe)  DateTimeDigitized : '2015:01:01 00:00:00'
  (0x0001 -- 0xfffffffd)  GPSLatitudeRef : 'N'
  (0x0002 -- 0xfffffffd)  GPSLatitude : 40, 43, 1/2147483647 (0)
  (0x0003 -- 0xfffffffd)  GPSLongitudeRef : 'W'
  (0x0004 -- 0xfffffffd)  GPSLongitude : 74, 0, 0
  (0x0005 -- 0xfffffffd)  GPSAltitudeRef : 1
  (0x0006 -- 0xfffffffd)  GPSAltitude : 4203/10 (420.3)
  (0x001b -- 0xfffffffd)  GPSProcessingMethod : 'MANUAL'
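
The directory column above is a signed 32-bit directory type printed as unsigned hex, so 0xfffffffe is -2 and 0xfffffffd is -3. Here is a minimal sketch of decoding that column. The names are inferred only from which tags appear under each value in the sample output (root tags under 0, Exif tags under -2, GPS tags under -3); `DirectoryTypes`, `directoryName` and `parseHex` are hypothetical helpers, not part of commons-imaging.

```scala
object DirectoryTypes {
  // Hypothetical helper: labels inferred from the sample output above,
  // not read from the commons-imaging API.
  def directoryName(directoryType: Int): String = directoryType match {
    case 0  => "root (IFD0)"
    case -2 => "Exif sub-IFD"
    case -3 => "GPS sub-IFD"
    case n  => s"other ($n)"
  }

  // Parse the 8-digit hex column back into the signed Int that was formatted.
  // Going through Long avoids an overflow error for values above 0x7fffffff.
  def parseHex(hex: String): Int =
    java.lang.Long.parseLong(hex.stripPrefix("0x"), 16).toInt

  def main(args: Array[String]): Unit =
    Seq("0x00000000", "0xfffffffe", "0xfffffffd").foreach { h =>
      val n = parseHex(h)
      println(s"$h -> $n (${directoryName(n)})")
    }
}
```

Seeing the directory type as a small signed number makes it easier to spot a tag that was written to the wrong directory.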
For comparison, here is the output of exiftool somefile.jpg:
ExifTool Version Number         : 9.76
File Name                       : WriteExif01$-01_45_25_18-external.jpg
Directory                       : target
File Size                       : 56 kB
File Modification Date/Time     : 2015:02:04 10:53:09-08:00
File Access Date/Time           : 2015:02:04 13:56:01-08:00
File Inode Change Date/Time     : 2015:02:04 10:53:09-08:00
File Permissions                : rw-r--r--
File Type                       : JPEG
MIME Type                       : image/jpeg
JFIF Version                    : 1.02
Resolution Unit                 : None
X Resolution                    : 1
Y Resolution                    : 1
Exif Byte Order                 : Little-endian (Intel, II)
Copyright                       : Copyright 2003 Monterey Bay Aquarium Research Institute
Time Zone Offset                : 0
Date/Time Original              : 2014:01:28 10:11:12
Create Date                     : 2015:01:01 00:00:00
GPS Version ID                  :
GPS Latitude Ref                : North
GPS Longitude Ref               : West
GPS Altitude Ref                : Below Sea Level
GPS Processing Method           : MANUAL
Image Width                     : 640
Image Height                    : 486
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
GPS Altitude                    : 420.3 m Below Sea Level
GPS Latitude                    : 40 deg 43' 0.00" N
GPS Longitude                   : 74 deg 0' 0.00" W
GPS Position                    : 40 deg 43' 0.00" N, 74 deg 0' 0.00" W
Image Size                      : 640x486
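
The two tools show the same GPS values differently: readexif prints the raw degrees, minutes and seconds plus a separate reference letter, while exiftool prints a formatted coordinate. When comparing them, it can help to reduce both to signed decimal degrees. This is a minimal sketch, assuming the usual convention that south and west are negative; `GpsCoordinates` and `toDecimalDegrees` are hypothetical names, not part of either tool.

```scala
object GpsCoordinates {
  // Hypothetical helper: degrees, minutes and seconds plus the reference
  // letter ("N", "S", "E" or "W") converted to signed decimal degrees.
  def toDecimalDegrees(deg: Double, min: Double, sec: Double, ref: String): Double = {
    val magnitude = deg + min / 60.0 + sec / 3600.0
    if (ref == "S" || ref == "W") -magnitude else magnitude
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the readexif output: 40, 43, 1/2147483647 N and 74, 0, 0 W
    val lat = toDecimalDegrees(40, 43, 1.0 / 2147483647, "N")
    val lon = toDecimalDegrees(74, 0, 0, "W")
    println(s"$lat, $lon")
  }
}
```

The tiny seconds value of 1/2147483647 in the latitude is why exiftool rounds it to 0.00".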

Brian Schlining




alan-pater said...

Just curious how you are using the TimeZoneOffset tag? I am hacking on exiv2 DateTime conversions and noticed that TimeZoneOffset is limited to whole hours.

Brian Schlining said...

I have to interrelate the images with large amounts of non-image data. (Temperature, Salinity, Location, Depth in the ocean, oxygen concentration, etc.) All data is timestamped using UTC time.

For my image processing, I explicitly write the 'Date/Time Original' and 'Create Date' (sometimes called Date/Time Digitized) using the UTC timezone. Of course, when I do this, the Time Zone Offset is always zero.

If I need to convert the exif timestamps to a different timezone, I just read it from the image and do the conversion to whatever zone I need.
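
A minimal sketch of that conversion, assuming Java 8's java.time is available and that the stored timestamp really is UTC; `ExifTime`, `utcToZone` and the target zone are illustrative names, not taken from the project code:

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

object ExifTime {
  // EXIF timestamps use colons in the date part: yyyy:MM:dd HH:mm:ss
  private val exifFormat = DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss")

  // Interpret an EXIF timestamp as UTC, then view the same instant in another zone.
  def utcToZone(exifTimestamp: String, zone: ZoneId): ZonedDateTime =
    LocalDateTime.parse(exifTimestamp, exifFormat)
      .atOffset(ZoneOffset.UTC)
      .atZoneSameInstant(zone)

  def main(args: Array[String]): Unit = {
    // Illustrative target zone; any ZoneId works here
    val local = utcToZone("2014:01:28 10:11:12", ZoneId.of("America/Los_Angeles"))
    println(local)
  }
}
```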

Hope that helps.

Brian Schlining said...

Here's example code in Scala for writing UTC timestamps:

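import java.text.SimpleDateFormat
import java.util.TimeZone
import org.apache.commons.imaging.formats.tiff.constants.{ExifTagConstants, TiffEpTagConstants}
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet

// Formatter for EXIF timestamps, fixed to UTC
val timestampFormat = {
  val f = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss")
  f.setTimeZone(TimeZone.getTimeZone("UTC"))
  f
}

val outputSet: TiffOutputSet = ??? // get or create a TiffOutputSet. Omitted for brevity
val exifDirectory = outputSet.getOrCreateExifDirectory()

// Date/Time Original (dateTimeOriginal is a java.util.Date)
exifDirectory.add(ExifTagConstants.EXIF_TAG_DATE_TIME_ORIGINAL, timestampFormat.format(dateTimeOriginal))

// DateTimeDigitized, which exiftool reports as 'Create Date' (createDate is a java.util.Date)
exifDirectory.add(ExifTagConstants.EXIF_TAG_DATE_TIME_DIGITIZED, timestampFormat.format(createDate))

// Time Zone offset
exifDirectory.add(TiffEpTagConstants.EXIF_TAG_TIME_ZONE_OFFSET, 0.shortValue)

// Use ExifRewriter to update image metadata. Omitted for brevity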

shuixiang li said...

I found your blog through this article, and it helped me a lot. But your code doesn't have syntax highlighting on Blogger. You may be interested in from my blog.