Calculating SHA-512 in Scala

Posted 9/15/2016 02:58:00 PM
I'm calculating SHA-512 checksums for video files that will be registered with a custom video asset manager that I wrote. Checksums are useful fingerprints for identifying a file. In my case, the checksums are useful for doing reverse lookups; e.g., if I have a file, I can look up its metadata just by using its checksum.
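The reverse lookup boils down to a map from checksum to metadata. Below is a minimal in-memory sketch of that idea. `VideoMetadata`, its fields, and `ChecksumIndex` are hypothetical names for illustration only. The post doesn't show the asset manager's actual schema or storage, and a persistent system would more likely use a database index on the checksum column.

```scala
import scala.collection.mutable

// Hypothetical metadata record; the real asset manager's schema is not shown in the post.
final case class VideoMetadata(path: String, title: String)

// Hypothetical in-memory index keyed by the hex SHA-512 checksum.
final class ChecksumIndex {
  private val byChecksum = mutable.Map.empty[String, VideoMetadata]

  // Register a file's metadata under its checksum (normalized to lowercase hex).
  def register(checksum: String, meta: VideoMetadata): Unit =
    byChecksum.update(checksum.toLowerCase, meta)

  // Reverse lookup: given only a checksum, return the metadata if it is known.
  def lookup(checksum: String): Option[VideoMetadata] =
    byChecksum.get(checksum.toLowerCase)
}
```

Normalizing to lowercase means a digest copied from a tool that prints uppercase hex still finds the same entry.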

SHA-512 from the Command Line

For my application, I can use the Linux/macOS utility shasum when I register the file. In this case, the command looks like this:
shasum -a 512 -p /path/to/videofile.mp4
For a 40 GB file, this takes about 160 seconds to run; for a 4 GB file, it takes 20 seconds.
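If a JVM application needs to agree exactly with the registration step, one option is to run the same command from Scala. Here is a sketch using `scala.sys.process`. It assumes `shasum` is installed and on the PATH, and `ShasumCli`, `parseDigest`, and `sha512` are hypothetical names. The sketch relies only on the digest being the first whitespace-separated field of shasum's output.

```scala
import scala.sys.process._

object ShasumCli {
  // Take the first whitespace-separated field of a shasum output line (the hex digest).
  def parseDigest(line: String): String =
    line.trim.split("\\s+")(0).toLowerCase

  // Run the registration-time command and return the hex digest.
  // `!!` returns the command's stdout and throws if the exit status is non-zero.
  def sha512(path: String): String =
    parseDigest(Seq("shasum", "-a", "512", "-p", path).!!)
}
```

Shelling out keeps the result identical to the registration step, at the cost of a process launch and a dependency on the external tool.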

SHA-512 in Java

However, I also need to calculate the checksum in a Java/Scala application. To do that, I used the following script, written in Scala:
#!/usr/bin/env scala

// Output matches the digest printed by: shasum -a 512 -p <file>

import java.io._
import java.security.MessageDigest


val file = new File(args(0))
val h = hash(file)
// Dump out hash as hex. Formatting each byte as two hex digits keeps leading
// zero bytes, which BigInteger(1, h).toString(16) would silently drop.
println(h.map("%02x".format(_)).mkString)


def hash(file: File): Array[Byte] = {
  val digest = MessageDigest.getInstance("SHA-512")
  val buffer = Array.ofDim[Byte](1048576) // 1 MB. I tried 4 MB and it was the same
  val in = new BufferedInputStream(new FileInputStream(file))
  try {
    // Feed the file to the digest one buffer at a time until EOF (-1)
    var sizeRead = in.read(buffer)
    while (sizeRead != -1) {
      digest.update(buffer, 0, sizeRead)
      sizeRead = in.read(buffer)
    }
  } finally {
    in.close() // Close the file even if a read fails
  }
  digest.digest()
}
This takes slightly longer than the command line, but it's pretty close: a 40 GB file takes about 185 seconds, and a 4 GB file takes about 21 seconds. Of course, those times also include compiling the script.
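Because those timings include compiling the script, one rough way to see how much of the gap is hashing itself is to time just the digest loop inside the JVM. The following sketch reuses the script's streaming approach. `TimedSha512`, `toHex`, `sha512`, and `timed` are hypothetical names, and MB/s here means 10^6 bytes per second. JIT warm-up still affects the first call, so a single measurement is only approximate.

```scala
import java.io.{BufferedInputStream, File, FileInputStream}
import java.security.MessageDigest

object TimedSha512 {
  // Hex-encode with two digits per byte so leading zeros survive.
  def toHex(bytes: Array[Byte]): String = bytes.map("%02x".format(_)).mkString

  // Same approach as the script: read 1 MB chunks and feed them to MessageDigest.
  def sha512(file: File): Array[Byte] = {
    val digest = MessageDigest.getInstance("SHA-512")
    val buffer = new Array[Byte](1 << 20)
    val in = new BufferedInputStream(new FileInputStream(file))
    try {
      var n = in.read(buffer)
      while (n != -1) {
        digest.update(buffer, 0, n)
        n = in.read(buffer)
      }
    } finally in.close()
    digest.digest()
  }

  // Time only the hashing (JVM startup and script compilation are excluded).
  // Returns (hex digest, elapsed seconds, throughput in MB/s).
  def timed(file: File): (String, Double, Double) = {
    val start = System.nanoTime()
    val hex = toHex(sha512(file))
    val seconds = (System.nanoTime() - start) / 1e9
    val mbPerSec = file.length() / 1e6 / seconds
    (hex, seconds, mbPerSec)
  }
}
```

For reference, the post's command-line numbers work out to roughly 250 MB/s for the 40 GB file (40,000 MB / 160 s) and 200 MB/s for the 4 GB file.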

hohonuuli

Developer

