Thursday, September 15, 2016

Calculating SHA-512 in Scala

I'm calculating SHA-512 checksums for video files that will be registered with a custom video asset manager that I wrote. Checksums are useful fingerprints for identifying a file. In my case they make reverse lookups possible: given a file, I can look up its metadata using nothing but its checksum.
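Below is a minimal sketch of that kind of reverse lookup. It assumes a simple in-memory map from checksum to metadata. `VideoMetadata`, `ChecksumIndex` and `sha512Hex` are hypothetical names for illustration, not the asset manager's actual API. For brevity, the content is hashed from an in-memory byte array. Multi-gigabyte video files would instead be streamed through the digest in chunks, as the script further down does.

```scala
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical metadata record; the asset manager's real schema is not shown in the post
case class VideoMetadata(title: String, durationSeconds: Int)

// Hex-encode a SHA-512 digest, two digits per byte (always 128 characters)
def sha512Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-512").digest(bytes).map("%02x".format(_)).mkString

// Hypothetical in-memory index keyed by checksum
class ChecksumIndex {
  private val byChecksum = mutable.Map.empty[String, VideoMetadata]

  // Register content under its checksum and return the checksum
  def register(content: Array[Byte], meta: VideoMetadata): String = {
    val key = sha512Hex(content)
    byChecksum(key) = meta
    key
  }

  // Reverse lookup: recover metadata from the content alone
  def lookup(content: Array[Byte]): Option[VideoMetadata] =
    byChecksum.get(sha512Hex(content))
}

val index = new ChecksumIndex
val clip = "pretend these are video bytes".getBytes("UTF-8")
val key = index.register(clip, VideoMetadata("Sample clip", 42))
println(index.lookup(clip))                       // Some(VideoMetadata(Sample clip,42))
println(index.lookup("other".getBytes("UTF-8")))  // None
```

A real asset manager would persist this mapping in its database, with the checksum as an indexed key. The lookup logic itself stays the same.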

SHA-512 from Command Line

For my application, I can use the Linux/macOS utility shasum when I register a file. The command looks like this:
shasum -a 512 -p /path/to/videofile.mp4
For a 40 GB file this takes about 160 seconds to run; for a 4 GB file, about 20 seconds.
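Converting those timings into rough throughput gives the following. This assumes 1 GB = 1000 MB; binary units would shift the figures slightly.

$$\frac{40\ \text{GB}}{160\ \text{s}} = \frac{40\,000\ \text{MB}}{160\ \text{s}} = 250\ \text{MB/s}, \qquad \frac{4\ \text{GB}}{20\ \text{s}} = \frac{4\,000\ \text{MB}}{20\ \text{s}} = 200\ \text{MB/s}$$

So in these runs, the larger file was hashed at a somewhat higher average rate.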

SHA-512 in Java

However, I also need to calculate the checksum in a Java/Scala application. To do that, I used the following script (written in Scala):
#!/usr/bin/env scala

// The hex digest printed here matches the one printed by shasum -a 512 -p

import java.io._
import java.security.MessageDigest


// Stream the file through the digest in 1 MB chunks, so memory use stays
// constant no matter how large the video is
def hash(file: File): Array[Byte] = {
  val digest = MessageDigest.getInstance("SHA-512")
  val buffer = new Array[Byte](1048576) // 1 MB. I tried 4 MB and it was the same
  val in = new BufferedInputStream(new FileInputStream(file))
  try {
    var sizeRead = in.read(buffer)
    while (sizeRead != -1) {
      digest.update(buffer, 0, sizeRead)
      sizeRead = in.read(buffer)
    }
  } finally {
    in.close() // release the file handle even if a read fails
  }
  digest.digest()
}


val file = new File(args(0))
val h = hash(file)
// Two hex digits per byte; unlike BigInteger.toString(16), this keeps
// leading zeros, so the result is always 128 characters long
println(h.map("%02x".format(_)).mkString)
This takes slightly longer than the command line, but it's pretty close: a 40 GB file takes about 185 seconds and a 4 GB file about 21 seconds. Of course, those times also include compiling the script.
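Hand-rolled hex formatting is easy to get subtly wrong; for example, dropping leading zeros produces a string shorter than 128 characters. A known-answer test is a cheap way to confirm the output really lines up with shasum.

The sketch below is an illustrative variant, not the script above. It applies the same chunked read loop to any java.io.InputStream, so it can be fed an in-memory stream. It then checks the result against the published FIPS 180-2 test vector for the three-byte message "abc". Piping the same bytes into shasum (`printf abc | shasum -a 512`) should print the same hex string. `sha512Hex` and its `bufferSize` parameter are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.MessageDigest

// Same chunked loop as the script, but over any InputStream.
// The caller is responsible for closing the stream.
def sha512Hex(in: InputStream, bufferSize: Int = 1 << 20): String = {
  val digest = MessageDigest.getInstance("SHA-512")
  val buffer = new Array[Byte](bufferSize)
  var n = in.read(buffer)
  while (n != -1) {
    digest.update(buffer, 0, n)
    n = in.read(buffer)
  }
  // Two hex digits per byte, leading zeros preserved
  digest.digest().map("%02x".format(_)).mkString
}

// FIPS 180-2 known-answer vector for the message "abc"
val expected =
  "ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a" +
  "2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f"

val actual = sha512Hex(new ByteArrayInputStream("abc".getBytes("US-ASCII")))
println(actual == expected) // true
```

Passing a tiny buffer, e.g. `sha512Hex(stream, 1)`, should give the same digest. MessageDigest.update accumulates state across calls, so the chunk size affects only speed, not the result.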
