I'm calculating SHA-512 checksums for video files that will be registered with a custom video asset manager that I wrote. Checksums are useful fingerprints for identifying a file. In my case they are useful for reverse lookups: given a file, I can look up its metadata using only its checksum.
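A minimal sketch of the reverse lookup, assuming an in-memory map keyed by the hex digest. VideoMetadata, register and lookup are hypothetical names for illustration; the asset manager's real schema and API aren't shown here.

```scala
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical metadata record; the real asset manager's fields are not shown in the post.
case class VideoMetadata(title: String, durationSeconds: Int)

// Hex-encode a digest, two characters per byte (keeps leading zeros).
def toHex(bytes: Array[Byte]): String = bytes.map("%02x".format(_)).mkString

// In-memory stand-in for the asset manager's store, keyed by SHA-512 hex digest.
val index = mutable.Map.empty[String, VideoMetadata]

// Store metadata under the file's checksum when the file is registered.
def register(checksum: String, meta: VideoMetadata): Unit = index.update(checksum, meta)

// Reverse lookup: checksum in, metadata out (None if the file was never registered).
def lookup(checksum: String): Option[VideoMetadata] = index.get(checksum)

// Usage: hash some stand-in bytes instead of a real video file.
val checksum = toHex(MessageDigest.getInstance("SHA-512").digest("fake video bytes".getBytes("UTF-8")))
register(checksum, VideoMetadata("example", 1800))
println(lookup(checksum)) // Some(VideoMetadata(example,1800))
```

Keying on the hex string rather than the raw byte array matters in Scala: arrays compare by reference, so an Array[Byte] key would never match a freshly computed digest.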
SHA-512 from Command Line
For my application, I can use the Linux/macOS utility shasum when I register the file. In this case the command looks like:
shasum -a 512 -p /path/to/videofile.mp4
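If the registration step runs on the JVM, one option is to shell out to the same command and keep only the digest. This is a sketch, assuming shasum is on the PATH; parseShasumOutput and shasum512 are hypothetical helper names, and the backslash handling is a defensive assumption about how some builds escape unusual filenames.

```scala
import scala.sys.process._

// shasum prints "<hex digest> <mode char><filename>", so the digest is the first
// whitespace-separated token. Assumption: some builds prefix the line with a
// backslash when the filename needs escaping, so strip it defensively.
def parseShasumOutput(output: String): String =
  output.trim.stripPrefix("\\").split("\\s+")(0)

// Runs the same command as above and returns only the hex digest.
// `!!` captures stdout and throws a RuntimeException on a non-zero exit code.
def shasum512(path: String): String =
  parseShasumOutput(Seq("shasum", "-a", "512", "-p", path).!!)
```

For example, shasum512("/path/to/videofile.mp4") returns the 128-character hex string. Passing the arguments as a Seq avoids shell quoting problems with paths that contain spaces.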
For a 40 GB file, this takes about 160 seconds to run. For a 4 GB file, it takes 20 seconds.
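Dividing size by time puts both runs on the same scale. This assumes decimal units (1 GB = 1000 MB), which the timings above don't specify:

```latex
\text{throughput} = \frac{\text{file size}}{\text{time}}, \qquad
\frac{40\ \text{GB}}{160\ \text{s}} = 250\ \text{MB/s}, \qquad
\frac{4\ \text{GB}}{20\ \text{s}} = 200\ \text{MB/s}
```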
SHA-512 in Java
However, I also need to calculate the checksum in a Java/Scala application. To do that I used the following script (written in Scala):
#!/usr/bin/env scala
// Prints the same hex digest as shasum -a 512 -p
import java.io._
import java.security._

def hash(file: File): Array[Byte] = {
  val digest = MessageDigest.getInstance("SHA-512")
  val in = new BufferedInputStream(new FileInputStream(file))
  try {
    val buffer = Array.ofDim[Byte](1048576) // 1 MB. I tried 4MB and it was the same
    var sizeRead = in.read(buffer)
    while (sizeRead != -1) {
      digest.update(buffer, 0, sizeRead) // feed only the bytes actually read
      sizeRead = in.read(buffer)
    }
  } finally {
    in.close() // close the file even if a read throws
  }
  digest.digest()
}

val file = new File(args(0))
val h = hash(file)
// Dump out the hash as hex, two digits per byte. BigInteger.toString(16) would drop
// leading zeros, so a digest starting with a 0 nibble would not match shasum.
println(h.map("%02x".format(_)).mkString)
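An alternative worth trying, not something measured here: read through java.nio's FileChannel into a direct ByteBuffer and hand the buffer to MessageDigest.update(ByteBuffer). This is a sketch; hashNio and its bufferSize parameter are hypothetical names, and whether it beats the stream version on a given machine is untested.

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}
import java.security.MessageDigest

// Streams the file through a FileChannel and a reusable direct buffer.
def hashNio(path: Path, bufferSize: Int = 1 << 20): Array[Byte] = {
  val digest = MessageDigest.getInstance("SHA-512")
  val channel = FileChannel.open(path, StandardOpenOption.READ)
  try {
    val buffer = ByteBuffer.allocateDirect(bufferSize)
    while (channel.read(buffer) != -1) {
      buffer.flip()         // switch the buffer from filling to draining
      digest.update(buffer) // consumes all remaining bytes in the buffer
      buffer.clear()        // make the whole buffer available for the next read
    }
  } finally channel.close()
  digest.digest()
}
```

The result is the same digest as the stream version, so it can be hex-encoded and compared against shasum in the same way.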
The Scala script takes slightly longer than the command line, but it's pretty close: a 40 GB file takes about 185 seconds, and a 4 GB file takes about 21 seconds. Of course, those times also include the time to compile the script.
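To separate compilation and JVM startup from the hashing itself, one option is to time only the hash call from inside the program with System.nanoTime. A sketch; sha512, timed and throughputMBps are hypothetical helper names, and a single timed call still includes JIT warm-up.

```scala
import java.io.{File, FileInputStream}
import java.security.MessageDigest

// Same streaming loop as the script above, repeated so this block runs on its own.
def sha512(file: File): Array[Byte] = {
  val digest = MessageDigest.getInstance("SHA-512")
  val in = new FileInputStream(file)
  try {
    val buffer = new Array[Byte](1 << 20) // 1 MB
    var n = in.read(buffer)
    while (n != -1) {
      digest.update(buffer, 0, n)
      n = in.read(buffer)
    }
  } finally in.close()
  digest.digest()
}

// Runs `body` and returns its result with the elapsed wall-clock time in seconds.
def timed[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e9)
}

// Throughput in MB/s, taking 1 MB = 10^6 bytes.
def throughputMBps(bytes: Long, seconds: Double): Double = bytes / 1e6 / seconds
```

For example, val (digest, secs) = timed(sha512(new File("/path/to/videofile.mp4"))) gives the hashing time alone. By the same formula, the 185-second figure for a 40 GB file works out to roughly 216 MB/s.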