Gregory Conti
Sergey Bratus
Benjamin Sangster
Roy Ragsdale
Matthew Supan
Andrew Lichtenberg
Robert Perez-Alemany
Anna Shubina

Abstract

Security analysts, reverse engineers, and forensic analysts are regularly faced with large binary objects, such as executable and data files, process memory dumps, disk images and hibernation files, often Gigabytes or larger in size and frequently of unknown, suspect, or poorly documented structure. Binary objects of this magnitude far exceed the capabilities of traditional hex editors and textual command line tools, frustrating analysis. This paper studies automated means to map these large binary objects by classifying regions using a multi-dimensional, information-theoretic approach. We make several contributions including the introduction of the binary mapping metaphor and its associated applications, as well as techniques for type classification of low-level binary fragments. We validate the efficacy of our approach through a series of classification experiments and an analytic case study. Our results indicate that automated mapping can help speed manual and automated analysis activities and can be generalized to incorporate many low-level fragment classification techniques.