Reverse engineering message formats from static network traces is difficult and time-consuming, but it is critical for a range of security purposes, from recovering the lost or incomplete specifications of legacy systems to understanding the communications of hostile systems. Binary data is inherently ambiguous, which makes the task hard: the same sequence of four bytes could be interpreted as an integer, a float, a string, a timestamp, or even several smaller fields. Our key insight is that although data can in principle be encoded in infinitely many ways, in practice engineers reuse standard encodings over and over, both for atomic types, such as integers, IEEE 754 floats, and timestamps, and for compound types, such as variable-length sequences. These common idioms leave behind fingerprints that we can use to identify them. In the BinaryInferno project, we are exploring an ensemble-based approach in which a collection of simple detectors, each focused on a particular kind of data, works together to infer an overall description of the message format.
- Lauren Labell, Jared Chandler, and Kathleen Fisher. 2020. Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples. In Proceedings of the 15th Workshop on Programming Languages and Analysis for Security (PLAS’20). Association for Computing Machinery, New York, NY, USA, 25–34. DOI: https://doi.org/10.1145/3411506.3417599 Link to video of presentation: https://www.youtube.com/watch?v=L_NsZ8W5fwA
- Jared Chandler and Kathleen Fisher. 2020. Reverse Engineering Binary Messages through Design Patterns. In LangSec Workshop.
- Jared Chandler, Kathleen Fisher, Erin Chapman, Eric Davis, and Adam Wick. 2020. Invasion of the Botnet Snatchers: A Case Study in Applied Malware Cyberdeception. In Proceedings of the 53rd Hawaii International Conference on System Sciences. DOI: https://doi.org/10.24251/HICSS.2020.229