Kathleen’s research focuses on advancing the theory and practice of programming languages. All of her work is collaborative and much is interdisciplinary. In one research thrust, she applies formal methods and other programming language techniques to produce software that is provably functionally correct with the goal of making hacking significantly harder. Examples include the DARPA HACMS program that built verified software for vehicles that a world-class red-team failed to hack into and a formally verified parser generator that provably rejects all malformed inputs. In a second thrust, she develops domain-specific languages to make it easier to solve problems in particular domains. Examples include PADS for data format manipulation, Forest for filestore management, and Hancock for stream processing. In a third thrust, she studies program synthesis, using search techniques to generate programs from high-level specifications. Examples include synthesizing high-performance data structure(s) and concurrency control strategies for a given program and workload, inferring data descriptions from example data, and synthesizing lenses for synchronizing data stored in different formats.
Applying Formal Methods
The HACMS project strives to create technology to construct high-assurance cyber-physical systems that are much harder to hack. To that end, HACMS adopts a clean-slate, formal methods-based approach that leverages domain-specific languages, interactive theorem provers, and model checkers.
Parsing, which is theoretically well-understood but error-prone in practice, is an important aspect of high-assurance software because parsers frequently handle input from untrusted sources. This project develops high-assurance parsers that have formal semantics and machine-checked proofs of correctness.
The PADS project aims to simplify the processing of ad hoc data sources. Its users can declaratively describe data sources and then use generated tools to understand, parse, translate, and format data.
The Forest domain-specific language extends the rigorous discipline of typed programming languages to the untyped world of file systems, allowing users to specify the structure, attributes, and invariants of collections of files. Forest leverages the PADS project to manage the format of individual files.
Hancock is a C-based domain-specific language designed to make it easy to read, write, and maintain programs that use large amounts of relatively uniform data to build signatures of entities of interest. The motivation for Hancock was facilitating fraud detection by understanding the normal behavior of AT&T customers.
Lenses, which are domain-specific languages for specifying provably correct bi-directional data transformations, are difficult to write. This project uses program synthesis to generate lenses from example data instead.
Autobahn uses genetic algorithms to automatically infer strictness annotations for Haskell programs to improve program performance.
Data Representation Synthesis
This project studies how to declaratively specify combinations of data structures with complex sharing relationships from which a synthesis engine can generate provably correct and efficient code.
The BinaryInferno project seeks to help users reverse engineer binary data formats. BinaryInferno deploys a modular collection of detectors, each specialized to a particular type of data, and then integrates the results to produce an overall description suitable for parsing that describes as much of the data as possible.
This project aims to study how to best teach programming language techniques to learners from middle school through upper-division college classes.