RESEARCH

Verified Parsing

Parsers are the “trusted gatekeepers” of high-assurance software; larger programs rely on them to consume input from untrusted sources. In the high-assurance parsing project, we are using formal methods to develop parsers that come with machine-checked proofs of their correctness. Parsers are natural targets for formal verification because they are hard to implement correctly but easy to specify: while parser bugs are pervasive (with grave security consequences), parsers have a simple high-level definition of what constitutes correct behavior. By proving that a parser conforms to this correctness specification, we can guarantee that it is free of common classes of bugs. The two verified parsers we have developed, Vermillion and CoStar, occupy different points on the “speed vs. expressiveness” spectrum of parsing algorithms. Vermillion runs in linear time and is compatible with a more restricted class of grammars; CoStar is slower in the worst case, but it achieves linear performance on many real-world grammars and is suitable for parsing a broader range of languages.

Papers

  • D. Egolf, S. Lasser, and K. Fisher. Verbatim++: Verified, Optimized, and Semantically Rich Lexing with Derivatives. In CPP ’22: Proceedings of the 11th ACM SIGPLAN Conference on Certified Programs and Proofs, 2022.
  • S. Lasser, C. Casinghino, K. Fisher, and C. Roux. CoStar: A verified ALL(*) parser. In PLDI ’21: Proceedings of the 2021 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2021.
  • S. Lasser, C. Casinghino, K. Fisher, and C. Roux. A verified LL(1) parser generator. In Interactive Theorem Proving, volume 141 of Leibniz International Proceedings in Informatics (LIPIcs). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, September 2019.
  • D. Egolf, S. Lasser, and K. Fisher. Verbatim: A verified lexer generator. In LangSec Work- shop, 2021.
  • T. Parr, S. Harwell, and K. Fisher. Adaptive LL(*) parsing: The power of dynamic analysis. In Proceedings of the 2014 ACM SIGPLAN International Conference on Object-Oriented Programming Systems Languages and Applications, 2014.
  • T. Parr and K. Fisher. LL(*): The foundation of the ANTLR parser generator. In PLDI ’11: Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011.