RESEARCH

Domain-Specific Languages: PADS

The goal of the PADS project is to make it easier for data analysts to extract useful information from ad hoc data files, bridging the gap between the unmanaged world of ad hoc data and the managed world of typed programming languages and databases. The project centers on the design of the PADS data description language and its embodiment in three different host languages: C, ML, and Haskell. From a PADS description, the compiler generates parsing tools that produce meta-data as well as the parsed values. The project defines the formal semantics of the language, generates a number of useful auxiliary tools using type-directed programming techniques, and explores an inference system that can learn PADS descriptions from positive examples of the data format.
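To make the idea of a parser that returns meta-data alongside parsed values concrete, here is a minimal hand-written sketch in Python. It is not PADS syntax and not output of the PADS compiler. The names `DESCRIPTION`, `FieldMeta`, `RecordMeta`, and `parse_record`, the three-field record layout, and the `|` separator are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FieldMeta:
    """Hypothetical per-field meta-data: did the field parse, and if not, why."""
    name: str
    ok: bool
    error: Optional[str] = None


@dataclass
class RecordMeta:
    """Hypothetical per-record meta-data: an error count plus field details."""
    nerr: int = 0
    fields: List[FieldMeta] = field(default_factory=list)


# A stand-in for a declarative data description: field name plus converter.
# This is NOT PADS syntax; it only mimics describing a format as data.
DESCRIPTION = [("host", str), ("status", int), ("bytes", int)]


def parse_record(line: str, sep: str = "|") -> Tuple[dict, RecordMeta]:
    """Parse one line into (values, meta-data) without raising on bad input."""
    rep: dict = {}
    meta = RecordMeta()
    parts = line.rstrip("\n").split(sep)
    for i, (name, conv) in enumerate(DESCRIPTION):
        if i >= len(parts):
            # The field is absent: record the error and keep going.
            rep[name] = None
            meta.fields.append(FieldMeta(name, False, "missing field"))
            meta.nerr += 1
            continue
        try:
            rep[name] = conv(parts[i])
            meta.fields.append(FieldMeta(name, True))
        except ValueError:
            # The field is present but malformed: note it in the meta-data.
            rep[name] = None
            meta.fields.append(
                FieldMeta(name, False, f"bad {conv.__name__}: {parts[i]!r}")
            )
            meta.nerr += 1
    return rep, meta
```

Calling `parse_record("example.org|200|-")` in this sketch yields a value for `host` and `status`, `None` for `bytes`, and meta-data with `nerr == 1`. The caller can then decide whether to discard, repair, or keep the record, which is the kind of choice that error-tolerant parsing of ad hoc data is meant to support.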

Find us on GitHub!

People

PADS alumni:

  • Mark Daly
  • Zach DeVito
  • Pamela Dragosh
  • Mary Fernandez
  • Andrew Forrest
  • Joel Gottleib
  • Vikas Kedia
  • John Launchbury
  • Yitzhak Mandelbaum
  • Ricardo Medel
  • Frances Spalding
  • Peter White
  • Qian Xi
  • Xuan Zheng
  • Kenny Zhu

PADS Live!

Visit the Learning Demo.

Thanks!

We would like to thank Bala Krishnamurthy, Andrew Hume, David Poole, and Oliver Spatscheck for informative discussions about particular forms of ad hoc data and potentially useful tools for manipulating that data.

We would like to thank Glenn Fowler and Phong Vo for their help in using the AST and SFIO libraries.

We would like to thank Diane Ristaino for artistic assistance.

Support

The PADS project has been generously supported by AT&T. In addition, portions of the work have been supported by DARPA Grant No. FA8750-07-C-0014 and the National Science Foundation under Grants No. 0612147, 0633268, and 0615062. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA or the National Science Foundation.

We always have a long list of things to do. So if you want to help, don’t be shy and find us on GitHub!

User’s Manual

Research Papers

The PADS Project: An Overview. Kathleen Fisher and David Walker. Invited Paper, International Conference on Database Theory, March 2011. PDF

Forest: A Language and Toolkit for Programming with Filestores. Kathleen Fisher, Nate Foster, David Walker, and Kenny Q. Zhu. Princeton Technical Report TR-889-10, December 2010.  PDF

The Next 700 Data Description Languages. Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. Journal of the ACM, Volume 57 Issue 2, January 2010.  PDF

Incremental Learning of System Log Formats. Kenny Q. Zhu, Kathleen Fisher, and David Walker. WASL, October 2009. Workshop proceedings reprinted in SIGOPS Operating Systems Review, Volume 44 Issue 1, January 2010.  PDF

Ad Hoc Data and the Token Ambiguity Problem. Qian Xi, Kathleen Fisher, David Walker, and Kenny Q. Zhu. PADL, January 2009.  PDF

LearnPADS: Automatic Tool Generation from Ad Hoc Data. Kathleen Fisher, David Walker, and Kenny Zhu. SIGMOD, June 2008.  PDF

From Dirt to Shovels: Fully Automatic Tool Generation from Ad Hoc Data. Kathleen Fisher, David Walker, Kenny Q. Zhu, and Peter White. POPL, January 2008.  PDF

A Generic Programming Toolkit for PADS/ML: First-Class Upgrades for Third-Party Developers. Mary Fernandez, Kathleen Fisher, Nathan Foster, Michael Greenberg, and Yitzhak Mandelbaum. Practical Applications of Declarative Languages, January 2008.  PDF

A Dual Semantics for the Data Description Calculus. Kathleen Fisher, Yitzhak Mandelbaum, and David Walker I, April 2007.  PDF

Towards 1-click Tool Generation with PADS. David Burke, Kathleen Fisher, David Walker, Peter White, and Kenny Q. Zhu. Workshop on Challenges and Applications of Grammar Induction, June 2007.  PDF

PADS/ML: A Functional Data Description Language. Yitzhak Mandelbaum, Kathleen Fisher, David Walker, Mary Fernandez, and Artem Gleyzer. POPL, January 2007.  PDF

PADS/ML: A Functional Data Description Language. Yitzhak Mandelbaum, Kathleen Fisher, David Walker, Mary Fernandez, and Artem Gleyzer. Princeton Technical Report, July 2006.  PDF

The Theory and Practice of Data Description. Yitzhak Mandelbaum. Ph.D. Thesis, September 2006.  PDF

PADS: An end-to-end system for processing ad hoc data. Mark Daly, Mary Fernandez, Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. SIGMOD, June 2006.  PDF

PADX: Querying Large-scale Ad Hoc Data with XQuery. Mary Fernandez, Kathleen Fisher, Robert Gruber, and Yitzhak Mandelbaum. PLAN-X, January 2006.  PDF

LaunchPads: A System for Processing Ad Hoc Data. Mark Daly, Mary Fernandez, Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. PLAN-X Demo, January 2006.  PDF

The Next 700 Data Description Languages. Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. POPL, January 2006.  PDF

PADS/T: A Language for Describing and Transforming Ad Hoc Data. Mary Fernandez, Kathleen Fisher, Yitzhak Mandelbaum, and David Walker. Princeton Technical Report, September 2005.  PDF

PADS: A Domain-Specific Language for Processing Ad Hoc Data. Kathleen Fisher and Robert Gruber. PLDI, June 2005.  PDF

PADS: Processing Arbitrary Data Streams. Kathleen Fisher and Robert Gruber. MPDS, June 2003.  PDF

Presentations

Ad Hoc Data and the Token Ambiguity Problem. Qian Xi. PADL, January 2009. PPT

Programming Language Ideas Escape the Lab: A Declarative Data Description Language. Kathleen Fisher. Grace Hopper, October 2008. PDF

From Dirt to Shovels: Fully Automatic Tool Generation from Ad Hoc Data. Kenny Zhu. POPL, January 2008.

A Generic Programming Toolkit for PADS/ML: First-Class Upgrades for Third-Party Developers. Michael Greenberg. Practical Applications of Declarative Languages, January 2008. PDF

Towards 1-click Tool Generation with PADS. David Walker. CAGI, June 2007. PPT

PADS/ML: A Functional Data Description Language. Yitzhak Mandelbaum. POPL, January 2007. PDF Quicktime Keynote (tar zipped)

Typing Ad Hoc Data. Kathleen Fisher. TLDI, January 2007. PDF Keynote (tar zipped)

PADS: A System for Managing Ad Hoc Data. Kathleen Fisher. Pomona Distinguished Lecture, November 2006. PPT

The Next 700 Data Description Languages. Yitzhak Mandelbaum. POPL, January 2006. PPT

PADS: A Domain-Specific Language for Processing Ad Hoc Data. Kathleen Fisher. PLDI, June 2005. PPT

The Next 700 Data Description Languages. Yitzhak Mandelbaum. IBM PL Day, April 2005. PPT

PADS: Simplified Data Processing for Scientists. David Walker. Princeton Engineering Junior Faculty Seminar Series, March 2005.

PADS/Galax Architecture. Yitzhak Mandelbaum. AT&T Intern Talk, 2004. PDF

PADS/Galax Introduction. Yitzhak Mandelbaum. AT&T Intern Talk, 2004. PPT

PADS Overview. Kathleen Fisher. Microsoft Invited Talk, February 2004. PPT
