Three decades past, the relational empire conquered the hierarchical hegemony. Today, an upstart challenges the relational empire's dominance, threatening the return of hierarchy. XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria. This talk examines the design of XQuery, the W3C standard query language for XML, and related standards such as XML Schema.
MSL (Model Schema Language) is an attempt to formalize some of the core idea in XML Schema. The benefits of a formal description is that it is both concise and precise. MSL has already proved helpful in work on the design of XML Query. We expect that similar techniques can be used to extend MSL to include most or all of XML Schema.
An updated version of the paper An Algebra for XML Query.
An updated version of the manuscript An Algebra for XML Query.
This note presents a possible algebra for an XML query language, submitted as an input to the XML Query working group.
Please try out our implementation of the algebra!
There is also a November 1999 draft of the algebra, which is rather different.
Gifford and others proposed an effect typing discipline to delimit the scope of computational effects within a program, while Moggi and others proposed monads for much the same purpose. Here we marry effects to monads, uniting two previously separate lines of research. In particular, we show that the type, region, and effect system of Talpin and Jouvelot carries over directly to an analogous system for monads, including a type and effect reconstruction algorithm. The same technique should allow one to transpose any effect systems into a corresponding monad system.
This is the journal version of the ICFP effects paper.
XML (eXtensible Markup Language) is a magnet for hype: the successor to HTML for Web publishing, electronic data interchange, and e-commerce. In fact, XML is little more than a notation for trees and for tree grammars, a verbose variant of Lisp S-expressions coupled with a poor man's BNF (Backus-Naur form). Yet this simple basis has spawned scores of specialized sublanguages: for airlines, banks, and cell phones; for astronomy, biology, and chemistry; for the DOD and the IRS. Domain-specific languages indeed! There is much for the language designer to contribute here. In particular, as all this is based on a sort of S-expression, is there a role for a sort of Lisp?
This paper identifies essential features of an XML query language by examining four existing query languages: XML-QL, YATL, Lorel, and XQL. The first three languages come from the database community and possess striking similarities. The fourth comes from the document community and lacks some key functionality of the other three.
The second-order polymorphic lambda calculus (F2) was independently discovered by Girard and Reynolds. Girard additionally proved a representation theorem: every function on natural numbers that can be proved total in second-order propositional logic (P2) can be represented in F2. Reynolds additionally proved an abstraction theorem: for a suitable notion of logical relation, every term in F2 takes related arguments into related results. We observe that the essence of Girard's result is a projection from P2 into F2, and that the essence of Reynolds's result is an embedding of F2 into P2, and that the Reynolds embedding followed by the Girard projection is the identity. The Girard projection discards all first-order quantifiers, so it seems unreasonable to expect that the Girard projection followed by the Reynolds embedding should also be the identity. However, we show that in the presence of Reynolds's parametricity property that this is indeed the case, for propositions corresponding to inductive definitions of naturals, products, sums, and fixpoint types.
This note presents a formal semantics of the pattern language from the 16 December 1998 draft of XSLT. The semantics is clear and concise, summarizing in one page of formulas what required about ten pages of prose to describe. With the aid of the semantics one can rigorously state and prove properties of the language; these properties helped to guide future development of the XSLT design. The semantics was developed using standard techniques from the programming language community, and this article provides a tutorial introduction to these techniques. While little here will be new to the language theorist, some of what is here may be of use to the markup technologist.
Several recent studies have introduced lightweight versions of Java: reduced languages in which complex features like threads and reflection are dropped to enable rigorous arguments about key properties such as type safety. We carry this process a step further, omitting almost all features of the full language (including interfaces and even assignment) to obtain a small calculus, Featherweight Java, for which rigorous proofs are not only possible but easy.
Featherweight Java bears a similar relation to full Java as the lambda-calculus does to languages such as ML and Haskell. It offers a similar computational ``feel,'' providing classes, methods, fields, inheritance, and dynamic typecasts, with a semantics closely following Java's. A proof of type safety for Featherweight Java thus illustrates many of the interesting features of a safety proof for the full language, while remaining pleasingly compact. The syntax, type rules, and operational semantics of Featherweight Java fit on one page, making it easier to understand the consequences of extensions and variations.
As an illustration of its utility in this regard, we extend Featherweight Java with generic classes in the style of GJ (Bracha, Odersky, Stoutamire, and Wadler) and sketch a proof of type safety. The extended system formalizes for the first time some of the key features of GJ.
The best way to program is to get someone else to do it for you: exploit a reusable library. Many classes, especially reusable ones, are best thought of as generic; for instance, a list is generic in its element type. Java 1.1.6 comes with a Collections Library, including lists, similar to the Standard Template Library for C++. Such classes are easy to define in Java, but not so easy to use. The definer implements a class List where the elements are of type Object. The user has to remember what kind of list it is, and to add casts from Object to the element type where appropriate.
GJ extends Java with generic types. Typing is more precise: one may replace the uninformative List by the more precise, say, List<String>. And there is less to type: no extra casts to insert. Many common errors are caught by the compiler rather than left lurking until run-time. The mechanism looks like templates in C++, but has greater power (you can specify what interface a type should implement) and fewer drawbacks (no risk of code bloat).
GJ contains Java as a subset, and the GJ compiler may be used as a Java compiler. GJ compiles into Java bytecodes, so it runs wherever Java runs. GJ is fully forward and backward compatible: new GJ code may use legacy Java libraries; and legacy Java code may be linked with new GJ libraries. Legacy Java libraries may be retrofitted with GJ types, even if the source is unavailable; in particular, the Java Collections Library has been augmented with generic types. The GJ compiler is itself written in GJ, and is freely available over the web.
GJ is joint work with Martin Odersky at the University of South Australia, and Gilad Bracha and Dave Stoutamire at Sun. GJ was designed so that it could be incorporated into a future Java release, although whether this will happen is unclear. Compatibility with GJ was a design consideration for the new Java Collections Libary.
We present GJ, a design that extends the Java programming language with generic types and methods. These are both explained and implemented by translation into the unextended language. The translation closely mimics the way generics are emulated by programmers: it erases all type parameters, maps type variables to their bounds, and inserts casts where needed. Some subtleties of the translation are caused by the handling of overriding.
GJ increases expressiveness and safety: code utilizing generic libraries is no longer buried under a plethora of casts, and the corresponding casts inserted by the translation are guaranteed to not fail.
GJ is designed to be fully backwards compatible with the current Java language, which simplifies the transition from non-generic to generic programming. In particular, one can retrofit existing library classes with generic interfaces without changing their code.
An implementation of GJ has been written in GJ, and is freely available on the web.
This tutorial introduces the concepts of GJ by example.
This specification defines GJ in rigorous detail.
Gifford and others proposed an effect typing discipline to delimit the scope of computational effects within a program, while Moggi and others proposed monads for much the same purpose. Here we marry effects to monads, uniting two previously separate lines of research. In particular, we show that the type, region, and effect system of Talpin and Jouvelot carries over directly to an analogous system for monads, including a type and effect reconstruction algorithm. The same technique should allow one to transpose any effect systems into a corresponding monad system.
There are two styles of lazy evaluation, one (called `odd') that is easy to encode in the traditional `delay' and `force' syntax, but forces too much evaluation, and another (called `even') that delays evaluation as in traditional lazy languages, but is harder to encode. This note presents a new `lazy' syntax, and shows how the even style is easy to encode in this syntax, while the odd style is harder to encode. The new `lazy' syntax is defined by translation into the `delay-force' syntax. Comparisons are drawn with two other syntaxes, one used in CAML and one proposed by Chris Okasaki.
This column lists eight reasons why functional languages are not more widely used (compatibility, libraries, portability, availability, packagability, tools, training, and popularity) and two non-reasons (performance and ``they just don't get it''), then draws four lessons.
Parametric types and virtual types have recently been proposed as extensions to Java to support genericity. In this paper we examine both in order to investigate the strengths and weaknesses of each. We suggest a variant of virtual types which has similar expressiveness, but supports safe static type checking. This results in a language in which both parametric types and virtual types are well-integrated, and which is statically type-safe.
Functional programmers and reuse engineers dine at the same table. Delicacies like type abstraction and higher-order functions are meat and potatoes for those who need to reuse code parameterised by types and operations.
The talk will start with a review of modern functional languages. Isolation has given way to systems that interact with C and COM components. Code quality can rival C. Functional programs deliver calls in Brussels, route planes through Paris, and play CDs over networks at Cornell.
The talk will then describe Pizza, an attempt to make functional ideas accessible to a wider community by embedding them in Java. Pizza contains Java as a subset, so its easy to learn, and it compiles to the Java Virtual Machine, so it runs anywhere Java runs, including web browsers. We will focus on how Pizza is designed to add parametric types on top of existing Java libraries, enhancing reuse.
Applications of functional languages have been described elsewhere; this paper describes salient features of the latest version of Pizza.
John Hughes has made pretty-printers one of the prime demonstrations of using combinators to develop a library, and algebra to implement it. This note presents a new design for pretty printers which improves on Hughes's classic design. The new design is based on a single concatenation operator which is associative and has a left and right unit. Hughes's design requires two separate operators for concatenation, where horizontal concatenation has a right unit but no left unit, and vertical concatenation has neither unit.
``Have you used it in anger yet?''
The time is a dozen years ago, the place is Oxford, and my fellow postdoc has just scrutinized my new bike. He's admired the chrome, checked the gears, noted the Kryptonite lock. Now he wants to know if I've used it to serious purpose. Gleaming chrome is well and good, but will it run you through the woods?
``Have you used it in anger yet?''
Having read the title of this column, you may have just asked the same question, though perhaps in different words. You've scrutinized functional languages. You've admired the elegance of lambda calculus, checked the benchmarks from the compilers, noted the security provided by strong typing. Now you want to know if they have been used to serious purpose. Mathematical elegance is well and good, but will it run that mission-critical system?
Here are a half-dozen examplars of functional programs used in anger.
One way to model a sound and complete translation from a source into a target calculus is with an adjoint or a Galois connection. In the special case of a reflection, one also has that the target calculus is isomorphic to a subset of the source. We show that three widely-studied translations form reflections. We use as our source language Moggi's computational lambda calculus lambda-c, which is an extension of Plotkin's call-by-value calculus lambda-v. We show that Plotkin's CPS translation, Moggi's monad translation, and Girard's translation to linear logic can all be regarded as reflections from this source language, and we put forward lambda-c as a model of call-by-value computation that improves on lambda-v. Our work strengthens Plotkin's and Moggi's original results, and improves on recent work based on equational correspondence, which uses equations rather than reductions.
This tutorial describes the use of a monad to integrate interaction into a purely declarative language. This technique has been implemented in the higher-order functional language Haskell. A sketch is given of how it might be added to a first-order language for logic programming.
The Java phenomenon means that programmers that once laughed at garbage collection and strong typing have started to use it daily, and this opens up a wonderful opportunity for the functional programming community....
We present a type system for the programming language Erlang. The type system supports subtyping and declaration-free recursive types, using subtyping constraints. Our system is similar to one explored by Aiken and Wimmers, though it sacrifices expressive power in favor of simplicity. We cover our techniques for type inference, type simplification, and checking when an inferred type conforms to a user-supplied type signature, and report on early experience with our prototype.
Pizza is a strict superset of Java that incorporates three ideas from the academic community:
Two different operational interpretations of intuitionistic linear logic have been proposed in the literature. The simplest interpretation recomputes non-linear values every time they are required. It has good memory-management properties, but is often dismissed as being too inefficient. Alternatively, one can memoize the results of evaluating non-linear values. This avoids any recomputation, but has weaker memory-management properties. Using a novel combination of type-theoretic and operational techniques we give a concise formal comparison of the two interpretations. Moreover, we show that there is a subset of linear logic where the two operational interpretations coincide. In this subset, which is sufficiently expressive to encode call-by-value lambda-calculus, we can have the best of both worlds: a simple and efficient implementation, and good memory-management properties.
Benton's adjoint models of Girard's linear logic also provide models of Moggi's computational metalanguage. We consider three different translations of lambda calculus into other calculi: direct, call-by-name, and call-by-value. We show that the three translations (mainly due to Moggi) into the computational metalanguage correspond to three translations (mainly due to Girard) into intuitionistic linear logic. We also extend these results to languages with recursion.
Lazy, or call-by-need, languages schedule work dynamically by building closures, and shun side effects; strict, or call-by-value languages avoid the overhead of closures and may exploit side effects. Each style has complementary advantages and complimentary adherents.
The gap between the lazy and strict camps has two dimensions, which we shall name style and models. Recent developments suggest that along both dimensions the gap is shrinking. We list some commercial applications of each kind of language, and examine each dimension of difference in turn.