OpenToken

History

current version: 5.0a

OpenToken may be obtained in several ways:

OpenToken is a facility for performing token analysis and parsing within the Ada language. It is designed to provide all the functionality of a traditional lexical analyzer/parser generator, such as lex/yacc. But due to the magic of inheritance and runtime polymorphism it is implemented entirely in Ada as withed-in code. No precompilation step is required, and no messy tool-generated source code is created. The tradeoff is that the grammar is generated at runtime.

For example, here is an OpenToken LR specification of a typical arithmetic grammar, taken from Example 5.10, section 5.3, page 295 of the red dragon book (note that we have added an explicit EOF (end of file) symbol):

Grammar specification:

L -> E EOF
E -> E + T
E -> T
T -> T * F
T -> F
F -> ( E )
F -> integer

Ada code:

Grammar : constant Production_List.Instance :=
 L <= E & EOF                      + Print_Value'Access and
 E <= E & Plus & T                 + Add_Integers       and
 E <= T                                                 and
 T <= T & Times & F                + Multiply_Integers  and
 T <= F                                                 and
 F <= Left_Paren & E & Right_Paren + Synthesize_Second  and
 F <= Int_Literal;

This grammar is processed at runtime into a state machine data structure, then used to parse input text. It uses dynamic memory allocation at runtime to hold a stack of the tokens and productions.

Here is a specification of an equivalent grammar, rewritten to allow a recursive-descent implementation:

Grammar specification:

L -> E  EOF
E -> T {+ T}
T -> F {* F}
F -> ( E )
F -> integer

Ada code:

E : Operation_List.Handle    := new Operation_List.Instance;
F : Integer_Selection.Handle :=
 (Left_Paren & E & Right_Paren + Build_Parens'Access or Int) + Build_Selection'Access;
T : Operation_List.Handle    := F ** Times * Times_Element'Access + Init_Times'Access;
L : Integer_Sequence.Handle  := E & EOF + Build_Print'Access;

E.all := Operation_List.Get
 (Element     => T,
  Separator   => Plus,
  Initialize  => Init_Plus'Access,
  Add_Element => Plus_Element'Access);

This grammar is used directly at runtime to parse input text. It does not use dynamic memory allocation during parsing (dynamic memory allocation is used while building the grammar in the statements above), but it does use more stack space than the LR version. Both parsers are reasonably efficient.

OpenToken also allows implementing horribly inefficient recursive descent grammars that still work; it supports infinite lookahead with backtracking. This can be useful when just getting started with parsers; first get it working, then optimize it.

The lexer phase of OpenToken parsers is implemented by classes of recognizers. Each recognizer runs in parallel, and the one matching the longest input lexeme wins. This is not as efficient as traditional lexers, for example those generated by lex, since those use a finite state machine. But it is easier to use, because the recognizers are already written, well documented, and can be independently tested.

Some reusable parse tokens are also included in OpenToken, but they tend to be domain-specific.

Ada's type safety features make misbehaving lexers and parsers easier to debug. In addition, OpenToken includes a trace feature, that outputs useful information during grammar compile and user input parsing, also aiding debugging.

OpenToken is distributed under GPL version 3, with the GNAT modification that allows use in non-GPL projects.

OpenToken was originally written by Ted Dennison. Contributions were made by Christoph Grein and Stephen Leake.

OpenToken is currently maintained by Stephen Leake. Submit bugs directly to Stephen. Discussion about OpenToken is on the newsgroup comp.lang.ada.

OpenToken has been tested on the following operating systems/compilers:

The simplest way to use the source distribution is to install it in the standard GNAT directories, where the compiler will find it by default. This assumes you have write permission in the GNAT directory tree. Then just put 'with "OpenToken";' in your project file. To install it using a bash shell (Cygwin on Windows, or Linux), and assuming GNAT and 'make' are in your PATH:

cd opentoken/build/release
make install

See developer instructions if you want to run the tests or work on OpenToken, and for hints in case the above does not work.

Ted's original OpenToken web page is no longer being maintained. The AdaPower mailing list and discussion board are no longer used.

There is a User's Guide that describes how to create a simple application using OpenToken. There are also several examples in the source; see the Examples directory. The Test directory also has useful examples, although oriented towards testing OpenToken itself.

History

Version 5.0a 9 Feb 2014

Version 4.0b 29 Jun 2010

Version 4.0a 7 Feb 2010

Version 3.1 August 4, 2009

Version 3.0b 13 August 2000

by Ted Dennison

This version introduces recursive decent parsing. This has the following advantages over table-driven parsers:

The disadvantages are:

A general list of the changes is below:

Version 2.0 27 January 2000

This is the first version to include parsing capability. The existing packages underwent a major reorganization to accommodate the new functionality. As some of the restructuring that was done is incompatible with old code, the major revision has been bumped up to 2. A partial list of changes is below:

Version 1.3.6

This version fixes a rare bug in the Ada style based numeric recognizers. The SLOC counter can now successfully count all the source files in Gnat's adainclude directory.

Version 1.3.5

This version adds a simple Ada SLOC counting program into the examples. A bug with the Real token recognizer that caused constraint_errors has been fixed. Also bugs causing constraint errors in the ada-style based integer and real recognizers on long non-based numbers have been fixed.

Version 1.3

This version adds the default token capability to the Analyzer package. This allows a more flexible (if somewhat inefficient) means of error handling to the analyzer. The default token can be used as an error token, or it can be made into a non-reportable token to ignore unknown elements entirely.

Identifier tokens were generalized a bit to allow user-defined character sets for the first and subsequent characters. This not only gives it the ability to handle syntaxes that don't exacly match Ada's, but it allows one to define identifiers for languages that aren't latin-1 based. Also, the ability to turn off non-repeatable underscores was added.

Integer and Real tokens had an option added to support signed literals. This option is set on by default (which causes a minor backward incompatibility). Syntaxes that have addition or subtraction operators will need to turn this option off.

A test to verify proper handling of default parameters was added to the Test directory. A makefile was also added to the same directory to facilitate automatic compiling and running of the tests. This makefile will not work in a non-Gnat/NT environment without some modification.

New recognizers were added for enclosed comments (eg: C's /* */ comments) and single character escape sequences. Also a "null" recognizer was added for use as a default token.

Version 1.2.1

This version adds the CSV field token recognizer that was inadvertently left out of 1.2. This recognizer was designed to match fields in comma-separated value (CSV) files, which is a somewhat standard file format for databases and spreadsheets. Also, the extraneous CVS directories in the zip version of the distribution were removed.

Version 1.2

The long-awaited string recognizer has been added. It is capable of recognizing both C and Ada-style strings. In addition, there are a great many submissions by Christoph Grein in this release. He contributed mostly complete lexical analyzers for both Java and Ada, along with all the extra token recognizers he needed to accomplish this feat. He didn't need as many extra recognizers as I would have thought he'd need. But even so, slightly less than 1/2 of the recognizers in this release were contributed by Chris (with a broken arm, no less!)

Version 1.1

The main code change to this version is a default text feeder function that has been added to the analyzer. It reads its input from Ada.Text_IO.Current_Input, so you can change the file to whatever you want fairly easily. The capability to create and use your own feeder function still exists, but it should not be necessary in most cases. If you already have code that does this, it should still compile and work properly.

The other addition is the first version of the OpenToken user's guide. All it contains right now is a user manual walking through the steps needed to make a simple token analyzer. Feedback and/or ideas on this are welcome.

Version 1.0

This is the very first publicly released version. This package is based on work I (Ted Dennison) did while working on the JPATS trainer for FlightSafety International. The germ of this idea came while I was trying to port a fairly ambitious, but fatally buggy Ada 83 token recognition package written for a previous simulator. But once I was done, I was rather suprised at the flexibility of the final product. Seeing the possible benefit to the community, and to the company through user-submitted enhancement and debugging, I suggested that this code be released as Open Source. They were open-minded enough to agree. Bravo!


my home page Author : Stephen Leake Valid HTML 4.01! Created with Emacs powered by Ada Last modified: Tue Jan 01 23:39:35 EST 2013