SimpleParse is a BSD-licensed Python package
providing a simple and fast parser generator using a modified version
of the mxTextTools
text-tagging engine. SimpleParse allows you to generate parsers
directly from your
Unlike most parser generators, SimpleParse generates single-pass parsers (there is no distinct tokenization stage), an approach taken from the predecessor project (mcf.pars) which attempted to create "autonomously parsing regex objects". The resulting parsers are not as generalized as those created by, for instance, the Earley algorithm, but they do tend to be useful for the parsing of computer file formats and the like (as distinct from natural language and similar "hard" parsing problems).
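To show what a single-pass parser with no distinct tokenization stage looks like, here is a minimal sketch in plain Python. It is not SimpleParse's API or its mxTextTools engine. Each rule matches characters directly at the current offset and returns the new offset plus a small result tree, so no token list is ever built. The grammar (comma-separated integers), the function names and the node layout are hypothetical. The nodes are loosely modelled on the (tag, start, stop, children) tuples of mxTextTools tag lists.

```python
from typing import List, Optional, Tuple

# Hypothetical result-tree node: (rule name, start offset, end offset, children).
Node = Tuple[str, int, int, list]


def match_digits(text: str, pos: int) -> Optional[Tuple[int, Node]]:
    """Match one or more ASCII digits directly against the raw text."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        return None  # no digit at this offset, so the rule fails here
    return end, ("number", pos, end, [])


def match_list(text: str, pos: int = 0) -> Optional[Tuple[int, List[Node]]]:
    """Parse: number ("," number)* in one pass over the characters, no tokenizer."""
    first = match_digits(text, pos)
    if first is None:
        return None
    pos, node = first
    children = [node]
    while pos < len(text) and text[pos] == ",":
        nxt = match_digits(text, pos + 1)
        if nxt is None:
            break  # leave a trailing "," unconsumed
        pos, node = nxt
        children.append(node)
    return pos, children
```

For example, `match_list("12,345,6")` returns `(8, [("number", 0, 2, []), ("number", 3, 6, []), ("number", 7, 8, [])])`. The parser reports where it stopped rather than raising an error. This mirrors the "match as much as you can from here" style of a tagging engine, though the real engine is table-driven C rather than Python functions.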
As of version 2.1.0, the SimpleParse project includes a patched copy
of the mxTextTools tagging library with the non-recursive rewrite of
the core parsing loop. This means that you will need to build the
extension module to use SimpleParse, but the effect is to provide a
uniform parsing platform where all of the features of a given
SimpleParse version are always available.
For those interested in working on the project, I'm actively interested in welcoming and supporting both new developers and new users. Feel free to contact me.
You will need a copy of Python with distutils support (Python versions 2.0 and above include this). You'll also need a C compiler compatible with your Python build and understood by distutils.
To install the base SimpleParse engine, download the latest version in your preferred format. If you are using the Win32 installer, simply run the executable. If you are using one of the source distributions, unpack the distribution into a temporary directory (maintaining the directory structure) then run:
in the top directory created by the expansion process. This
will cause the patched mxTextTools library to be built as a sub-package
of the simpleparse package and will then install the whole package to
New in 2.1.0a1:
New in 2.0.1:
diff -w -r1.4 error.py
< return '%s: %s'%( self.__class__.__name__, self.messageFormat(message) )
> return '%s: %s'%( self.__class__.__name__, self.messageFormat(self.message) )
New in 2.0:
Our (current) parsers are top-down, in that they work from the top
of the parsing graph (the root production). They are not, however,
tokenising parsers, so there is no appropriate LL(x) designation as far
as I can see; moreover, there is an arbitrary lookahead mechanism that could
theoretically parse the entire rest of the file just to see if a
particular character matches. I would hazard a guess that they
are theoretically closest to a deterministic recursive-descent parser.
There are no backtracking facilities, so any ambiguity is handled by
choosing the first successful match of a grammar (not the longest, as
in most top-down parsers, mostly because without tokenisation, it would
be expensive to do checks for each possible match's length). As a
result of this, the parsers are entirely deterministic.
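The difference between "first successful match" and "longest match" can be shown with a short sketch. The combinators below (`literal`, `first_match`, `longest_match`) are hypothetical plain-Python helpers, not part of SimpleParse. Each matcher takes the text and an offset, and returns the end offset or `None`.

```python
from typing import Callable, Optional

# A hypothetical matcher: given (text, pos), return the end offset or None.
Matcher = Callable[[str, int], Optional[int]]


def literal(s: str) -> Matcher:
    """Match the exact string s at the current offset."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def first_match(*alternatives: Matcher) -> Matcher:
    """Ordered choice: commit to the first alternative that succeeds."""
    def match(text: str, pos: int) -> Optional[int]:
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end  # later (possibly longer) alternatives are never tried
        return None
    return match


def longest_match(*alternatives: Matcher) -> Matcher:
    """For contrast: try every alternative and keep the longest success."""
    def match(text: str, pos: int) -> Optional[int]:
        ends = [e for e in (alt(text, pos) for alt in alternatives) if e is not None]
        return max(ends) if ends else None
    return match
```

With the input `"int x"`, `first_match(literal("in"), literal("int"))` stops at offset 2 and leaves `"t x"` unconsumed. `longest_match` with the same alternatives reaches offset 3. Under first-match semantics the order of alternatives is therefore significant. The usual remedy is to list the longer alternative first: `first_match(literal("int"), literal("in"))` also reaches offset 3. The longest-match variant must run every alternative at every choice point, which is the extra cost the paragraph above refers to.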
The time/memory characteristics are such that, in general, the time
to parse an input text varies with the amount of text to parse. There
are two major factors, the time to do the actual parsing (which, for
simple deterministic grammars should be close to linear with the length
of the text, though a pathological grammar might have radically
different operating characteristics) and the time to build the results
tree (which depends on the memory architecture of the machine, the
currently free memory, and the phase of the moon). As a rule,
SimpleParse parsers will be faster (for suitably limited grammars) than
anything you can code directly in Python. They will not generally
outperform grammar-specific parsers written in C.
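The caveat about pathological grammars can be made concrete with a hypothetical example. The sketch below is plain Python, not SimpleParse. It hand-codes the rule S := 'a' S 'x' / 'a' S 'y' / '' (written informally) with first-match choice and no caching of intermediate results. Both of the first two alternatives begin with 'a' S, so the inner S is re-parsed from scratch for each alternative. The work therefore doubles with every leading 'a', even though the input grows only linearly.

```python
from typing import Tuple


def count_calls(n: int) -> Tuple[int, int]:
    """Parse 'a'*n + 'y'*n with a naive first-match rule S.

    Returns (end offset reached, number of times S was invoked).
    """
    text = "a" * n + "y" * n
    calls = 0

    def S(pos: int) -> int:
        # S := 'a' S 'x' / 'a' S 'y' / ''   (first-match choice, no memoization)
        nonlocal calls
        calls += 1
        for closer in ("x", "y"):
            if text.startswith("a", pos):
                end = S(pos + 1)  # the shared prefix 'a' S is re-parsed per alternative
                if text.startswith(closer, end):
                    return end + 1
        return pos  # third alternative: the empty string always matches

    end = S(0)
    return end, calls
```

For an input of length 2n, the whole text is consumed, but S is invoked 2**(n + 1) - 1 times: `count_calls(10)` returns `(20, 2047)`. Caching each rule's result per offset, as packrat parsers do, would remove the blowup. This sketch deliberately omits that to show the pathological case.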
mxTextTools Rewrite Enhancements
Alternate C Back-end?
NOTE: This section only applies to SimpleParse versions before 2.1.0; SimpleParse 2.1.0 and above already include a patched version of mxTextTools!
You will want an mxBase 2.1.0 distribution to run SimpleParse, preferably with the non-recursive rewrite. If you want to use the non-recursive implementation, you will need to get the source archive for mxTextTools. It is possible to use mxBase 2.0.3 with SimpleParse, but not to use it for building the non-recursive TextTools engine (2.0.3 also lacks a lot of features and bug-fixes found in the 2.1.0 versions).
Note: without the non-recursive rewrite of 2.1.0 (i.e. with the recursive version), the test suite will not pass all tests. I'm not sure why they fail with the recursive version, but it does argue for using the non-recursive rewrite.
To build the non-recursive TextTools engine, you'll need to
get the source distribution for the non-recursive implementation from
the file repository. Note that
there are incompatibilities in the mxBase 2.1 versions that make it
necessary to use the versions specified below to build the
This archive is intended to be expanded over the mxBase source archive from the top-level directory, replacing one file and adding four others.
tar -xvf non-recursive-1.0.0b1.tar
(Or use WinZip on Windows). When you have completed that, run:
setup.py build --force install
in the top directory of the eGenix-mx-base source tree.
The 2.1.0 and greater releases include the eGenix mxTextTools extension,
which is licensed under the eGenix.com Public License; see the mxLicense.html
file for details on the licensing terms for the original library. The eGenix
extensions are:
Copyright (c) 1997-2000, Marc-Andre Lemburg
Copyright (c) 2000-2001, eGenix.com Software GmbH
Extensions to the eGenix extensions (most significantly the rewrite of the core loop) are copyright Mike Fletcher and released under the SimpleParse License below:
Copyright © 2003-2006, Mike Fletcher
Copyright © 1998-2006, Copyright by
Mike C. Fletcher; All Rights Reserved.
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee or royalty is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation or portions thereof, including modifications, that you make.
THE AUTHOR MIKE C. FLETCHER DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE!