IcfpProgrammingContest

ProperTreatment | RecentChanges | Preferences
Edit this page | View other revisions | Last edited 2006-11-25 (changes)

Markup Optimisation by Probabilistic Parsing:

LALR(5000000), the First Prize Winner in the 2001 ICFP Programming Contest

Chung-chieh Shan (KenShan) and Dylan Thurston

Our submission to the [ICFP 2001 programming contest] won the first prize. Our program is named "LALR(5000000)", for the reason explained in the README file:

    The problem is equivalent to finding the most likely parse (or rather,
    a pretty good parse) of a given string in a probabilistic context-free
    grammar.  More precisely, the string to be parsed is the sequence of
    decorated characters (or rather, contiguous chunks thereof); the PCFG
    describes the stack transitions and costs of the markup language; each
    parse tree is an acceptable way to describe the given sequence of
    decorated characters using the markup language.  Fortunately, the
    grammar is highly regular; unfortunately, the grammar is also highly
    recursive and connected.

    Standard parsing algorithms exist for PCFG parsing.  Our program is
    roughly based on one such algorithm, probabilistic CYK parsing.
    Seldom are parsing algorithms designed to handle multi-megabyte
    sentences, however...

    Dylan Thurston <dpt@math.harvard.edu>
    Chung-chieh Shan <ken@digitas.harvard.edu>

Download

We are currently working on a write-up to describe our entry. A draft is available (dated September 20, 2001; 06:25:33; revision 1.23):

Note that this document is currently in draft form. We would welcome comments and suggestions -- after all, this is a [wiki] page!

You can also [download the code we submitted here] (570,941 bytes).

--KenShan


ProperTreatment | RecentChanges | Preferences
Edit this page | View other revisions | Last edited 2006-11-25 (changes)