Bytewise EBNF

Parsing at the byte level is a royal pain. Ideally I would like to be able to specify a conceptual layout for a group of bytes (say an ID3 header) and not deal with the intricacies of twiddling bits and whatnot.

The idea behind this project is to use a grammar to specify the pattern of bytes; the parser then generates a series of events, similar to the way SAX operates. Events are fired when a rule starts or ends, or when a terminal is captured.
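To make the event model concrete, here is a minimal hand-written sketch in Python. It is not this project's generated parser or its API: the function name `parse_id3_header`, the event tuple shape, and the field names are all assumptions made up for illustration. The only outside fact it relies on is the 10-byte ID3v2 header layout (3-byte "ID3" tag, 2 version bytes, 1 flags byte, 4 size bytes).

```python
from typing import Callable, List, Tuple

# Hypothetical event shape: (kind, name, captured bytes).
Event = Tuple[str, str, bytes]

# Field widths of the 10-byte ID3v2 header; the names are illustrative.
HEADER_FIELDS = [("magic", 3), ("version", 2), ("flags", 1), ("size", 4)]


def parse_id3_header(data: bytes, emit: Callable[[Event], None]) -> int:
    """Walk the header and report SAX-style events; return bytes consumed."""
    total = sum(width for _, width in HEADER_FIELDS)
    if len(data) < total:
        raise ValueError(f"need at least {total} bytes, got {len(data)}")

    emit(("start_rule", "header", b""))
    pos = 0
    for name, width in HEADER_FIELDS:
        # Each fixed-width field is reported as a captured terminal.
        emit(("terminal", name, data[pos:pos + width]))
        pos += width
    emit(("end_rule", "header", b""))
    return pos


events: List[Event] = []
sample = b"ID3" + bytes([3, 0]) + bytes([0]) + bytes([0, 0, 2, 1])
consumed = parse_id3_header(sample, events.append)
```

The caller only sees a flat stream of events (one rule start, four terminals, one rule end) and never touches offsets directly, which is the point of the SAX-style design.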

Things are a bit different from standard EBNF because byte packing formats frequently include a byte giving the length of the data that follows, so the grammar needs some way to represent that (and the parser has to handle it as well). This potentially leads to some computability issues, depending on the level of flexibility, but an attempt was made to make the program at least fail in a predictable way if the grammar is pathological.
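The length-prefix problem can be sketched in a few lines of Python. This is only an illustration of the general technique, assuming a single unsigned length byte; the helper name `read_length_prefixed` is hypothetical and is not taken from this project. Note how a length that runs past the end of the input raises a clear error instead of silently returning short data, which is one simple form of "failing predictably".

```python
from typing import Tuple


def read_length_prefixed(data: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Read one length byte at `pos`, then that many payload bytes.

    Returns (payload, position just past the payload).
    """
    if pos < 0 or pos >= len(data):
        raise ValueError(f"no length byte available at offset {pos}")

    length = data[pos]          # indexing bytes yields an int in 0..255
    start = pos + 1
    end = start + length
    if end > len(data):
        # The declared length overruns the input: fail loudly and early.
        raise ValueError(
            f"length byte at offset {pos} declares {length} bytes, "
            f"but only {len(data) - start} remain"
        )
    return data[start:end], end


payload, next_pos = read_length_prefixed(b"\x03abcXYZ")
```

Here the number of bytes a rule consumes depends on a value read earlier in the same input, which is exactly what plain EBNF has no notation for.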

There is a specification for the EBNF used in this project. Like any good descriptive language, it is capable of describing itself.

The parser for the Bytewise EBNF (BEBNF) grammar is written using ANTLR: bytewise_ebnf.g.