Describing Programming Language Syntax
Backus-Naur Form (BNF)
A formal mathematical notation technique, or a method for describing programming languages, often used to describe the syntax of languages used in computing. It first officially appeared in the Algol 60 Report.
The language (any language) defined by BNF grammar is the set of all strings that can be produced by following the rules. The lexical structure of programming languages includes keywords, literals/constants, special symbols, and identifiers.
A rule follows this structure:
symbol := alternative1 | alternative2 ...
It states that the symbol on the left-hand side of the := must be replaced by one of the alternatives on the right hand side. Alternatives are separated by |s.
The notation for BNF in the example below:
• Angle brackets, <...>, for a non-terminal. • Vertical bar, ...|..., for choice. • Parenthesis, (...), for grouping.
Example:
<number> ::= <digit> | <number> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
which can be read as something like:
- "a number is a digit, or any number followed by an extra digit"
- (which is a contorted but precise way of saying that a number consists of one or more digits)
- "a digit is any one of the characters 0, 1, 2, ... 9"
The advantages of using a BNF notation to define grammar for a programming language:
- There is a formally agreed syntax of the language
- It is much easier to make compilers because the parser for the compiler can be generated automatically