Jorge Velez

We want to trun strings (code) into computer instructions
Done in phases
Turn strings into abstract syntax trees (parse)
Translate abstract syntax trees into executable instructions (interpret or compile)

Meta-discourse

Language Syntax and Semantics
Syntax

Regular Expressions, DFSAs and NDFSAs
Grammars

Semantics

Natural Semantics
Transition Semantics

Language Syntax

Syntax is the description of which strings of symbols are meaningful expressions in a language
It takes more than syntax to understand a language; need meaning (semantics) too
Syntax is the enrtry point

Elements of Syntax

Character set - previously always ASCII, now often 64 character sets
Keywords - usually reserved
Special constants - cannot be assigned to
Identifiers - can be assigend to
Operator symbols
Delimiters (parenthesis, braces, brackets)
Blanks (aka white space)

Expressions

\[\text{if ... then begin ... ; ... end else begin ... ; ... end}\]

Type expressions

\[\text{typexpr}_1 \rightarrow \text{typexpr}_2\]

Declarations (in functional languages)

\[\text{let pattern = expr} \]

Statements (in imperative languages)

\[a = b + c \]

Subprograms

\[\text{let } \text{pattern}_1 = \text{expr}_1 \text{ in expr}\]

Modules
Interfaces
Classes (for object-oriented languages)

Lexing and Parsing

Converting strings to abstract syntax trees done in two phases

\(\textbf{Lexing}:\) Converting string (or streams of characters) into lists (or streams) of tokens (the “words” of the language)

Specification Technique: Regular Expressions

\(\textbf{Parsing}:\) Convert a list of tokens into a abstract syntax tree

Specification Technique: BNF Grammars

Formal Language Descriptions

Regular expressions, regular grammars, finite state automata
Context-free grammars, BNF grammars, syntax diagrams
Whole family more of grammars and automata - covered in automata theory

Grammars

Grammars are formal descriptions of which strings over a given character set are in a particular language
Language designers write grammar
Language implementers use grammar to know what programs to accept
Language users use grammar to know how to wrtie legitimate programs

Regular Expressions - Review

Start with a given character set - \(\textbf{a, b, c, ...}\)
\(\textcolor{red}{L(\epsilon) = \{'' ''\}}\)
Each character is a regular expression

It represents the set of one string containing just that character
\(\textcolor{red}{L(a) = \{a\}}\)

If \(\textcolor{red}{\textbf{x}}\) and \(\textcolor{red}{\textbf{y}}\) are regular expressions, then \(\textcolor{red}{\textbf{xy}}\) is a regular expression

It represents the set of all strings made from first a string described by \(\textcolor{red}{\textbf{x}}\) then a string decribed by \(\textcolor{red}{\textbf{y}}\)

\[\text{If } \textcolor{red}{L(\textbf{x}) = \{a, ab\}} \text{ and } \textcolor{red}{L(\textbf{y}) = \{c, d\}}\] \[\text{then } \textcolor{red}{L(\textbf{xy}) = \{ac, ad, abc, abd\}}\]

If \(\textcolor{red}{\textbf{x}}\) and \(\textcolor{red}{\textbf{y}}\) are regular expressions, then \(\textcolor{red}{\textbf{x} \lor \textbf{y }}\) is a regular expression

It represents the set of strings described by either \(\textcolor{red}{\textbf{x}}\) or \(\textcolor{red}{\textbf{y}}\)

\[\text{If } \textcolor{red}{L(\textbf{x}) = \{a, ab\}} \text{ and } \textcolor{red}{L(\textbf{y}) = \{c, d\}}\] \[\text{then } \textcolor{red}{L(\textbf{x} \lor \textbf{y }) = \{a, ab, c, d\}}\]

If \(\textcolor{red}{\textbf{x}}\) is a regular expression, then so is \(\textcolor{red}{\textbf{(x)}}\)

It represents the same thing as \(\textcolor{red}{\textbf{x}}\)

If \(\textcolor{red}{\textbf{x}}\) is a regular expression, then so is \(\textcolor{red}{\textbf{x}^* }\)

It represents strings made from concatenating zero or more strings from \(\textcolor{red}{\textbf{x}}\)
If \(\textcolor{red}{L(\textbf{x})}\) = \(\{a, ab\}\) then \(\textcolor{red}{L(\textbf{x}^*)} =\) \(\{'', a, ab, aa, aab, abab, ...\}\)

\(\epsilon\)

It represents \(\{" "\}\), set containing the empty string

\(\color{red}\phi\)

It represents \(\{\}\), the empty set

Example Regular Expressions

\((0 \lor 1)^* 1\)

The set of all strings of \(0\)’s and \(1\)’s ending in \(1\), \(\{1, 01, 11, ...\}\)

\(\text{a}^*\text{b}(\text{a}^*)\)

The set of all strings of \(\text{a}\)’s and \(\text{b}\)’s with exactly one \(\text{b}\)

\(((01) \lor (10))^*\)
Regular expressions (equivalently, regular grammars) important for lexing, breaking strings into recognized words

Right Regular Grammars