
- We want to trun strings (code) into computer instructions
- Done in phases
- Turn strings into abstract syntax trees (parse)
- Translate abstract syntax trees into executable instructions (interpret or compile)
- Language Syntax and Semantics
- Syntax
- Regular Expressions, DFSAs and NDFSAs
- Grammars
- Semantics
- Natural Semantics
- Transition Semantics
- Syntax is the description of which strings of symbols are meaningful expressions in a language
- It takes more than syntax to understand a language; need meaning (semantics) too
- Syntax is the enrtry point
- Character set - previously always ASCII, now often 64 character sets
- Keywords - usually reserved
- Special constants - cannot be assigned to
- Identifiers - can be assigend to
- Operator symbols
- Delimiters (parenthesis, braces, brackets)
- Blanks (aka white space)
- Expressions
\[\text{if ... then begin ... ; ... end else begin ... ; ... end}\]
- Type expressions
\[\text{typexpr}_1 \rightarrow \text{typexpr}_2\]
- Declarations (in functional languages)
\[\text{let pattern = expr} \]
- Statements (in imperative languages)
\[a = b + c \]
- Subprograms
\[\text{let } \text{pattern}_1 = \text{expr}_1 \text{ in expr}\]
- Modules
- Interfaces
- Classes (for object-oriented languages)
- Converting strings to abstract syntax trees done in two phases
- \(\textbf{Lexing}:\) Converting string (or streams of characters) into lists (or streams) of tokens (the “words” of the language)
- Specification Technique: Regular Expressions
- \(\textbf{Parsing}:\) Convert a list of tokens into a abstract syntax tree
- Specification Technique: BNF Grammars
- Regular expressions, regular grammars, finite state automata
- Context-free grammars, BNF grammars, syntax diagrams
- Whole family more of grammars and automata - covered in automata theory
- Grammars are formal descriptions of which strings over a given character set are in a particular language
- Language designers write grammar
- Language implementers use grammar to know what programs to accept
- Language users use grammar to know how to wrtie legitimate programs
- Start with a given character set - \(\textbf{a, b, c, ...}\)
- \(\textcolor{red}{L(\epsilon) = \{'' ''\}}\)
- Each character is a regular expression
- It represents the set of one string containing just that character
- \(\textcolor{red}{L(a) = \{a\}}\)
- If \(\textcolor{red}{\textbf{x}}\) and \(\textcolor{red}{\textbf{y}}\) are regular expressions, then \(\textcolor{red}{\textbf{xy}}\) is a regular expression
- It represents the set of all strings made from first a string described by \(\textcolor{red}{\textbf{x}}\) then a string decribed by \(\textcolor{red}{\textbf{y}}\)
\[\text{If } \textcolor{red}{L(\textbf{x}) = \{a, ab\}} \text{ and } \textcolor{red}{L(\textbf{y}) = \{c, d\}}\] \[\text{then } \textcolor{red}{L(\textbf{xy}) = \{ac, ad, abc, abd\}}\]
- If \(\textcolor{red}{\textbf{x}}\) and \(\textcolor{red}{\textbf{y}}\) are regular expressions, then \(\textcolor{red}{\textbf{x} \lor \textbf{y }}\) is a regular expression
- It represents the set of strings described by either \(\textcolor{red}{\textbf{x}}\) or \(\textcolor{red}{\textbf{y}}\)
\[\text{If } \textcolor{red}{L(\textbf{x}) = \{a, ab\}} \text{ and } \textcolor{red}{L(\textbf{y}) = \{c, d\}}\] \[\text{then } \textcolor{red}{L(\textbf{x} \lor \textbf{y }) = \{a, ab, c, d\}}\]
- If \(\textcolor{red}{\textbf{x}}\) is a regular expression, then so is \(\textcolor{red}{\textbf{(x)}}\)
- It represents the same thing as \(\textcolor{red}{\textbf{x}}\)
- If \(\textcolor{red}{\textbf{x}}\) is a regular expression, then so is \(\textcolor{red}{\textbf{x}^* }\)
- It represents strings made from concatenating zero or more strings from \(\textcolor{red}{\textbf{x}}\)
- If \(\textcolor{red}{L(\textbf{x})}\) = \(\{a, ab\}\) then \(\textcolor{red}{L(\textbf{x}^*)} =\) \(\{'', a, ab, aa, aab, abab, ...\}\)
- \(\epsilon\)
- It represents \(\{" "\}\), set containing the empty string
- \(\color{red}\phi\)
- It represents \(\{\}\), the empty set
- \((0 \lor 1)^* 1\)
- The set of all strings of \(0\)’s and \(1\)’s ending in \(1\), \(\{1, 01, 11, ...\}\)
- \(\text{a}^*\text{b}(\text{a}^*)\)
- The set of all strings of \(\text{a}\)’s and \(\text{b}\)’s with exactly one \(\text{b}\)
- \(((01) \lor (10))^*\)
- Regular expressions (equivalently, regular grammars) important for lexing, breaking strings into recognized words