# Introduction

# Compilers and Interpreters

# Structure of a Compiler

The structure of a compiler contains 5 parts.

  1. Lexical Analysis
  2. Parsing
  3. Semantic Analysis
  4. Optimization
  5. Code Generalization

# Lexical Analysis

Lexical analysis divides program text into "words" or "tokens"(记号).

# Parsing

Parsing is to understand the sentence structure. Or parsing is diagramming sentences.

How human parsing a English sentence?

Step 1: identify the role of each words in a sentences. (词性分析)

Step 2: group the words together into higher level constructs.

Step 3: group them together and get a sentence.

# The Economy of Programming Language

Talk about 3 question.

  1. Why are there so many programming languages?
    • Application domains have distinctive/conflicting needs.
  2. Why are there new programming languages?
    • Programmer training is the dominant cost for a programming language.
      • Widely used language will be slowly changed.
      • It's easy to start a new language.
      • productivity > training cost
      • Language adopted to fill a void.
  3. What is a good programming language?
    • There is no universally accepted metric for language design.

# Cool Overview

We will make a compiler: Cool -> MIPS assembly language

It contains 5 programming assignments:

  • write a cool program
  • lexical analysis
  • parsing
  • semantic analysis
  • code generation
  • optimization (optional)

a valid cool program

class Main {
   i : IO <- new IO;
   main():IO {i.out_string("Hello World!\n"); }
};

# Lexical Analysis

  • Classify program into substrings according to rule.
  • Communicate tokens to the parser.

A token is a pair like <token class, lexeme> to mark the class of each substrings.

An implementation must do two things:

  • Recognize substrings corresponding to tokens.
  • Identify the token class of each lexeme.

# Example and Lookahead

DO 5 I = 1,25
DO 5 I = 1.25

LA's goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time.

"Lookahead" may be required to decide where one token ends and the next token begins.