



## Table of Contents

- 1. An Overview of Pipelining
- 2. A Pipelined Datapath
- 3. Pipelined Control
- 4. Data Hazards and Forwarding
- 5. Data Hazards and Stalls
- 6. Branch Hazards
- 7. Using a Hardware Description Language to Describe and Model a Pipeline
- 8. Exceptions
- 9. Advanced Pipelining: Extracting More Performance
- 10. Real Stuff: The Pentium 4 Pipeline
- 11. Concluding Remarks

3/119

## 6.1 An Overview of Pipelining: An Example

- The time from placing a single dirty sock in the washer until it is dried, folded, and put away is not shorter for pipelining;
  - the reason pipelining is faster for many loads is that everything is working in parallel, so more loads are finished per hour.
- Pipelining improves throughput of our laundry system without improving the time to complete a single load.
  - Hence, pipelining would not decrease the time to complete one load of laundry,
  - but when we have many loads of laundry to do, the improvement in throughput decreases the total time to complete the work.

4/119





## **Pipeline Instruction Execution**

- □ MIPS instructions classically take five steps:
- 1. Fetch instruction from memory
- 2. Read registers while decoding the instruction
  - The format of MIPS instructions allows reading and decoding to occur simultaneously
- 3. Execute the operation or calculate an address
- 4. Access an operand in data memory
- 5. Write the result into a register

Hence, the MIPS pipeline we explore in this chapter has five stages.





| Instruction                                                | n class                    | Instruction<br>fetch                       | Register<br>read             | ALU<br>operation       | Data<br>access      | Register<br>write     | Total<br>time |
|------------------------------------------------------------|----------------------------|--------------------------------------------|------------------------------|------------------------|---------------------|-----------------------|---------------|
| Load word (                                                | lw)                        | 200 ps                                     | 100 ps                       | 200 ps                 | 200 ps              | 100 ps                | 800 ps        |
| Store word (                                               | sw)                        | 200 ps                                     | 100 ps                       | 200 ps                 | 200 ps              |                       | 700 ps        |
| R-format (ac<br>or, slt)                                   | ld, sub, and,              | 200 ps                                     | 100 ps                       | 200 ps                 |                     | 100 ps                | 600 ps        |
| Branch (bec                                                | ą)                         | 200 ps                                     | 100 ps                       | 200 ps                 |                     |                       | 500 ps        |
| g. 6.2: To<br>omponent.<br>nis calcula<br>gn extensi       | tal time for<br>tion assum | each instru<br>nes that the<br>ve no delay | uction ca<br>e multipl<br>/. | alculated<br>exors, co | from th<br>ntrol ur | e time f<br>hit, PC a | or each       |
| ig. 6.2: To<br>omponent.<br>his calcula<br>gn extensi      | tal time for<br>tion assum | each instru<br>nes that the<br>ve no delay | uction ca<br>e multipl<br>/. | alculated<br>exors, co | from th<br>ntrol ur | e time f<br>hit, PC a | or each       |
| ig. 6.2: Tot<br>omponent.<br>his calcula<br>gn extensi     | tal time for<br>tion assum | each instrunes that the ve no delay        | uction ca<br>e multipl<br>/. | alculated<br>exors, co | from th             | e time f<br>hit, PC a | or each       |
| ig. 6.2: To<br>omponent.<br>his calcula<br>ign extensi     | tal time for<br>tion assum | each instru<br>nes that the<br>ve no delay | uction ca<br>e multipl<br>/. | alculated<br>exors, co | from th             | e time f<br>hit, PC a | or each       |
| Fig. 6.2: To<br>component.<br>This calcula<br>sign extensi | tal time for<br>tion assum | each instru<br>nes that the<br>ve no delay | uction ca<br>e multipl<br>/. | alculated<br>exors, co | from th             | e time f<br>nit, PC a | or each       |

|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Program<br>execution Time 200 400 600 800 1000 1200 1400 1600 1800<br>order I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Iw \$1, 100(\$0)     Instruction Reg     ALU     Data Access     Reg       Iw \$2, 200(\$0)     800 ps     Instruction Factor     Reg     ALU     Data Access                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Iw \$3, 300(\$0) 800 ps                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Program<br>execution Time 200 400 600 800 1000 1200 1400<br>order I I I →<br>(in instructions)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Iw     \$1, 100(\$0)     Instruction<br>fetch     Reg     ALU     Data<br>access     Reg       Iw     \$2, 200(\$0)     200 ps     Instruction<br>fetch     Reg     ALU     Data<br>access     Reg                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Ivv \$3, 300(\$0)<br>200 ps Instruction<br>200 ps 200 ps |
| <ul> <li>Both use the sa</li> <li>A fourfold</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | ame hardware components, whose time is listed in Figure 6.2.<br>I speedup on <b>average time between instructions:</b> from 800 ps down to 200 ps.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| <ul> <li>Compare this fi<br/>then the dryer s</li> <li>The compare the com</li></ul> | gure to Figure 6.1: For the laundry, we assumed all stages were equal. If the dryer were slowest,<br>stage would set the stage time.<br>outer oiceline stage times are limited by the slowest resource, either the ALU operation or the                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |



| Hazards:                                                                                                                                      |
|-----------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>the situations in pipelining when the next instruction cannot execute in the following<br/>clock cycle</li> </ul>                    |
| Structural hazard:                                                                                                                            |
| <ul> <li>An occurrence in which a planned instruction cannot execute in the proper clock<br/>cycle</li> </ul>                                 |
| <ul> <li>because the hardware cannot support the combination of instructions that are set to<br/>execute in the given clock cycle.</li> </ul> |
| Data hazard:                                                                                                                                  |
| Also called pipeline data hazard                                                                                                              |
| An occurrence in which a planned instruction cannot execute in the proper clock cycle                                                         |
| <ul> <li>because the data that is needed to execute the instruction is not yet available</li> </ul>                                           |
| Forwarding:                                                                                                                                   |
| Also called bypassing                                                                                                                         |
| <ul> <li>A method of resolving a data hazard by retrieving the missing data element from<br/>internal buffers</li> </ul>                      |
| <ul> <li>rather than waiting for it to arrive from programmer-visible registers or memory</li> </ul>                                          |
| The name forwarding comes from the idea that the result is passed forward from an earlier instruction to a later instruction                  |
| Bypassing comes from passing the result by the register file to the desired unit                                                              |



| <ul> <li>A specific form of data hazard</li> <li>in which the data requested by a load instruction has not yet become available when it is requested.</li> <li>Pipeline stall: <ul> <li>Also called <b>bubble</b></li> <li>A stall initiated in order to resolve a hazard</li> </ul> </li> <li>Control hazard: <ul> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle</li> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul> </li> </ul> | Load-use data hazard:                                                                                             |                       |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|-----------------------|
| <ul> <li>In which the data requested by a load instruction has not yet become available when it is requested.</li> <li>Pipeline stall: <ul> <li>Also called <b>bubble</b></li> <li>A stall initiated in order to resolve a hazard</li> </ul> </li> <li>Control hazard: <ul> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle <ul> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul> </li> </ul></li></ul>                               | <ul> <li>A specific form of data hazard</li> <li>is which the data requested by a load instruction has</li> </ul> | not yet hereine       |
| <ul> <li>Pipeline stall:         <ul> <li>Also called bubble</li> <li>A stall initiated in order to resolve a hazard</li> </ul> </li> <li>Control hazard:         <ul> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle             <ul> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul> </li> </ul></li></ul>                                                                                                                        | <ul> <li>available when it is requested.</li> </ul>                                                               | not yet become        |
| <ul> <li>Also called <b>bubble</b></li> <li>A stall initiated in order to resolve a hazard</li> <li>Control hazard:</li> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle <ul> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul> </li> </ul>                                                                                                                                                                                            | Pipeline stall:                                                                                                   |                       |
| <ul> <li>A stall initiated in order to resolve a hazard</li> <li>Control hazard:</li> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle</li> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul>                                                                                                                                                                                                                                           | <ul> <li>Also called <b>bubble</b></li> </ul>                                                                     |                       |
| <ul> <li>Control nazard:</li> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the proper clock cycle         <ul> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul> </li> </ul>                                                                                                                                                                                                                                                                               | A stall initiated in order to resolve a hazard                                                                    |                       |
| <ul> <li>Also called branch hazard</li> <li>An occurrence in which the proper instruction cannot execute in the prop clock cycle</li> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul>                                                                                                                                                                                                                                                                                                                              |                                                                                                                   |                       |
| <ul> <li>An occurrence in which the proper instruction cannot execute in the prop clock cycle</li> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                 | Also called branch hazard                                                                                         |                       |
| <ul> <li>because the instruction that was fetched is not the one that is needed</li> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                               | <ul> <li>An occurrence in which the proper instruction cannot<br/>clock cycle</li> </ul>                          | execute in the proper |
| <ul> <li>that is, the flow of instruction addresses is not what the pipeline expected.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | <ul> <li>because the instruction that was fetched is not the one</li> </ul>                                       | that is needed        |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | that is, the flow of instruction addresses is not what the                                                        | pipeline expected.    |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                   |                       |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                   |                       |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                   |                       |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                   |                       |







|           | eorderi               | ng coo            | de to avoid                | pipelin           | e stalls<br>ment in C                                              |
|-----------|-----------------------|-------------------|----------------------------|-------------------|--------------------------------------------------------------------|
| A         | = B +                 | - Е;              |                            | ie segi           | nent in O.                                                         |
| С         | = B +                 | - F;              |                            |                   |                                                                    |
| H<br>Va   | ere is tł<br>ariables | ne ger<br>are ir  | nerated MIF<br>n memory a  | PS cod<br>nd are  | e for this segment, assuming all addressable as offsets from \$t0: |
|           | lw                    | \$t1,             | 0(\$t0)                    | #                 |                                                                    |
|           | lw                    | \$t2,             | 4(\$t0)                    | #                 |                                                                    |
|           | add                   | \$t3,             | \$t1, \$t2                 | #                 |                                                                    |
|           | sw                    | \$t3,             | 12(\$t0)                   | #                 |                                                                    |
|           | lw                    | \$t4,             | 8(\$t0)                    | #                 |                                                                    |
|           | add                   | \$t5,             | \$t1, \$t4                 | #                 |                                                                    |
|           | sw                    | \$t5,             | 16(\$t0)                   | #                 |                                                                    |
| D P<br>th | roblem:<br>ne instru  | find t<br>ictions | he hazards<br>s to avoid a | in the<br>ny pipe | following code segment and reorder eline stalls.                   |















| Structural hazards usually revolve around the floating-point unit, which may not be fully pipelined,                                                                                                                   |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| which tend to have higher branch frequencies as well as less predictable branches.                                                                                                                                     |
| Data hazards can be performance bottlenecks in both integer and floating-point programs.                                                                                                                               |
| Comparison:                                                                                                                                                                                                            |
| Often it is easier to deal with data hazards in floating-point programs<br>because the lower branch frequency and more regular access patterns<br>allow the compiler to try to schedule instructions to avoid hazards. |
| It is more difficult to perform such optimizations in integer programs that<br>have less regular access involving more use of pointers.                                                                                |
| There are more ambitious compiler and hardware techniques for reducing data dependences through scheduling.                                                                                                            |



























| Five Pipe Stages                                                                                                                |
|---------------------------------------------------------------------------------------------------------------------------------|
| Question:                                                                                                                       |
| in Fig. 6.14 (b) for a load: which instruction supplies the write register number?                                              |
| Answer:                                                                                                                         |
| the instruction in the IF/ID PR supplies the write register number                                                              |
| yet this instruction occurs considerably after the load instruction!                                                            |
| Hence, we need to preserve the destination register number in the load instruction.                                             |
| <ul> <li>Just as store passed the register contents from the ID/EX to the EX/MEM<br/>for use in the MEM stage,</li> </ul>       |
| Ioad must pass the register number from the ID/EX through EX/MEM to<br>the MEM/WB for use in the WB stage.                      |
| Another way to think about the passing of the register number:                                                                  |
| <ul> <li>in order to share the pipelined datapath, we need to preserve the<br/>instruction read during the IF stage,</li> </ul> |
| <ul> <li>so each PR contains a portion of the instruction needed for that stage and<br/>later stages.</li> </ul>                |
| 40/119                                                                                                                          |

















| LW       00       load word       XXXXXX       add       0010         SW       00       store word       XXXXXX       add       0010         Branch equal       01       branch equal       XXXXXX       subtract       0110         R-type       10       add       100000       add       0010         R-type       10       subtract       100010       subtract       0110         R-type       10       AND       100100       and       0000         R-type       10       OR       100101       or       0001         R-type       10       set on less than       101010       set on less than       0111 | Instruction<br>opcode | ALUOp    | Instruction<br>operation | Function code   | Desired<br>ALU action | ALU control<br>input |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|----------|--------------------------|-----------------|-----------------------|----------------------|
| SW       00       store word       XXXXXX       add       0010         Branch equal       01       branch equal       XXXXXX       subtract       0110         R-type       10       add       100000       add       0010         R-type       10       subtract       100000       add       0010         R-type       10       subtract       100100       and       0000         R-type       10       OR       100101       or       0001         R-type       10       Set on less than       101010       set on less than       0111                                                                       | LW                    | 00       | load word                | XXXXXXX         | add                   | 0010                 |
| Branch equal         01         branch equal         XXXXXX         subtract         0110           R-type         10         add         100000         add         0010           R-type         10         subtract         100010         subtract         0110           R-type         10         AND         100100         and         0000           R-type         10         OR         100101         or         0001           R-type         10         Set on less than         101010         set on less than         0111                                                                        | SW                    | 00       | store word               | XXXXXX          | add                   | 0010                 |
| R-type10add100000add0010R-type10subtract100010subtract0110R-type10AND100100and0000R-type100R100101or0001R-type10set on less than101010set on less than0111                                                                                                                                                                                                                                                                                                                                                                                                                                                         | Branch equa           | 01       | branch equal             | XXXXXX          | subtract              | 0110                 |
| R-type       10       subtract       100010       subtract       0110         R-type       10       AND       100100       and       0000         R-type       10       OR       100101       or       0001         R-type       10       Set on less than       101010       set on less than       0111                                                                                                                                                                                                                                                                                                          | R-type                | 10       | add                      | 100000          | add                   | 0010                 |
| R-type10AND100100and0000R-type100R100101or0001R-type10set on less than101010set on less than0111A copy of Fig. 5.12 on p. 302.This figure shows how the ALU control bits are set depending on the ALUOp control bits and the different funct codes for the R-type instruction.                                                                                                                                                                                                                                                                                                                                     | R-type                | 10       | subtract                 | 100010          | subtract              | 0110                 |
| R-type100R100101or0001R-type10set on less than101010set on less than0111A copy of Fig. 5.12 on p. 302.This figure shows how the ALU control bits are set depending on the ALUOp control bits and the different funct codes for the R-type instruction.                                                                                                                                                                                                                                                                                                                                                             | R-type                | 10       | AND                      | 100100          | and                   | 0000                 |
| R-type       10       set on less than       101010       set on less than       0111         A copy of Fig. 5.12 on p. 302.       This figure shows how the ALU control bits are set depending on the ALUOp control bits and the different funct codes for the R-type instruction.       0111                                                                                                                                                                                                                                                                                                                     | R-type                | 10       | OR                       | 100101          | or                    | 0001                 |
| A copy of Fig. 5.12 on p. 302.<br>This figure shows how the ALU control bits are set depending on the ALUOp control bits and the different funct codes for the R-type instruction.                                                                                                                                                                                                                                                                                                                                                                                                                                 | R-type                | 10       | set on less than         | 101010          | set on less than      | 0111                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | This figure s         | shows ho | w the ALU co             | ontrol bits are | set dependi           | ng on the A          |

|                     | Effect when deasserted (0)                                                                                        | Effect when asserted (1)                                                                                                                                           |
|---------------------|-------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RegDst              | The register destination number for the Write register<br>comes from the rt field (bits 20:16).                   | The register destination number for the Write register comes from the rd field (bits 15:11).                                                                       |
| RegWrite            | None.                                                                                                             | The register on the Write register input is written with the value o<br>the Write data input.                                                                      |
| ALUSrc              | The second ALU operand comes from the second<br>register file output (Read data 2).                               | The second ALU operand is the sign-extended, lower 16 bits of th<br>instruction.                                                                                   |
| PCSrc               | The PC is replaced by the output of the adder that<br>computes the value of PC + 4.                               | The PC is replaced by the output of the adder that computes the<br>branch target.                                                                                  |
| MemRead             | None.                                                                                                             | Data memory contents designated by the address input are put of the Read data output.                                                                              |
| MemWrite            | None.                                                                                                             | Data memory contents designated by the address input are<br>replaced by the value on the Write data input.                                                         |
| MemtoReg            | The value fed to the register Write data input comes<br>from the ALU.                                             | The value fed to the register Write data input comes from the dat<br>memory.                                                                                       |
| A copy o<br>The ALU | f Fig 5.16 on p. 306. The function o<br>control lines (ALUOp) are defined<br>1-bit control to a two-way mux is as | f each of seven control signals is define<br>in the second column of Figure 6.23.<br>serted, the mux selects the input<br>is deasserted, the mux selects the 0 inp |

| Reg<br>InstructionALU<br>Op1ALU<br>Op1ALU<br>Op0ALU<br>SrcHern<br>BranchMem<br>RedMem<br>WriteReg<br>WriteMem to<br>RegRformat111000010 $1w$ 000101011 $1w$ 000101011 $1w$ 00101010XbeqX0101000XIntervalues of the control lines are the same as in Figure 5.18 on page 308Intervalues of the control lines are the groups corresponding to the last three pipeline stagesStages |                                                          | Execu                               | tion/addres<br>contro      | s calculation<br>ol lines | n stage    | Mem                   | ory access :<br>control lines | stage        | Write-ba<br>contr    | ack stage<br>ol lines |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|-------------------------------------|----------------------------|---------------------------|------------|-----------------------|-------------------------------|--------------|----------------------|-----------------------|
| Rformat110000010 $1w$ 0001010111 $sw$ X0010010XbeqX0101000X $\Box$ The values of the control lines are the same as in Figure 5.18 on page 308 $\Box$ but they have been shuffled into three groups corresponding to the last three pipeline stages                                                                                                                               | Instruction                                              | Reg<br>Dst                          | ALU<br>Op1                 | ALU<br>Op0                | ALU<br>Src | Branch                | Mem<br>Read                   | Mem<br>Write | Reg<br>Write         | Mem to<br>Reg         |
| 1w000101011 $sw$ X0010010XbeqX0101000XThe values of the control lines are the same as in Figure 5.18 on page 308but they have been shuffled into three groups corresponding to the last three pipeline stages                                                                                                                                                                    | R-format                                                 | 1                                   | 1                          | 0                         | 0          | 0                     | 0                             | 0            | 1                    | 0                     |
| sw     X     0     0     1     0     0     1     0     X       beq     X     0     1     0     1     0     0     0     X                                                                                                                                                                                                                                                         | lw                                                       | 0                                   | 0                          | 0                         | 1          | 0                     | 1                             | 0            | 1                    | 1                     |
| <ul> <li>beq X 0 1 0 1 0 0 0 X</li> <li>The values of the control lines are the same as in Figure 5.18 on page 308</li> <li>but they have been shuffled into three groups corresponding to the last three pipeline stages</li> </ul>                                                                                                                                             | CW                                                       | х                                   | 0                          | 0                         | 1          | 0                     | 0                             | 1            | 0                    | Х                     |
| <ul> <li>The values of the control lines are the same as in Figure 5.18 on page 308</li> <li>but they have been shuffled into three groups corresponding to the last three pipeline stages</li> </ul>                                                                                                                                                                            | on                                                       |                                     |                            |                           |            |                       |                               |              |                      |                       |
|                                                                                                                                                                                                                                                                                                                                                                                  | <ul> <li>The value</li> <li>but they pipeline</li> </ul> | x<br>ues of th<br>have be<br>stages | o<br>ne contro<br>een shuf | l lines ar fled into      | e the sai  | ne as in<br>oups corr | o<br>Figure 5<br>respondii    | .18 on pang  | age 308<br>last thre | ee x                  |













| De      | pendence Detection                                                                                                                                                                                                                                                                                                          |
|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Cla     | assify the dependences in this sequence from p. 403:                                                                                                                                                                                                                                                                        |
|         | sub \$2, \$1, \$3                                                                                                                                                                                                                                                                                                           |
|         | and \$12,\$2, \$5                                                                                                                                                                                                                                                                                                           |
|         | or \$13,\$6, \$2                                                                                                                                                                                                                                                                                                            |
|         | add \$14,\$2, \$2  # 1st(\$2) & 2nd(\$2) set by sub                                                                                                                                                                                                                                                                         |
|         | sw \$15,100(\$2) # Index(\$2) set by sub                                                                                                                                                                                                                                                                                    |
| Th<br>D | e remaining hazards are as follows:<br>The sub-or is a type 2b hazard:<br>MEM/WB.RegisterRd = ID/EX.RegisterRt = \$2<br>The two dependences on sub-add are not hazards because the<br>register file supplies the proper data during the ID stage of add.<br>There is no data hazard between sub and sw because sw reads \$2 |
|         | the clock cycle after sub writes \$2.                                                                                                                                                                                                                                                                                       |









| ForwardA = 00       ID/EX       The first ALU operand comes from the register file.         ForwardA = 10       EX/MEM       The first ALU operand is forwarded from the prior ALU result.         ForwardA = 01       MEM/WB       The first ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 00       ID/EX       The second ALU operand comes from the register file.         ForwardB = 10       EX/MEM       The second ALU operand is forwarded from the prior ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         Image: teacher data from the prior ALU result is described in the second to the second toperand is forwarded from data memory or an earl | ForwardA = 00<br>ForwardA = 10<br>ForwardA = 01<br>ForwardB = 00<br>ForwardB = 10<br>ForwardB = 01 | ID/EX<br>EX/MEM<br>MEM/WB<br>ID/EX<br>EX/MEM<br>MEM/WB | The first ALU operand comes from the register file. The first ALU operand is forwarded from the prior ALU result. The first ALU operand is forwarded from data memory or an earlier ALU result. The second ALU operand comes from the register file. The second ALU operand is forwarded from data memory or an earlier ALU result. efforwarding multiplexors in Figure 6.30.           |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ForwardA = 10       EX/MEM       The first ALU operand is forwarded from the prior ALU result.         ForwardA = 01       MEM/WB       The first ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 00       ID/EX       The second ALU operand comes from the register file.         ForwardB = 10       EX/MEM       The second ALU operand is forwarded from the prior ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         Image: the second all operand is forwarded from the prior ALU result.       The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.                                                                                                                                                                                                                                                                                        | ForwardA = 10<br>ForwardA = 01<br>ForwardB = 00<br>ForwardB = 10<br>ForwardB = 01                  | EX/MEM<br>MEM/WB<br>ID/EX<br>EX/MEM<br>MEM/WB          | The first ALU operand is forwarded from the prior ALU result. The first ALU operand is forwarded from data memory or an earlier ALU result. The second ALU operand comes from the register file. The second ALU operand is forwarded from the prior ALU result. The second ALU operand is forwarded from data memory or an earlier ALU result. eforwarding multiplexors in Figure 6.30. |
| ForwardA = 01       MEM/WB       The first ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 00       ID/EX       The second ALU operand comes from the register file.         ForwardB = 10       EX/MEM       The second ALU operand is forwarded from the prior ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         The control values for the forwarding multiplexors in Figure 6.30.       The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | ForwardA = 01<br>ForwardB = 00<br>ForwardB = 10<br>ForwardB = 01                                   | MEM/WB<br>ID/EX<br>EX/MEM<br>MEM/WB                    | The first ALU operand is forwarded from data memory or an earlier<br>ALU result.<br>The second ALU operand comes from the register file.<br>The second ALU operand is forwarded from the prior ALU result.<br>The second ALU operand is forwarded from data memory or an<br>earlier ALU result.                                                                                         |
| ForwardB = 00       ID/EX       The second ALU operand comes from the register file.         ForwardB = 10       EX/MEM       The second ALU operand is forwarded from the prior ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         Image: The control values for the forwarding multiplexors in Figure 6.30.       The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | ForwardB = 00<br>ForwardB = 10<br>ForwardB = 01                                                    | ID/EX<br>EX/MEM<br>MEM/WB                              | The second ALU operand comes from the register file.<br>The second ALU operand is forwarded from the prior ALU result.<br>The second ALU operand is forwarded from data memory or an<br>earlier ALU result.<br>e forwarding multiplexors in Figure 6.30.                                                                                                                                |
| ForwardB = 10       EX/MEM       The second ALU operand is forwarded from the prior ALU result.         ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         The control values for the forwarding multiplexors in Figure 6.30.       The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | ForwardB = 10<br>ForwardB = 01                                                                     | EX/MEM<br>MEM/WB                                       | The second ALU operand is forwarded from the prior ALU result.<br>The second ALU operand is forwarded from data memory or an<br>earlier ALU result.                                                                                                                                                                                                                                     |
| ForwardB = 01       MEM/WB       The second ALU operand is forwarded from data memory or an earlier ALU result.         The control values for the forwarding multiplexors in Figure 6.30.       The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | ForwardB = 01                                                                                      | MEM/WB                                                 | The second ALU operand is forwarded from data memory or an earlier ALU result.                                                                                                                                                                                                                                                                                                          |
| <ul> <li>The control values for the forwarding multiplexors in Figure 6.30.</li> <li>The signed immediate that is another input to the ALU is described in the elaboration at the end of this section.</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | control value                                                                                      | es for the                                             | forwarding multiplexors in Figure 6.30.                                                                                                                                                                                                                                                                                                                                                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                    |                                                        |                                                                                                                                                                                                                                                                                                                                                                                         |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                    |                                                        |                                                                                                                                                                                                                                                                                                                                                                                         |





| _ |                                                                                                                                                                                                                                                                                                                                                                                                                                              |                               |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
|   | One complication is potential data hazards between the result of the inst<br>WB stage, the result of the instruction in the MEM stage, and the source<br>instruction in the ALU stage.                                                                                                                                                                                                                                                       | ruction in the operand of the |
|   | For example, when summing a vector of numbers in a single register, a sinstructions will all read and write to the same register:<br>add \$1,\$1,\$2                                                                                                                                                                                                                                                                                         | sequence of                   |
|   | 25 15 bbc                                                                                                                                                                                                                                                                                                                                                                                                                                    |                               |
|   | add \$1,\$1,\$4                                                                                                                                                                                                                                                                                                                                                                                                                              |                               |
|   | add \$1,\$1,\$4<br><br>In this case, the result is forwarded from the MEM stage because the result                                                                                                                                                                                                                                                                                                                                           | sult in the MEM               |
|   | add \$1,\$1,\$3<br>add \$1,\$1,\$4<br><br>In this case, the result is forwarded from the MEM stage because the resistage is the more recent result.                                                                                                                                                                                                                                                                                          | sult in the MEM               |
|   | add \$1,\$1,\$3<br>add \$1,\$1,\$4<br><br>In this case, the result is forwarded from the MEM stage because the res<br>stage is the more recent result.<br>Thus the control for the MEM hazard would be (with the additions highlig                                                                                                                                                                                                           | sult in the MEM<br>Ihted):    |
|   | add \$1,\$1,\$1,\$3<br>add \$1,\$1,\$1,\$4<br><br>In this case, the result is forwarded from the MEM stage because the res<br>stage is the more recent result.<br>Thus the control for the MEM hazard would be (with the additions highlig                                                                                                                                                                                                   | sult in the MEM<br>ghted):    |
|   | add \$1,\$1,\$1,\$3<br>add \$1,\$1,\$1,\$4<br><br>In this case, the result is forwarded from the MEM stage because the res<br>stage is the more recent result.<br>Thus the control for the MEM hazard would be (with the additions highlig<br>if (MEM/WB.RegWrite<br>and (MEM/WB.RegisterRd ± 0)                                                                                                                                             | sult in the MEM<br>jhted):    |
|   | <pre>add \$1,\$1,\$3 add \$1,\$1,\$4 In this case, the result is forwarded from the MEM stage because the resistage is the more recent result. Thus the control for the MEM hazard would be (with the additions highlig  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd = 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)</pre>                                                                                                                    | sult in the MEM<br>Jhted):    |
|   | <pre>add \$1,\$1,\$3<br/>add \$1,\$1,\$4<br/><br/>In this case, the result is forwarded from the MEM stage because the resistage is the more recent result.<br/>Thus the control for the MEM hazard would be (with the additions highlig)<br/>if (MEM/WB.RegWrite<br/>and (MEM/WB.RegisterRd # 0)<br/>and (EX/MEM.RegisterRd # ID/EX.RegisterRs)<br/>and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01</pre>                         | sult in the MEM<br>Jhted):    |
|   | <pre>add \$1,\$1,\$3<br/>add \$1,\$1,\$4<br/><br/>In this case, the result is forwarded from the MEM stage because the resistage is the more recent result.<br/>Thus the control for the MEM hazard would be (with the additions highlig)<br/>if (MEM/WB.RegWrite<br/>and (MEM/WB.RegisterRd # 0)<br/>and (EX/MEM.RegisterRd # ID/EX.RegisterRs)<br/>and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01<br/>if (MEM/WB.RegWrite</pre> | sult in the MEM<br>ghted):    |
|   | <pre>add \$1,\$1,\$3<br/>add \$1,\$1,\$4<br/><br/>In this case, the result is forwarded from the MEM stage because the resistage is the more recent result.<br/>Thus the control for the MEM hazard would be (with the additions highlig<br/>if (MEM/WB.RegWrite<br/>and (MEM/WB.RegisterRd = 0)<br/>and (EX/MEM.RegisterRd = ID/EX.RegisterRs)<br/>and (MEM/WB.RegisterRd = 0)<br/>if (MEM/WB.RegisterRd = 0)</pre>                         | sult in the MEM<br>Jhted):    |



























| <ul><li>the</li><li>we</li></ul> | ) pipeli<br>move | ine is c<br>d the b | ptimiz<br>prancł | zed i<br>n exe | for<br>ecu | branches that are not taken and<br>tion to the ID stage |  |
|----------------------------------|------------------|---------------------|------------------|----------------|------------|---------------------------------------------------------|--|
| 36                               | sub              | \$10,               | \$4,             | \$8            | #          |                                                         |  |
| 40                               | beq              | \$1,                | \$3,             | 7              | #          | PC-relative branch to $40 + 4 + 7 * 4 = 72$             |  |
| 44                               | and              | \$12,               | \$2,             | \$5            | #          |                                                         |  |
| 48                               | or               | \$13,               | \$2,             | \$6            | #          |                                                         |  |
| 52                               | add              | \$14,               | \$4,             | \$2            | #          |                                                         |  |
| 56                               | slt              | \$15,               | \$6,             | \$7            | #          |                                                         |  |
|                                  |                  |                     |                  |                | #          |                                                         |  |
| 72                               | lw               | \$4,                | 50(\$            | \$7)           | #          |                                                         |  |





## **Dynamic Branch Prediction** Dynamic branch prediction: Assuming a branch is not taken is one simple form of branch prediction in an aggressive pipeline, a simple static prediction scheme will probably waste too much performance. One approach is to look up the address of the instruction • to see if a branch was taken the last time this instruction was executed, and, if so, to begin fetching new instructions from the same place as the last time. Implementations: With more hardware, it is possible to try to predict branch behavior during program execution. One implementation of that approach is a branch prediction buffer or branch history table. Branch prediction buffer: also called branch history table: ✤ a small memory indexed by the lower portion of the address of the branch instruction contains one or more bits indicating whether the branch was recently taken or not. D Prediction is just a hint that is assumed to be correct, so fetching begins in the predicted direction. If the hint turns out to be wrong, the incorrectly predicted instructions are deleted, the pre-diction bit is inverted and stored back, and the proper sequence is fetched and executed.

83/119

|                                                                          | Prediction                                                                                                                                                                                             |
|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|                                                                          |                                                                                                                                                                                                        |
| <ul> <li>Froblem</li> <li>Cons</li> <li>taken</li> <li>predic</li> </ul> | der a loop branch that branches nine times in a row, then is not<br>once. What is the prediction accuracy for this branch, assuming the<br>ction bit for this branch remains in the prediction buffer? |
| □ Answer:                                                                |                                                                                                                                                                                                        |
| <ul> <li>The s<br/>loop i</li> </ul>                                     | teady-state prediction behavior will mispredict on the first and last terations.                                                                                                                       |
| ■ M<br>ta                                                                | ispredicting the last iteration is inevitable since the prediction bit will say ken: the branch has been taken nine times in a row at that point.                                                      |
| ■ T<br>pi<br>oi                                                          | ne misprediction on the first iteration happens because the bit is flipped on<br>ior execution of the last iteration of the loop, since the branch was not taken<br>in that exiting iteration.         |
| <ul> <li>Thus,<br/>is onl</li> </ul>                                     | the prediction accuracy for this branch that is taken 90% of the time y 80% (two incorrect predictions and eight correct ones).                                                                        |
|                                                                          |                                                                                                                                                                                                        |
|                                                                          |                                                                                                                                                                                                        |
|                                                                          | 84/119                                                                                                                                                                                                 |







| From the Example on p. 215 ("P.     | orformance of Single Cycle           |
|-------------------------------------|--------------------------------------|
| Machines") we get the followi       | no functional unit times:            |
| □ 200 ps for memory access          |                                      |
| □ 100 ps for ALU operation          |                                      |
| □ 50 ps for register file read or w | rite                                 |
| For the single-cycle datapath, thi  | s leads to a clock cycle of          |
| 200 + 50 + 100 + 200 + 50 = 6       | 600 ps                               |
| The Example on p. 330 ("CPI in a    | a Multicycle CPU") has the following |
| instruction frequencies:            |                                      |
| □ 25% loads                         |                                      |
| 10% stores                          |                                      |
| 11% branches                        |                                      |
| □ 2% jumps                          |                                      |
| 52% ALU instructions                | 220 showed that the CDI for the      |
| multiple design was 4.12. The       | source for the multicycle            |
| datapath and the pipelined des      | ion must be the same as the          |
| longest functional unit: 200 ps     |                                      |











| Answer: p. 429                                                                  |
|---------------------------------------------------------------------------------|
| □ Figure 6.43 shows the events:                                                 |
| starting with the add instruction in the EX stage.                              |
| The overflow is detected during that phase, and                                 |
| <ul> <li>4000 0040<sub>hex</sub> is forced into the PC.</li> </ul>              |
| Clock cycle 7 shows that the add and following instructions are flushed,        |
| <ul> <li>and the first instruction of the exception code is fetched.</li> </ul> |
| Note that:                                                                      |
| the address of the instruction following the add is saved:                      |
| $4C_{hex} + 4 = 50_{hex}$ .                                                     |
|                                                                                 |
|                                                                                 |
|                                                                                 |
|                                                                                 |
|                                                                                 |
|                                                                                 |
|                                                                                 |
| 94/119                                                                          |







| Precise or Imprecise Exceptions                                                                                                                         |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Imprecise interrupt:                                                                                                                                    |  |
| <ul> <li>Also called imprecise exception.</li> </ul>                                                                                                    |  |
| Interrupts or exceptions in pipelined computers that are not associated<br>with the exact instruction that was the cause of the interrupt or exception. |  |
| Precise interrupt                                                                                                                                       |  |
| Also called precise exception.                                                                                                                          |  |
| An interrupt or exception that is always associated with the correct instruction in pipelined computers.                                                |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
|                                                                                                                                                         |  |
| 20//40                                                                                                                                                  |  |
| 90/113                                                                                                                                                  |  |



| Multiple-Issue Pipeline                                                                                                                                                                                                                             |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| issue slots: the positions from which instructions could issue in a given clock cycle;                                                                                                                                                              |
| by analogy these correspond to positions at the starting blocks for a sprint.                                                                                                                                                                       |
| Two primary and distinct responsibilities in a multiple-issue pipeline:                                                                                                                                                                             |
| 1. Packaging instructions into issue slots:                                                                                                                                                                                                         |
| <ul> <li>How does the processor determine how many instructions and which<br/>instructions can be issued in a given clock cycle?</li> </ul>                                                                                                         |
| <ul> <li>In static issue processors: this process is at least partially handled by the<br/>compiler;</li> </ul>                                                                                                                                     |
| <ul> <li>In dynamic issue designs: it is normally dealt with at runtime by the processor,</li> <li>although the compiler will often have already tried to help improve the issue rate by placing the instructions in a beneficial order.</li> </ul> |
| 2. Dealing with data and control hazards:                                                                                                                                                                                                           |
| <ul> <li>In static issue processors: some or all of the consequences of data and<br/>control hazards are handled statically by the compiler.</li> </ul>                                                                                             |
| <ul> <li>In dynamic issue processors: attempt to alleviate at least some classes of<br/>hazards using hardware techniques operating at execution time.</li> </ul>                                                                                   |
| 100/119                                                                                                                                                                                                                                             |



| Static Multiple Issue                                                                                                                                                                    |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <b>issue packet</b> : the set of instructions that issues together in 1 clock cycle;                                                                                                     |
| the packet may be determined statically by the compiler or dynamically by<br>the processor.                                                                                              |
| Static multiple-issue processors: all use the compiler to                                                                                                                                |
| <ul> <li>assist with packaging instructions</li> <li>and handling hazards.</li> </ul>                                                                                                    |
| In a static issue processor, you can think of the set of instructions that<br>issue in a given clock cycle (e.g., an issue packet) as one large<br>instruction with multiple operations. |
| This view led to the original name for this approach: Very Long Instruction<br>Word (VLIW).                                                                                              |
| <ul> <li>The Intel IA-64 architecture uses this approach,</li> <li>With a name: Explicitly Parallel Instruction Computer (EPIC).</li> </ul>                                              |
| Most static issue processors also rely on the compiler to take on some responsibility for handling data and control hazards.                                                             |
| 102/119                                                                                                                                                                                  |



| instruction type                         |         |                           |                             | Pi                   | pe stage                       | s                  |                               |    |
|------------------------------------------|---------|---------------------------|-----------------------------|----------------------|--------------------------------|--------------------|-------------------------------|----|
| ALU or branch instruction                | IF      | ID                        | EX                          | MEM                  | WB                             |                    |                               |    |
| Load or store instruction                | IF      | ID                        | EX                          | MEM                  | WB                             |                    |                               |    |
| ALU or branch instruction                |         | IF                        | ID                          | EX                   | MEM                            | WB                 |                               |    |
| Load or store instruction                |         | IF                        | ID                          | EX                   | MEM                            | WB                 |                               |    |
| ALU or branch instruction                |         |                           | IF                          | ID                   | EX                             | MEM                | WB                            |    |
| Load or store instruction                |         |                           | IF                          | ID                   | EX                             | MEM                | WB                            |    |
| ALU or branch instruction                |         |                           |                             | IF                   | ID                             | EX                 | MEM                           | WB |
| Load or store instruction                |         |                           |                             | IF                   | ID                             | EX                 | MEM                           | WB |
| Here we have assumed the issue pipeline. | trictly | ne fiv<br>neces<br>egiste | e-stag<br>ssary,<br>r write | it does<br>es at the | ture as<br>have se<br>e end of | ome adv<br>the pip | r the sir<br>vantage<br>eline | s. |



|       | Fig                      | g. 6.46:                  |             |         |
|-------|--------------------------|---------------------------|-------------|---------|
| AL    | LU or branch instruction | Data transfer instruction | Clock cycle |         |
| Loop: |                          | 1w \$t0, 0(\$s1)          | 1           |         |
| addi  | \$s1,\$s1,-4             |                           | 2           |         |
| addu  | 1 \$t0,\$t0,\$s2         |                           | 3           |         |
| bne   | \$s1,\$zero,Loop         | sw \$t0, 4(\$s1)          | 4           |         |
|       |                          |                           |             |         |
|       |                          |                           |             | 106/119 |

| Loop:                                                                   |                                             |                                                                                    | Data                | transfer instruction                      | Clock cycle             |  |
|-------------------------------------------------------------------------|---------------------------------------------|------------------------------------------------------------------------------------|---------------------|-------------------------------------------|-------------------------|--|
|                                                                         | addi                                        | \$s1,\$s1,-16                                                                      | 1w                  | \$t0, 0(\$s1)                             | 1                       |  |
|                                                                         |                                             |                                                                                    | 1w                  | \$t1,12(\$s1)                             | 2                       |  |
|                                                                         | addu                                        | \$t0,\$t0,\$s2                                                                     | 1w                  | \$t2, 8(\$s1)                             | 3                       |  |
|                                                                         | addu                                        | \$t1,\$t1,\$s2                                                                     | 1w                  | \$t3, 4(\$s1)                             | 4                       |  |
|                                                                         | addu                                        | \$t2,\$t2,\$s2                                                                     | SW                  | \$t0, 16(\$s1)                            | 5                       |  |
|                                                                         | addu                                        | \$t3,\$t3,\$s2                                                                     | SW                  | \$t1,12(\$s1)                             | 6                       |  |
|                                                                         |                                             |                                                                                    | sw                  | \$t2, 8(\$s1)                             | 7                       |  |
|                                                                         | bne                                         | \$s1,\$zero,Loop                                                                   | SW                  | \$t3, 4(\$s1)                             | 8                       |  |
| <ul> <li>The e</li> <li>Since</li> <li>addre</li> <li>4, mir</li> </ul> | empty s<br>the firs<br>esses lo<br>nus 8, a | lots are nops.<br>st instruction in the l<br>paded are the origin<br>and minus 12. | oop dec<br>al value | rements \$s1 by 1<br>of \$s1, then that a | 6, the<br>address minus |  |



| Processor                           | Maximum instr.<br>issues / clock                                    | Functional units                                                           | Maximum<br>ops. per clock                                 | Maximum<br>clock rate                 | Transistors<br>(millions)                    | Power<br>(watts)                 | SPEC<br>int2000           | SPEC<br>fp2000 |
|-------------------------------------|---------------------------------------------------------------------|----------------------------------------------------------------------------|-----------------------------------------------------------|---------------------------------------|----------------------------------------------|----------------------------------|---------------------------|----------------|
| Itanium                             | 6                                                                   | 4 integer/media<br>2 memory<br>3 branch<br>2 FP                            | 9                                                         | 0.8 GHz                               | 25                                           | 130                              | 379                       | 701            |
| Itanium 2                           | 6                                                                   | 6 integer/media<br>4 memory<br>3 branch                                    | 11                                                        | 1.5 Ghz                               | 221                                          | 130                              | 810                       | 1427           |
| A summ                              | ary of the ch                                                       | 2₽<br>aracteristics                                                        | of the Itaniu                                             | m and Ita                             | nium 2, Int                                  | el's firs                        | t two                     |                |
| A summ<br>impleme<br>➢ In a<br>an c | ary of the ch<br>entations of t<br>addition to hig<br>on-chip level | 2 ₱<br>haracteristics<br>he IA-64archi<br>gher clock rati<br>3 cache, vers | of the Itaniu<br>tecture.<br>es and more<br>sus an off-ch | m and Ita<br>function<br>hip level \$ | nium 2, Int<br>al units, the<br>3 cache in t | el's firs<br>e Itaniu<br>he Itan | it two<br>m 2 inc<br>ium. | ludes          |
| A summ<br>impleme<br>≻ In a<br>an o | ary of the ch<br>entations of t<br>addition to hig<br>on-chip level | 2 ₱<br>haracteristics<br>he IA-64archi<br>gher clock rate<br>3 cache, vers | of the Itaniu<br>tecture.<br>es and more<br>sus an off-cl | m and Ita                             | nium 2, Int<br>al units, th<br>3 cache in t  | el's firs<br>e Itaniu<br>he Itan | t two<br>m 2 inc<br>ium.  | ludes          |





















119/119