From: Heiko Panther <heiko.panther@w...>
Date: Mon Feb 16 23:17:38 CET 2004
Subject: [openrisc] or1200 execution units
I'm currently reviewing the gcc target backend, specifically the function unit definitions. I'm trying to work out how they correspond to the hardware, and I want to make sure I understand the hardware correctly. Here's how I understand the or1200 implementation; please correct and expand on it.
There are these independent units (optional ones in []):
- ALU (shift, add, compare, move, logic, extend [, mul, mac])
- LSU (load, store)
- SYSTEM (mfspr, mtspr)
One instruction per cycle is decoded. All units are able to accept one instruction per cycle.
All SYSTEM and ALU instructions except "mul" and "mac" have their results ready after one cycle.
"mul" and "mac" instructions have their results ready after three cycles.
"load" instructions complete after 2 cycles on a cache hit. Not sure about this one: on a miss (or with no cache), the load completes after (Mem access time + 1) cycles.
"store" instructions are complete after 1 cycle on hit, or (Mem access time) cycles on a miss.
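To make the timings above concrete, here is a small Python sketch that encodes them as a lookup. It is only a model of the assumptions listed in this message, not a description verified against the or1200 RTL; the class names, the function name `or1200_latency` and the `mem_access` parameter are hypothetical, and the miss case for loads is the unconfirmed guess stated above.

```python
from typing import Optional

# Assumed one-cycle operations (ALU without mul/mac, plus SYSTEM).
ONE_CYCLE = {"shift", "add", "compare", "move", "logic", "extend", "mfspr", "mtspr"}
MULTIPLY = {"mul", "mac"}


def or1200_latency(insn_class: str, cache_hit: bool = True,
                   mem_access: Optional[int] = None) -> int:
    """Return cycles until the result is ready (or the store completes)."""
    if insn_class in ONE_CYCLE:
        return 1
    if insn_class in MULTIPLY:
        return 3
    if insn_class not in ("load", "store"):
        raise ValueError(f"unknown instruction class: {insn_class!r}")
    if cache_hit:
        return 2 if insn_class == "load" else 1
    if mem_access is None:
        raise ValueError("mem_access is required for a cache miss or a cache-less system")
    # Miss (or no cache): assumed Mem access time + 1 for loads,
    # Mem access time for stores (unconfirmed).
    return mem_access + 1 if insn_class == "load" else mem_access


print(or1200_latency("load"))                                  # 2
print(or1200_latency("load", cache_hit=False, mem_access=10))  # 11
```

Keeping the memory access time as a parameter mirrors the idea of making LSU latencies configurable for cache-less systems.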
Things I'm not clear on:
- How does the MMU affect LSU execution time?
- Can the LSU accept another load/store while a load/store is currently in execution? (Would a store instruction have to wait, like, 10 cycles when a load instruction missed?)
Based on these assumptions, this is the gcc function unit definition I'm proposing:
(define_function_unit "alu" 1 0 (eq_attr "type" "shift,add,logic,extend,move,compare") 1 1)
(define_function_unit "alu" 1 0 (eq_attr "type" "mul") 3 3)
(define_function_unit "lsu" 1 0 (eq_attr "type" "load") 2 2)
(define_function_unit "lsu" 1 0 (eq_attr "type" "store") 1 1)
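To see what these numbers imply for scheduling, here is a simplified Python model of how a ready-delay and an issue-delay interact on a single-instance, pipelined unit (multiplicity 1, simultaneity 0, no conflict list), with one instruction issued per cycle in order. It is a sketch under those assumptions, not gcc's actual scheduler; `UnitUse`, `PROPOSED` and `issue_cycles` are made-up names. It follows the documented meaning that an issue-delay of N lets the next instruction on the same unit start N cycles later (an N-1 cycle stall).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitUse:
    unit: str          # function unit name, e.g. "alu" or "lsu"
    ready_delay: int   # cycles until a dependent insn may use the result
    issue_delay: int   # cost N: next insn on this unit starts N cycles later


# The proposed definitions, keyed by the insn "type" attribute.
ALU_SIMPLE = UnitUse("alu", 1, 1)
PROPOSED = {t: ALU_SIMPLE for t in ("shift", "add", "logic", "extend", "move", "compare")}
PROPOSED.update({
    "mul": UnitUse("alu", 3, 3),
    "load": UnitUse("lsu", 2, 2),
    "store": UnitUse("lsu", 1, 1),
})


def issue_cycles(insns, table=PROPOSED):
    """Single-issue, in-order model. insns is a list of (type, dep), where dep
    is the index of an earlier insn whose result is needed, or None.
    Returns the cycle in which each insn issues."""
    issued = []
    unit_free = {}  # unit name -> first cycle it can accept another insn
    for i, (typ, dep) in enumerate(insns):
        use = table[typ]
        start = issued[-1] + 1 if issued else 0          # one insn per cycle
        start = max(start, unit_free.get(use.unit, 0))   # unit's issue-delay
        if dep is not None:
            assert 0 <= dep < i, "dependency must point at an earlier insn"
            producer = table[insns[dep][0]]
            start = max(start, issued[dep] + producer.ready_delay)  # ready-delay
        issued.append(start)
        unit_free[use.unit] = start + use.issue_delay
    return issued


print(issue_cycles([("load", None), ("add", 0)]))    # [0, 2]
print(issue_cycles([("mul", None), ("add", None)]))  # [0, 3]
```

In this model an independent add after a mul waits until cycle 3, because an issue-delay of 3 keeps the single ALU busy; if every unit really accepts a new instruction each cycle, an issue-delay of 1 for mul and load may be closer to the hardware, which ties back to the open question about overlapping LSU accesses.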
I'm also pondering how to implement an option to freely specify LSU latencies from the gcc command line for cache-less implementations.
Heiko