From: Dysthymicdolt@a...<Dysthymicdolt@a...>
Date: Thu Sep 2 00:26:45 CEST 2004
Subject: [openrisc] Write-back Stack Cache?
Has a write-back stack cache been considered? In addition to reducing data cache conflict misses, this might significantly reduce the amount of write memory traffic (even without fancy optimizations like cleaning blocks in dead stack frames). I suspect most stack writes are 32-bit data, so even a simplistic ECC implementation (one which translates smaller writes into read, merge, write operations with perhaps a cycle stall [one could decouple the stack cache to allow other non-stack operations to proceed, but that might not be worth the extra complexity for a rare(?) case]) might not be a performance drain.
WRT indirect stack references, since these are apparently relatively rare, several methods can be used:
1) Clarify/redefine the architecture to prohibit stack references from registers other than R1 and R2 and either
a) forcing any potential stack reference to use R2--yuck! [The ABI does not seem to indicate that interrupt handlers must be able to rely on R2 being the frame pointer.]
b) using a separate stack for objects that would be referenced indirectly.
2) Provide hardware-based coherence between the caches and perform normal operations (except that writes that update the stack area could avoid the write buffer for a write-through main data cache).
3) Provide a retry mechanism for indirect stack references. This would be like a pseudoassociative cache. To improve performance, a hint bit could be associated with each of the other registers, allowing the processor to choose the stack cache first. A load to a register would clear its hint bit; an ALU operation would OR the hint bits of its source registers to generate a new hint bit (if either source register was used to reference the stack, the result is likely to be a stack reference); and a memory access would set the hint bit appropriately. 'Stack-cachability' could be determined by
a) PTE information [as some ARMs do for auxiliary cache],
b) a single-ended range check against R1 [addresses greater than SP-2092 {red zone} are defined as stack],
c) TLB snooping using a separate stack TLB [if the main data TLB misses and the stack TLB hits, the hint bit is set; if the stack TLB misses and the data TLB hits and the base is not R1 or R2, then the hint bit is cleared; if both miss, or if only the stack TLB misses and the base was R1 or R2, then the determination is more complex {if exclusion is guaranteed, the latter would indicate an error in the TLB fill; for R1- and R2-based references, the stack TLB should be used; with other base registers a range check or other method might be used, which can be somewhat slow because it is a TLB miss}], or
d) some other method.
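To make the hint-bit idea in (3) concrete, here is a rough behavioural sketch in Python. It is only an illustration under assumptions of mine: the class and method names are invented, the hint bit updated by a memory access is taken to be that of the base register, R1 and R2 are treated as always pointing at the stack, and the range check is the single-ended SP-2092 test mentioned above.

```python
RED_ZONE = 2092   # bytes below SP still counted as stack (figure from the post)
SP, FP = 1, 2     # R1 and R2


def is_stack_address(addr, sp):
    """Single-ended range check: anything above SP - RED_ZONE is stack."""
    return addr > sp - RED_ZONE


class HintFile:
    """One hint bit per register; True means 'probe the stack cache first'."""

    def __init__(self, nregs=32):
        self.hint = [False] * nregs
        self.hint[SP] = self.hint[FP] = True   # assumption: R1/R2 are pinned

    def _set(self, reg, value):
        if reg not in (SP, FP):                # pinned registers never change
            self.hint[reg] = value

    def on_load(self, rd):
        """A load into rd clears its hint bit."""
        self._set(rd, False)

    def on_alu(self, rd, *sources):
        """An ALU result inherits the OR of its source registers' hint bits."""
        self._set(rd, any(self.hint[r] for r in sources))

    def access(self, base, addr, sp):
        """Probe the hinted cache first; retry in the other one if wrong.

        Returns (cache_that_holds_the_address, retried).
        """
        first = "stack" if self.hint[base] else "data"
        actual = "stack" if is_stack_address(addr, sp) else "data"
        self._set(base, actual == "stack")     # assumption: update base's bit
        return actual, first != actual
```

For example, after `on_alu(5, SP)` (r5 derived from r1) an access through r5 to a stack address goes straight to the stack cache with no retry; after `on_load(5)` the same access probes the data cache first and pays one retry, which also sets the hint again.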
A stack cache could have sectored cache blocks to allow write-allocate with fine resolution and could use smaller tags (one might opt to use indirect tagging, i.e., the physical page number is replaced with the TLB number in the tag; this would add complexity, but even with only eight stack TLB entries flushing should be rare for a 4KiB stack cache; or one might restrict the stack cache to addresses within a certain range forcing occasional range resetting and cache revalidation). (The stack TLB might also have some virtual address bits hardwired, i.e., forcing a process' stack to be placed within a certain range of virtual addresses.)
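A small Python model of the sectored, indirectly tagged organisation might look like the following. The sizes (32-byte blocks, 4-byte sectors, 128 direct-mapped lines for 4KiB) and all names are assumptions chosen for illustration; the tag holds a stack-TLB entry number instead of a physical page number, and a write allocates a sector without fetching the rest of the block.

```python
BLOCK_BYTES = 32    # assumed block size
SECTOR_BYTES = 4    # assumed sector size: one 32-bit word


class Line:
    def __init__(self):
        self.tag = None    # (stack TLB entry number, upper block bits)
        self.valid = 0     # one valid bit per sector
        self.dirty = 0     # one dirty bit per sector


class StackCache:
    def __init__(self, num_lines=128):          # 128 * 32 bytes = 4 KiB
        self.lines = [Line() for _ in range(num_lines)]

    def _locate(self, tlb_index, page_offset):
        block = page_offset // BLOCK_BYTES
        line = self.lines[block % len(self.lines)]
        tag = (tlb_index, block // len(self.lines))
        bit = 1 << ((page_offset % BLOCK_BYTES) // SECTOR_BYTES)
        return line, tag, bit

    @staticmethod
    def _claim(line, tag):
        """Evict on tag mismatch; return the number of sectors written back."""
        written_back = 0
        if line.tag != tag:
            written_back = bin(line.dirty).count("1")
            line.tag, line.valid, line.dirty = tag, 0, 0
        return written_back

    def write(self, tlb_index, page_offset):
        """Write-allocate one sector with no fetch of the rest of the block."""
        line, tag, bit = self._locate(tlb_index, page_offset)
        written_back = self._claim(line, tag)
        line.valid |= bit
        line.dirty |= bit
        return written_back

    def read(self, tlb_index, page_offset):
        line, tag, bit = self._locate(tlb_index, page_offset)
        self._claim(line, tag)
        if line.valid & bit:
            return "hit"
        line.valid |= bit                       # fetch only the missing sector
        return "miss"

    def flush_tlb_entry(self, tlb_index):
        """Replacing a stack TLB entry invalidates every line tagged with it."""
        written_back = 0
        for line in self.lines:
            if line.tag is not None and line.tag[0] == tlb_index:
                written_back += bin(line.dirty).count("1")
                line.tag, line.valid, line.dirty = None, 0, 0
        return written_back
```

With these sizes a line's tag is effectively just the TLB entry number (3 bits for eight entries), and only dirty sectors are written back on eviction or on a TLB-entry flush.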
(Other auxiliary cache structures might be beneficial. E.g., the memory region accessible with an R0 base could be reserved as a global data area [possibly read-only?] that could be read early [the immediate is the address] and possibly in parallel with other operations [limited by physical resources et al.]. One could also use hint bits associated with registers for way prediction; if the default way is determined by the register number, this might provide a means to partition caching while maintaining fast accesses.)
Paul A. Clayton
just a technophile