    From: Dysthymicdolt@a...<Dysthymicdolt@a...>
    Date: Thu Sep 2 00:26:45 CEST 2004
    Subject: [openrisc] Write-back Stack Cache?

    Has a write-back stack cache been considered?

    In addition to reducing data cache conflict misses, this might
    significantly reduce write traffic to memory (even without fancy
    optimizations like cleaning blocks in dead stack frames, i.e.,
    marking them clean so they are never written back). I suspect most
    stack writes are 32-bit, so even a simplistic ECC implementation,
    one that turns smaller writes into read-merge-write operations with
    perhaps a one-cycle stall, might not be a performance drain. (The
    stack cache could be decoupled so that non-stack operations proceed
    during such a stall, but that might not be worth the extra
    complexity for a rare(?) case.)

    Indirect stack references, i.e., stack accesses through base
    registers other than R1 and R2, appear to be relatively rare, so
    several methods could be used to handle them:

    1) Clarify/redefine the architecture to prohibit stack references
    through base registers other than R1 and R2, and either

    a) force every potential stack reference to use R2. Yuck! [The
    ABI does not seem to guarantee that interrupt handlers can rely
    on R2 being the frame pointer.]

    b) use a separate stack for objects that would be referenced
    indirectly.

    2) Provide hardware-based coherence between the caches and
    otherwise operate normally (except that, if the main data cache is
    write-through, writes that update the stack area could bypass its
    write buffer).

    3) Provide a retry mechanism for indirect stack references. This
    would work like a pseudo-associative cache: probe one cache first
    and retry in the other if the guess was wrong. To improve
    performance, a hint bit could be associated with each of the other
    registers, letting the processor choose to probe the stack cache
    first:

    a) a load into a register clears its hint bit;

    b) an ALU operation ORs the hint bits of its source registers to
    generate the result's hint bit (if either source register was used
    to reference the stack, the result is likely to be a stack
    address);

    c) a memory access sets its base register's hint bit according to
    whether the access actually went to the stack.

    'Stack-cacheability' could be determined by PTE information (as
    some ARMs do for their auxiliary cache), by a single-ended range
    check against R1 (addresses greater than SP-2092, i.e., the red
    zone and everything above it, are defined as stack), by TLB
    snooping, or by some other method. TLB snooping would use a
    separate stack TLB alongside the main data TLB. If the data TLB
    misses and the stack TLB hits, the hint bit is set. If the stack
    TLB misses, the data TLB hits, and the base is not R1 or R2, the
    hint bit is cleared. If both miss, or if only the stack TLB misses
    and the base is R1 or R2, the determination is more complex: if
    exclusion between the two TLBs is guaranteed, the latter case
    indicates an error in the TLB fill; R1- and R2-based references
    should use the stack TLB; and for other base registers a range
    check or another method could be used, which can afford to be
    somewhat slow because it only runs on a TLB miss.
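    The hint-bit rules in option 3 are simple enough to model in a few
    lines. The Python sketch below is illustrative only: it uses the
    single-ended range check against R1 as the stack classifier, treats
    the post's SP-2092 figure as a byte offset below SP, pins R1/R2 as
    always-stack bases, and assumes R0 reads as zero. The names
    HintTracker and is_stack_addr are hypothetical.

```python
SP_REG, FP_REG = 1, 2  # R1 = stack pointer, R2 = frame pointer
RED_ZONE = 2092        # offset taken from the post, assumed to be bytes below SP


def is_stack_addr(addr: int, sp: int) -> bool:
    """Single-ended range check: anything above SP - RED_ZONE counts as stack."""
    return addr > max(sp - RED_ZONE, 0)


class HintTracker:
    """Hypothetical per-register hint bits that pick which cache to probe first."""

    def __init__(self, num_regs: int = 32) -> None:
        self.hint = [False] * num_regs
        self.hint[SP_REG] = self.hint[FP_REG] = True

    def _set(self, reg: int, value: bool) -> None:
        # R0 (assumed zero) never carries a hint; R1/R2 stay pinned as stack bases.
        if reg not in (0, SP_REG, FP_REG):
            self.hint[reg] = value

    def load(self, rd: int) -> None:
        """A load into rd clears rd's hint bit."""
        self._set(rd, False)

    def alu(self, rd: int, rs1: int, rs2: int = 0) -> None:
        """The result inherits the OR of its sources' hints (rs2=0 models an immediate)."""
        self._set(rd, self.hint[rs1] or self.hint[rs2])

    def stack_first(self, base: int) -> bool:
        """R1/R2-based accesses always go to the stack cache; others follow the hint."""
        return base in (SP_REG, FP_REG) or self.hint[base]

    def access(self, base: int, addr: int, sp: int) -> bool:
        """Model one load/store; return True if the first probe was wrong (retry needed)."""
        guessed_stack = self.stack_first(base)
        actually_stack = is_stack_addr(addr, sp)
        self._set(base, actually_stack)  # the access updates its base register's hint
        return guessed_stack != actually_stack


if __name__ == "__main__":
    t = HintTracker()
    sp = 0x7FFF_0000
    t.alu(3, SP_REG)                  # r3 = r1 + imm: inherits R1's hint
    print(t.access(3, sp + 16, sp))   # False: stack cache probed first, correctly
    t.load(4)                         # r4 = pointer loaded from memory: hint cleared
    print(t.access(4, sp + 8, sp))    # True: data cache probed first, retry needed
    print(t.access(4, sp + 8, sp))    # False: r4's hint is now set
```

    A PTE- or TLB-snooping-based classifier could replace is_stack_addr
    without changing the hint-propagation logic.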

    A stack cache could use sectored cache blocks, allowing
    write-allocate at fine granularity (e.g., per 32-bit word), and
    could use smaller tags. One option is indirect tagging: the tag
    holds the stack-TLB entry number instead of the physical page
    number. This adds complexity, since replacing a stack-TLB entry
    forces the lines tagged with it to be flushed, but even with only
    eight stack-TLB entries such flushes should be rare for a 4 KiB
    stack cache. Another option is to restrict the stack cache to
    addresses within a certain range, at the cost of occasionally
    resetting the range and revalidating the cache. (The stack TLB
    might also hardwire some virtual address bits, i.e., force each
    process's stack into a certain range of virtual addresses.)
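    One way to picture a sectored, write-allocate stack cache with
    indirect tags, together with the read-merge-write handling of
    sub-word writes mentioned earlier, is the Python model below.
    Everything concrete in it is an assumption chosen for illustration:
    a 4 KiB direct-mapped cache, 32-byte lines split into 32-bit
    sectors (matching a 32-bit ECC granule), 4 KiB pages, big-endian
    byte numbering as in OpenRISC 1000, and the hypothetical names
    SectoredStackCache, read_word and drop_tlb_entry. Because the cache
    is no larger than a page, index and offset come entirely from
    page-offset bits, so the tag shrinks to the stack-TLB entry number
    (3 bits for eight entries).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

CACHE_BYTES = 4096  # 4 KiB direct-mapped stack cache (assumed organization)
LINE_BYTES = 32     # assumed block size
WORD_BYTES = 4      # one sector == one 32-bit ECC granule
PAGE_BYTES = 4096   # assumed page size
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES
NUM_LINES = CACHE_BYTES // LINE_BYTES


@dataclass
class Line:
    tlb_idx: int = -1  # indirect tag: stack-TLB entry number, not a page number
    valid: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_LINE)
    dirty: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_LINE)
    data: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class SectoredStackCache:
    """Hypothetical sectored, write-allocate stack cache with indirect tags."""

    def __init__(self) -> None:
        self.lines = [Line() for _ in range(NUM_LINES)]
        self.fills = 0    # words fetched from memory
        self.rmw_ops = 0  # sub-word writes handled as read-merge-write

    @staticmethod
    def _split(vaddr: int) -> Tuple[int, int, int]:
        # Cache size == page size, so index and offset use page-offset bits only.
        off = vaddr % PAGE_BYTES
        return off // LINE_BYTES, (off % LINE_BYTES) // WORD_BYTES, off % WORD_BYTES

    def _line(self, tlb_idx: int, index: int) -> Line:
        line = self.lines[index]
        if line.tlb_idx != tlb_idx:
            # Conflict: a real design would write back dirty sectors here.
            line = self.lines[index] = Line(tlb_idx=tlb_idx)
        return line

    def write(self, tlb_idx: int, vaddr: int, value: int, size: int,
              read_word: Callable[[int], int]) -> None:
        index, word, byte = self._split(vaddr)
        assert size in (1, 2, 4) and byte % size == 0, "aligned 1/2/4-byte writes only"
        line = self._line(tlb_idx, index)
        if size == WORD_BYTES:
            # Full-word write: allocate the sector without reading memory.
            line.data[word] = value & 0xFFFF_FFFF
        else:
            # Sub-word write: read the word, merge the new bytes, write it back
            # (a real design would also recompute the word's ECC here).
            self.rmw_ops += 1
            if not line.valid[word]:
                line.data[word] = read_word(vaddr - byte)
                self.fills += 1
            shift = 8 * (WORD_BYTES - size - byte)  # big-endian byte numbering
            mask = ((1 << (8 * size)) - 1) << shift
            line.data[word] = (line.data[word] & ~mask) | ((value << shift) & mask)
        line.valid[word] = line.dirty[word] = True

    def read(self, tlb_idx: int, vaddr: int) -> Optional[int]:
        index, word, _ = self._split(vaddr)
        line = self.lines[index]
        if line.tlb_idx == tlb_idx and line.valid[word]:
            return line.data[word]
        return None  # miss

    def drop_tlb_entry(self, tlb_idx: int) -> int:
        """Flush lines tagged with a replaced stack-TLB entry; return dirty words to write back."""
        dirty_words = 0
        for i, line in enumerate(self.lines):
            if line.tlb_idx == tlb_idx:
                dirty_words += sum(line.dirty)
                self.lines[i] = Line()
        return dirty_words


if __name__ == "__main__":
    def memory(addr: int) -> int:
        return 0x11223344  # stand-in for main memory

    cache = SectoredStackCache()
    cache.write(0, 0x7FFF_0010, 0xDEADBEEF, 4, memory)  # full word: no fill
    cache.write(0, 0x7FFF_0015, 0xAA, 1, memory)        # byte into invalid sector: one fill
    print(hex(cache.read(0, 0x7FFF_0014)), cache.fills)  # 0x11aa3344 1
```

    In this model full-word writes never touch memory; only sub-word
    writes pay for a read-merge-write, and only sub-word writes into an
    invalid sector need a fill.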

    (Other auxiliary cache structures might be beneficial. For example,
    the memory region reachable with an R0 base could be reserved as a
    global data area [possibly read-only?] that could be read early
    [with R0 holding zero, the immediate is the address] and possibly
    in parallel with other operations [limited by physical resources,
    etc.]. One could also associate hint bits with registers for way
    prediction; if the default way is determined by the register
    number, this might provide a means to partition the cache while
    maintaining fast accesses.)
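    The way-prediction idea above can be sketched as a small Python
    model: each register's default way is its number modulo the
    associativity, a per-register hint remembers the way where an
    access through that register last had to be found, and misses
    allocate into the default way so that the cache is partitioned by
    base register. The class name WayPredictedCache, the
    64-set/4-way/32-byte geometry, and the allocate-into-default-way
    policy are all assumptions chosen for illustration.

```python
from typing import List, Optional


class WayPredictedCache:
    """Hypothetical set-associative cache with per-register way prediction."""

    def __init__(self, num_sets: int = 64, ways: int = 4,
                 line_bytes: int = 32, num_regs: int = 32) -> None:
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        # tags[set][way] holds the stored tag, or None if the way is invalid
        self.tags: List[List[Optional[int]]] = [[None] * ways for _ in range(num_sets)]
        self.hint: List[Optional[int]] = [None] * num_regs  # learned way per base register
        self.fast = self.slow = self.misses = 0

    def default_way(self, base: int) -> int:
        """Default way derived from the base register number."""
        return base % self.ways

    def access(self, base: int, addr: int) -> str:
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        ways = self.tags[index]
        hint = self.hint[base]
        predicted = hint if hint is not None else self.default_way(base)
        if ways[predicted] == tag:
            self.fast += 1  # single probe
            return "fast"
        if tag in ways:
            self.slow += 1  # extra probe; remember the right way for this register
            self.hint[base] = ways.index(tag)
            return "slow"
        # Miss: allocate in the register's default way, so data reached through
        # different base registers tends to land in different ways.
        self.misses += 1
        way = self.default_way(base)
        ways[way] = tag
        self.hint[base] = way
        return "miss"


if __name__ == "__main__":
    cache = WayPredictedCache()
    print(cache.access(1, 0x1000))  # miss: allocated in way 1 (R1's default way)
    print(cache.access(1, 0x1004))  # fast: same line, predicted way is correct
    print(cache.access(3, 0x1008))  # slow: R3 predicts way 3, line is in way 1
    print(cache.access(3, 0x1008))  # fast: R3's hint now points at way 1
```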

    Paul A. Clayton
    just a technophile

     
    Copyright (c) 1999 OPENCORES.ORG. All rights reserved.