    From: Bill Cox<bill@v...>
    Date: Tue Sep 7 10:31:56 CEST 2004
    Subject: [oc] Winning with a reconfigurable computer
    On Tue, 2004-09-07 at 02:17, Richard Herveille wrote:
    > Let's see if I get it right. Main issue here is getting random data fast
    > enough into multiple execution units, or one big execution unit that works
    > on a lot of data (what's the difference?), right?
    Right.

    The concept of an execution unit might not be quite right. Basically,
    I'm talking about synthesizing the critical inner loops of an algorithm
    directly to hardware, and executing a whole loop per clock cycle.

    So, the execution unit in this case is custom for a specific software
    loop. This custom execution unit often needs several fields from
    different classes, all in parallel. With several memory ports, we could
    provide that data just as the loop's execution unit needs it.
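
    A minimal sketch of the idea, assuming each field is stored in its own
    memory bank with one port and that a port delivers one value per clock.
    The field names (inst_x, inst_y, inst_net) and the helper functions are
    hypothetical and chosen only for illustration; pipeline fill time is
    ignored.

```python
# Hypothetical model: every class field lives in its own memory bank.
# Field names and sizes are illustrative, not taken from a real design.
banks = {
    "inst_x":   [10, 20, 30, 40],
    "inst_y":   [5, 15, 25, 35],
    "inst_net": [0, 1, 1, 0],
}


def fetch_all(index):
    """Read the same index from every bank, as parallel ports could in one cycle."""
    return {name: bank[index] for name, bank in banks.items()}


def cycles_for_loop(num_iterations, fields_per_iteration, num_ports):
    """Cycles spent fetching operands if each port returns one field per cycle."""
    # Ceiling division: the fetches for one iteration are spread over the ports.
    cycles_per_iteration = -(-fields_per_iteration // num_ports)
    return num_iterations * cycles_per_iteration


# One shared port serializes the three field reads of each iteration.
single_port = cycles_for_loop(len(banks["inst_x"]), len(banks), 1)
# One port per field lets a whole iteration's operands arrive together.
multi_port = cycles_for_loop(len(banks["inst_x"]), len(banks), len(banks))
```

    Under these assumptions the single-port case needs three fetch cycles
    per iteration, while one port per field sustains one iteration per cycle.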

    > How about using QDR-II, or DDR-II for that matter, memories? These are true
    > SRAMs, produced in a 90nm technology. The QDR-II has separate read and write
    > ports, uses a 250MHz double pumped (DDR) clock per port. This provides an
    > astonishing:
    > - 1G read/write operations per sec
    > - 18Gbit/sec bandwidth per port (36bit databus)
    > - 2GByte/sec bandwidth per port (32bit databus)
    >
    > This all without any latency! Pipelined yes, latency no.
    > Now these devices are available in 72Mbit (2M x 36bits) densities. Not
    > directly enough for main memory, but it would make a hell of a cache to
    > execute your while loops from.
    > Best of all, both Xilinx and Altera claim they can handle these memories
    > with their FPGAs.
    Ok, that's cool. While these memories seem on the small side, they're
    large enough to do practical work on real EDA problems.
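
    The quoted figures can be checked with a little arithmetic. This sketch
    assumes the "1G operations" figure counts the read port and the write
    port together, and that "2M" means 2 x 2^20 words; both readings are
    assumptions, not something stated in the message.

```python
# Figures quoted above: 250 MHz clock, double data rate, separate read/write ports.
CLOCK_HZ = 250_000_000
TRANSFERS_PER_CLOCK = 2      # DDR: one transfer on each clock edge
PORTS = 2                    # one read port plus one write port

transfers_per_port = CLOCK_HZ * TRANSFERS_PER_CLOCK      # transfers/s on one port
total_operations = transfers_per_port * PORTS            # read + write combined

# Bandwidth per port for the full 36-bit bus and for a 32-bit payload.
gbit_per_port_36 = transfers_per_port * 36 / 1e9
gbyte_per_port_32 = transfers_per_port * 32 / 8 / 1e9

# Density of a 2M x 36 part, in Mbit (M = 2**20 here).
density_mbit = (2 * 2**20 * 36) // 2**20

print(total_operations, gbit_per_port_36, gbyte_per_port_32, density_mbit)
```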

    For example, let's look at the placement problem. A big design might
    have 1 million placeable instances. Each QDR-II memory could handle a
    32-bit field for any class that had at most 2M instances.

    Memory for instance ports would be a problem. We'd need memories with
    several times the size of the instance memory to hold even a single
    field for ports on instances.

    So, I'd guess that I'd need several memories of the 2M x 36-bit variety,
    but also at least a couple that were several times deeper. We could use
    the 2M x 36-bit memories as cache for the bigger arrays, and store most
    data in a single large external memory.
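
    A rough sizing sketch for the placement example. The 1 million instances
    and the 2M-word depth come from the discussion above; the average of 4
    ports per instance is purely an assumed number for illustration, as are
    the function and variable names.

```python
MEMORY_DEPTH = 2 * 2**20     # words in one 2M x 36 memory
FIELD_BITS = 32              # one field per word; 4 of the 36 bits stay unused


def memories_needed(num_objects, depth=MEMORY_DEPTH):
    """How many memories of the given depth hold one field for every object."""
    return -(-num_objects // depth)   # ceiling division


num_instances = 1_000_000
inst_ports_per_instance = 4  # ASSUMPTION: not given in the discussion
num_inst_ports = num_instances * inst_ports_per_instance

instance_memories = memories_needed(num_instances)
port_memories = memories_needed(num_inst_ports)

print(instance_memories, port_memories)
```

    With that assumed ratio, one field for all instance ports no longer fits
    in a single 2M-deep memory, which is why deeper parts (or a cache in
    front of a large external memory) come into the picture.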

    > Then there's another memory variant called QuadPorts. These memories have 4
    > independent ports. All running at 133MHz. This allows you to access the same
    > shared memory from 4 units independently.
    > Unfortunately these memories are pretty small; 1Mbit max. But still, this
    > might be a candidate for your cache.

    These don't appeal to me much. I'd rather go with true parallel
    memories.

    Bill



     