Message
From: Bill Cox <bill@v...>
Date: Tue Sep 7 10:31:56 CEST 2004
Subject: [oc] Winning with a reconfigurable computer
On Tue, 2004-09-07 at 02:17, Richard Herveille wrote:
> Let's see if I get it right. Main issue here is getting random data fast
> enough into multiple execution units, or one big execution unit that works
> on a lot of data (what's the difference?), right?

Right.
The concept of an execution unit might not be quite right. Basically, I'm talking about synthesizing the critical inner loops of an algorithm directly to hardware, and executing a whole loop iteration per clock cycle.
So, the execution unit in this case is custom for a specific software loop. This custom execution unit often needs several fields from different classes, all in parallel. With several memory ports, we could provide that data just as the loop's execution unit needs it.
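
The one-memory-per-field idea can be modeled in software as a rough sketch. The `FieldMemory` class, the `run_loop` helper, and the field names `x`, `y`, and `width` are all hypothetical and only illustrate the access pattern; real hardware would perform the reads of one iteration in the same clock cycle.

```python
# Software model of "one memory port per field". Every name here is
# hypothetical; this illustrates the access pattern, not a real design.

class FieldMemory:
    """One external memory holding a single field, indexed by object number."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)
        self.reads = 0  # counts accesses made through this port

    def read(self, index):
        self.reads += 1
        return self.values[index]


def run_loop(memories, n_objects, body):
    """Run `body` once per object.

    Each iteration stands in for one clock cycle in which every memory
    is read once, in parallel, at the same index.
    """
    results = []
    for i in range(n_objects):
        fields = {m.name: m.read(i) for m in memories}
        results.append(body(**fields))
    return results


# Hypothetical placement-style fields for three instances.
x = FieldMemory("x", [0, 10, 20])
y = FieldMemory("y", [5, 5, 15])
width = FieldMemory("width", [2, 4, 6])

# Loop body: compute the lower-right corner of each instance.
corners = run_loop([x, y, width], 3, lambda x, y, width: (x + width, y))
```

Each memory is touched exactly once per iteration, which is the property that lets the loop body run at one iteration per cycle when the fields live in separate physical memories.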
> How about using QDR-II, or DDR-II for that matter, memories? These are true
> SRAMs, produced in a 90nm technology. The QDR-II has separate read and write
> ports, uses a 250MHz double pumped (DDR) clock per port. This provides an
> astonishingly:
> - 1G read/write operations per sec
> - 18Gbit/sec bandwidth per port (36bit databus)
> - 2GByte/sec bandwidth per port (32bit databus)
>
> This all without any latency! Pipelined yes, latency no.
> Now these devices are available in 72Mbit (2M x 36bits) densities. Not
> directly enough for main memory, but it would make a hell of a cache to
> execute your while loops from.
> Best of all, both Xilinx and Altera claim they can handle these memories
> with their FPGAs.

Ok, that's cool. While these memories seem on the small side, they're large enough to do practical work on real EDA problems.
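
As a quick check, the quoted QDR-II figures follow from the 250MHz double-pumped clock and the two ports. The sketch below only redoes that arithmetic; the constant names are illustrative and "M" in the density figure is assumed to mean 2^20.

```python
# Re-derive the quoted QDR-II numbers from the clock rate and bus width.
CLOCK_HZ = 250_000_000      # 250MHz clock per port
TRANSFERS_PER_CLOCK = 2     # double pumped (DDR)
PORTS = 2                   # separate read and write ports

transfers_per_port = CLOCK_HZ * TRANSFERS_PER_CLOCK       # transfers/sec on one port
total_ops_per_sec = transfers_per_port * PORTS            # reads plus writes

gbit_per_port_36 = transfers_per_port * 36 / 1e9          # full 36-bit databus
gbyte_per_port_32 = transfers_per_port * 32 / 8 / 1e9     # 32 data bits only

# Density of a 2M x 36-bit part, assuming M = 2**20.
density_mbit = (2 * 2**20 * 36) // 2**20

print(total_ops_per_sec, gbit_per_port_36, gbyte_per_port_32, density_mbit)
```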
For example, let's look at the placement problem. A big design might have 1 million placeable instances. Each QDR-II memory could handle a 32-bit field for any class that had at most 2M instances.
Memory for instance ports would be a problem. We'd need memories with several times the size of the instance memory to hold even a single field for ports on instances.
So, I'd guess that I'd need several memories of the 2M x 36-bit variety, but also at least a couple that were several times deeper. We could use the 2M x 36-bit memories as cache for the bigger arrays, and store most data in a single large external memory.
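
The sizing argument can be sketched as a small calculation. The 1 million instance count and the 2M-word depth come from the discussion above; the ports-per-instance figures (4 and 8) are assumptions picked only for illustration, and `memories_needed` is a hypothetical helper.

```python
# How many 2M-word memories does one field need for a given object count?
MEMORY_DEPTH = 2 * 2**20    # words in one 2M x 36-bit part (M assumed 2**20)
N_INSTANCES = 1_000_000     # "a big design" from the discussion


def memories_needed(n_objects, depth=MEMORY_DEPTH):
    """Ceiling of n_objects / depth: parts needed to hold one field."""
    return -(-n_objects // depth)


# One field on instances fits in a single part.
instance_parts = memories_needed(N_INSTANCES)

# Ports on instances: the per-instance port counts are assumed values.
port_parts_avg4 = memories_needed(N_INSTANCES * 4)
port_parts_avg8 = memories_needed(N_INSTANCES * 8)

print(instance_parts, port_parts_avg4, port_parts_avg8)
```

Under these assumed port counts, a single port field already spans two to four of the 2M-word parts, which is why a few deeper memories, or caching in front of one large external memory, look attractive.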
> Then there's another memory variant called QuadPorts. These memories have 4
> independent ports. All running at 133MHz. This allows you to access the same
> shared memory from 4 units independently.
> Unfortunately these memories are pretty small; 1Mbit max. But still, this
> might be a candidate for your cache.

These don't appeal to me much. I'd rather go with true parallel memories.
Bill