From: Bill Cox <bill@v...>
Date: Tue Sep 7 13:16:31 CEST 2004
Subject: [oc] Winning with a reconfigurable computer
Hi, Marko.
Hey, a real expert in the field! I was hoping there would be one or two of you out there to keep the discussion honest.
On Tue, 2004-09-07 at 05:17, Marko Mlinar wrote:
> Bill,
>
> quite a discussion you have out here ;)
>
> I have spent quite some time on this subject, and I am sure it will be one of
> the hottest research areas in the following decade.
It almost makes me wish I were a grad student rather than an EDA tool developer.
> First, let me say that I have written several compilers from ANSI C to Verilog
> RTL; one of them (the first one) is also available on OpenCores.
I'll check it out. Do you have a link to it? I missed it when I checked the links from the OpenCores site.
> The biggest problem, as you already noted, is of course memory bandwidth.
> More specifically, the problem is delivering the data on time, and more
> specifically still, the biggest problem here is latency. Therefore, searching
> *only* for higher-bandwidth RAM will not help.
Agreed.
> Although it may seem at first glance that C->RTL conversion is straightforward
> and all it requires is a lot of work, this is not true. I have not been able to
> prove that solving this problem in finite time would require a machine
> computationally more powerful than a Turing machine, but that seems very
> likely, since the problem becomes very similar to code optimization.
Yes, it's obviously an NP-hard (or worse) problem to solve optimally. I shouldn't have said it's not hard; doing it well, in particular, is very hard. However, it's doable, as you've proved.
> However, the problem can be solved non-optimally with transformations that
> preserve code equivalence, as current compilers do. This is of course very hard
> with all the pointers in C code, as you already mentioned. With simple
> transformations you don't get enough optimization to compete with
> high-frequency sequential CPUs.
>
> Java and C# therefore seem much more suitable for conversion than full ANSI C.
> Alternatively, one could require that C pointers cannot point to just any
> area, but only into specific arrays.
I agree that a subset of Java or C# would be a better starting point. Both languages are less ambiguous, and both de-emphasize pointers.
At work, we program in a subset of C, and we already use integers as handles to objects. Data is already stored in arrays of properties. We don't allow programmers to use pointers except in very specific cases (like returning multiple values).
Even simpler than compiling Java or C# would be to pick a subset that's still expressive enough to write good algorithms in. We've already done this at work. No '.', '->', or '*' (dereference) operators are allowed. Everything is either a variable or an object, and objects are accessed through their handles. Our code should be relatively easy to analyze, and our data structures are well defined in a text-based schema file, which is used to generate all our object support code. We do allow new property arrays to be allocated to extend the fields of a class. We also automate the generation of recursive destructors, which eliminates most dangling-pointer problems. So we have little use for garbage collection, and our object support code can be fed into the C->Verilog translator just like algorithmic code.
> One of the biggest findings I had is that every loop that can be optimized
> (one containing memory accesses) can be duplicated: the first loop calculates
> the memory addresses and the second collects the data. In the worst case, your
> second loop is empty and you get no optimization, but in most cases you can
> generate the needed addresses early enough to cope with long RAM access
> latencies.
This is a cool idea. I'd like to look more carefully at how it translates into hardware.
> The next important finding is that it is practically impossible for a machine
> to change the architecture (which usually requires changing the whole
> algorithm) to make it more suitable for HW and to save area.
I'm not surprised. I have found that taking advantage of parallel processing generally requires changes to the algorithms at a high level.
I'm not thinking of generating hardware that's many times faster than a traditional CPU. I'm just thinking that a speedup of more than 2x is doable, primarily through good memory management across multiple data ports.
I also think the same tricks could be used to speed up a more traditional CPU. The main thing is to hammer down the memory bottleneck through several independent memory buses.
> best regards,
> Marko
Thanks for the well informed thoughts on the topic.
Bill