Message
From: Rudolf Usselmann<rudi@a...>
Date: Tue Sep 7 13:46:07 CEST 2004
Subject: [oc] Winning with a reconfigurable computer
On Tue, 2004-09-07 at 16:17, Marko Mlinar wrote:
> Bill,
>
> quite a discussion you have out here ;)
>
> I have spent quite some time on this subject and I am sure it will be one of
> the hottest research areas in the following decade.
>
> First let me say I wrote several compilers from ANSI C to Verilog RTL, one of
> the compilers (the first one) is also available on Opencores.
>
> The biggest problem, as you already noted, is of course memory bandwidth. More
> specifically the problem is to deliver the data on time -- that is,
> the biggest problem here is latency.
> Therefore searching *only* for higher-bandwidth RAM will not help.
>
> Although it may seem at first glance that C->RTL conversion is
> straightforward and only requires a lot of work, this is not true.
> I have not been able to prove that solving this problem in finite time would
> require a machine computationally more powerful than a Turing machine, but
> that seems very likely, since the problem becomes very similar to code
> optimization.
>
> However the problem can be non-optimally solved with transformations which
> maintain code equivalence, like current compilers do. This is of course very
> hard with all the pointers in the C code, like you already mentioned. With
> simple transformations you don't get enough optimizations to compete with
> high frequency sequential CPUs.
>
> Java and C# therefore seem much more suitable for conversion than full ANSI C
> compatibility. An alternative is to say that C pointers cannot point to just
> any area, but only to specific arrays.
>
> One of the biggest findings I had is that every loop that can be optimized
> (containing memory accesses) can be duplicated, where in the first loop you
> calculate memory addresses and in the second you collect data. In the
> worst case your second loop is empty and you have no optimizations, but in
> most cases you can generate the addresses early enough to cope with long RAM
> access latencies.
>
> The next important finding is that it is quite impossible for a machine to
> change the architecture (which usually means changing the whole algorithm)
> to be more suitable for HW and to save area.
>
> best regards,
> Marko
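
For readers following along, the loop duplication Marko describes can be sketched roughly as below. This is only an illustration under assumptions, not code from his compiler: the names sum_fused, sum_split and addr_of, and the strided address pattern, are made up for the example. The first loop of the split version only computes addresses; the second only collects data, so hardware could start issuing memory requests from the first loop well before the second needs the values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address function: a strided walk that wraps around the array. */
static size_t addr_of(size_t i, size_t stride, size_t offset, size_t len)
{
    return (i * stride + offset) % len;
}

/* Fused form: each iteration computes an address and immediately uses the
 * data, so every iteration waits for the full memory latency. */
long sum_fused(const int *data, size_t len, size_t n,
               size_t stride, size_t offset)
{
    assert(len > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[addr_of(i, stride, offset, len)];
    return sum;
}

/* Split form: loop 1 produces only addresses, loop 2 only consumes data.
 * The caller supplies addr[], which must hold at least n entries. */
long sum_split(const int *data, size_t len, size_t n,
               size_t stride, size_t offset, size_t *addr)
{
    assert(len > 0);
    for (size_t i = 0; i < n; i++)      /* loop 1: address generation */
        addr[i] = addr_of(i, stride, offset, len);

    long sum = 0;
    for (size_t i = 0; i < n; i++)      /* loop 2: data collection */
        sum += data[addr[i]];
    return sum;
}
```

Both functions return the same result for the same inputs. On a sequential CPU the split form gains nothing by itself; the point is that the address stream no longer depends on the loaded data, which is what lets a hardware implementation hide RAM latency.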
Very interesting, and kind of what I would have expected. If that were an easy (or even possible) solution, we would have seen it in production a long time ago.
One question does come to mind here: why do people always try to take a high-level language like C as the starting point? As you point out, Marko, it's impossible to solve some of the higher-level abstractions. Why not take assembly language, try to parallelize it, and map that into hardware?
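
To make the question a bit more concrete, here is a rough sketch of what "parallelizing assembly" could mean in the simplest case. It is an assumption-laden toy, not an existing tool: the struct insn format, NREGS and schedule_asap are invented for illustration, and it assumes straight-line three-address code with no memory accesses and no branches. Each instruction is placed in the earliest stage where both of its source registers are available; instructions in the same stage could run as parallel hardware.

```c
#include <assert.h>

#define NREGS 8   /* hypothetical register file size */

/* Hypothetical three-address instruction: dst = src1 op src2. */
struct insn {
    int dst, src1, src2;
};

/* Assign each instruction the earliest stage allowed by its read-after-write
 * dependencies. Every write is treated as a fresh wire (implicit renaming),
 * so only true data dependencies constrain the schedule.
 * Returns the number of stages, i.e. the depth of the dataflow graph. */
int schedule_asap(const struct insn *prog, int n, int *stage)
{
    int ready[NREGS] = {0};   /* stage at which each register value exists */
    int depth = 0;

    for (int i = 0; i < n; i++) {
        assert(prog[i].dst >= 0 && prog[i].dst < NREGS);
        assert(prog[i].src1 >= 0 && prog[i].src1 < NREGS);
        assert(prog[i].src2 >= 0 && prog[i].src2 < NREGS);

        int s = ready[prog[i].src1];
        if (ready[prog[i].src2] > s)
            s = ready[prog[i].src2];

        stage[i] = s;                 /* runs once both inputs are ready */
        ready[prog[i].dst] = s + 1;   /* result is usable one stage later */
        if (s + 1 > depth)
            depth = s + 1;
    }
    return depth;
}
```

For the four instructions r2 = r0 op r1, r3 = r0 op r1, r4 = r2 op r3, r5 = r0 op r0, the first, second and fourth land in stage 0 and the third in stage 1. Real assembly adds loads, stores and branches, which bring back the same latency and aliasing problems Marko describes.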
rudi
=============================================================
Rudolf Usselmann, ASICS World Services, http://www.asics.ws
Your Partner for IP Cores, Design, Verification and Synthesis