Message
From: chunyang at ce.et.tudelft.nl<chunyang@c...>
Date: Fri Sep 7 20:36:05 CEST 2007
Subject: [oc] Why open processors are so much slower than commercial ones?
I agree with "In order to get the best performance, you have to optimize all parts of the system".
Actually, not every aspect of a processor design is that interesting, or easy for a small group of people to accomplish (note that RTL coding is only a very small portion of the front-end design). Therefore, few people outside the industry have tried to do them.
Chunyang Gou
----- Original Message -----
From: caodayong at uestc.edu.cn<caodayong@u...>
To:
Date: Fri Sep 7 14:02:00 CEST 2007
Subject: [oc] Why open processors are so much slower than commercial ones?
> ----- Original Message -----
> From: goran.bilski at x...<goran.bilski at x...>
> To:
> Date: Tue Aug 10 16:06:43 CEST 2004
> Subject: [oc] Why open processors are so much slower than commercial ones?
>
> Hi,
> >
> > Interesting thread.
> >
> > As the designer of MicroBlaze, I can provide a little more detail.
> >
> > In order to get the best performance, you have to optimize all parts of the system.
> >
> > 1. The Instruction Set
> > The instruction set has to be optimized for an FPGA design.
> > ex.
> > For the logical instructions, I use a LUT for each bit of the result.
> > With a 4-input LUT, I need 2 inputs for the two operands, and that gives me 2 inputs for the type of logical instruction.
> > With 2 inputs, I can do 4 different logical instructions.
> > MicroBlaze has just 4 logical instructions, no more, no less.
> > Doing just 3 instructions won't save any area at all.
> > Doing 5 logical instructions would cost twice the amount of area.
> > The actual opcode values are chosen to minimize the control logic.
> > ex. I have a result mux which selects the source for the new value to the register file. Bits 0-1 are the actual selector for that mux, which means 0 LUTs for that control logic. Most of the layout of the opcodes has been done for this purpose.
> > Bits 4-5 determine the operation of the ALU block, etc...
> >
> > 2. The actual datapath implementation has to match the FPGA.
> > For an ASIC design, the area cost is very different for an ALU than for a MUX, but for an FPGA the area is actually the same.
> > A processor design has a lot of muxes, and they need to be minimized since they cost area and performance. Extreme pipelines will run very slowly on an FPGA design due to all the muxes for resolving the pipeline hazards.
> >
> > 3. The exact HDL coding of the processor has to be optimized.
> > I have done a lot of FPGA design tricks to get optimized performance and area. (The Xilinx carry-chain is extremely powerful and normally underused.)
> >
> > The source code for MicroBlaze exists but not as open-source.
> > It can be purchased from Xilinx.
> > An FPGA-specific version exists and also a pure RTL version.
> > The pure RTL version can be targeted to ASIC, and when it is implemented on an FPGA, the synthesis tools and PAR are quite good at optimizing the design. The results are within 10% of the area and performance of the FPGA-specific version.
> > I have only handcoded and handplaced logic that is within the critical paths; very little has been needed for the area since I tend to write RTL code that is well suited for an FPGA.
> >
> > The best design tools are still the paper and the pen.
> > Every design trick starts with a datasheet of the CLB and a blank piece of paper.
> >
> > Göran Bilski
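
To make the LUT argument in point 1 of the quoted message concrete, here is a rough Python model (not HDL, and not MicroBlaze source) of one 4-input LUT per result bit: two inputs carry the operand bits and the other two select one of four logical operations. The select encoding, the choice of OR/AND/XOR/ANDN as the four operations, and all function names are assumptions made for illustration only.

```python
# Software model of "one 4-input LUT per result bit" for logical instructions.
# ASSUMPTION: the select encoding and the operation set below are hypothetical;
# they are not taken from the actual MicroBlaze opcode layout.

OPS = {
    0b00: lambda a, b: a | b,        # OR
    0b01: lambda a, b: a & b,        # AND
    0b10: lambda a, b: a ^ b,        # XOR
    0b11: lambda a, b: a & (b ^ 1),  # ANDN: a AND (NOT b)
}


def build_lut_init():
    """Build the 16-entry truth table; index bits are (s1, s0, b, a)."""
    init = 0
    for index in range(16):
        a = index & 1
        b = (index >> 1) & 1
        sel = (index >> 2) & 0b11
        if OPS[sel](a, b):
            init |= 1 << index
    return init


LUT_INIT = build_lut_init()


def lut4(a, b, sel):
    """Look up one result bit: 2 operand inputs plus 2 select inputs."""
    index = a | (b << 1) | (sel << 2)
    return (LUT_INIT >> index) & 1


def logic_unit(x, y, sel, width=32):
    """Apply the same LUT to every bit position of the two operands."""
    result = 0
    for bit in range(width):
        result |= lut4((x >> bit) & 1, (y >> bit) & 1, sel) << bit
    return result
```

All 16 truth-table entries are used by the four operations, so a fifth operation would need a third select input, which no longer fits in a single 4-input LUT per bit. Dropping to three operations leaves the LUT count unchanged.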