Message
From: Nicolas Boulay <nicolas.boulay@g...>
Date: Fri Sep 7 14:33:47 CEST 2007
Subject: [oc] Why open processors are so much slower than commercial ones?
A high-frequency processor is not necessarily a fast processor. An 8-bit CPU could be fast, but a 32-bit one will not be so fast.
2007/9/7, caodayong@u... <caodayong@u...>:
>
> ----- Original Message -----
> From: goran.bilski@x... <goran.bilski@x...>
> To:
> Date: Tue Aug 10 16:06:43 CEST 2004
> Subject: [oc] Why open processors are so much slower than
> commercial ones?
>
> > Hi,
> >
> > Interesting thread.
> >
> > As the designer of MicroBlaze, I can provide a little more detail.
> >
> > In order to get the best performance, you have to optimize all
> > parts of the system.
> >
> > 1. The Instruction Set
> > The instruction set has to be optimized for an FPGA design.
> > ex.
> > For the logical instructions, I use a LUT for each bit of result.
> > With a 4-input LUT, I need 2 inputs for the two operands and that
> > gives me 2 inputs for the type of logical instruction.
> > With 2 inputs, I can do 4 different logical instructions.
> > MicroBlaze has just 4 logical instructions, no more, no less.
> > Doing just 3 instructions won't save any area at all.
> > Doing 5 logical instructions would cost twice the amount of area.
> > The actual opcode values are chosen to minimize the control logic.
> > ex. I have a result mux which selects the source for the new value
> > to the register file. Bits 0-1 are the actual selector for that
> > mux, which means 0 LUTs for that control logic. Most of the layout
> > of the opcodes has been done for this purpose.
> > Bits 4-5 determine the operation of the ALU block, etc...
> >
> > 2. The actual datapath implementation has to match the FPGA.
> > For an ASIC design, the area cost is very different for an ALU
> > than for a MUX, but for an FPGA the area is actually the same.
> > A processor design has a lot of muxes and they need to be
> > minimized since they cost area and performance. Extreme pipelines
> > will run very slowly on an FPGA design due to all the muxes needed
> > for resolving the pipeline hazards.
> >
> > 3. The exact HDL coding of the processor has to be optimized.
> > I have done a lot of FPGA design tricks to get optimized
> > performance and area. (The Xilinx carry-chain is extremely
> > powerful and normally underused.)
> >
> > The source code for MicroBlaze exists but not as open-source.
> > It can be purchased from Xilinx.
> > An FPGA-specific version exists and also a pure RTL version.
> > The pure RTL version can be targeted to an ASIC, and when
> > implemented on an FPGA, the synthesis tools and PAR are quite good
> > at optimizing the design. The results are within 10% of the area
> > and performance of the FPGA-specific version.
> >
> > I have only handcoded and handplaced logic that is within the
> > critical paths; very little has been needed for the area since I
> > tend to write RTL code that is well suited for an FPGA.
> >
> > The best design tools are still paper and pen.
> > Every design trick starts with a datasheet of the CLB and a blank
> > piece of paper.
> >
> > Göran Bilski
>
> _______________________________________________
> http://www.opencores.org/mailman/listinfo/cores
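To make point 1 of the quoted message concrete, here is a minimal Python sketch of the "one 4-input LUT per result bit" idea: two LUT inputs carry the operand bits and the other two select the operation. The names (OPS, build_lut4, logic_unit) are hypothetical, and the choice of four operations and their 2-bit encoding is an assumption made for illustration, not taken from the MicroBlaze documentation.

```python
# Software model of a logic unit built from one 4-input LUT per result bit.
# Assumption: the four operations and their 2-bit select codes below are
# illustrative only; they are not taken from the MicroBlaze documentation.
OPS = {
    0b00: lambda a, b: a | b,         # OR
    0b01: lambda a, b: a & b,         # AND
    0b10: lambda a, b: a ^ b,         # XOR
    0b11: lambda a, b: a & (~b & 1),  # AND NOT
}


def build_lut4():
    """Return the 16-entry truth table indexed by (sel1, sel0, a, b)."""
    table = [0] * 16
    for sel, fn in OPS.items():
        for a in (0, 1):
            for b in (0, 1):
                table[(sel << 2) | (a << 1) | b] = fn(a, b)
    return table


LUT4 = build_lut4()


def logic_unit(sel, x, y, width=32):
    """Compute a width-bit logical result with one LUT4 lookup per bit."""
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        # The same 16-entry table is replicated for every bit position.
        result |= LUT4[(sel << 2) | (a << 1) | b] << i
    return result
```

For example, logic_unit(0b10, x, y) returns x ^ y under the encoding assumed here. The table has exactly 16 entries, so all four LUT inputs are used. A fifth operation would need a third select bit, and five inputs no longer fit in a single 4-input LUT, which is consistent with the area argument in the quoted message.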