H.264/AVC Baseline Decoder: Overview
Overview
Nova is a low-power, real-time H.264/AVC baseline decoder for QCIF resolution, targeting mobile applications. It is a dedicated, fully hardwired, self-contained ASIC design that does not use any GPP/DSP cores. It has been verified on a Xilinx Virtex-4 FPGA and on a 0.18µm ASIC chip. The measured power consumption is 293µW@1V for 30fps QCIF decoding.
As of April 30, 2008, Ke Xu holds the copyright for Nova. If you are interested in continuing to develop this core or in using it in a commercial product, please send me an email (eexuke@y...) to discuss.
Features
1.RTL coded in Verilog-HDL.
2.Supports real-time H.264/AVC baseline decoding at QCIF resolution. Can be extended to higher resolutions with minor modifications.
3.Extensive pipelining and parallelism are used to improve performance and reduce power.
4.Hybrid and self-adaptive pipeline architecture to avoid unnecessary stall cycles and to improve performance:
-Self-adaptive pipeline for both intra and inter prediction.
-4×4/16×16 hybrid pipeline.
-1×4 pixel column-level parallelism.
5.Low-cost intra prediction unit:
-Self-adaptive pipeline.
-Hierarchical memory organization to reduce external memory access.
-“Seed” method for plane mode computation.
-Exploits data reuse between 1×4 columns.
-Multi-function processing elements that handle all intra prediction modes.
6.Optimized motion compensation (inter prediction) unit:
-Self-adaptive pipeline.
-Hierarchical memory organization to reduce external memory access.
-“Variable-block-shape” to reduce redundant memory access and improve throughput.
-On-chip reference pixel buffer to exploit reference pixel reuse.
-Pipelined and parallelized luma interpolator, consisting of 9 horizontal 6-tap filters, 4 vertical 6-tap filters, and 4 bilinear filters.
-Innovative chroma interpolator that minimizes the number of adders.
7.High performance deblocking filter:
-Innovative 5-stage pipeline architecture with data and structural hazards carefully managed.
-Single-port SRAM based, no dual/two-port SRAM required.
-204 cycles/MB at a maximum frequency of 200MHz (0.18µm process), which delivers up to 980kMB/s throughput.
8.Manually inserted latch-based clock gating to reduce power.
9.Low-power, low-cost design:
-Requires only ~1.5MHz for QCIF 30fps real-time decoding.
-Only 169k logic gates.
-Measured power consumption as low as 293µW@1V in a 0.18µm process.
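The throughput figures in the list above can be cross-checked with some simple arithmetic. The sketch below is only a back-of-the-envelope model, not part of the Nova sources: the function names are hypothetical, and it assumes the standard QCIF size of 176×144 pixels and 16×16 macroblocks.

```python
# Back-of-the-envelope check of the figures quoted in the feature list.
# Assumptions (not from the Nova sources): QCIF is 176x144 luma pixels,
# a macroblock (MB) covers 16x16 luma pixels, and all helper names are
# hypothetical.

QCIF_WIDTH, QCIF_HEIGHT = 176, 144
MB_SIZE = 16


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of macroblocks in one frame (dimensions divisible by 16)."""
    return (width // mb_size) * (height // mb_size)


def macroblock_rate(width, height, fps):
    """Macroblocks that must be decoded per second at the given frame rate."""
    return macroblocks_per_frame(width, height) * fps


def cycles_per_macroblock(clock_hz, mb_per_second):
    """Average clock cycles available for each macroblock."""
    return clock_hz / mb_per_second


def max_throughput(clock_hz, cycles_per_mb):
    """Macroblocks per second a unit can sustain at a given clock."""
    return clock_hz / cycles_per_mb


mbs = macroblocks_per_frame(QCIF_WIDTH, QCIF_HEIGHT)    # 11 * 9 MBs
rate = macroblock_rate(QCIF_WIDTH, QCIF_HEIGHT, 30)     # MBs per second
budget = cycles_per_macroblock(1.5e6, rate)             # at ~1.5MHz
deblock = max_throughput(200e6, 204)                    # at 200MHz

print(f"{mbs} MB/frame, {rate} MB/s")
print(f"cycle budget at 1.5MHz: {budget:.0f} cycles/MB")
print(f"deblocking filter at 200MHz: {deblock / 1e3:.0f} kMB/s")
```

Under these assumptions, QCIF at 30fps needs 2970 MB/s, so a ~1.5MHz clock leaves roughly 505 cycles per macroblock on average, and 200MHz divided by 204 cycles/MB gives about 980kMB/s, matching the figure quoted for the deblocking filter.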
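For readers unfamiliar with the 6-tap and bilinear filters mentioned for the luma interpolator, here is a minimal software sketch of what one such filter computes. It follows the half-sample and quarter-sample rules of the H.264 standard for a single row of pixels. It says nothing about how Nova's RTL arranges its horizontal, vertical, and bilinear filters, and the function names are hypothetical.

```python
# Software sketch of H.264 luma sub-pixel interpolation along one row.
# This illustrates the standard's arithmetic only; it is not derived from
# the Nova RTL, and all names here are hypothetical.

TAPS = (1, -5, 20, 20, -5, 1)  # H.264 6-tap half-sample filter coefficients


def clip_pixel(value):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, value))


def half_pel(pixels):
    """Half-sample value between pixels[2] and pixels[3].

    `pixels` holds six consecutive integer-position samples.
    """
    if len(pixels) != 6:
        raise ValueError("the 6-tap filter needs exactly six input pixels")
    acc = sum(tap * p for tap, p in zip(TAPS, pixels))
    # The taps sum to 32, so add 16 for rounding and shift right by 5.
    return clip_pixel((acc + 16) >> 5)


def quarter_pel(a, b):
    """Quarter-sample value: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1


row = [10, 10, 10, 200, 200, 200]   # a sharp edge between row[2] and row[3]
h = half_pel(row)                   # half-sample position on the edge
q = quarter_pel(row[2], h)          # quarter-sample between row[2] and h
print(h, q)
```

The clipping step matters because the negative taps can push the result outside the 0 to 255 range near sharp edges, for example with the input `[0, 0, 255, 255, 0, 0]`.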
Project News
2008.05.02
Specification, test files added.
2008.04.30
Verilog source code updated. Detailed specifications, documents, and test files to be updated soon.