|
Message
From: Steven R. McQueen<srmcqueen@m...>
Date: Fri May 14 07:51:45 CEST 2004
Subject: [oc] Question on Modular Multipliers
I am the author of the Basic RSA project on Open Cores. It does what I originally intended it to do, but frankly, its performance is embarrassing. I have been trying to wade through papers on high radix modular multipliers, including Montgomery, and something does not seem right to me. If I understand correctly, the speed advantages in these systems come from pre-calculating the possible products (i.e. 1x1, 1x2,...4x3, 4x4 for radix 4) and adding those, instead of the traditional shift and add multiplication.
Unfortunately, it seems to me that this can only be effective for a non-modular multiplier, since massive adders and subtractors are still required to be chained together to get the correct modular result. What am I missing here?
Also, I have seen some papers on partial adders that claim to accelerate large multipliers (>512 bits) considerably. I can see how clock rates can be accelerated dramatically, but I cannot see how the throughput is improved. It seems to me that carries must still be propagated which would cancel most of the benefit of the faster clock.
I have developed a partitioned adder that doubles the speed of 1024-bit add operations at the cost of 4 times the circuitry, but this does not seem cost-effective to me, and it does not scale well. If anyone can point me to some nice, clear descriptions of high performance large number adders or multipliers, preferably with lots of examples, or can shed some light on the subject personally, I would greatly appreciate it.
Thank you!
Steve
-- Steven R. McQueen <srmcqueen@m...> McQueen Technologies, Inc.
|
 |