Conference paper

Asynchronous transpose-matrix architectures


Matrix transposition is a necessary step in several image and video compression and decompression algorithms, in particular the discrete cosine transform (DCT) and its inverse (IDCT), and in some distributed-arithmetic applications. These algorithms must run at high data rates and, in portable applications, with minimal power dissipation. In this paper we describe how the clocked solution is usually implemented and present two new asynchronous architectures that perform matrix transposition. These architectures, one based on two-phase signaling and the other on four-phase signaling, improve on the clocked solution in latency and power at no cost in area or throughput. We discuss the characteristics of the three architectures and evaluate the relative advantages of each.
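
To illustrate why transposition arises in the DCT, the following is a minimal software sketch of the standard row-column decomposition of a 2-D DCT: a 1-D transform is applied to every row, the intermediate result is transposed, and the same 1-D transform is applied again. It is not a model of the architectures presented in this paper. The function names (dct_1d, transpose, dct_2d_row_column) and the unnormalized DCT-II definition are assumptions chosen for illustration only.

```python
import math


def dct_1d(v):
    """Unnormalized 1-D DCT-II of a sequence (illustrative definition)."""
    n = len(v)
    return [
        sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2-D DCT computed as row pass, transpose, row pass, transpose."""
    rows_done = [dct_1d(row) for row in block]       # 1-D DCT along each row
    cols_as_rows = transpose(rows_done)              # columns become rows
    cols_done = [dct_1d(row) for row in cols_as_rows]  # 1-D DCT along each column
    return transpose(cols_done)                      # restore original orientation
```

In hardware, the transposition between the two 1-D passes is the step that a transpose-matrix architecture performs, so the same 1-D transform unit can serve both passes.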
