Matthias Kaiserswerth
IEEE/ACM Transactions on Networking
Block RAMs (BRAMs), specialized memory structures distributed in columns throughout the FPGA fabric, are of particular importance. Each BRAM can hold up to 36 Kbit of data. BRAMs can be configured in various form factors (width and depth combinations) and can be cascaded to form a larger logical memory structure. Because BRAMs are distributed and can be accessed in parallel, they can provide aggregate on-chip bandwidth on the order of terabytes per second for memory-bandwidth-intensive applications.

The contrast in performance between processors and FPGAs lies in the architecture itself. Processors rely on the von Neumann paradigm, in which an application is compiled and stored in instruction and data memory. They typically operate as a fetch-decode-execute-store pipeline, which means that both instructions and data have to be fetched from external memory into the processor pipeline. Although caches alleviate the cost of expensive fetches from external memory, each cache miss incurs a severe penalty. The bandwidth between processor and memory is therefore often the critical factor in determining overall performance.
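The two bandwidth arguments above can be made concrete with a rough back-of-the-envelope sketch. The figures used here (1,000 BRAMs, two 36-bit ports per BRAM, a 500 MHz clock, a 5% cache miss rate, and a 200-cycle miss penalty) are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
def aggregate_bram_bandwidth(num_brams, port_width_bits, ports_per_bram, clock_hz):
    """Peak aggregate bandwidth in bytes per second, assuming every port
    of every BRAM transfers one word per clock cycle in parallel."""
    bits_per_second = num_brams * port_width_bits * ports_per_bram * clock_hz
    return bits_per_second / 8


def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Standard average-memory-access-time estimate for a single cache
    level, in the same unit as hit_time and miss_penalty (e.g. cycles)."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Assumed device: 1,000 BRAMs, two 36-bit ports each, 500 MHz clock.
    bw = aggregate_bram_bandwidth(1000, 36, 2, 500_000_000)
    print(f"aggregate BRAM bandwidth: {bw / 1e12:.1f} TB/s")

    # Assumed processor: 1-cycle hit, 5% miss rate, 200-cycle miss penalty.
    amat = average_memory_access_time(1, 0.05, 200)
    print(f"average memory access time: {amat:.1f} cycles")
```

Under these assumptions, the parallel BRAM ports add up to several terabytes per second, while even a small cache miss rate multiplies the processor's effective access time many times over its hit time.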