## Tsutomu Sasao

# Memory-Based Logic Synthesis 

(2) Springer

Memory-Based Logic Synthesis

Tsutomu Sasao

## Memory-Based Logic Synthesis

Tsutomu Sasao<br>Kyushu Institute of Technology<br>Department of Computer Science and Electronic<br>Iizuka, Japan<br>sasao@cse.kyutech.ac.jp

ISBN 978-1-4419-8103-5
e-ISBN 978-1-4419-8104-2
DOI 10.1007/978-1-4419-8104-2
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011922264
(c) Springer Science+Business Media, LLC 2011

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

## Preface

This book describes the realization of logic functions using memories. The proposed methods can be used to impalement designs in field programmable gate arrays (FPGAs) that contain both small-scale memories, called look-up tables (LUTs), and medium-scale memories, called embedded memories.

The basis for memory-based design is functional decomposition, which replaces a large memory with smaller ones. An LUT cascade is introduced as a new architecture for logic synthesis. This book introduces the C-measure, which specifies the complexity of Boolean functions. Functions with a suitably small C-measure can be efficiently realized by LUT cascades.

This book also shows logic design methods for index generation functions. An index generation function is a mathematical model for an address table which can be used to store internet addresses. Such a table must be updated frequently, and the operation must be performed as fast as possible. In addition, this book introduces hash-based design methods, which efficiently realize index generation functions by pairs of smaller memories. Main applications include: IP address table lookup, packet filtering, terminal access controllers, memory patch circuits, virus scan circuits, fault map of memory, and pattern matching.

This book is suitable for both FPGA system designers and CAD tool developers. To read the book, a basic knowledge of logic design and discrete mathematics is required. Each chapter contains examples and exercises. Solutions for the exercises are also provided.

## Acknowledgements

This research is supported in part by the Grants in Aid for Scientific Research of JSPS, the grants of MEXT knowledge Cluster Project, and Regional Innovation Cluster Program. Many people were involved in this project: Jon T. Butler, Masayuki Chiba, Bogdan Falkowski, Yukihiro Iguchi, Kazunari Inoue, Atsumu Iseno, Yoshifumi Kawamura, Hisashi Kajihara, Munehiro Matsuura, Alan Mishchenko, Hiroki Nakahara, Kazuyuki Nakamura, Shinobu Nagayama, Marek Perkowski, Hui Qin, Marc Riedel, Takahiro Suzuki, Akira Usui, and Yuji Yano.

Most materials in this book have been presented at various conferences: IWLS, DSD, ISMVL, ICCAD, ICCD, DAC, ASPDAC, FPL, ARC, and SASIMI, as well as journals: IEEETCAD, IEICE and IJE. In many cases, reviewers comments considerably improved the quality of the materials.

Preliminary versions of this book were used as a textbook for seminars in our group. Numerous improvements were proposed by the students of Kyushu Institute of Technology: Taisuke Fukuyama, Takuya Nakashima, Takamichi Torikura, Satoshi Yamaguchi, Takuya Eguchi, Yoji Tanaka, Yosuke Higuchi, Takahiro Yoshida, and Meiji University: Atsushi Ishida, Kensuke Kitami, Koki Shirakawa, Kengo Chihara, Shotaro Hara, Akira Yoda.

Prof. Jon T. Butler read through the entire manuscript repeatedly and made important corrections and improvements.

Dr. Alan Mishchenko's comments on the final manuscript were also appreciated.

## Contents

1 Introduction ..... 1
1.1 Motivation ..... 1
1.2 Organization of the Book ..... 1
2 Basic Elements ..... 3
2.1 Memory ..... 3
2.2 Programmable Logic Array ..... 4
2.3 Content Addressable Memory ..... 4
2.4 Field Programmable Gate Array ..... 6
2.5 Remarks ..... 8
Problems ..... 8
3 Definitions and Basic Properties ..... 11
3.1 Functions ..... 11
3.2 Logical Expression ..... 11
3.3 Functional Decomposition ..... 12
3.4 Binary Decision Diagram ..... 14
3.5 Symmetric Functions ..... 18
3.6 Technology Mapping ..... 21
3.7 The Mathematical Constant $e$ and Its Property ..... 23
3.8 Remarks ..... 23
Problems ..... 24
4 MUX-Based Synthesis ..... 25
4.1 Fundamentals of MUX ..... 25
4.2 MUX-Based Realization ..... 27
4.3 Remarks ..... 31
Problems ..... 32
5 Cascade-Based Synthesis ..... 33
5.1 Functional Decomposition and LUT Cascade ..... 33
5.2 Number of LUTs to Realize General Functions ..... 35
5.3 Number of LUTs to Realize Symmetric Functions ..... 38
5.4 Remarks ..... 40
Problems ..... 40
6 Encoding Method ..... 41
6.1 Decomposition and Equivalence Class ..... 41
6.2 Disjoint Encoding ..... 42
6.3 Nondisjoint Encoding ..... 43
6.4 Remarks ..... 53
Problems ..... 53
7 Functions with Small C-Measures ..... 55
7.1 C-Measure and BDDs ..... 55
7.2 Symmetric Functions ..... 56
7.3 Sparse Functions ..... 57
7.4 LPM Functions ..... 57
7.5 Segment Index Encoder Function ..... 60
7.6 WS Functions ..... 62
7.7 Modulo Function ..... 65
7.8 Remarks ..... 67
Problems ..... 67
8 C-Measure of Sparse Functions ..... 71
8.1 Logic Functions with Specified Weights ..... 71
8.2 Uniformly Distributed Functions ..... 76
8.3 Experimental Results ..... 77
8.3.1 Benchmark Functions ..... 77
8.3.2 Randomly Generated Functions ..... 78
8.4 Remarks ..... 79
Problems ..... 80
9 Index Generation Functions ..... 81
9.1 Index Generation Functions and Their Realizations ..... 81
9.2 Address Table ..... 81
9.3 Terminal Access Controller ..... 82
9.4 Memory Patch Circuit ..... 83
9.5 Periodic Table of the Chemical Elements ..... 84
9.6 English-Japanese Dictionary ..... 85
9.7 Properties of Index Generation Functions ..... 86
9.8 Realization Using ( $p, q$ )-Elements ..... 88
9.9 Realization of Logic Functions with Weight $k$ ..... 91
9.10 Remarks ..... 93
Problems ..... 93
10 Hash-Based Synthesis ..... 95
10.1 Hash Function ..... 95
10.2 Index Generation Unit ..... 96
10.3 Reduction by a Linear Transformation ..... 100
10.4 Hybrid Method ..... 103
10.5 Registered Vectors Realized by Main Memory ..... 106
10.6 Super Hybrid Method ..... 108
10.7 Parallel Sieve Method ..... 111
10.8 Experimental Results ..... 114
10.8.1 List of English Words ..... 114
10.8.2 Randomly Generated Functions ..... 115
10.8.3 IP Address Table ..... 115
10.9 Remarks ..... 116
Problems ..... 116
11 Reduction of the Number of Variables ..... 119
11.1 Optimization for Incompletely Specified Functions ..... 119
11.2 Definitions and Basic Properties ..... 120
11.3 Algorithm to Minimize the Number of Variables ..... 122
11.4 Analysis for Single-Output Logic Functions ..... 125
11.5 Extension to Multiple-Output Functions ..... 126
11.5.1 Number of Variables to Represent Index Generation Functions ..... 127
11.5.2 Number of Variables to Represent General Multiple-Output Functions ..... 129
11.6 Experimental Results ..... 130
11.6.1 Random Single-Output Functions ..... 130
11.6.2 Random Index Generation Functions ..... 131
11.6.3 IP Address Table ..... 131
11.6.4 Benchmark Multiple-Output Functions ..... 132
11.7 Remarks ..... 133
Problems ..... 133
12 Various Realizations ..... 137
12.1 Realization Using Registers, Gates, and An Encoder ..... 137
12.2 LUT Cascade Emulator ..... 137
12.3 Realization Using Cascade and AUX Memory ..... 139
12.4 Comparison of Various Methods ..... 143
12.5 Code Converter ..... 147
12.6 Remarks ..... 149
Problems ..... 150
13 Conclusions ..... 151
Solutions ..... 153
Bibliography ..... 179
Index ..... 187

## Chapter 1 Introduction

### 1.1 Motivation

Two of the most crucial problems in VLSI design are their high design cost and long design time. A solution to these problems is to use programmable architectures. Programmable LSIs reduce the hardware development cost and time drastically, since one LSI can be used for various applications. This book considers realizations of logic functions by programmable architectures. Various methods exist to realize multiple-output logic functions by programmable architectures. Among them, memories and programmable logic arrays (PLAs) directly realize logic functions. However, when the number of input variables $n$ is large, the necessary hardware becomes too large. Thus, field programmable gate arrays (FPGAs) are widely used. Unfortunately, FPGAs require layout and routing in addition to logic design. Thus, quick reconfiguration is not so easy.

A look-up table (LUT) cascade is a series connection of memories. It efficiently realizes various classes of logic functions. Since the architecture is simple, LUT cascades are suitable for the applications where frequent update is necessary. This book explores the design and application of LUT cascades. In addition, it shows memory-based methods to realize index generation functions, which are useful for pattern matching and communication circuits.

### 1.2 Organization of the Book

This book consists of 13 chapters. Figure 1.1 shows the relation between the chapters, where the arrows show the order to read the chapters. For example, Chaps. 6 and 7 can be read after reading Chap. 5. Chapter 2 reviews the basic elements used in this book: memory, PLA, content addressable memory (CAM), and FPGA.

Chapter 3 reviews the definitions and basic properties of logic functions. Functional decomposition is the most important theory covered.


Fig. 1.1 Relation among chapters

Chapter 4 shows multiplexer-based synthesis of logic functions. This method is universal, i.e., it can realize any logic function. However, the number of LUTs required to realize the function tends to increase exponentially with $n$, the number of input variables.

Chapter 5 shows a method to realize logic functions using LUT cascades. This method is not universal, i.e., it can be applied to only a limited class of logic functions. However, the number of LUTs to realize the function tends to increase linearly with $n$, the number of input variables. The C-measure is used to estimate the circuit size.

Chapter 6 shows a method to reduce the number of LUTs by considering the encoding of the functional decomposition.

Chapter 7 shows a class of functions whose C-measures are small. Such functions are efficiently realized by LUT cascades.

Chapter 8 considers the C-measure of the functions whose number of minterms is limited. It shows that when the number of minterms is small, the C-measure is also small.

Chapter 9 introduces index generation functions, which are useful to design IP address tables, terminal access controllers for local area network, etc.

Chapter 10 introduces a hash-based synthesis of index generation functions. The method is similar to a hash table for software. Similar to the case of software realization, collisions of data may occur. However, in hardware, we can avoid collisions by using various circuits that work concurrently.

Chapter 11 shows a reduction method for the number of variables for the hash circuit. It introduces incompletely specified functions and shows an algorithm to reduce the number of variables.

Chapter 12 shows various methods to realize index generation functions.
Chapter 13 summarizes the book.

## Chapter 2 <br> Basic Elements

This chapter reviews memory, programmable logic array (PLA), content addressable memory (CAM), and field programmable gate array (FPGA).

### 2.1 Memory

Semiconductor memories have a long history of research, and various types of memories have been developed.

A dynamic random access memory (DRAM) uses only a single transistor to store one bit. However, the peripheral circuit is rather complex, and periodical refreshing is necessary. Thus, DRAM is suitable for large-scale memory, but not for a smallscale memory.

A static random access memory (SRAM) uses six transistors to store one bit. The peripheral circuit is not so complex as in DRAM, and refreshing is unnecessary. Thus, SRAM is suitable for small-scale memory, but not for large-scale memory. Both DRAM and SRAM are volatile, i.e., if the power supply is tuned off, the data are lost.

Read only memories (ROMs), however are nonvolatile and are used for storing fixed data. For rewritable ROMs, fabrication often requires a dedicated process and such ROMs are expensive.

To reduce the amount of hardware, memories are often implemented as a twodimensional structure that is shown in Fig. 2.1. The row address decoder selects one row. And the sense amplifiers read the data for the selected row. Finally, the column address decoder selects one bit from the word. In many cases, the memory has a clock. Such a memory is synchronous. Many FPGAs contain memories, and many of them are synchronous. However, asynchronous memories (i.e., memories operating without clock pulses) are also available.


Fig. 2.1 Memory architecture

### 2.2 Programmable Logic Array

PLAs were often used for implementing controllers of microprocessors [94]. A PLA consists of the AND array and the OR array as shown in Fig. 2.2. For example, a 3-input 2-output function can be implemented as shown in Fig. 2.3. In the AND array, a cross-point denotes the AND connection, while in the OR array, a crosspoint denotes the OR connection. Both static and dynamic PLAs exist. In the dynamic PLA, a clock pulse is used to read out the data. For large-scale PLAs, dynamic realizations are often used, since static CMOS realization is too large and the NMOS realization dissipates much static power. In a rewritable PLA, each cross-point consists of a switch and a memory element [3]. Let $n$ be the number of inputs, $W$ be the number of columns, and $m$ be the number of outputs. Then, the size of a PLA is approximated by $W(2 n+m)$. The number of columns $W$ can be reduced by minimizing sum-of-products (SOPs) expressions. Thus, the logic design is relatively easy.

### 2.3 Content Addressable Memory

An ordinary memory such as that introduced in Sect. 2.1 uses a memory address to access data and return the contents of the memory. On the contrary, a CAM searches using the contents of the memory, and returns the address where the supplied data were found.


Fig. 2.2 Programmable logic array (PLA)
Fig. 2.3 An example of PLA


CAMs are often used in pattern matching [5]. Two types of CAMs exist: a binary CAM (BCAM) and a ternary CAM (TCAM). In a BCAM, each cell can take one of two states: 0 and 1. BCAMs are used for exact match. In a TCAM, each cell can take one of three states: 0,1 , and X (don't care that matches both 0 and 1 ). A TCAM can be used for implementing the longest prefix match to be explained in Chap. 7, and range match. To store one bit of information, an SRAM requires 6 transistors, while a TCAM requires 16 transistors [90]. Thus, a TCAM is more expensive than SRAM. Figure 2.4 shows an example of a CAM. In the CAM, for the input (i.e., search data) 01101, the second and the third rows match. However, the priority encoder selects the lowest address, i.e., 01 . Thus, the output of the CAM is 01 , the address of the second line. When the input is 11111 , the CAM shows that there is no match.


Fig. 2.4 CAM architecture

Fig. 2.5 Comparison of SRAM and CAM


Figure 2.5 compares an SRAM with a CAM. Note that in an SRAM, only one row is activated at a time, while in a CAM, all the rows are activated at the same time. In a CAM, all the rows are charged and discharged in every search operation. Thus, a CAM dissipates more power than an SRAM with the same number of bits.

### 2.4 Field Programmable Gate Array

Most FPGAs have an island-style architecture as shown in Fig. 2.6. In this architecture, logic elements are surrounded by interconnection elements such as switch blocks, connection blocks, and wiring. Logic elements are, in most cases, look-up tables (LUTs). A $K$-LUT is a module that realizes an arbitrary $K$-variable function. Thus, logically it is a $K$-input memory. However, unlike an ordinary

Fig. 2.6 Island-style FPGA


Fig. 2.7 LUT for FPGA
Address


| ABC | $f$ |
| :---: | :---: |
| 000 | 0 |
| 001 | 0 |
| 010 | 0 |
| 011 | 1 |
| 100 | 0 |
| 101 | 1 |
| 110 | 1 |
| 111 | 1 |


(large-scale) memory presented before, it is often implemented by a register and selector as shown Fig. 2.7. No clock pulse is necessary to read out the data; therefore, it is asynchronous. In modern FPGAs, more than $90 \%$ of the chip area is for interconnections. Thus, layout design is as important as logic design. Also, layout design often requires more CPU time than logic design.

Most FPGAs use SRAMs to store configuration data. At power-up, FPGAs are blank, and an external nonvolatile memory (EEPROM, flash RAM, MRAM, or FeRAM) is used to restore the configuration data into SRAM. FPGAs need to be configured from the external nonvolatile memory at every system powerup time. The configuration time is tens to hundreds of milliseconds. Also, the

SRAM for configuration must be kept on even when the system is not used. As programmable circuits become larger, the numbers of LUTs and embedded memories increase, and the power dissipation of SRAMs due to leakage current becomes more serious. In the past, FPGAs with 4 or 5 input LUTs were believed to be the most efficient [99, 100]; however, in the current technology, FPGAs with 6 inputs are the standard $[4,7,158]$. Thus, we have the following:

Problem: Given an $n$-variable (multiple-output) function $f$, find the minimum number of 6-LUTs needed to realize $f$, where $n \leq 20$.

Unfortunately, this is a very hard problem. So, we try to obtain an upper bound by using properties or measures of the function. A property of the function should be easy to detect, such as symmetry, and measures should be efficient to calculate. Such measures include:

- The number of variables (i.e., the support size)
- The weight of the function [133]
- The C-measure of a function [133]
- The number of products in the SOP [78]
- The number of literals in the factored expression [78]
- The number of nodes in the decision diagram [107]

Chapters 4 and 5 consider the number of $K$-LUTs to realize a given function.

### 2.5 Remarks

This chapter briefly introduced memory, CAM, and FPGA. As for FPGAs, various architectures are shown in Refs. [15, 100]. As for logic synthesis of FPGAs, see Refs. [20, 78]. In recent FPGAs, the architecture is quite complicated, and a single FPGA can implement a multiprocessor system.

Many commercial FPGAs contain memories in addition to LUTs. They are called embedded memories [4] or embedded block RAMs [156].

## Problems

2.1. Design the CAM function shown in Fig. 2.4 by using a PLA.
2.2. Explain why asynchronous memories are used for LUTs in FPGAs instead of synchronous ones.
2.3. In the past, 4 -LUTs or 5 -LUTs were mainly used in FPGAs. However, nowadays, 6-LUTs are popular in FPGAs. Discuss the reason why 6-LUTs are preferred.
2.4. Explain why dynamic CMOS is used instead of static CMOS in PLAs.
2.5. Suppose that a TCAM with $n$ inputs and W words is given. Show a method to implement the same function as the TCAM by using a PLA.
2.6. Suppose that a PLA with $n$ inputs, $m$ outputs, and $W$ products is given. Show a method to implement the same function as the PLA by using a TCAM and a memory.

## Chapter 3 <br> Definitions and Basic Properties

This chapter first introduces logical expressions and functional decomposition. Then, it introduces some properties of symmetric functions.

### 3.1 Functions

Definition 3.1.1. A mapping $F: B^{n} \rightarrow\{0,1, \ldots, k-1\}$, where $B=\{0,1\}$ is a binary-input integer valued function. A mapping $F: B^{n} \rightarrow B$, where $B=\{0,1\}$ is a logic function. If $F\left(\vec{a}_{i}\right) \neq 0(i=1,2, \ldots, k)$ for $k$ different input vectors, and $F=0$ for other $\left(2^{n}-k\right)$ input vectors, then the weight of the function is $k$.

### 3.2 Logical Expression

Definition 3.2.1. Binary variables are represented by $x_{i}(i=1,2, \ldots, n)$. A literal of a variable $x_{i}$ is either $x_{i}, \bar{x}_{i}$, or the constant 1 . An AND of literals is a product, and an OR of products is a sum-of-products (SOP) expression.
Definition 3.2.2. Let $B=\{0,1\}, X=\left(x_{1}, x_{2}, \ldots, x_{t}\right)$, and $x_{i}$ can take a value in $B$. Then, we can consider that $X$ takes its value from $P=\left\{0,1, \ldots, 2^{t}-1\right\}$. Let $S$ be a subset $(S \subseteq P)$ of $P$. Then, $X^{S}$ is a literal of $X$. When $X \in S, X^{S}=1$, and when $X \notin S, X^{S}=0$. Let $S_{i} \subseteq P_{i}(i=1,2, \ldots, n)$, then $X_{1}{ }^{S_{1}} X_{2} S_{2} \cdots X_{n} S_{n}$ is a logical product. $\bigvee_{\left(S_{1}, S_{2}, \ldots, S_{n}\right)} X_{1} S_{1} X_{2} S_{2} \cdots X_{n} S_{n}$ is a SOP expression. If $S_{i}=$ $P_{i}$, then $X_{i}{ }^{S_{i}}=1$ and the logical product is independent of $X_{i}$. In this case, the literal $X_{i}{ }^{P_{i}}$ is redundant and can be omitted. A logical product is also called a term or a product term.

Example 3.2.1. When $t=2$, we have $P=\{0,1,2,3\}$, and $X=\left(x_{1}, x_{2}\right)$ takes four values. In this case, $X^{\{1\}}, X^{\{1,2\}}$, and $X^{\{0,1,2,3\}}$ are literals. Suppose that $P_{1}=$ $P_{2}=P_{3}=\{0,1,2,3\}$. Then,

$$
\begin{aligned}
F= & X_{1}{ }^{\{0,1,2,3\}} X_{2}{ }^{\{0\}} X_{3}{ }^{\{0\}} \vee X_{1}{ }^{\{0,1\}} X_{2}{ }^{\{0,1,2,3\}} X_{3}\{1,3\} \\
& \vee X_{1}{ }^{\{0,1\}} X_{2}{ }^{\{2\}} X_{3}{ }^{\{0,1,2,3\}}
\end{aligned}
$$

is a SOP. Note that $X^{\{0,1,2,3\}}=1$. In this case, $X^{\{0,1,2,3\}}$ is redundant and can be omitted. By removing redundant literals, this expression can be simplified to

$$
F=X_{2}{ }^{\{0\}} X_{3}{ }^{\{0\}} \vee X_{1}{ }^{\{0,1\}} X_{3}{ }^{\{1,3\}} \vee X_{1}{ }^{\{0,1\}} X_{2}{ }^{\{2\}}
$$

### 3.3 Functional Decomposition

Definition 3.3.1. [8] Let $f(X)$ be a logic function, and ( $X_{1}, X_{2}$ ) be a partition of the input variables, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}\right.$, $\ldots, x_{n}$ ). The decomposition chart for $f$ is a two-dimensional matrix with $2^{k}$ columns and $2^{n-k}$ rows, where each column and row is labeled by a unique binary code, and each element corresponds to the truth value of $f$. The function represented by a column is a column function and is dependent on $X_{2}$. Variables in $X_{1}$ are bound variables, while variables in $X_{2}$ are free variables. In the decomposition chart, the column multiplicity denoted by $\mu_{k}$ is the number of different column patterns.

Example 3.3.1. Figure 3.1 shows a decomposition chart of a 4 -variable function. $\left\{x_{1}, x_{2}\right\}$ are the bound variables, and $\left\{x_{3}, x_{4}\right\}$ are the free variables. Since all the column patterns are different and there are four of them, the column multiplicity is $\mu_{2}=4$.

Theorem 3.3.1. [25] For a given function $f$, let $X_{1}$ be the bound variables, let $X_{2}$ be the free variables, and let $\mu_{k}$ be the column multiplicity of the decomposition chart. Then, the function $f$ can be realized with the network shown in Fig. 3.2. In this case, the number of signal lines connecting blocks $H$ and $G$ is $\left\lceil\log _{2} \mu_{k}\right\rceil$.

When the number of signal lines connecting two blocks is smaller than the number of input variables in $X_{1}$, we can often reduce the total amount of memory by the realization in Fig. 3.2 [51]. When $\mu_{k}=2$, it is an Ashenhurst decomposition [8] or

Fig. 3.1 Decomposition chart of a logic function

|  |  | 0 | 0 | 1 | 1 | $x_{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  | 0 | 1 | 0 | 1 | $x_{2}$ |
| 0 | 0 | 0 | 0 | 0 | 1 |  |
| 0 | 1 | 1 | 1 | 0 | 0 |  |
| 1 | 0 | 0 | 1 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 0 | 0 |  |
| $x_{3}$ | $x_{4}$ |  |  |  |  |  |

Fig. 3.2 Realization of a logic function by decomposition

a simple disjoint decomposition. When $\mu_{k}>2$, it is a Curtis decomposition [25], also called Roth-Karp decomposition [101] or a generalized decomposition. The number of functions with Curtis decompositions is much larger than those with Ashenhurst decompositions.

A function with an Ashenhurst decomposition can be written as

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), X_{2}\right)
$$

A function with a Curtis decomposition can be written as

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), \ldots, h_{m}\left(X_{1}\right), X_{2}\right)
$$

where $m=\left\lceil\log _{2} \mu\right\rceil$.
Lemma 3.3.1. Consider a decomposition chart for $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Then, the column multiplicity does not change under the permutation of variables within $X_{1}$ and $X_{2}$. Thus,

$$
\mu\left(f\left(X_{1}, X_{2}\right)\right)=\mu\left(f\left(\tilde{X}_{1}, \tilde{X}_{2}\right)\right)
$$

where $\tilde{X}_{1}=\left(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(k)}\right), \tilde{X}_{2}=\left(x_{\lambda(k+1)}, x_{\lambda(k+2)}, \ldots, x_{\lambda(n)}\right)$, and, $\pi$ and $\lambda$ denote the permutation on $\{1,2, \ldots, k\}$ and $\{k+1, k+2, \ldots, n\}$, respectively.

Lemma 3.3.2. An arbitrary function of $n$ variables can be decomposed as

1. $f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{n-1}\right)$ and $X_{2}=\left(x_{n}\right)$.
2. $\left.f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), h_{3}\left(X_{1}\right), h_{4}\left(X_{1}\right), X_{2}\right)\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots\right.$, $\left.x_{n-2}\right)$ and $X_{2}=\left(x_{n-1}, x_{n}\right)$.

Figures 3.3 and 3.4 show the circuits for the above decompositions.
A functional decomposition can be of two types: A disjoint decomposition, where the bound set and the free set are disjoint, as shown in Fig. 3.5. A nondisjoint decomposition, where the bound set and free set have at least one common element. Figure 3.6 shows the case where one variable is shared by the bound set and the free set.

Fig. 3.3 Decomposition of an arbitrary function, where $X_{2}=\left(x_{n}\right)$


Fig. 3.4 Decomposition of an arbitrary function, where $X_{2}=\left(x_{n-1}, x_{n}\right)$

Fig. 3.5 Disjoint decomposition


Fig. 3.6 Nondisjoint
decomposition


### 3.4 Binary Decision Diagram

A binary decision diagram (BDD) [12] is a graphical representation of a logic function. It often has a more compact representation than other methods. Thus, BDDs are widely used in the computer-aided design of logic networks [65].

Definition 3.4.1. A BDD is a directed acyclic graph (DAG) with two terminal nodes: the 0 -terminal node and the 1 -terminal node. Each nonterminal node is labeled by an index of an input variable of the Boolean function, and has two outgoing edges: the 0 -edge and the 1 -edge. An ordered BDD (OBDD) is a BDD such that the input variables appear in a fixed order in all the paths of the graph, and each variable appears at most once in a path.

Fig. 3.7 Node elimination


Fig. 3.8 Node sharing


Definition 3.4.2. A reduced ordered BDD (ROBDD) is obtained from an OBDD by applying the following two reduction rules:

1. Eliminate all the redundant nodes whose two edges point to the same node (Fig. 3.7).
2. Share all the equivalent nodes (Fig. 3.8).

A quasi-reduced ordered BDD (QROBDD) is obtained from an OBDD by applying the second reduction rule only.

Example 3.4.1. Figure 3.9 is a ROBDD for

$$
f\left(x_{1}, x_{2}, x_{3}\right)=\left(\bar{x}_{1} \bar{x}_{2} \vee x_{1} x_{2}\right) x_{3} .
$$

On the other hand, Fig. 3.10 is a QROBDD for the same function. In Fig. 3.10, the left node for $x_{3}$ is redundant, and eliminated in the ROBDD. Note that in an QROBDD, all the paths from the root node to a terminal node encounter all the variables.

Theorem 3.4.1. Let $X=\left(X_{1}, X_{2}\right)$ be a partition of $X$. Suppose that the $Q R O B D D$ for $f(X)$ is partitioned into two blocks as shown in Fig. 3.11. Let $k$ be the number of

Fig. 3.9 Reduced ordered BDD

Fig. 3.10 Quasi-reduced OBDD

Fig. 3.11 Computation of column multiplicity for functional decomposition $f\left(X_{1}, X_{2}\right)=g\left(h\left(X_{1}\right), X_{2}\right)$

the nodes in the lower block that is adjacent to the upper block, and $\mu$ be the column multiplicity of the decomposition chart for $f=g\left(h\left(X_{1}\right), X_{2}\right)$. Then, $k=\mu$.

Example 3.4.2. Consider the function represented by the BDD in Figs. 3.9 and 3.10. Let $X_{1}=\left(x_{1}, x_{2}\right)$ and $X_{2}=\left(x_{3}\right)$. Then, the number of the nodes in the lower block that are adjacent to the upper block is two. The function can be decomposed as $f\left(X_{1}, X_{2}\right)=g\left(h\left(X_{1}\right), X_{2}\right)$, where $h\left(X_{1}\right)=\bar{x}_{1} \bar{x}_{2} \vee x_{1} x_{2}$, and $g\left(h, X_{2}\right)=h x_{3}$. Figure 3.12 shows the decomposition chart. Note that the column multiplicity is two. In Fig. 3.10, the left node for $x_{3}$ represents the constant 0 function, while the right node for $x_{3}$ represents the $x_{3}$ function. It is clear that the number of different column patterns in the decomposition chart is equal to the number of nodes for $x_{3}$ in the QROBDD [106].

Definition 3.4.3. A multiterminal BDD (MTBDD) is an extension of a BDD with multiple terminal nodes, each of which has an integer value.

An MTBDD can be used to represent a multiple-output function.
Example 3.4.3. Figure 3.13 is an MTBDD for 3-valued output function. Note that it is a reduced ordered MTBDD (ROMTBDD), but is not a quasi-reduced ordered MTBDD (QROMTBDD).

Fig. 3.12 Decomposition table of a logic function

|  | 0 | 0 | 1 | 1 | $x_{1}$ |
| :---: | :--- | :--- | :--- | :--- | :--- |
|  | 0 | 1 | 0 | 1 | $x_{2}$ |
| 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 1 |  |
| $x_{3}$ |  |  |  |  |  |

Fig. 3.13 Multiterminal BDD


### 3.5 Symmetric Functions

Functions that appear in arithmetic circuits, such as adders, often have symmetries. When logic functions have certain symmetries, they are often realized using fewer elements.

Definition 3.5.1. A function $f$ is totally symmetric if any permutation of the variables in $f$ does not change the function. A totally symmetric function is also called a symmetric function.

Definition 3.5.2. In a function $f\left(x_{1}, \ldots, x_{i}, \ldots, x_{j}, \ldots, x_{n}\right)$, if the function $f\left(x_{1}, \ldots, x_{j}, \ldots, x_{i}, \ldots, x_{n}\right)$ that is obtained by interchanging variables $x_{i}$ with $x_{j}$ is equal to the original function, then $f$ is symmetric with respect to $x_{i}$ and $x_{j}$. If any permutation of subset $S$ of the variables does not change the function $f$, then $f$ is partially symmetric.

Definition 3.5.3. The elementary symmetric functions of $n$ variables are

$$
\begin{aligned}
& S_{0}^{n}=\bar{x}_{1} \bar{x}_{2} \cdots \bar{x}_{n}, \\
& S_{1}^{n}=x_{1} \bar{x}_{2} \bar{x}_{3} \cdots \bar{x}_{n-1} \bar{x}_{n} \vee \bar{x}_{1} x_{2} \bar{x}_{3} \cdots \bar{x}_{n-1} \bar{x}_{n} \vee \cdots \vee \bar{x}_{1} \bar{x}_{2} \bar{x}_{3} \cdots \bar{x}_{n-1} x_{n}, \\
& \quad \cdots \cdots \cdots \cdots \cdots \cdots \\
& S_{n}^{n}=x_{1} x_{2} \cdots x_{n} .
\end{aligned}
$$

$S_{i}^{n}=1$ iff exactly $i$ out of $n$ inputs are equal to 1 . Let $A \subseteq\{0,1, \ldots, n\}$. A symmetric function $S_{A}^{n}$ is defined as follows:

$$
S_{A}^{n}=\bigvee_{i \in A} S_{i}^{n}
$$

Example 3.5.1. $f\left(x_{1}, x_{2}, x_{3}\right)=x_{1} x_{2} x_{3} \vee x_{1} \bar{x}_{2} \bar{x}_{3} \vee \bar{x}_{1} x_{2} \bar{x}_{3} \vee \bar{x}_{1} \bar{x}_{2} x_{3}$ is a totally symmetric function. $f=1$ when all the variables are 1 , or when only one variable is 1 . Thus, $f$ can be written as $S_{1}^{3} \vee S_{3}^{3}=S_{\{1,3\}}^{3}$.

Theorem 3.5.1. An arbitrary $n$-variable symmetric function $f$ is uniquely represented by elementary symmetric functions $S_{0}^{n}, S_{1}^{n}, \ldots, S_{n}^{n}$ as follows:

$$
f=\bigvee_{i \in A} S_{i}^{n}=S_{A}^{n} \text {, where } A \subseteq\{0,1, \ldots, n\}
$$

Lemma 3.5.1. There are $2^{n+1}$ symmetric functions of $n$ variables.
Definition 3.5.4. Let $S B(n, k)$ be the $n$-variable symmetric function represented by the EXOR sum of all the products consisting of $k$ positive literals:

$$
\begin{aligned}
& S B(n, 0)=1 \\
& S B(n, 1)=\sum_{\oplus}^{\oplus} x_{i} \\
& S B(n, 2)=\sum_{(i<j)}^{\oplus} x_{i} x_{j} \\
& S B(n, 3)=\sum_{(i<j<k)}^{\oplus} x_{i} x_{j} x_{k} \\
& \ldots \ldots \ldots \\
& S B(n, n)=x_{1} x_{2} \cdots x_{n} .
\end{aligned}
$$

Example 3.5.2.

$$
\begin{aligned}
& S B(4,1)=x_{1} \oplus x_{2} \oplus x_{3} \oplus x_{4} . \\
& S B(4,2)=x_{1} x_{2} \oplus x_{1} x_{3} \oplus x_{1} x_{4} \oplus x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4} .
\end{aligned}
$$

$S B(n, k)$ has been used as a benchmark function for an AND-EXOR logic minimizer [104]. The following two lemmas were derived by Komamiya [50] and reformulated by the author [110].
Lemma 3.5.2. Let $x_{1}, x_{2}, \ldots, x_{n}$ be binary variables and $r$ be an integer defined by $r=x_{1}+x_{2}+\cdots+x_{n}$, where + is an ordinary integer addition. Let the binary representation of $r$ be

$$
\left(y_{k}, y_{k-1}, \ldots, y_{1}, y_{0}\right)_{2}, y_{j} \in\{0,1\}(j=0,1, \ldots, k)
$$

In other words,

$$
x_{1}+x_{2}+\cdots+x_{n}=2^{k} y_{k}+2^{k-1} y_{k-1}+\cdots+2 y_{1}+y_{0}
$$

Then,

$$
y_{i}=S B\left(n, 2^{i}\right)
$$

Lemma 3.5.3. Let $0 \leq k_{1}<k_{2}<\cdots<k_{s}$, and $2^{k_{1}}+2^{k_{2}}+\cdots+2^{k_{s}} \leq n$. Then,

$$
\bigwedge_{i=1}^{s} S B\left(n, 2^{k_{i}}\right)=S B\left(n, \sum_{i=1}^{s} 2^{k_{i}}\right) .
$$

Example 3.5.3.

$$
\begin{aligned}
S B(7,1) S B(7,2) S B(7,4) & =S B(7,7), \\
S B(4,1) S B(4,2) & =S B(4,3), \\
S B(6,2) S B(6,4) & =S B(6,6) .
\end{aligned}
$$

Definition 3.5.5. $W G T n$ is an $n$-input $\left\lceil\log _{2}(n+1)\right\rceil$-output function. It counts the number of 1's in the inputs and represents it as a binary number.

Fig. 3.14 WGT7


By Lemma 3.5.2, WGT $n$ produces $S B\left(n, 2^{i}\right),\left(i=0,1,2, \ldots,\left\lceil\log _{2}(n+1)\right\rceil-1\right)$, where $\lceil a\rceil$ denotes the smallest integer greater than or equal to $a$.

Example 3.5.4. WGT7 has $x_{1}, x_{2}, \ldots, x_{7}$ as inputs and $y_{2}, y_{1}, y_{0}$ as outputs (Fig. 3.14). By Lemma 3.5.2, we have

$$
\begin{aligned}
& y_{2}=S B(7,4)=\sum_{i<j<k<l}^{\oplus} x_{i} x_{j} x_{k} x_{l} \\
& y_{1}=S B(7,2)=\sum_{i<j}^{\oplus} x_{i} x_{j} \\
& y_{0}=S B(7,1)=\sum_{i=1}^{7} x_{i}
\end{aligned}
$$

WGT 7 is also called as rd73.
The following is an expansion method for symmetric functions using $S B(n, k)$ functions:

Theorem 3.5.2. An arbitrary $n$-variable symmetric function $f$ is represented by $y_{i}=S B\left(n, 2^{i}\right),(i=0,1,2, . ., t)$ as follows:

$$
f=\sum_{\left(a_{0}, a_{1}, \ldots, a_{t}\right)}^{\bigoplus} g\left(a_{0}, a_{1}, \ldots, a_{t}\right) y_{0}^{a_{0}} y_{1}^{a_{1}} \ldots y_{t}^{a_{t}}
$$

where $g\left(a_{0}, a_{1}, \ldots, a_{t}\right)$ is 0 or 1 , and $t=\left\lceil\log _{2}(n+1)\right\rceil-1$.
Proof. A symmetric function $f$ depends only on the number of 1 's in the inputs. Since WGT $n$ counts the number of 1's in the input, we can represent $f$ as a function of $y_{0}, y_{1}, \ldots, y_{t}$.

Example 3.5.5. A 7-variable symmetric function $S_{0}^{7}$ can be represented as

$$
\begin{aligned}
S_{0}^{7} & =\bar{y}_{2} \bar{y}_{1} \bar{y}_{0} \\
& =\overline{S B(7,4)} \cdot \overline{S B(7,2)} \cdot \overline{S B(7,1)}
\end{aligned}
$$

Note that the binary representation of 0 is $(0,0,0)$. Similarly, $S_{3}^{7}$ is represented as

$$
\begin{aligned}
S_{3}^{7} & =\bar{y}_{2} y_{1} y_{0} \\
& =\overline{S B(7,4)} \cdot \operatorname{SB}(7,2) \cdot S B(7,1)
\end{aligned}
$$

Thus, $S_{\{0,3\}}^{7}$ is represented as

$$
\begin{aligned}
S_{\{0,3\}}^{7} & =S_{0}^{7} \oplus S_{3}^{7}=\bar{y}_{2} \bar{y}_{1} \bar{y}_{0} \oplus \bar{y}_{2} y_{1} y_{0} \\
& =\bar{y}_{2}\left(\bar{y}_{1} \bar{y}_{0} \oplus y_{1} y_{0}\right)=\bar{y}_{2}\left(y_{1} \oplus \bar{y}_{0}\right) \\
& =\overline{S B(7,4)} \cdot(S B(7,2) \oplus \overline{S B(7,1)}) .
\end{aligned}
$$

### 3.6 Technology Mapping

Logic synthesis using embedded memories can be considered as a special case of FPGA design, where the number of LUT inputs is large. Most existing methods use LUT-based technology mapping [33]. That is, given a combinational logic circuit, they partition it into sub-circuits depending on at most $K$ variables [22,53,153]. This method was originally used for the LUT-based FPGA synthesis. Here, we introduce the basic idea.

Definition 3.6.1. [23, 57, 71] A combinational network can be converted into a DAG, where each node represents a logic gate (LUT), a primary input (PI), or a primary output (PO). When the output of the gate $i$ is an input of gate $j$, a directed edge $(i, j)$ exists. $\operatorname{input}(v)$ denotes the set of nodes which are fanins of gate $v$. output( $v$ ) denotes the set of nodes which are fanouts of gate $v$. A cone at $v$, denoted as $C_{v}$, is a subgraph consisting of $v$ and its non-PI predecessors such that any path connecting a node in $C_{v}$ and $v$ lies entirely in $C_{v}$. Node $v$ is the root of the cone. The fanin size of a cone is the number of input edges. A cone with $K$ input edge is $K$-feasible and can be implemented with a $K$-LUT.

Example 3.6.1. In Fig. 3.15, $a, b, c, d, e$ are PIs, and $f$ denotes the PO. The cone of $v$ consists of internal nodes $v, t, u, s, p, q$, and $r$. The fanin size of $C_{v}$ is four. Thus, $C_{v}$ is 4 -feasible and can be implemented by a 4-LUT.

Fig. 3.15 Example of directed acyclic graph


Fig. 3.16 Example of 4-feasible cut-set


Definition 3.6.2. Given a network $N$ with a source $s$ and a $\operatorname{sink} t$, a cut $\left(\mathcal{N}_{1}, \mathcal{N}_{2}\right)$ is a partition of the nodes in the network such that $s \in \mathcal{N}_{1}, t \in \mathcal{N}_{2}$ and no nodes in $\mathcal{N}_{2}$ provide input to any node in $\mathcal{N}_{1}$. A cut-set of a cut is the set of all nodes $v$ such that $v \in \mathcal{N}_{1}$, and $v$ drives a node in $\mathcal{N}_{2}$. If the size of a cut-set is no more than $K$, then the cut is $K$-feasible. A fanout-free cone (FFC) at $v$, denoted $F F C_{v}$, is a cone of $v$, with output edge only originating from the root of the cone. A maximum fanout free cone (MFFC) is an FFC that maximizes the number of nodes contained in the FFC.

Example 3.6.2. In Fig. 3.16, let $a$ be a source, and $f$ be a sink. The set of nodes is partitioned into $\mathcal{N}_{1}=\{a, b, c, d, e, p, q, r, s, t\}$ and $\mathcal{N}_{2}=\{u, v, f\}$. The cut-set of the cut $\left(\mathcal{N}_{1}, \mathcal{N}_{2}\right)$ is $\{t, p, d, e\}$. It is 4-feasible. The MFFC of $f$ is $\{p, q, r, s, t, u, v, f\}$. This is because a cone includes only non-PI nodes, by Definition 3.6.2. If the output of $t$ is connected to an other output, then MMFC of $f$ would be $\{u, v, f\}$.

In a technology mapping algorithm, usually, a given circuit is converted into an equivalent two-input network. If the DAG is represented as a set of trees, then the area minimization problem can be solved optimally using dynamic programming [47]. Unfortunately, most circuits have non-tree structure: There exist many fanouts and reconvergence. If the circuit is decomposed into a set of MFFCs, then it can also be solved optimally. In the approach of [22], the circuit is first mapped into LUTs using the best available algorithm. Then, it extracts large single-output and multipleoutput fanout-free logic blocks and covers them entirely or partially by embedded memories. Since these design methods start from existing circuits, the quality of the solutions are not so good [57]. Improvements for technology mapping are shown in [71]. Especially, an efficient method to enumerate all the cuts up to $K=6$ [71] and a method to compute useful cuts for any number of inputs [72]. In this book, the major tool for the memory-based design is a functional decomposition, so we will not go into the detail of the method.

Table 3.1 Approximation error for $1-x$

| $x$ | $1-x$ | $\mathrm{e}^{-x}$ | $\operatorname{Error}(x)$ |
| :--- | :--- | :--- | :--- |
| 0.001 | 0.999 | 0.99900050 | 0.00000050 |
| 0.010 | 0.990 | 0.99004983 | 0.00005034 |
| 0.100 | 0.900 | 0.90483742 | 0.00537491 |
| 0.200 | 0.800 | 0.81873075 | 0.02341344 |

### 3.7 The Mathematical Constant $e$ and Its Property

Definition 3.7.1. The mathematical constant $e$ is defined as

$$
e=\lim _{n \rightarrow \infty}\left(1+\frac{1}{n}\right)^{n} .
$$

Lemma 3.7.1. When $0<x \ll 1,1-x$ can be approximated by $e^{-x}$.
Proof. The Taylor expansion of a function $f(x)$ is

$$
f(x)=f(0)+\frac{x}{1!} f^{(1)}(0)+\frac{x^{2}}{2!} f^{(2)}(0)+\frac{x^{3}}{3!} f^{(3)}(0)+\cdots+\frac{x^{k}}{k!} f^{(k)}(0)+\cdots .
$$

Thus, $\mathrm{e}^{x}$ can be expanded as

$$
\mathrm{e}^{x}=1+x+\frac{x^{2}}{2!}+\frac{x^{3}}{3!}+\frac{x^{4}}{4!}+\cdots
$$

When $x$ is small, we need only to consider up to the second term, and we have

$$
\mathrm{e}^{-x} \simeq 1-x
$$

Consider the approximation error of Lemma 3.7.1:

$$
\operatorname{Error}(x)=\frac{\mathrm{e}^{-x}-(1-x)}{1-x}
$$

Table 3.1 shows that when $0<x<0.1$, the approximation error is quite small. Lemma 3.7.1 will be extensively used for the calculation of approximate probability in Chaps. 9, 10, and 11.

### 3.8 Remarks

Functional decomposition is the key technique in the memory-based logic synthesis. It efficiently represents a given Boolean function with reduced total amount of memory. Most Boolean functions do not have any functional decomposition [109]. However, practical functions often have functional decompositions. Thus, an
attempt to find functional decompositions is, in many cases, rewarding. Ashenhurst decompositions can be efficiently found by BDDs $[10,60]$. However, as for Curtis decomposition, no efficient methods are known. Some heuristic methods are known [59, 147]. Excellent surveys on FPGA logic synthesis include [20, 78, 103].

## Problems

3.1. Using the definition of $S B(n, k)$, verify the following equation:

$$
S B(4,1) \cdot S B(4,2)=S B(4,3)
$$

3.2. Show that most $2^{2^{k}+2^{n-k+1}}$ functions have Ashenhurst decompositions of the form

$$
f\left(X_{1}, X_{2}\right)=g\left(h\left(X_{1}\right), X_{2}\right),
$$

where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$.
3.3. Show that most $2^{m 2^{k}+2^{n-k+m}}$ functions have Curtis decompositions of the form

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), \ldots, h_{m}\left(X_{1}\right), X_{2}\right),
$$

where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$.
3.4. Let $f(X)$ be the function that counts the number of 1 's in the inputs.

$$
\{0,1\}^{5} \rightarrow\{0,1,2,3,4,5\}
$$

That is, $f(\vec{a})$ denotes the number of 1 's in $\vec{a}$. Write the decomposition chart and obtain the column multiplicity of the decomposition ( $X_{1}, X_{2}$ ), where $X_{1}=$ $\left(x_{1}, x_{2}, x_{3}\right)$ and $X_{2}=\left(x_{4}, x_{5}\right)$.
3.5. How many functions of $n$ variables with weight $k$ exist?
3.6. Represent the symmetric function of 9 variables: $S_{\{3,4,5,6\}}^{9}$ by

$$
\begin{aligned}
& y_{3}=S B(9,8), \\
& y_{2}=S B(9,4), \\
& y_{1}=S B(9,2), \text { and } \\
& y_{0}=S B(9,1),
\end{aligned}
$$

3.7. SYM12 is a symmetric function of 12 variables that is 1 iff the number of 1 's in the inputs is between 4 and 8 . Consider the decomposition of the function SYM12 $=f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{9}\right)$ and $X_{2}=\left(x_{10}, x_{11}, x_{12}\right)$. Show that the column multiplicity of the function with respect to ( $X_{1}, X_{2}$ ) is $\mu_{9}=8$.

## Chapter 4 <br> MUX-Based Synthesis

This chapter shows a universal method to realize an $n$-variable function using multiplexers (MUXs) and look up tables (LUTs). It also derives upper bounds on the number of LUTs to realize an $n$-variable function. Such bounds are useful to estimate the number of LUTs needed to realize a given function when we only know the number of the input variables $n$.

### 4.1 Fundamentals of MUX

Definition 4.1.1. A multiplexer with a single control input (1-MUX) is the selection circuit shown in Fig. 4.1. It performs the logical operation

$$
g\left(x, y_{0}, y_{1}\right)=\bar{x} y_{0} \vee x y_{1} .
$$

A $\boldsymbol{t}$-MUX is shown in Fig.4.2. It is a multiplexer with $t$ control inputs $\left(x_{1}, \ldots, x_{t}\right)$ and $2^{t}$ data inputs $\left(y_{0}, y_{1}, \ldots, y_{2^{t}-1}\right)$. Let $g\left(x_{1}, \ldots, x_{t}, y_{0}, y_{1}, \ldots, y_{2^{t}-1}\right)$ be the output function. Then, $g=y_{a}$ when the decimal representation of the control input $\left(x_{1}, \ldots, x_{t}\right)$ is $a$. That is, when the control input is $(0,0, \ldots, 0)$, the top data input $y_{0}$ drives the output. When the control input is $(0,0, \ldots, 1)$, the second data input $y_{1}$ drives the output. Also, when the control input is $(1,1, \ldots, 1)$, the last data input $y_{2^{t-1}}$ drives the output.

An $t$-MUX can be realized using 1-MUXs.
Example 4.1.1. A 3-MUX is realized with $2^{3}-1=7$ modules of 1 -MUXs, as shown in Fig. 4.3.

Lemma 4.1.1. A $t$-MUX is realized by using $2^{t}-1$ modules of $1-M U X s$.
Proof. This can easily be done by mathematical induction.
When $K \geq 4$, a 2-MUX can be realized with one or two K-LUTs.

Fig. 4.1 1-MUX


Fig. $4.2 t$-MUX


Fig. 4.3 3-MUX realized by 1-MUXs


Lemma 4.1.2. [57, 73] A 2-MUX can be realized by two 4-LUTs as shown in Fig. 4.4, where

$$
g=\left(y_{0} \bar{x}_{2} \vee y_{1} x_{2}\right) \bar{x}_{1} \vee x_{1} x_{2}
$$

and

$$
h=g \bar{x}_{1} \vee\left(y_{2} \bar{g} \vee y_{3} g\right) x_{1} .
$$

Proof. When $x_{1}=0$, we have $g=y_{0} \bar{x}_{2} \vee y_{1} x_{2}$ and $h=g$. Thus, $h=y_{0} \bar{x}_{2} \vee y_{1} x_{2}$.
When $x_{1}=1$, we have $g=x_{2}$ and $h=y_{2} \bar{g} \vee y_{3} g$. Thus, $h=y_{2} \bar{x}_{2} \vee y_{3} x_{2}$.
Therefore, the circuit realizes the 2-MUX function.
Figure 4.4 shows a nondisjoint decomposition. A method to derive this decomposition is considered in Chap. 6.

When $K=5$, a 2-MUX can be realized by two 5-LUTs as shown in Fig. 4.5.
When $K=6$, a 2-MUX can be realized by a single 6-LUT instead of using three 1-MUXs as shown in Fig. 4.6.

Fig. 4.4 2-MUX realized by 4-LUTs


Fig. 4.5 2-MUX realized by 5 -LUTs


Fig. 4.6 2-MUX realized by a 6-LUT


### 4.2 MUX-Based Realization

Theorem 4.2.1. An arbitrary $n$-variable function can be represented as follows:

$$
f\left(X_{1}, X_{2}\right)=\bigvee_{i \in P} g_{i}\left(X_{1}\right) X_{2}^{i},
$$

where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right), P=\{0,1, \ldots$, $\left.2^{n-k}-1\right\}$, and the OR is performed with respect to $2^{n-k}$ elements.

Fig. 4.7 Realization of an arbitrary 7 -variable function using 5-LUTs


Example 4.2.1. Consider the realization of a 7 -variable function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{5}\right)$, and $X_{2}=\left(x_{6}, x_{7}\right) . f$ is expanded into a sum of four products:

$$
\begin{aligned}
f\left(X_{1}, X_{2}\right) & =\bigvee_{i=0}^{3} g_{i}\left(X_{1}\right) X_{2}^{i} \\
& =g_{0}\left(X_{1}\right) X_{2}^{0} \vee g_{1}\left(X_{1}\right) X_{2}^{1} \vee g_{2}\left(X_{1}\right) X_{2}^{2} \vee g_{3}\left(X_{1}\right) X_{2}^{3} .
\end{aligned}
$$

As shown in Fig. 4.7, a 2-MUX can be realized by using three 1-MUXs. The top LUT in the left most column realizes $g_{0}$, which is selected when $\left(x_{6}, x_{7}\right)=(0,0)$. The second LUT in the leftmost column realizes $g_{1}$, which is selected when $\left(x_{6}, x_{7}\right)=(0,1)$. Other LUTs are derived similarly.

Theorem 4.2.2. When $3 \leq K \leq n$, an arbitrary $n$-variable function is realized by using at most $2^{n-K}-1$ modules of 1-MUXs and $2^{n-K}$ modules of $K$-LUTs.

Proof. Consider the expansion of Theorem 4.2.1. First, realize an $(n-K)$-MUX by using 1-MUXs. By Lemma 4.1.1, we need $2^{n-K}-1$ modules of 1-MUXs. Next, by connecting $g_{i}\left(X_{1}\right)$ to the data inputs of the $(n-K)$-MUX, realize an arbitrary $n$-variable function. To realize $g_{i}\left(X_{1}\right)\left(i=0,1, \ldots, 2^{n-K}-1\right)$, we use $2^{n-k} \bmod$ ules of $K$-LUTs.

Next, consider several special cases.
Lemma 4.2.1. An arbitrary function of $n=K+1$ variables can be realized with at most three $K$-LUTs, where $K \geq 3$.

Proof. Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{K}\right)$ and $X_{2}=\left(x_{K+1}\right)$. Then, the function can be represented as

$$
f\left(X_{1}, X_{2}\right)=\bar{x}_{K+1} f\left(X_{1}, 0\right) \vee x_{K+1} f\left(X_{1}, 1\right)
$$

Fig. 4.8 Realization of a $k+1$
variable function


Fig. 4.9 Realization of a $\mathrm{k}+2$ variable function


Thus, $f$ can be realized by three $K$-LUTs as shown in Fig. 4.8. Note that the left cell realizes $f\left(X_{1}, 0\right)$ and $f\left(X_{1}, 1\right)$, while the right cell works as a selector (1-MUX) to realize $f$. The integer in a cell denotes the number of LUTs to realize the cell. This is an LUT cascade, which will be explained in the next chapter.

Lemma 4.2.2. An arbitrary function of $n=K+2$ variables can be realized with at most five $K$-LUTs, where $K \geq 6$.

Proof. Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{K}\right)$ and $X_{2}=\left(x_{K+1}, x_{K+2}\right)$. Then, the function can be represented as

$$
\begin{aligned}
f\left(X_{1}, X_{2}\right)= & \bar{x}_{k+1} \bar{x}_{K+2} f\left(X_{1}, 0,0\right) \vee \bar{x}_{K+1} x_{K+2} f\left(X_{1}, 0,1\right) \vee \\
& x_{K+1} \bar{x}_{K+2} f\left(X_{1}, 1,0\right) \vee x_{K+1} x_{K+2} f\left(X_{1}, 1,1\right) .
\end{aligned}
$$

Thus, $f$ can be realized by five $K$-LUTs, as shown in Fig. 4.9. Note that the left cell generates $f\left(X_{1}, 0,0\right), f\left(X_{1}, 0,1\right), f\left(X_{1}, 1,0\right)$, and $f\left(X_{1}, 1,1\right)$, while the right cell serves as a selector (2-MUX) to realize $f$. Figure 4.9 can be considered as a simplified representation of Fig.4.7.

Theorem 4.2.3. [110] The number of 6-LUTs to realize an arbitrary $n$-variable function $(n \geq 6) f$ is:

- $\left(2^{n-4}-1\right) / 3$ or less, when $n$ is even.
- $\left(2^{n-4}+1\right) / 3$ or less, when $n$ is odd.

Proof. Case 1: $n$ is even $(n=2 r)$ :
We realize the function $f$ in the form of Theorem 4.2.1, where $K=6$. First, realize a ( $n-6$ )-MUX by using 2 -MUXs. This requires

$$
1+4+\cdots+4^{\frac{n-6}{2}-1}=1+4+\cdots+4^{r-4}=\frac{4^{r-3}-1}{4-1}
$$

6-LUTs. Next, realize $g_{i}\left(X_{1}\right)\left(i=0,1, \ldots, 2^{n-k}-1\right)$. This requires $4^{r-3}$ modules of 6-LUTs. So, the total number of 6-LUTs is

$$
\frac{4^{r-3}-1}{3}+4^{r-3}=\frac{4^{r-2}-1}{3}=\frac{2^{n-4}-1}{3} .
$$

Case 2: $n$ is odd $(n=2 r+1)$ :
The function $f$ can be expanded into the form

$$
\begin{equation*}
f\left(X_{1}, x_{n}\right)=\bar{x}_{n} g_{0}\left(X_{1}\right) \vee x_{n} g_{1}\left(X_{1}\right), \tag{4.1}
\end{equation*}
$$

where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{n-1}\right)$. Since $g_{i}\left(X_{1}\right)(i=0,1)$ are functions with $2 r$ variables, they can be realized by $\left(4^{r-2}-1\right) / 3$ modules of 6 -LUTs. To realize the expansion (4.1), we use a 1-MUX. Thus, the total number of 6-LUTs to realize $f$ is

$$
2 \cdot \frac{4^{r-2}-1}{3}+1=\frac{2 \cdot 4^{r-2}+1}{3}=\frac{2^{n-4}+1}{3} .
$$

Example 4.2.2. The number of 6-LUTs to realize an $n$-variable function is:

- 5 or less, when $n=8$. In this case, $g_{i}\left(X_{1}\right)(i=0,1,2,3)$ are realized by four modules of 6-LUTs, while the 2 -MUX is realized by a single 6 -LUT, as shown in Fig. 4.10. In the figure, the numbers in the squares denote the numbers of necessary LUTs.
- 11 or less, when $n=9$. In this case, $g_{i}\left(X_{1}\right)(i=0,1,2, \ldots, 7)$ are realized by 8 modules of 6 -LUTs, while the 3 -MUX is realized by three 6 -LUTs as shown in Fig. 4.11.
- 21 or less, when $n=10$. In this case, $g_{i}\left(X_{1}\right)(i=0,1,2, \ldots, 15)$ are realized by 16 modules of 6 -LUTs, while the 4 -MUX is realized by using five 6 -LUTs as shown in Fig. 4.12.

Theorem 4.2.4. Consider the function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}, \ldots, x_{K+3}, x_{K+4}\right)$. Let $\mu$ be the column multiplicity of the decomposition $f\left(X_{1}, X_{2}\right)$, where $X_{1}$ denotes the bound variables. Then, $f$ can be realized with at most $\mu+5$ modules of $K$-LUTs, where $K \geq 6$.

Proof. The function $f\left(X_{1}, X_{2}\right)$ can be expanded as

$$
f\left(X_{1}, X_{2}\right)=g_{0}\left(\vec{a}_{0}, X_{2}\right) \vee g_{1}\left(\vec{a}_{1}, X_{2}\right) \vee g_{2}\left(\vec{a}_{2}, X_{2}\right) \vee \cdots \vee g_{15}\left(\vec{a}_{15}, X_{2}\right),
$$

Fig. 4.10 Realization of an arbitrary 8 -variable function using 6-LUTs


Fig. 4.11 Realization of an arbitrary 9 -variable function using 6-LUTs


Fig. 4.12 Realization of an arbitrary 10 -variable function using 6-LUTs

where $\vec{a}_{0}=(0,0,0,0), \vec{a}_{1}=(0,0,0,1), \vec{a}_{2}=(0,0,1,0), \ldots$, and $\vec{a}_{15}=(1,1,1,1)$. Thus, $f\left(X_{1}, X_{2}\right)$ can be realized as the circuit shown in Fig. 4.12. Since the column multiplicity is $\mu$, the number of different column functions $g_{i}\left(\vec{a}_{i}, X_{2}\right)$ is $\mu$. So, in Fig. 4.12, the LUTs producing the same functions can be shared, and only $\mu$ LUTs are sufficient to produce $g_{i}\left(\vec{a}_{i}, X_{2}\right)$.

### 4.3 Remarks

This chapter derived the number of 6-LUTs to realize an $n$-variable logic function. The number of required LUTs increases exponentially with $n$. So, the MUX-based design is only practical for the functions with a small number of inputs. However, the MUX-based method sometimes produces circuits with fewer LUTs [134] than existing methods [73, 76], in particular, for random logic functions. In this chapter, we represented the function by Shannon expansion. However, if we use
pseudo-Kronecker expansion [110], we can reduce the number of LUTs by $23 \%$ [107]. In the pseudo-Kronecker expansion, we can select one from 840 possible expansions to reduce the number of LUTs.

## Problems

4.1. Consider a 10 -variable function $f(X)$. Let $\left(X_{1}, X_{2}\right)$ be a partition of the variables $X$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}\right)$ and $X_{2}=\left(x_{7}, x_{8}, x_{9}, x_{10}\right)$. Assume that $\mu_{6}=16$. That is, the column multiplicity of the decomposition chart is 16 . Show that $f$ can be realized with at most nine 6-LUTs.
4.2. Let $\left(X_{1}, X_{2}\right)$ be the partition of the variables $X$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}, x_{7}, x_{8}, x_{9}, x_{10}\right)$. Suppose that the column multiplicity is 10 , i.e., $\mu_{4}=10$. Then, show that $f$ can be realized with at most 15 copies of 6-LUTs.
4.3. Show that an arbitrary logic function can be represented as

$$
\begin{equation*}
f\left(x_{1}, x_{2}, Y\right)=g_{0}(Y) \oplus x_{1} g_{1}(Y) \oplus x_{2} g_{2}(Y) \oplus x_{1} x_{2} g_{3}(Y) \tag{4.2}
\end{equation*}
$$

This is the Reed-Muller expansion. Consider the Shannon expansion:

$$
\begin{equation*}
f\left(x_{1}, x_{2}, Y\right)=\bar{x}_{1} \bar{x}_{2} f_{0}(Y) \oplus \bar{x}_{1} x_{2} f_{1}(Y) \oplus x_{1} \bar{x}_{2} f_{2}(Y) \oplus x_{1} x_{2} g_{3}(Y) \tag{4.3}
\end{equation*}
$$

Represent $g_{0}(Y), g_{1}(Y), g_{2}(Y)$, and $g_{3}(Y)$ by $f_{0}(Y), f_{1}(Y), f_{2}(Y)$, and $f_{3}(Y)$.
4.4. Consider the function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{2 k}\right)$ and $X_{2}=$ $\left(x_{2 k+1}, x_{2 k+2}, \ldots, x_{2 k+6}\right)$. Let $\mu$ be the column multiplicity of the decomposition $f\left(X_{1}, X_{2}\right)$, where $X_{1}$ denotes the bound variables. Then, show that $f$ can be realized with at most

$$
\mu+\frac{4^{k}-1}{3}
$$

6-LUTs.

## Chapter 5 <br> Cascade-Based Synthesis

The previous chapter presented a multiplexer (MUX)-based realization. Although such a method is applicable to any $n$-variable function $f$, the number of LUTs necessary to realize $f$ increases as $O\left(2^{n}\right)$. This chapter considers a cascade-based logic synthesis. A cascade-based realization is applicable to only a limited class of functions. However, functions with a small C-measure can be realized by cascadebased realizations with $O(n)$ LUTs.

### 5.1 Functional Decomposition and LUT Cascade

Before considering the general case, we review special cases.
Lemma 5.1.1. An arbitrary function of $n=K+1$ variables can be realized with at most three $K$-LUTs, where $K \geq 3$.

Proof. This is the same as Lemma 4.2.1.
Lemma 5.1.2. An arbitrary function of $n=K+2$ variables can be realized with at most five $K$-LUTs, where $K \geq 6$.

Proof. This is the same as Lemma 4.2.2.
From the definition of a decomposition chart, we have the following:
Theorem 5.1.1. [8] Let $\mu_{k}(n)$ be the column multiplicity of a decomposition chart of an $n$-variable logic function with $k$ bound variables. Then,

$$
\mu_{k}(n) \leq \min \left\{2^{k}, 2^{2^{n-k}}\right\} .
$$

When circuits are designed by LUTs, functions with smaller column multiplicities tend to have smaller realizations.

Definition 5.1.1. Let $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ be a logic function. The profile of the function $f$ is the vector $\left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$, where $\mu_{k}$ denotes the column multiplicity of the decomposition chart for $f\left(X_{1}, X_{2}\right), X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and
$X_{2}=\left(x_{k+1}, \ldots, x_{n}\right)$, assuming that the order of variables $\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ is fixed. The $\mathbf{C}$-measure of the function $f$ is $\max \left(\mu_{1}, \mu_{2}, \ldots, \mu_{n}\right)$ and is denoted by $\mu(f)$.

Note that the order of the variables will affect the C-measure, but we choose the natural order $\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ of the input variables.

Lemma 5.1.3. Let $f$ be an arbitrary $n$-variable function. Then,

$$
\mu(f) \leq \max _{k=1}^{n} \min \left\{2^{k}, 2^{2^{n-k}}\right\} .
$$

For any partition ( $X_{1}, X_{2}$ ) of $X$, we have the decomposition shown in Fig. 5.1. By repeatedly applying functional decompositions to a given function $f(X)=$ $f\left(X_{1}, X_{2}, \ldots, X_{s}\right)$, we have an LUT cascade [113] shown in Fig. 5.2. An LUT cascade consists of cells. The signal lines connecting adjacent cell are rails. A logic function with a small C-measure can be realized by a compact LUT cascade.

Lemma 5.1.4. [113] An arbitrary logic function $f$ can be realized by an LUT cascade, whose cells have at most $\left\lceil\log _{2} \mu(f)\right\rceil+1$ inputs, and at most $\left\lceil\log _{2} \mu(f)\right\rceil$ outputs, where $\mu(f)$ is the $C$-measure of $f$.

Lemma 5.1.5. [127] In an LUT cascade that realizes an $n$-variable function $f$, let $s$ be the number of cells; $w=\left\lceil\log _{2} \mu(f)\right\rceil$ be the maximum number of rails; $K$ be the number of inputs to a cell; $n \geq K+1$; and $K \geq\left\lceil\log _{2} \mu(f)\right\rceil+1$. Then, an LUT cascade satisfying the following condition exists:

$$
\begin{equation*}
s=\left\lceil\frac{n-w}{K-w}\right\rceil . \tag{5.1}
\end{equation*}
$$

Proof. From the design method of the LUT cascade, we have

$$
K+(K-w)(s-1) \geq n
$$

Here, $K$ on the left-hand side of the equality denotes the number of inputs of the leftmost LUT, and $(K-w)(s-1)$ denotes the sum of inputs for the remaining $(s-1)$

Fig. 5.1 Realization of a logic function by decomposition


Fig. 5.2 LUT cascade


Fig. 5.3 Realization
of a 9-variable function
with C-measure
not exceeding 8


LUTs. When the actual number of rails is smaller than $w$, we append dummy rails to make the number of rails $w$. From this, we have

$$
s-1 \geq \frac{n-K}{K-w}, \text { and } s \geq \frac{n-w}{K-w} .
$$

Since $s$ is an integer, we have (5.1). When this is the case, we can realize an LUT cascade for $f$ having $s$ cells with at most $K$ inputs.

Example 5.1.1. Let $f\left(X_{1}, X_{2}\right)$ be a 9 -variable function, where $X_{1}=\left(x_{1}, x_{2}\right.$, $\left.\ldots, x_{6}\right)$ and $X_{2}=\left(x_{7}, x_{8}, x_{9}\right)$. Let $\mu_{6}$ be the column multiplicity of the decomposition of $f$ with respect to $\left(X_{1}, X_{2}\right)$. If $\mu_{6} \leq 8$, then $f\left(X_{1}, X_{2}\right)$ can be realized with four 6-LUTs, as shown in Fig. 5.3. Note that the number of rails between two cells is $\left\lceil\log _{2} 8\right\rceil=3$, by Theorem 3.3.1. In many cases, the natural ordering of the input variables does not yield the smallest circuit. To check if $\mu_{6} \leq 8$ for all arrangements of variables, we need to compute the column multiplicities for $\binom{9}{6}=84$ combinations of variables.

Lemma 5.1.6. Consider a cascade consisting of K-LUTs.

1. When the number of the external input variables to the output LUT is one, the number of the rail inputs to the LUT is at most two.
2. When the number of the external input variables to the output LUT is two, the number of the rail inputs to the LUT is at most four.

Proof. We prove the second case only. The proof for the first case is similar. Let $x_{n-1}$ and $x_{n}$ be external input variables for the output LUT. Consider the decomposition chart, where the rail inputs $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{n-2}\right)$ denotes the set of bound variables and the external input $X_{2}=\left(x_{n-1}, x_{n}\right)$ denotes the set of free variables. In this case, the column multiplicity is at most 16 , since there exist at most $2^{2^{2}}=16$ different column functions by Theorem 5.1.1. Also by Theorem 3.3.1, the number of the rail inputs to the output LUT is at most four.

### 5.2 Number of LUTs to Realize General Functions

As for realizations by 6-LUTs, we have the following:
Theorem 5.2.1. The number of 6-LUTs needed to realize an arbitrary n-variable function with $\mu(f) \leq 32$ is $5 n-35$ or less, where $n \geq 8$.

Fig. 5.4 Realization with 6-LUTs


Proof. From Lemma 5.1.4, an arbitrary function with $\mu(f) \leq 32$ can be realized by an LUT cascade, whose cells have at most $\left\lceil\log _{2} \mu\right\rceil+1=\log _{2}(32)+1=6$ inputs, and $w=\log _{2}(32)=5$ outputs. Let $K=6$. Then, from Lemma 5.1.5, the number of cells is at most $\left\lceil\frac{n-w}{K-w}\right\rceil=\frac{n-5}{6-5}=n-5$. Note that each cell except for the rightmost cell has at most 5 outputs. From Lemma 5.1.6, the second cell from the right has at most four outputs as shown in Fig. 5.4. So, the total number of cells is at most $n-6$. Note that the second cell from the right has 4 outputs, while the rightmost cell has just one output. Therefore, the total number of LUTs is at most $5(n-8)+4+1=5 n-35$.

Theorem 5.2.2. The number of 6 -LUTs needed to realize an arbitrary $n$-variable function with $\mu(f) \leq 16$ is $2 n-11$ or less, where $n \geq 8$.

Proof. Let $w=\log _{2}(16)=4$ and $K=6$. From Lemma 5.1.5, we have $s \leq\left\lceil\frac{n-w}{K-w}\right\rceil$ $=\frac{n-4}{6-4}=\left\lceil\frac{n-4}{2}\right\rceil$.

When $n=2 r$, each cell except for the rightmost cell has at most 4 outputs. So, the total number of LUTs is at most $4 \times\left(\left\lceil\frac{n-4}{2}\right\rceil-1\right)+1=2(n-4)-4+1=2 n-11$.

When $n=2 r+1$, by Lemma 5.1.6, the rightmost cell has one external input and at most two rail inputs, and the second cell from the right has at most two outputs. So, the total number of LUTs is at most $4 \times\left(\left\lceil\frac{n-5}{2}\right\rceil-1\right)+2+1=2 n-11$.

Theorem 5.2.3. The number of $6-L U T s$ needed to realize an arbitrary $n$-variable function with $\mu(f) \leq 8$ is $n-5$ or less when $n=3 r$, and $n-4$ or less when $n \neq 3 r$, where $n \geq 8$.

Proof. Let $w=\log _{2} 8=3$ and $K=6$. From Lemma 5.1.5, we have $s \leq\left\lceil\frac{n-w}{K-w}\right\rceil=$ $\left\lceil\frac{n-3}{3}\right\rceil$.

When $n=3 r$, each cell except for the rightmost cell has at most 3 outputs. So, the total number of LUTs is at most $3 \times\left(\left\lceil\frac{n-3}{3}\right\rceil-1\right)+1=(n-3)-3+1=n-5$.

When $n=3 r+1$, the rightmost cell has one external input and at most two rail inputs, and the second cell from the rightmost one has at most 2 outputs. So, the total number of LUTs is at most $3 \times\left(\left\lceil\frac{n-4}{3}\right\rceil-1\right)+2+1=(n-4)-3+3=n-4$.

When $n=3 r+2$, the rightmost cell has two external inputs. So, the total number of LUTs is at most $3 \times\left(\left\lceil\frac{n-5}{3}\right\rceil-1\right)+3+1=(n-5)-3+4=n-4$.

Lemma 5.2.1. The number of $K$-LUTs needed to realize $a(K+1)$-variable $k$ output function $F$ with $\mu(F) \leq 2^{K-1}+2^{K-3}$ is $3 K$ or less.

Proof. Consider the circuit shown in Fig. 5.5. Since the column multiplicity is at most $2^{K-1}+2^{K-3}$, each pattern can be uniquely represented by a $K$-bit code. By using the leftmost cell $D$, generate $K$-bit codes that correspond to the column patterns. For the first $2^{K-1}$ patterns, assign $K$-bit codes with the form $(0, *, *, \ldots, *)$, where $*$ denotes either 0 or 1 . For the remaining $2^{K-3}$ patterns, assign $K$-bit codes with the form $(1,0,0, *, *, \ldots, *)$. Cell $A$ in Fig. 5.5 implements the function for the codes $(0, *, *, \ldots, *)$, while cell $B$ implements the function for the codes $(1,0,0, *, \ldots, *)$. Cell $C$ is used for a selector, which is controlled by the most significant bit of the outputs of cell $D$. In this way, an arbitrary $K$ output function can be realized by the circuit shown in Fig. 5.5. Note that cell $B$ has $K-2$ inputs, and cell $C$ has three inputs. Cells $B$ and $C$ can be merged and realized by a $K$-input LUT. Thus, the total number of LUTs to implement this circuit is at most $3 K$.

Theorem 5.2.4. Let $K \geq 6$ and $n \geq K+3$. The number of $K$-LUTs to realize an $n$-variable function $f$ with $\mu(f) \leq 2^{K-1}+2^{K-3}$ is $2 K(n-K)-5 K+9$ or less.

Proof. Consider the cascade shown in Fig.5.6. Since $\left\lceil\log _{2} \mu\right\rceil \leq K$, the function can be realized by a cascade with at most $K$-rails. Also, by Lemma 5.2.1, each of the intermediate cells can be realized with at most $2 K$ LUTs. By Lemma 5.1.6, the rightmost cell has at most four rail inputs and two external inputs. Also, note that the second cell from the right can be implemented with at most 8 LUTs. Note that

Fig. 5.5 Realization of ( $K+1$ )-variable $K$-output function


Fig. 5.6 Cascade realization in the proof
the number of outputs is at most four, and the number of inputs is $K+1$. We can prove that 8 LUTs is sufficient in a similar way to the proof of Lemma 5.2.1. The leftmost cell can be implemented with $K$ LUTs. Thus, the total number of LUTs is $K+2 K(n-K-3)+9=2 K(n-K)-5 K+9$.

## Theorem 5.2.5.

- The number of 6-LUTs needed to realize an arbitrary n-variable function with $\mu(f) \leq 40$ is $12 n-93$ or less, where $n \geq 9$.
- The number of 7-LUTs needed to realize an arbitrary n-variable function with $\mu(f) \leq 80$ is $14 n-124$ or less, where $n \geq 10$.
- The number of 8-LUTs needed to realize an arbitrary n-variable function with $\mu(f) \leq 160$ is $16 n-159$ or less, where $n \geq 11$.


### 5.3 Number of LUTs to Realize Symmetric Functions

When the given function is symmetric, it can be realized more efficiently than a general function [106,112]. Efficient algorithms to detect symmetric functions exist, e.g., $[70,91]$.

Lemma 5.3.1. Let $f$ be a symmetric function of $n$-variables. Then, $\mu(f) \leq n+1$.
Proof. Consider the partition of variables $\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Let $\mu_{k}$ be the column multiplicity of the decomposition. Since $f$ is symmetric, the column labels with the same weights have the same column patterns in the decomposition chart. Thus, the column multiplicity is at most $k+1$. From this, we have the lemma.

Example 5.3.1. Consider a symmetric function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$ and $X_{2}=\left(x_{4}, x_{5}, x_{6}\right)$. This is an example for $k=3$. In this case, the number of columns is $2^{3}=8$. The column label with weight 0 is $(0,0,0)$. The column labels with weight 1 are $(1,0,0),(0,1,0)$, and $(0,0,1)$. They have the same column functions. The column labels with weight 2 are $(1,1,0),(1,0,1)$, and $(0,1,1)$. They have the same column functions. And the column label with weight 3 is $(1,1,1)$. Thus, the number of different column patterns is at most four.

In Chap. 7, tighter bounds are derived.
Theorem 5.3.1. The number of $K$-LUTs needed to realize an $n$-variable symmetric function is:

- 4 or less, when $n=9$ and $K=6$. Figure 5.3 shows the realization.
- 7 or less, when $n=12$ and $K=6$. Figure 5.7 shows the realization.
- 13 or less, when $n=15$ and $K=6$. Figure 5.8 shows the realization.

Example 5.3.2. Consider SYM12 [110], a symmetric function of 12 variables. SYM12 is 1 iff the number of 1's in the inputs is between 4 and 8 .

Fig. 5.7 Realization of a symmetric function of 12 variables by 6-LUTs


Fig. 5.8 Realization of a symmetric function of 15 variables by 6-LUTs


Fig. 5.9 SYM12 realized by 6-LUTs


1. The column multiplicity of the function with respect to the bound set composed of $K=6$ variables is $\mu_{6}=7$. Thus, the number of rails between two blocks is $\left\lceil\log _{2} 7\right\rceil=3$.
2. We decompose the function with respect to the bound set: $x_{1}, x_{2}, \ldots, x_{6}$. Realize the first cell using 6-LUTs, which corresponds to the leftmost cell in Fig. 5.9.
3. The remaining function has $12-6+3=9$ variables. The bound variables for the second decomposition are three outputs of the leftmost cell, and $x_{7}, x_{8}, x_{9}$. In this case, the column multiplicity is $\mu_{9}=8$ (Problem 3.8). Thus, the number of rails between two blocks is $\left\lceil\log _{2} 8\right\rceil=3$.
4. We decompose the remaining function with the bound set: three outputs of the leftmost cell, and $x_{7}, x_{8}, x_{9}$. Realize the second cell using 6-LUTs, which corresponds to the middle cell in Fig. 5.9.
5. The remaining function has $9-6+3=6$ variables. Since the number of remaining variables is equal to $K=6$, realize the function by a 6 -LUT, which corresponds to the rightmost cell in Fig. 5.9.
6. In this way, SYM12 is realized by $3+3+1=7$ LUTs of 6 -inputs. Note that this realization is different from that shown in Fig. 5.7.

### 5.4 Remarks

In this chapter, we showed a method to realize a given function by using a cascade of LUTs. This method is only applicable to the functions whose C-measures are small.

The C-measure of a logic function $f$ is related to the size of its BDD. Sizes of BDDs for various classes of functions are considered in [154]. Classes of functions having small C-measures are considered in Chap. 7.

LSIs for LUT cascades have been fabricated [86-88]. This chapter is based on [134].

## Problems

5.1. Compare the tree-type realization in Fig. 5.7 with the cascade realization in Fig. 5.9. Discuss their advantages and disadvantages.
5.2. Consider the 4 -bit adder, where $x_{3}, x_{2}, x_{1}, x_{0}$ and $y_{3}, y_{2}, y_{1}, y_{0}$ denote the inputs, and $z_{4}, z_{3}, z_{2}, z_{1}, z_{0}$ denote the outputs. Design the adder using an LUT cascade. Use 6-LUTs. Show the expression of the output function for each LUT.
5.3. Design a 12 -input 4 -output circuit that counts the number of 1 's in the inputs and represents this by a binary number (i.e., WGT12) by 6-LUTs.
5.4. Consider a set of three functions $f_{i}$ with 7 variables. Assume that $\mu\left(f_{i}\right) \leq 40$ for $i=1,2,3$. Realize these functions by 6-LUTs, using the design method shown in the proof of Lemma 5.2.1.
5.5. Let $\mu_{k}(n)$ be the column multiplicity of a decomposition chart of an $n$-variable function with $k$ bound variables. Show the following relations:

$$
\begin{gathered}
\mu_{k+1}(n) \leq 2 \mu_{k}(n) \\
\mu_{k-1}(n) \leq \mu_{k}^{2}(n)
\end{gathered}
$$

5.6. Enumerate the 8 -variable functions whose C -measures are 32 .

## Chapter 6 <br> Encoding Method

This chapter shows a method to reduce the number of LUTs needed to realize logic functions with nonstandard encodings. In these encodings, intermediate variables in functional decomposition are represented with fewer variables. This method offers a way to find a nondisjoint decomposition.

### 6.1 Decomposition and Equivalence Class

Definition 6.1.1. Let $f\left(X_{1}, X_{2}\right)$ be a logic function and ( $X_{1}, X_{2}$ ) be a partition of $X .\left|X_{1}\right|$ denotes the number of variables in $X_{1}$. Let $B=\{0,1\}$. When $n_{1}=\left|X_{1}\right|$ and $n_{2}=\left|X_{2}\right|$, an equivalence relation $\sim$ on $B^{n_{1}}$ is defined as follows: $\vec{a} \sim \vec{b} \Longleftrightarrow$ $f\left(\vec{a}, X_{2}\right)=f\left(\vec{b}, X_{2}\right)$, where $\vec{a}, \vec{b} \in B^{n_{1}}$. Let the equivalence classes of $B^{n_{1}}$ be $\Psi_{0}, \Psi_{1}, \ldots, \Psi_{\mu-1}$. In this case, $\mu$ is equal to the column multiplicity in the decomposition chart of $f$ with the partition $\left(X_{1}, X_{2}\right) . \Psi_{i}$ is also used to represent the corresponding logic function.

Example 6.1.1. Consider the function

$$
f\left(X_{1}, X_{2}\right)=y_{0} \bar{x}_{1} \bar{x}_{2} \vee y_{1} \bar{x}_{1} x_{2} \vee y_{2} x_{1} \bar{x}_{2} \vee y_{3} x_{1} x_{2},
$$

where $X_{1}=\left(x_{1}, x_{2}, y_{0}, y_{1}\right)$ and $X_{2}=\left(y_{2}, y_{3}\right)$. The decomposition chart is shown in Fig. 6.1. The logic functions for the various equivalence classes are

$$
\begin{aligned}
& \Psi_{0}=\bar{x}_{1} \bar{x}_{2} \bar{y}_{0} \vee \bar{x}_{1} x_{2} \bar{y}_{1}=\bar{x}_{1}\left(\bar{x}_{2} \bar{y}_{0} \vee x_{2} \bar{y}_{1}\right), \\
& \Psi_{1}=x_{1} x_{2}, \\
& \Psi_{2}=x_{1} \bar{x}_{2}, \text { and } \\
& \Psi_{3}=\bar{x}_{1} \bar{x}_{2} y_{0} \vee \bar{x}_{1} x_{2} y_{1}=\bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right) .
\end{aligned}
$$

Note that $\Psi_{0}$ denotes the logic function for the equivalence class of the column vector $(0,0,0,0)^{t}$, where the symbol $t$ denotes the transpose of the vector. Similarly, $\Psi_{1}$ corresponds to $(0,1,0,1)^{t}, \Psi_{2}$ corresponds to $(0,0,1,1)^{t}$, and $\Psi_{3}$ corresponds to $(1,1,1,1)^{t}$.

|  |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | $x_{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $y_{0}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | $y_{1}$ |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |  |
| 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  |
| $y_{2}$ | $y_{3}$ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Fig. 6.1 Example of a decomposition chart

### 6.2 Disjoint Encoding

Definition 6.2.1. In a functional decomposition, the minimum length encoding uses $\left\lceil\log _{2} \mu\right\rceil$ bits to encode equivalence classes: $\Psi_{0}, \Psi_{1}, \ldots, \Psi_{\mu-1}$, where $\lceil a\rceil$ denotes the minimum integer greater than $a$.

Since $\Psi_{i}(i=0,1, \ldots, \mu-1)$ represents equivalence classes, $\Psi_{i}$ has the following properties: $\Psi_{i} \cdot \Psi_{j}=0(i \neq j)$ and $\vee_{i=0}^{\mu-1} \Psi_{i}=1$.

Let the intermediate variables be $h_{1}, h_{2}, \ldots, h_{u}$, where $u=\left\lceil\log _{2} \mu\right\rceil$. Suppose that $\left(h_{u}, h_{u-1}, \ldots, h_{2}, h_{1}\right)$ denotes the index showing the equivalence class. In this encoding, all the vectors in an equivalence class are assigned the same codes. Such an encoding is a disjoint encoding. ${ }^{1}$

Example 6.2.1. Consider the function in Example 6.1.1. Note that $\mu=4$. When we use disjoint encoding:
$\Psi_{0}$ is coded by 00,
$\Psi_{1}$ is coded by 01,
$\Psi_{2}$ is coded by 10, and
$\Psi_{3}$ is coded by 11

In this case, we have

$$
\begin{aligned}
& h_{1}=\Psi_{1} \vee \Psi_{3}=x_{1} x_{2} \vee \bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right), \text { and } \\
& h_{2}=\Psi_{2} \vee \Psi_{3}=x_{1} x_{2} \vee \bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right) .
\end{aligned}
$$

Note that

$$
\begin{aligned}
\bar{h}_{2} \bar{h}_{1} & =\Psi_{0}, \\
\bar{h}_{2} h_{1} & =\Psi_{1}, \\
h_{2} \bar{h}_{1} & =\Psi_{2}, \text { and } \\
h_{2} h_{1} & =\Psi_{3} .
\end{aligned}
$$

[^0]However, even if $x_{1}$ is realized instead of the function $h_{2}=\Psi_{2} \vee \Psi_{3}$, we can still represent the equivalence class as follows:

$$
\begin{aligned}
& \bar{x}_{1} \bar{h}_{1}=\bar{x}_{1}\left(\bar{x}_{2} \bar{y}_{0} \vee x_{2} \bar{y}_{1}\right)=\Psi_{0}, \\
& \bar{x}_{1} h_{1}=\bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right)=\Psi_{3}, \\
& x_{1} \bar{h}_{1}=x_{1} \bar{x}_{2}=\Psi_{2}, \text { and } \\
& x_{1} h_{1}=x_{1} x_{2}=\Psi_{1} .
\end{aligned}
$$

The above example shows that an appropriate encoding can simplify intermediate variables. The next section shows a systematic method to simplify intermediate variables. Here, we assume that a function is simpler if it can be represented with fewer variables.

### 6.3 Nondisjoint Encoding

In a disjoint encoding, all the vectors in an equivalence class are assigned to the same code. However, in general, we can use the code where the vectors in the same equivalence class may be assigned to different codes, as long as the vectors in the different classes are assigned to different codes. Such an encoding is a nondisjoint encoding. This often simplifies intermediate variables.

Various methods exist to encode equivalence classes: $\Psi_{0}, \Psi_{1}, \ldots$, and $\Psi_{\mu-1}$. In this chapter, we use the encoding that simplifies the intermediate variable $h_{u}$. If we can design an encoding such that $h_{u}\left(X_{1}\right)=x_{i}$, then the LUT for $h_{u}$ is not needed, since $x_{i}$ is available as an input variable.

Example 6.3.1. Consider a 7 -variable function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}, x_{7}\right)$. Assume that $f$ is partially symmetric with respect to $X_{1}$. In this case, the column multiplicity $\mu$ of the decomposition chart for $f\left(X_{1}, X_{2}\right)$ is at most 5 , since it is sufficient to identify if $0,1,2,3$, and 4 of $x_{1}, x_{2}, x_{3}$, and $x_{4}$ are 1 . Since $\mu \leq 5$ and $\left\lceil\log _{2} \mu\right\rceil \leq 3, f$ can be realized as shown in Fig. 6.2.

Assume $\mu=5$. In this case, the circuit requires three intermediate variables: $h_{1}$, $h_{2}$, and $h_{3}$.

Fig. 6.2 Realization using disjoint encoding


Table 6.1 Disjoint encoding

| $h_{3}$ | $h_{2}$ | $h_{1}$ | Number of 1 's in $X_{1}$ |
| :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 2 |
| 0 | 1 | 1 | 3 |
| 1 | 0 | 0 | 4 |

## Disjoint Encoding:

Table 6.1 shows the encoding for $h_{1}, h_{2}$, and $h_{3}$. Note that $\left(h_{3}, h_{2}, h_{1}\right)$ shows the number of 1 's in the inputs. In this case, we have

$$
\begin{aligned}
& h_{3}=S B(4,4)=x_{1} x_{2} x_{3} x_{4}, \\
& h_{2}=S B(4,2)=x_{1} x_{2} \oplus x_{1} x_{3} \oplus x_{1} x_{4} \oplus x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4}, \text { and } \\
& h_{1}=S B(4,1)=x_{1} \oplus x_{2} \oplus x_{3} \oplus x_{4} .
\end{aligned}
$$

Note that Table 6.1 represents WGT4, which is realized by the block A in Fig. 6.2. Since Table 6.1 uses five code words in disjoint encoding, the network for $A$ requires three 4-LUTs.

## The Encoding that Simplifies an Intermediate Variable:

For $h_{3}$, we realize $x_{1}$ instead of $S B(4,4)$ :

$$
\begin{aligned}
h_{3} & =x_{1}, \\
h_{2} & =S B(4,2), \text { and } \\
h_{1} & =S B(4,1)
\end{aligned}
$$

In this case, $\left(h_{3}, h_{2}, h_{1}\right)$ shows the number of 1's in $X_{1}$, where

$$
\begin{aligned}
& \left(h_{3}, h_{2}, h_{1}\right)=(0,0,0) X_{1} \text { has no } 1, \\
& \left(h_{3}, h_{2}, h_{1}\right)=(-, 0,1) X_{1} \text { has one } 1, \\
& \left(h_{3}, h_{2}, h_{1}\right)=(-, 1,0) X_{1} \text { has two 1's, } \\
& \left(h_{3}, h_{2}, h_{1}\right)=(-, 1,1) X_{1} \text { has three } 1 \text { 's, and } \\
& \left(h_{3}, h_{2}, h_{1}\right)=(1,0,0) X_{1} \text { has four 1's. }
\end{aligned}
$$

Since the function $h_{3}=x_{1}$ is available as an input variable, no LUT is necessary for $h_{3}$, as shown in Fig. 6.3. The network for $A^{\prime}$ realizes a 4-input 2-output function and can be implemented by only two 4-LUTs. This encoding uses eight different code words.

Fig. 6.3 Realization using encoding that simplifies $h$


Note that the nondisjoint encoding shown in the above example converted a disjoint decomposition (Fig. 6.2) into a nondisjoint decomposition (Fig. 6.3). Thus, a disjoint encoding corresponds to a disjoint decomposition, while a nondisjoint encoding corresponds to a nondisjoint decomposition. However, not every nondisjoint encoding leads to a nondisjoint decomposition.

Theorem 6.3.1. [58] Consider the decomposition

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), \ldots, h_{u}\left(X_{1}\right), X_{2}\right) .
$$

Let $\Psi_{i}\left(X_{1}\right),(i=0,1, \ldots, \mu-1)$ be the equivalence classes of the decomposition. Let $x_{j} \in X_{1}$. If the number of different nonzero functions $\bar{x}_{j} \Psi_{i},(i=$ $0,1,2, \ldots, \mu-1)$ is equal to or less than $2^{u-1}$, and the number of different nonzero functions $x_{j} \Psi_{i},(i=0,1,2, \ldots, \mu-1)$ is equal to or less than $2^{u-1}$, then $h_{u}\left(X_{1}\right)$ can be represented as $h_{u}\left(X_{1}\right)=x_{j}$.

Proof. Algorithm 6.3.1 shows the method to simplify intermediate variables.

## Algorithm 6.3.1. (Simplification of an Intermediate Variable)

1. Let $\Psi_{i}(i=0,1, \ldots, \mu-1)$ and $x_{j} \in X_{1}$ satisfy the condition of Theorem 6.3.1. To $\bar{x}_{j} \Psi_{i} \neq 0$, assign code $v$ from 0 to $2^{u-1}-1$.
2. To $x_{j} \Psi_{i} \neq 0$, if the function $\bar{x}_{j} \Psi_{i}$ is already assigned a code $v$ in the previous step, assign a code $v+2^{u-1}$.
3. If there exists a function $x_{j} \Psi_{i}$ which has not been assigned a code yet, assign an unused code $t$, where $2^{u-1} \leq t<2^{u}$.

We use examples to show the algorithm.
Example 6.3.2. Consider the decomposition $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$. Let the equivalence classes for the decomposition be

$$
\begin{aligned}
& \Psi_{0}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3}, \\
& \Psi_{1}=\bar{x}_{1} x_{2} \vee \bar{x}_{2} x_{3} \vee \bar{x}_{3} x_{1}, \text { and } \\
& \Psi_{2}=x_{1} x_{2} x_{3} .
\end{aligned}
$$

## Disjoint Encoding:

Since $\mu=3$, a disjoint encoding requires $u=\left\lceil\log _{2} 3\right\rceil=2$ intermediate variables:

$$
\begin{aligned}
h_{1} & =\Psi_{1}=\bar{x}_{1} x_{2} \vee \bar{x}_{2} x_{3} \vee \bar{x}_{3} x_{1}, \text { and } \\
h_{2} & =\Psi_{2}=x_{1} x_{2} x_{3} .
\end{aligned}
$$

## Encoding that Simplifies an Intermediate Variable:

Next, let us use Algorithm 6.3.1 to simplify an intermediate variable. First, confirm that $\Psi_{i},(i=0,1,2)$ satisfies the conditions of Theorem 6.3.1:

$$
\begin{aligned}
& \bar{x}_{1} \Psi_{0}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3} \\
& \bar{x}_{1} \Psi_{1}=\bar{x}_{1}\left(x_{2} \vee \bar{x}_{2} x_{3}\right), \\
& \bar{x}_{1} \Psi_{2}=0
\end{aligned}
$$

and

$$
\begin{aligned}
& x_{1} \Psi_{0}=0 \\
& x_{1} \Psi_{1}=x_{1}\left(\bar{x}_{2} x_{3} \vee \bar{x}_{3}\right), \text { and } \\
& x_{1} \Psi_{2}=x_{1} x_{2} x_{3} .
\end{aligned}
$$

Since the number of nonzero functions is $2 \leq 2^{u-1}$, where $u=\left\lceil\log _{2} 3\right\rceil=2$, the conditions are satisfied. Next, assign codes to the columns:

> To $\bar{x}_{1} \Psi_{0}$ assign 00,
> to $\bar{x}_{1} \Psi_{1}$ assign 01, to $x_{1} \Psi_{1}$ assign 11 , and to $x_{1} \Psi_{2}$ assign 10.

Thus, we have the following intermediate variables:

$$
\begin{aligned}
& h_{1}=\bar{x}_{1} \Psi_{1} \vee x_{1} \Psi_{1}=\bar{x}_{1}\left(x_{2} \vee \bar{x}_{2} x_{3}\right) \vee x_{1}\left(\bar{x}_{2} x_{3} \vee \bar{x}_{3}\right)=\Psi_{1} \\
& h_{2}=x_{1} \Psi_{1} \vee x_{1} \Psi_{2}=x_{1}
\end{aligned}
$$

Note that

$$
\begin{aligned}
& \bar{h}_{2} \bar{h}_{1}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3}=\Psi_{0} \\
& h_{2} \bar{h}_{1}=x_{1} x_{2} x_{3}=\Psi_{2} .
\end{aligned}
$$

Therefore, $\Psi_{i}(i=0,1,2)$ can be represented by $\left(h_{2}, h_{1}\right)$. In this encoding, only the LUTs for $h_{1}$ are needed. No LUT is necessary to implement $h_{2}=x_{1}$, since $x_{1}$ is available as an input.

Example 6.3.3. Consider a function $f\left(X_{1}, X_{2}\right)$, where $f$ is partially symmetric with respect to $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$. Also, assume that the column multiplicity for the decomposition $f\left(X_{1}, X_{2}\right)$ is five. In this case, the equivalence classes for the decomposition for $f$ with the partition $\left(X_{1}, X_{2}\right)$ are

$$
\begin{aligned}
& \Psi_{0}=S_{0}^{4}\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3} \bar{x}_{3}, \\
& \Psi_{1}=S_{1}^{4}\left(x_{1}, x_{2}, x_{3}, x_{4}\right), \\
& \Psi_{2}=S_{2}^{4}\left(x_{1}, x_{2}, x_{3}, x_{4}\right), \\
& \Psi_{3}=S_{3}^{4}\left(x_{1}, x_{2}, x_{3}, x_{4}\right), \\
& \Psi_{4}=S_{4}^{4}\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=x_{1} x_{2} x_{3} x_{4} .
\end{aligned}
$$

Since $\mu=5$, e variables. Note that

$$
\begin{aligned}
& \bar{x}_{1} \Psi_{4}=0, \text { and } \\
& x_{1} \Psi_{0}=0
\end{aligned}
$$

In this case $\Psi_{i},(i=0,1, \ldots, 4)$ satisfies the conditions of Theorem 6.3.1. Thus, an intermediate variable $h_{3}$ can be simplified to $x_{1}$. Next, assign codes to columns as follows:

$$
\begin{aligned}
& \text { To } \bar{x}_{1} \Psi_{0} \text { assign } 000, \\
& \text { to } \bar{x}_{1} \Psi_{1} \text { assign } 001, \\
& \text { to } \bar{x}_{1} \Psi_{2} \text { assign } 010, \\
& \text { to } \bar{x}_{1} \Psi_{3} \text { assign } 011, \\
& \text { to } x_{1} \Psi_{1} \text { assign } 101, \\
& \text { to } x_{1} \Psi_{2} \text { assign } 110 \text {. } \\
& \text { to } x_{1} \Psi_{3} \text { assign } 111, \text { and } \\
& \text { to } x_{1} \Psi_{4} \text { assign } 100 .
\end{aligned}
$$

Using these codes, we derive the following expressions for the intermediate variables:

$$
\begin{aligned}
& h_{1}=\bar{x}_{1} \Psi_{1} \vee \bar{x}_{1} \Psi_{3} \vee x_{1} \Psi_{1} \vee x_{1} \Psi_{3}=\Psi_{1} \vee \Psi_{3} \\
& h_{2}=\bar{x}_{1} \Psi_{2} \vee \bar{x}_{1} \Psi_{3} \vee x_{1} \Psi_{2} \vee x_{1} \Psi_{3}=\Psi_{2} \vee \Psi_{3} \\
& h_{3}=x_{1} \Psi_{1} \vee x_{1} \Psi_{2} \vee x_{1} \Psi_{3} \vee x_{1} \Psi_{4}=x_{1}
\end{aligned}
$$

Note that in this case,

$$
\begin{aligned}
\bar{h}_{3} \bar{h}_{2} \bar{h}_{1} & =\Psi_{0}, \\
\bar{h}_{2} h_{1} & =\Psi_{1}, \\
h_{2} \bar{h}_{1} & =\Psi_{2}, \\
h_{2} h_{1} & =\Psi_{3}, \text { and } \\
h_{3} \bar{h}_{2} \bar{h}_{1} & =\Psi_{4} .
\end{aligned}
$$

Fig. 6.4 WGT7


Fig. 6.5 Realization of WGT7 with 8 LUTs

In this encoding, only the LUTs for $h_{1}$ and $h_{2}$ are needed. No LUT is necessary to implement $h_{3}=x_{1}$, since $x_{1}$ is available as an input.

Example 6.3.4. Realize WGT7 using LUTs with up to 5 inputs.
(Solution) WGT7 has seven inputs and three outputs as shown in Fig. 6.4. It counts the number 1 of 1 's in the input, and represents it by a binary number $\left(y_{2}, y_{1}, y_{0}\right)$, where $y_{2}=S B(7,4), y_{1}=S B(7,2)$, and $y_{0}=S B(7,1)$. Let $X$ be partitioned as $\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}, x_{7}\right)$. Note that the functions are symmetric with respect to $X_{1}$ and $X_{2}$. The column multiplicity of the decomposition chart ( $X_{1}, X_{2}$ ) is five. So, the straightforward realization produces the network shown in Fig. 6.5, where WGT4 is a 4-input bit-counting circuit and produces three functions:

$$
\begin{aligned}
& h_{3}=S B(4,4) \\
& h_{2}=x_{1} x_{2} x_{3} x_{4}, \\
& h_{1}=S B(4,2)=x_{1}\left(x_{2} \oplus x_{3} \oplus x_{4}\right) \oplus x_{2}\left(x_{3} \oplus x_{4}\right) \oplus x_{3} x_{4}, \text { and } \\
&=x_{1} \oplus x_{2} \oplus x_{3} \oplus x_{4}
\end{aligned}
$$

Also, WGT3 is a 3-input bit-counting circuit (i.e., a full adder) and produces two functions:

$$
\begin{aligned}
& h_{5}=S B(3,2)=x_{5} x_{6} \oplus x_{6} x_{7} \oplus x_{7} x_{5}, \text { and } \\
& h_{4}=S B(3,1)=x_{5} \oplus x_{6} \oplus x_{7} .
\end{aligned}
$$

$G_{1}$ adds two 2-bit numbers $\left(h_{2}, h_{1}\right)$ and $\left(h_{5}, h_{4}\right)$, producing the two least significant bits of the sum. And $G_{2}$ adds a 2-bit number $\left(h_{5}, h_{4}\right)$ and a 3-bit number ( $h_{3}, h_{2}, h_{1}$ ), producing the most significant bit of the sum. In Fig. 6.5, WGT4 has three outputs and requires three LUTs. Note that


Fig. 6.6 Realization of WGT7 with 7 LUTs

$$
\begin{aligned}
& y_{2}=h_{3} \oplus h_{4} h_{5} \oplus h_{1} h_{4}\left(h_{2} \oplus h_{5}\right), \\
& y_{1}=h_{2} \oplus h_{5} \oplus h_{1} h_{4}, \text { and } \\
& y_{0}=h_{1} \oplus h_{4} .
\end{aligned}
$$

Since each output of $G_{1}$ and $G_{2}$ in Fig. 6.5 requires one LUT, we need 8 LUTs in total. However, if $h_{3}=S B(4,4)$ is replaced by $x_{1}$ as shown in Fig. 6.6, we need only 7 LUTs. In this case, we use the relation $h_{3}=x_{1} \bar{h}_{1} \bar{h}_{2}$, and

$$
y_{2}=x_{1} \bar{h}_{1} \bar{h}_{2} \oplus h_{2} h_{5} \oplus h_{1} h_{4}\left(h_{2} \oplus h_{5}\right) .
$$

Example 6.3.5. Realize WGT8 by LUTs with up to 5 inputs.
(Solution) WGT8 realizes the four functions $\mathrm{SB}(8,8), \mathrm{SB}(8,4), \mathrm{SB}(8,2)$, and $\mathrm{SB}(8,1)$. Let $X$ be partitioned as $X=\left(X_{1}, X_{2}, X_{3}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$, $X_{2}=\left(x_{5}, x_{6}\right)$, and $X_{3}=\left(x_{7}, x_{8}\right)$.

First, realize

$$
\begin{aligned}
& S B(4,4)=x_{1} x_{2} x_{3} x_{4}, \\
& S B(4,2)=x_{1}\left(x_{2} \oplus x_{3} \oplus x_{4}\right) \oplus x_{2}\left(x_{3} \oplus x_{4}\right) \oplus x_{3} x_{4}, \text { and } \\
& S B(4,1)=x_{1} \oplus x_{2} \oplus x_{3} \oplus x_{4} .
\end{aligned}
$$

Then, realize

$$
\begin{aligned}
& S B(6,4)=S B(4,4) \oplus S B(4,3)\left(x_{5} \oplus x_{6}\right) \oplus S B(4,2) x_{5} x_{6}, \\
& S B(6,2)=S B(4,2) \oplus S B(4,1)\left(x_{5} \oplus x_{6}\right) \oplus x_{5} x_{6}, \text { and } \\
& S B(6,1)=S B(4,1) \oplus x_{5} \oplus x_{6} .
\end{aligned}
$$

Note that we use the relation in Lemma 3.5.3:

$$
S B(4,3)=S B(4,2) S B(4,1) .
$$

Fig. 6.7 Realization of WGT8


Finally, realize

$$
\begin{aligned}
& S B(8,8)=S B(6,6) x_{7} x_{8}, \\
& S B(8,4)=S B(6,4) \oplus S B(6,3)\left(x_{7} \oplus x_{8}\right) \oplus S B(6,2) x_{7} x_{8}, \\
& S B(8,2)=S B(6,2) \oplus S B(6,1)\left(x_{7} \oplus x_{8}\right) \oplus x_{7} x_{8}, \text { and } \\
& S B(8,1)=S B(6,1) \oplus x_{7} \oplus x_{8} .
\end{aligned}
$$

In this case, we use the relation in Lemma 3.5.3:

$$
\begin{aligned}
& S B(6,6)=S B(6,4) S B(6,2), \text { and } \\
& S B(6,3)=S B(6,2) S B(6,1) .
\end{aligned}
$$

Thus, WGT8 is realized as Fig. 6.7. However, if we use the relation

$$
S B(4,4)=x_{1} x_{2} x_{3} x_{4}=x_{1} \overline{S B(4,2)} \overline{S B(4,1)}
$$

the LUT for $S B(4,4)$ can be replaced by a variable $x_{1}$. Thus, WGT8 requires only 9 LUTs.

Example 6.3.6. Realize the 9-input symmetric function SYM9 using LUTs with up to 5 inputs.
(Solution) SYM9 is represented as

$$
f=S_{\{3,4,5,6\}}^{9}\left(x_{1}, x_{2}, \ldots, x_{9}\right)
$$

$f=1$ if and only if the number of 1 's in the input is $3,4,5$, or 6 . Suppose that the function is decomposed as $f\left(X_{1}, X_{2}, X_{3}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right)$, $X_{2}=\left(x_{6}, x_{7}\right)$, and $X_{3}=\left(x_{8}, x_{9}\right)$. When $X_{1}$ is the set of bound variables, the equivalence classes are

$$
\begin{aligned}
& \Psi_{0}=S_{0}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right)=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3} \bar{x}_{4} \bar{x}_{5}, \\
& \Psi_{1}=S_{1}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right), \\
& \Psi_{2}=S_{2}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right), \\
& \Psi_{3}=S_{3}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right), \\
& \Psi_{4}=S_{4}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right), \text { and } \\
& \Psi_{5}=S_{5}^{5}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}\right)=x_{1} x_{2} x_{3} x_{4} x_{5} .
\end{aligned}
$$

Suppose that the first cell realizes WGT5. Next, consider the decomposition where $X_{1}$ and $X_{2}$ are bound variables. Suppose that the second cell realizes WGT7 as shown in Fig. 6.8, where

$$
\begin{aligned}
& y_{2}=S B(7,4), \\
& y_{1}=S B(7,2), \text { and } \\
& y_{0}=S B(7,1) .
\end{aligned}
$$

From the definition of SYM9, we have Table 6.2 showing the function of the rightmost cell. As shown in Fig. 6.8, the network for Table 6.2 requires only one LUT. Thus, SYM9 is realized by seven 5-LUTs.

Fig. 6.8 Realization for SYM9

Table 6.2 Truth table for the rightmost cell


| $y_{2}$ | $y_{1}$ | $y_{0}$ | $x_{8}$ | $x_{9}$ | f |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | - | - | 0 |
| 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | - | 1 |
| 0 | 1 | 0 | - | 1 | 1 |
| 0 | 1 | 1 | - | - | 1 |
| 1 | 0 | 0 | - | - | 1 |
| 1 | 0 | 1 | 0 | - | 1 |
| 1 | 0 | 1 | - | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | - | - | 0 |

Next, consider whether any output function of WGT7 can be simplified. Table 6.2 shows that when WGT7 produces $(0,1,1)$ and $(1,0,0)$, the value of the function is 1 , independent of the values of $x_{8}$ and $x_{9}$. Thus, the equivalence classes are

$$
\begin{aligned}
& \Psi_{0}=S_{0}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right)=\bar{x}_{1} \bar{x}_{2} \ldots \bar{x}_{6} \bar{x}_{7}, \\
& \Psi_{1}=S_{1}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right), \\
& \Psi_{2}=S_{2}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right), \\
& \Psi_{3}=S_{\{3,4\}}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right), \\
& \Psi_{4}=S_{5}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right), \\
& \Psi_{5}=S_{6}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right), \text { and } \\
& \Psi_{6}=S_{7}^{7}\left(x_{1}, x_{2}, \ldots, x_{7}\right)=x_{1} x_{2} \ldots x_{6} x_{7} .
\end{aligned}
$$

In this case, we cannot simplify any output of WGT7 by using Theorem 6.3.1, since the number of nonzero functions in $x_{1} \Psi_{i}$ and $\bar{x}_{1} \Psi_{i}$, where $(i=0,1,2$, $3,4,5,6)$, is both six.

Example 6.3.7. Design a 2-MUX using 4-LUTs, where a 2-MUX realizes the function

$$
f\left(y_{0}, y_{1}, y_{2}, y_{3}, x_{1}, x_{2}\right)=y_{0} \bar{x}_{1} \bar{x}_{2} \vee y_{1} \bar{x}_{1} x_{2} \vee y_{2} x_{1} \bar{x}_{2} \vee y_{3} x_{1} x_{2} .
$$

(Solution) Let $\left(X_{1}, X_{2}\right)$ be a partition of the input variables, where $X_{1}=$ ( $y_{0}, y_{1}, x_{1}, x_{2}$ ) and $X_{2}=\left(y_{2}, y_{3}\right)$. The equivalence classes for the decomposition are

$$
\begin{aligned}
& \Psi_{0}=\bar{x}_{1}\left(\bar{x}_{2} \bar{y}_{0} \vee x_{2} \bar{y}_{1}\right), \\
& \Psi_{1}=\bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right), \\
& \Psi_{2}=x_{1} \bar{x}_{2}, \text { and } \\
& \Psi_{3}=x_{1} x_{2} .
\end{aligned}
$$

Since $\mu=4$, disjoint encoding requires two intermediate variables $h_{1}$ and $h_{2}$. Note that

$$
\begin{aligned}
& x_{1} \Psi_{0}=0 \\
& x_{1} \Psi_{1}=0
\end{aligned}
$$

and

$$
\begin{aligned}
& \bar{x}_{1} \Psi_{2}=0 \\
& \bar{x}_{1} \Psi_{3}=0 .
\end{aligned}
$$

Thus, $\Psi_{i}(i=0,1,2,3)$ satisfies the conditions of Theorem 6.3.1. Next, assign codes ( $h_{2}, h_{1}$ ) to columns as follows:

$$
\begin{aligned}
& \text { To } \bar{x}_{1} \Psi_{0}=\Psi_{0} \text { assign a code } 00 \\
& \text { to } \bar{x}_{1} \Psi_{1}=\Psi_{1} \text { assign a code } 01, \\
& \text { to } x_{1} \Psi_{2}=\Psi_{2} \text { assign a code } 10, \text { and } \\
& \text { to } x_{1} \Psi_{3}=\Psi_{3} \text { assign a code } 11
\end{aligned}
$$

In this case, we have the following intermediate variables:

$$
\begin{aligned}
& h_{2}=x_{1} \Psi_{2} \vee x_{1} \Psi_{3}=x_{1}, \\
& h_{1}=\bar{x}_{1} \Psi_{1} \vee x_{1} \Psi_{3}=\bar{x}_{1}\left(\bar{x}_{2} y_{0} \vee x_{2} y_{1}\right) \vee x_{1} x_{2}=g .
\end{aligned}
$$

Note that $f$ can be represented as

$$
\begin{aligned}
f\left(y_{0}, y_{1}, y_{2}, y_{3}, x_{1}, y_{2}\right) & =\Psi_{1} \vee \Psi_{2} y_{2} \vee \Psi_{3} y_{3}, \\
& =\bar{h}_{2} h_{1} \vee h_{2} \bar{h}_{1} y_{2} \vee h_{2} h_{1} y_{3}, \\
& =\bar{x}_{1} g \vee x_{1}\left(\bar{g} y_{2} \vee g y_{3}\right) .
\end{aligned}
$$

In this way, a 2-MUX can be realized by two 4-LUTs as shown in Fig. 4.4. This is a method to find a nondisjoint decomposition of Lemma 4.1.2.

### 6.4 Remarks

In this chapter, nondisjoint encoding was introduced. It reduces the number of LUTs in a cascade realization by deriving a nondisjoint decomposition. Nondisjoint encoding for symmetric functions was considered in [112]. Experimental results show that with nondisjoint encodings, the number of LUTs can be reduced by $10-30 \%$ [35, 68, 69], when multiple-output functions were implemented as encoded characteristic function for nonzero outputs (ECFNs). This chapter is based on [112]. Encoding method in the decomposition that minimized the support is considered in [21,58, 155]

## Problems

6.1. Consider the decomposition $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}, x_{7}, x_{8}\right)$. Let the equivalence classes of the decomposition be

$$
\begin{aligned}
& \Psi_{0}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3}, \\
& \Psi_{1}=\bar{x}_{1} x_{2} \bar{x}_{3},
\end{aligned}
$$



Fig. 6.9 Simplification of intermediate variables

$$
\begin{aligned}
& \Psi_{2}=x_{1} \bar{x}_{3} x_{4} \\
& \Psi_{3}=x_{1} \bar{x}_{3} \bar{x}_{4} \\
& \Psi_{4}=x_{3}
\end{aligned}
$$

Simplify an intermediate variable if possible.
6.2. Realize the 8 -variable symmetric function $f=S_{\{0,8\}}^{8}$ using 5-LUTs. Note that $f=1$ iff $\sum_{i=1}^{8} x_{i}=0$ or 8 .
6.3. By extending Theorem 6.3.1, obtain the condition that the intermediate variables $h_{u}$ can be represented by $x_{j}$, and $h_{u-1}$ can be represented by $x_{k}$, where $j \neq k$. For example, in Fig. 6.9, two intermediate variables $h_{3}$ and $h_{2}$ are replaced by input variables $x_{1}$ and $x_{3}$, respectively.

## Chapter 7 <br> Functions with Small C-Measures

Recall that the C -measure of a function is the maximum column multiplicity among a set of functional decompositions $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. The C-measure tends to increase exponentially with the number of input variables, $n$. However, many practical functions have small C-measures. This chapter considers classes of functions whose C-measures are small. Such functions can be efficiently realized by LUT cascades.

### 7.1 C-Measure and BDDs

The column multiplicity of a decomposition chart is equal to the width of the quasi-reduced multi-terminal binary decision diagram (QRMTBDD). So, the C-measure of a logic function is equal to the maximum width of the MTBDD for the given ordering of the input variables.

Example 7.1.1. Consider two functions:

$$
f_{1}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}\right)=x_{1} x_{2} \vee x_{3} x_{4} \vee x_{5} x_{6}
$$

and

$$
f_{2}\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}\right)=x_{1} x_{4} \vee x_{2} x_{5} \vee x_{3} x_{6} .
$$

Figure 7.1 shows the BDDs for $f_{1}$ and $f_{2}$. In this case, the C -measures of $f_{1}$ and $f_{2}$ are 3 and 8 , respectively. Note that $f_{2}$ can be obtained from $f_{1}$ by permuting the input variables. We consider these functions to be different.

Lemma 7.1.1. Consider a pair of $2 n$ variable functions:

$$
f_{1}\left(x_{1}, x_{2}, \ldots, x_{2 n-1}, x_{2 n}\right)=x_{1} x_{2} \vee x_{3} x_{4} \vee \cdots \vee x_{2 n-1} x_{2 n}
$$

and

$$
f_{2}\left(x_{1}, x_{2}, \ldots x_{2 n-1}, x_{2 n}\right)=x_{1} x_{n+1} \vee x_{2} x_{n+2} \vee \cdots \vee x_{n} x_{2 n}
$$

Then, $\mu\left(f_{1}\right)=3$ and $\mu\left(f_{2}\right)=2^{n}$.


Fig. 7.1 BDD for equivalent functions

For certain function, the C-measure is large, but when variables are permuted, the C-measure becomes smaller. For the other function, the C-measure is large for any permutation of the input variables. For a given logic function and variable ordering, the C-measure is easy to obtain and is uniquely defined.

### 7.2 Symmetric Functions

Functions that appear in arithmetic circuits often have symmetries. When logic functions have some symmetries, they are often realized using fewer elements.

Definition 7.2.1. A function $f$ is a totally symmetric function if any permutation of the variables in $f$ does not change the function. A totally symmetric function is also called a symmetric function.

Lemma 7.2.1. Let $f$ be a symmetric function of $n$ variables. Then,

$$
\mu(f) \leq \max _{k=1}^{n} \min \left\{k+1,2^{n-k+1}\right\} .
$$

Proof. Consider the decomposition chart of $f$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ denotes the bound variables and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$ denotes the free variables. The number of different column functions is at most $k+1$, since the column pattern depends only on the number of 1's in the bound variables. Also, by Lemma 3.5.1, the number of column functions is at most $2^{n-k+1}$, since the column functions are symmetric functions of $n-k$ variables.

Theorem 7.2.1. Let $f$ be a symmetric function $f$ of $n$ variables. Then,

$$
\begin{aligned}
& \mu(f) \leq n, \quad(n \geq 4) . \\
& \mu(f) \leq n-1,(n \geq 9) . \\
& \mu(f) \leq n-2,(n \geq 18) . \\
& \mu(f) \leq n-3,(n \geq 35) .
\end{aligned}
$$

Proof. From Lemma 7.2.1, $\mu_{k}=\max _{k=1}^{n} \min \left\{k+1,2^{n-k+1}\right\}$. Consider the cases, where $k=n, n-1, n-2, n-3$ and $n-4$. In these cases, we have

$$
\begin{aligned}
\mu_{n} & =\min \left\{n+1,2^{1}\right\}=2,(n \geq 1), \\
\mu_{n-1} & =\min \left\{n+0,2^{2}\right\}=4,(n \geq 4), \\
\mu_{n-2} & =\min \left\{n-1,2^{3}\right\}=8,(n \geq 9), \\
\mu_{n-3} & =\min \left\{n-2,2^{4}\right\}=16,(n \geq 18), \text { and } \\
\mu_{n-4} & =\min \left\{n-3,2^{5}\right\}=32,(n \geq 35) .
\end{aligned}
$$

From the above, we have the theorem.

### 7.3 Sparse Functions

An integer function whose number of nonzero output values is much smaller than the total number of input combinations is called sparse. Sparse functions can be efficiently realized by an LUT cascade.

Theorem 7.3.1. Let $f$ be an integer valued function or a logic function with weight $k$. Then, $\mu(f) \leq k+1$.

Proof. Consider a decomposition chart with the maximum column multiplicity. In this case, each nonzero element corresponds to a unique column pattern. Also, there can be a column with all zero elements. Thus, there is no decomposition chart with greater column multiplicity. Hence, we have the theorem.

Sparse functions are considered in Chap. 8.

### 7.4 LPM Functions

The longest prefix match ( $\mathbf{L P M}$ ) problem is to determine the output port address from a list of prefix vectors stored in memory based on the longest match. It is solved by the internet routers to forward packets of data.
Definition 7.4.1. [125] The LPM table stores distinct ternary vectors of the form $V E C_{1} \cdot V E C_{2}$, where $V E C_{1}$ is a string of 0 's and 1 's, and $V E C_{2}$ is a string of *'s.

To assure that the longest prefix address is produced, LPM entries are stored in descending prefix length. The first match determines the LPM tables output. The corresponding LPM function is a logic function $\vec{f}: B^{n} \rightarrow B^{m}$, where $\vec{f}(\vec{x})$ is the smallest index of an entry that is identical to $\vec{x}$ except possibly for don't care values. If no such entry exists, $\vec{f}(\vec{x})=0^{m}$. A circuit that realizes the LPM function is an LPM index generator.

Example 7.4.1. Consider the LPM table shown in Table 7.1. In the third row $V E C_{1}=01$ and $V E C_{2}={ }^{* *}$, while in the last row $V E C_{1}=0$ and $V E C_{2}={ }^{* * *}$. Table 7.2 shows the corresponding LPM function. The output is the index corresponding to the index of the longest prefix that matches the input.

Example 7.4.2. In Internet Protocol version 4 (IPv4), an IP address is represented by 32 bits or 4 bytes. An IP address is often represented by four decimal numbers, each representing a byte. For example, 66.249 .122 .7 corresponds to the 32 -bit binary number
01000010.11111001 .01111010 .00000111 .

Table 7.1 LPM table

| Vector |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ |  |
| 1 | Index |  |  |  |
| 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | $*$ | 2 |
| 1 | $*$ | $*$ | $*$ | 3 |
| 0 | $*$ | $*$ | $*$ | 4 |

Table 7.2 LPM function
truth table

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $f_{2}$ | $f_{1}$ | $f_{0}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 |

That is, 66 in the decimal number represents the binary number 01000010; 249 in the decimal number represents the binary number 11111001; 122 in the decimal number represents the binary number 01111010; and 7 in the decimal number represents the binary number 00000111.

Consider the IP forwarding table shown in Fig. 7.2. It finds the longest prefix that matches the incoming destination address, and produces the corresponding next hop address. For example, when the input address is 66.249 .122 .7 , the LPM is the first entry. Note that in this case, both the first and the second entries match. The first entry matches three bytes ( 66,249 , and 122 ), while the second entry matches only two bytes (66 and 249). Thus, we select the first element, and its specified next hop address 161.4.2.22.

Ternary content addressable memory, (TCAM) explained in Chap. 2, directly realizes an LPM function.

Example 7.4.3. Figure 7.2 is an example of a forwarding table, and Fig. 7.3 shows the TCAM for the table. In the TCAM, the entries are sorted in the order of decreasing prefix length. When the incoming address is 66.249 .122 .7 , both the first and the second prefixes match. The priority encoder selects the least index.

| 66.249.122.7 | Prefix | Next-hop |
| :---: | :---: | :---: |
|  | 66.249.122 | 161.4.2.22 |
|  | 66.249 | 161.4.2.4 |
|  | 65.52 | 125.33.32.98 |
|  | 58 | 120.3.3.1 |
|  | 202 | 10.0.0.111 |

Fig. 7.2 Example of IP look-up


Fig. 7.3 Longest prefix matching by TCAM

Theorem 7.4.1. [125] Let $f$ be an LPM function with $k$ vectors. Then, $\mu(f) \leq$ $k+1$.

Proof. Consider a decomposition chart where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{t}\right)$ denotes the bound variables and $X_{2}=\left(x_{t+1}, x_{t+2}, \ldots, x_{n}\right)$ denotes the free variables. We prove the theorem for the case where all $k$ entries in the LPM table map to distinct output values. This is the worst case, since forcing certain entries to have the same output value can only reduce the column multiplicity.

We prove the theorem by counting the number of distinct columns in the decomposition chart as LPM vectors are added to the LPM table. Reorder the LPM vectors so that those vectors with the most * entries are first and those with the fewest are last. An empty decomposition chart has a unique column pattern (all 0's). Let the first vector be $\vec{\alpha}=\left(a_{1}, a_{2}, \ldots, a_{m}, *, *, \ldots, *\right)$, where $a_{j} \in B=\{0,1\}$. If $m>t$, then the first vector changes only a proper subset of elements in one column. If $m=t(m<t)$, then the new vector changes all elements in one (or more) complete column(s) to the vector's output value in the LPM table. In either case, at most one distinct column pattern is added to the decomposition chart.

Because the second vector has no more * entries than the first vector, adding it will change columns only among a subset of the two distinct columns so far in the decomposition chart. Let the new vector be $\vec{\beta}=\left(b_{1}, b_{2}, \ldots, b_{m^{\prime}}, *, *, \ldots, *\right)$, where $b_{j} \in B$. If $b_{i}=a_{i}$, for all $1 \leq i \leq m^{\prime}$, then a subset of the columns created by adding $\bar{\alpha}$ to the empty decomposition chart are changed. Otherwise, a subset of the columns containing all 0 's are changed. In either case, at most one additional column pattern is added. This process continues until all vectors are exhausted. In all, at most $k+1$ column patterns are created. The theorem follows.

### 7.5 Segment Index Encoder Function

Definition 7.5.1. [116] A Segment Index Encoder (SIE) function $g(X)$ is a mapping: $g: I \rightarrow I$, where $I$ is a set of non-negative integers, and $a \leq b$ implies $g(a) \leq g(b)$. Segment index logic function $f$ is the SIE function represented by binary variables. We assume that the integer represented by $X=$ $\left(x_{n-1}, x_{n-2}, \ldots, x_{1}, x_{0}\right)$ is $\sum_{i=0}^{n-1} 2^{i} x_{i}$.

Example 7.5.1. Table 7.3 shows an example of a SIE function. Note that it is a monotone-increasing function with respect to $X=\sum_{i=0}^{3} 2^{i} x_{i}$.

Theorem 7.5.1. [116] Let $k$ be the number of segments in an segment index function $g$. Let $f$ be the corresponding segment index logic function. Then, $\mu(f) \leq k$. We assume that the integer represented by $X=\left(x_{n-1}, x_{n-2}, \ldots\right.$, $\left.x_{1}, x_{0}\right)$ is $\sum_{i=0}^{n-1} 2^{i} x_{i}$.

Proof. Consider the decomposition chart of a segment index logic function $f\left(X_{1}, X_{2}\right)$. Let $X_{1}=\left(x_{n-p-1}, x_{n-p-2}, \ldots, x_{0}\right)$ and $X_{2}=\left(x_{n-1}, x_{n-2}, \ldots, x_{n-p}\right)$.

Table 7.3 Example of segment index encoder function

| $x_{3}$ | $x_{2}$ | $x_{1}$ | $x_{0}$ | $f$ |
| :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 0 | 2 |
| 0 | 1 | 0 | 1 | 2 |
| 0 | 1 | 1 | 0 | 2 |
| 0 | 1 | 1 | 1 | 2 |
| 1 | 0 | 0 | 0 | 2 |
| 1 | 0 | 0 | 1 | 3 |
| 1 | 0 | 1 | 0 | 3 |
| 1 | 0 | 1 | 1 | 3 |
| 1 | 1 | 0 | 0 | 4 |
| 1 | 1 | 0 | 1 | 4 |
| 1 | 1 | 1 | 0 | 4 |
| 1 | 1 | 1 | 1 | 5 |

Fig. 7.4 Partition of the domain


Assume, without loss of generality, that both the columns and rows are labeled in ascending order of the values of $X_{1}$ and $X_{2}$, respectively. Because $g(X)$ is a monotone-increasing function, in scanning left-to-right and then top-to-bottom, the values of $g(X)$ will never decrease. An increase causes two columns to be distinct. Conversely, if no increase occurs anywhere across two adjacent columns, they are identical. In a monotone-increasing $k$-valued output function, there are $k-1$ dividing lines among $2^{n}$ output values. Dividing lines among values divide columns in the decomposition chart. Thus, there can be at most $k$ distinct columns.

The SIE functions are used in numerical function generators (NFGs).
We developed an architecture and a synthesis method for programmable NFGs for elementary functions such as trigonometric, logarithmic, square root, and reciprocal functions [116, 122, 130]. As shown in Fig. 7.4, a given domain of the function is partitioned into nonuniform segments. For each segment, the function is approximated by a linear function using the architecture shown in Fig. 7.5. By Theorem 7.5.1, when the number of segments is small, the SIE has a small C-measure, and can be efficiently realized by an LUT cascade. In this way, we can implement fast and compact NFGs for a wide range of functions.

Fig. 7.5 Numerical function generator using first-order approximation


### 7.6 WS Functions

A weighted-sum function (WS function) [120, 127] is a mathematical model of bit-counting circuits [110], radix converters [43, 119], and distributed arithmetic [121].
Definition 7.6.1. An $n$-input WS function $F(X)$ computes

$$
W S(X)=\sum_{i=1}^{n} w_{i} \cdot x_{i}
$$

where $X=\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ is the input vector, $W=\left(w_{1}, w_{2}, \ldots, w_{n}\right)$ is the weight vector, and $w_{i}(i=1,2, \ldots, n)$ is a positive or negative integer. Let $F=\left(f_{q-1}, f_{q-2}, \ldots, f_{0}\right)$ be the binary representation of the WS function. Then, we have

$$
W S(X)=\sum_{i=0}^{q-1} f_{i}(X) \cdot 2^{i}
$$

WS functions are used in radix converters and digital filters.
Theorem 7.6.1. Let $F(X)$ be a WS function with the weight vector $W=\left(w_{1}\right.$, $\left.w_{2}, \ldots, w_{n}\right)$. Let $\left(X_{1}, X_{2}\right)$ be a partition of $X=\left(x_{1}, x_{2}, \ldots, x_{n}\right)$, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Consider the decomposition chart of $F$, where $X_{1}$ denotes the bound variables and $X_{2}$ denotes the free variables. Then, the column multiplicity of the decomposition chart is at most $U B 1=1+$ $\sum_{j=1}^{k}\left|w_{j}\right|$.

Table 7.4 Decomposition chart for a WS function

| $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$ |  |  |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | 0 | $w_{3}$ | $w_{2}$ | $w_{2}+w_{3}$ | $w_{1}$ | $w_{1}+w_{3}$ | $w_{1}+w_{2}$ | $w_{1}+w_{2}+w_{3}$ |
| 01 |  | $w_{3}+$ | $w_{2}+$ | $w_{2}+w_{3}+$ | $w_{1}+$ | $w_{1}+w_{3}+$ | $w_{1}+w_{2}+$ | $w_{1}+w_{2}+w_{3}+$ |
|  | $w_{5}$ | $w_{5}$ | $w_{5}$ | $w_{5}$ | $w_{5}$ | $w_{5}$ | $w_{5}$ | $w_{5}$ |
| 10 |  | $w_{3}+$ | $w_{2}+$ | $w_{2}+w_{3}+$ | $w_{1}+$ | $w_{1}+w_{3}+$ | $w_{1}+w_{2}+$ | $w_{1}+w_{2}+w_{3}+$ |
|  | $w_{4}$ | $w_{4}$ | $w_{4}$ | $w_{4}$ | $w_{4}$ | $w_{4}$ | $w_{4}$ | $w_{4}$ |
| 11 |  | $w_{3}+$ | $w_{2}+$ | $w_{2}+w_{3}+$ | $w_{1}+$ | $w_{1}+w_{3}+$ | $w_{1}+w_{2}+$ | $w_{1}+w_{2}+w_{3}+$ |
|  | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ | $w_{4}+w_{5}$ |

Table 7.5 A decomposition chart of a WS function (integer representation)

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{1}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{3}$ |
| 0 | 0 | 0 | 3 | 2 | 5 | 1 | 4 | 3 | 6 |  |
| 0 | 1 | 5 | 8 | 7 | 10 | 6 | 9 | 8 | 11 |  |
| 1 | 0 | 4 | 7 | 6 | 9 | 5 | 8 | 7 | 10 |  |
| 1 | 1 | 9 | 12 | 11 | 14 | 10 | 13 | 12 | 15 |  |
| $x_{4}$ | $x_{5}$ |  |  |  |  |  |  |  |  |  |

Proof. Consider the decomposition chart for $W S\left(X_{1}, X_{2}\right)$. In the first row, $X_{2}=$ $(0,0, \ldots, 0)$. Note that the column multiplicity is equal to the number of different values in the first row. For example, Table 7.4 shows the case of $n=5$ and $k=3$. Consider the case where all the weights are positive. In this case, the number of different values is at most $U B 1$, since $W S$ takes values from 0 to $\sum_{j=1}^{k} w_{j}$.

Consider the case where some of the weights are negative. Assume that $w_{1}, w_{2}, \ldots, w_{t}$ are negative, and $w_{t+1}, w_{t+2}, \ldots, w_{k}$ are positive. Then, the $W S$ takes values from $\sum_{j=1}^{t} w_{j}$ to $\sum_{j=t+1}^{k} w_{j}$. In this case, the number of different values is at most $1+\sum_{j=1}^{t}\left|w_{j}\right|+\sum_{j=t+1}^{k} w_{j}=1+\sum_{j=1}^{k}\left|w_{j}\right|$. From these, we can conclude that the column multiplicity of the decomposition chart is at most $U B 1$.

Example 7.6.1. Consider the WS function with $n=5$ and $W=\left(w_{1}, w_{2}, w_{3}\right.$, $\left.w_{4}, w_{5}\right)=(1,2,3,4,5)$. Let $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$ and $X_{2}=\left(x_{4}, x_{5}\right)$. In this case, $U B 1=1+w_{1}+w_{2}+w_{3}=1+1+2+3=7$. Table 7.5 shows the decomposition chart of the function. Note that the column multiplicity of the decomposition chart is 7 . So, the bound $U B 1$ is tight.

Definition 7.6.2. A threshold function $f\left(x_{0}, x_{1}, \ldots, x_{n-1}\right)$ satisfies the relation: $f=1$ if $\sum_{i=1}^{n} w_{i} x_{i} \geq T$, and $f=0$ otherwise, where $\left(w_{0}, w_{1}, \ldots, w_{n-1}\right)$ is the weight vector and T is the threshold.

Although, a threshold function is not a WS function, we can estimate the column multiplicity of a threshold function from the theory of WS functions.

Theorem 7.6.2. The column multiplicity of a decomposition chart of the threshold function with the weight vector $\left(w_{0}, w_{1}, \ldots, w_{n-1}\right)$ is at most

$$
U B=1+\sum_{i=0}^{n-1}\left|w_{i}\right| .
$$

Proof. The column multiplicity of a decomposition chart for $f$ is no greater than that of the WS function having the same weight vector. By Theorem 7.6.1, the column multiplicity of the WS function is at most $U B$. Hence, we have the theorem.

When the sum of the weights is large, a monolithic cascade realization of a WS function can be large. In this case, we can partition the outputs into groups, and realize each group separately.

Theorem 7.6.3. [120, 127] Let $F_{\mathrm{LSB}}(X)$ be the logic function that represents the least significant $q$ bits of a WS function. Then, $F_{\mathrm{LSB}}(X)$ can be realized by an LUT cascade consisting of cells with $q+1$ inputs and $q$ outputs.

A $2 q$-output WS function can be decomposed into a pair of WS functions as follows: Let, $w_{i}$ be a weight of a $2 q$-output WS function. Then, $w_{i}$ can be written as

$$
w_{i}=2^{q} w_{A i}+w_{B i}
$$

where $w_{A i}$ denotes most significant $q$ bits, and $w_{B i}$ denotes least significant $q$ bits. In this case, we can realize the $2 q$-output WS function using a pair of WS functions and an adder, as shown in Fig. 7.6.

This is an arithmetic decomposition of a WS function. With this method, we can efficiently realize a WS function with cascades and adders [120, 127].


Fig. 7.6 Arithmetic decomposition of $2 q$-output WS function

### 7.7 Modulo Function

Definition 7.7.1. Let $X=\left(x_{n-1}, x_{n-2} \ldots, x_{1}, x_{0}\right)$ be the input variables. A modulo $\mathbf{m}$ function is a mapping $B^{n} \rightarrow M$, where $M=\{0,1, \ldots, m-1\}$. It computes

$$
f(X)=\left(\sum_{i=0}^{n-1} 2^{i} x_{i}\right)(\bmod m)
$$

Theorem 7.7.1. The $C$-measure of the modulo $m$ function is $m$.
The modulo $m$ function can be realized by an LUT cascade with $\left\lceil\log _{2} m\right\rceil$ rails. The depth of the circuit is $O(n)$.

Next, consider the decomposition of the modulo $m$ function:

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{2}\right)\right)
$$

where $X_{1}=\left(x_{n-1}, x_{n-2} \ldots, x_{n-k}\right)$, and $X_{2}=\left(x_{n-k-1}, x_{n-k-2} \ldots, x_{0}\right)$. In this case, $h_{1}\left(X_{1}\right)$ computes

$$
h_{1}\left(x_{n-1}, x_{n-2} \ldots, x_{n-k}\right)=\left(\sum_{i=n-k}^{n-1} 2^{i} x_{i}\right)(\bmod m)
$$

while $h_{2}\left(X_{2}\right)$ computes

$$
h_{2}\left(x_{n-k-1}, x_{n-k-2} \ldots, x_{0}\right)=\left(\sum_{i=0}^{n-k-1} 2^{i} x_{i}\right)(\bmod m) .
$$

Also,

$$
g\left(h_{1}, h_{2}\right)=h_{1}+h_{2}(\bmod m) .
$$

Note that the C-measures of $h_{1}\left(X_{1}\right)$ and $h_{2}\left(X_{2}\right)$ are also $m$. By decomposing recursively, we have a tree-type modulo circuit with depth $O\left(\log _{2} n\right)$.

Lemma 7.7.1. Let $a, b, c$, and $d$ be integers satisfying the relation $a=b c+d$, then

$$
a(\bmod m)=[b(\bmod m)] \times[c(\bmod m)]+[d(\bmod m)] .
$$

Example 7.7.1. Let $m=19$.

$$
\begin{aligned}
1001 & =20 \times 50+1 \\
20(\bmod 19) & =1 \\
50(\bmod 19) & =20 \times 2+10(\bmod 19) \\
& =2+10=12
\end{aligned}
$$

Fig. 7.7 Mod 3 circuit with LUTs


$$
1001(\bmod 19)=1 \times 12+1=13
$$

Example 7.7.2. Realize the modulo function by $k$-LUTs, where $m=3, n=16$, and $k=4$.
(Solution): The input variables are partitioned into
$X_{1}=\left(x_{15}, x_{14}, x_{13}, x_{12}\right), X_{2}=\left(x_{11}, x_{10}, x_{9}, x_{8}\right), X_{3}=\left(x_{7}, x_{6}, x_{5}, x_{4}\right)$, and $X_{4}=\left(x_{3}, x_{2}, x_{1}, x_{0}\right)$.
Figure 7.7 shows the circuit to compute the modulo $m$. The circuit A1 computes:

$$
\begin{aligned}
h_{1}\left(x_{15}, x_{14}, x_{13}, x_{12}\right) & =\left(\sum_{i=12}^{15} 2^{i} x_{i}\right)(\bmod 3) \\
& =\left(2^{12} \sum_{i=0}^{3} 2^{i} x_{i+12}\right)(\bmod 3) \\
& =\left(\sum_{i=0}^{3} 2^{i} x_{i+12}\right)(\bmod 3) .
\end{aligned}
$$

The circuit A2 computes:

$$
\begin{aligned}
h_{2}\left(x_{11}, x_{10}, x_{9}, x_{8}\right) & =\left(\sum_{i=8}^{11} 2^{i} x_{i}\right)(\bmod 3) \\
& =\left(2^{8} \sum_{i=0}^{3} 2^{i} x_{i+8}\right)(\bmod 3) \\
& =\left(\sum_{i=0}^{3} 2^{i} x_{i+8}\right)(\bmod 3)
\end{aligned}
$$

The circuit A3 computes:

$$
\begin{aligned}
h_{3}\left(x_{7}, x_{6}, x_{5}, x_{4}\right) & =\left(\sum_{i=4}^{7} 2^{i} x_{i}\right)(\bmod 3) \\
& =\left(2^{4} \sum_{i=0}^{3} 2^{i} x_{i+4}\right)(\bmod 3) \\
& =\left(\sum_{i=0}^{3} 2^{i} x_{i+4}\right)(\bmod 3) .
\end{aligned}
$$

The circuit A4 computes:

$$
h_{4}\left(x_{3}, x_{2}, x_{1}, x_{0}\right)=\left(\sum_{i=0}^{3} 2^{i} x_{i}\right)(\bmod 3)
$$

The circuits A5 and A6 compute:

$$
\begin{aligned}
& h_{5}=\left(h_{1}+h_{2}\right)(\bmod 3) \\
& h_{6}=\left(h_{3}+h_{4}\right)(\bmod 3) .
\end{aligned}
$$

And, finally, A7 computes:

$$
\left(h_{5}+h_{6}\right)(\bmod 3) .
$$

Note that in the above equations, we used the relations:

$$
2^{12} \equiv 2^{8} \equiv 2^{4} \equiv 1(\bmod 3)
$$

### 7.8 Remarks

Functions with small C-measures have efficient LUT cascade realizations. Practical functions often have small C-measures, although the fractions of functions with small C-measures approaches zero as the number of variables increases. An efficient method to obtain the C-measure of a logic function is to construct the BDD of the function. Functions that have small BDD representations are considered in [154].

## Problems

7.1. Show the profile of a 9 -variable symmetric function.
7.2. Let $f(X)$ be

$$
f=\left\lceil\frac{\sum_{i=0}^{4} 2^{i} x_{i}}{3}\right\rceil
$$

and let $\left(X_{1}, X_{2}\right)$ be a partition of $X=\left(x_{4}, x_{3}, x_{2}, x_{1}, x_{0}\right)$. Write the decomposition chart when $X_{1}=\left(x_{2}, x_{1}, x_{0}\right)$ and $X_{2}=\left(x_{4}, x_{3}\right)$, and obtain the column multiplicity. Also, do the same thing for $X_{1}=\left(x_{3}, x_{2}, x_{1}, x_{0}\right)$ and $X_{2}=\left(x_{4}\right)$.
7.3. Let $X=\left(x_{n-1}, x_{n-2} \ldots, x_{1}, x_{0}\right)$ be the input variables. An interval function $I N(X: A, B)$ is defined as

$$
I N(X: A, B)= \begin{cases}1 & (\text { If } A \leq X \leq B) \\ 0 & \text { (Otherwise) } .\end{cases}
$$

Here, $X$ is considered as an integer:

$$
X=\sum_{i=0}^{n-1} 2^{i} x_{i}
$$

Show that the C-measure of the function is at most three.
7.4. Suppose that the given function $f$ is represented as a sum of $k$ products having two literals. Obtain an upper bound on $\mu(f)$.
7.5. Consider the function $f\left(X_{1}, X_{2}, X_{3}, X_{4}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}\right), X_{2}=$ $\left(x_{4}, x_{5}\right), X_{3}=\left(x_{6}\right)$, and $X_{4}=\left(x_{7}, x_{8}\right)$. Let $f$ be partially symmetric with respect to $X_{i}$, where $i=1,2,3,4$.

1. Let $\left(X_{1}, X_{2}\right)$ be the bound variables and $\left(X_{3}, X_{4}\right)$ be the free variables. Then, obtain an upper bound on the column multiplicity of the decomposition.
2. Let $\left(X_{1}, X_{2}, X_{3}\right)$ be the bound variables and $\left(X_{4}\right)$ be the free variables. Then, obtain an upper bound on the column multiplicity of the decomposition.
7.6. Let $\left(X_{1}, X_{2}, X_{3}\right)$ be a partition of the input variables $X$. Let $\mu_{i}$ be the column multiplicity of the decomposition of the function $f(X)$, where $X_{i}$ is the set of bound variables $(i=1,2)$. Then, show that the column multiplicity of the decomposition of $f$, where $\left(X_{1}, X_{2}\right)$ is the set of bound variables, is at most $\mu_{1} \mu_{2}$.
7.7. A priority encoder function has $n$ inputs $\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ and $\left\lceil\log _{2}(n+1)\right\rceil$ outputs. If $x_{1}=1$, then the output is 1 . Otherwise, if $x_{i}=1$ and $x_{j}=0$ for all $j$ such that $(1 \leq j<i)$, then the output is $i$. If all the inputs are 0 , then the output is 0 . Show that the C-measure of the priority encoder function is $n+1$.
7.8. Let $X=\left(x_{0}, x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $W=(1,2,3,4,5)$. Consider the threshold function $f(X)$, where $W$ is the weight vector and $T=6$ is the threshold.
3. Write the decomposition chart, where $X_{1}=\left(x_{2}, x_{1}, x_{0}\right)$ denotes the bound variables and $X_{2}=\left(x_{4}, x_{3}\right)$ denotes the free variables.
4. Obtain the column multiplicity.
7.9. Realize the $n$-input modulo $m$ function with $k$-LUT, where $n=32$, $m=17$, and $k=10$.
7.10. Show that

$$
\begin{aligned}
& 2^{n k} \equiv(-1)^{n} \quad\left(\bmod 2^{k}+1\right) \\
& 2^{n k} \equiv 1 \quad\left(\bmod 2^{k}-1\right)
\end{aligned}
$$

7.11. Let $f(X)$ be a threshold function with the weight vector $\left(w_{1}, w_{2}, \ldots, w_{n}\right)$, where $w_{i}>0$. Let $\left(X_{1}, X_{2}\right)$ be a partition of $X$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Then, show that the column multiplicity of $f$ with respect to the decomposition $\left(X_{1}, X_{2}\right)$ is at most $1+\sum_{i=1}^{k} w_{i}$.

## Chapter 8 <br> C-Measure of Sparse Functions

Let $f$ be a function of $n$ variables. Then, the C-measure of $f$ tends to increase exponentially with $n$. However, for the functions with a fixed weight $u$, ( $u \ll$ $2^{n-1}$ ), C-measure increase as $O(n)$. Thus, functions with small weights have small C -measures. This chapter considers the C -measure of functions with small weights.

### 8.1 Logic Functions with Specified Weights

In this section, we derive upper bounds on C-measures for functions with small weights. Then, we derive the number of LUTs to realize functions with small weights.

Definition 8.1.1. The weight of a logic function $f$, denoted by $u$, is the number of the binary vectors $\vec{a}$ such that $f(\vec{a})=1$.

Let $u$ be the weight of an $n$-variable function. If $u$ is much smaller than $2^{n}$, then $f$ is a sparse function.

Theorem 8.1.1. [8] Let $\mu_{k}(f)$ be the column multiplicity of a decomposition chart of an $n$-variable logic function $f$ with $k$ bound variables. Then,

$$
\mu_{k}(f) \leq \min \left\{2^{k}, 2^{2^{n-k}}\right\} .
$$

Example 8.1.1. Table 8.1 shows the profiles

$$
\left(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}, \mu_{6}, \mu_{7}, \mu_{8}, \mu_{9}, \mu_{10}\right)
$$

of 10 randomly generated functions with $n=10$ and $u=512$. Since one-half of the truth table entries are 1, such functions are not considered to be sparse. The last row (AVG) denotes the average of the column multiplicities. The upper bound profile obtained by Theorem 8.1.1 is

$$
(2,4,8,16,32,64,128,16,4,2)
$$

Table 8.1 Profiles of 10 -variable random functions with weight 512

| NF | $\mu_{1}$ | $\mu_{2}$ | $\mu_{3}$ | $\mu_{4}$ | $\mu_{5}$ | $\mu_{6}$ | $\mu_{7}$ | $\mu_{8}$ | $\mu_{9}$ | $\mu_{10}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | ---: | :--- | :--- | :--- |
| $f_{0}$ | 2 | 4 | 8 | 16 | 32 | 64 | 98 | 16 | 4 | 2 |
| $f_{1}$ | 2 | 4 | 8 | 16 | 32 | 64 | 106 | 16 | 4 | 2 |
| $f_{2}$ | 2 | 4 | 8 | 16 | 32 | 64 | 99 | 16 | 4 | 2 |
| $f_{3}$ | 2 | 4 | 8 | 16 | 32 | 64 | 100 | 16 | 4 | 2 |
| $f_{4}$ | 2 | 4 | 8 | 16 | 32 | 64 | 97 | 16 | 4 | 2 |
| $f_{5}$ | 2 | 4 | 8 | 16 | 32 | 64 | 102 | 16 | 4 | 2 |
| $f_{6}$ | 2 | 4 | 8 | 16 | 32 | 64 | 97 | 16 | 4 | 2 |
| $f_{7}$ | 2 | 4 | 8 | 16 | 32 | 64 | 95 | 16 | 4 | 2 |
| $f_{8}$ | 2 | 4 | 8 | 16 | 32 | 64 | 98 | 16 | 4 | 2 |
| $f_{9}$ | 2 | 4 | 8 | 16 | 32 | 64 | 104 | 16 | 4 | 2 |
| AVG | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 99.6 | 16.0 | 4.0 | 2.0 |

Table 8.2 Profiles of 10-variable random functions with weight 64

| NF | $\mu_{1}$ | $\mu_{2}$ | $\mu_{3}$ | $\mu_{4}$ | $\mu_{5}$ | $\mu_{6}$ | $\mu_{7}$ | $\mu_{8}$ | $\mu_{9}$ | $\mu_{10}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $f_{0}$ | 2 | 4 | 8 | 16 | 26 | 31 | 21 | 12 | 4 | 2 |
| $f_{1}$ | 2 | 4 | 8 | 16 | 28 | 27 | 22 | 10 | 4 | 2 |
| $f_{2}$ | 2 | 4 | 8 | 16 | 26 | 30 | 20 | 10 | 4 | 2 |
| $f_{3}$ | 2 | 4 | 8 | 16 | 27 | 30 | 19 | 11 | 4 | 2 |
| $f_{4}$ | 2 | 4 | 8 | 16 | 27 | 28 | 21 | 9 | 4 | 2 |
| $f_{5}$ | 2 | 4 | 8 | 16 | 27 | 30 | 19 | 8 | 4 | 2 |
| $f_{6}$ | 2 | 4 | 8 | 16 | 29 | 30 | 19 | 8 | 3 | 2 |
| $f_{7}$ | 2 | 4 | 8 | 16 | 29 | 28 | 19 | 10 | 4 | 2 |
| $f_{8}$ | 2 | 4 | 8 | 16 | 28 | 29 | 20 | 9 | 4 | 2 |
| $f_{9}$ | 2 | 4 | 8 | 16 | 28 | 31 | 18 | 9 | 4 | 2 |
| AVG | 2.0 | 4.0 | 8.0 | 16.0 | 27.5 | 29.4 | 19.8 | 9.6 | 3.9 | 2.0 |

Except for $\mu_{7}$, the values of profiles in Table 8.1 are equal to the upper bounds given by Theorem 8.1.1. Note that

$$
\mu\left(f_{1}\right)=\max _{k=1}^{10}\left\{\mu_{k}\left(f_{1}\right)\right\}=106,
$$

while

$$
\mu\left(f_{7}\right)=\max _{k=1}^{10}\left\{\mu_{k}\left(f_{7}\right)\right\}=95
$$

Example 8.1.2. Table 8.2 shows the profiles of 10 randomly generated functions with $n=10$ and $u=64$. Since the fraction of 1 's in the truth table is $6.25 \%$, such functions are considered to be sparse. In this case, the bound given by Theorems 8.1.1 is not tight for $\mu_{5}, \mu_{6}, \mu_{7}$, and $\mu_{8}$.

In general, when the weight $u$ of a function is much smaller than $2^{n}$, the bounds on the column multiplicity given by Theorem 8.1.1 are not tight and are not so useful. However, if the weight $u$ of the function is given, then we can derive tighter bounds. From here, we derive a tighter bound using a combinatorial argument.
Lemma 8.1.1. Consider boxes arranged as a rectangle with $t$ rows and many columns. Assume that we distribute u nondistinct balls to these boxes so that each
box has at most one ball. Let $\lambda(t, u)$ be the maximum number of distinct column patterns. Then,

$$
\lambda(t, u)=\sum_{i=0}^{\zeta}\binom{t}{i}+r
$$

where $\zeta$ is the integer satisfying the relation:

$$
\sum_{i=1}^{\zeta} i\binom{t}{i} \leq u<\sum_{i=1}^{\zeta+1} i\binom{t}{i}
$$

and

$$
r=\left\lfloor\frac{u-\sum_{i=1}^{\zeta} i\binom{t}{i}}{\zeta+1}\right\rfloor
$$

Note that $\lambda(t, u)$ is monotone increasing for $0 \leq u \leq t \cdot 2^{t-1}$ and takes the constant value $2^{t}$ when $u \geq t \cdot 2^{t-1}$.

Example 8.1.3. Consider boxes arranged as a rectangle with $t=4$ rows and many columns. Assume that we distribute $u=10$ nondistinct balls so that each box has at most one ball. When the balls are distributed as shown in Fig. 8.1, the maximum number of patterns occur. In this case, the first column has no ball; in the second to fifth columns, each column has just one ball; and in the last three columns, each column has two balls. The number of different column patterns can be enumerated as follows. Since,

$$
\sum_{i=1}^{1} i\binom{4}{i}=4 \leq 10<\sum_{i=1}^{2} i\binom{4}{i}=16
$$

we have $\zeta=1$, which shows that all the column patterns with weight 0 and weight 1 occur. Also,

$$
r=\left\lfloor\frac{10-\sum_{i=1}^{1} i\binom{4}{i}}{2}\right\rfloor=3
$$

implies that there are three patterns with weight $\zeta+1=2$. Thus, we have

$$
\lambda(4,10)=\sum_{i=0}^{1}\binom{4}{i}+r=1+4+3=8
$$

which shows the number of different column patterns.

Fig. 8.1 Maximum number of column patterns for the function with weight $u=10$

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

Table 8.3 Values for $\lambda(t, u)$

|  |  | u |  |  |  |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: | ---: | ---: |
|  | t | 23 | 47 | 55 | 95 | 111 | 147 | 191 | 223 | 239 | 375 |
| $\mu_{n-2}$ | 4 | 13 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| $\mu_{n-3}$ | 8 | $\mathbf{1 6}$ | 28 | $\mathbf{3 2}$ | 47 | 52 | $\mathbf{6 4}$ | 79 | 90 | 94 | $\mathbf{1 2 8}$ |
| $\mu_{n-4}$ | 16 | 20 | $\mathbf{3 2}$ | 36 | 56 | $\mathbf{6 4}$ | 82 | 104 | 120 | $\mathbf{1 2 8}$ | 176 |
| $\mu_{n-5}$ | 32 | 24 | 40 | 44 | $\mathbf{6 4}$ | 72 | 90 | 112 | $\mathbf{1 2 8}$ | 136 | 204 |
| $\mu_{n-6}$ | 64 | 24 | 48 | 56 | 80 | 88 | 106 | $\mathbf{1 2 8}$ | 144 | 152 | 220 |

Example 8.1.4. Table 8.3 shows the values of $\lambda(t, u)$ for $t=4,8,16,32$, and various values of $u$.

Theorem 8.1.2. Let $\mu_{k}(n, u)$ be the column multiplicity of a decomposition chart for an $n$-variable function with weight $u$ and $k$ bound variables. Then,

$$
\mu_{k}(n, u) \leq \lambda\left(2^{n-k}, u\right) .
$$

Note that when $u \leq 2^{n-k}, \lambda(t, u)=u+1$, while when $u>2^{n-k}, \lambda\left(2^{n-k}, u\right)<$ $u+1$. Thus, we have the following:

Corollary 8.1.1. For any logic function $f$ with weight $u, \mu(f) \leq u+1$.
For $\mu_{k}(n, u)$, Theorem 8.1.2 gives better bounds than $2^{2^{n-k}}$ and Corollary 8.1.1 when

$$
2^{n-k}<u<2^{2^{n-k}+(n-k)-1} .
$$

From Theorem 8.1.2, we have the following:
Theorem 8.1.3. The number of 6 -LUTs needed to realize an $n$-variable function with weight $u$ is:

- 10 or less, when $n=9$ and $u \leq 55$
- 15 or less, when $n=10$ and $u \leq 47$

Proof. The profile of a 9-variable function given by Theorem 8.1.1 is

$$
\begin{aligned}
& \left(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}, \mu_{6}, \mu_{7}, \mu_{8}, \mu_{9}\right) \\
& \quad=(2,4,8,16,32,64,16,4,2) .
\end{aligned}
$$

Let $n=9$ and $u=55$. By Theorem 8.1.2, we have $\mu_{n-3}=\mu_{6} \leq \lambda\left(2^{n-6}, 55\right)=$ 32. Thus, $\mu(f) \leq 32$, and by Theorem 5.2.1, $f$ can be realized with at most 10 LUTs.

The profile of a 10 -variable function given by Theorem 8.1.1 is

$$
\begin{aligned}
& \left(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}, \mu_{6}, \mu_{7}, \mu_{8}, \mu_{9}, \mu_{10}\right) \\
& \quad=(2,4,8,16,32,64,128,16,4,2) .
\end{aligned}
$$

Let $n=10$ and $u=47$. By Theorem 8.1.2, we have $\mu_{n-3}=\mu_{7} \leq \lambda\left(2^{n-7}, 47\right)=$ 28 , and $\mu_{n-4}=\mu_{6} \leq \lambda\left(2^{n-6}, 47\right)=32$. Thus, $\mu(f) \leq 32$, and by Theorem 5.2.1, $f$ can be realized with at most 15 LUTs.

Theorem 8.1.4. The number of 7 -LUTs needed to realize an $n$-variable function with weight $u$ is:

- 23 or less, when $n=12$ and $u \leq 95$
- 17 or less, when $n=11$ and $u \leq 111$
- 11 or less, when $n=10$ and $u \leq 147$

Proof. From Theorems 8.1.1 and 8.1.2, and Table 8.3, we have the following profiles for 7-LUTs:

When $u \leq 95,(2,4,8,16,32,64,64,56,47,16,4,2)$.
When $u \leq 111,(2,4,8,16,32,64,64,52,16,4,2)$.
When $u \leq 147,(2,4,8,16,32,64,64,16,4,2)$.
In a similar way to the proof of Theorem 8.1.3, we have the numbers of LUTs.
Theorem 8.1.5. The number of 8 -LUTs needed to realize an $n$-variable function with weight u is:

- 33 or less, when $n=14$ and $u \leq 191$
- 26 or less, when $n=13$ and $u \leq 223$
- 19 or less, when $n=12$ and $u \leq 239$
- 12 or less, when $n=11$ and $u \leq 375$

Proof. For 8-LUTs, we have the following profiles:
When $u \leq 119,(2,4,8,16,32,64,128,128,112,104,79,16,4,2)$.
When $u \leq 223,(2,4,8,16,32,64,128,128,120,90,16,4,2)$.
When $u \leq 239$, (2, 4, 8, 16, 32, 64, 128, 128, 94, 16, 4, 2).
When $u \leq 375,(2,4,8,16,32,64,128,128,16,4,2)$.
In a similar way to the proof of Theorem 8.1.3, we have the numbers of LUTs.
Here, we compare the quality of bounds derived in this section with previous ones.

Theorem 8.1.6. [78, 110] The number of $K$-LUTs to realize an arbitrary n-variable function is at most

$$
2^{n-K+1}-1
$$

When $n$ is even, and $K=6$, we have a better bound as follows:
Theorem 8.1.7. [110] Let $n$ be even. The number of 6 -LUTs to realize an arbitrary $n$-variable function is at most

$$
\frac{2^{n-4}-1}{3}
$$

Proof. This is the same as Theorem 4.2.3.

Theorem 8.1.8. [78] Let $f$ be represented by a sum-of-products expression with $m$ literals and p products. The number of $K$-LUTs to realize $f$ is at most

$$
\left\lfloor\frac{m+p(K-3)}{K-1}\right\rfloor+\left\lceil\frac{p-1}{K-1}\right\rceil .
$$

Example 8.1.5. Consider the case of $K=6, n=10$, and $u=47$. The upper bound given by Theorem 8.1.7 is

$$
\frac{2^{n-4}-1}{3}=\frac{63}{3}=21
$$

On the other hand, the bound given by Theorem 8.1.3 is 15 .
When the function is random, we can assume that $p=u=47$. Because the function is sparse and random, the minterms tend to occur as isolated minterms. Thus, $p$ tends to be the same as $u=47$ [105]. Thus, we have $m=p n=47 \times 10$. In this case, the upper bound given by Theorem 8.1.8 is

$$
\left\lfloor\frac{m+p(K-3)}{K-1}\right\rfloor+\left\lceil\frac{p-1}{K-1}\right\rceil=122+10=132
$$

This example shows that the upper bound given by Theorem 8.1.8 is useless for random functions.

### 8.2 Uniformly Distributed Functions

In the previous section, we considered upper bounds on column multiplicities. In this section, however, we consider upper bounds on the average column multiplicities. These bounds are only valid when $n$ and $u$ are sufficiently large. It is assumed that 1 's in the truth table occur randomly.

Definition 8.2.1. A set of functions is uniformly distributed, if the probability of occurrence of any function is the same as any other function.
For example, there are $\binom{16}{4}=1820$ different 4 -variable functions with 4 true minterms. If the functions are uniformly distributed, the probability of the occurrence of any one of them is $\frac{1}{1820}$.
Theorem 8.2.1. Consider a decomposition chart with $k$ bound variables that realizes a set of uniformly distributed functions of $n$-variables with weight $u$. The average number of different column functions with weight $i$ in the decomposition chart is at most

$$
\min \left\{1, N \alpha^{i} \beta^{M-i}\right\} \times\binom{ M}{i},
$$

where $N=2^{k}, M=2^{n-k}, \alpha=u / 2^{n}$, and $\beta=1-\alpha$.

Proof. Consider a column of the decomposition chart. Suppose that the upper $i$ elements take value 1 , while the lower $M-i$ elements take value 0 . The probability of such a column is given by

$$
\alpha^{i} \beta^{M-i} .
$$

Since there exist $\binom{M}{i}$ different ways to choose the rows with 1 , the probability that a column function with weight $i$ occurs is

$$
\binom{M}{i} \alpha^{i} \beta^{M-i}
$$

Also, the number of different column functions with weight $i$ is at most $\binom{M}{i}$. Thus, we have the theorem.

Note that Theorem 8.2.1 gives upper bounds on the average column multiplicity $\mu_{k}$. Thus, there may exist functions whose column multiplicities are greater than the bounds given by Theorem 8.2.1. However, the fraction of such functions approaches to zero as $n$ increases.

### 8.3 Experimental Results

We developed a program to derive the bounds on the column multiplicities given by Theorems 8.1.1 and 8.2.1. Also, we have obtained statistical data for functions with $n=10$ and $n=16$.

### 8.3.1 Benchmark Functions

An interesting problem is whether the bounds obtained in Sect. 8.1 are applicable to benchmark functions. The answer is yes when the weights of benchmark functions are in a range. The fraction of such functions is not so large, but we did find some. For selected benchmark functions, we counted the number of variables $n$, and the number of true minterms $u$. For multiple output functions, each output is examined separately. $f_{i}$ denotes the $i$-th output, where the index starts from 0 . The results are as follows:

Theorem 8.1.3 is applicable to the following benchmark functions: apex $4, f_{1}$ ( $n=9, u=55$ ); pdc, $f_{23}(n=9, u=43)$; spla, $f_{43}(n=10, u=28)$; spla, $f_{44}$ ( $n=10, u=44$ ); amd, $f_{14}(n=10, u=44)$.

Theorem 8.1.4 is applicable to the following benchmark functions: pdc, $f_{9}$ ( $n=$ $10, u=95)$; signet, $f_{3}(n=10, u=95)$; pdf, $f_{31},(n=10, u=120)$.

Theorem 8.1.5 is applicable to the following benchmark functions: pdc, $f_{8}$ ( $n=$ $11, u=333)$; pdc, $f_{33}(n=11, u=340)$; signet, $f_{4}(n=11, u=132)$; in2, $f_{3}$ ( $n=12, u=236$ ); ti, $f_{5}(n=13, u=160)$.

### 8.3.2 Randomly Generated Functions

### 8.3.2.1 10-Variable Random Functions

First, we randomly generated 100 functions with $u=512$ and $n=10$. Table 8.4 shows the average column multiplicities (AVG), the maximum column multiplicities (MAX), the upper bound derived by Theorem 8.2.1, and the upper bound derived by Theorem 8.1.1 for each $\mu_{k}$. In this case, except for $\mu_{6}$ and $\mu_{7}$, Theorem 8.1.1 gives tight bounds. Also, Theorems 8.1.1 and 8.2.1 give identical bounds.

Second, we randomly generated 10,000 functions with $u=64$ and $n=10$. Table 8.5 shows the average column multiplicities (AVG), the maximum column multiplicities (MAX), the upper bounds derived by Theorem 8.2.1, and the upper bounds derived by Theorem 8.1.1 for each $\mu_{k}$. In this case, for $\mu_{k}$, where $k=$ $5,6,7,8$, Theorem 8.2.1 gives better bounds than Theorem 8.1.1.

Table 8.5 also shows that the average C-measure of 10 -variable functions with weight 64 , is less than 30 . The maximum column multiplicity occurs when $k=6$ and $\mu_{6}=37$. In these experimental results, the order of the input variables is fixed. If we optimize the order of the variables, then we can reduce the column multiplicity. We also optimized the order of the variables and confirmed that, in all 10,000 cases, the column multiplicities are at most 32 . Thus, we have the following:

Conjecture 8.3.1. For most 10 -variable functions with weight $u \leq 64$, the C-measure is 32 or less, if we optimize the ordering of the input variables.

From Theorem 5.2.1, we have
Conjecture 8.3.2. The number of 6-LUTs needed to realize most of 10 -variable functions with weight $u \leq 64$ is 15 or less.

As shown in Theorem 8.1.3, the number of 6-LUTs needed to realize an arbitrary 10 -variable function $f$ with weight $u \leq 47$ is 15 or less. Thus, Conjecture 8.3.2 is true for $u \leq 47$.

Table 8.4 Average and maximum profiles of 10 -variable functions with weight 512

|  | $\mu_{1}$ | $\mu_{2}$ | $\mu_{3}$ | $\mu_{4}$ | $\mu_{5}$ | $\mu_{6}$ | $\mu_{7}$ | $\mu_{8}$ | $\mu_{9}$ | $\mu_{10}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $A V G$ | 2 | 4 | 8 | 16.00 | 32.00 | 63.96 | 100.86 | 16 | 4 | 2 |
| $M A X$ | 2 | 4 | 8 | 16 | 32 | 64 | 111 | 16 | 4 | 2 |
| Theorem 8.2.1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 16 | 4 | 2 |
| Theorem 8.1.1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 16 | 4 | 2 |

Table 8.5 Average and maximum profiles of 10 -variable functions with weight 64

|  | $\mu_{1}$ | $\mu_{2}$ | $\mu_{3}$ | $\mu_{4}$ | $\mu_{5}$ | $\mu_{6}$ | $\mu_{7}$ | $\mu_{8}$ | $\mu_{9}$ | $\mu_{10}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $A V G$ | 2 | 4 | 8 | 15.97 | 27.97 | 29.89 | 18.36 | 8.76 | 3.88 | 2 |
| $M A X$ | 2 | 4 | 8 | 16 | 32 | 37 | 25 | 13 | 4 | 2 |
| Theorem 8.2.1 | 2 | 4 | 8 | 16 | 27.44 | 33.75 | 19.88 | 10.51 | 4 | 2 |
| Theorem 8.1.1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 16 | 4 | 2 |

Table 8.6 Profiles for 16-variable functions

| $u$ | $\mu_{3}$ | $\mu_{4}$ | $\mu_{5}$ | $\mu_{6}$ | $\mu_{7}$ | $\mu_{8}$ | $\mu_{9}$ | $\mu_{10}$ | $\mu_{11}$ | $\mu_{12}$ | $\mu_{13}$ | $\mu_{14}$ | $\mu_{15}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| A16 | 7.73 | 11.30 | 13.73 | 15.24 | 15.95 | 16.20 | 16.05 | 15.42 | 13.90 | 11.23 | 8.11 | 4.97 | 3 |
| C16 | 7.50 | 11.06 | 13.56 | 15.16 | 16.04 | 16.51 | 16.75 | 16.88 | 16.94 | 16.97 | 9.01 | 5.01 | 3 |
| A32 | 8 | 14.79 | 21.52 | 26.37 | 28.94 | 29.64 | 28.83 | 26.28 | 21.46 | 15.06 | 8.93 | 5.02 | 3.00 |
| C32 | 8 | 13.99 | 21.11 | 26.18 | 29.32 | 31.09 | 32.03 | 32.51 | 32.76 | 17.12 | 9.05 | 5.02 | 3.01 |
| A64 | 8 | 15.97 | 28.69 | 41.48 | 50.10 | 53.10 | 50.52 | 42.05 | 29.29 | 17.19 | 9.23 | 5.07 | 3.00 |
| C64 | 8 | 16 | 26.99 | 41.23 | 51.36 | 57.65 | 61.19 | 63.07 | 33.95 | 17.48 | 9.22 | 5.09 | 3.03 |
| A128 | 8 | 16 | 31.84 | 56.42 | 80.18 | 91.11 | 83.70 | 60.99 | 35.79 | 18.69 | 9.57 | 5.23 | 3.05 |
| C128 | 8 | 16 | 32 | 53.00 | 81.49 | 101.76 | 114.35 | 72.27 | 36.73 | 18.84 | 9.87 | 5.37 | 3.13 |
| A256 | 8 | 16 | 32 | 65.53 | 110.64 | 147.29 | 136.88 | 88.38 | 46.95 | 23.92 | 12.27 | 6.32 | 3.44 |
| C256 | 8 | 16 | 32 | 63.84 | 105.05 | 162.09 | 174.95 | 91.84 | 47.34 | 24.23 | 12.45 | 6.49 | 3.50 |
| A512 | 8 | 16 | 32 | 64 | 126.57 | 214.49 | 235.14 | 155.10 | 83.05 | 41.66 | 19.59 | 8.69 | 3.85 |
| C512 | 8 | 16 | 32 | 64 | 126.69 | 209.33 | 262.51 | 156.60 | 86.08 | 44.89 | 22.57 | 10.94 | 4 |
| A1K | 8 | 16 | 32 | 64 | 128 | 251.86 | 391.99 | 327.91 | 192.81 | 89.33 | 33.93 | 11.07 | 4 |
| C1K | 8 | 16 | 32 | 64 | 128 | 252.46 | 407.99 | 332.19 | 214.99 | 120.75 | 38.65 | 11.25 | 4 |
| A2K | 8 | 16 | 32 | 64 | 128 | 256 | 499.80 | 658.92 | 455.91 | 176.87 | 48.40 | 12.72 | 4 |
| C2K | 8 | 16 | 32 | 64 | 128 | 256 | 504.20 | 626.67 | 568.03 | 188.19 | 49.43 | 12.95 | 4 |
| A4K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 975.44 | 1019.90 | 393.73 | 86.27 | 15.10 | 4 |
| C4K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1008.54 | 1095.05 | 431.43 | 99.76 | 15.25 | 4 |
| A8K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1023.92 | 1834.20 | 1055.60 | 150.52 | 15.98 | 4 |
| C8K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2020.45 | 1063.58 | 163.00 | 16 | 4 |
| A16K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2047.48 | 2738.27 | 238.61 | 16 | 4 |
| C16K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4055.95 | 256 | 16 | 4 |
| A32K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 3971.75 | 256 | 16 | 4 |
| C32K | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 256 | 16 | 4 |

### 8.3.2.2 16-Variable Random Functions

In the case of $n=16$, for each weight $u=2^{i}$, where $i=4-15$, we generated 100 functions and obtained the average of profiles.

Table 8.6 compares the profiles of random functions and the average profiles derived by Theorem 8.2.1. For example, the row for $A 16$ denotes the average profile of randomly generated functions with weight 16 , while the row for C 16 denotes the calculated profile using Theorem 8.2.1. To save space, some fractional numbers are denoted by integers: 2.00 is denoted by 2 . Also, the values for $\mu_{1}, \mu_{2}$ and $\mu_{16}$ are omitted from the table, since $\mu_{1}=2, \mu_{2}=4$, and $\mu_{16}=2$, for all cases.

Table 8.6, shows that when $u$ is small (say $u=16$ ), Theorem 8.2.1 gives better bounds than Theorem 8.1.1, while when $u$ is large (say $u=32,768$ ), Theorem 8.1.1 gives better bounds. In Table 8.6, calculated bounds denoted by integers were obtained by Theorem 8.1.1, while calculated bounds denoted by fractional numbers were obtained by Theorem 8.2.1.

### 8.4 Remarks

In this chapter, we considered sparse functions, where $u$, the weight of the function, is much smaller than $2^{n-1}$. However, when 0 and 1 are interchanged, the theory also holds. Thus, these bounds are also useful for the functions where the fraction of 0 's in the truth table is much smaller than the fraction of 1 's. This chapter is based on [133].

## Problems

8.1. Obtain the value of $\lambda(8,20)$.
8.2. Suppose that a function is given as a sum-of-products expression. Show an efficient method to calculate the weigh for $f$.
8.3. Show a $\left(2^{n}+n\right)$-variable function whose C -measure is $2^{2^{n}}$.
8.4. How many 6 -input LUTs are necessary to realize

1. An arbitrary 8 -variable function?
2. An arbitrary 9 -variable symmetric function?
3. An arbitrary 10 -variable function?
4. An arbitrary 10 -variable function with weight 47 ?

## Chapter 9 <br> Index Generation Functions

This chapter introduces index generation functions with various applications. Then, it shows a method to realize index generation functions by LUTs.

### 9.1 Index Generation Functions and Their Realizations

Definition 9.1.1. Consider a set of $k$ different binary vectors of $n$ bits. These vectors are registered vectors. For each registered vector, assign a unique integer from 1 to $k$. A registered vector table shows, for each registered vector, its index. An index generation function produces the corresponding index if the input matches a registered vector, and produces 0 otherwise. $k$ is the weight of the index generation function. An index generation function represents a mapping: $B^{n} \rightarrow\{0,1,2, \ldots, k\}$. An index generator is a circuit that realizes an index generation function.

Example 9.1.1. Table 9.1 shows a registered vector table with $k=4$ vectors. The corresponding index generation function is shown in Table 9.2.

Here, we assume that $k$ is much smaller than $2^{n}$, the total number of input combinations, i.e., the index generation function is sparse.

Index generators are used in address tables for the internet routers, terminal access controller (TAC) for local area networks, databases, memory patch circuits, electronic dictionaries, password lists, etc.

### 9.2 Address Table

IP addresses used in the internet are often represented with 32-bit numbers. An address table for a router stores IP addresses and corresponding indices. We assume that the number of addresses in the table is at most 40,000 . Thus, the number of inputs is 32 and the number of outputs is 16 , which can handle 65,536 addresses. Note that the address table must be updated frequently.

Table 9.1 Registered vector table

| Vector |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :---: |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | Index |  |
| 1 | 0 | 0 | 1 | $\mathbf{1}$ |  |
| 1 | 1 | 1 | 1 | $\mathbf{2}$ |  |
| 0 | 1 | 0 | 1 | $\mathbf{3}$ |  |
| 1 | 1 | 0 | 0 | $\mathbf{4}$ |  |

Table 9.2 Index generation function

| Input |  |  |  | Output |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $y_{1}$ | $y_{2}$ | $y_{3}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |

### 9.3 Terminal Access Controller

A TAC for a local area network checks whether the requested terminal has permission to access Web addresses outsize the local area network, e-mail, FTP, Telnet, etc.

In Fig. 9.1, eight terminals are connected to the TAC. Some can access all the resources. Others can access only limited resources because of security issue. The TAC checks whether the requested computer has permission to access the Web, e-mail, FTP, Telnet, or not. Each terminal has its unique MAC address represented by 48 bits. We assume that the number of terminals in the table is at most 255. To implement the TAC, we use an index generator and a memory. The memory stores the details of the terminals. The number of inputs for the index generator is 48 and the number of outputs is 8 . Note that the table for the TAC must be updated frequently.

Example 9.3.1. Figure 9.2 shows an example of the TAC. The first terminal has the MAC address 53:03:74:59:03:02. It is allowed to access everything, including the Web outside the local area network, e-mail, FTP, and Telnet. The second one is allowed to access both the Web and e-mail. The third one is allowed to access only the Web. And, the last one is allowed to access only e-mail. The index generated


Fig. 9.1 Terminal access controller (TAC)


Fig. 9.2 Index generator for TAC
by the index generator is used as an address to read the memory which stores the permissions. If we implement the TAC by a single memory, we need 256 Tera words, since the number of inputs is 48 . To reduce the size of the memory, we use an index generator to produce the index, and an additional memory to store the permission data for each internal address.

### 9.4 Memory Patch Circuit

The firmware of an embedded system is usually implemented by Read-Only Memories (ROMs). After shipping the product, it is often necessary to modify a part of the ROM, for example to upgrade to a later version. To convert the address of the ROM


Fig. 9.3 Memory patch circuit
to the address of the patch memory, we use the index generator shown in Fig. 9.3 [28, 29, 74].

The index generator stores addresses (vectors) of the ROM to be updated, and their corresponding indices. The patch memory stores the updated data of the ROM. When the address does not match any elements in the index generation function, the output of the ROM is sent to the output bus. In this case, the output the patch memory is disabled. When the address matches to an element in the index generation function, the index generator produces the corresponding index, and the corresponding data of the patch memory is sent to the output bus. In this case, the output of the ROM is disabled. This method can be also used to improve the yield of large-scale memory, which can be "patched" instead of discarded.

### 9.5 Periodic Table of the Chemical Elements

Consider Table 9.3, which shows a part of periodic table of the chemical elements. Figure 9.4 shows its implementation. This stores, Atomic Number (integer), Chemical Symbol, Density, Specific Heat, State, and Category. We assume that the number of elements in the database is at most 127. The database consists of two circuits:

1. A circuit to produce the Atomic Number from a Chemical Symbol.
2. A circuit to produce Density, State and Category from an Atomic Number.

The first circuit is implemented by an index generator, and the second circuit is implemented by an ordinary memory. Chemical Symbol consists of two characters from the 26-letter English alphabet (uppercase and lowercase) and special symbols (blank, etc.). Since each character requires 6 bits, to represent 2 characters, we need $2 \times 6=12$ bits. On the other hand, to represent an Atomic Number, we need only 7 bits, since the total number of elements in the table is at most 127 . In this case,

Table 9.3 Periodic table of the chemical elements

| Atomic number | Chemical symbol | English name | Density | Specific heat | State | Category |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | H | Hydrogen | 0.082 | 14.304 | Gas | Other nonmetal |
| 2 | He | Helium | 0.147 | 5.188 | Gas | Noble gas |
| 3 | Li | Lithium | 0.534 | 3.489 | Solid | Alkali metal |
| 4 | Be | Beryllium | 1.848 | 1.824 | Solid | Alkaline earth metal |
| 5 | B | Boron | 2.34 | 1.025 | Solid | Metalloid |
| 6 | C | Carbon | 3.51 | 0.513 | Solid | Other nonmetal |
| 7 | N | Nitrogen | 1.2506 | 1.042 | Gas | Other nonmetal |
| 8 | O | Oxygen | 1.429 | 0.916 | Gas | Other nonmetal |
| 9 | F | Fluorine | 1.696 | 0.824 | Gas | Halogen |
| 10 | Ne | Neon | 0.8999 | 1.029 | Gas | Noble gas |
| 11 | Na | Sodium | 0.971 | 1.227 | Solid | Alkali metal |
| 12 | Mg | Magnesium | 1.738 | 1.025 | Solid | Alkaline earth metal |
| 13 | Al | Aluminum | 2.6989 | 0.902 | Solid | Other metal |
| 14 | Si | Silicon | 2.33 | 0.712 | Solid | Metalloid |
| 15 | P | Phosphorus | 1.82 | 0.757 | Solid | Other nonmetal |
| 16 | S | Sulfur | 2.07 | 0.732 | Solid | Other nonmetal |
| 17 | Cl | Chlorine | 3.214 | 0.477 | Gas | Halogen |
| 18 | Ar | Argon | 1.848 | 0.138 | Gas | Noble gas |
| 19 | K | Potassium | 0.862 | 0.766 | Solid | Alkali metal |
| 20 | Ca | Calcium | 1.55 | 0.653 | Solid | Alkaline earth metal |



Fig. 9.4 Circuit for the periodic table
the number of possible input combinations is $2^{12}$, while the number of the registered vectors is at most 127 . Note that each chemical element has its unique atomic number.

### 9.6 English-Japanese Dictionary

For simple English-Japanese communication, we prepare a dictionary consisting of 1,500 English words. To make a list of 1,500 English words using a single memory or a single circuit is unrealistic. Therefore, we partition the list into three groups, so that each list contains at most 500 words. Let the names of the three lists be Word list $A$, Word list $B$, and Word list $C$. The maximum number of letters in the word


Fig. 9.5 Implementation of English-Japanese dictionary
lists is 13 , but we only consider the first 8 letters. For English words consisting of fewer than 8 letters, we append blanks to make the length of words 8 . We represent each alphabetic character by 5 bits. So, all the English words are represented by 40 bits. We assume that each group has at most 500 English words, and each word has unique address from 1 to 500 . The address is represented by 9 bits.

Figure 9.5 shows the English-Japanese dictionary consisting of the index generator and a memory. In this dictionary, the index generator finds the index for the English word, and the memory produces the Japanese translation. Note that, in Japanese, 80 outputs are needed to represent the Chinese characters and KANA characters.

### 9.7 Properties of Index Generation Functions

The index generators in Sects. 9.2 and 9.3 have common properties:

1. The values of the nonzero outputs are distinct.
2. The number of nonzero output values is much smaller than the total number of the input combinations.
3. High-speed circuits are required.
4. Data must be updated.

The last condition is very important in communication networks. This means that index generators must be programmable.

Example 9.7.1. Consider the decomposition chart in Fig. 9.6. It shows an input index generation function $F(X)$ with weight 7. $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ denotes the bound variables, and $X_{2}=\left(x_{5}\right)$ denotes the free variable. Note that the column multiplicity of this decomposition chart is 7 .

Lemma 9.7.1. The C-measure of an index generation function with weight $k$ is at most $k+1$.

Proof. Since the number of nonzero outputs is $k$, the column multiplicity never exceeds $k+1$.

|  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | $x_{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
|  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{3}$ |
|  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{4}$ |
| 0 | 0 | 1 | 0 | 0 | 3 | 0 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 |  |
| $x_{5}$ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Fig. 9.6 Decomposition chart for $F$

Lemma 9.7.2. Let $F$ be an index generation function with weight $k$. Then, there exists a functional decomposition

$$
F\left(X_{1}, X_{2}\right)=G\left(H\left(X_{1}\right), X_{2}\right)
$$

where $G$ and $H$ are index generation functions, and the weight of $G$ is $k$, and the weight of $H$ is at most $k$.

Proof. Consider a decomposition chart, in which $X_{1}$ denotes the bound variables, and $X_{2}$ denotes the free variables. Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{p}\right)$, where $p \geq\left\lceil\log _{2}(k+1)\right\rceil$. Let $H$ be a function where the input variables are $X_{1}$, and the output values are defined as follows: Consider the decomposition chart, where assignments of values to $X_{1}$ label columns (i.e., bound variables). For the assignments to $X_{1}$ corresponding to columns with only zero elements, $H=0$. For other inputs, the outputs are distinct integers from 1 to $w_{h}$, where $w_{h}$ denotes the number of columns that have nonzero element(s). Since $w_{h} \leq k$, the weight of $H$ is at most $k$, and the number of output values of $H$ is at most $k+1$. On the other hand, the function $G$ is obtained from $F$ by reducing some columns that have all zero outputs in the decomposition chart. Thus, the number of nonzero outputs in $G$ is equal to the number of nonzero outputs in $F$. Thus, $G$ is also an index generation function with weight $k$.

Example 9.7.2. Consider the decomposition chart in Fig. 9.6. Let the function $F(X)$ be decomposed as $F\left(X_{1}, X_{2}\right)=G\left(H\left(X_{1}\right), X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}\right)$. Table 9.4 shows the function $H$. It is a 4 -variable 3-output index generation logic function with weight 6 . The decomposition chart for the function $G$ is shown in Fig. 9.7. As shown in this example, the functions obtained by decomposing the index generation function $F$ are also index generation functions, and the weights of $F$ and $G$ are both 7 .

Table 9.4 Truth table for $H$

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $y_{1}$ | $y_{2}$ | $y_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 |

Fig. 9.7 Decomposition chart for $G$

|  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $y_{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $y_{2}$ |
|  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $y_{3}$ |
| 0 | 0 | 1 | 0 | 3 | 4 | 5 | 0 | 0 |  |
| 1 | 0 | 0 | 2 | 0 | 0 | 6 | 7 | 0 |  |
| $x_{5}$ |  |  |  |  |  |  |  |  |  |

### 9.8 Realization Using ( $p, q$ )-Elements

Definition 9.8.1. A $(\boldsymbol{p}, \boldsymbol{q})$-element realizes an arbitrary $p$-input $q$-output logic function. Its memory size is $q 2^{p}$.

Theorem 9.8.1. An arbitrary two-valued input n-variable index generation function with weight $k$ can be realized as a multilevel network of $(p, q)$-elements. The number of such elements is at most $\left\lceil\frac{n-q}{p-q}\right\rceil$, where $p>q$ and $q=\left\lceil\log _{2}(k+1)\right\rceil$.

Proof. An index generation logic function $F$ with weight $k$ can be decomposed as

$$
F\left(X_{1}, X_{2}\right)=G\left(H\left(X_{1}\right), X_{2}\right)
$$

where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{p}\right)$. In this case, by Lemma 9.7.2, $G\left(X^{\prime}{ }_{1}, X_{2}\right)$ is also an index generation logic function with weight $k$. Note that the number of input variables for $G$ is reduced to $n-(p-q)$, since the number of output variables of $H$ is $q=\left\lceil\log _{2}(k+1)\right\rceil$. By iterating this operation $\left\lceil\frac{n-p}{p-q}\right\rceil$ times, we can reduce the number of variables to $p$ or fewer. Thus, the index generator can be realized by using only $(p, q)$-elements. The number of elements is at most $\left\lceil\frac{n-p}{p-q}\right\rceil+1=\left\lceil\frac{n-q}{p-q}\right\rceil$.

Fig. 9.8 Realization of index generation function $F$


Example 9.8.1. The number of nonzero outputs in the 5 -variable index generation function $F(X)$ shown in Fig. 9.6 is $k=7$. Since $q=\left\lceil\log _{2}(k+1)\right\rceil=\left\lceil\log _{2}(7+\right.$ $1)\rceil=3$, the index generator can be realized by two $(4,3)$ elements as shown in Fig.9.8.

When realizing an index generator by $(p, q)$-elements, increasing $p$ decreases the number of $(p, q)$-elements, but increases the total amount of memory. On the other hand, decreasing $p$ increases the number of $(p, q)$-elements, but decreases the total amount of memory. The next theorem shows a strategy to design index generators using $(p, q)$-elements. It finds a value of $p$ that minimizes the least upper bound on the total amount of memory without increasing the number of elements.

Theorem 9.8.2. When an index generator is implemented as a multilevel network of ( $p, q$ )-elements, the least upper bound on the total amount of memory is minimized when $p-q=1$ or $p-q=2$.

Proof. When an index generation function is decomposed into $(p, q)$-elements, for each decomposition, we can reduce the number of input variables by $r=p-q$. To reduce $n$ inputs into $q$, we need $s=\left\lceil\frac{n-q}{r}\right\rceil$ functional decompositions. To realize the index generator, we need $s(p, q)$-elements. Thus, the total amount of memory necessary to implement the index generator is $M E M=s \cdot 2^{p} q$. When $n$ is sufficiently large, $M E M$ can be approximated by $\left(\frac{2^{r}}{r}\right) \cdot(n-q) \cdot 2^{q} q$. Since $n$ and $q$ are fixed for a given problem, only $r$ can be changed. Note that $\frac{2^{r}}{r}$ takes its minimum when $r=1$ or $r=2$. Hence we have the theorem.

Since networks with fewer levels are desirable, we often select $r=p-q=2$ to design the index generator.

Theorem 9.8 .1 shows that we can design an index generator as a multilevel network of $(p, q)$-elements by iterations of functional decompositions.

The next Example 9.8.2 shows that we can generate various multilevel logic networks, including cascades.

Example 9.8.2. Let us design index generators, where the number of inputs is $n=48$ and the weight is $k=255$. Since $q=\left\lceil\log _{2}(255+1)\right\rceil=8$, when $p=10$, the total amount of memory is minimized, and also the number of levels is minimized. For each $(p, q)$-element, we can reduce the number of input lines by two. So, by using $20(p, q)$-elements, we can reduce the number of inputs into


Fig. 9.9 Cascade realization of index generator


Fig. 9.10 Index generator $(p=10)$
8. For example, we have the LUT cascade as shown in Fig. 9.9. Or, we have the multilevel logic network as shown in Fig. 9.10, where the number of levels is 10 . In this case, the variables are permuted during functional decompositions. Note that both structures require the same amount of memory: 160 K bits. We can further reduce the number of levels by using elements with more inputs. Figure 9.11 shows an example with $p=11$ and $q=8$. In this case, the number of elements is $(48-8) /(11-8)=14$, the number of levels is 8 , and the total amount of memory is 212 Kibits, where 1 Kibit denotes $2^{10}=1024$ bits. Figure 9.12 show an example with $p=12$ and $q=8$. In this case, the number of elements is $(48-8)$ / $(12-8)=10$, the number of levels is 5 , and the total amount of memory is 320 Kibits.

Theorem 9.8 .2 shows the strategy for general index generators. It minimizes the least upper bound on the total amount of memory. For a particular index generator, the total amount of memory can be minimum for cases other than $p-q=2$. The next example illustrates this.

Example 9.8.3. Consider the 6-variable index generation function $F(X)$ shown in Table 9.5. Let the function $F(X)$ be decomposed as $F\left(X_{1}, X_{2}\right)=G\left(H\left(X_{1}\right), X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$ and $X_{2}=\left(x_{5}, x_{6}\right)$. The column multiplicity of the decomposition chart in Table 9.5 is 2. Table 9.6 is the truth table of $H$, and Table 9.7 is the truth table of $G$. This index generator can be implemented as Fig. 9.13. In this case, the weight of the function is $k=7$, but $H$ is realized by a (4,1)-element.


Fig. 9.11 Index generator $(p=11)$
Fig. 9.12 Index generator ( $p=12$ )


Table 9.5 Index generation function $F$

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | $f_{1}$ | $f_{2}$ | $f_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | - | - | - | - | - | 0 | 0 | 0 |
| - | 0 | - | - | - | - | 0 | 0 | 0 |
| - | - | 0 | - | - | - | 0 | 0 | 0 |
| - | - | - | 0 | - | - | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |

### 9.9 Realization of Logic Functions with Weight $k$

Up to now, we have considered the realization of index generation functions. Next, we consider the realization of general logic functions.

Table 9.6 Truth table for $H$

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $y_{1}$ |
| :---: | :---: | :---: | :---: | :---: |
| 1 | 1 | 1 | 1 | 1 |
| - | 0 | 0 | 0 | 0 |
| 0 | - | 0 | 0 | 0 |
| 0 | 0 | - | 0 | 0 |
| 0 | 0 | 0 | - | 0 |

Table 9.7 Truth table for $G$

| $y_{1}$ | $x_{5}$ | $x_{6}$ | $f_{1}$ | $f_{2}$ | $f_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 |

Fig. 9.13 Index generator for Table 9.5


Theorem 9.9.1. An arbitrary two-valued $n$-variable $u$-output function with weight $k$ is realized as a multilevel network of $(p, q)$-elements. The number of elements needed is at most $\left\lceil\frac{n-q}{p-q}\right\rceil+\left\lceil\frac{u}{q}\right\rceil$, where $p>q$ and $q=\left\lceil\log _{2}(k+1)\right\rceil$.

Proof. An arbitrary logic function with weight $k$ can be realized as a cascade of an index generator and a decoder, where the index generator produces unique indices for $k$ input combinations, and the decoder converts each index into corresponding outputs. The number of inputs of the decoder is at $\operatorname{most}\left\lceil\log _{2}(k+1)\right\rceil$. By Theorem 9.8.1, the index generator can be realized with $s_{1}=\left\lceil\frac{n-q}{p-q}\right\rceil$ elements. Also, note that the decoder can be realized by $s_{2}$ modules of $q$-input $q$-output elements, where $s_{2}$ is given by $s_{2}=\left\lceil\frac{u}{q}\right\rceil$. Thus, total number of elements is $s_{1}+s_{2}=\left\lceil\frac{n-q}{p-q}\right\rceil+\left\lceil\frac{u}{q}\right\rceil$.

Corollary 9.9.1. An arbitrary two-valued n-variable single-output logic function with weight $k$ is realized as a multilevel network of $(p, q)$-elements. The number of elements needed is at most $\left\lceil\frac{n-q}{p-q}\right\rceil$, where $p>q$ and $q=\left\lceil\log _{2}(k+1)\right\rceil$.

Fig. 9.14 Index generator with weight $k \leq 32$


### 9.10 Remarks

Index generation functions are multiple-output logic functions useful for pattern matching and communication circuits. Logic synthesis for index generation functions is considered in Chaps. 10 and 11. This chapter is based on [124].

## Problems

9.1. Design an index generator where the number of inputs is $n=32$, and the weight is $k=63$. Use 8 -input 6 -output LUTs. Show both the cascade and minimum-delay realizations.
9.2. An index generation function can be directly implemented by a PLA. Discuss the advantage and disadvantage to implement an index generation function using memory instead of a PLA. Consider the case of $n=48$ and $k=400$.
9.3. When $k \leq 32$, an arbitrary index generation function of 12 variables with weight $k$ can be realized by the circuit structure shown in Fig. 9.14. Compare this realization with an LUT cascade realization with respect to the size and speed. Assume that we use 6-LUTs.
9.4. Design an index generator for $n=16$ and $k=3$ by using (4, 2)-elements.

## Chapter 10 <br> Hash-Based Synthesis

In Chap. 5, we considered cascade-based realizations of logic functions with small C-measure. When the weight $k$ of the function satisfies the relation $\left\lceil\log _{2}(k+1)\right\rceil<$ $K$, the function can be efficiently realized by a cascade of $K$-input cells. Also, as shown in Chap. 9, an index generation function can be implemented by a multilevel network of $(p, q)$-elements, or by a multilevel network with $K$-LUTs. However, when $\left\lceil\log _{2}(k+1)\right\rceil \geq K$, such methods are not directly applicable.

This chapter presents the hybrid method, the super hybrid method, and the parallel sieve method. These methods efficiently implement index generation functions using memories. They are particularly suitable for FPGA realizations, since most FPGAs have both LUTs and embedded memories inside. These methods use pairs of smaller memories to implement most of the registered vectors.

### 10.1 Hash Function

Hash functions are often used in software implementations. To show the idea, consider the following:

Example 10.1.1. Assume that one needs to find a name of an employee from his or her 10 -digit telephone number, in a company with 5,000 employees. A straightforward method to do this is to build a complete table of 10 -digit telephone numbers showing the names of the employees. However, this method is unrealistic, since the table has $10^{10}$ entries, most of which are empty. To avoid such a problem, a hash table can be used. Let $x$ be the telephone number, and consider the hash function ${ }^{1}$ :

$$
\operatorname{hash}(x)=x(\bmod 9973)
$$

In this case, the name of the employee can be found from the hash table with 9973 entries, since the value of $\operatorname{hash}(x)$ is between 0 and 9972. When two or more

[^1]Table 10.1 Registered vector table

| Index | Vector |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 | 1 | 1 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 1 | 1 | 1 | 0 | 1 | 1 |
| 7 | 0 | 1 | 0 | 1 | 1 | 1 |

Table 10.2 Example of an index generation function

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | f |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 1 | 0 | 1 | 0 | 3 |
| 0 | 0 | 1 | 1 | 1 | 0 | 4 |
| 0 | 0 | 0 | 0 | 0 | 1 | 5 |
| 1 | 1 | 1 | 0 | 1 | 1 | 6 |
| 0 | 1 | 0 | 1 | 1 | 1 | 7 |

different employees have the same hash value, a collision occurs. In such a case, the employees with the same hash value are represented by a linked list. Note that using a hash table, the number of digits for table look-up is reduced from 10 to 4 .

This chapter shows a hash method for hardware implementation. With this method, the number of variables can be reduced and the size of memories can be also reduced.

Besides index generation functions, this design method can implement an $n$-variable function where the number of nonzero outputs $k$ is much smaller than $2^{n}$.

Example 10.1.2. Table 10.1 shows a registered vector table consisting of 7 vectors. The corresponding index generation logic function shown in Table 10.2 produces a 3-bit number (e.g., 001) of the matched vector. When no entry matches the input vector, the function produces 000 .

### 10.2 Index Generation Unit

Figure 10.1 shows the Index Generation Unit (IGU). The programmable hash circuit has $n$ inputs and $p$ outputs. It is used to rearrange the nonzero elements. We consider two types of programmable hash circuits. The first type is the double-input hash circuit shown in Fig. 10.2. It performs a linear transformation $y_{i}=x_{i} \oplus x_{j}$ or $y_{i}=x_{i}$, where $i \neq j$. It uses a pair of multiplexers for each variable $y_{i}$. The upper multiplexers have the inputs $x_{1}, x_{2}, \ldots, x_{n}$. The register with $\left\lceil\log _{2} n\right\rceil$ bits specifies which variable to select by the multiplexer. The lower multiplexers have the inputs $x_{1}, x_{2}, \ldots, x_{n}$, except for $x_{i}$. For the $i$-th input, the constant input 0 is


Fig. 10.1 Index generation unit (IGU)

Fig. 10.2 Double-input hash circuit


Fig. 10.3 Single-input hash circuit

connected instead of $x_{i}$. By setting $y_{i}=x_{i} \oplus 0$, we can implement $y_{i}=x_{i}$. The second type of a programmable hash circuit is the single-input hash circuit shown in Fig. 10.3. It consists of only $p$ multiplexers, and selects $p$ variables from $n$ input variables. Note that both types of hash circuits produce only specific kinds of hash
functions. We have found that these functions are suitable for our application. The main memory has $p$ inputs and $\left\lceil\log _{2}(k+1)\right\rceil$ outputs. The main memory produces correct outputs only for registered vectors. However, it may produce incorrect outputs for nonregistered vectors, because the number of input variables is reduced. In an index generation function, if the input vector is nonregistered, then it should produce $00 \ldots 0$. To check whether the main memory produces the correct output or not, we use the AUX memory. The AUX memory has $\left\lceil\log _{2}(k+1)\right\rceil$ inputs and ( $n-p$ ) outputs: It stores the $X_{2}$ part of the registered vectors for each index. The comparator checks if the inputs are the same as the registered vector or not. If they are the same, the main memory produces a correct output. Otherwise, the main memory produces a wrong output, and the input vector is nonregistered. Thus, the output AND gates produce $00 \ldots 0$, showing that the input vector is nonregistered. Note that the main memory produces the correct outputs only for the registered vectors.

Example 10.2.1. Consider the registered vectors in Table 10.3. The number of variables is four, but only two variables $x_{1}$ and $x_{4}$ are necessary to distinguish these four registered vectors. Figure 10.4 shows the IGU. In this case, the programmable hash circuit produces $Y_{1}=\left(x_{1}, x_{4}\right)$ from $X=\left(x_{1}, x_{2}, x_{3}, x_{4}\right)$. The main memory stores the indices for $X_{1}=Y_{1}=\left(x_{1}, x_{4}\right)$, and the AUX memory stores the values of $X_{2}=\left(x_{2}, x_{3}\right)$ for the corresponding registered vector.

Table 10.3 Index
generation function

| Inputs |  |  |  | Index |  |
| :--- | :--- | :--- | :--- | :--- | :--- |
|  | $x_{1}$ | $x_{3}$ | $x_{4}$ | f |  |
| 0 | 0 | 1 | 0 |  | 1 |
| 0 | 1 | 1 | 1 |  | 2 |
| 1 | 1 | 0 | 0 |  | 3 |
| 1 | 1 | 1 | 1 | 4 |  |



Fig. 10.4 When the input vector is registered

## When the input vector is registered

Suppose that a registered vector $\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=(1,1,0,0)$ is applied to the IGU in Fig. 10.4. First, the programmable hash circuit selects two variables, $x_{1}$ and $x_{4}$, and produces the value $X_{1}=\left(x_{1}, x_{4}\right)=(1,0)$. Second, the main memory produces the corresponding index $(0,1,1)$. Third, the AUX memory produces the values of $X_{2}=\left(x_{2}, x_{3}\right)=(1,0)$ corresponding registered vector $(1,1,0,0)$. Fourth, the comparator confirms that the values of $X_{2}=\left(x_{2}, x_{3}\right)$ of the input vector is equal to the output of the AUX memory. And, finally, the AND gate produces the index for the input vector.

## When the input vector is not registered

Suppose that a nonregistered vector $\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=(1,0,1,0)$ is applied to the IGU in Fig. 10.5. In this case, the main memory also produces the index $(0,1,1)$, and the AUX memory produces the values of $X_{2}=\left(x_{2}, x_{3}\right)$ for the corresponding registered vector $(1,1,0,0)$. However, in this case, the comparator indicates that $X_{2}=\left(x_{2}, x_{3}\right)=(0,1)$ is different from the output $X_{2}=\left(x_{2}, x_{3}\right)$ of the AUX memory. Thus, the AND gate produces zero output, which shows that the input vector is not registered.

Unfortunately, not all index generation functions have the nice properties of Example 10.2.1. So, we decompose the given function into two:

1. A function that is implemented by an IGU.
2. The remaining part.


Fig. 10.5 When the input vector is not registered

Given an index generation function $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$, we decompose it into two disjoint subfunctions:

$$
f\left(X_{1}, X_{2}\right)=\hat{f}_{1}\left(Y_{1}, X_{2}\right) \vee f_{2}\left(X_{1}, X_{2}\right)
$$

where each column of the decomposition chart for $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$ has at most one nonzero element. In this case, $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$ can be implemented by an IGU, where the inputs to the main memory is $Y_{1}=\left(y_{1}, y_{2}, \ldots, y_{p}\right)$. Since $f_{2}\left(X_{1}, X_{2}\right)$ has fewer nonzero elements than the original function, it is simpler to implement.

Theorem 10.2.1. Consider the IGU in Fig. 10.1. Assume that $Y_{1}=\left(y_{1}, y_{2}, \ldots, y_{p}\right)$, where $y_{i}=x_{i} \oplus x_{j}$ for $j \in\{p+1, p+2, \ldots, n\}$, or $y_{i}=x_{i}$, are applied to the input to the main memory. If the main memory of an IGU implements the function $\tilde{g}\left(Y_{1}\right)$, where $\tilde{g}\left(Y_{1}\right)$ produces the nonzero value when the column $Y_{1}$ of the decomposition chart for $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$ has a nonzero value, and $\tilde{g}\left(Y_{1}\right)$ produces 0 otherwise, then only the values for $X_{2}$ must be stored in the AUX memory.
Proof. Consider the decomposition chart of the function $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$. By construction, each column of the decomposition chart has at most one nonzero element. When a registered vector is applied to the IGU, the main memory produces a nonzero output. In this case, the $X_{2}$ part of the input vector is equal to the output of the AUX memory, showing that the vector is registered. Thus, the IGU produces the correct nonzero output.

Assume that the input vector is not registered, but the output of the AUX memory is equal to the $X_{2}$ part of the input vector. We have two cases:

1. The main memory produces the zero-output.

In this case, even if the $X_{2}$ part of the input vector is equal to the output of the AUX memory, the output of the main memory is zero. Thus, the IGU produces the correct output.
2. The main memory produces a nonzero output.

Due to the construction of the IGU, the input vector is registered. However, this contradicts the assumption. So, such a case never happens.

### 10.3 Reduction by a Linear Transformation

As will be suggested by Conjecture 11.5.2, most incompletely specified index generation functions with weight $k$ can be represented by at most $p=2\left\lceil\log _{2}(k+\right.$ 1) $\rceil-3$ variables. However, there exist functions that require more variables. Example 10.3.1 shows such a function. In this case, we can often reduce the number of variables by a linear transformation of the input variables.

Example 10.3.1. Consider the incompletely specified index generation function shown in Table 10.4. Note that all the variables are essential in $f$. Now, replace the variables $x_{1}, x_{2}, x_{3}$, and $x_{4}$ with $y_{1}=x_{1} \oplus x_{4}, y_{2}=x_{2} \oplus x_{4}, y_{3}=x_{3}$,

Table 10.4 Original index generation function

| Inputs |  |  |  | Index |
| :--- | :--- | :--- | :--- | :--- |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | f |
| 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 2 |
| 0 | 0 | 1 | 0 | 3 |
| 0 | 0 | 0 | 1 | 4 |
| 0 | 0 | 0 | 0 | 5 |

Table 10.5 Transformed index generation function

Table 10.6 Registered vector table for a 6 -variable function

| Inputs |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | | Index |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- |
|  | $y_{2}$ | $x_{3}$ | $x_{4}$ | g |
| 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 2 |
| 0 | 0 | 1 | 0 | 3 |
| 1 | 1 | 0 | 1 | 4 |
| 0 | 0 | 0 | 0 | 5 |


| Vector |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | Index |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 | 2 |
| 0 | 0 | 1 | 0 | 0 | 0 | 3 |
| 0 | 0 | 0 | 1 | 0 | 0 | 4 |
| 0 | 0 | 0 | 0 | 1 | 0 | 5 |
| 0 | 0 | 0 | 0 | 0 | 1 | 6 |
| 0 | 0 | 0 | 0 | 0 | 0 | 7 |

and $y_{4}=x_{4}$, respectively. Then, $f$ can be represented as the index generation function $g\left(y_{1}, y_{2}, y_{3}, y_{4}\right)$ shown in Table 10.5 . Note that $g$ can be represented using only $y_{1}, y_{2}$, and $x_{3}$, since they can uniquely specify five different patterns. The programmable hash circuit in Fig. 10.3 performs this linear transformation.

Example 10.3.2. In the registered vector table in Table 10.6, the number of 0's is much larger than that of 1's.

1. A single-input hash circuit is used.

In this case, all the variables are necessary to represent the function, since any change of each variable from $(0,0,0,0,0,0)$ will change the value of the function. Thus, the main memory requires 6 variables.
2. A double-input hash circuit is used.

Consider the transform:

$$
\begin{aligned}
& y_{1}=x_{1} \oplus x_{5} \\
& y_{2}=x_{2} \oplus x_{5} \\
& y_{3}=x_{3} \oplus x_{6} \\
& y_{4}=x_{4} \oplus x_{6}
\end{aligned}
$$

Table 10.7 Registered vector table for IGU with double-input hash circuit

| Vector |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- |
| $y_{1}$ | $y_{2}$ | $y_{3}$ | $y_{4}$ | Index |
| 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 2 |
| 0 | 0 | 1 | 0 | 3 |
| 0 | 0 | 0 | 1 | 4 |
| 1 | 1 | 0 | 0 | 5 |
| 0 | 0 | 1 | 1 | 6 |
| 0 | 0 | 0 | 0 | 7 |

Table 10.8 Registered vector table for IGU with triple-input hash circuit

| Vector |  |  |  |
| :--- | :--- | :--- | :--- |
| $z_{1}$ | $z_{2}$ | $z_{3}$ | Index |
| 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 2 |
| 0 | 0 | 1 | 3 |
| 0 | 1 | 1 | 4 |
| 1 | 0 | 1 | 5 |
| 1 | 1 | 0 | 6 |
| 0 | 0 | 0 | 7 |

Table 10.7 shows the transformed function. In this case, all the patterns are different. This means that these four variables are sufficient to represent the function. In fact, this is a minimum solution when a double-input hash circuit is used.
3. A triple-input hash circuit is used.

Consider the transform:

$$
\begin{aligned}
& z_{1}=x_{1} \oplus x_{5} \oplus x_{6} \\
& z_{2}=x_{2} \oplus x_{4} \oplus x_{6} \\
& z_{3}=x_{3} \oplus x_{4} \oplus x_{5}
\end{aligned}
$$

Table 10.8 shows the transformed function. In this case, all the patterns are different. This means that three variables are sufficient to represent the function. In fact, this is a minimum solution when a hash circuit with any number of inputs is used.

We have developed a heuristic algorithm [135] to find a linear transformation that reduces the number of variables, when the double-input hash circuit is used. To find a linear transformation, we use the following:

Theorem 10.3.1. Let $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ be an incompletely specified index generation function. Let $Y_{1}=\left(y_{1}, y_{2}, \ldots, y_{p}\right)$, where $y_{i}=x_{i} \oplus x_{j}$ and $j \in\{p+1, p+2, \ldots, n\}$, and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. Consider the transformed function $g\left(Y_{1}, X_{2}\right)=f\left(X_{1}, X_{2}\right)$. Then, $f$ can be represented using only $Y_{1}$, if each column of the decomposition chart $\left(Y_{1}, X_{2}\right)$ has at most one specified element.

### 10.4 Hybrid Method

From here, we consider methods to implement an index generation function using memories. In the index generation function, the number of registered vectors $k$, is usually much smaller than $2^{n}$, the total number of the input combinations.

Definition 10.4.1. The hybrid method is an implementation of an index generation function using the circuit consisting of an IGU as shown in Fig. 10.6. An IGU is used to realize most of the registered vectors, while a rewritable PLA is used to realize remaining registered vectors. The OR gate in the output combines the indices to form a single output. The rewritable PLA can be replaced by another circuit, such as an LUT cascade or a CAM.

In the hybrid method, the main memory has $p=q+2$ inputs, and realizes $88 \%$ of the registered vectors, where $q=\left\lceil\log _{2}(k+1)\right\rceil$. The rest of the registered vectors are implemented by the rewritable PLA.

Example 10.4.1. For the circuit shown in Fig. 10.7, the values of only $X_{2}=$ $\left(x_{4}, x_{5}, x_{6}\right)$ are compared with the output of the AUX memory to check if the main memory produces the correct output.


Fig. 10.6 Index generator using hybrid method


Fig. 10.7 6-variable function implemented by a hybrid method

Table 10.9 Decomposition chart for $f\left(X_{1}, X_{2}\right)$

|  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{1}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 0 | 1 | 0 | 2 | 0 | 3 | 0 | 0 | 0 |  |
| 0 | 1 | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 |  |
| 1 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |  |
| 1 | 1 | 1 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 |  |
| $x_{6}$ | $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

Table 10.10 Decomposition
chart for $\hat{f}\left(Y_{1}, X_{2}\right)$
(transformed function)

|  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $y_{3}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $y_{2}$ |
|  |  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $y_{1}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 3 | 0 |  |
| 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |  |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 |  |
| $x_{6}$ | $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

Example 10.4.2. Consider the index generation function defined by Table 10.2. Table 10.9 is a decomposition chart of a 6 -variable function $f\left(X_{1}, X_{2}\right)$ with weight $k=7$. In this function, transform the variables $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$ into $Y_{1}=$ $\left(y_{1}, y_{2}, y_{3}\right)=\left(x_{1} \oplus x_{6}, x_{2} \oplus x_{5}, x_{3} \oplus x_{4}\right)$. The decomposition chart of the transformed function $\hat{f}\left(Y_{1}, X_{2}\right)$ is shown in Table 10.10. In the transformed function, the columns of the original truth tables are permuted. Also, each row has a different permutation. In the original table, three columns for $\left(x_{1}, x_{2}, x_{3}\right)=$ $(0,0,0),(0,1,0),(0,0,1)$ have two nonzero elements. On the other hand, in the decomposition chart in Table 10.10, for the transformed function $\hat{h}\left(Y_{1}, X_{2}\right)$, only one column $\left(y_{1}, y_{2}, y_{3}\right)=(0,1,0)$ has two nonzero elements. Let $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$ be the function where the nonzero element 4 is replaced by 0 . The decomposition chart is shown in Table 10.11. Table 10.12 shows the decomposition chart of the function $\hat{f}_{2}\left(Y_{1}, X_{2}\right)$ that is realized by the rewritable PLA. In this case, the function has only one nonzero element. $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$ is implemented by the main memory shown in Table 10.13 and the AUX memory shown in Table 10.14. The output of the main memory $\hat{f}\left(Y_{1}\right)$ shows the nonzero value of the function $\hat{f}_{1}$ for the column $Y_{1}=$ $\left(y_{1}, y_{2}, y_{3}\right)$. The AUX memory shown in Table 10.14 stores the corresponding values of $x_{4}, x_{5}$, and $x_{6}$ when $f\left(X_{1}, X_{2}\right)$ takes nonzero values. Figure 10.8 shows that the pair of the main memory and the AUX memory is sufficient to represent

Table 10.11 Decomposition chart for $\hat{f}_{1}\left(Y_{1}, X_{2}\right)$

|  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $y_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $y_{2}$ |
|  |  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $y_{1}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 3 | 0 |  |
| 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |  |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 |  |
| $x_{6}$ | $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

Table 10.12 Decomposition chart for $\hat{f}_{2}\left(Y_{1}, X_{2}\right)$

|  |  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $y_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $y_{2}$ |
|  |  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $y_{1}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| $x_{6}$ | $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

Table 10.13 Function $\hat{h}\left(Y_{1}\right)$ realized by the main memory

| $y_{3}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $y_{2}$ | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| $y_{1}$ | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $\hat{h}\left(Y_{1}\right)$ | 2 | 5 | 1 | 0 | 6 | 7 | 3 | 0 |

Table 10.14 Contents of the AUX memory

| $z_{3}$ | $z_{2}$ | $z_{1}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 |



Fig. 10.8 Index generator implemented using a sparse matrix technique
the information of Table 10.11. Table 10.13 is implemented by the main memory. The function that is implemented by the rewritable PLA has nonzero output 4. The corresponding input values are $\left(x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}\right)=(0,0,1,1,1,0)$. The nonzero output is 4 , and its binary representation is $(1,0,0)$. This is implemented by ORing the most significant bit of the AND gates. Figure 10.7 shows the whole circuit for function $f$. The AUX memory and comparator check if $\left(x_{4}, x_{5}, x_{6}\right)$ is the input that produces the nonzero output.

### 10.5 Registered Vectors Realized by Main Memory

In this part, we assume that the nonzero elements in the index generation function are uniformly distributed in the decomposition chart. In this case, we can estimate the fraction of registered vectors realized by the main memory.

Lemma 10.5.1. Let $f(X)$ be a uniformly distributed index function of $n$ variables with weight $k$. Consider a decomposition chart, and let $p$ be the number of bound variables. Then, the probability that a column of the decomposition chart has all zero elements is approximately $\mathrm{e}^{-\xi}$, where $\xi=\frac{\dot{k}}{2^{p}}$.

Proof. The probability that a function takes a specified value is $\alpha=\frac{k}{2^{n}}$. The probability that a function takes a zero value is $\beta=1-\alpha$. Since the decomposition chart has $2^{n-p}$ rows, the probability that a column of the chart has all zero elements is

$$
\beta^{2^{n-p}}=(1-\alpha)^{2^{n-p}} .
$$

Since $\alpha=\frac{k}{2^{n}}$ is sufficiently small, by Lemma 3.7.1, $\beta=1-\alpha$ can be approximated by $\mathrm{e}^{-\alpha}$. Thus, we have

$$
\beta^{2^{n-p}} \simeq \mathrm{e}^{-\alpha 2^{n-p}}=\mathrm{e}^{-\xi},
$$

where $\xi=\frac{k}{2^{p}}$,
Theorem 10.5.1. Consider a set of uniformly distributed index generation functions $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ with weight $k$. Consider an IGU whose inputs to the main memory are $x_{1}, x_{2}, \ldots$, and $x_{p}$. Then, the expected number of registered vectors of $f$ that can be realized by the $I G U$ is $2^{p}\left(1-\mathrm{e}^{-\xi}\right)$, where $\xi=\frac{k}{2^{p}}$.

Proof. Let $\left(X_{1}, X_{2}\right)$ be a partition of the input variables $X$, where $X_{1}=$ $\left(x_{1}, x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. Consider the decomposition chart for $f\left(X_{1}, X_{2}\right)$, where $X_{1}$ labels the column variables and $X_{2}$ labels the row variables. If a column has at least one care element, then the IGU can realize an element of the column. From Lemma 10.5.1, the probability that each column has at least one nonzero element is $1-\mathrm{e}^{-\xi}$, where $\xi=\frac{k}{2^{p}}$. Since there are $2^{p}$ columns, the expected number of registered vectors realized by the IGU is $2^{p}\left(1-\mathrm{e}^{-\xi}\right)$.

Example 10.5.1. Table 10.9 is the decomposition chart for a 6 -variable index generation function with weight $k=7$. Note that $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$ denotes the bound variables, and $X_{2}=\left(x_{4}, x_{5}, x_{6}\right)$ denotes the free variables. In this case, three columns $\left(x_{1}, x_{2}, x_{3}\right)=(0,0,1),(0,1,1),(1,0,1)$, and $(1,1,0)$ have all zero elements. In the other words, the fraction of columns that have all zero elements is $\frac{4}{8}=0.5$. In Lemma 10.5.1, we have $n=6, p=3$, and $\xi=\frac{k}{2^{p}}=0.875$. It shows that the probability that a column has all zero elements is $\mathrm{e}^{-\xi}=0.4169$. In Theorem 10.5.1, the expected number of vectors realized by the IGU is

$$
2^{p}\left(1-\mathrm{e}^{-\xi}\right)=8 \times 0.583=4.665
$$

In Table 10.9, four vectors for 1, 2, 3, 6 can be realized by an IGU. The remaining vectors should be realized by other parts of the circuit.

Corollary 10.5.1. Consider a set of uniformly distributed incompletely specified index generation functions $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ with weight $k$. Consider an IGU whose inputs to the main memory are $x_{1}, x_{2}, \ldots$, and $x_{p}$. Then, the fraction of registered vectors of $f$ that can be realized by the IGU is

$$
\delta=\frac{1-\mathrm{e}^{-\xi}}{\xi}
$$

where $\xi=\frac{k}{2^{p}}$.
For example, when $\frac{k}{2^{p}}=\frac{1}{4}$, we have $\delta \simeq 0.8848$, when $\frac{k}{2^{p}}=\frac{1}{2}$, we have $\delta \simeq 0.7869$, and when $\frac{k}{2^{p}}=1$, we have $\delta \simeq 0.63212$.
Example 10.5.2. Consider the case of $n=40$ and $k=1730$. Let us compare two realizations: LUT cascade and hash-based. Since $q=\left\lceil\log _{2}(k+1)\right\rceil=\left\lceil\log _{2}(1730+\right.$ $1)\rceil=11$, the number of bound variables is $p=13$.

## 1. Realization with an LUT cascade alone

Let $p=13$ be the number of inputs for cells. Then, from Lemma 5.1.5, the number of levels $s$ of the cascade is given by

$$
s=\left\lceil\frac{n-q}{p-q}\right\rceil=\left\lceil\frac{40-11}{13-11}\right\rceil=\left\lceil\frac{29}{2}\right\rceil=15 .
$$

For each cell, the size of the memory is $2^{p} \times q=2^{13} \times 11$ bits. Thus, the total amount of memory is $2^{13} \times 11 \times 15=1351680$ bits.

## 2. Realization with the hybrid method

From the Corollary 10.5.1, the fraction of registered vectors of $f$ that can be realized by the IGU is

$$
\delta \simeq \frac{1-\mathrm{e}^{-\xi}}{\xi}=0.9015
$$

The main memory has $p=13$ inputs and $q=11$ outputs. The AUX memory has $q=11$ inputs and $r=n-p=27$ outputs. The LUT cascade realizes the index generation function with weight $1730 \times(1-0.901) \simeq 171$. In this case, each cell in the cascade has $\left\lceil\log _{2}(171+1)\right\rceil=8$ outputs. Let the number of inputs of cells be 10 . Then, the number of levels in the LUT cascade is

$$
\left\lceil\frac{n-q}{p-q}\right\rceil=\left\lceil\frac{40-8}{10-8}\right\rceil=\left\lceil\frac{32}{2}\right\rceil=16
$$

Note that the size of a cell except for the last stage is $2^{10} \times 8$ bits. The size of the cell in the last stage is $2^{10} \times 11$ bits. Thus, the total amount of memory for the cascade is $2^{10} \times 8 \times 15+2^{10} \times 11=134,144$ bits. The size of the main memory is $2^{13} \times 11=90,112$ bits. The size of the AUX memory is $2^{11} \times 27=55,296$ bits. Thus, the total amount of memory is 279,552 bits, which is $20.7 \%$ of the total memory for the LUT cascade-only realization.

In this example, the hybrid method requires a smaller amount of memory than the LUT cascade alone.

### 10.6 Super Hybrid Method

In the hybrid method, about $88 \%$ of the registered vectors are implemented by an IGU, and the remaining $12 \%$ are implemented by the PLA. When we use two IGUs, about $96 \%$ of the registered vectors are implemented by IGUs and the remaining $4 \%$ are implemented by the PLA.

Definition 10.6.1. The super hybrid method is an implementation of an index generation function using a circuit consisting of two IGUs, as shown in Fig. 10.9.


Fig. 10.9 Index generator implemented by super hybrid method
$I G U_{1}$ is used to realize most of the registered vectors, $I G U_{2}$ is used to realize the registered vectors not realized by $I G U_{1}$, and the rewritable PLA is used to realize registered vectors not realized by either $I G U$. The OR gate in the output combines the indices to form a single output. The rewritable PLA can be replaced by another circuit, such as an LUT cascade, a CAM or an IGU.

The super hybrid method shown in Fig. 10.9 is more complicated than the hybrid method, but requires smaller memory overall. In this method, the main memory in $I G U_{1}$ has $p=q+1$ inputs, and realizes $79 \%$ of the registered vectors. The main memory in $I G U_{2}$ has $p=q$ inputs, and realizes $16.6 \%$ of registered vectors. The rest of the registered vectors are implemented by the rewritable PLA.

Hybrid Method. In a typical hybrid method, the main memory has $p=q+2$ inputs and $q=\left\lceil\log _{2}(k+1)\right\rceil$ outputs, while the AUX memory has $q$ inputs and $n-q-2$ outputs. Therefore, the total amount of memory is

$$
M_{1}=q \cdot 2^{q+2}+(n-q-2) \cdot 2^{q}=(4 n+12 q-8) \cdot 2^{q-2}
$$

Super Hybrid Method. In a typical super hybrid method, in $I G U_{1}$, the main memory has $p_{1}=q+1$ inputs and $q$ outputs, while the AUX memory has $q$ inputs and $n-q-1$ outputs. Also, in $I G U_{2}$, the main memory has $p_{2}=q-1$ inputs and $q-2$ outputs, while the AUX memory has $q-2$ inputs and $n-q+1$ outputs. Therefore, the total amount of memory is
$M_{2}=q \cdot 2^{q+1}+(n-q-1) \cdot 2^{q}+(q-2) \cdot 2^{q-1}+(n-q+1) \cdot 2^{q-2}=(5 n+5 q-7) \cdot 2^{q-2}$.
This implies that, when $n \leq 7 \log _{2}(k+1)-1$, the super hybrid method requires a smaller amount of memory.

Theorem 10.6.1. 1. In a typical hybrid method, about $88 \%$ of the registered vectors can be realized by an IGU.
2. In a typical super hybrid method, about $96 \%$ of the registered vectors can be realized by two IGUs.

## Proof. Hybrid Method

In this case, the number of inputs to the main memory is $p=\left\lceil\log _{2}(k+1)\right\rceil+2$. Thus, $2^{p} \geq 4(k+1)$ and $\xi=\frac{k}{2^{p}} \leq \frac{k}{4(k+1)} \simeq \frac{1}{4}$. From Corollary 10.5.1, the fraction of registered vectors realized by the IGU is

$$
\delta=\frac{1-\mathrm{e}^{-\xi}}{\xi} .
$$

When $\xi=\frac{1}{4}$, we have $\delta=4\left(1-\mathrm{e}^{-0.25}\right)=0.8848$.

## Super Hybrid Method

In this case, $p_{1}=\left\lceil\log _{2}(k+1)\right\rceil+1$. Thus, $2^{p_{1}} \geq 2(k+1)$ and $\xi_{1} \leq \frac{k}{2(k+1)} \simeq \frac{1}{2}$. When $\xi_{1}=\frac{1}{2}$, we have $\delta_{1}=4\left(1-\mathrm{e}^{-0.5}\right)=0.7869$. Thus, the fraction of remaining vectors is 0.213 .

Note that $p_{2}=\left\lceil\log _{2}(k+1)\right\rceil-1$. Thus, $2^{p_{2}} \geq \frac{1}{2}(k+1)$, and $\xi_{2} \leq \frac{k_{2}}{2^{p_{2}}} \simeq$ $\frac{0.213 k}{0.5(k+1)} \simeq 0.426$. When, $\xi_{2}=0.426$, we have $\delta_{2}=\frac{1-\mathrm{e}^{-0.426}}{0.426}=0.8142$.

Thus, the total number of registered vectors realized by $I G U_{1}$ and $I G U_{2}$ is

$$
0.7896 \mathrm{k}+0.8142 \times 0.2134 \mathrm{k}=0.9603 \mathrm{k}
$$

Thus, we have the theorem.
Example 10.6.1. Consider the index generation function with $n=40$ and $k=$ 1730. In this case, $q_{1}=\left\lceil\log _{2}(k+1)\right\rceil=\left\lceil\log _{2}(1730+1)\right\rceil=11$.

## Rewritable PLA

The number of vectors realized by the rewritable PLA is 1730 .

## Hybrid Method

The main memory has $p=q_{1}+2=13$ inputs and $q_{1}=11$ outputs. The AUX memory has $q_{1}=11$ inputs and $r=n-p=27$ outputs. Since, $\xi=\frac{1730}{2^{13}}$, from Corollary 10.5.1, the fraction of registered vectors of $f$ that can be realized by the IGU is

$$
\delta \simeq \frac{1-\mathrm{e}^{-\xi}}{\xi}=0.901
$$

Thus, the number of vectors realized by the $I G U$ is $1730 \times 0.901=1599$, and the number of remaining vectors to be realized by the rewritable PLA is 171 .

The size of the main memory is $2^{13} \times 11=90,112$ bits. The size of the AUX memory is $2^{11} \times 27=55,296$ bits. Thus, the total amount of memory is 145,408 bits.

## Super Hybrid Method

The first main memory has $p_{1}=q_{1}+1=12$ inputs and $q_{1}=11$ outputs. The first AUX memory has $q_{1}=11$ inputs and $r_{1}=n-p_{1}=27$ outputs.

Since, $\xi_{1}=\frac{1730}{2^{12}}=0.422$, from Corollary 10.5.1, the fraction of registered vectors of $f$ that can be realized by $I G U_{1}$ is

$$
\delta_{1} \simeq \frac{1-\mathrm{e}^{-\xi_{1}}}{\xi_{1}}=0.8156
$$

Thus, the number of vectors realized by $I G U_{1}$ is $1730 \times 0.8156=1411$, and the number of the remaining vectors is $1730-1411=319$.

The second main memory has $p_{2}=q_{1}-1=10$ inputs and $q_{2}=9$ outputs. The second AUX memory has $q_{2}=9$ inputs and $r_{2}=n-p_{2}=30$ outputs. Since $\xi_{2}=\frac{319}{2^{p_{2}}}=0.3115$, the fraction of registered vectors of $f$ that can be realized by $I G U_{2}$ is

$$
\delta_{2} \simeq \frac{1-\mathrm{e}^{-\xi_{2}}}{\xi_{2}}=0.859223 .
$$

Thus, the number of vectors realized by $I G U 2$ is $319 \times 0.859223=274$, and the number of the remaining vectors is 45 .

The size of the first main memory is $2^{12} \times 11=45,056$ bits. The size of the first AUX memory is $2^{11} \times 28=57,344$ bits. The size of the second main memory is $2^{10} \times 9=9,216$ bits. The size of the second AUX memory is $2^{9} \times 30=15,360$ bits. Thus, the total amount of memory is 126,976 bits. The number of vectors realized by the rewritable PLA is 45 .

Thus, for this problem, the super hybrid method requires a smaller amount of memory than the hybrid method.

A problem with the super hybrid method is that the second main memory has only $q-2$ outputs. Thus, the indices of the registered vectors in the second main memory should be smaller than or equal to $2^{q-2}-1$. The first main memory stores registered vectors whose indices are greater than $2^{q-2}$.

### 10.7 Parallel Sieve Method

The hybrid method uses only one IGU, while the super hybrid method uses two IGUs. By increasing the number of IGU's, we have the parallel sieve method. The parallel sieve method is especially useful when the number of the registered vectors is very large [85] (Fig. 10.11).


Fig. 10.10 Index generator unit (IGU)


Fig. 10.11 Index generator implemented by parallel sieve method

Definition 10.7.1. The parallel sieve method is an implementation of an index generation function using the circuit consisting of multiple IGUs as shown in Fig. 10.11. $I G U_{i+1}$ is used to realize a part of the registered vectors not realized by $I G U_{1}, I G U_{2}, \ldots$, or $I G U_{i}$. The OR gate in the output combines the indices to form a single output. In the standard parallel sieve method, the number of inputs to the main memory is selected as

$$
p_{i}=\left\lceil\log _{2}\left(k_{i}+1\right)\right\rceil .
$$

Example 10.7.1. By using the standard parallel sieve method, realize an index generation function with $n=40$ and $k_{1}=10,000$. Note that $q_{1}=\left\lceil\log _{2}\left(k_{1}+1\right)\right\rceil=14$. Consider Fig. 10.11.

1. In $I G U_{1}$, the number of inputs for the main memory is $p_{1}=q_{1}=14$.

By Theorem 10.5.1, the number of the vectors realized by $I G U_{1}$ is $2^{p_{1}}(1-$ $\mathrm{e}^{-\xi_{1}}$ ) where $\xi_{1}=\frac{k_{1}}{2^{p 1}}$, which is $16384 \times(1-0.5432)=7484$. The number of remaining vectors is $k_{2}=k_{1}-7484=2516$.
2. In $I G U_{2}$, since $q_{2}=\left\lceil\log _{2}(2516+1)\right\rceil=12$, the number of the inputs for the main memory is $p_{2}=q_{2}=12$. The number of the vectors realized by $I G U_{2}$ is $2^{p_{2}}\left(1-\mathrm{e}^{-\xi_{2}}\right)$ where $\xi_{2}=\frac{k_{2}}{2^{p^{2}}}$, which is $4096 \times 0.4589=1879$. The number of remaining vectors is $k_{3}=k_{2}-1879=637$.
3. In $I G U_{3}$, since $q_{3}=\left\lceil\log _{2}(637+1)\right\rceil=10$, the number of inputs for the main memory is $p_{3}=q_{3}=10$. The number of vectors realized by $I G U_{3}$ is $2^{p_{3}}\left(1-\mathrm{e}^{-\xi_{3}}\right)$ where $\xi_{3}=\frac{k_{3}}{2^{p^{3}}}$, which is $1024 \times 0.46317=474$. The number of remaining vectors is $k_{4}=k_{3}-474=163$.
4. In $I G U_{4}$, since $q_{4}=\left\lceil\log _{2}(163+1)\right\rceil=8$, the number of inputs for the main memory is $p_{4}=q_{4}=8$. The number of vectors realized by $I G U_{4}$ is $2^{p_{4}}(1-$ $\mathrm{e}^{-\xi_{4}}$ ) where $\xi_{4}=\frac{k_{4}}{2^{p 4}}$, which is $256 \times 0.46317=120$. The number of remaining vectors is $k_{4}=k_{3}-120=43$.
5. In $I G U_{5}$, since $q_{5}=\left\lceil\log _{2}(43+1)\right\rceil=6$, the number of inputs for the main memory is $p_{5}=q_{5}=6$. The number of vectors realized by $I G U_{5}$ is $2^{p_{5}}(1-$ $\mathrm{e}^{-\xi_{5}}$ ) where $\xi_{5}=\frac{k_{5}}{2 p 5}$, which is $64 \times 0.48925=31$. The number of remaining vectors is $k_{5}=k_{4}-31=12$.
6. In $I G U_{6}$, since the number of the remaining vectors is only $k_{5}=12$, they can be implemented by an IGU [132], or rewritable PLA or an LUT cascade.

Note that, for each $I G U_{i}$, the main memory has $p_{i}$ inputs and $p_{i}$ outputs, while the AUX memory has $p_{i}$ inputs and $\left(n-p_{i}\right)$ outputs. Thus, the total amount of memory for $I G U_{i}$ is

$$
p_{i} 2^{p_{i}}+\left(n-p_{i}\right) 2^{p_{i}}=n 2^{p^{i}} .
$$

The amount of memory for each $I G U_{i}$ is:

$$
\begin{aligned}
& I G U_{1}: 40 \times 2^{14}=640 \times 2^{10} \\
& I G U_{2}: 40 \times 2^{12}=160 \times 2^{10} . \\
& I G U_{3}: 40 \times 2^{10} . \\
& I G U_{4}: 40 \times 2^{8}=10 \times 2^{10} . \\
& I G U_{5}: 40 \times 2^{6}=2.5 \times 2^{10} .
\end{aligned}
$$

The total amount of memory for the standard parallel sieve method is

$$
\sum_{i=1}^{5} n 2^{n_{i}}=640+160+40+10+2.5=852.5
$$

Kibits.

### 10.8 Experimental Results

### 10.8.1 List of English Words

To demonstrate the usefulness of the design method, first we realized lists of frequently used English words by the hybrid method shown in Fig. 10.6 and the super hybrid method shown in Fig. 10.9. Here, we use three kinds of English word lists: List 1, List 2, and List 3. The numbers of letters in the word lists are at most 13, but we only consider the first 8 letters. For the English words consisting of fewer than 8 letters, we append blanks to the end of words to make them 8 -letter words. Each English alphabet letter is represented by 5 bits. Thus, each English word is represented by 40 bits. The numbers of words in the lists are $1,730,3,366$, and 4,705 , respectively. Within each word list, each English word has a unique index, an integer from 1 to $k$, where $k=1,730$ or 3,360 or 4,705 . The numbers of bits for the indices are 11,12 , and 13 , respectively.

The number of inputs for the main memory is $\left\lceil\log _{2}(k+1)\right\rceil+2$. List 1 consists of $k=1730$ words. The number of bits for the index is $q=\left\lceil\log _{2}(1+k)\right\rceil=$ $\left\lceil\log _{2}(1+1730)\right\rceil=11$. The number of bound variables is $p=q+2=13$. The number of columns in the decomposition chart is $2^{p}=2^{13}=8,192$. The number of columns that has only one nonzero element is 1,389 . The number of columns that has two or more nonzero elements is 165 . The number of registered vectors that are not realized by the main memory is 176 . In other words, about $90 \%$ of the registered vectors are realized by the main memory, and the remaining $10 \%$ of the registered vectors are realized by the rewritable PLA. Table 10.15 shows the design results for three English word lists by the hybrid method shown in Fig. 10.6.

Table 10.15 compares the amount of hardware for the hybrid method, and the super hybrid method. In the super hybrid method, the number of vectors realized by the rewritable PLA is smaller than $4 \%$ of the registered vectors. This is because we optimized the hash functions.

Table 10.15 Realization of English word lists by hybrid method

|  | List 1 | List 2 | List 3 |
| :--- | ---: | ---: | ---: |
| \# of words: $k$ | 1,730 | 3,366 | 4,705 |
| \# of inputs: $n$ | 40 | 40 | 40 |
| \# of outputs: $q$ | 11 | 12 | 13 |
| \# of inputs for the main $p$ <br> memory : | 13 | 14 | 15 |
| \# of columns with only one <br> nonzero element | 1389 | 2752 | 3980 |
| \# of columns with two or <br> more nonzero elements | 165 | 293 | 351 |
| \# of registered vectors not <br> realized by main memory | 176 | 321 | 374 |

### 10.8.2 Randomly Generated Functions

Next, we generated index generation functions with the same sizes by pseudorandom numbers. We did the similar experiments for List 2 and List 3. The experimental results using randomly generated functions and English word lists are close to the theoretical results obtained in Sect. 10.5. This shows that the hash function generated by the hash network effectively scatters the nonzero elements in the decomposition charts.

### 10.8.3 IP Address Table

To verify the effectiveness of the method, we also used IP addresses of computers that accessed our web site in a certain period. List 1 contains 1,730 addresses, List 2 contains 3,366 addresses, and List 3 contains 4,588 addresses. The number of inputs are all 32 , but the number of outputs for Lists $1-3$ are 11,12 , and 13 , respectively. Also, in this case, results produced by the real address tables, the data obtained from the random address tables, and the data obtained by analytical results in Sect. 10.5 were similar (Table 10.16). (Experimental results are omitted.)

Table 10.16 Amount of hardware for English word lists

| Size of Lists |  | List 1 | List 2 | List 3 |
| :---: | :---: | :---: | :---: | :---: |
| \# of inputs | $n$ | 40 | 40 | 40 |
| \# of outputs | $q$ | 11 | 12 | 13 |
| \# of vectors | $k$ | 1,730 | 3,366 | 4,705 |
| Hybrid Method |  | List 1 | List 2 | List 3 |
| \# of inputs for main memory | $p$ | 13 | 14 | 15 |
| Size of main memory | $q 2^{p}$ | 90,112 | 196,608 | 425,984 |
| Size of AUX memory | $r 2^{q}$ | 55,296 | 106,496 | 204,800 |
| Total amount of memory |  | 145,408 | 303,104 | 630,784 |
| \# of remaining vectors |  | 176 | 321 | 374 |
| Super Hybrid Method |  | List 1 | List 2 | List 3 |
| \# of inputs for main memory 1 | $p_{1}$ | 12 | 13 | 14 |
| \# of inputs for main memory 2 | $p_{2}$ | 10 | 11 | 12 |
| Size of main memory 1 | $q_{1} 2^{p_{1}}$ | 45,056 | 98,304 | 212,992 |
| Size of AUX memory 1 | $r_{1} 2^{q_{1}}$ | 57,344 | 110,592 | 212,992 |
| Size of main memory 2 | $q_{2} 2^{p_{2}}$ | 9,216 | 20,480 | 40,960 |
| Size of AUX memory 2 | $r_{2} 2^{q_{2}}$ | 15,360 | 2,969 | 28,672 |
| Total amount of memory |  | 126,976 | 232,345 | 495,616 |
| \# of remaining vectors |  | 30 | 61 | 42 |

### 10.9 Remarks

This chapter presented the hybrid method, the super hybrid method, and the parallel sieve method to realize index generation functions. In these methods, an index generation function $f$ is decomposed into nonoverlapping index generation functions, and each function is realized by an IGU or a rewritable PLA. In this chapter, a rewritable PLA is used to realize the remaining vectors. However, other methods can also be used: a CAM, an LUT cascade, or an IGU implemented by the method shown in Chap. 11. This chapter is based on [126, 128, 136].

## Problems

10.1. In Example 10.1.1, suppose that, in a company with 5,000 employees, each person has a unique employee number between 1 and 5,000 inclusive. Suppose that $\operatorname{hash}(x)=x(\bmod 9973)$ is used to find the employee number from his or her 10digit telephone number. Calculate the expected number of collisions in the hash table. Do the same calculation when the number of the employees is 2,000, instead of 5,000. Assume that the hash function produces a uniform distribution.
10.2. Let $f(x)$ be the index generation function, where $f(x)=i$ when $x$ is the $i$-th prime number, and $f(x)=0$ otherwise. Let $\pi(x)$ be the prime-counting function that gives the number of primes less than or equal to x , for any integer number x . For example, $\pi(8)=4$ because there are four prime numbers ( $2,3,5$, and 7 ) less than or equal to 8 . It is known that $\pi(100,000)=9,592$.

Design the circuit of $f(x)$ that works for $x \leq 100,000$, by the standard parallel sieve method. Estimate the size of the circuit, and compare it with the singlememory realization.
10.3. Design an 8 -digit ternary-to-binary converter [119]. Use the binary-codedternary code to represent a ternary digit. That is, 0 is represented by ( 00 ); 1 is represented by ( 01 ); and 2 is represented by (10). (11) is an unused code. Let $\vec{y}=\left(y_{m-1}, y_{m-2}, \ldots, y_{0}\right)$ be the outputs of the converter, where $y_{i} \in\{0,1\}$. Then, in general, $y_{i}$ depends on all the inputs $x_{i}(i=0,1, \ldots, n-1)$. When this converter is implemented by a two-valued logic circuit, unused combinations occur. So, we have an incompletely specified function. For example, the truth table of the 2 -digit ternary to 4 -bit binary converter is shown in Table 10.17. In the case of binary-coded-ternary representation, (11) is an undefined input, and the corresponding output is a don't care. In Table 10.17, the binary-coded-ternary representation is denoted by $\vec{w}=\left(w_{3}, w_{2}, w_{1}, w_{0}\right)$, the ternary representation is denoted by $\vec{x}=\left(x_{1}, x_{0}\right)$, and the binary representation is denoted by $\vec{y}=\left(y_{3}, y_{2}, y_{1}, y_{0}\right)$. Design a converter for an 8 -digit ternary number to a 13 -digit binary number by the standard parallel sieve method. Compare the memory size with that of the singlememory realization.

Table 10.17 Truth table for a ternary-to-binary converter

| Binary-coded Ternary |  |  |  | Ternary |  | Binary |  |  |  | Decimal |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $w_{3}$ | $w_{2}$ | $w_{1}$ | $w_{0}$ | $x_{1}$ | $x_{0}$ | $y_{3}$ | $y_{2}$ | $y_{1}$ | $y_{0}$ |  |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 2 |
| 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 3 |
| 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 4 |
| 0 | 1 | 1 | 0 | 1 | 2 | 0 | 1 | 0 | 1 | 5 |
| 1 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 1 | 0 | 6 |
| 1 | 0 | 0 | 1 | 2 | 1 | 0 | 1 | 1 | 1 | 7 |
| 1 | 0 | 1 | 0 | 2 | 2 | 1 | 0 | 0 | 0 | 8 |

Table 10.18 1-out-of-15 to binary converter

| 1-out-of-15 code |  |  |  |  |  |  |  |  |  |  |  |  |  |  | Index |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $x_{15}$ | $x_{14}$ | $x_{13}$ | $x_{12}$ | $x_{11}$ | $x_{10}$ | $x_{9}$ | $x_{8}$ | $x_{7}$ | $x_{6}$ | $x_{5}$ | $x_{4}$ | $x_{3}$ | $x_{2}$ | $x_{1}$ |  |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 4 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 6 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 |

10.4. A linear transformation of the input variables, in general, changes a function into an another function. However, it does not change the weight of the function. Prove this.
10.5. In the proof of Lemma $10.5 .1, \beta^{2^{n-p}}$ is approximated by $\mathrm{e}^{-\xi}$, where $\xi=\frac{k}{2^{p}}$, $\beta=1-\alpha$, and $\alpha=\frac{k}{2^{n}}$. When $k=2^{p}$, compute the approximation error: $E R R O R=\mathrm{e}^{-\xi}-\beta^{2^{n-p}}$. Make a table similar to Table 3.1 for $\alpha=$ $2^{-1}, 2^{-2}, 2^{-3}, \ldots, 2^{-14}$, and $2^{-15}$.
10.6. Consider the 15 -variable incompletely specified index generation function $f(X)$ shown in Table 10.18. Show that at least 14 variables are necessary to represent the function. Next, consider the linear transformation:

$$
\begin{aligned}
& y_{1}=x_{1} \oplus x_{3} \oplus x_{5} \oplus x_{7} \oplus x_{9} \oplus x_{11} \oplus x_{13} \oplus x_{15} \\
& y_{2}=x_{2} \oplus x_{3} \oplus x_{6} \oplus x_{7} \oplus x_{10} \oplus x_{11} \oplus x_{14} \oplus x_{15} \\
& y_{3}=x_{4} \oplus x_{5} \oplus x_{6} \oplus x_{7} \oplus x_{12} \oplus x_{13} \oplus x_{14} \oplus x_{15} \\
& y_{4}=x_{8} \oplus x_{9} \oplus x_{10} \oplus x_{11} \oplus x_{12} \oplus x_{13} \oplus x_{14} \oplus x_{15}
\end{aligned}
$$

Show that $f$ can be represented with $y_{1}, y_{2}, y_{3}$, and $y_{4}$.

## Chapter 11 <br> Reduction of the Number of Variables

This chapter considers a method to reduce the number of variables needed to represent incompletely specified logic functions. When the number of specified minterms is $k$, most functions can be represented with at most $2\left\lceil\log _{2}(k+1)\right\rceil-2$ variables.

### 11.1 Optimization for Incompletely Specified Functions

For completely specified logic functions, logic minimization is a process of reducing the number of products to represent the given function. However, for incompletely specified functions (i.e., functions with don't cares), at least two problems exist [46]. The first is to reduce the number of the products to represent the function, and the second is to reduce the number of variables. The first problem is useful for sum-of-products expression (SOP)-based realizations [13], while the second problem is useful for memory-based realizations, since reducing the number of variables reduces the memory size.

Example 11.1.1. Consider the 4-variable function shown in Fig. 11.1, where the blank cells denote don't cares. The SOP with the minimum number of products is $\mathcal{F}_{1}=x_{1} x_{4} \vee x_{2} \bar{x}_{3}$, while the SOP with the minimum number of variables is $\mathcal{F}_{2}=x_{1} x_{2} \vee x_{1} x_{4} \vee x_{2} x_{4}$. Note that $\mathcal{F}_{1}$ shown in Fig. 11.2 has two products and depends on four variables, while $\mathcal{F}_{2}$ shown in Fig. 11.3 has three products and depends on only 3 variables. $x_{3}$ is a nonessential variable, since $\mathcal{F}_{2}$ does not include it.

As shown in this example, the expression corresponding to the minimal number of products is different from the expression corresponding to the minimal number of variables. This chapter considers the minimization of the number of variables in incompletely specified functions. Indeed, it is shown that many variables can be eliminated when the fraction of don't cares is large.

Fig. 11.1 4-variable incompletely specified logic function


Fig. 11.2 Expression with the fewest products


Fig. 11.3 Expression with the fewest variables


### 11.2 Definitions and Basic Properties

Definition 11.2.1. An incompletely specified logic function $f$ is a mapping $D \rightarrow B$, where $D \subset B^{n}, B=\{0,1\}$.

Definition 11.2.2. An incompletely specified logic function is represented by a pair of characteristic functions $F_{0}$ and $F_{1}$, where $F_{0}(\vec{a})=1$ iff $f(\vec{a})=0$, and $F_{1}(\vec{a})=1$ iff $f(\vec{a})=1$. Note that $F_{0} F_{1}=0$. If $\vec{a} \in D$, then the value of $f(\vec{a})$ is specified, and is called care value. Otherwise, the value of $f(\vec{a})$ is unspecified, and is called don't care.

Table 11.1 Function for
Fig. 11.1

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $f$ |
| :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 | 1 |

Example 11.2.1. Consider the function in Fig. 11.1. In this case $n=4$. Table 11.1 also shows this function. The characteristic functions are

$$
\begin{aligned}
& F_{0}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3} x_{4} \vee \bar{x}_{1} x_{2} x_{3} \bar{x}_{4} \vee x_{1} \bar{x}_{2} \bar{x}_{3} \bar{x}_{4} \text { and } \\
& F_{1}=\bar{x}_{1} x_{2} \bar{x}_{3} x_{4} \vee x_{1} \bar{x}_{2} \bar{x}_{3} x_{4} \vee x_{1} x_{2} \bar{x}_{3} \bar{x}_{4} .
\end{aligned}
$$

In this case, the function is specified for only 6 out of 16 possible minterms.
Definition 11.2.3. $f$ depends on $x_{i}$ if there exists a pair of vectors

$$
\begin{aligned}
\vec{a} & =\left(a_{1}, a_{2}, \ldots, a_{i}, \ldots, a_{n}\right) \text { and } \\
\vec{b} & =\left(a_{1}, a_{2}, \ldots, b_{i}, \ldots, a_{n}\right)
\end{aligned}
$$

such that both $f(\vec{a})$ and $f(\vec{b})$ are specified, and $f(\vec{a}) \neq f(\vec{b})$.
If $f$ depends on $x_{i}$, then $x_{i}$ is essential in $f$ and $x_{i}$ must appear in every expression for $f$.

Definition 11.2.4. Two functions $f$ and $g$ are compatible when the following condition holds: For any $\vec{a} \in B^{n}$, if both $f(\vec{a})$ and $g(\vec{a})$ are specified, then $f(\vec{a})=g(\vec{a})$.

Lemma 11.2.1. Let $f_{0}=f\left(\mid x_{i}=0\right)$ and $f_{1}=f\left(\mid x_{i}=1\right)$. Then, $x_{i}$ is nonessential in $f$ iff $f_{0}$ and $f_{1}$ are compatible.

If $x_{i}$ is nonessential in $f$, then $f$ can be represented by an expression without $x_{i}$.
Example 11.2.2. Consider the function $f$ in Fig. 11.4. It is easy to verify that all the variables are nonessential. Note that $f$ can be represented as $\mathcal{F}_{1}=\bar{x}_{2} \vee x_{3}$ or $\mathcal{F}_{2}=x_{1} \oplus \bar{x}_{4}$.

Essential variables must appear in every expression for $f$, while nonessential variables may appear in some expressions and not in others. Algorithms to represent a given function by using the minimum number of variables have been considered [14, 32, 38, 46, 67].

Fig. 11.4 4-variable function without essential variables


### 11.3 Algorithm to Minimize the Number of Variables

This section describes an algorithm to represent an incompletely specified index generation function $f: D \rightarrow\{1,2, \ldots, k\}$, where $D \subset B^{n}$, using the minimum number of variables. To show the idea of the method, we use the following:
Example 11.3.1. Let us minimize the number of variables to represent the index generation function shown in Fig. 11.5.

1. Let the four vectors be $\vec{a}_{1}=(1,0,0,1), \vec{a}_{2}=(1,1,1,1), \vec{a}_{3}=(0,1,0,1)$, and $\vec{a}_{4}=(1,1,0,0)$.
2. To distinguish $\vec{a}_{1}$ and $\vec{a}_{2}$, either $x_{2}$ or $x_{3}$ is necessary. Thus, we have the condition $x_{2} \vee x_{3}=1$, where $x_{1}=1$ denotes that $x_{1}$ must appear in the expression. Thus, $x_{2} \vee x_{3}=1$ denotes either $x_{2}$ or $x_{3}$ must appear in the expression. In the same way, to distinguish $\vec{a}_{1}$ and $\vec{a}_{3}$, we have the condition $x_{1} \vee x_{2}=1$; to distinguish $\vec{a}_{1}$ and $\vec{a}_{4}$, we have the condition $x_{2} \vee x_{4}=1$; to distinguish $\vec{a}_{2}$ and $\vec{a}_{3}$, we have the condition $x_{1} \vee x_{3}=1$; to distinguish $\vec{a}_{2}$ and $\vec{a}_{4}$, we have the condition $x_{3} \vee x_{4}=1$; and to distinguish $\vec{a}_{3}$ and $\vec{a}_{4}$, we have the condition $x_{1} \vee x_{4}=1$.
3. To distinguish all the vectors, all the conditions must hold at the same time. This is expressed by the condition $R=1$, where

$$
R=\left(x_{2} \vee x_{3}\right)\left(x_{1} \vee x_{2}\right)\left(x_{2} \vee x_{4}\right)\left(x_{1} \vee x_{3}\right)\left(x_{3} \vee x_{4}\right)\left(x_{1} \vee x_{4}\right) .
$$

4. By the distributive law, and the absorption law, we have

$$
R=x_{1} x_{2} x_{4} \vee x_{1} x_{2} x_{3} \vee x_{2} x_{3} x_{4} \vee x_{1} x_{3} x_{4} .
$$

5. Since every product has three literals, each corresponds to a minimum solution. Thus, $f$ can be represented by 3 variables. Since no variable appears in all product terms, no variable is essential.

In principle, the above method produces the minimum number of variables to represent an incompletely specified index generation function. However, the straightforward application is quite inefficient. Also, we have an efficient minimization algorithm for SOPs, but do not have one for product-of-sums expressions. Thus, instead of obtaining $R$ directly, first we obtain $\bar{R}$, the complement of $R$, and perform simplification, and then convert $\bar{R}$ into the SOP for $R$ as follows:

Fig. 11.5 Index generation function with 4 variables


Fig. 11.6 7-segment display
Algorithm 11.3.1. (Algebraic Method)

1. Let $A$ be the set of vectors $\vec{a}_{i}$, such that $f\left(\vec{a}_{i}\right)=i$, where $i=1,2, \ldots, k$.
2. For each pair of vectors $\vec{a}_{i}=\left(a_{1}, a_{2}, \ldots, a_{n}\right) \in A$ and $\vec{b}_{j}=\left(b_{1}, b_{2}, \ldots, b_{n}\right) \in$ $A$, associate a product defined by $s(i, j)=\bigwedge_{r=1}^{n} y_{r}$, where $y_{r}=1$ if $a_{r}=b_{r}$ and $y_{r}=\bar{x}_{r}$ if $a_{r} \neq b_{r}$, where $r=1,2, \ldots, n$. Note that there are $k(k-1) / 2$ pairs.
3. Define a covering function $\bar{R}=\bigvee_{i<j} s(i, j)$.
4. Represent $\bar{R}$ by the a minimum $S O P$.
5. Represent $R$, the complement of $\bar{R}$ by a minimum $S O P$.
6. The product with the fewest literals corresponds to the minimum solution.

In Algorithm 11.3.1, Steps 4, 5, and 6 compute a minimum covering. Since $\bar{R}$ has only complemented literals, we generate only products with complemented literals. Applying absorption law yields the minimum SOP for $\bar{R}$.

By first detecting the essential variables, we can reduce the computational effort to derive the covering function. The next example illustrates this.

Example 11.3.2. The 7-segment display shown in Fig. 11.6 displays a decimal number by using 7 segments: $\mathrm{a}, \mathrm{b}, \mathrm{c}, \mathrm{d}, \mathrm{e}, \mathrm{f}$, and g .

Table 11.2 shows the correspondence between segment data and the binary number. Consider a logic circuit that converts 7 -segment data into the corresponding

Table 11.2 7-segment to BCD converter

| 7-segment |  |  |  |  |  |  | BCD code |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $a$ | $b$ | c | $d$ | $e$ | $f$ | $g$ | 8 | 4 | 2 | 1 |
| 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |  |

Binary Coded Decimal (BCD) representation of a digit. The straightforward circuit requires 7 inputs. However, only 5 inputs are necessary to distinguish the decimal numbers. This means that only 5 segments are needed to distinguish between the 10 digits.

1. Let the vectors be

$$
\begin{aligned}
& \vec{a}_{1}=(0,1,1,0,0,0,0), \vec{a}_{2}=(1,1,0,1,1,0,1), \vec{a}_{3}=(1,1,1,1,0,0,1), \\
& \vec{a}_{4}=(0,1,1,0,0,1,1), \vec{a}_{5}=(1,0,1,1,0,1,1), \vec{a}_{6}=(1,0,1,1,1,1,1), \\
& \vec{a}_{7}=(1,1,1,0,0,0,0), \vec{a}_{8}=(1,1,1,1,1,1,1), \vec{a}_{9}=(1,1,1,1,0,1,1), \\
& \text { and } \\
& \vec{a}_{10}=(1,1,1,1,1,1,0) .
\end{aligned}
$$

2. First, find the essential variables.

From $\vec{a}_{6}$ and $\vec{a}_{8}$, we can see that $b$ is essential. From $\vec{a}_{8}$ and $\vec{a}_{9}$, we can see that $e$ is essential. From $\vec{a}_{3}$ and $\vec{a}_{9}$, we can see that $f$ is essential. From $\vec{a}_{8}$ and $\vec{a}_{10}$, we can see that $g$ is essential.
3. Next, we derive $\bar{R}$. Since $b, e, f$, and $g$ are essential, we can ignore the pairs, where the essential variables are inconsistent. For example, from the pair $\left(\vec{a}_{1}, \vec{a}_{2}\right)$, we have the product $\bar{a} \bar{c} \bar{d} \bar{e} \bar{g}$. Note that, in this case, two vectors are inconsistent with the essential variable $g$. Since the essential variable $g$ is always included in the solution, we know that $g=1$. Thus, we need not generate it. From $\vec{a}_{4}$ and $\vec{a}_{9}$, we have $\bar{a} \bar{d}$. Thus, $\bar{R}=\bar{b} \vee \bar{e} \vee \bar{g} \vee \bar{f} \vee \bar{a} \bar{d}$.
4. By using De Morgan's law, and the distributive law, we have $R=\operatorname{befg}(a \vee d)=$ abef $g \vee b d e f g$.
5. Since each product has five literals, each corresponds to a minimum solution. Thus, the binary numbers can be represented by 5 variables.

Thus, we can eliminate either segments $c$ and $d$, or $a$ and $c$, and still determine which digit is being represented.

### 11.4 Analysis for Single-Output Logic Functions

This section derives the number of variables to represent single-output incompletely specified logic functions. In the analysis that follows, we consider a set of functions (e.g., all incompletely specified functions) restricted by conditions (e.g. the number of care values is $2 u$ ).

Definition 11.4.1. A set of functions is uniformly distributed, if the probability of occurrence of any function is the same as any other function.

For example, the set of 4-variable incompletely specified functions with 1 care value consists of 32 members, 16 having a single 1 and 16 having a single 0 . If the functions are uniformly distributed, the probability of the occurrence of any one of them is $\frac{1}{32}$.
Theorem 11.4.1. Consider a set of uniformly distributed incompletely specified functions, where $u$ combinations are mapped to 0 , $u$ combinations mapped to 1 , and the other $2^{n}-2 u$ combinations are mapped to don't cares. Let $\eta$ be the probability that $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ can be represented by using only $x_{1}, x_{2}, \ldots, x_{p-1}$, and $x_{p}$, where $p<n$. Then, $\eta \geq(1-\tilde{\alpha})^{u}$, where $\tilde{\alpha}=\frac{u}{2^{p}}$.

Proof. Let $f\left(X_{1}, X_{2}\right)$ be an incompletely specified function, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. Consider the decomposition chart of $f\left(X_{1}, X_{2}\right)$, where $X_{1}$ labels the columns, and $X_{2}$ labels the rows. If no column has both 0 and 1 , a completely specified function can be formed by setting all column entries to the same value, yielding a function independent of $X_{2}$. From here, we obtain the probability $\eta$.
Assume that $u 0$ 's are already distributed to the decomposition chart. Thus, at most $u$ columns have 0 's. Next, we distribute $u 1$ 's to the decomposition chart. The probability of distributing a single 1 to a column not containing 0 's is at least $\frac{2^{p}-u}{2^{p}}=1-\tilde{\alpha}$. Thus, the probability of distributing $u 1$ 's to the columns without 0 's is larger than or equal to $(1-\tilde{\alpha})^{u}$. Hence, we have the relation:

$$
\eta \geq(1-\tilde{\alpha})^{u} .
$$

Theorem 11.4.1 considers the probability for one partition: $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. However, in practice, we can select a minimum set of variables to represent the function. The following theorem considers such a case:

Theorem 11.4.2. Consider a set of uniformly distributed incompletely specified functions, where u combinations are mapped to 0 , $u$ combinations mapped to 1 , and the other $2^{n}-2 u$ combinations are mapped to don't cares. Then, the probability that $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ can be represented by using only $p$ variables is at least

$$
1-\sigma\binom{n}{p},
$$

where $\sigma=1-\eta$, and $\eta$ is the probability that a function can be represented by using only $x_{1}, x_{2}, \ldots, x_{p-1}$, and $x_{p}$.
Proof. The probability that a function cannot be represented by using $x_{1}, x_{2}, \ldots$, $x_{p-1}$, and $x_{p}$ is $\sigma=1-\eta$. Since there are $\binom{n}{p}$ ways to choose $p$ variables out of $n$ variables, the probability that a function cannot be represented by using any combinations of $p$ variables is $\sigma\binom{n}{p}$. The probability that a function can be presented by using at least one combination of $p$ variables is

$$
1-\sigma^{\binom{n}{p}} .
$$

From Theorem 11.4.2, we have the following:
Conjecture 11.4.1. Consider a set of uniformly distributed functions of $n$ variables, where $u$ combinations are mapped to $0, u$ combinations are mapped to 1 , and the other $2^{n}-2 u$ combinations are mapped to don't cares. If

$$
p \geq 2 \log _{2} u-2,
$$

then more than $95 \%$ of the functions can be represented with $p$ variables.
(Explanation supporting the Conjecture) Since $\sigma=1-\eta<1.0,1-\sigma\binom{n}{p}$ approaches 1.0 , as $n$ increases. When $p<n,\binom{n}{p} \geq n(n-1) / 2$. Assume that $n \geq 20$. The condition that $\sigma\binom{n}{p} \leq 0.05$ is $\sigma<0.984$. Thus, if $\eta \geq 0.0156$, then at least $95 \%$ of the functions can be realized with $p$ products. When $\tilde{\alpha}$ is sufficiently small, $1-\tilde{\alpha}$ is approximated by $\mathrm{e}^{-\tilde{\alpha}}$. Thus,

$$
\eta \geq(1-\tilde{\alpha})^{u} \simeq \mathrm{e}^{-\tilde{\alpha} u}=\mathrm{e}^{-\frac{u^{2}}{2 D}}
$$

When $p \geq 2 \log _{2} u-2$, we have $\eta>\mathrm{e}^{-4}=0.0183$. (End of explanation) From experimental results in Sect. 11.6, we have the following:
Conjecture 11.4.2. Consider a set of uniformly distributed functions of $n$ variables, where $u$ combinations are mapped to $0, u$ combinations are mapped to 1 , and the other $2^{n}-2 u$ combinations are mapped to don't cares. Then, the fraction of the functions represented with $p=2\left\lceil\log _{2} u\right\rceil-2$ variables approaches 1.0 , as $n$ increases.

### 11.5 Extension to Multiple-Output Functions

In practical applications, many functions have multiple outputs, and the outputs values are different for different inputs. So, we now consider such a class of functions. First, we consider the class of index generation functions, which are special case of multiple-output functions. Then, we extend the theory to general multipleoutput functions.

### 11.5.1 Number of Variables to Represent Index Generation Functions

This section derives the number of variables to represent an incompletely specified index generation function with $k$ registered vectors. The basic idea is as follows: a function $f\left(X_{1}, X_{2}\right)$ is represented by a decomposition chart, where $X_{1}$ labels the columns and $X_{2}$ labels the rows. If each column has at most one care element, then the function can be represented by using only variables in $X_{1}$. The next example illustrates this.

Example 11.5.1. Table 11.3 shows a registered vector table consisting of 6 vectors. When no entry matches the input vector, the function produces 0 . Consider the decomposition chart shown in Table 11.4. In Table 11.4, $x_{1}, x_{2}$, and $x_{3}$ specify the columns, and $x_{4}$ and $x_{5}$ specify the rows, and blank elements denote don't cares. Note that, in Table 11.4, each column has at most one care element. Thus, the function can be represented by only the column variables: $x_{1}, x_{2}$, and $x_{3}$.

From here, we obtain the probability of such a condition by a statistical analysis.
Theorem 11.5.1. Consider a set of uniformly distributed incompletely specified index generation functions $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ with weight $k$, where $2 \leq k<2^{n-2}$. Let $\eta(k)$ be the probability that $f$ can be represented with $x_{1}, x_{2}, \ldots$, and $x_{p}$, where $p<n$. Then,

$$
\begin{equation*}
\eta(k) \simeq \exp \left(-\frac{k^{2}}{2^{p+1}}\right) \tag{11.1}
\end{equation*}
$$

Proof. Let $\left(X_{1}, X_{2}\right)$ be a partition of the input variables $X$, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. Consider the decomposition chart for

Table 11.3 Registered vector table

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $f$ |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 | 2 |
| 0 | 1 | 1 | 1 | 0 | 3 |
| 1 | 0 | 0 | 1 | 1 | 4 |
| 1 | 0 | 0 | 1 | 1 | 5 |
| 1 | 1 | 1 | 1 | 0 | 6 |

Table 11.4 Decomposition chart for $f\left(X_{1}, X_{2}\right)$

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{3}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{1}$ |
| 0 | 0 |  | 1 |  |  |  |  |  |  |  |
| 0 | 1 |  |  | 2 |  |  | 5 |  |  |  |
| 1 | 0 |  |  |  | 3 |  |  | 6 |  |  |
| 1 | 1 |  |  |  |  | 4 |  |  |  |  |
| $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

$f\left(X_{1}, X_{2}\right)$, where $X_{1}$ labels the column variables and $X_{2}$ labels the row variables. If each column has at most one care element, then $f$ can be represented by using only $X_{1}$. Assume that $k$ care elements are distributed in the decomposition chart. Then, the probability that each column has at most one care element is

$$
\begin{aligned}
\eta(k) & =\frac{2^{p}}{2^{p}} \cdot \frac{2^{p}-1}{2^{p}} \cdot \frac{2^{p}-2}{2^{p}} \cdots \cdots \frac{2^{p}-(k-1)}{2^{p}} \\
& =1 \cdot\left(1-\frac{1}{2^{p}}\right) \cdot\left(1-\frac{2}{2^{p}}\right) \cdots \cdots\left(1-\frac{k-1}{2^{p}}\right) \\
& =\prod_{i=0}^{k-1}\left(1-\frac{i}{2^{p}}\right) .
\end{aligned}
$$

That is, in such a distribution, ' 1 ' can be placed in any column, ' 2 ' can be placed in any column except that for ' 1 ', etc.

Next, $\eta(k)$ can be approximated as follows:

$$
\begin{aligned}
\eta(k) & \simeq \prod_{i=0}^{k-1} \exp \left(-\frac{i}{2^{p}}\right)=\exp \left(-\sum_{i=1}^{k-1} \frac{i}{2^{p}}\right) \\
& =\exp \left(-\frac{k(k-1)}{2 \times 2^{p}}\right) \simeq \exp \left(-\frac{k^{2}}{2^{p+1}}\right)
\end{aligned}
$$

The above theorem shows the case when the input variables are removed without considering the property of the function. In practice, we can remove the maximum number of nonessential variables by an optimization program.

Theorem 11.5.2. Consider a set of uniformly distributed incompletely specified index generation functions $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ with weight $k$, where $2 \leq k<2^{n-2}$. The probability that $f$ can be represented with $p<n$ variables is greater than

$$
1-\sigma\binom{n}{p},
$$

where $\sigma=1-\eta(k)$, and $\eta(k)$ is the probability that $f$ can be represented with $x_{1}, x_{2}, \ldots, x_{p-1}$, and $x_{p}$.

The proof of Theorem 11.5.2 is similar to that of Theorem 11.4.2. From Theorem 11.5.2, we have the following:

Conjecture 11.5.1. Consider a set of uniformly distributed incompletely specified index generation functions with weight $k$. If $p \geq 2\left\lceil\log _{2}(k+1)\right\rceil-3$, then more than $95 \%$ of the functions can be represented with $p$ variables.

From experimental results, we have the following:
Conjecture 11.5.2. Consider a set of uniformly distributed incompletely specified index generation functions with weight $k$. Then, the fraction of the functions represented with $p=2\left\lceil\log _{2}(k+1)\right\rceil-3$ variables approaches 1.0 , as $n$ increases.

Note that there exist functions that require more than $p=2\left\lceil\log _{2}(k+1)\right\rceil-3$ variables, as shown below. However, the fraction of such functions approaches 0.0, as $n$ increases.

Example 11.5.2. Consider the $n$-variable incompletely specified index generation function $f$ with weight $k=n+1$ :

$$
\begin{aligned}
f(1,0,0, \ldots, 0,0) & =1 \\
f(0,1,0, \ldots, 0,0) & =2 \\
f(0,0,1, \ldots, 0,0) & =3 \\
\vdots & \vdots \\
f(0,0,0, \ldots, 1,0) & =n-1 \\
f(0,0,0, \ldots, 0,1) & =n \\
f(0,0,0, \ldots, 0,0) & =n+1 \\
f\left(a_{1}, a_{2}, a_{3}, \ldots, a_{n-1}, a_{n}\right) & =d \quad \text { (for other combinations). }
\end{aligned}
$$

In this function, all the variables are essential, and no variable can be removed.
Theorem 11.5.3. To represent an incompletely specified index generation function with weight $k$, at least $\left\lceil\log _{2}(k+1)\right\rceil$ variables are necessary.

Proof. To distinguish $k+1$ outputs, at least $\left\lceil\log _{2}(k+1)\right\rceil$ variables are necessary. Note that one output is used to show that there is no matched vector. Nonzero outputs denote registered vectors, while zero outputs denote nonregistered vectors.

### 11.5.2 Number of Variables to Represent General Multiple-Output Functions

Theorem 11.5.4. Let $F$ be an arbitrary $n$ variable $m$ output function, and let $D$ be a set of $k$ randomly selected vectors in $B^{n}$. Let $\hat{F}$ be an incompletely specified function defined on only $D$. The probability that $\hat{F}$ can be represented with $x_{1}, x_{2}, \ldots$, and $x_{p}$, for $p<n$, is $\eta(k)$, where $\eta(k)$ is defined in (11.1).

Proof. For each vector in $D$, assign a unique index in $\{1,2, \ldots, k\}$. From $D$, we can define an incompletely specified index generation function: $D \rightarrow\{1,2, \ldots, k\}$. Next, for each vector in $D$, obtain the output value of $F$, and make a truth table showing the function $\{1,2, \ldots, k\} \rightarrow B^{m}$. Note that this function can be
implemented by memory with $\left\lceil\log _{2}(k+1)\right\rceil$ inputs. Thus, the incompletely specified function $\hat{F}$ can be realized as the cascade connection of the index generation circuit and memory.

By Theorem 11.5.1, the index generation function can be represented with at most $p$ variables. Thus, the function $\hat{F}$ can be also represented with at most $p$ variables.

Similarly to Conjecture 11.5.2, we have the following:
Conjecture 11.5.3. When $k$ minterms are selected randomly, the fraction of multiple-output functions with only $k$ minterms that can be represented by at most $p=2\left\lceil\log _{2}(k+1)\right\rceil-3$ variables approaches 1.0 , as $n$ increases.

### 11.6 Experimental Results

### 11.6.1 Random Single-Output Functions

For different $n$, we randomly generated 1,000 functions, where $u$ combinations are mapped to $0, u$ combinations are mapped to 1 , and the other $2^{n}-2 u$ combinations are mapped to don't cares. We minimized the number of variables by an exact optimization algorithm, which is similar to Algorithm 11.3.1 shown in Sect. 11.3.

Table 11.5 shows the average numbers of variables to represent the single-output functions, where the set of variables are selected by the optimization algorithm. For example, 16 -variable functions where 15 minterms are mapped to zeros, 15 minterms are mapped to ones, and the other minterms are mapped to don't cares, require, on the average, only 5.157 variables to represent the functions.

Table 11.5 shows that the necessary number of variables to represent the functions mainly depends on $u$. The last column of the table shows the number of variables to represent incompletely specified functions by Conjecture 11.4.2. For

Table 11.5 Average numbers of variables to represent single-output logic functions with $u$ 1's and $u 0$ 's

| $u$ | $n=16$ | $n=20$ | $n=24$ | $2\left\lceil\log _{2}(u+1)\right\rceil-2$ |
| :--- | ---: | ---: | ---: | :--- |
| 7 | 3.334 | 3.145 | 3.017 | 4 |
| 15 | 5.157 | 4.981 | 4.940 | 6 |
| 31 | 7.126 | 6.980 | 6.003 | 8 |
| 63 | 9.179 | 8.972 | 8.861 | 10 |
| 127 | 11.362 | 10.971 | 10.776 | 12 |
| 255 | 13.754 | 12.990 | 12.725 | 14 |
| 511 | 15.739 | 15.098 | 14.805 | 16 |
| 1023 | 16.000 | 17.508 | 16.918 | 18 |
| 2047 | 16.000 | 19.705 | 18.996 | 20 |
| 4095 | 16.000 | 20.000 | 21.394 | 22 |
| 8191 | 16.000 | 20.000 | 23.630 | 24 |

Table 11.6 Average number of variables to represent incompletely specified index generation function

| $k$ | $n=16$ | $n=20$ | $n=24$ | $2\left\lceil\log _{2}(k+1)\right\rceil-3$ |
| :--- | ---: | ---: | ---: | :--- |
| 7 | 3.052 | 3.018 | 3.003 | 3 |
| 15 | 4.980 | 4.947 | 4.878 | 5 |
| 31 | 6.447 | 6.115 | 6.003 | 7 |
| 63 | 8.257 | 8.007 | 8.000 | 9 |
| 127 | 10.304 | 10.000 | 9.963 | 11 |
| 255 | 12.589 | 11.996 | 11.896 | 13 |
| 511 | 14.890 | 14.019 | 13.787 | 15 |
| 1023 | 15.991 | 16.293 | 15.874 | 17 |
| 2047 | 16.000 | 18.758 | 17.965 | 19 |
| 4095 | 16.000 | 19.992 | 20.093 | 21 |

example, when $u=15$, to represent a uniformly distributed function, Conjecture 11.4.2 shows that 6 variables are sufficient. On the other hand, experimental results show that only 4,5 , or, 6 variables are necessary to represent the functions. We note that the variance is very small.

### 11.6.2 Random Index Generation Functions

We generated uniformly distributed index generation functions. Table 11.6 shows the average numbers of variables to represent $n$-variables index generation functions with $k$ registered vectors. For the other $2^{n}-k$ combinations, the outputs are set to don't cares. The values are the average of 1,000 randomly generated functions. Table 11.6 shows that the necessary number of variables to represent the functions strongly depends on $k$.

The last column of Table 11.6 shows the number of variables to represent incompletely specified index generation functions with weight $k$ given by Conjecture 11.5.2. For example, when $k=31$, to represent a uniformly distributed function, Conjecture 11.5 .2 shows that 9 variables are sufficient. On the other hand, experimental results show that only 6 or 7 variables are necessary to represent the functions. Again, the variance is very small.

### 11.6.3 IP Address Table

To verify the effectiveness of the method in a practical application, we used distinct IP addresses of computers that accessed our web site over a period of a month. We considered four lists of different sizes: List 1, List 2, List 3, and List 4. Table 11.7 shows the results. The first row shows the number of registered vectors: $k$. The second row shows the number of inputs: $n$. The third row shows the number of outputs: $q=\left\lceil\log _{2}(k+1)\right\rceil$. The fourth row shows the number of variables sufficient to represent the functions given by Conjecture 11.5.2, i.e., $2\left\lceil\log _{2}(k+1)\right\rceil-3$. The fifth

Table 11.7 Realization of IP address tables

|  | List 1 | List 2 | List 3 | List 4 |
| :--- | :---: | :---: | :---: | :---: |
| \# of vectors: $k$ | 1,670 | 3,288 | 4,591 | 7,903 |
| \# of inputs: $n$ | 32 | 32 | 32 | 32 |
| \# of outputs: $q$ | 11 | 12 | 13 | 13 |
| $2\left\lceil\log _{2}(k+1)\right\rceil-3$ | 19 | 21 | 23 | 23 |
| \# of variables using Single-input hash: $n_{s}$ | 18 | 20 | 21 | 23 |
| \# of variables using Double-input hash: $n_{d}$ | 17 | 19 | 20 | 21 |
| Single-memory realization $\left(\times 10^{10}\right.$ bits $)$ | 4.72 | 5.15 | 5.58 | 5.58 |
| Realization using Single-input hash $\left(\times 10^{6}\right.$ bits $)$ | 2.95 | 12.7 | 27.5 | 109.3 |
| Realization using Double-input hash $\left(\times 10^{6}\right.$ bits $)$ | 1.51 | 6.3 | 13.9 | 27.5 |

row shows the number of variables to represent the function, where the number of variables was minimized by Algorithm 11.3.1. In this case, selected input variables are connected to the main memory through the single-input hash circuit shown Fig. 10.3. The sixth row shows the number of variables to represent the function, where a linear transformation is used to reduce the number of variables. In this case, the double-input hash circuit shown in Fig. 10.2 is used. The seventh row shows the number of bits to represent the function by a single memory: $q 2^{n}$. The eighth row shows the total number of bits to represent the function by using the single-input hash circuit shown in Fig. 10.3: $q 2^{n_{s}}+n 2^{q}$, where the first term denotes the size of the main memory, while the second term denotes the size of the AUX memory. The last row shows the total number of bits needed to represent the function by using the double-input hash circuit shown in Fig. 10.2: $q 2^{n_{d}}+n 2^{q}$. As shown in Table 11.7, the total amount of memory can be drastically reduced.

### 11.6.4 Benchmark Multiple-Output Functions

We reduced the number of variables for selected PLA benchmark functions [159]. Table 11.8 shows the numbers of variables to represent benchmark functions ( $b c 0$, chkn, in2, in7, intb, and vg2) for different values of care minterms $k$. In the table, $n$ denotes the number of original input variables, $q$ denotes the number of outputs, and $W$ denotes the number of products in the PLA. The rightmost column shows the upper bound derived by Conjecture 11.5.3: $2\left\lceil\log _{2}(k+1)\right\rceil-3$. Out of $2^{n}$ combinations, we randomly selected $k$ different combinations as care minterms, and set other $2^{n}-k$ minterms to don't cares. Then, we minimized the number of variables. From the table, we observe that the number of variables strongly depends on $k$, but is virtually independent of $n, q, W$, and the function name. Again, for these benchmark functions, the upper bounds on the number of products given by the Conjecture 11.5.3 are valid.

Table 11.8 Number of variables needed to represent incompletely specified multiple-output PLA benchmark functions

| $k$ | $b c 0$ | $c h k n$ | in2 | in7 | intb | $v g 2$ | Conj. 11.5.3 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---: |
|  | $n=26$ | $n=29$ | $n=19$ | $n=27$ | $n=15$ | $n=25$ |  |
|  | $q=11$ | $q=7$ | $q=7$ | $q=10$ | $q=7$ | $q=8$ | Upper |
|  | $W=179$ | $W=142$ | $W=135$ | $W=55$ | $W=631$ | $W=110$ | bound |
| 15 | 4 | 4 | 4 | 4 | 5 | 4 | 5 |
| 31 | 6 | 5 | 6 | 6 | 7 | 6 | 7 |
| 63 | 8 | 7 | 7 | 8 | 8 | 7 | 9 |
| 127 | 9 | 9 | 8 | 8 | 10 | 9 | 11 |
| 255 | 11 | 10 | 9 | 10 | 12 | 11 | 13 |
| 511 | 12 | 12 | 10 | 12 | 14 | 13 | 15 |
| 1023 | 14 | 14 | 13 | 13 | 15 | 14 | 17 |

### 11.7 Remarks

For incompletely specified index generation functions, reduction of the number of variables is quite effective. Combined with the hash method presented in the previous chapter, this method drastically reduces the amount of memory to implement the function. An extension to multi-valued input functions is considered in [137]. This chapter is based on $[111,131,132,137]$.

## Problems

11.1. Minimize the number of variables for the incompletely specified logic function whose characteristic functions are:

$$
\begin{gathered}
F_{0}=\bar{x}_{5}\left\{\bar{x}_{1} x_{2}\left(x_{3} \vee \bar{x}_{4}\right) \vee x_{1} x_{2}\left(x_{3} \oplus x_{4}\right)\right\} \vee x_{5}\left\{x_{2} \bar{x}_{3} x_{4} \vee \bar{x}_{1} \bar{x}_{3} \bar{x}_{4} \vee \bar{x}_{1} x_{2} \bar{x}_{3}\right\} . \\
F_{1}=\bar{x}_{5}\left(\bar{x}_{1} \bar{x}_{2} x_{3} \vee x_{1} \bar{x}_{3} \bar{x}_{4} \vee x_{1} \bar{x}_{2} x_{4}\right) \vee x_{5}\left(x_{1} \bar{x}_{2} \bar{x}_{3} \vee x_{1} \bar{x}_{2} \bar{x}_{4} \vee x_{1} x_{2} x_{3} x_{4} \vee \bar{x}_{1} \bar{x}_{2} x_{3} x_{4}\right) .
\end{gathered}
$$

11.2. Consider the incompletely specified function of 8 variables in Table 11.9. Minimize the number of variables.
11.3. Consider a binary matrix of 8 columns and 7 rows, where 0 's and 1 's are distributed uniformly.

1. Calculate the probability that all the rows are distinct.
2. Remove the first column. Calculate the probability that all the rows are distinct.
11.4. Let $n=10$ and $k=31$. Consider a decomposition chart of an index generation function $f\left(X_{1}, X_{2}\right)$ with weight $k$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{9}\right)$ and $X_{2}=\left(x_{10}\right)$. Calculate the probability that each column has at most one nonzero element. Also, calculate the probability, among the 10 partitions, where $X_{1}=X-x_{i}$,

Table 11.9 Incompletely specified function of 8 variables

|  | $a$ | $b$ | $c$ | $d$ | $e$ | $f$ | $g$ | $h$ | $F$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $v_{1}$ | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| $v_{2}$ | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| $v_{3}$ | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| $v_{4}$ | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| $v_{5}$ | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| $v_{6}$ | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |

Table 11.10 4-Valued input index generation function

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | $x_{7}$ | $x_{8}$ | $f$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| A | A | G | A | G | C | T | A | 1 |
| A | A | G | C | A | C | G | C | 2 |
| G | A | A | G | A | T | C | A | 3 |
| C | T | G | G | A | G | G | G | 4 |
| T | A | G | G | G | A | T | A | 5 |
| T | A | T | G | C | C | A | G | 6 |
| T | G | A | C | C | G | C | G | 7 |

$X_{2}=\left(x_{i}\right)$, and $X=\left(x_{1}, x_{2}, \ldots, x_{10}\right)$ for $(i=1,2, \ldots, 10)$, there exist at least one partition, in which each column has at most one nonzero element. It is suggested that you use a computer or a calculator to obtain this value.
11.5. Consider a set of uniformly distributed incompletely specified index generation functions of $n$ variables with weight $k$. Derive the probability that all the variables are essential, for $n=2 r$ and $k=2^{r}$.
11.6. Show a 4 -variable incompletely specified index generation function satisfying the following conditions:

1. Four combinations are mapped to 1 .
2. Four combinations are mapped to 0 .
3. All other 8 combinations are mapped to don't cares.
4. All the variables are essential.
11.7. The four bases found in deoxyribonucleic acid (DNA) are adenine (abbreviated A), cytosine (C), guanine (G), and thymine (T). Consider the DNA patterns shown in Table 11.10. Find the minimum set of variables to distinguish these patterns.
11.8. Consider a set of uniformly distributed, incompletely specified index generation functions $f: D \rightarrow I$, where $B=\{0,1\}, D \subset B^{n}$ and $I=\{1, \ldots, k\}$. Then, the probability that $f$ can be represented with only $x_{1}, x_{2}, \ldots, x_{p-1}$ and $x_{p}$, where $p<n$ is $\delta_{n-p}=\gamma_{n-p}^{M}$, where $\gamma_{n-p}=\beta^{2^{n-p}}+2^{n-p} \alpha \beta^{2^{n-p}-1}, \alpha=\frac{k}{2^{n}}$, $\beta=1-\alpha$, and $M=2^{p}$. Prove this.
11.9. Consider a set of uniformly distritbuted index generation functions $f$ : $B^{n} \rightarrow I$, where $B=\{0,1\}$ and $I=\{0,1, \ldots, k\}$. Let $P R$ be the probability that $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ can be represented by using only $p$ variables. Then

$$
P R=1-\left(1-\delta_{n-p}\right){ }^{\binom{n}{p}},
$$

where $\delta_{n-p}$ is the probability that $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ can be represented by using only $x_{1}, x_{2}, \ldots, x_{p-1}$, and $x_{p}$. Compute the numerical values of $P R$ when $n=20$ and $k=2047$, for $p=17,18$, and 19 . You can use the results of the previous problem.
11.10. Consider a group of 64 people. Obtain the probability that the birthdays of all the people are distinct. Assume that the probabilities of the birth are the same for all 365 days in a year. Calculate the expected number of distinct birthdays in the group of 64 people.
11.11. Assume that the probabilities of birth are the same for all 365 days in a year. Calculate the expected number of distinct birthdays in a room with 365 people.

## Chapter 12 Various Realizations

This chapter shows various realizations of index generation functions.

### 12.1 Realization Using Registers, Gates, and An Encoder

An index generator can be directly implemented using a Programmable Logic Array (PLA) or a Content Addressable Memory (CAM). An index generator can also be implemented using ordinary logic elements. Figure 12.1 [98] shows an index generator for an LPM implemented by registers, gates, and a priority encoder. A register pair (Reg. 1 and Reg. 0) is used to store each digit of a ternary vector. For example, if the digit is * (don't care), the register pair stores $(1,1)$. Thus, for $n$ bit data, we need a $2 n$-bit register. The comparison circuit consists of an $n$-input AND gate and $n$ comparison circuits, each of which produces a 1 if and only if the input bit matches the stored bit or the stored bit is don't care (* or 11).

For each prefix vector of an $n$-input LPM index generator, we need a $2 n$-bit register, $n$ comparison circuits, and an $n$-input AND gate. For an $n$-input index generator with $k$ registered prefix vectors, we need $k$ registers of $2 n$ bits, $n k$ comparison circuits, and $k$ AND gates with $n$ inputs. In addition, we need a priority encoder with $k$ inputs and $\left\lceil\log _{2}(k+1)\right\rceil$ outputs to generate the LPM address.

The demerit of this circuit is that it becomes complex as the number of registered vectors increases.

### 12.2 LUT Cascade Emulator

In an LUT cascade, once the number of inputs and outputs for each cell are fixed, only a limited class of functions can be realized. Thus, the LUT cascade emulator ${ }^{1}$ having the architecture shown in Fig. 12.2 has been developed [96]. It consists

[^2]

Fig. 12.1 Index generator implemented using registers, gates, and a priority encoder

Fig. 12.2 LUT cascade emulator

of a large memory that stores the data for cells, a programmable interconnection network, and a control circuit. It emulates an LUT cascade sequentially. That is, the outputs of the various LUTs are produced in sequence starting with the leftmost LUT. Although the emulator is slower than the LUT cascade, it is more flexible than the LUT cascade.

A shifter that drives the programmable interconnection network is used for memory packing [118], while a shifter that drives the output register is used to accumulate the outputs.

Example 12.2.1. Let us emulate the LUT cascade with four cells shown in Fig. 12.3 by the emulator [123].


Fig. 12.3 Operation of an LUT cascade emulator

Step 1 To emulate Cell $_{1}$, the two most significant bits of the address for memory for logic are set to $(0,0)$ to specify the first page of the memory for logic, which corresponds to the first cell. Also, the values of $X_{1}$ are set to the lower address bits of the memory for logic, as shown in Fig. 12.3a. This is done through the programmable interconnection network in Fig. 12.2. By reading the content of the first page, we obtain the outputs of $\mathrm{Cell}_{1}$.
Step 2 To emulate $\mathrm{Cell}_{2}$, the two most significant bits of address are set to $(0,1)$ to specify the second page. Also, the values of $X_{2}$ are set to the middle address bits, and the outputs of $\mathrm{Cell}_{1}$ are connected to the least significant bits through the programmable interconnection network, as shown in Fig. 12.3b. By reading the content of the second page, we obtain the outputs of $\mathrm{Cell}_{2}$.
Step 3 To emulate $\mathrm{Cell}_{3}$, the two most significant bits of address are set to $(1,0)$ to specify the 3rd page. Also, the values of $X_{3}$ are set to the middle address bits, and the outputs of $\mathrm{Cell}_{2}$ are connected to the least significant bits through the programmable interconnection network, as shown in Fig. 12.3c. By reading the content of the 3rd page, we obtain the outputs of $\mathrm{Cell}_{3}$.
Step 4 To emulate Cell $_{4}$, the two most significant bits of address are set to $(1,1)$ to specify the last page. Also, the values of $X_{4}$ are set to the middle address bits, and the outputs of $\mathrm{Cell}_{3}$ are connected to the least significant bits through the programmable interconnection network, as shown in Fig. 12.3d. By reading the content of the 4th page, we obtain the outputs of Cell $_{4}$. At this point, the outputs of all LUTs have been obtained, and the complete circuit's output is specified.

### 12.3 Realization Using Cascade and AUX Memory

Here, we show a method to reduce hardware by using an auxiliary memory and a comparator. Figure 12.4 illustrates the idea of the method.


Fig. 12.4 Index generator using auxiliary memory

## Algorithm 12.3.1. (Simplification of Index Generators).

1. Let $f$ be an index generation function. Let $g$ be the function where the output values for the nonregistered inputs in $f$ are replaced by don't cares.
2. Reduce the number of variables to represent $g$. Produce the circuit for $g$. In general, the circuit for $g$ is simpler than the circuit for $f$.
3. When the query data matches a registered vector, the circuit $g$ produces correct outputs. When the query data does not match any registered vector, the circuit for $g$ may produce wrong output values.
4. To detect the correct outputs, we use an auxiliary (AUX) memory with $q=$ $\left\lceil\log _{2}(k+1)\right\rceil$ inputs and $n$ outputs. The AUX memory stores corresponding registered vector for each address.
5. Apply the output address of the circuit for $g$ to the AUX memory, and read out the registered vector in the $A U X$ memory. If the output vector of the $A U X$ memory equals to the input vector, then the circuit for $g$ produces the correct output value. If the output vector of the AUX memory is different from the input vector, then the input vector is not registered. In this case, the comparator sends 0 to the AND gate. Thus, the circuit produces 0 .

An ordinary logic circuit can be simplified by don't cares [110]. The present method has the following features:

- The number of nonzero outputs $(k)$ of the index generation function is much smaller than the total number of input combinations $2^{n}$. In $g$, the outputs for the nonregistered inputs are set to don't cares.
- To verify the correctness of the output of the circuit for $g$, we use the AUX memory.

The total amount of hardware is smaller than the direct implementation of $f$. In logic synthesis using memories, reduction of support variables is important. In the index generation functions, the fraction of don't cares is very large, and we can often reduce the number of support variables. Details are shown in Chap. 11.

Example 12.3.1. (LUT Cascade and LUT Memory) Let us design the index generator for the registered vector table shown in Table 12.1. Since the number of the

Table 12.1 Registered vector table

| Index | $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | $x_{7}$ | $x_{8}$ | $x_{9}$ | $x_{10}$ | $x_{11}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 3 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 4 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 5 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 6 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 8 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 9 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 10 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 11 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 12 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 13 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| 14 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

Table 12.2 Reduced registered vector table

| Index | $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{8}$ | $x_{10}$ | $x_{11}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5 | 0 | 1 | 0 | 1 | 1 | 1 |
| 6 | 1 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 0 | 0 | 0 | 0 | 1 |
| 8 | 1 | 0 | 0 | 1 | 0 | 1 |
| 9 | 1 | 0 | 0 | 1 | 1 | 1 |
| 10 | 1 | 0 | 1 | 1 | 0 | 1 |
| 11 | 1 | 0 | 1 | 0 | 0 | 0 |
| 12 | 1 | 1 | 0 | 0 | 0 | 1 |
| 13 | 1 | 1 | 0 | 1 | 1 | 1 |
| 14 | 1 | 1 | 0 | 0 | 0 | 0 |
| 15 | 1 | 1 | 0 | 1 | 0 | 1 |

registered vectors is 15 , the index generator has 4 outputs. It has 11 inputs, as well. Let $g$ be a function, where the output values for the nonregistered input vectors are replaced by don't cares. By reducing the number of variables, $g$ can be represented with only six variables $\left\{x_{1}, x_{2}, x_{3}, x_{8}, x_{10}, x_{11}\right\}$. This can be obtained by Algorithm 11.3.1 with the help of a computer program. Sufficiency can be verified by Table 12.2, where all the reduced vectors are different. Thus, six bits are sufficient to distinguish the indices. Also, the set is minimal. That is, deletion of any variable makes at least one pair of indices indistinguishable. Next, realize the reduced function $g$ by an LUT cascade. Figure 12.5 shows an index generator using the AUX memory. Note that, in Fig. 12.5, only six variables $\left\{x_{1}, x_{2}, x_{3}, x_{8}, x_{10}, x_{11}\right\}$ are used as the inputs for the LUT cascade. We designed the cascade with 5-LUTs. Table 12.3 shows the content of the AUX memory, where the values for $\left\{x_{4}, x_{5}, x_{6}, x_{7}, x_{9}\right\}$ are stored.


Fig. 12.5 Index generator using auxiliary memory
Table 12.3 Truth table for auxiliary memory

| $z_{3}$ | $z_{2}$ | $z_{1}$ | $z_{0}$ | $y_{4}$ | $y_{5}$ | $y_{6}$ | $y_{7}$ | $y_{9}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |

The second cell realizes the temporary index $\left(z_{3}, z_{2}, z_{1}, z_{0}\right)$. This index is used to read the AUX memory. The output of the AND memory $\left(y_{4}, y_{5}, y_{6}, y_{7}, y_{9}\right)$ is compared with the input values $\left(x_{4}, x_{5}, x_{6}, x_{7}, x_{9}\right)$. If they agree, the temporary index is correct, then $\left(f_{3}, f_{2}, f_{1}, f_{0}\right)=\left(z_{3}, z_{2}, z_{1}, z_{0}\right)$ is the output. Otherwise, the input query data is not in the AUX memory, and $\left(f_{3}, f_{2}, f_{1}, f_{0}\right)=(0,0,0,0)$ is produced at the output.

Figure 12.5 shows the circuit. The total memory is

$$
32 \times 4 \times 2+16 \times 5=256+80=336
$$

bits. Tables 12.4 and 12.5 are truth tables for the cells in the cascade.

Table 12.4 Truth table for the first cell

| Index | $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{8}$ | $x_{10}$ | $u_{1}$ | $u_{2}$ | $u_{3}$ | $u_{4}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 6 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 7 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 8 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 9 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 10 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 11 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| 12 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 13 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 14 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 15 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |

Table 12.5 Truth table for the second cell

| Index | $u_{1}$ | $u_{2}$ | $u_{3}$ | $u_{4}$ | $x_{11}$ | $z_{3}$ | $z_{2}$ | $z_{1}$ | $z_{0}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 5 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 6 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 7 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 8 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 9 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 10 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 11 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 12 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 13 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 14 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 15 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

### 12.4 Comparison of Various Methods

Example 12.4.1. Estimate the amount of hardware to implement index generation functions for

1. $n=10$ and $k=500$. Use a single LUT.
2. $n=10$ and $k=15$. Use an LUT cascade.
3. $n=48$ and $k=100$. Use the hybrid method.
4. $n=48$ and $k=1,000$. Use the super hybrid method.
5. $n=32$ and $k=500,000$. Use the standard parallel sieve method.
(Solution)

## 1. Single LUT.

When $n=10$ and $k=500$. The number of inputs for the memory is $n=10$, and the number of outputs is $q=\left\lceil\log _{2}(500+1)\right\rceil=9$. Thus, the size of the memory is $2^{10} \times 9=9$ Kibits, where 1 Kibit denotes $2^{10}$ bits.
2. LUT Cascade.

When $n=10$ and $k=15$. Consider an LUT cascade with ( $K=6$ )-input LUTs. The number of inputs is $n=10$, the number of rails is $w=\left\lceil\log _{2}(15+1)\right\rceil=4$, and the number of outputs is $m=w=4$. The number of cells is

$$
s=\left\lceil\frac{n-w}{K-w}\right\rceil=\left\lceil\frac{10-4}{6-4}\right\rceil=\frac{6}{2}=3 .
$$

The total amount of memory is

$$
2^{6} \times 4 \times 3=3 \times 2^{8}=0.75 \times 2^{10}
$$

or, 0.75 Kibits.
3. Hybrid Method.

When $n=48, k_{1}=100 . q=\left\lceil\log _{2}(100+1)\right\rceil=7$. In this case, an LUT cascade would be too large, and so we use the hybrid method. Let the number of inputs to the main memory be $p=q+2=9$. In this case, by Corollary 10.5.1, the fraction of remaining registered vectors is

$$
\gamma_{1}=1-\delta_{1}=\frac{\xi-1+\mathrm{e}^{-\xi}}{\xi} .
$$

Since $p=9$ and $k_{1}=100$, we have $\xi=0.1953$, and

$$
\gamma_{1}=1-0.9084=0.0916 .
$$

Thus, the number of remaining vectors is $\gamma_{1} k_{1} \simeq 9$, which can be implemented by an LUT cascade or a rewritable PLA.
The sizes of memories are as follows:
Main memory: 9-input, 7 -outputs: $2^{9} \times 7=3.5 \times 2^{10}=3.5$ Kibits.
AUX memory: 7 -input, 41 -outputs: $2^{7} \times 41=5.1 \times 2^{10}=5.1$ Kibits.
Thus, the total memory size is 8.6 Kibits.
4. Super Hybrid Method.

When $n=48, k_{1}=1,000 . q_{1}=\left\lceil\log _{2}(1000+1)\right\rceil=10$. In the hybrid method, the remaining vector is $10 \%$ of the original vectors. That is, 100 , which is fairly large. Thus, we use the super hybrid method. In the super hybrid method, we use
the first main memory with $p_{1}=q_{1}+1=11$ inputs, and $q_{1}=10$ outputs. The fraction of vectors not realized by the first IGU is

$$
\gamma_{1}=1-\delta_{1}=\frac{\xi_{1}-1+\mathrm{e}^{-\xi_{1}}}{\xi_{1}}
$$

When, $k_{1}=1,000$ and $p_{1}=11$, we have $\xi_{1}=0.48828$ and $\gamma_{1}=0.2088$. The number of remaining vectors is $k_{2}=k_{1} \gamma_{1} \simeq 209 . q_{2}=\left\lceil\log _{2}(209+1)\right\rceil=8$.

The second main memory has $p_{2}=q_{2}+1=8+1=9$ inputs and $q_{2}=8$ outputs. The fraction of vectors not realized by the second IGU is

$$
\gamma_{2}=1-\delta_{2}=\frac{\xi_{2}-1+\mathrm{e}^{-\xi_{2}}}{\xi_{2}}
$$

When, $k_{2}=209$ and $p_{2}=9$, we have $\xi_{2}=0.398437$, and $\gamma_{2} \simeq 0.1752$. Thus, the number of remaining vectors is $k_{3}=k_{2} \gamma_{2} \simeq 36$, which can be implemented by an LUT cascade or a rewritable PLA.
The sizes of memories are as follows:
first main memory: 11-input, 10 -outputs: $2^{11} \times 10=20$ Kibits.
first AUX memory: 10-input, 37-outputs: $2^{10} \times 37=37$ Kibits.
second main memory: 9 -input, 8 -outputs: $2^{9} \times 8=4 \times 2^{10}=4$ Kibits.
second AUX memory: 8 -input, 39 -outputs: $2^{8} \times 39=9.75$ Kibits.
Thus, the total memory size is 70.75 Kibits.
5. Standard Parallel Sieve Method.

When $k_{1}=500,000$, we have $q_{1}=\left\lceil\log _{2}(500,000+1)\right\rceil=19$. In the super hybrid method, the remaining vector is $4 \%$ of the original vectors. That is, 20000, which is very large. Thus, we use the standard parallel sieve method. In the standard parallel sieve method, the first main memory has $p_{1}=q_{1}=19$ inputs and $q_{1}=19$ outputs. The fraction of vectors not realized by the first IGU is

$$
\gamma_{1}=1-\delta_{1}=\frac{\xi_{1}-1+\mathrm{e}^{-\xi_{1}}}{\xi_{1}}
$$

When $k_{1}=500,000$ and $p_{1}=19$, we have $\xi_{1}=0.953674$ and $\gamma_{1}=0.35546$. Thus, the number of remaining vectors is $k_{2}=k_{1} \gamma_{1} \simeq 177,733$. and $q_{2}=$ $\left\lceil\log _{2}(177733+1)\right\rceil=18$.

The second main memory has $p_{2}=q_{2}=18$ inputs and $q_{2}=18$ outputs. The fraction of vectors not realized by the second IGU is

$$
\gamma_{2}=1-\delta_{2}=\frac{\xi_{2}-1+\mathrm{e}^{-\xi_{2}}}{\xi_{2}}
$$

When, $k_{2}=177,733$ and $p_{2}=18$, we have $\xi_{2}=0.6779976$, and the number of remaining vectors is $k_{3}=k_{2} \gamma_{2} \simeq 48662$.

In the similar way, we have

$$
\begin{aligned}
p_{3} & =16, k_{4} \simeq 14,316 . \\
p_{4} & =14, k_{5} \simeq 4,771 . \\
p_{5} & =13, k_{6} \simeq 1,155 . \\
p_{6} & =11, k_{7} \simeq 273 . \\
p_{7} & =9, \quad k_{8} \simeq 62 .
\end{aligned}
$$

Thus, the number of remaining vectors is $k_{8} \simeq 62$, which can be implemented by an LUT cascade or a rewritable PLA.

Note that for each $I G U_{i}$, the main memory has $p_{i}$ inputs and $p_{i}$ outputs, while the AUX memory has $p_{i}$ inputs and $\left(n-p_{i}\right)$ outputs. Thus, the total amount of memory is

$$
p_{i} 2^{p_{i}}+\left(n-p_{i}\right) 2^{p_{i}}=n 2^{p_{i}}
$$

So, the total amount of memory for the parallel sieve method is

$$
\sum_{i=1}^{r} n 2^{p_{i}}=32 \cdot\left(2^{19}+2^{18}+2^{16}+2^{14}+2^{13}+2^{11}+2^{9}\right)
$$

It is about 24-Mibit, where 1 Mibit denotes $2^{20}$ bits.
Example 12.4.2. Consider a system that detects computer viruses. A complete system using only hardware is too complex, so we use two-stage method: In the first stage, suspicious patterns are detected by hardware, and in the second stage, a complete match is performed by software only for the patterns detected in the first stage. Here, we consider the hardware part in the first stage. Assume that we check the text using a window of four characters, and the number of suspicious patterns is $k=500,000$. Fig. 12.6 shows the circuit to detect the suspicious patterns. Eight 4-stage shift registers are used to store four characters. These registers work as a window. Note that the number of inputs to the memory is $4 \times 8=32$, and the number of outputs is $\left\lceil\log _{2}(k+1)\right\rceil=19$.


Fig. 12.6 Virus scanning circuit

A straightforward implementation requires a memory with impractical size: $\left\lceil\log _{2}(k+1)\right\rceil 2^{32}=76$ Gibits, where 1 Gibit denotes $2^{30}$ bits. If we use the standard parallel sieve method shown in Example 12.4.1, we need only 24 Mibits.

### 12.5 Code Converter

In this part, we consider a class of code converters that can be treated as index generation functions.

Definition 12.5.1. An m-out-of-n code consists of $\binom{n}{m}$ binary code words whose weights are $m$.

Definition 12.5.2. An m-out-of-n to binary converter realizes an index generation function with $\binom{n}{m}$ nonzero elements. It has $n$ inputs and $\left\lceil\log _{2}\binom{n}{m}+1\right\rceil$ outputs. When the number of 1 's in the inputs is not $m$, the converter produces the all 0 code. The $m$-out-of- $n$ code is produced in ascending lexicographical order. That is, the smallest number is denoted by $(0,0, \ldots, 0,1,1, \ldots, 1)$, while the largest number is denoted by $(1,1, \ldots 1,0,0, \ldots, 0)$.

Example 12.5.1. When $n=6$ and $m=3$, we have the function shown in Table 12.6.

Table 12.6 Registered vector table for 3-out-of-6 to binary converter

| 3 -out-of- 6 code |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :---: |
| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | Index |
| 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 | 1 | 2 |
| 0 | 0 | 1 | 1 | 0 | 1 | 3 |
| 0 | 0 | 1 | 1 | 1 | 0 | 4 |
| 0 | 1 | 0 | 0 | 1 | 1 | 5 |
| 0 | 1 | 0 | 1 | 0 | 1 | 6 |
| 0 | 1 | 0 | 1 | 1 | 0 | 7 |
| 0 | 1 | 1 | 0 | 0 | 1 | 8 |
| 0 | 1 | 1 | 0 | 1 | 0 | 9 |
| 0 | 1 | 1 | 1 | 0 | 0 | 10 |
| 1 | 0 | 0 | 0 | 1 | 1 | 11 |
| 1 | 0 | 0 | 1 | 0 | 1 | 12 |
| 1 | 0 | 0 | 1 | 1 | 0 | 13 |
| 1 | 0 | 1 | 0 | 0 | 1 | 14 |
| 1 | 0 | 1 | 0 | 1 | 0 | 15 |
| 1 | 0 | 1 | 1 | 0 | 0 | 16 |
| 1 | 1 | 0 | 0 | 0 | 1 | 17 |
| 1 | 1 | 0 | 0 | 1 | 0 | 18 |
| 1 | 1 | 0 | 1 | 0 | 0 | 19 |
| 1 | 1 | 1 | 0 | 0 | 0 | 20 |

Theorem 12.5.1. Let $f\left(x_{1}, x_{2}, \ldots, x_{n}\right)$ be a m-out-of-n to binary converter. Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$ be a partition of $X=\left(x_{1}, x_{2}, \ldots, x_{n}\right)$. Then, the column multiplicity $\mu_{p}$ of the decomposition $f\left(X_{1}, X_{2}\right)$ is

$$
\begin{aligned}
& \mu_{p}=2^{p} \quad(\text { when } 1 \leq p \leq m) \\
& \mu_{p}=1+\sum_{i=0}^{m}\binom{p}{i} \quad(\text { when } m<p<n-1) \\
& \mu_{p}=\binom{n}{m}+1 \quad(\text { when } p=n-1, \text { or } n)
\end{aligned}
$$

Example 12.5.2. Consider the case of $m=2$ and $n=9$. The profile of the 2-outof -9 to binary converter is

$$
\left(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}, \mu_{6}, \mu_{7}, \mu_{8}, \mu_{9}\right)=(2,4,8,12,17,23,30,37,37)
$$

Thus, it can be realized by an LUT cascade as shown in Fig. 12.7. Note that the rightmost LUT has 8 inputs. Lemma 4.2 .2 shows that an 8 -LUT can be realized with 5 modules of 6 -LUTs. Thus, the total number of 6 -LUTs to implemented the function is $6+5+5 \times 6=41$.

Example 12.5.3. Consider the case of $m=2$ and $n=20$. The profile of the 2 -out-of-20 converter is

$$
(2,4,8,12,17,23,30,38,47,57,68,80,93,107,122,138,155,173,191,191)
$$

In this case, the LUT cascade realization is not attractive. Thus, we consider a tree-type realization. Partition the inputs into $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{10}\right)$ and $X_{2}=$ ( $x_{11}, x_{12}, \ldots, x_{20}$ ). The column multiplicity of the decomposition with respect to ( $X_{1}, X_{2}$ ) and ( $X_{2}, X_{1}$ ) are the same and are both 57 . Thus, it can be realized by the circuit shown in Fig. 12.8.

Another implementation is IGU shown in Fig. 10.1. If we can use the linear transformation with many EXOR inputs, then the number of inputs to the main memory is reduced to 9 . The derivation of such a linear transformation is beyond the scope of the book.

Fig. 12.7 LUT cascade realization of 2-out-of-9 to binary converter


Fig. 12.8 Tree-type
realization of 2-out-of-20
to binary converter


Fig. 12.9 Tree-type realization of 3-out-of-20 to binary converter


Example 12.5.4. Consider the case of $m=3$ and $n=20$. The profile of the 3 -out-of-20 converter is

$$
\begin{aligned}
& (2,4,8,16,27,43,65,94,131,177,233,300,379,471,577,698,835,988, \\
& \quad 1141,1141)
\end{aligned}
$$

In this case, the LUT cascade realization is impractical. To realize a tree-type circuit, we partition the inputs into $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{10}\right)$ and $X_{2}=\left(x_{11}, x_{12}, \ldots, x_{20}\right)$. The column multiplicity of the decomposition with respect to ( $X_{1}, X_{2}$ ) and $\left(X_{2}, X_{1}\right)$ are the same and equal to 177 . Thus, it can be realized by the treetype circuit shown in Fig. 12.9. Unfortunately, the output LUT has 16 inputs and 11 outputs, and is rather large. Since the function is an index generation function, it can be realized as shown in Chap. 10.

Another implementation is IGU shown in Fig. 10.1. If we can use the linear transformation with many EXOR inputs, then the number of inputs to the main memory is reduced to 11 . Again, the derivation of such a linear transformation is beyond the scope of the book.

### 12.6 Remarks

This chapter presented various methods to implement index generation functions. Given the number of variables $n$, the number of registered vectors $k$, and available devices, we can select the best design method among various methods: singlememory, LUT cascade, LUT cascade emulator, the hybrid method, the super hybrid method, and the standard parallel sieve method. This chapter is based on [85, 97, 98, 123].

## Problems

12.1. Realize the index generation function shown in Table 12.1. Use an LUT cascade, where each LUT has at most 6 inputs. Compare the amount of memory to implement the function by the method shown in Example12.3.1 and the LUT cascade.
12.2. Realize the index generation function shown in Table 12.1. Use an LUT cascade emulator, where each LUT has at most 6 inputs. Explain the operation of the emulator. Compare the size of memory with the LUT cascade realization.
12.3. The standard parallel sieve method presented in Chapter 10 uses IGUs with different sizes. Consider the parallel sieve method that uses IGUs with the same sizes. Discuss the advantage and disadvantage of this approach [136].
12.4. Realize the 2 -out-of- 12 to binary converter.

## Chapter 13 <br> Conclusions

This book showed various methods to realize logic functions with LUTs. These methods can be used to design FPGAs as well as custom integrated circuits. Main applications are communication circuits and pattern matching circuits, where frequent reconfiguration is needed. Major results are as follows:

1. Chapter 4 showed a general method to realize logic functions by LUTs. The number of 6-LUTs to realize an $n$-variable function is $\left(2^{n-4}-3\right) / 3$ or less, when $n=2 r$.
2. Chapter 5 derived the number of LUTs to realize logic functions with small Cmeasure $\mu(f)$. The number of 6 -LUTs to realize an $n$-variable function ( $n \geq$ 8 ) is:
(a) $5 n-35$ or less, when $\mu(f) \leq 32$.
(b) $2 n-11$ or less, when $\mu(f) \leq 16$.
(c) $n-5$ or less, when $\mu(f) \leq 8(n=3 r)$.
(d) $n-4$ or less, when $\mu(f) \leq 8(n \neq 3 r)$.
3. Chapter 6 showed a method to reduce the number of LUTs using nonstrict encoding in a functional decomposition. This method is a way to find a nondisjoint decomposition.
4. Chapter 7 showed various functions with small C-measures. These include symmetric functions, sparse functions, LPM functions, segment index encoder functions, and WS functions. Thus, these functions can be efficiently realized by a cascade-based method.
5. Chapter 8 derived upper bounds on the column multiplicity of a decomposition chart for logic functions with weight $u$. The number of 6-LUTs needed to realize an $n$ variable function is:
(a) 10 or less, when $n=9$ and $u \leq 55$.
(b) 15 or less, when $n=10$ and $u \leq 47$.
(c) 15 or less, when $n=10$ and $u \leq 64$, for most functions.
6. Chapter 9 introduced index generation functions, and showed their applications. An index generation function with weight $k$ can be realized by $(p, q)$-elements, where a $(p, q)$-element is an LUT with $p=\left\lceil\log _{2}(k+1)\right\rceil+1$ inputs and $q$ outputs, where $p>q$.
7. Chapter 10 showed three methods to realize index generation functions: the hybrid method, the super hybrid method, and the parallel sieve method. In these methods:
(a) The output values for nonregistered inputs are set to don't cares, and an incompletely specified logic function is realized by a main memory.
(b) The output of the main memory is verified by an auxiliary memory. If the output is correct, the circuit produces it as is; otherwise the outputs are set to $00 \ldots 0$.
(c) A hash circuit is used to randomize the distribution of registered vectors.
(d) A main memory is used to realize most of the registered vectors.
(e) To realize the remaining vectors, the same method is recursively used until the final stage is sufficiently small.
8. Chapter 11 showed a reduction method of the number of variables to represent incompletely specified functions. It was shown that most index generation functions with weight $k$ can be represented with $p=2\left\lceil\log _{2}(k+1)\right\rceil-1$ or fewer variables. A method was introduced to reduce the number of variables by using a linear transformation of the input variables.
9. Chapter 12 reviewed various methods to realize index generation functions.

## Solutions

## Problems of Chapter 2

2.1 In the CAM, if two or more rows match, then the priority encoder selects the first one. In the PLA, the product for the first row is not necessary to implement, since the outputs are $(0,0)$. Also, note that the pattern of the top row is disjoint from other rows, and it can be ignored. The second pattern (row) and the third pattern (row) intersect. When $\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=(0,1,1,0)$, both the second and the third patterns match. However, the second pattern has the priority. So, we have to modify the third pattern to make these patterns disjoint. We have to implement the PLA shown in Table 13.1.
2.2 A synchronous memory requires one clock pulse to read out the data. So, it is unsuitable for combinational circuits. On the other hand, asynchronous memory requires no clock pulse to read out the data. The data can be read out immediately. So, it can be used as an ordinary logic gate.
2.3 With the miniaturization of transistors and interconnections in LSIs, the delay time of the transistors reduces, while that of the interconnection increases. Also, the number of logic devices that can be included in an FPGA increases. Thus, the reduction of the delay time is more important than the reduction of the logic elements. To reduce the delay time, increasing the number of inputs to an LUT is the most effective.
2.4 To implement NOR or NAND in CMOS, transistors must be connected in series. A series connection of up to four transistors is permitted and it still maintains the performance. Thus, when the number of inputs to a PLA is large, a static CMOS circuit is hard to implement. Dynamic CMOS is a method to implement a logic function that is small and dissipates relatively less power.

## 2.5

1. Form a blank truth table of $n$ inputs and $m$ outputs, where $m=\left\lceil\log _{2} W\right\rceil$.
2. From the top word to the bottom word, fill in the truth table entries.
3. From the truth table, obtain a simplified sum-of-products expression.
4. Program the PLA.

Table 13.1 Pattern for PLA

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $f_{1}$ | $f_{0}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 1 | 1 | 0 | - | 0 | 1 |
| 0 | 1 | 1 | 1 | - | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 |

## 2.6

1. Form the truth table from the PLA.
2. Form a disjoint sum-of-products expression. Reduce the number of products, if possible.
3. For each product, form a CAM word.

## Problems of Chapter 3

3.1 The left-hand-side is

$$
\begin{aligned}
& S B(4,1) \cdot S B(4,2) \\
& \quad=\left(x_{1} \oplus x_{2} \oplus x_{3} \oplus x_{4}\right) \cdot\left(x_{1} x_{2} \oplus x_{1} x_{3} \oplus x_{1} x_{4} \oplus x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4}\right)
\end{aligned}
$$

The right-hand-side is

$$
S B(4,3)=x_{1} x_{2} x_{3} \oplus x_{1} x_{2} x_{4} \oplus x_{1} x_{3} x_{4} \oplus x_{2} x_{3} x_{4}
$$

The straightforward expansion of the left-hand-side will produce a complicated expression. So, we use the Shannon's expansion to prove the equality. When, $x_{1}=0$ : The left-hand-side is

$$
\begin{aligned}
& \left(x_{2} \oplus x_{3} \oplus x_{4}\right) \cdot\left(x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4}\right) \\
& \quad=\left\{\left(x_{2} \oplus x_{3}\right) \oplus x_{4}\right\}\left\{x_{2} x_{3} \oplus\left(x_{2} \oplus x_{3}\right) x_{4}\right\} \\
& \quad=x_{2} x_{3} x_{4} \oplus\left(x_{2} \oplus x_{3}\right) x_{4} \oplus\left(x_{2} \oplus x_{3}\right) x_{4} \\
& \quad=x_{2} x_{3} x_{4} .
\end{aligned}
$$

The right-hand-side is $x_{2} x_{3} x_{4}$.
When, $x_{1}=1$ : The left-hand-side is

$$
\begin{aligned}
& \left(1 \oplus x_{2} \oplus x_{3} \oplus x_{4}\right) \cdot\left(x_{2} \oplus x_{3} \oplus x_{4} \oplus x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4}\right) \\
& \quad=\left\{\left(x_{2} \oplus x_{3}\right) \oplus \bar{x}_{4}\right\}\left\{\left(x_{2} \oplus x_{3}\right) \oplus x_{2} x_{3} \oplus\left(x_{2} \oplus x_{3}\right) x_{4} \oplus x_{4}\right\} \\
& \quad=\left[\left(x_{2} \oplus x_{3}\right) \oplus\left(x_{2} \oplus x_{3}\right) x_{4} \oplus\left(x_{2} \oplus x_{3}\right) x_{4}\right] \oplus\left[\left(x_{2} \oplus x_{3}\right) \bar{x}_{4} \oplus x_{2} x_{3} \bar{x}_{4}\right] \\
& \quad=\left(x_{2} \oplus x_{3}\right) x_{4} \oplus x_{2} x_{3} \bar{x}_{4}
\end{aligned}
$$

The right-hand-side is

$$
\begin{aligned}
x_{2} x_{3} \oplus x_{2} x_{4} \oplus x_{3} x_{4} \oplus x_{2} x_{3} x_{4} & =\left(x_{2} \oplus x_{3}\right) x_{4} \oplus x_{2} x_{3}\left(1 \oplus x_{4}\right) \\
& =\left(x_{2} \oplus x_{3}\right) x_{4} \oplus x_{2} x_{3} \bar{x}_{4}
\end{aligned}
$$

Therefore, the equality holds.
3.2 For $h$, the number of the functions is at most $2^{2^{k}}$. For $g$, the number of the functions is at most $2^{2^{n-k+1}}$. Thus, the total number of functions is at most

$$
2^{2^{k}} \cdot 2^{2^{n-k+1}}=2^{2^{k}+2^{n-k+1}}
$$

3.3 For $h$, the number of functions is at most $\left(2^{2^{k}}\right)^{m}=2^{m 2^{k}}$. For $g$, the number of functions is at most $2^{2^{n-k+m}}$. Thus, the total number of functions is at most

$$
2^{m 2^{k}} \cdot 2^{2^{n-k+m}}=2^{m 2^{k}+2^{n-k+m}} .
$$

3.4 When $X_{1}=\left(x_{1}, x_{2}, x_{3}\right)$, as shown in Fig. 13.1, the column multiplicity is four.
3.5 In an $n$ variable function, the total number of possible input combination is $2^{n}$. Thus, the number of functions with weight $k$ is

$$
\binom{2^{n}}{k}
$$

3.6 Note that

$$
\begin{aligned}
& S_{6}^{9}=\bar{y}_{3} y_{2} y_{1} \bar{y}_{0}, \\
& S_{5}^{9}=\bar{y}_{3} y_{2} \bar{y}_{1} y_{0}, \\
& S_{4}^{9}=\bar{y}_{3} y_{2} \bar{y}_{1} \bar{y}_{0}, \text { and } \\
& S_{3}^{9}=\bar{y}_{3} \bar{y}_{2} y_{1} y_{0} .
\end{aligned}
$$

Fig. 13.1 Decomposition chart for $W G T 5$

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{3}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{1}$ |
| 0 | 0 | 0 | 1 | 1 | 2 | 1 | 2 | 2 | 3 |  |
| 0 | 1 | 1 | 2 | 2 | 3 | 2 | 3 | 3 | 4 |  |
| 1 | 0 | 1 | 2 | 2 | 3 | 2 | 3 | 3 | 4 |  |
| 1 | 1 | 2 | 3 | 3 | 4 | 3 | 4 | 4 | 5 |  |
| $x_{5}$ | $x_{4}$ |  |  |  |  |  |  |  |  |  |

Table 13.2 Condensed decomposition chart for SYM12

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | $X_{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |  |
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |  |
| 2 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |  |
| 3 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |  |
| $X_{2}$ |  |  |  |  |  |  |  |  |  |  |  |

Thus, we have

$$
\begin{aligned}
S_{\{3,4,5,6\}}^{9} & =\bar{y}_{3} y_{2} y_{1} \bar{y}_{0} \vee \bar{y}_{3} y_{2} \bar{y}_{1} y_{0} \vee \bar{y}_{3} y_{2} \bar{y}_{1} \bar{y}_{0} \vee \bar{y}_{3} \bar{y}_{2} y_{1} y_{0} \\
& =\bar{y}_{3}\left(y_{2}\left(\bar{y}_{1} \vee \bar{y}_{0}\right) \vee \bar{y}_{2} y_{1} y_{0}\right) \\
& =\bar{y}_{3}\left(y_{2} \oplus y_{1} y_{0}\right) .
\end{aligned}
$$

3.7 Consider the decomposition chart of SYM12. Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{8}\right)$ and $X_{2}=\left(x_{9}, x_{10}, x_{11}, x_{12}\right)$. Since it is a symmetric function, we can write a condensed decomposition chart shown in Table 13.2 instead of the full table. Note that the labels show the weights of $X_{1}$ or $X_{2}$. Thus, for example, the column 4 represents $\binom{9}{4}=126$ columns of weight 4 . It is clear that the number of different column patterns is 8 .

## Problems of Chapter 4

4.1 Since $\mu_{6}=16, f$ can be realized by a circuit shown in Fig. 13.2. Note that the first block has 6 inputs and $\left\lceil\log _{2}(16)\right\rceil=4$ outputs. Thus, the second block has $4+4=8$ inputs. Also, note that an arbitrary 8 -variable function can be realized with at most five 6 -LUTs. Thus, $f$ can be realized with $4+5=9$ LUTs.
4.2 Consider the expansion of the function

$$
f\left(X_{1}, X_{2}\right)=\bigvee_{\vec{a}_{i} \in B^{4}} f\left(\vec{a}_{i}, X_{2}\right) X_{1}^{\vec{a}_{i}}
$$

Among 16 subfunctions $f\left(\vec{a}_{i}, X_{2}\right), \vec{a}_{i} \in B^{4}$, only 10 functions are distinct. So, $f$ can be realized as Fig. 13.3. Note that 10 LUTs are used to realize these 10 different subfunctions, while 5 LUTs are used to realize a 4-MUX.
4.3 From (4.2) and (4.3), we have the following relations:

$$
\begin{aligned}
& g_{0}(Y)=f_{0}(Y) \\
& g_{1}(Y)=f_{0}(Y) \oplus f_{2}(Y)
\end{aligned}
$$

Fig. 13.2 Realization of 10 -variable function

Fig. 13.3 Realization of 10 -variable function with 6-LUTs


Fig. 13.4 Reed-Muller type module


$$
\begin{aligned}
& g_{2}(Y)=f_{0}(Y) \oplus f_{1}(Y) \\
& g_{3}(Y)=f_{0}(Y) \oplus f_{1}(Y) \oplus f_{2}(Y) \oplus f_{3}(Y)
\end{aligned}
$$

Thus, instead of the Shannon's expansion, we can also use the Reed-Muller expansion to realize the function. The universal logic element for this expansion is shown in Fig. 13.4, which can be implemented by a 6-LUT.
4.4 The function $f\left(X_{1}, X_{2}\right)$ can be expanded as

$$
f\left(X_{1}, X_{2}\right)=g_{0}\left(\vec{a}_{0}, X_{2}\right) \vee g_{1}\left(\vec{a}_{1}, X_{2}\right) \vee \cdots \vee g_{4^{k}-1}\left(\vec{a}_{4^{k}-1}, X_{2}\right),
$$

where $\vec{a}_{0}=(0,0, \ldots, 0,0), \vec{a}_{1}=(0,0, \ldots, 0,1), \ldots, \vec{a}_{15}=(1,1, \ldots, 1,1)$. Thus, $f\left(X_{1}, X_{2}\right)$ can be realized as the circuit similar to Fig.4.12. Since the column
multiplicity is $\mu$, the number of different column functions $g_{i}\left(\vec{a}_{i}, X_{2}\right)$ is $\mu$. So, in Fig. 4.12, the LUTs producing the same functions can be shared, and only $\mu$ LUTs are sufficient to produce $g_{i}\left(\vec{a}_{i}, X_{2}\right)$. The number of 6-LUTs to realize ( $2 k$ )-MUX is $\frac{4^{k}-1}{3}$. Thus, we need at most $\mu+\frac{4^{k}-1}{3}$ LUTs.

## Problems of Chapter 5

5.1 Tree-type Realization. Advantage: fast. Disadvantage : layout is complex.

Cascade realization. Advantage: layout is simple. Disadvantage : slow.
5.2 Figure 13.5 shows a realization of a 4-bit adder by 6-LUTs. The expressions for the outputs are:

$$
\begin{aligned}
& z_{0}=x_{0} \oplus y_{0} \\
& z_{1}=\left(x_{1} \oplus y_{1}\right) \oplus\left(x_{0} y_{0}\right) \\
& z_{2}=\left(x_{2} \oplus y_{2}\right) \oplus\left(x_{1} y_{1}\right) \oplus\left(x_{1} \oplus y_{1}\right)\left(x_{0} y_{0}\right) \\
& c_{2}=\left(x_{2} y_{2}\right) \oplus\left(x_{2} \oplus y_{2}\right)\left(x_{1} y_{1}\right) \oplus\left(x_{2} \oplus y_{2}\right)\left(x_{1} \oplus y_{1}\right)\left(x_{0} y_{0}\right) \\
& z_{3}=\left(x_{3} \oplus y_{3}\right) \oplus c_{2} \\
& z_{4}=\left(x_{3} y_{3}\right) \oplus c_{2}\left(x_{3} \oplus y_{3}\right)
\end{aligned}
$$

In total, it uses 6 LUTs.
5.3 The outputs of the circuits are $y_{3}=S B(12,8), y_{2}=S B(12,4), y_{1}=S B(12,2)$, and $y_{0}=S B(12,1)$. Each function $y_{i}$ is a symmetric function of 12 variables.


Fig. 13.5 Realization of 4-bit adder by 6-LUTs

Fig. 13.6 Tree-type realization of WGT12 using 6-LUTs


Fig. 13.7 Cascade realization of WGT12 using 6-LUTs


Fig. 13.8 Cascade realization of 7 -variable functions by 6-LUTs


Tree-type realization. The input variables $X=\left(x_{1}, x_{2}, \ldots, x_{12}\right)$ are partitioned into $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{6}\right)$ and $X_{2}=\left(x_{7}, x_{8}, \ldots, x_{12}\right)$. Each output can be implemented by seven 6-LUTs as shown in Fig. 5.9. Thus, WGT12 can be implemented by $7 \times 4=28$ LUTs. However, when the function is decomposed as $g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{2}\right)\right)$ as shown in Fig. 13.6, the number of LUTs is reduced to $3+3+4=10$.

Cascade realization. The input variables $X=\left(x_{1}, x_{2}, \ldots, x_{12}\right)$ are partitioned into $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{6}\right), X_{2}=\left(x_{7}, x_{8}, x_{9}\right), X_{3}=\left(x_{10}, x_{11}\right)$, and $X_{4}=\left(x_{12}\right)$.

When $X_{1}$ denotes the bound variables, $\mu_{6}=7$. When ( $X_{1}, X_{2}$ ) denotes the bound variables, $\mu_{9}=10$. When $\left(X_{1}, X_{2}, X_{3}\right)$ denotes the bound variables, $\mu_{11}=12$. Figure 13.7 shows the realization. In this case, the total number of LUTs is $3+4+4+4=15$.
5.4 Figure 13.8 shows the cascade realization of 7 -variable functions. First, each function is decomposed as $f\left(X_{1}, x_{7}\right)=g\left(h\left(X_{1}\right), x_{7}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{6}\right)$. Note that, the column multiplicity is at most 40. So, the outputs of $h$ can be encoded

Table 13.3 Encoding for intermediate functions

| $y_{1}$ | $y_{2}$ | $y_{3}$ | $y_{4}$ | $y_{5}$ | $y_{6}$ |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 0 | 1 | 0 |
| . | . | . | . | . | . |
| 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 |

by 6 bits as shown in Table 13.3. Note that in the first 32 codes, $y_{1}=0$, while in the remaining 8 codes, $y_{1}=1$. Let the function with $y_{1}=0$ be $f_{0}\left(X_{1}, x_{7}\right)$, and let the function with $y_{1}=1$ be $f_{1}\left(X_{1}, x_{7}\right)$. Then, $f$ can be represented by $f=\overline{y_{1}} f_{0}\left(X_{1}, x_{7}\right) \vee y_{1} f_{1}\left(X_{1}, x_{7}\right)$, where $y_{1}$ denotes the first bit. The LUTs in the middle column realize $f_{0}\left(X_{1}, x_{7}\right)$ and $f_{1}\left(X_{1}, x_{7}\right)$. The LUTs in the output column realize $f\left(X_{1}, x_{7}\right)$. Note that this realization requires 12 LUTs. It is possible to realize each function independently by using Lemma 4.2.1, which requires only $3 \times 3=9$ LUTs .
5.5 Let $\left(X_{1}, X_{2}\right)$ be a partition of the input variables, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Consider the decomposition chart where $X_{1}$ denotes the bound variables. The number of different column patterns is $\mu_{k}(n)$. Next, consider the decomposition chart with the partition $\left(\hat{X}_{1}, \hat{X}_{2}\right)$, where $\hat{X}_{1}=$ $\left(x_{1}, x_{2}, \ldots, x_{k}, x_{k+1}\right)$ and $\hat{X}_{2}=\left(x_{k+2}, x_{k+3}, \ldots, x_{n}\right)$. In this chart, the number of the columns is halved, while the number of the rows is doubled. The number of different column patterns is at most $2 \mu_{k}(n)$, since the column functions of the original decomposition chart are split into two parts, where the upper and the lower parts have at most $\mu_{k}(n)$ patterns. Thus, we have the first relation.

Next, consider the decomposition chart with the partition ( $\left.\check{X}_{1}, \check{X}_{2}\right)$, where $\check{X}_{1}=$ $\left(x_{1}, x_{2}, \ldots, x_{k-2}, x_{k-1}\right)$ and $\check{X}_{2}=\left(x_{k}, x_{k+1}, \ldots, x_{n}\right)$. In this case, the number of the columns is reduced by half, while the number of the rows is increased by two. The number of different column patterns is at most $\mu_{k}^{2}(n)$, since the upper (lower parts) of the column functions of the new decomposition chart have at most $\mu_{k}(n)$ patterns. Thus, we have the second relation.
5.6 Consider the decomposition chart for $f\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, x_{3}\right.$, $\left.x_{4}, x_{5}\right)$ and $X_{2}=\left(x_{6}, x_{7}, x_{8}\right)$. The number of columns is $2^{5}=32$, and the number of rows is $2^{3}=8$. Consider a function whose column multiplicity is 32 . Since the
column functions are 3 -variable functions, there are $2^{2^{3}}=256$ candidate functions. The number of ways to choose 32 different column functions is

$$
\binom{256}{32}
$$

Next, the number of ways to order 32 columns is 32 ! So, the total number of functions is

$$
\binom{256}{32} \cdot(32!)=(256!) /(224!)
$$

## Problems of Chapter 6

6.1 Consider the decomposition $f\left(X_{1}, X_{2}\right)=g\left(h\left(X_{1}\right), X_{2}\right)$. The column multiplicity is $\mu_{4}=5$. So, we need three intermediate variables $\left(h_{3}, h_{2}, h_{1}\right)$. Note that $x_{1} \Psi_{0}=x_{1} \Psi_{1}=0, \bar{x}_{1} \Psi_{2}=\bar{x}_{1} \Psi_{3}=0, x_{3} \Psi_{0}=x_{3} \Psi_{1}=x_{3} \Psi_{2}=x_{3} \Psi_{3}=0$, $\bar{x}_{3} \Psi_{4}=0$.

To $\bar{x}_{1} \Psi_{0}$ assign the code 000 , to $\bar{x}_{1} \Psi_{1}$ assign the code 001 , to $\bar{x}_{1} \Psi_{4}$ assign the code 010 , to $x_{1} \Psi_{4}$ assign the code 110, to $x_{1} \Psi_{2}$ assign the code 100 , to $x_{1} \Psi_{3}$ assign the code 101 .
From these, we have intermediate variables:

$$
\begin{aligned}
& h_{3}=x_{1}\left(\Psi_{2} \vee \Psi_{3} \vee \Psi_{4}\right)=x_{1} \\
& h_{2}=\bar{x}_{1} \Psi_{4} \vee x_{1} \Psi_{4}=\Psi_{4}=x_{3} \\
& h_{1}=\bar{x}_{1} \Psi_{1} \vee x_{1} \Psi_{3}=\bar{x}_{1} x_{2} \bar{x}_{3} \vee x_{1} \bar{x}_{3} \bar{x}_{4} .
\end{aligned}
$$

Note that, in this case, two variables can be simplified.
6.2 First, consider the decomposition $f(X)=g\left(h\left(X_{1}\right), X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}\right.$, $\left.x_{3}, x_{4}, x_{5}\right)$, and $X_{2}=\left(x_{6}, x_{7}, x_{8}\right)$. The equivalence classes of the decompositions are

$$
\begin{aligned}
& \Psi_{0}=\bar{x}_{1} \bar{x}_{2} \bar{x}_{3} \bar{x}_{4} \bar{x}_{5}, \\
& \Psi_{1}=x_{1} x_{2} x_{3} x_{4} x_{5}, \\
& \Psi_{2}=\overline{\Psi_{0} \vee \Psi_{1}} .
\end{aligned}
$$

Since $\mu=3$, a standard encoding requires $u=\left\lceil\log _{2} 3\right\rceil=2$ intermediate variables. $\Psi_{i}$ and $x_{1}$ satisfy the condition of Theorem 6.3.1: $\bar{x}_{1} \Psi_{1}=0$, and $x_{1} \Psi_{0}=0$.

> To $\bar{x}_{1} \Psi_{0}$, assign the code 00.
> To $\bar{x}_{1} \Psi_{2}$, assign the code 01.
> To $x_{1} \Psi_{2}$, assign the code 11.
> To $x_{1} \Psi_{1}$, assign the code 10.

Thus, we have the following intermediate variables:

$$
\begin{aligned}
& h_{1}=\bar{x}_{1} \Psi_{2} \vee x_{1} \Psi_{2}=\Psi_{2}, \\
& h_{2}=x_{1} \Psi_{1} \vee x_{1} \Psi_{2}=x_{1} .
\end{aligned}
$$

Note that

$$
\begin{aligned}
\bar{h}_{2} \bar{h}_{1} & =\Psi_{0}, \\
h_{2} \bar{h}_{1} & =\Psi_{1} .
\end{aligned}
$$

Therefore, $\Psi_{i}(i=0,1,2)$ can be represented by $\left(h_{2}, h_{1}\right)$. Thus, $f$ can be represented as

$$
\begin{aligned}
f(X) & =\Psi_{0} \bar{x}_{6} \bar{x}_{7} \bar{x}_{8} \vee \Psi_{1} x_{6} x_{7} x_{8} \\
& =\bar{x}_{1} \bar{h}_{1} \bar{x}_{6} \bar{x}_{7} \bar{x}_{8} \vee x_{1} \bar{h}_{1} x_{6} x_{7} x_{8}
\end{aligned}
$$

Figure 13.9 shows the realization using 5-LUTs.
6.3 Consider the decomposition

$$
f\left(X_{1}, X_{2}\right)=g\left(h_{1}\left(X_{1}\right), h_{2}\left(X_{1}\right), \ldots, h_{u}\left(X_{1}\right), X_{2}\right)
$$

Let $\Psi_{i}\left(X_{1}\right),(i=0,1, \ldots, \mu-1)$ be the equivalence classes of the decomposition. Let $x_{j}, x_{k}, \in X_{1}$. If the number of different nonzero functions in each of

$$
\begin{aligned}
& x_{j} x_{k} \Psi_{i}, \\
& x_{j} \bar{x}_{k} \Psi_{i},
\end{aligned}
$$

Fig. 13.9 Realization of symmetric function by 5-LUTs


$$
\begin{aligned}
& \bar{x}_{j} x_{k} \Psi_{i}, \\
& \bar{x}_{j} \bar{x}_{k} \Psi_{i},
\end{aligned}
$$

$(i=1,2, \ldots, \mu)$ is equal to or less than $2^{u-2}$, then $h_{u}\left(X_{1}\right)$ and $h_{u-1}\left(X_{1}\right)$ can be represented as follows: $h_{u}\left(X_{1}\right)=x_{j}$ and $h_{u-1}\left(X_{1}\right)=x_{k}$.

## Problems of Chapter 7

7.1 From Lemma 7.2.1, we have

$$
\left(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}, \mu_{6}, \mu_{7}, \mu_{8}, \mu_{9}\right)=(2,3,4,5,6,7,8,4,2) .
$$

7.2 When $X_{1}=\left(x_{2}, x_{1}, x_{0}\right)$, as shown in Fig. 13.10, the column multiplicity is 8 . When $X_{1}=\left(x_{3}, x_{2}, x_{1}, x_{0}\right)$, as shown in Fig. 13.11, the column multiplicity is 11 .
7.3 Suppose that $X$ is partitioned as $\left(X_{2}, X_{1}\right)$, where $X_{1}=\left(x_{n-p-1}, x_{n-p-2}\right.$, $\left.\ldots, x_{0}\right)$ and $X_{2}=\left(x_{n-1}, x_{n-2}, \ldots, x_{n-p}\right)$. We assume that the integer represented by $X=\left(x_{n-1}, x_{n-2}, \ldots, x_{1}, x_{0}\right)$ is $\sum_{i=0}^{n-1} 2^{i} x_{i}$. Consider the decomposition chart. Assume that both the columns and rows are labeled in ascending order of the values of $X_{1}$ and $X_{2}$, respectively. In scanning left-to-right and then top-to-bottom, the

Fig. 13.10 Decomposition chart for $f\left(X_{1}, X_{2}\right)$

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{1}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{0}$ |
| 0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |  |
| 0 | 1 | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 5 |  |
| 1 | 0 | 6 | 6 | 6 | 7 | 7 | 7 | 8 | 8 |  |
| 1 | 1 | 8 | 9 | 9 | 9 | 10 | 10 | 10 | 11 |  |
| $x_{4}$ | $x_{3}$ |  |  |  |  |  |  |  |  |  |


|  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | $x_{3}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
|  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{1}$ |
|  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | $x_{0}$ |
| 0 | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 5 |  |
| 1 | 6 | 6 | 6 | 7 | 7 | 7 | 8 | 8 | 8 | 9 | 9 | 9 | 10 | 10 | 10 | 11 |  |
| $x_{4}$ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Fig. 13.11 Decomposition chart for $f\left(X_{1}, X_{2}\right)$
values of $g(X)$ will increase from 0 to 1 , and then decrease from 1 to 0 . In the chart, remove the rows with all 0 's and rows with all 1 's. Then, the column patterns of the decomposition chart are $(0,1),(1,1)$, and $(1,0)$. Thus, the column multiplicity is three.
7.4 When the function is represented as an SOP with $k$ products, the C-measure increases exponentially with $k$. For example, when $n=2 k$, consider the function

$$
f=x_{1} x_{k} \vee x_{2} x_{k+1} \vee x_{3} x_{k+3} \vee \cdots \vee x_{k} x_{2 k}
$$

Let $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$ and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{2 k}\right)$. In this case, the column multiplicity with respect to the decomposition $\left(X_{1}, X_{2}\right)$ is $2^{k}$. Thus, $\mu(f)=2^{k}$.
7.5 Let $n_{i}$ be the number of variables in $X_{i}$, where $i=1,2,3,4$.

1. Since $f$ is symmetric with respect to $X_{1}, X_{2}$, and $X_{3}$, the number of different column patterns is at most

$$
\left(n_{1}+1\right) \times\left(n_{2}+1\right)=12 .
$$

2. Since $f$ is symmetric with respect to $X_{4}$, the column functions are also symmetric. The number of symmetric functions of $n_{4}$ variables is $2^{n_{4}+1}=8$. Thus, the number of different column patterns is at most 8.
7.6 When $X_{1}$ is the set of bound variables, the column functions have $\mu_{1}$ different patterns. When $\left(X_{1}, X_{2}\right)$ is the set of bound variables, the number of different column patters is at most $\mu_{1} \cdot \mu_{2}$.
7.7 Consider the case of $n=8$. Table 13.4 shows the function of the priority encoder. Note that this is an LPM function. Thus, the C-measure is at most $8+1=9$. In general, we can show that the C -measure of the priority encoder function of $n$ variables is $n+1$.
7.8 Table 13.5 shows the decomposition chart of the WS function, while Table 13.6 shows the decomposition chart of the threshold function. The column multiplicity is four.

Table 13.4 Priority encoder ( $n=8$ )

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{4}$ | $x_{5}$ | $x_{6}$ | $x_{7}$ | $x_{8}$ | output |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | - | - | - | - | - | - | - | 0001 |
| 0 | 1 | - | - | - | - | - | - | 0010 |
| 0 | 0 | 1 | - | - | - | - | - | 0011 |
| 0 | 0 | 0 | 1 | - | - | - | - | 0100 |
| 0 | 0 | 0 | 0 | 1 | - | - | - | 0101 |
| 0 | 0 | 0 | 0 | 0 | 1 | - | - | 0110 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | - | 0111 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1000 |

Table 13.5 A decomposition chart of a WS function

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{0}$ |
| 0 | 0 | 0 | 1 | 2 | 3 | 3 | 4 | 5 | 6 |  |
| 0 | 1 | 4 | 5 | 6 | 7 | 7 | 8 | 9 | 10 |  |
| 1 | 0 | 5 | 6 | 7 | 8 | 8 | 9 | 10 | 11 |  |
| 1 | 1 | 9 | 10 | 11 | 12 | 12 | 13 | 14 | 15 |  |
| $x_{4}$ | $x_{3}$ |  |  |  |  |  |  |  |  |  |

Table 13.6 A decomposition chart of a threshold function

|  |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | $x_{2}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | $x_{2}$ |
|  |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | $x_{0}$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |  |
| 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |  |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  |
| $x_{4}$ | $x_{3}$ |  |  |  |  |  |  |  |  |  |

7.9 Let the input variables $X=\left(x_{31}, x_{30}, \ldots, x_{1}, x_{0}\right)$ be partitioned into

$$
\begin{aligned}
& X_{1}=\left(x_{31}, x_{30}, \ldots, x_{25}, x_{24}\right) \\
& X_{2}=\left(x_{23}, x_{22}, \ldots, x_{17}, x_{16}\right) \\
& X_{3}=\left(x_{15}, x_{14}, \ldots, x_{9}, x_{8}\right), \text { and } \\
& X_{4}=\left(x_{7}, x_{6}, \ldots, x_{1}, x_{0}\right)
\end{aligned}
$$

Then,

$$
f(X)=\left(\sum_{i=0}^{31} 2^{i} x_{i}\right)(\bmod 17)
$$

is implemented as

$$
f(X)=\left(h_{1}\left(X_{1}\right)+h_{2}\left(X_{2}\right)+h_{3}\left(X_{3}\right)+h_{4}\left(X_{4}\right)\right)(\bmod 17)
$$

Figure 13.12 shows the circuit to compute the modulo. The circuit A1 computes

$$
\begin{aligned}
h_{1}\left(X_{1}\right) & =\left(\sum_{i=24}^{31} 2^{i} x_{i}\right)(\bmod 17) \\
& =\left(2^{24} \sum_{i=0}^{7} 2^{i} x_{i+24}\right)(\bmod 17) \\
& =\left(\sum_{i=0}^{7} 2^{i} x_{i+24}\right)(\bmod 17)
\end{aligned}
$$

Fig. 13.12 Mod 17 circuit with LUTs


The circuit A2 computes

$$
\begin{aligned}
h_{2}\left(X_{2}\right) & =\left(\sum_{i=16}^{23} 2^{i} x_{i}\right)(\bmod 17) \\
& =\left(2^{16} \sum_{i=0}^{7} 2^{i} x_{i+16}\right)(\bmod 17) \\
& =\left(\sum_{i=0}^{7} 2^{i} x_{i+16}\right)(\bmod 17) .
\end{aligned}
$$

The circuit A3 computes

$$
\begin{aligned}
h_{3}\left(X_{3}\right) & =\left(\sum_{i=8}^{15} 2^{i} x_{i}\right)(\bmod 17) \\
& =\left(2^{8} \sum_{i=0}^{7} 2^{i} x_{i+8}\right)(\bmod 17) \\
& =\left(\sum_{i=0}^{7} 2^{i} x_{i+8}\right)(\bmod 17) .
\end{aligned}
$$

The circuit A4 computes

$$
h_{4}\left(X_{4}\right)=\left(\sum_{i=0}^{7} 2^{i} x_{i}\right)(\bmod 17)
$$

The circuits A5, A6, and A7 compute

$$
\begin{aligned}
& \left(h_{1}+h_{2}\right)(\bmod 17), \\
& \left(h_{3}+h_{4}\right)(\bmod 17),
\end{aligned}
$$

and

$$
f(X)
$$

respectively. Note that in the above equations, we used the relations:

$$
2^{24} \equiv 2^{16} \equiv 2^{8} \equiv 1(\bmod 17)
$$

7.10 The proofs are easily derived from the relations:

$$
\begin{aligned}
& 2^{k} \equiv-1 \quad\left(\bmod 2^{k}+1\right) \\
& 2^{k} \equiv 1 \quad\left(\bmod 2^{k}-1\right) .
\end{aligned}
$$

7.11 Consider the decomposition table $\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{k}\right)$, and $X_{2}=\left(x_{k+1}, x_{k+2}, \ldots, x_{n}\right)$. Consider two columns: $\left(a_{1}, a_{2}, \ldots, a_{k}\right)$ and $\left(b_{1}, b_{2}, \ldots, b_{k}\right)$. If

$$
\sum_{i=1}^{k} a_{i} w_{i}=\sum_{i=1}^{k} b_{i} w_{i}
$$

then, two columns have the same patterns due to the definition of the threshold function. Thus, the column multiplicity is at most $1+\sum_{i=1}^{k} w_{i}$.

## Problems of Chapter 8

8.1 From Table 8.3, we can conjecture that $\lambda(8,20) \leq 16$, since $\lambda(8,23)=16$. Consider the case when the column multiplicity is maximum.
8 columns have weight 1,6 columns have weight 2, and all other columns have weight 0 .
Thus, the column multiplicity is $8+6+1=15$. In this way, we have $\lambda(8,20)=15$.
8.2 First, convert the SOP into a disjoint SOP. Then, obtain the sum of volumes for the cubes.
8.3 Let $g_{i}\left(y_{1}, y_{2}, \ldots, y_{n}\right)$, where $\left(i=0,1, \ldots, 2^{2^{n}}-1\right)$, be $2^{2^{n}}$ different functions of $n$ variables. Let the function $f\left(x_{1}, x_{2}, \ldots, x_{2^{n}}, y_{1}, y_{2}, \ldots, y_{n}\right)$ represent $g_{i}\left(y_{1}, y_{2}, \ldots, y_{n}\right)$, when $\left(x_{1}, x_{2}, \ldots, x_{2^{n}}\right)$ denotes the integer $i$, where $0 \leq i \leq 2^{2^{n}}-1$. Consider the decomposition chart of $f(X, Y)$, where $X=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{2^{n}}\right)$, and $Y=\left(y_{1}, y_{2}, \ldots, y_{n}\right)$. In this case, all the columns represent $2^{2^{n}}$ distinct functions $g_{i}$. Thus, the column multiplicity is $2^{2^{n}}$.

## 8.4

1. 5 LUTs (Fig. 4.9).
2. 4 LUTs (Fig. 5.3).
3. 21 LUTs (Fig. 4.12).
4. 15 LUTs. (Fig. 13.13).

Fig. 13.13 Realization of 10 -variable function with weight 47


Fig. 13.14 Cascade realization of index generation function with 32 inputs


Fig. 13.15 Multilevel realization of index generation function with 32 inputs

## Problems of Chapter 9

### 9.1 Cascade Realization.

Consider Lemma 5.1.5. Let $s$ be the number of cells in the cascade. Note that $n=$ $32, K=8, w=\left\lceil\log _{2}(k+1)\right\rceil=6$. We have:

$$
s \leq\left\lceil\frac{n-w}{K-w}\right\rceil=\frac{32-6}{8-6}=13 .
$$

Figure 13.14 shows the cascade with 13 cells. Minimum-delay Realization: Fig. 13.15 shows the multilevel realization.


Fig. 13.16 LUT Cascade Realization of the index generation function

Fig. 13.17 Index generator using (4, 2)-elements

9.2 Advantage: Memory is cheaper than PLA. Memory dissipates lower power than PLA.

Disadvantage: Single-memory realization is usually too large. Logic design using smaller memories is more complicated than with PLA's.
9.3 The LUT cascade is shown in Fig. 13.16. The total number of bits for the LUTs is

$$
2^{6} \times 5 \times 7=2240 .
$$

On the other hand, the total number of bits for the realization shown in Fig. 9.14 is

$$
2^{6} \times 5 \times 2+2^{10} \times 5=5760
$$

Thus, the realization shown in Fig. 9.14 is faster, but larger than the cascade realization.
9.4 Figure 13.17 shows the circuit.

## Problems of Chapter 10

10.1 Suppose that balls are thrown independently and uniformly into $v$ bins. Then, $\alpha=\frac{1}{v}$ is the probability of having a ball in a particular bin. Also, $\beta=1-\alpha$ is the probability of not having a ball in a particular bin. When, we throw $u$ balls,

1. The probability that a bin has no ball is $\beta^{u}$.
2. The probability that a bin has at least one ball is $1-\beta^{u}$.
3. The probability that a bin has exactly one ball is

$$
\binom{u}{1} \alpha \cdot \beta^{u-1}=u \cdot \alpha \cdot \beta^{u-1} .
$$

4. The probability that a bin bas more than one ball is

$$
\begin{aligned}
\eta & =\left(1-\beta^{u}\right)-u \cdot \alpha \cdot \beta^{u-1} \\
& =1-\beta^{u-1} \cdot(\beta+u \alpha)=1-\beta^{u-1} \cdot(1-\alpha+u \alpha) \\
& =1-\beta^{u-1} \cdot[1+(u-1) \alpha] \\
& =1-(1-\alpha)^{u-1} \cdot[1+(u-1) \alpha] \\
& \simeq 1-\mathrm{e}^{\alpha(u-1)} \cdot[1+(u-1) \alpha] .
\end{aligned}
$$

Here, we used the approximation $1-\alpha \simeq e^{-\alpha}$.
Since there are $v$ bins, the expected number of bins with more than one ball is $\eta v$. When $u=5000$ and $v=9973$, we have $\eta v=903$. When $u=2000$ and $v=9973$, we have $\eta v=175$.
10.2 Since the maximum number is $N=100,000, x$ can be represented with $n=17$ bits. Also, there exist $k_{1}=9592$ prime numbers between 1 and 100,000.

1. In $I G U_{1}$, the number of inputs for the main memory is $p_{1}=q_{1}=\left\lceil\log _{2} \times\right.$ $\left.\left(k_{1}+1\right)\right\rceil=14$. The number of the vectors realized by $I G U_{1}$ is $2^{p_{1}}\left(1-\mathrm{e}^{-\xi_{1}}\right)$, where $\xi_{1}=\frac{k_{1}}{2^{p_{1}}}$; that is $2^{14} \times 0.4431444=7260$. The number of the remaining vectors is $k_{2}=k_{1}-7260=2332$.
2. In $I G U_{2}$, since $q_{2}=\left\lceil\log _{2}(2332+1)\right\rceil=12$, we have $p_{2}=q_{2}=12$. The number of the vectors realized by $I G U_{2}$ is $2^{p_{2}}\left(1-\mathrm{e}^{-\xi_{2}}\right)$, where $\xi_{2}=\frac{k_{2}}{2^{p_{2}}}$; that is $4096 \times 0.4340989=1778$. The number of the remaining vectors is $k_{3}=$ $k_{2}-1778=554$.
3. In $I G U_{3}$, since $q_{3}=\left\lceil\log _{2}(554+1)\right\rceil=10$, we have $p_{3}=q_{3}=10$. The number of vectors realized by $I G U_{3}$ is $2^{p_{3}}\left(1-\mathrm{e}^{-\xi_{3}}\right)$, where $\xi_{3}=\frac{k_{3}}{2^{p_{3}}}$; that is $1024 \times 0.4178433=427$. The number of the remaining vectors is $k_{4}=$ $k_{3}-427=127$.
4. In $I G U_{4}$, since $q_{4}=\left\lceil\log _{2}(127+1)\right\rceil=7$, we have $p_{4}=q_{4}=7$. The number of vectors realized by $I G U_{4}$ is $2^{p_{4}}\left(1-\mathrm{e}^{-\xi_{4}}\right)$, where $\xi_{4}=\frac{k_{4}}{2^{p_{4}}}$; that is $128 \times$ $0.6292353=80$. The number of the remaining vectors is $k_{5}=k_{4}-80=47$.
5. In $I G U_{5}$, since $q_{5}=\left\lceil\log _{2}(47+1)\right\rceil=6$, we have $p_{5}=q_{5}=6$. The number of vectors realized by $I G U_{5}$ is $2^{p_{5}}\left(1-\mathrm{e}^{-\xi_{5}}\right)$, where $\xi_{5}=\frac{k_{5}}{2^{p_{5}}}$; that is $64 \times$ $0.5201948=33$. The number of the remaining vectors is $k_{6}=k_{5}-33=14$.
6. In $I G U_{6}$, since the number of the remaining vectors is only $k_{6}=14$, they can be implemented by an IGU [132], or rewritable PLA or an LUT cascade.

Memory for the IGUs is distributed as follows:

$$
\begin{aligned}
& I G U_{1}: 17 \times 2^{14}=272 \times 2^{10} \\
& I G U_{2}: 17 \times 2^{12}=68 \times 2^{10} \\
& I G U_{3}: 17 \times 2^{10} . \\
& I G U_{4}: 17 \times 2^{7}=2.125 \times 2^{10} \\
& I G U_{5}: 17 \times 2^{6}=1.0625 \times 2^{10}
\end{aligned}
$$

The total amount of memory for the parallel sieve method is

$$
\sum_{i=1}^{5} n 2^{n_{i}}=(272+68+17+2.125+1.0625) \times 2^{10} \simeq 360 \times 2^{10}
$$

bits or 360 Kibits. The single-memory realization requires

$$
14 \times 2^{17}=1.75 \times 2^{20}
$$

bits or 1.75 Mibits.
10.3 In this case, $n=16$ and $k_{1}=3^{8}=6561$. We assume that the probabilities of appearing 0 's and 1 's are made to equal by a hash circuit.

1. In $I G U_{1}$, the number of inputs for the main memory is $p_{1}=q_{1}=\left\lceil\log _{2} \times\right.$ $\left.\left(k_{1}+1\right)\right\rceil=13$. The number of the vectors realized by $I G U_{1}$ is $2^{p_{1}}\left(1-\mathrm{e}^{-\xi_{1}}\right)$, where $\xi_{1}=\frac{k_{1}}{2^{p_{1}}}$; that is $2^{13} \times 0.5510768=4514$. The number of the remaining vectors is $k_{2}=k_{1}-4514=2047$.
2. In $I G U_{2}$, since $q_{2}=\left\lceil\log _{2}(2047+1)\right\rceil=11$, we have $p_{2}=q_{2}=11$. The number of the vectors realized by $I G U_{2}$ is $2^{p_{2}}\left(1-\mathrm{e}^{-\xi_{2}}\right)$, where $\xi_{2}=\frac{k_{2}}{2^{p_{2}}}$; that is $2048 \times 0.6319409=1294$. The number of the remaining vectors is $k_{3}=$ $k_{2}-1294=753$.
3. In $I G U_{3}$, since $q_{3}=\left\lceil\log _{2}(753+1)\right\rceil=10$, we have $p_{3}=q_{3}=10$. The number of vectors realized by $I G U_{3}$ is $2^{p_{3}}\left(1-\mathrm{e}^{-\xi_{3}}\right)$, where $\xi_{3}=\frac{k_{3}}{2^{p_{3}}}$; that is $1024 \times 0.5206631=533$. The number of the remaining vectors is $k_{4}=$ $k_{3}-533=220$.
4. In $I G U_{4}$, since $q_{4}=\left\lceil\log _{2}(220+1)\right\rceil=8$, we have $p_{4}=q_{4}=8$. The number of vectors realized by $I G U_{4}$ is $2^{p_{4}}\left(1-\mathrm{e}^{-\xi_{4}}\right)$, where $\xi_{4}=\frac{k_{4}}{2^{p_{4}}}$; that is $256 \times$ $0.5765734=147$. The number of the remaining vectors is $k_{5}=k_{4}-147=73$.
5. In $I G U_{5}$, since $q_{5}=\left\lceil\log _{2}(73+1)\right\rceil=7$, we have $p_{5}=q_{5}=7$. The number of vectors realized by $I G U_{5}$ is $2^{p_{5}}\left(1-\mathrm{e}^{-\xi_{5}}\right)$, where $\xi_{5}=\frac{k_{5}}{2^{p_{5}}}$; that is $128 \times$ $0.4346513=55$. The number of the remaining vectors is $k_{6}=k_{5}-55=18$.
6. In $I G U_{6}$, since the number of the remaining vectors is only $k_{6}=18$, they can be implemented by an IGU [132], or rewritable PLA or an LUT cascade.

Memory for the IGUs is distributed as follows:

$$
\begin{aligned}
& I G U_{1}: 16 \times 2^{13}=128 \times 2^{10} \\
& I G U_{2}: 16 \times 2^{11}=32 \times 2^{10} \\
& I G U_{3}: 16 \times 2^{10} \\
& I G U_{4}: 16 \times 2^{8}=4 \times 2^{10} \\
& I G U_{5}: 16 \times 2^{7}=2 \times 2^{10} .
\end{aligned}
$$

The total amount of memory for the parallel sieve method is

$$
\sum_{i=1}^{5} n 2^{n_{i}}=(128+32+16+4+2) \times 2^{10}=182 \times 2^{10}
$$

bits or 182 Kibits. The single-memory realization requires

$$
13 \times 2^{16}=0.8125 \times 2^{20}
$$

bits or 0.8125 Mibits.
10.4 Suppose a linear transformation changes an input vector $\vec{a}$ into another vector $\vec{b}$. Let $A$ be the transformation matrix. Then, we have the relation:

$$
\vec{a} A=\vec{b},
$$

where $\vec{a}, \vec{b} \in B^{n}, A$ is a $0-1$ matrix, and the addition is $\bmod 2$ sum. Assume that $A$ is nonsingular. Then, the mapping is 1 -to- 1 . This means that the minterms are permuted by the transform. Thus, the number of 1 's in the original function is the same as in the transformed one.
10.5 When $k=2^{p}, \xi=1$, and $\alpha=\frac{2^{p}}{2^{n}}=\frac{1}{2^{n-p}}$. Let $z=\frac{1}{\alpha}$. Then, we have $z=2^{n-p}$ and $\beta^{2^{n-p}}=\left(1-\frac{1}{z}\right)^{z}$. Note that $e^{-1} \simeq 0.3678794$. Table 13.7 shows the approximation error.
10.6 The function $f$ can be represented by $g\left(y_{1}, y_{2}, y_{3}, y_{4}\right)$ as shown in Table 13.8. In this case, we can assume that only one variable among the inputs $x_{1}, x_{2}, \ldots, x_{15}$ takes value 1 , and other variables take value 0 . Since the function is represented with $p=\left\lceil\log _{2}(k+1)\right\rceil=4$ variables, it is an optimal linear transformation.

Table 13.7 Approximation error for $\beta^{2^{n-p}}$

| $\alpha$ | $\beta^{2^{n-p}}$ | Error |
| :--- | :--- | :--- |
| $2^{-1}$ | 0.25000000 | 0.11787945 |
| $2^{-2}$ | 0.31640625 | 0.05147320 |
| $2^{-3}$ | 0.34360892 | 0.02427053 |
| $2^{-4}$ | 0.35607412 | 0.01180532 |
| $2^{-5}$ | 0.36205530 | 0.00582416 |
| $2^{-6}$ | 0.36498654 | 0.00289293 |
| $2^{-7}$ | 0.36643770 | 0.00144173 |
| $2^{-8}$ | 0.36715975 | 0.00071970 |
| $2^{-9}$ | 0.36751989 | 0.00035956 |
| $2^{-10}$ | 0.36769974 | 0.00017971 |
| $2^{-11}$ | 0.36778960 | 0.00008984 |
| $2^{-12}$ | 0.36783454 | 0.00004492 |
| $2^{-13}$ | 0.36785698 | 0.00002246 |
| $2^{-14}$ | 0.36786821 | 0.00001124 |
| $2^{-15}$ | 0.36787382 | 0.00000562 |

Table 13.8 Transformed
1-out-of-15 to binary converter

| Transformed code |  |  |  |  |
| :--- | :--- | :--- | :--- | :---: |
| $y_{4}$ | $y_{3}$ | $y_{2}$ | $y_{1}$ | Index |
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 2 |
| 0 | 0 | 1 | 1 | 3 |
| 0 | 1 | 0 | 0 | 4 |
| 0 | 1 | 0 | 1 | 5 |
| 0 | 1 | 1 | 0 | 6 |
| 0 | 1 | 1 | 1 | 7 |
| 1 | 0 | 0 | 0 | 8 |
| 1 | 0 | 0 | 1 | 9 |
| 1 | 0 | 1 | 0 | 10 |
| 1 | 0 | 1 | 1 | 11 |
| 1 | 1 | 0 | 0 | 12 |
| 1 | 1 | 0 | 1 | 13 |
| 1 | 1 | 1 | 0 | 14 |
| 1 | 1 | 1 | 1 | 15 |

## Problems of Chapter 11

11.1 To solve manually, write a Karnaugh map of the function. Only one variable $x_{5}$, can be removed.
11.2 From ( $\vec{v}_{1}, \vec{v}_{4}$ ), we have $\bar{c} \bar{d} \bar{g}$.

From $\left(\vec{v}_{2}, \vec{v}_{4}\right)$, we have $\bar{b} \bar{d}$.
From $\left(\vec{v}_{3}, \vec{v}_{4}\right)$, we have $\bar{c} \bar{d} \bar{f}$.
From $\left(\vec{v}_{1}, \vec{v}_{5}\right)$, we have $\bar{b} \bar{c} \bar{e} \bar{f}$.
From $\left(\vec{v}_{2}, \vec{v}_{5}\right)$, we have $\bar{e} \bar{f} \bar{g}$.

From $\left(\vec{v}_{3}, \vec{v}_{5}\right)$, we have $\bar{b} \bar{c} \bar{e} \bar{g}$.
From $\left(\vec{v}_{1}, \vec{v}_{6}\right)$, we have $\bar{a} \bar{c} \bar{d} \bar{f}$.
From $\left(\vec{v}_{2}, \vec{v}_{6}\right)$, we have $\bar{a} \bar{b} \bar{d} \bar{f} \bar{g}$.
From $\left(\vec{v}_{3}, \vec{v}_{6}\right)$, we have $\bar{a} \bar{c} \bar{d} \bar{g}$.

$$
\begin{aligned}
\bar{R} & =\bar{c} \bar{d} \bar{g} \vee \bar{b} \bar{d} \vee \bar{c} \bar{d} \bar{f} \vee \bar{b} \bar{c} \bar{e} \bar{f} \vee \bar{e} \bar{f} \bar{g} \vee \bar{b} \bar{c} \bar{e} \bar{g} \vee \bar{a} \bar{c} \bar{d} \bar{f} \vee \bar{a} \bar{b} \bar{d} \bar{f} \bar{g} \vee \bar{a} \bar{c} \bar{d} \bar{g} \\
& =\bar{c} \bar{d} \bar{g} \vee \bar{b} \bar{d} \vee \bar{c} \bar{d} \bar{f} \vee \bar{b} \bar{c} \bar{e} \bar{f} \vee \bar{e} \bar{f} \bar{g} \vee \bar{b} \bar{c} \bar{e} \bar{g} \\
R & =(c \vee d \vee g)(b \vee d)(c \vee d \vee f)(b \vee c \vee e \vee f)(e \vee f \vee g)(b \vee c \vee e \vee g) \\
& =(c \vee d \vee f g)(b \vee c \vee e \vee f g)(b \vee d)(e \vee f \vee g) \\
& =[f g \vee c \vee d(b \vee e)](b \vee d)(e \vee f \vee g) \\
& =(f g \vee c)(b \vee d)(e \vee f \vee g) \vee d(b \vee e)(b \vee d)(e \vee f \vee g) \\
& =(f g \vee c)(b \vee d)(e \vee f \vee g) \vee d[e \vee b(f \vee g)]
\end{aligned}
$$

The product with the minimum literals is $d e$.
The function can be represented by only two variables, $d$ and $e$.

## 11.3

1. Similar to the proof of Theorem 11.5.1, we have

$$
\begin{aligned}
P R & =\frac{256}{256} \times \frac{255}{256} \times \frac{254}{256} \times \frac{253}{256} \times \frac{252}{256} \times \frac{251}{256} \times \frac{250}{256} \\
& =\prod_{i=0}^{6}\left(1-\frac{i}{256}\right) \\
& \simeq e^{-\sum_{i=1}^{6} \frac{i}{256}}=e^{-\frac{21}{256}}=0.9212
\end{aligned}
$$

2. In a similar way, we have

$$
\begin{aligned}
P R & =\frac{128}{128} \times \frac{127}{128} \times \frac{126}{128} \times \frac{125}{128} \times \frac{124}{128} \times \frac{123}{128} \times \frac{122}{128} \\
& =\prod_{i=0}^{6}\left(1-\frac{i}{128}\right) \\
& \simeq e^{-\sum_{i=1}^{6} \frac{i}{128}}=e^{-\frac{21}{128}}=0.8487 .
\end{aligned}
$$

11.4 From the proof of Theorem 11.5.1, we have

$$
\eta(k)=\prod_{i=0}^{k-1}\left(1-\frac{i}{2^{p}}\right)
$$

where $k=31$ and $p=9$.

$$
\eta(k)=\prod_{i=0}^{30}\left(1-\frac{i}{512}\right) \simeq \prod_{i=0}^{30} e^{-\frac{i}{512}}=e^{-\sum_{i=1}^{30} \frac{i}{512}}=e^{-\frac{465}{512}}=0.40
$$

There are 10 different partitions. So, the probability that, in all 10 partitions, at least one column has two or more nonzero elements is $(1-\eta(k))^{10}$. Thus, the probability that at least one variable is redundant is

$$
1-(1-\eta(k))^{10}=0.994
$$

11.5 First, obtain the probability that the function $f(X)$ does not depend on $x_{1}$. Consider the decomposition $\left(X_{1}, X_{2}\right)$, where $X_{1}=\left(x_{2}, x_{3}, \ldots, x_{n}\right)$, and $X_{2}=\left(x_{1}\right)$.

1. The probability that $f$ takes a specified value is $\alpha=\frac{k}{2^{n}}$.
2. The probability that a column has at least one don't care is $1-\alpha^{2}$.
3. The probability that $f$ does not depend on $x_{1}$ is

$$
\gamma=\left(1-\alpha^{2}\right)^{2^{n-1}}
$$

4. The probability that $f$ depends on $x_{1}$ is $1-\gamma$.
5. The probability that $f$ depends on all the variables is $(1-\gamma)^{n}$.

When $n=2 r$ and $k=2^{n}$, we have $\alpha=\frac{2^{r}}{2^{n}}=2^{-r}$. Thus, $\gamma=\left(1-2^{-n}\right)^{2^{n-1}}$.
Since $2^{-n}$ is sufficiently small, $1-2^{-n}$ can be approximated by $e^{-2^{-n}}$. Thus,

$$
\gamma \simeq e^{-2^{-n} \cdot 2^{n-1}}=e^{-\frac{1}{2}} \simeq 0.6065
$$

Thus, the probability that all the variables are essential is

$$
(1-\gamma)^{n} \simeq 0.393^{n}
$$

11.6 Consider the 4 -variable function $f(X)$ such that

$$
\begin{aligned}
f(0,1,1,1) & =1 \\
f(1,0,1,1) & =1 \\
f(1,1,0,1) & =1 \\
f(1,1,1,0) & =1 \\
f(0,1,0,1) & =0 \\
f(1,0,0,1) & =0 \\
f(1,0,1,0) & =0 \\
f(0,1,1,0) & =0 \\
f\left(a_{1}, a_{2}, a_{3}, a_{4}\right) & =d, \quad \text { (for other combinations). }
\end{aligned}
$$

In this case, all the variables are essential.
11.7 Two variables are sufficient to distinguish all the patterns: $\left(x_{1}, x_{7}\right),\left(x_{2}, x_{7}\right)$, $\left(x_{4}, x_{6}\right),\left(x_{4}, x_{7}\right),\left(x_{5}, x_{6}\right)$, or $\left(x_{6}, x_{7}\right)$.
11.8 Let $\left(X_{1}, X_{2}\right)$ be a partition of the input variables $X$, where $X_{1}=\left(x_{1}\right.$, $\left.x_{2}, \ldots, x_{p}\right)$ and $X_{2}=\left(x_{p+1}, x_{p+2}, \ldots, x_{n}\right)$. Consider the decomposition chart for $f\left(X_{1}, X_{2}\right)$, where $X_{1}$ labels the column variables and $X_{2}$ labels the row variables. If each column has at most one care element, then $f$ can be represented by using only $X_{1}$. Such a condition is true if one of the following is satisfied:

1. A column has only don't cares. The probability of this condition is $\beta^{2^{n-p}}$.
2. A column has only one care element, and the other $2^{n-p}-1$ elements have don't cares. The probability of this condition is $2^{n-p} \alpha \beta^{2^{n-p}-1}$.
Thus, we have $\gamma_{n-p}=\beta^{2^{n-p}}+2^{n-p} \alpha \beta^{2^{n-p}-1}$. Since all the $M=2^{p}$ columns must satisfy this condition, we have $\delta_{n-p}=\gamma_{n-p}^{M}$.

## 11.9

- When $p=17, P R=0.0011$.
- When $p=18, P R=0.3817$.
- When $p=19, P R=0.9461$.
11.10 The probability that the birthdays are all different is the same as the probability that each column of the decomposition chart has at most one care element (See the proof of Theorem 11.5.1). Thus,

$$
\eta(k)=\prod_{i=0}^{k-1}\left(1-\frac{i}{N}\right)=\prod_{i=1}^{k-1}\left(1-\frac{i}{N}\right)
$$

where $k=64$ and $N=365$.

$$
\eta(k) \simeq 2.81 \times 10^{-3} .
$$

Consider the distribution of $u$ distinct balls into $v$ distinct bins, where $u=64$ and $v=365$. Assume that any distribution is as likely as any other. We can use the same argument as in the previous exercise. The probability that a bin has at least one ball is $1-\beta^{u}$, where $\beta=1-\alpha$ and $\alpha=\frac{1}{v}$. Note that

$$
1-\beta^{u}=1-(1-\alpha)^{u} \simeq 1-\mathrm{e}^{-\alpha u}
$$

The expected number of bins with at least one ball is

$$
\left(1-\beta^{u}\right) v \simeq\left(1-\mathrm{e}^{-\alpha u}\right) v
$$

By setting $u=64$ and $v=365$, we have 58.7, as the expected number of distinct birthdays.
11.11 We can use the same argument as the problem 11.10. The probability that a bin has at least one ball is $1-\beta^{u}$, where $\beta=1-\alpha$ and $\alpha=\frac{1}{v}$. The expected number of bins with at least one ball is

$$
\left(1-\beta^{u}\right) \cdot v \simeq\left(1-\mathrm{e}^{-\alpha u}\right) v
$$

By setting $u=365$ and $v=365$, we have

$$
\left(1-\mathrm{e}^{-\alpha u}\right) v=\left(1-e^{-1}\right) \times 365=0.632 \times 365=230.724
$$

as the expected number of distinct birthdays.

## Problems of Chapter 12

12.1 Figure 13.18 shows the cascade realization. The amount of memory for the cascade is

$$
64 \times 4 \times 3+32 \times 4=896
$$

Note that the method shown in Example 12.3.1 requires 336 bits.
12.2 Figure 13.19 shows the LUT cascade emulator. The counter has two bits.

1. When the value of the counter is 00 , set the MSBs to 00 . Also, MUX $A$ selects the variables $x_{1}, x_{2}, x_{3}$, and $x_{4}$. Also, MUX $B$ selects $x_{5}$ and $x_{6}$. In this case, Page 0 is used to emulate $\mathrm{Cell}_{0}$.
2. When the value of the counter is 01 , set the MSBs to 01 . Also, MUX $A$ selects the outputs from the memory, which corresponds to the rail outputs. Also, MUX $B$ selects $x_{7}$ and $x_{8}$. In this case, Page 1 is used to emulate Cell $_{1}$.
3. When the value of the counter is 10 , set the MSBs to 10 . Also, MUX $A$ selects the outputs from the memory, which corresponds to the rail outputs. Also, MUX $B$ selects $x_{7}$ and $x_{8}$. In this case, Page 2 is used to emulate Cell $_{2}$.
4. When the value of the counter is 11 , set the MSBs to 11 . Also, MUX $A$ selects the outputs from the memory, which corresponds to the rail outputs. Also, MUX $B$ selects $x_{11}$. In this case, Page 3 is used to emulate $\mathrm{Cell}_{3}$. Since this is the last cell, the output terminal shows the function value.

Fig. 13.18 Cascade
realization of index generator



Fig. 13.19 LUT cascade emulator
Fig. 13.20 Tree-type realization of 2-out-of-12 to binary converter

12.3 Advantage: Since the sizes of the IGUs are the same, the design of the circuit is easy. Also, the delays of the IGUs are the same. Addition of new registered vectors is easier.

Disadvantage: The approach requires more memory than the method presented in Chap. 10.
12.4 The profile of the 2-out-of-12 to binary converter is

$$
(2,4,8,12,17,23,30,38,47,57,67,67)
$$

In this case, the LUT cascade realization is not attractive. To realize a tree-type circuit, we partition the inputs into $X_{1}=\left(x_{1}, x_{2}, \ldots, x_{6}\right)$ and $X_{2}=\left(x_{7}, x_{8}, \ldots, x_{12}\right)$. The column multiplicity with the decomposition with respect to ( $X_{1}, X_{2}$ ) and $\left(X_{2}, X_{1}\right)$ are the same and both equal to 23 . Thus, it can be realized by the treetype circuit shown in Fig. 13.20. By using a linear transformation, the number of variables can be reduced to 7 , which is the optimum.

## Bibliography

> 1. E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density, " Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays, pp.3-12, Feb. 10-11, 2000, Monterey, California.
2. E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, March 2004, pp.288-298.
3. C. Akrout, et al., "Reprogrammable logic fuse based on a 6-device SRAM cell for logic arrays," US Patent 5063537.
4. http://www.altera.com
5. Altera, "Implementing high-speed search applications with Altera CAM," Application Note 119, Altera Corporation, July 2001.
6. Altera, "Stratix IV FPGA Core Fabric Architecture," http://www.altera.com/
7. E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, March 2004, pp.288-298.
8. R. L. Ashenhurst, "The decomposition of switching functions," International Symposium on the Theory of Switching, pp. 74-116, April 1957.
9. Berkeley Logic Synthesis and Verification Group, ABC: A System for Sequential Synthesis and Verification, Release 70911. http://www.eecs.berkeley.edu/ alanmi/abc/
10. V. Bertacco and M. Damiani, "The disjunctive decomposition of logic functions,"ICCAD-97, pp. 78-82, Nov. 1997.
11. T. Bengtsson, A. Martinelli, and E. Dubrova, "A BDD-based fast heuristic algorithm for disjoint decomposition," ASPDAC 2003, pp.191-196.
12. R.E. Bryant,"Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput. Vol. C-35, No. 8, pp.677-691, Aug. 1986.
13. R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. L. Sangiovanni-Vincentelli, Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, Boston, 1984.
14. F. M. Brown, Boolean Reasoning: The logic of Boolean Equations, Kluwer Academic Publishers, Boston, 1990.
15. S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, Field Programmable Gate Arrays, Kluwer Academic Publishers, Boston (1992).
16. S. Brown, "FPGA architectural research: A Survey," IEEE Design \& Test of Computers, vol. 13, no. 4, 1996, pp. 9-15.
17. S-C. Chang, M. Marek-Sadowska, and T. Hwang, "Technology mapping for LUT FPGA's based on decomposition of binary decision diagrams," IEEE Trans. on CAD, Vol. CAD-15, No. , pp. 1226-1236, Oct. 1996.
18. L. Chisvin and R. J. Duckworth, "Content-addressable and associative memory: Alternatives to the ubiquitous RAM, " IEEE Computer, Vol. 22, pp. 51-64, July 1989.
19. G. R. Chiu, D. P. Singh, V. Manohararajah, and S. D. Brown, "Mapping arbitrary logic functions into synchronous embedded memories for area reduction on FPGAs", Proceedings of the International Conference on Computer-aided design, (ICCAD-2006), Nov. 2006, San Jose, California, pp.135-142.
20. J. Cong and Y. Ding, "Combinational logic synthesis for LUT based field programmable gate arrays," ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol.1, No.2, pp.145-204, April 1996.
21. J. Cong and Y.-Y Hwang, "Partially-dependent functional decomposition with applications in FPGA synthesis and mapping," Fifth Int. Symp. on Field-Programmable Gate Arrays, pp. 35-42, Feb. 1997.
22. J. Cong and S. Xu, "Technology mapping for FPGAs with embedded memory blocks,"Feb. 1998, pp.179-188.
23. J. Cong and K. Yan, "Synthesis for FPGAs with embedded memory blocks", In Proc. of the 2000 ACM/SIGDA 8th International Symposium on Field Programmable Gate Arrays, pp. 75-82, ACM Press NY, 2000, Monterey, California.
24. J. Cong and K. Minkovich, "Optimality study of logic synthesis for LUT-Based FPGAs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 26, Issue 2, Feb. 2007, pp.230-239.
25. H. A. Curtis, A New Approach to the Design of Switching Circuits, D. Van Nostrand Co., Princeton, NJ, 1962.
26. M. Davio, J.-P Deschamps, and A. Thayse, Digital Systems with Algorithm Implementation, John Wiley \& Sons, New York, 1983, p. 368.
27. D. L. Dietmeyer, Logic Design of Digital Systems (Second Edition), Allyn and Bacon Inc., Boston, 1978.
28. C. H. Divine, "Memory patching circuit with increased capability," US Patent 4028679.
29. C. H. Divine and J. C. Moran, "Memory patching circuit with repatching capability," US Patent 4028684.
30. S. Dharmapurikar, P. Krishnamurthy, and D. E. Taylor, "Longest prefix matching using Bloom filters," ACM SIGCOMM'03, August 25-29, 2003, Karlsruhe, Germany.
31. J. Ditmar, K. Torkelsson, and A. Jantsch, "A reconfigurable FPGA-based content addressable memory for internet protocol characterization," Proc. FPL2000, LNCS 1896, Springer, 2000, pp. 19-28.
32. M. Fujita and Y. Matsunaga, "Multi-level logic minimization based on minimal support and its application to the minimization of look-up table type FPGAs," ICCAD-91, pp. 560-563, 1991.
33. R. J. Francis, J. Roze, and Z. Vranesic, "Chortle-crf: Fast technology mapping for lookup table-based FPGAs," DAC-1991, pp. 227-233, June 1991.
34. D. Green, Modern Logic Design, Addison-Wesley Publishing Company,1986.
35. H. Gouji, T. Sasao, and M. Matsuura,"On a method to reduce the number of LUTs in LUT cascades," Technical Report of IEICE, VLD2001-99, Nov. 2001.
36. S. Guccione, D. Levi, and D. Downs,"A reconfigurable content addressable memory," Lecture Notes in Computer Science, Springer-Verlag, Berlin, Volume 1800,May 2000. Parallel and Distributed Processing, p. 882.
37. P. Gupta, S. Lin, and N. McKeown, "Routing lookups in hardware at memory access speeds, " Proc. INFOCOM, IEEE Press, Piscataway, N.J., 1998, pp. 1240-1247.
38. C. Halatsis and N. Gaitanis, "Irredundant normal forms and minimal dependence sets of a Boolean functions," IEEE Trans. on Computers, Vol. C-27, No. 11, pp. 1064-1068, Nov. 1978.
39. S. Hassoun and T. Sasao (eds.), Logic Synthesis and Verification, Kluwer Academic Publishers, Oct. 2001.
40. J-D. Huang, J-Y. Jou, and W-Z. Shen, "Compatible class encoding in Roth-Karp decomposition for two-output LUT architecture," ICCAD 1995, Nov. 1995, pp.359-363.
41. Ting-Ting Hwang, R. M. Owens, M. J. Irwin, and Kuo Hua Wang,"Logic synthesis for fieldprogrammable gate arrays,"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.,Vol. 13, No. 10, pp. 1280-1287, Oct. 1994.
42. Y. Iguchi, T. Sasao, and M. Matsuura, "Realization of multiple-output functions by reconfigurable cascades," International Conference on Computer Design: VLSI in Computers \& Processors (ICCD-2001), Austin, TX, Sept. 23-26, 2001. pp. 388-393.
43. Y. Iguchi, T. Sasao, and M. Matsuura, "On designs of radix converters using arithmetic decompositions," ISMVL-2006, Singapore, May 17-20, 2006.
44. J.-H. R. Jiang, J.-Y. Jou, and J.-D. Huang, "Compatible class encoding in hyper-function decomposition for FPGA synthesis," Design Automation Conference, pp. 712-717, June 1998.
45. T. Kam, T. Villa, R. Brayton, and A. Sangiovanni, " Multi-valued decision diagrams: theory and applications," Journal of Multiple-Valued Logic, Vol. 4, No.1-2, 1998, pp. 9-62.
46. Y. Kambayashi, "Logic design of programmable logic arrays," IEEE Trans. on Computers, Vol. C-28, No. 9, pp. 609-617, Sept. 1979.
47. K. Keutzer, " DAGON: technology binding and local optimization by DAG matching," 24th ACM/IEEE Design Automation Conference June 28-July 01, 1987.pp.341-347.
48. S. P. Khatri, R. K. Brayton, A. Sangiovanni-Vincentelli, "Cross-talk immune VLSI design using a network of PLAs embedded in a regular layout fabric," Proceedings of the 2000 IEEE/ACM International Conference on Computer-Aided Design, Nov. 05-09, 2000, San Jose, California.
49. T. Kohonen, Content-Addressable Memories, Springer Series in Information Sciences, Vol. 1, Springer Berlin Heidelberg 1987.
50. Y. Komamiya, Theory of Computing Networks, Researches of ETL, Sept. 1959.
51. V. Kravets and K. Sakallah, "Constructive library-aware synthesis using symmetries," Proc. DATE-2000, pp. 208-213.
52. S. Krishnamoorthy and R. Tessier, "Technology mapping algorithms for hybrid FPGAs containing lookup tables and PLAs", In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 22, No. 5, 2003, pp. 545-559.
53. A. M. Kumar, J. Bobba, and V. Kamakoti, "MemMap: Technology mapping algorithm for area reduction in FPGAs with embedded memory arrays using reconvergence analysis," Design, Automation, and Test in Europe - DATE-2004, March 2004, pp.922-929.
54. Y-T. Lai, M. Pedram and S. B. K. Vrudhula, "BDD based decomposition of logic functions with application to FPGA synthesis", 30th ACM/IEEE Design Automation Conference, June 1993.
55. Y-T. Lai, M. Pedram, and S. B. K. Vrudhula, "EVBDD-based algorithm for integer linear programming, spectral transformation, and functional decomposition," IEEE Trans. CAD, Vol. 13, No. 8, pp. 959-975, Aug. 1994.
56. P-F. Lin and J. B. Kuo, "A 1-V 128-kb four-way set-associative CMOS cache memory using wordline-oriented tag-compare (WLOTC) structure with the content-addressable-memory (CAM) 10-transistor tag cell," IEEE Journal of Solid-State Circuits, Vol. 36, pp. 666-675, April 2001.
57. A. Ling, D. P. Singh, and S. D. Brown, "FPGA technology mapping:a study of optimality," 42nd Design Automation Conference, 13-17 June 2005, pp. 427-432.
58. C. Legl, B. Wurth, and K. EcK1, "Computing support-minimal subfunctions during functional decomposition," IEEE Trans. VLSI, Vol. 6, No. 3, pp. 354-363, Sept. 1998.
59. A. Martinelli, R. Krenz, and E. Dubrova, "Disjoint-support boolean decomposition combining functional and structural methods," Proceedings of the Asia and South Pacific Design Automation Conference, 2004 (ASP-DAC 2004), 27-30, Jan. 2004. pp.597-599.
60. Y. Matsunaga, "An exact and efficient algorithm for disjunctive decomposition," SASIMI'98, pp. 44-50, Oct. 1998.
61. H. J. Mattausch, T. Gyohten, Y. Soda, T. Koide, "Compact associative-memory architecture with fully-parallel search capability for the minimum Hamming distance," IEEE Journal of Solid-State Circuits, Vol. 37, No. 2, pp. 218-227, Feb. 2002.
62. K. McLaughlin, N. O'Connor, and S. Sezer, "Exploring CAM design for network processing using FPGA technology," Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT-ICIW'06), p. 84, 19-25, Feb. 2006.
63. C. Meinel and T. Theobald, Algorithms and Data Structures in VLSI Design: OBDD - Foundations and Applications, Springer, 1998.
64. S. Minato, N. Ishiura, and S. Yajima, "Shared binary decision diagram with attributed edges for efficient Boolean function manipulation," Proc. 27th ACM/IEEE Design Automation Conf., pp. 52-57, June 1990.
65. S. Minato, "Graph-based representations of discrete functions," in T. Sasao and M. Fujita (e.d.), Representations of Discrete Functions, Kluwer Academic Publishers, 1996.
66. S. Minato and G. De Micheli, "Finding all simple disjunctive decompositions using irredundant sum-of-products forms," ICCAD-99, pp. 111-117, Nov. 1998.
67. A. Mishchenko, C. Files, M. Perkowski, B. Steinbach, and Ch. Dorotska, "Implicit algorithms for multi-valued input support manipulation," Proc. 4th Intl. Workshop on Boolean Problems, Sept. 2000, Freiberg, Germany.
68. A. Mishchenko and T. Sasao, "Logic synthesis of LUT cascades with limited rails: A direct implementation of multi-output functions,", IEICE Technical Reports, VLD2002-99, Nov. 2002.
69. A. Mishchenko and T. Sasao, "Encoding of Boolean functions and its application to LUT cascade synthesis, " International Workshop on Logic and Synthesis (IWLS2002), New Orleans, Louisiana, June 4-7, 2002, pp.115-120.
70. A. Mishchenko, "Fast computation of symmetries in Boolean functions," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 22, Issue 11, pp. 15881593, Nov. 2003.
71. A. Mishchenko, S. Chatterjee, and R. Brayton, "Improvements to technology mapping for LUT-based FPGAs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 26, Issue 2,pp. 250-253, Feb. 2007.
72. A. Mishchenko, S. Cho, S. Chatterjee, and R. K. Brayton, "Combinational and sequential mapping with priority cuts," Proc. ICCAD'07, Nov. 2007, pp.354-361.
73. A. Mishchenko, R. K. Brayton, and S. Chatterjee, "Boolean factoring and decomposition of logic networks", Proc. ICCAD'08, Nov. 2008, pp.38-45.
74. J. C. Moran, "Memory patching circuit," US Patent 4028678.
75. M. Motomura et al., "A 1.2-million-transistor, $33-\mathrm{MHz}, 20-\mathrm{b}$ dictionary search processor (DISP) ULSI with a 160-Kbyte CAM, " IEEE J. Solid-State Circuits, Vol. 25, No. 5, Oct. 1990, p. 1158-1165.
76. V. Manohararajah, S. D. Brown, and Z. G. Vranesic, "Heuristics for area minimization in LUT-based FPGA technology mapping", IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 25, No. 11, November 2006, pp. 2331-2340.
77. R. Murgai and R. K. Brayton, "Optimum functional decomposition using encoding," Design Automation Conference. DAC 1994, pp.408-414, June 1994.
78. R. Murgai, R. Brayton, and A. Sangiovanni Vincentelli, Logic Synthesis for FieldProgrammable Gate Arrays, Springer, July 1995.
79. R. Murgai, F. Hirose, and M. Fujita, "Logic synthesis for a single large look-up table," Proc. International Conference on Computer Design,pp. 415-424, Oct. 1995.
80. S. Muroga, VLSI System Design, John Wiley \& Sons, 1982, pp. 293-306.
81. S. Nagayama, T. Sasao, and J. T. Butler, "Programmable numerical function generators based on quadratic approximation: Architecture and synthesis method," ASPDAC 2006, Yokohama Jan. 2006, pp. 378-383.
82. S. Nagayama, T. Sasao, and J. T. Butler "Compact numerical function generators based on quadratic approximation: architecture and synthesis method," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E89-A, No.12, Dec. 2006, pp.3510-3518, Special Section of VLSI Design and CAD Algorithms.
83. H. Nakahara, T. Sasao, and M. Matsuura, "A design algorithm for sequential circuits using LUT rings," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E88-A, No.12, Dec. 2005, pp.3342-3350.
84. H. Nakahara and T. Sasao, "A PC-based logic simulator using a look-up table cascade emulator," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E89-A, No.12, Dec. 2006, pp. 3471-3481.
85. H. Nakahara, T. Sasao, M. Matsuura, and Y. Kawamura, "A Parallel sieve method for a virus scanning engine," 12th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools, Patras, Greece (DSD 2009).
86. K. Nakamura, T. Sasao, M. Matsuura, K. Tanaka, K. Yoshizumi, H. Qin, and Y. Iguchi, "Programmable logic device with an 8 -stage cascade of 64 K -bit asynchronous SRAMs," Cool Chips VIII, IEEE Symposium on Low-Power and High-Speed Chips, April 20-22, 2005, Yokohama, Japan.
87. K. Nakamura, T. Sasao, M. Matsuura, K. Tanaka, K. Yoshizumi, H. Nakahara, and Y. Iguchi, "A memory-based programmable logic device using a look-up table cascade with synchronous SRAMs," 2005 International Conference on Solid State Devices and Materials (SSDM 2005), Kobe, Japan, Sep. 2005.
88. K. Nakamura, T. Sasao, M. Matsuura, K. Tanaka, K. Yoshizumi, H. Nakahara and Y. Iguchi, "A memory-based programmable logic device using look-up table cascade with synchronous static random access memories," Japanese Journal of Applied Physics, Vol. 45, No. 4B, 2006, April, 2006, pp. 3295-3300.
89. G. Nilsen, J. Torresen and O. Sorasen, "A variable word-width content addressable memory for fast string matching," NorChip 2004, pp. 214-217.
90. K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE Journal of Solid-State Circuits, vol. 41, no. 3, pp. 712-727, March 2006.
91. S. Panda, F. Somenzi, and B. F. Plessier, "Symmetry detection and dynamic variable ordering of decision diagrams," ICCAD1994, Nov. 1994, pp.628-631.
92. C. A. Papachristou,"Characteristic measures of switching functions," Information Science, Vol.13, No.1, pp.51-75, 1977.
93. G. Papadopoulos and D. Pnevmatikatos, "Hashing + memory = low cost, exact pattern matching," in 15th International Conference on Field Programmable Logic and Applications, Aug. 2005, pp. 39-44.
94. S. Posluszny, N. Aoki, D. Boerstler, J. Burns, S. Dhong, U. Ghoshal, P. Hofstee, D. LaPotin, K. Lee, D. Meltzer, H. Ngo, K. Nowka, J. Silberman, and O. Takahashi,"Design methodology for a 1.0 GHz microprocessor,'Proc. ICCD-1998, (Computer Design: VLSI in Computers and Processors), Oct. 5-7, 1998, pp. 17-23.
95. J. Qiao, M. Ikeda, and K. Asada, " Finding an optimal functional decomposition for LUTbased FPGA synthesis," in Proceedings of the 2001 conference on Asia South Pacific design automation, pp. 225-230, Jan. 2001, Yokohama, Japan.
96. H. Qin, T. Sasao, M. Matsuura, K. Nakamura S. Nagayama and Y. Iguchi "A realization of multiple-output functions by a look-up table ring," IEICE Transactions on Fundamentals of Electronics, Vol.E87-A, Dec. 2004, pp.3141-3150.
97. H. Qin, T. Sasao, and J. T. Butler, "Implementation of LPM address generator on FPGAs," International Workshop on Applied Reconfigurable Computing (ARC2006), Delft, the Netherlands, March 1-3, 2006, also appeared in Lecture Notes in Computer Science 3985, pp.170-181.
98. H. Qin, T. Sasao, and J. T. Butler, "On the design of LPM address generators using multiple LUT Cascades on FPGAs," International Journal of Electronics, Vol. 94, Issue 5, May 2007, pp.451-467.
99. J. Rose, R. J. Francis, D. Lewis, and P. Chow, "Architecture of field programmable gate arrays: The effect of logic block functionality on area efficiency,"IEEE J. Solid State Circ. 25,5, pp. 1217-1225, Oct. 1990.
100. J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, "Architecture of field-programmable gate arrays," Proc. IEEE, Vol. 81, No. 7, pp.1013-1029, July 1993.
101. J. P. Roth and R. M. Karp, "Minimization over Boolean graphs,"IBM Journal of Research and Development, pp. 227-238, April 1962.
102. R. Rudell, "Dynamic variable ordering for ordered binary decision diagrams," ICCAD-93, pp. 42-47, 1993.
103. A. Sangiovanni-Vincentelli, A. El Gamal, J. Rose, " Synthesis method for field programmable gate arrays," Proceedings of the IEEE, Vol.81, No.7,July 1993, pp.1057-1083.
104. T. Sasao and P. Besslich, "On the complexity of MOD-2 Sum PLA's," IEEE Trans. on Comput. Vol. 39. No. 2, pp. 262-266, Feb. 1990.
105. T. Sasao, "Bounds on the average number of products in the minimum sum-of-products expressions for multiple-valued input two-valued output functions," IEEE Trans. on Comput. Vol. 40, No. 5, pp. 645-651, May 1991.
106. T. Sasao, "FPGA design by generalized functional decomposition," In Logic Synthesis and Optimization, Kluwer Academic Publisher, pp. 233-258, 1993.
107. T. Sasao and J. T. Butler, "A design method for look-up table type FPGA by pseudoKronecker expansion," IEEE International Symposium on Multiple-Valued Logic, Boston, May 1994, pp. 97-106.
108. T. Sasao and M. Matsuura, "DECOMPOS: An integrated system for functional decomposition," 1998 International Workshop on Logic Synthesis,Lake Tahoe, June 1998.
109. T. Sasao, "Totally undecomposable functions: applications to efficient multiple-valued decompositions," IEEE International Symposium on Multiple-Valued Logic, Freiburg, Germany, May 20-23, 1999, pp. 59-65.
110. T. Sasao, Switching Theory for Logic Synthesis, Kluwer Academic Publishers, 1999.
111. T. Sasao, "On the number of dependent variables for incompletely specified multiple-valued functions," 30th International Symposium on Multiple-Valued Logic, pp. 91-97, Portland, Oregon, U.S.A., May 23-25, 2000.
112. T. Sasao," A new expansion of symmetric functions and their application to non-disjoint functional decompositions for LUT-type FPGAs," International Workshop on Logic Synthesis, Dana Point, California, U.S.A., May 31-June 2, 2000, pp.105-110.
113. T. Sasao, M. Matsuura, and Y. Iguchi, "A cascade realization of multiple-output function for reconfigurable hardware," International Workshop on Logic and Synthesis (IWLSO1), Lake Tahoe, CA, June 12-15, 2001, pp. 225-230.
114. T. Sasao, "Design methods for multi-rail cascades," (invited paper) International Workshop on Boolean Problems (IWBP2002), Freiberg, Germany, Sept. 19-20, 2002, pp. 123-132.
115. T. Sasao and M. Matsuura, "A method to decompose multiple-output logic functions," Design Automation Conference, pp. 428-433, San Diego, USA, June 2-6, 2004.
116. T. Sasao, J. T. Butler, and M. D. Riedel, "Application of LUT cascades to numerical function generators," The 12th Workshop on Synthesis And System Integration of Mixed Information technologies (SASIMI2004), Oct. 18-19, 2004, Kanazawa, Japan, pp.422-429.
117. T. Sasao, H. Nakahara, M. Matsuura and Y. Iguchi, "Realization of sequential circuits by lookup table ring," The 2004 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2004), Hiroshima, July 25-28, 2004, pp.I:517-I:520.
118. T. Sasao, M. Kusano, and M. Matsuura, "Optimization methods in look-up table rings," International Workshop on Logic and Synthesis (IWLS-2004), June 2-4, Temecula, California, U.S.A. .pp. 431-437.
119. T. Sasao, "Radix converters: Complexity and implementation by LUT cascades," ISMVL2005, pp. 256-263.
120. T. Sasao, "Analysis and synthesis of weighted-sum functions," International Workshop on Logic and Synthesis (IWLS-2005), Lake Arrowhead, CA, USA, June 8-10, 2005,pp.455-462.
121. T. Sasao, Y. Iguchi, T. Suzuki, "On LUT cascade realizations of FIR filters," DSD2005, 8th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Porto, Portugal, Aug. 30-Sept. 3, 2005, pp.467-474.
122. T. Sasao, S. Nagayama, and J. T. Butler, "Programmable numerical function generators: Architectures and synthesis system," FPL2005, Tampere, Aug.24-26, 2005, pp.118-123.
123. T. Sasao, Y. Iguchi, and M. Matsuura, "LUT cascades and emulators for realizations of logic functions,"RM2005, Tokyo, Japan, Sept.5-Sept.6, 2005, pp.63-70.
124. T. Sasao, "Design methods for multiple-valued input address generators," ISMVL-2006 (invited paper), Singapore, May 17-20, 2006.
125. T. Sasao and J. T. Butler, "Implementation of multiple-valued CAM functions by using LUT cascade, " International Symposium on Multi-Valued Logic, May 17-20, 2006, Singapore, May 17-20, 2006.
126. T. Sasao, "A Design method of address generators using hash memories," IWLS-2006, pp. 102-109, Vail, Colorado, U.S.A, June 7-9, 2006.
127. T. Sasao, "Analysis and synthesis of weighted-sum functions," IEEE TCAD, Special issue on International Workshop on Logic and Synthesis, Vol. 25, No. 5, May 2006, pp. 789-796.
128. T. Sasao and M. Matsuura, "An implementation of an address generator using hash memories," 10th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools (DSD-2007), Aug. 27-31, 2007, Lubeck, Germany, pp.69-76.
129. T. Sasao and H. Nakahara, "Implementations of reconfigurable logic arrays on FPGAs," International Conference on Field-Programmable Technology 2007 (ICFPT'07), Dec. 12-14, 2007, Kitakyushu, Japan, pp.217-223.
130. T. Sasao, S. Nagayama and J. T. Butler, "Numerical function generators using LUT cascades," IEEE Transactions on Computers, Vol.56, No.6, June 2007, pp.826-838.
131. T. Sasao, "On the number of variables to represent sparse logic functions," 17th International Workshop on Logic \& Synthesis (IWLS-2008), Lake Tahoe, California, USA, June 4-6, 2008, pp.233-239.
132. T. Sasao, "On the number of variables to represent sparse logic functions," ICCAD-2008, San Jose, California, USA, Nov.10-13, 2008, pp. 45-51.
133. T. Sasao, "On the number of LUTs to realize sparse logic functions," International Workshop on Logic and Synthesis (IWLS-2009), July 31-Aug.2, 2009.
134. T. Sasao and A. Mishchenko, "LUTMIN: FPGA logic synthesis with MUX-based and cascade realizations," International Workshop on Logic and Synthesis (IWLS-2009), July 31-Aug. 2, 2009.
135. T. Sasao, T. Nakamura, and M. Matsuura, "Representation of incompletely specified index generation functions using minimal number of compound variables," 12th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools, Patras, Greece DSD2009.
136. T. Sasao, M. Matsuura, and H. Nakahara, "A realization of index generation functions using modules of uniform sizes," International Workshop on Logic and Synthesis (IWLS-2010), Irvine, California, June 11-13, 2010, pp.201-208.
137. T. Sasao, "On the numbers of variables to represent multi-valued incompletely specified functions," 13th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools, Lille, France DSD-2010, Sept. 2010, pp.pp.420-423.
138. T. Sasao, "Index generation functions: Recent developments," International Symposium on Multiple-Valued Logic (ISMVL-2011), Tuusula, Finland, May 23-25, 2011. (invited paper).
139. H. Sawada, T. Suyama, and A. Nagoya, "Logic synthesis for look-up table based FPGAs using functional decomposition and support minimization," ICCAD, pp. 353-359, Nov. 1995.
140. C. Scholl and P. Molitor, "Communication based FPGA synthesis for multi-output Boolean functions," Asia and South Pacific Design Automation Conference, pp. 279-287, Aug. 1995.
141. C. Scholl, R. Drechsler, and B. Becker, "Functional simulation using binary decision diagrams," ICCAD’97, pp. 8-12, Nov. 1997.
142. K. J. Schultz, CAM-Based Circuits for ATM Switching Networks, PhD thesis, University of Toronto, 1996.
143. D. A. Simovici, D. Pletea, and R. Vetro, "Information-theoretical mining of determining sets for partially defined functions," ISMVL-2010, May 2010, pp.294-299.
144. H. Song, and J. W. Lockwood, "Efficient packet classification for network intrusion detection using FPGA," International Symposium on Field-Programmable Gate Arrays (FPGA’05), Monterey, CA, Feb 20-22, 2005.
145. I. Sourdis, D. N. Pnevmatikatos, and S. Vassiliadis, "Scalable multigigabit pattern matching for packet inspection," IEEE Trans. VLSI Syst. Vol.16, No.2, pp.156-166, 2008.
146. V. Y-S Shen, A. C. Mckellar, and P. Weiner, "A fast algorithm for the disjunctive decomposition of switching functions," IEEE Trans. on Comput., Vol. C-20, No. 3, pp. 304-309, March 1971.
147. W.-Z. Shen, J.-D. Huang, and S.-M. Chao, "Lamda set selection in Roth-Karp decomposition for LUT-based FPGA technology mapping," 32nd Design Automation Conference, pp. 65-69, June 1995.
148. D. E. Taylor, "Survey and taxonomy of packet classification techniques," ACM Computing Surveys, Vol. 37, Issue 3, pp. 238-275, Sept., 2005.
149. M.Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable high speed IP routing lookups," ACM SIGCOMM Computer Communication Review, Vol. 27, No. 4, pp. 25-38, 1997.
150. W. Wan and M. A. Perkowski,"A new approach to the decomposition of incompletely specified functions based on graph-coloring and local transformations and its application to FPGA mapping," IEEE EURO-DAC'92, pp. 230-235, Hamburg, Sept. 7-10, 1992.
151. S. J. E. Wilton, "SMAP: Heterogeneous technology mapping for FPGAs with embedded memory arrays," In Proc. o $f$ the ACM/SIGDA International Symposium on FieldProgrammable Gate Arrays, pp. 171-178, 1998.
152. S. J. E. Wilton, J. Rose, and Z. Vranesic, "The memory/logic interface in FPGA's with large embedded memory arrays," IEEE Transactions on Very Large Scale Integration(VLSI) Systems, Vol. 7, Issue 1, March 1999, pp.80-91.
153. S. J. E. Wilton, "Heterogeneous technology mapping for area reduction in FPGAs with embedded memory arrays," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 19, Issue 1, Jan. 2000, pp.56-68.
154. I. Wegener, Branching Programs and Binary Decision Diagrams: Theory and Applications, Society for Industrial Mathematics, Jan. 1987.
155. B. Wurth, K. Eckl, and K. Anterich, "Functional multiple-output decomposition: Theory and implicit algorithm," Design Automation Conf., pp. 54-59, June 1995.
156. http://www.xilinx.com
157. Xilinx,"Designing flexible, fast CAMs with Virtex family FPGAs," Application Note, XAPP203, Sept. 23, 1999.
158. Xilinx, "Advantages of the Virtex-5 FPGA, 6-input LUT architecture," WP284 (v1.0), Dec. 19, 2007, http://www.xilinx.com/
159. S. Yang, "Logic synthesis and optimization benchmark user guide, Version 3.0," MCNC, Jan. 1991.
160. S. S. Yau and C. K. Tang, "Universal logic modules and their applications," IEEE Transactions on Computers, vol. 19, no. 2, pp. 141-149, Feb. 1970.
161. F. Zane, G. Narlikar, and A. Basu, "CoolCAM: Power-efficient TCAMs for forwarding engines", Proceeding of IEEE INFOCOM '03, April, 2003.

## Index

## Symbols

$K$-feasible, 21, 22
( $p, q$ )-element, 88
1-MUX, 25
2-MUX, 25
3-MUX, 25
7-segment display, 123

## A

address table, 81
alphabetic character, 86
arithmetic decomposition, 64
Ashenhurst decomposition, 12
asynchronous memory, 3
atomic number, 84
AUX memory, 98

## B

BCAM, 5
BCD, 124
BDD, 14, 55
binary CAM, 5
Binary Coded Decimal, 124
binary decision diagram, 14
bound variables, 12

## C

C-measure, 34
CAM, 4
category, 84
cells, 34
characteristic functions, 120
chemical symbol, 84
Chinese characters, 86
collision, 96
column function, 12
column multiplicity, 12
comparator, 98
compatible, 121
condensed decomposition chart, 156
cone, 21
Curtis decomposition, 13
cut, 22
cut-set, 22

## D

decomposition chart, 12
density, 84
depends on, 121
dictionary, 85
digital filters, 62
directed acyclic graph, 21
disjoint decomposition, 13
disjoint encoding, 42
distributed arithmetic, 62
double-input hash circuit, 96
DRAM, 3
dynamic PLA, 4

## E

ECFN, 53
EEPROM, 7
elementary functions, 61
elementary symmetric functions, 18
embedded block RAMs, 8
embedded memories, 8
encoding method, 41
essential, 121

## F

fanin size, 21
fanout-free cone, 22

FeRAM, 7
firmware, 83
flash RAM, 7
FPGA, 6
free variables, 12
FTP, 82
functional decomposition, 12

## G

generalized decomposition, 13
Gibits, 147

## H

hash function, 95
hash table, 95
hybrid method, 95, 103

## I

IGU, 96
incompletely specified logic function, 120
index, 81
index generation function, 81
Index Generation Unit, 96
index generator, 81
input vector, 62
Internet Protocol version 4, 58
interval function, 68
IP addresses, 81
IP forwarding table, 59
IPv4, 58
island-style, 6

## K

K-LUT, 6
KANA characters, 86
Kibit, 90
Komamiya, 19

## L

leakage current, 8
linear transformation, 96,117
linked list, 96
literal, 11
logical product, 11
longest prefix match, 57
look-up tables, 6
LPM, 57
LPM function, 58
LPM index generator, 58

LPM table, 57
LUT, 6
LUT cascade, 34
LUT cascade emulator, 137

## M

m-out-of-n code, 147
m-out-of-n to binary converter, 147
MAC address, 82
main memory, 98
maximum fanout free cone, 22
memory packing, 138
memory patch, 83
memory size, 88
Mibit, 146
minimum length encoding, 42
modulo $m$ function, 65
MRAM, 7
MTBDD, 17
multiple-output functions, 126
multiplexer, 25
multiterminal BDD, 17
MUX, 25

## N

natural order, 34
NFG, 61
nondisjoint decomposition, 13, 45
nondisjoint encoding, 43
nonessential, 119, 121
nonvolatile, 3
nonvolatile memory, 7
numerical function generator, 61

## 0

OBDD, 14
ordered BDD, 14
output AND gates, 98

## $\mathbf{P}$

parallel sieve method, 95,112
partially symmetric, 18
patch memory, 84
periodic table, 84
PLA, 4
power dissipation, 8
primary input, 21
primary output, 21
prime-counting function, 116
priority encoder, 5
priority encoder function, 68
product, 11
product term, 11
profile, 33
programmable hash circuit, 96
pseudo-Kronecker expansion, 32

## Q

QRMTBDD, 55
QROBDD, 15
quasi-reduced ordered BDD, 15

## R

radix converters, 62
rails, 34
range match, 5
rd73, 20
reduced ordered BDD, 15
Reed-Muller expansion, 32
refreshing, 3
registered vector table, 81
registered vectors, 81
rewritable PLA, 4
ROBDD, 15
ROM, 3
root, 21
Roth-Karp decomposition, 13

## S

$\mathrm{SB}(n, k), 18$
segment index encoder function, 60
segment index logic function, 60
sense amplifier, 3
Shannon expansion, 32
SIE, 60
simple disjoint decomposition, 13
single-input hash circuit, 97
SOP, 11
sparse, 57
sparse function, 71
specific heat, 84
SRAM, 3
standard parallel sieve method, 112
state, 84
static PLA, 4
strict encoding, 42
sum-of-products expression, 11
super hybrid method, 95, 108
SYM12, 38
SYM9, 50
symmetric function, 18
symmetric with respect, 18
synchronous memory, 3

T
$t$-MUX, 25
TAC, 82
TCAM, 5, 59
Telnet, 82
term, 11
terminal access controller, 82
ternary CAM, 5
ternary content addressable memory, 59
ternary-to-binary converter, 116
threshold, 63
threshold function, 63
totally symmetric, 18
totally symmetric function, 56

## $\mathbf{U}$

uniformly distributed, 76, 125

## V

volatile, 3

## W

Web, 82
weight, 11, 71, 81
weight vector, 62,63
WGT $n, 19$
WGT12, 40
WGT3, 48
WGT4, 48
WGT5, 51
WGT7, 20, 48, 51
WS function, 62


[^0]:    ${ }^{1}$ In the previous publications, disjoint encoding was called strict encoding [44, 112].

[^1]:    ${ }^{1} 9,973$ is the largest prime number less than 10,000 .

[^2]:    ${ }^{1}$ In some publications [113], the emulator was called an LUT cascade. However, later the sequential circuit that emulates an LUT cascade is called an LUT cascade emulator.

