#### PAPER • OPEN ACCESS

# Design and implementation of power efficient clock gated dual-port SRAM

To cite this article: Mohammad Waqar Bhat et al 2022 J. Phys.: Conf. Ser. 2325 012034



Mohammad Waqar Bhat<sup>1</sup>, Kaartik R<sup>2</sup>, Dr. Sowmya K B<sup>3</sup>

<sup>1</sup> Department of ECE, RV College of Engineering, Bengaluru, India.

<sup>1,2,3</sup>{mohammadwb.lvs21,kaartikr.lvs21,sowmyakb}@rvce.edu.in

Abstract. A synchronous clock gated dual-port RAM is designed in this paper. To improve the design's power efficiency, a negative latch-based clock gating approach is used. The design was written in Verilog HDL and implemented in Xilinx Vivado ML Edition on an XC7Z010ICLG225-1L device (Zynq-7000 family, -1 speed grade). Clock gating is realized with a 2-input AND gate that takes the clock and the enable signal as inputs, so the gated clock toggles only while enable is high. This lowers the effective switching frequency and therefore the clock power, since dynamic power is directly proportional to switching frequency. Apart from clock gating, the SRAM is dual-port, allowing multiple reads or writes to occur simultaneously, or at approximately the same time, unlike single-port RAM, which permits only one access at a given time. At a clock frequency of 100 GHz, the clock power is 627.61 mW with clock gating compared with 1405 mW without it, a 55 percent reduction in clock power. The XPower Analyzer tool of Xilinx Vivado ML Edition was used to calculate the device's power.

Keywords: Clock Gating, Dual-Port RAM, XPower Analyzer.

#### **1. Introduction**

Random Access Memory (RAM) is a type of semiconductor memory element used in VLSI circuits to store data or to communicate with peripheral devices. There are two forms of RAM: static RAM (SRAM) and dynamic RAM (DRAM). In SRAM, each bit is stored in a bistable latching circuit with stable logic-high and logic-low states. SRAM is used for cache and internal registers in CPUs. SRAM is volatile: it requires a constant power supply to retain data, and the data is lost when power is disconnected.

In contrast to single-port RAM, which allows only one access at a time, dual-port SRAM allows several reads or writes to occur at the same time, or almost at the same time. Dual-port SRAMs allow two CPUs or peripheral devices to access the same memory in the same device simultaneously. As a result, two peripheral devices can communicate with each other via the memory. It also improves the memory's efficiency, because two peripheral devices can acquire data from it at the same time.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.

International Conference on Electronic Circuits and Signalling Technologies, IOP Publishing. Journal of Physics: Conference Series **2325** (2022) 012034, doi:10.1088/1742-6596/2325/1/012034

Clock signals are one of the main power contributors in VLSI design and are responsible for roughly 30% to 70% of the total dynamic power of a device [1]. Several methods have been developed to control the dynamic power of a device; clock gating is notable among them. In the basic clock gating approach, the input clock signal of the device is ANDed or ORed with a designer-supplied enable (EN) signal. Gating the clock can therefore save power in sequential circuits: it simply disables the clock signal wherever it would consume power unnecessarily [2], [3]. An AND-based clock gating approach was applied to RAM in 2013 [10]. The main contribution of this paper is the use of a negative latch-based clock gating technique to improve the power efficiency of an 8-bit dual-port SRAM. The design is organized to handle write-read conflicts. We are also working toward implementing this design on a reconfigurable system, as in [4-6], [9], [11,12].

The design serves two central processing units (CPUs). Each CPU has its own set of signals, including read, write, enable, semaphore, address, and data lines. The memory has two ports, allowing peripheral devices to communicate with one another without any other dedicated communication path between them.

Both peripheral devices access the memory on clock edges. The clock is produced by the gated clock circuit, and applying clock gating at the clock input improves the circuit's power efficiency. The enable signal is used to activate the dual-port RAM clock. Figure 2 shows the negative latch-based clock gated SRAM circuit. The RTL view of the negative edge-triggered dual-port SRAM cell is shown in Figure 3; it makes the circuit connections easier to follow. When the left enable is set high, the corresponding device sends the address bits it wants to access, together with a read/write signal that indicates whether it wants to read from or write to that address. To read, it supplies only the address bits. To write, it supplies the address bits together with the data bits. The read and write actions cause the control logic to set the semaphore of this peripheral device. Peripheral device B, on the other hand, can access the memory at the same time by providing the control block with the necessary inputs. If enable is driven to logic low, no inputs reach the control block and the peripheral device is treated as being in the OFF state.

#### 2. Literature Review

The authors of [1] describe the implementation of different clock gating strategies for sequential circuits. Clock gating (CG) and power gating (PG) are the most widely used low-power design concepts for reducing dynamic and static power, respectively. Aside from the significant energy savings they give, their popularity stems from their high compatibility with common CMOS technologies and scaling, their suitability for automation, and their ease of integration with the commercial design frameworks provided by major EDA suppliers. Their implementation is not free, however, and timing and area overheads must be carefully examined. Furthermore, both require dedicated power-management units that are responsible for detecting idle conditions and generating the appropriate control signals. Such control units, which are frequently built as separate, independent entities, can be a major source of area and timing overhead and complicate design verification. Combining clock gating and power gating under a single control domain is therefore an attractive way to lower implementation cost while retaining the energy savings.


In [2], it is observed that differences in the modelling approach can affect the design process. The work also explores precautions that designers can take to minimise increased power dissipation and potential difficulties during the layout phase and final timing closure.

#### 3. Techniques and Methodology

Clock signals are one of the most significant power contributors in any VLSI design, accounting for roughly 30% to 70% of the device's total dynamic power [1]. Several ways of controlling the dynamic power of a device have been developed, with gating of the clock signal being one of them. In a basic clock gating mechanism, the device's input clock signal is ANDed or ORed with a designer-supplied enable signal. In sequential circuits, clock gating can therefore conserve power: it simply suppresses the clock signal in areas where it is wasting energy [2], [3]. Among clock gating approaches, the negative latch-based strategy has been shown to be one of the best [8], where it was compared with clock gating strategies based on NOR gates, AND gates, and positive latches. An AND-based clock gating approach was applied to RAM in 2013 [10].
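The basic AND-based mechanism described above can be illustrated with a small behavioural sketch in Python. This is not the paper's Verilog design: the waveform sampling (two samples per clock cycle), the helper names, and the enable pattern are assumptions chosen only to show how gating reduces the number of clock edges that reach the target.

```python
def square_wave(cycles):
    """Sampled clock: two samples per cycle, low half then high half."""
    return [0, 1] * cycles


def and_gate_clock(clk, enable):
    """Basic clock gating: gated clock = clk AND enable, sample by sample."""
    return [c & e for c, e in zip(clk, enable)]


def rising_edges(signal):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(signal, signal[1:])
               if prev == 0 and cur == 1)


if __name__ == "__main__":
    clk = square_wave(8)
    # Hypothetical enable pattern: active during the first 3 cycles only.
    enable = [1] * 6 + [0] * 10
    gated = and_gate_clock(clk, enable)
    print("source clock edges:", rising_edges(clk))
    print("gated clock edges: ", rising_edges(gated))
```

With this enable pattern, the eight rising edges of the source clock are reduced to three on the gated clock, which is the reduction in switching activity that clock gating relies on.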



Figure 1. Negative latch-based Clock Gated Circuit.

#### 3.1 Clock Gating Technique Using Negative Latch

The clock gating circuit in Figure 1 is based on a negative latch. An Enable input controls the clock of the controlling device (the latch) and produces the Gated CLK output, which is then used as the clock of the target device. Initially, Gen is '0', Enable is '1', and x is '0'. When x is ORed with the clock, a clock signal is generated for the latch. The latch is driven while the clock is LOW, and its output Q is available in the next clock cycle. Gated CLK is derived by ANDing the global clock with Gen, which is logic LOW at this point, so the output is held and no switching activity occurs. On the following clock cycle, Gen changes to logic HIGH and is ANDed with the clock, which produces a clock pulse that activates the target device. Because Gen and Enable are both logic HIGH, the OR gate output stays high; the latch therefore holds its state and can toggle only when Enable changes. When Enable is set to logic LOW, Gated CLK goes to logic LOW as well. In this way the target device receives a clock only while it is enabled.
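The benefit of placing a latch in front of the AND gate can be seen in a simplified behavioural sketch. The model below is an assumption-laden simplification, not the exact circuit of Figure 1: it omits the x/OR feedback path and keeps only a latch that is transparent while the clock is low, followed by an AND with the clock. The sampling (four samples per cycle) and the enable waveform are hypothetical.

```python
def and_gated_clock(clk, enable):
    """Plain AND gating: enable changes pass straight to the output."""
    return [c & e for c, e in zip(clk, enable)]


def latch_gated_clock(clk, enable):
    """Negative (low-transparent) latch on enable, then AND with the clock."""
    q = 0                    # latch output, assumed to start at 0
    gated = []
    for c, e in zip(clk, enable):
        if c == 0:           # latch is transparent only while the clock is low
            q = e
        gated.append(c & q)  # while the clock is high, q is held
    return gated


if __name__ == "__main__":
    # Three cycles, four samples each: low, low, high, high.
    clk = [0, 0, 1, 1] * 3
    # Hypothetical enable that changes in the middle of a high phase.
    enable = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print("AND only:  ", and_gated_clock(clk, enable))
    print("with latch:", latch_gated_clock(clk, enable))
```

In this example the plain AND gate emits two half-width pulses, because enable changes while the clock is high. The latch-based version emits a single full-width pulse, since the enable value is only sampled during the low phase of the clock.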




Figure 2. Circuit diagram of the negative latch-based dual-port SRAM.

The clock gating circuit produces the gated clock as its output, and the power of the circuit is optimised by applying this clock gating circuitry at the clock input. The clock signal for the dual-port RAM circuit is enabled using the enable signal. The circuit of the negative latch-based dual-port SRAM is shown in Figure 2, and the RTL diagram of the negative latch-based dual-port SRAM cell is shown in Figure 3. The RTL schematic makes the circuit connections easier to understand.

When the left enable is driven high, the device sends the address bits together with the read-write signal, indicating whether it wishes to write to or read from that specific address. To read, it supplies only the address. To write, it supplies the address as well as the data. The semaphore of this device is set accordingly by the control block. Device B can access the memory locations at the same time by giving the required inputs to the control logic. When enable is low, no inputs reach the control logic and the device is treated as being in the OFF state.
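The port behaviour described above can be sketched as a small behavioural model. The paper does not specify the memory depth, the exact semaphore semantics, or how simultaneous accesses to one address are resolved, so the following are assumptions made for illustration only: a depth of 16 words, a semaphore flag that is simply set while a port is active, reads that return the value stored before the current edge, and port A taking priority when both ports write the same address. The class and helper names are hypothetical.

```python
class DualPortRam:
    """Behavioural model of a small synchronous dual-port memory."""

    def __init__(self, depth=16, width=8):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth
        self.semaphore = {"A": 0, "B": 0}

    def clock_edge(self, clock_enable, port_a, port_b):
        """Apply one clock edge; returns read data per port (None if no read)."""
        results = {"A": None, "B": None}
        if not clock_enable:        # gated clock suppressed: no activity at all
            return results
        requests = {"A": port_a, "B": port_b}
        snapshot = list(self.mem)   # reads see pre-edge contents (assumption)
        for name in ("B", "A"):     # A is applied last, so A wins write conflicts
            req = requests[name]
            self.semaphore[name] = 1 if req["enable"] else 0
            if not req["enable"]:
                continue            # port is OFF: its inputs are ignored
            if req["write"]:
                self.mem[req["addr"]] = req["data"] & self.mask
            else:
                results[name] = snapshot[req["addr"]]
        return results


def request(enable=0, write=0, addr=0, data=0):
    """Bundle the per-port signals named in the text."""
    return {"enable": enable, "write": write, "addr": addr, "data": data}


if __name__ == "__main__":
    ram = DualPortRam()
    ram.clock_edge(1, request(1, 1, 3, 0xAB), request())
    # Both ports read the same location on the same edge.
    print(ram.clock_edge(1, request(1, 0, 3), request(1, 0, 3)))
    # With the gated clock disabled, a write request has no effect.
    ram.clock_edge(0, request(1, 1, 3, 0x11), request())
    print(hex(ram.mem[3]))
```

A different conflict policy (for example, write-first reads or port B priority) would only change the snapshot and loop order in `clock_edge`.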



Figure 3. Clock gated 8-bit dual-port SRAM RTL schematic.

#### 4. Simulation Results and Discussion

This section describes the behavior of the proposed clock gated 8-bit dual-port SRAM, evaluated with the Xilinx XPower tool. The results for the proposed clock gated 8-bit dual-port SRAM are compared with those for an 8-bit dual-port SRAM without clock gating. Both dual-port SRAMs are simulated in the Xilinx Vivado simulator, targeting the XC7Z010ICLG225-1L device.

The design is described in Verilog HDL. The XPower Analyzer tool of Xilinx Vivado calculates the power consumption of the designs. The power of the two designs was evaluated at 1 GHz, 10 GHz, 100 GHz, and 1 THz, and the results were compared. Table 1 shows the clock, logic, BRAM, signal, dynamic, static, and total power values of the 8-bit dual-port SRAM without clock gating.

Table 2 shows the same power values for the 8-bit dual-port SRAM with the clock gating technique, at the same four frequencies.

| ON-CHIP POWER (mW) | 1 THz    | 1 GHz | 10 GHz | 100 GHz |
|--------------------|----------|-------|--------|---------|
| CLOCK              | 14050    | 14.05 | 140.5  | 1405    |
| LOGIC              | 0.25     | 0.01  | 0.01   | 0.07    |
| IOs                | 2875.78  | 3.34  | 29.22  | 288     |
| BLOCK RAM          | 17978.8  | 18    | 179.81 | 1797.9  |
| STATIC             | 433.88   | 45.33 | 45.96  | 53.31   |
| DYNAMIC            | 34905.43 | 35.44 | 349.6  | 3491.12 |
| SIGNAL             | 0.59     | 0.05  | 0.06   | 0.16    |
| HIERARCHY          | 0.97     | 0.97  | 0.97   | 0.97    |
| TOTAL              | 35339.31 | 80.77 | 395.56 | 3544.43 |

Table 1. On-chip power consumption values of the 8-bit dual-port SRAM without clock gating.

| ON-CHIP POWER (mW) | 1 THz   | 1 GHz | 10 GHz | 100 GHz |
|--------------------|---------|-------|--------|---------|
| CLOCK              | 6276.01 | 6.28  | 62.77  | 627.61  |
| LOGIC              | 58.03   | 0.41  | 1.37   | 6.73    |
| IOs                | 11.42   | 11.42 | 11.42  | 11.42   |
| BLOCK RAM          | 3316.61 | 3.32  | 33.17  | 331.66  |
| STATIC             | 75.29   | 45.3  | 45.48  | 47.28   |
| DYNAMIC            | 9663.06 | 21.76 | 109.28 | 978.41  |
| SIGNAL             | 0.99    | 0.34  | 0.55   | 0.99    |
| HIERARCHY          | 0.65    | 0.65  | 0.65   | 0.65    |
| TOTAL              | 9738.35 | 67.07 | 154.76 | 1025.68 |

Table 2. On-chip power consumption values of the 8-bit dual-port SRAM with the clock gating technique. The 10 GHz total is given as the sum of the dynamic and static values, consistent with the other columns and with the 60.88 percent reduction reported in the text.

Tables 1 and 2 show the clock power of the two architectures at 1 THz, 1 GHz, 10 GHz, and 100 GHz. At each of these frequencies, a reduction of about 55 percent in clock power was found. The clock power is thus considerably lowered when the clock gating approach is applied to the device.

The addition of clock gating circuitry increased the logic and signal power in the design, but this increase was found to be insignificant relative to the reduction in clock power.

Tables 1 and 2 also show the I/O power of the two designs. Due to the additional circuitry, the I/O power at 1 GHz rises from 3.34 mW to 11.42 mW (the increase amounts to 70.75 percent of the gated value), but it drops by 60.92 percent, 96.03 percent, and 99.60 percent at 10 GHz, 100 GHz, and 1 THz, respectively.

At all four frequencies, BRAM power was reduced by around 81.55 percent. The hierarchy power decreased by 32.99 percent.


The static power of the two designs remains nearly identical up to 100 GHz but decreases considerably at 1 THz. At 1 THz, 1 GHz, 10 GHz, and 100 GHz, decreases of 82.65%, 0.07%, 1.04%, and 11.31% in static power were observed, respectively. A considerable reduction in dynamic power was also observed: it decreases by 72.32%, 38.60%, 68.74%, and 71.97% at 1 THz, 1 GHz, 10 GHz, and 100 GHz, respectively. This reduces the total supply power by 72.44%, 16.96%, 60.88%, and 71.06% at 1 THz, 1 GHz, 10 GHz, and 100 GHz, respectively.
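The percentage reductions quoted in this section can be recomputed from the tabulated values. The short Python sketch below uses the clock, block RAM, static, and dynamic rows transcribed from Tables 1 and 2 (in mW, per the abstract). The function and variable names are illustrative, and each reduction is taken relative to the value without clock gating.

```python
# Column order follows Tables 1 and 2.
FREQS = ["1 THz", "1 GHz", "10 GHz", "100 GHz"]

# On-chip power in mW without clock gating (Table 1).
UNGATED = {
    "clock":     [14050.0, 14.05, 140.5, 1405.0],
    "block_ram": [17978.8, 18.0, 179.81, 1797.9],
    "static":    [433.88, 45.33, 45.96, 53.31],
    "dynamic":   [34905.43, 35.44, 349.6, 3491.12],
}

# On-chip power in mW with clock gating (Table 2).
GATED = {
    "clock":     [6276.01, 6.28, 62.77, 627.61],
    "block_ram": [3316.61, 3.32, 33.17, 331.66],
    "static":    [75.29, 45.3, 45.48, 47.28],
    "dynamic":   [9663.06, 21.76, 109.28, 978.41],
}


def reduction_percent(before, after):
    """Percentage drop relative to the ungated (before) value."""
    return 100.0 * (before - after) / before


def reductions(component):
    """Reduction per frequency for one table row, rounded to two decimals."""
    return [round(reduction_percent(b, a), 2)
            for b, a in zip(UNGATED[component], GATED[component])]


if __name__ == "__main__":
    for name in UNGATED:
        row = ", ".join(f"{f}: {r:.2f}%"
                        for f, r in zip(FREQS, reductions(name)))
        print(f"{name:10s} {row}")
```

Running the script reproduces the figures reported above: roughly 55.3 percent for clock power at every frequency, and the static and dynamic reductions listed in the text.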



Figure 4. Simulation window of the 8-bit dual-port SRAM.

The clock gated 8-bit dual-port SRAM simulation waveform created by Vivado is shown in Figure 4.

#### 5. Conclusion

A clock gated dual-port SRAM has been successfully designed, simulated, and optimised. The design was created using Verilog HDL. Compared with the dual-port SRAM without negative latch-based clock gating, the total power consumption of the clock gated dual-port SRAM is reduced by 16.96 percent at 1 GHz, 60.88 percent at 10 GHz, 71.06 percent at 100 GHz, and 72.44 percent at 1 THz. Transistor-level design with Cadence tools could lower the power of the proposed design even further.

#### 6. References

- [1] Benini, L.; Siegel, P.; De Micheli, G., "Saving power by synthesizing gated clocks for sequential circuits," Design & Test of Computers, IEEE, vol.11, no.4, pp.32,41, Winter 1994.
- [2] V. G. Oklobdzija, Digital System Clocking—High-Performance and Low-Power Aspects. New York, NY, USA: Wiley, 2003.
- [3] M. S. Hosny and W. Yuejian, "Low power clocking strategies in deep submicron technologies," in Proc. IEEE Int. Conf. Integr. Circuit Design Technol., Jun. 2008, pp. 143–146.


- [4] Ashutosh Gupta and Kota Solomon Raju, "Design and Implementation of 32-bit Controller for Interactive Interfacing with Reconfigurable Computing Systems" International Journal of Computer Science and Information Technology (IJCSIT), Vol.1, No.2, pp 80-87, Nov 2009. ISSN: 0975-3826(online); 0975-4660.
- [5] Gupta, A., Duhan, M., & Raju Kota, S. (2009). HDL Implementation of Sine-Cosine Function Using CORDIC Algorithm in 32-Bit Floating Point Format. The Icfai University Journal of Science & Technology, 5(2), 40-48.
- [6] Sharma P and Gupta A. (2009), "Design, Implementation and Optimization of Highly Efficient UART", The IUP Journal of Science and technology, Vol: 5, No. 4, pp. 21-30.
- [7] Sterpone, L.; Carro, L.; Matos, D.; Wong, S.; Fakhar, F., "A new reconfigurable clock-gating technique for low power SRAM-based FPGAs," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, vol., no., pp.1,6, 14-18 March 2011.
- [8] Jagrit Kathuria, M. Ayoubkhan, Arti Noor, "A Review of Clock Gating Technique," MIT International Journal of Electronics and Communication Engineering, MIT Publications, ISSN 2230-7672, Vol. 1, No. 2, Aug 2011.
- [9] Agarwal, C.; Gupta, A.: 'Modeling, Simulation based DC Motor Speed Control by Implementing PID Controller on FPGA', IET Conference Proceedings, 2013, p. 9.07-9.07, DOI: 10.1049/cp.2013.2358 IET Digital Library.
- [10] Pandey, B.; Singh, D.; Baghel, D.; Yadav, J.; Pattanaik, M., "Clock Gated Low Power Memory Implementation on Virtex-6 FPGA," Computational Intelligence and Communication Networks (CICN), 2013 5th International Conference on , vol., no., pp.409,412, 27-29 Sept. 2013