Computer-science-oriented and neuroscience-oriented approaches are the two general routes to developing Artificial General Intelligence (AGI). In this study, a silicon neuron transistor is developed using the neuroscience-oriented approach for AGI applications. Its neuronal behavior (the “weighted sum and threshold” function) is based on complementary metal-oxide-semiconductor (CMOS) negative differential resistance (NDR) theory. The neuron transistor is implemented in the UMC 180-nm commercial standard CMOS process, which makes it practical to implement an entire neural network or to integrate with other CMOS circuits on the same chip. The neuron transistor comprises three inputs, Vg1, Vg2, and Vg3, together with a control terminal Vcon, a load terminal Vb(load), and a driver terminal Vb(driver). The width of each input is 1.8 µm, and the inputs have 1, 2, and 4 fingers, respectively; that is, the weight ratio is 1:2:4. Vb(load) and Vb(driver) enable the neuron transistor to behave more like a real biological neuron, with improved sensitivity and lower complexity compared with a traditional artificial neural network. The neuron MOS transistor was measured at a maximum frequency of 10 kHz. It exhibits an extremely low power consumption of <10⁻⁴ µW and a minuscule footprint of 30×15 µm². As the process feature size decreases, the chip’s operating frequency could increase by one order of magnitude, while its power consumption and footprint would decrease further.
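The “weighted sum and threshold” behavior with the 1:2:4 finger-weight ratio described above can be sketched in software. This is a minimal behavioral illustration only; the threshold value is a hypothetical placeholder, not a measured device parameter.

```python
# Behavioral sketch of the "weighted sum and threshold" neuron function.
# Weights 1:2:4 mirror the 1-, 2-, and 4-finger inputs Vg1, Vg2, Vg3.
WEIGHTS = (1, 2, 4)   # finger counts of inputs Vg1, Vg2, Vg3
THRESHOLD = 4         # hypothetical firing threshold (illustrative only)

def neuron_fires(vg1: int, vg2: int, vg3: int) -> bool:
    """Return True when the weighted input sum reaches the threshold."""
    weighted_sum = sum(w * v for w, v in zip(WEIGHTS, (vg1, vg2, vg3)))
    return weighted_sum >= THRESHOLD

# With binary inputs, only combinations whose weighted sum reaches the
# threshold make the neuron fire:
print(neuron_fires(1, 1, 0))  # 1*1 + 2*1 + 4*0 = 3 -> False
print(neuron_fires(0, 0, 1))  # 4*1 = 4           -> True
```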
The physical unclonable function (PUF) based on ring oscillators (ROs) is applied to key generation and other fields owing to its excellent physical characteristics and simple implementation on FPGAs. However, the traditional frequency comparison method uses only the sign bit of the frequency difference between two ROs, which extracts just one bit of entropy, and the bit error rate (BER) of the response always exceeds 1% without error-correction schemes. In this paper, we design ROs based on FPGA LUT units within a single CLB, and propose a method that takes a difference after summing the first-order frequency differences, which we call the Difference on Summed Difference (DSD) method. Experimental measurements show that the DSD method achieves a BER of 0.39% and a uniqueness of 50.30%. To obtain more entropy, we further propose a composed entropy-extraction method that combines the DSD method with the existing higher-order difference method. Experimental results demonstrate that the composed method obtains a 32-bit response in total with a BER of 0.84% and a uniqueness of 49.15%, whereas the BER of the existing higher-order difference method is 1.85%.
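The DSD idea can be illustrated with a small sketch: compute first-order frequency differences between neighboring ROs, sum them over two groups, and extract a response bit from the sign of the difference of the sums. The split into two halves below is an illustrative assumption; the paper's exact grouping may differ.

```python
# Hedged sketch of Difference on Summed Difference (DSD) bit extraction.
# The half-and-half grouping of the first-order differences is an
# illustrative assumption, not necessarily the paper's exact partition.
def dsd_bit(freqs):
    """Extract one response bit from a list of measured RO frequencies (Hz)."""
    # First-order frequency differences between neighboring ROs.
    diffs = [b - a for a, b in zip(freqs, freqs[1:])]
    # Sum the differences over two groups, then difference the sums.
    half = len(diffs) // 2
    s1, s2 = sum(diffs[:half]), sum(diffs[half:])
    return 1 if s1 - s2 > 0 else 0

# Example with integer frequencies in Hz:
print(dsd_bit([100200, 100900, 100400, 101100, 100500]))  # -> 1
```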
Since state-of-the-art multicore processors executing complicated applications demand a large last-level cache (LLC) to reduce memory latency, next-generation memories have recently attracted great attention. Triple-level cell (TLC) spin-transfer torque (STT) magnetic random access memories (MRAMs) provide high storage density but suffer degraded latency and power consumption because of the three-step resistance-state transition and detection processes required for write and read operations, respectively. In this paper, we propose a TLC STT-MRAM-aware LLC that limits such penalties for multicore processors. Our LLC minimizes the occurrence of the three-step resistance-state transition and detection processes via the proposed cell division mapping and conditional block swapping techniques. Experimental results show that the proposed LLC achieves on average 17.2% higher performance and 17.9% lower power consumption than conventional LLCs composed of TLC STT-MRAMs.
In this paper, balanced pre-charging and group decoding are presented to realize in-memory computing that can perform continuous read operations after only one pre-charge operation. Compared with the analog computing of multi-row parallel reading, this structure adopts digital-domain computing, which improves the accuracy of computation; compared with non-multi-row in-memory computing, it improves the computation speed. Simulation results for noise margin, read times, and read speed show that the proposed method improves memory performance: the maximum stability is improved by 13.8%, the read speed by 75%, and the power consumption is reduced by 11.58%. Moreover, the method can be widely used in a variety of memory structures, and it is applied to existing in-memory computing structures to verify its effectiveness.
Industrial Cyber-Physical Systems (ICPS) impose high requirements on the data encryption rate and security performance; meanwhile, the devices in ICPS support only the communication protocols of the lower layers, such as the physical layer and the data link layer. However, existing encryption systems are mostly developed at the application layer and are therefore unsuitable for ICPS. Moreover, the encryption systems that do place the security function on the link layer update their keys only periodically and cannot satisfy the high-security requirements of ICPS. Therefore, taking the highly parallel nature of FPGAs (Field Programmable Gate Arrays) into consideration, this letter proposes a Dual Dynamic key chaotic Encryption System (D2ES) for ICPS, which makes both the key generation and the key-update cycle dynamic at the data link layer. The system mainly comprises an encryption module and a dual dynamic key module to realize high security performance. Finally, the proposed D2ES is implemented on an FPGA platform, and extensive experiments have been conducted to evaluate its security performance. The experimental results show that D2ES provides more efficient encryption than the popular encryption algorithm DES (Data Encryption Standard) and can be applied to ICPS.
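The general idea of chaotic-keystream encryption can be illustrated with a minimal software sketch: a chaotic map generates a keystream that is XORed with the data, and changing the initial state re-keys the stream. The logistic map used here is a common textbook choice and purely illustrative; it is not claimed to be the chaotic system or key-update rule of D2ES.

```python
# Illustrative chaotic stream cipher sketch (NOT the D2ES design itself).
def logistic_keystream(x0: float, n: int) -> bytes:
    """Generate n keystream bytes from the logistic map x -> 3.99*x*(1-x)."""
    x = x0
    out = bytearray()
    for _ in range(n):
        x = 3.99 * x * (1.0 - x)          # chaotic iteration
        out.append(int(x * 256) & 0xFF)   # quantize state to one byte
    return bytes(out)

def xor_crypt(data: bytes, x0: float) -> bytes:
    """XOR data with the chaotic keystream; the same call also decrypts."""
    ks = logistic_keystream(x0, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

msg = b"ICPS frame"
ct = xor_crypt(msg, 0.3141592)        # encrypt with key (initial state) x0
assert xor_crypt(ct, 0.3141592) == msg  # XOR stream cipher is symmetric
```

Re-keying here simply means supplying a new initial state x0; in a dynamic-key scheme such as D2ES this update would itself be driven by the system rather than fixed in advance.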
Among Advanced Encryption Standard (AES) operations, MixColumns/InvMixColumns is the second most computationally complex operation after the S-box. It occupies substantial hardware resources and contributes heavily to the critical path delay (CPD) in AES hardware implementations. To reduce the hardware complexity of MixColumns/InvMixColumns, a whole-matrix joint optimization method is proposed in this paper. All coefficient multiplications in MixColumns/InvMixColumns are combined into a single matrix multiplication, and a larger number of common subexpressions can be shared in the combined matrix. Therefore, the area can be drastically reduced in implementations. The validity of our whole-matrix joint optimization is verified by theoretical analysis and synthesis tools. Both the analytical and the synthesized results indicate that, compared with column joint optimization and row joint optimization, the optimization efficiency is greatly improved by the whole-matrix joint optimization. Compared with previous works, our implementations offer a wider area-delay tradeoff, from lower delay to minimal area cost.
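For reference, the standard MixColumns transform on one state column already contains a well-known shared subexpression (the XOR-sum of the column), which is the kind of sharing that joint matrix optimizations generalize across the whole matrix. The sketch below is the textbook FIPS-197 transform, not the paper's whole-matrix joint construction.

```python
# Reference MixColumns on one AES state column over GF(2^8).
def xtime(b: int) -> int:
    """Multiply by x (i.e., by 2) in GF(2^8) with the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def mix_column(col):
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3   # common subexpression shared by all four rows
    return [
        a0 ^ t ^ xtime(a0 ^ a1),
        a1 ^ t ^ xtime(a1 ^ a2),
        a2 ^ t ^ xtime(a2 ^ a3),
        a3 ^ t ^ xtime(a3 ^ a0),
    ]

# FIPS-197 test vector: column [db, 13, 53, 45] -> [8e, 4d, a1, bc]
print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

In hardware terms, sharing `t` (and the pairwise XORs fed to `xtime`) saves XOR gates within one column; the whole-matrix joint optimization extends the same subexpression-sharing principle across all coefficient multiplications of the combined MixColumns/InvMixColumns matrix.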