In this work, we propose a neural network accelerator that supports finer-grained precision tuning for both activations and weights. To the best of our knowledge, this is the first neural network accelerator to support arbitrary bit widths of up to 8 bits for both weights and activations. The accelerator combines features of bit-parallel and bit-serial design so that it achieves high arithmetic-unit utilization and precision-tuning flexibility simultaneously. According to cycle-accurate simulation and synthesis results, the proposed accelerator achieves up to 1.77x better energy efficiency than the bit-serial Stripes design on the evaluated workloads.
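To illustrate the bit-serial idea mentioned in the abstract above, the following is a minimal software sketch, not the proposed hardware. It assumes unsigned activations processed one bit-plane per step while weights are handled in parallel; the function name `bit_serial_dot` and the parameter `act_bits` are hypothetical and chosen only for this example.

```python
def bit_serial_dot(activations, weights, act_bits):
    """Dot product that consumes one activation bit-plane per step.

    Assumption for this sketch: activations are unsigned integers that
    fit in `act_bits` bits; weights may be any Python integers.
    """
    for a in activations:
        if not 0 <= a < (1 << act_bits):
            raise ValueError("activation does not fit in act_bits bits")

    acc = 0
    for bit in range(act_bits):  # one step per bit of activation precision
        # Add up the weights whose activation has this bit set
        # (an AND gate followed by an adder tree in hardware terms).
        partial = sum(w for a, w in zip(activations, weights) if (a >> bit) & 1)
        # Weight the partial sum by the significance of the current bit.
        acc += partial << bit
    return acc


if __name__ == "__main__":
    acts = [3, 5, 7]
    wts = [2, -1, 4]
    # Matches the ordinary dot product 3*2 + 5*(-1) + 7*4.
    print(bit_serial_dot(acts, wts, act_bits=3))
```

The number of loop iterations equals `act_bits`, which is why lowering the precision in a bit-serial scheme shortens the computation proportionally.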
Miniaturization of planar antennas is a primary goal in antenna design for Ultra-Wideband (UWB) applications, where surface wave effects reduce the efficiency and gain and limit the bandwidth of microstrip antennas. The ground plane effects are minimized using a Defected Ground Structure (DGS) and multiple slots. A compact antenna is proposed whose overall size, including both the ground plane and the radiator etched on the printed circuit board, is 17 mm × 22 mm × 1.588 mm. Adding an asymmetrical strip and cutting slots in the antenna widens the operating bandwidth to 2.9 to 11.4 GHz for a 10 dB return loss, with an average radiation efficiency of 88.5%. In particular, the impedance performance of the patch radiator is mainly determined by the surface wave currents in the ground plane. The antenna exhibits an omnidirectional radiation pattern over the operating bandwidth with a gain of 2 dB and a VSWR between 1 and 2.
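The 10 dB return-loss criterion and the VSWR range quoted in the abstract above are related by standard transmission-line formulas. The short sketch below uses those textbook relations; the function name `vswr_from_return_loss` is hypothetical and not taken from the paper.

```python
def vswr_from_return_loss(return_loss_db):
    """Convert a return loss in dB (positive number) to VSWR.

    Uses |Gamma| = 10^(-RL/20) and VSWR = (1 + |Gamma|) / (1 - |Gamma|).
    """
    if return_loss_db <= 0:
        raise ValueError("return loss must be a positive number of dB")
    gamma = 10 ** (-return_loss_db / 20)  # magnitude of reflection coefficient
    return (1 + gamma) / (1 - gamma)


if __name__ == "__main__":
    print(round(vswr_from_return_loss(10.0), 3))
```

A return loss of 10 dB corresponds to a VSWR of roughly 1.92, so a band defined by at least 10 dB return loss is consistent with a VSWR between 1 and 2.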
The Discrete Cosine Transform (DCT) is by far the most widely adopted transform in digital image and video processing. For applications in mobile and smart devices in particular, the DCT must be realized with small circuit area and low power cost. In this paper, a new area- and power-efficient DCT architecture is proposed. To reduce the area and power cost, temporal redundancy in the matrix-vector multiplication is eliminated by time-multiplexing reconfigurable multipliers. To further reduce the complexity, a new minimization algorithm is proposed to maximize the utilization of the newly developed sporadic programmable shifters. Our experimental results on FPGA show a circuit area reduction of at least 39.0% and a power-efficiency improvement of at least 12.3% over existing advanced DCT circuits reported in the literature.
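The matrix-vector multiplication referred to in the abstract above can be made concrete with a plain reference model. The sketch below builds the orthonormal DCT-II matrix and applies it to a vector; it is a floating-point software illustration only, not the proposed architecture. The transform length 8 used in the demo and the names `dct_matrix` and `dct` are assumptions made for this example.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        # The first row uses a smaller scale so that the matrix is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def dct(x):
    """Compute the DCT-II of x as a matrix-vector product."""
    c = dct_matrix(len(x))
    return [sum(c_ki * x_i for c_ki, x_i in zip(row, x)) for row in c]


if __name__ == "__main__":
    # A constant input has all of its energy in the first (DC) coefficient.
    print([round(v, 6) for v in dct([1.0] * 8)])
```

Each output coefficient is one row of constants multiplied by the input vector, which is the structure that a hardware design can exploit by sharing multipliers over time.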
A novel high-voltage diode featuring a partial n+ adjusting region embedded at the anode side is proposed and analyzed by device simulation in this paper. A low, inverted on-state carrier profile is obtained, which enables fast and soft recovery behavior. The simulations show that the proposed diode achieves a 21% reduction in peak reverse current density compared with the conventional diode. Moreover, the partial n+ adjusting region causes the parasitic npn transistor to be triggered during reverse recovery, so that electrons are injected into the pn− junction and suppress the peak electric field, improving the dynamic ruggedness.
We report the enhanced voltage swing of a rapid-single-flux-quantum (RSFQ) distributed amplifier obtained by replacing a single superconducting quantum interference device (SQUID) with a double-stack SQUID (DSS). A DSS is composed of two stacked 2-junction SQUIDs sharing one sensing inductor. Thanks to its stacked structure, a DSS is expected to generate twice the output voltage of a single SQUID. We have designed a 12-stage RSFQ distributed amplifier equipped with DSSs. The maximum output voltage swing reached 10.2 mV in simulation. Test chips were fabricated using a 25-µA/µm² Nb integration process. In measurements, a test chip was cooled in a liquid helium bath. The experimental output voltage swing was up to 8.34 mV.
Variation imposes a guard-band requirement on integrated circuit designs, which degrades performance and energy efficiency. As the voltage scales down, circuits become more sensitive to variation and the guard-band margin becomes unacceptable. In this paper, we propose a novel error detection and correction technique to eliminate the margin for variation. The error detection latch adds an overhead of only 6 transistors compared to a conventional latch. The detection and correction scheme relaxes the timing constraint on the error signal by one clock cycle by adding another latch stage next to the critical stage. The proposed technique reduces the energy per cycle by 51% with a 7.6% area overhead compared to the conventional margin technique. A comparison with state-of-the-art works shows that the proposed technique is competitive.
The relationship between wave field information (wave height, wavelength, etc.) and radar echo is hard to analyze with conventional measurement and extraction methods due to the randomness and complexity of the ocean. To improve wave field information estimates, we used an ultrahigh-frequency (340 MHz) radar to probe subscale water waves in the laboratory. First-order radar cross-section Doppler spectra were measured and analyzed. The results indicate a significant linear correlation between RCS (dB) and wave height (dB) (all correlation coefficients R > 0.9) but no linear correlation exists between RCS (dB) and wavelength (dB).
In this paper, shared cache accesses such as crossed accesses and repeated accesses are filtered in the proposed router network to reduce access latency. First, the distribution of all shared cache accesses is analyzed to guide the optimization of access latency. Then, a mesh router network with enhanced routers is proposed to quickly identify the target accesses and handle them. Experimental results show that our network design achieves an average IPC improvement of 26.1 percent and an average energy saving of 9.7 percent over the base system.
In this paper, a novel load modulation network is proposed in which a shunt λ/2 microstrip line is connected at the combiner of a traditional Doherty power amplifier (DPA). It achieves wideband impedance transformation for the carrier power amplifier, which improves the DPA in both bandwidth and back-off efficiency. A broadband, high-efficiency DPA is designed and fabricated for 5G mobile communication. Measurement results show that the saturated output power of the DPA is above 43 dBm from 3.1 to 3.7 GHz, and the saturated drain efficiency is between 65% and 73.1%. Additionally, the drain efficiency at 8 dB power back-off is above 40%. An adjacent channel leakage ratio (ACLR) better than −45.6 dBc is achieved with digital pre-distortion (DPD) under a 10 MHz LTE test signal at 3.3 GHz.
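For readers who prefer linear units, the dBm figures in the abstract above can be converted to watts with the standard definition of dBm. The sketch below is only a unit conversion; the helper name `dbm_to_watts` is hypothetical, and taking the 8 dB back-off relative to the 43 dBm lower bound is an assumption made for illustration.

```python
def dbm_to_watts(power_dbm):
    """Convert a power level in dBm to watts (0 dBm equals 1 mW)."""
    return 10 ** (power_dbm / 10) / 1000.0


if __name__ == "__main__":
    saturated_dbm = 43.0  # lower bound of saturated output power from the abstract
    backoff_db = 8.0      # back-off level from the abstract
    # Assumption: back-off is measured from the 43 dBm figure.
    print(round(dbm_to_watts(saturated_dbm), 2))
    print(round(dbm_to_watts(saturated_dbm - backoff_db), 2))
```

Under that assumption, 43 dBm is about 20 W and the 8 dB back-off point of 35 dBm is about 3.2 W.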