Towards Scalable and Specialized Application Error Analysis

When:

May 6, 2022, 1:00 PM to 2:00 PM, remote via Zoom

Where:

CDM 222 and online

Speaker:

Abdulrahman Mahmoud, Harvard University, USA

Abstract:

We find ourselves at an exciting crossroads in processor design, where the slowing of Moore’s Law has led to the rise of specialized architectures and accelerators. At the same time, however, the tiny transistors available to us are increasingly susceptible to errors in the field, due to various phenomena such as high-energy particle strikes. Traditional reliability solutions aimed at identifying and mitigating such errors can be unnecessarily expensive. Is it possible to have the best of both worlds, where we can attain high error coverage while maintaining low overhead?

In this talk, I will address this challenge in the context of deep neural networks (DNNs), due to their prevalence in many safety-critical tasks such as those in self-driving cars. By understanding the effect of errors on the outcome of DNNs, we can leverage domain-specific insights to develop low-overhead reliability techniques and avoid the heavy hammer of traditional methods. I will describe two selective protection techniques for DNNs which operate at different granularities, and also show how their combination can be better than the sum of its parts. I will conclude with a discussion of future research avenues which extend the concept of hardware error resiliency to general perturbations and computing anomalies.

Bio:

Abdulrahman is a postdoctoral researcher in computer science at Harvard University, working with Dr. David Brooks and Dr. Gu-Yeon Wei. His research interests are broadly in the areas of computer architecture, machine learning, reliability, and approximate computing. His work focuses on the role hardware errors play in an application’s error tolerance, designing tools and techniques to help understand how hardware errors propagate and affect software. Abdulrahman completed his PhD at UIUC under the guidance of Dr. Sarita Adve in the RSim Research Group. During his graduate studies, he was very fortunate to be the recipient of the Mavis Future Faculty Fellowship, to be invited to the 7th Heidelberg Laureate Forum, and to receive multiple awards for teaching and mentoring undergraduate students. Prior to joining UIUC, Abdulrahman completed his BSE at Princeton University, where he was the recipient of the John Ogden Bigelow Jr. Prize in Electrical Engineering.