Over the past decade, deep learning has become a dominant paradigm across a range of machine learning tasks, including computer vision, natural language processing, speech recognition, generative artificial intelligence, reinforcement learning, and recommendation systems. Its success is attributed to several factors: advanced computational resources, abundant data and, most critically, well-designed deep neural architectures. However, designing high-performing architectures by hand requires expert knowledge and considerable time. Just as deep learning replaced manual feature engineering with automated feature extraction, Neural Architecture Search (NAS) has emerged as a promising methodology for automating the design of deep neural architectures. Remarkably, architectures designed through NAS have consistently outperformed their human-designed counterparts across multiple tasks.
Nevertheless, NAS research faces a trade-off between efficiency and effectiveness. Its considerable computational overhead arises primarily from the need to evaluate a large number of candidate architectures during the search phase. Conventionally, each candidate is evaluated by training it with computationally intensive forward and backward passes, a reliable yet inefficient approach. Alternative strategies, such as surrogate models, approximate a candidate's performance and thereby reduce computational cost. However, these approximations often introduce bias, which leads to misranked candidates and suboptimal search outcomes. Hence, a judicious evaluation strategy is essential for NAS that is both efficient and effective.
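The misranking problem described above is commonly quantified with a rank correlation between the approximate scores and the ground-truth performance. The following is a minimal sketch, assuming Kendall's tau (the tau-a variant) as the measure; the function name `kendall_tau` and all accuracy and proxy values are hypothetical and chosen purely for illustration, not taken from this thesis.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Returns 1.0 when the two score lists rank the candidates identically
    and -1.0 when one ranking is the exact reverse of the other.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1      # both scores order this pair the same way
        elif sign < 0:
            discordant += 1      # the approximation misranks this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values: accuracy after full training vs. a cheap proxy score
# for five candidate architectures (illustrative numbers only).
true_accuracy = [0.91, 0.93, 0.89, 0.94, 0.90]
proxy_score = [0.70, 0.66, 0.60, 0.75, 0.72]

tau = kendall_tau(true_accuracy, proxy_score)
print(f"Kendall's tau between proxy and ground truth: {tau:.2f}")
```

A tau close to 1 suggests the cheap evaluation can stand in for full training when ranking candidates, whereas a low or negative value indicates that a search guided by it is likely to select inferior architectures.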
This thesis studies neural architecture search from the perspectives of efficiency and effectiveness. Specifically, it tackles three challenges in candidate network evaluation: building a reliable performance predictor with limited training samples, utilising a low-cost but unreliable training-free performance metric, and devising a training-free performance metric that remains reliable across tasks and search spaces.
This thesis makes three main contributions. First, it introduces a sampling strategy, together with high-fidelity weight inheritance techniques, that enables a robust regression-based performance predictor to be built from limited samples. Second, it adopts a training-free performance metric to further reduce computational cost, and uses a contrastive predictor with an active learning strategy to compensate for the unreliability of this metric. Third, it presents a novel training-free performance metric that correlates strongly with the networks' ground-truth performance across diverse types of tasks and search spaces. Together, these innovations have the potential to reduce the search cost of NAS from days to mere minutes.
In summary, this thesis introduces innovative solutions to pressing challenges in neural architecture search, particularly those related to candidate network evaluation. These advances catalyse the development of efficient NAS algorithms and broaden their applicability, including for users with constrained computational resources. Furthermore, machine-designed network architectures not only surpass human-designed ones but also provide valuable insights for research on artificial neural networks.