2021, Vol. 2021, No. AGI-019, p. 04-
The evaluation of artificial intelligence in all its varieties is one of the greatest scientific challenges of our time. This is all the more so because we are dealing with cognition, which takes us beyond the evaluation of performance towards the evaluation of behaviour. In this talk, I will introduce a series of endeavours on the present and future measurement of AI. These include evaluating capabilities rather than task performance, evaluating general-purpose systems rather than specialised ones, evaluating AI extenders rather than externalised systems, and evaluating the transformative effect of AI on skills in the workplace. To this end, I will argue for several key elements: identifying the dimensions of difficulty in order to determine capability and generality profiles, properly studying instance variation to ensure robust evaluation, considering operating conditions on top of instance distributions, and carrying out more ambitious meta-analyses of experimental data on AI measurement.