This year we really started to see AI go mainstream. Systems like Stable Diffusion and ChatGPT captured the public imagination to an extent we haven’t seen before in our field. These are exciting times, and it feels like we are on the cusp of something great: a shift in capabilities that could be as impactful as — without exaggeration — the industrial revolution.
But amidst that excitement, we should be extra wary of hype and extra careful to ensure that we proceed responsibly.
Consider large language models. Whether or not such systems really “understand” meaning, lay people will anthropomorphize them anyway, given their ability to do arguably the most quintessentially human thing: produce language. It is essential that we educate the public about the capabilities and limitations of these and other AI systems, especially because the public largely thinks of computers as good old-fashioned symbol processors — for example, that they are good at math and bad at art, while currently the reverse is true.
Modern AI has important and far-reaching shortcomings. Systems are too easily misused or abused for nefarious purposes, whether intentionally or inadvertently. Not only do they hallucinate information, they do so with seemingly very high confidence and without the ability to attribute or credit sources. They lack a sufficiently rich understanding of our complex, multimodal human world and do not possess enough of what philosophers call “folk psychology,” the capacity to explain and predict the behavior and mental states of other people. They are arguably unsustainably resource-intensive, and we poorly understand the relationship between the training data that goes in and the model that comes out. Lastly, despite the unreasonable effectiveness of scaling — for instance, certain capabilities appear to emerge only once models reach a certain size — there are also signs that greater scale brings even greater potential for highly problematic biases and even less fair systems.
My hope for 2023 is that we’ll see work on improving all of these issues. Research on multimodality, grounding, and interaction can lead to systems that understand us better because they understand our world and our behavior better. Work on alignment, attribution, and uncertainty may lead to safer systems that are less prone to hallucination and have more accurate reward models. Data-centric AI will hopefully show the way to steeper scaling laws and more efficient ways to turn data into more robust and fairer models.
Finally, we should focus much more seriously on AI’s ongoing evaluation crisis. We need better and more holistic measurements — of both data and models — so that we can characterize our progress and limitations and understand, in terms of ecological validity (that is, real-world use cases), what we really want out of these systems.
Douwe Kiela is an adjunct professor in symbolic systems at Stanford University. Previously, he was the head of research at Hugging Face and a research scientist at Facebook AI Research.