Microsoft has developed a new image-captioning algorithm that exceeds human accuracy in certain limited tests. The AI system has been used to update the company’s assistant app for the visually impaired, Seeing AI, and will soon be incorporated into other Microsoft products like Word, Outlook, and PowerPoint. There, it will be used for tasks like creating alt-text for images, a function that’s particularly important for increasing accessibility.
Microsoft is offering the new captioning model as part of Azure’s Cognitive Services, so any developer can bring it into their apps. And later this year, the captioning model will also improve your presentations in PowerPoint for the web, Windows and Mac. It’ll also pop up in Word and Outlook on desktop platforms.
It’s not unusual to see companies tout their AI research innovations, but it’s far rarer for those discoveries to be quickly deployed to shipping products. Xuedong Huang, CTO of Azure AI Cognitive Services, pushed to integrate the model into Azure quickly because of the potential benefits for users. His team trained the model with images tagged with specific keywords, which helped give it a visual vocabulary most AI frameworks don’t have. Typically, these sorts of models are trained with images and full captions, which makes it more difficult for the models to learn how specific objects interact.
“This visual vocabulary pre-training essentially is the education needed to train the system; we are trying to educate this motor memory,” Huang said in a blog post. That’s what gives this new model a leg up in the nocaps benchmark, which is focused on determining how well AI systems can caption images they have never seen before.
The real test for Microsoft’s new model will be how it functions in the real world. According to Boyd, Seeing AI developer Saqib Shaikh, who also pushes for greater accessibility at Microsoft as a blind person himself, describes it as a dramatic improvement over their previous offering. And now that Microsoft has set a new milestone, it’ll be interesting to see how competing models from Google and other researchers respond.