Microsoft ‘Seeing AI’ App Now Explores Photos With Touch, Hopefully Won’t Get Anyone Killed

Microsoft has released a significant update for Seeing AI, its AI-powered application for the blind and those with low vision. The app already offers a long list of functions, and this latest update ostensibly adds the ability to explore already-taken photographs by touch. AI is used to analyze a scene and map specific objects within the photo. Run a finger over the image, and the phone will detect which objects you are near and identify them to you.

If you’ve ever attempted to use a screen reader, you’re probably aware of how poorly the technology fits the way we use computers today. Screen readers work fairly well for some applications, but they don’t map very easily to the modern web. A number of authors on accessibility have spoken to this topic rather well. The bottom line is that assistive technology could absolutely use a boost. I checked out Seeing AI, hoping it would point the way towards a better future. After spending some time with it, I’d say it has genuine promise, along with some problems that make it tough to recommend as a reliable source of information just now. Given that the entire purpose of the application is to identify things, that is concerning.

Seeing AI uses your phone camera and artificial intelligence to identify whatever it’s “seeing.” Options include:

Short Text
Document
Product (scans for bar codes)
Person (can be taught to recognize individual people)
Currency
Scene
Color
Handwriting
Light (uses audio cues to tell you when the camera is pointed at the brightest light source it can see)

The photo identification and analysis feature was the major one we wanted to test today because that’s what Microsoft added with this update, but we took the other functions out for a spin as well. The app works reasonably well at identifying documents and text, particularly if you use a white background. It never correctly identified a product bar code for us; this aspect of the app always failed with a “Sorry, something went wrong” message. When we attempted to use the ‘Scene’ function, the system commonly misidentified furniture, such as mistaking a dresser for a desk.

As for the photography section, well… it might be easier to show you than describe it. Allowing the AI to analyze various images for content produced interesting results. The slideshow below contains our results. I threw a variety of images at the application to analyze what it could identify and what it couldn’t.

The dog isn't *in* the basket. The identification process apparently has trouble with depth.

The not-dog isn't covered in snow.

Come on, guys. At this point, you're giving the cat a complex.

Well, at least it got the *species* right this time.

I'm not sure how to describe what Kim Kardashian is wearing, but "suit and tie" isn't what I'd pick. Again, the app has problems with figuring out which depth information is associated with which person.

This is the additional data captured by the application for the previous photo. The app properly detects two people, but doesn't mention or include any data at all about the second. As far as it's concerned, this person isn't in the image.

An example of how text analysis is handled. The app properly picked up each individual line of text and read the data properly when prompted.

Nope. Pomegranates.

This is the one image that was properly identified and accurately conveyed.

Not exactly. The app was unable to recognize that the pan contained bacon, despite being given multiple versions of this shot from different angles. It sometimes recognized that the pan contained food and sometimes thought there was another object involved.

Overall performance simply wasn’t very good. Text analysis was fairly strong and the application does a good job reading the material it sees, but everything else was a crapshoot. Seeing AI can’t tell dogs and cats apart very easily. It can’t tell if there’s a plate on the stove. It can’t properly identify different foods.

As for the touch-based photographic analysis, that feature is also lacking.

Seeing AI correctly identified this as a cat, though it mistook Alisdair for a Persian (he’s mostly Maine Coon). It even insisted on “reading” the V in his fur as an actual V. (It pulled this trick on carpet patterns as well at one point, insisting that a rug in the dining room was labeled AAAAAAAAAAA repeating.) The app describes him as a “Persian cat sleeping,” which isn’t too bad. But this level of description is rare. Most of the time, what you get is “Person” and a large blue box around a single individual. The app will tell you how many objects it recognizes, but it doesn’t actually show you where the objects are. You have to trace your finger across the entire screen, hoping to hit whatever object the camera saw. Only one object at a time is outlined in blue (as above), and it’s always the last object you touched.

This makes no difference to the blind, who would have to explore the entire image no matter what, but it’s surprising that Microsoft didn’t realize low-vision users would also benefit from seeing all of the boxes where objects were detected in a given image at once, rather than hunting for them individually through blind finger-seeking. Multiple photos also came back with “No information detected,” with no data on why this was or how we could solve the problem.
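For readers curious what touch exploration of a photo involves, here is a minimal sketch of the general idea. It is not Microsoft's implementation, which we have not seen. It assumes the detector returns a label and a rectangular bounding box for each object; the names `DetectedObject` and `object_under_finger`, and the rule that the smallest box wins when boxes overlap, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical detector output: a label plus a box in pixel coordinates."""
    label: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # True when the touch point lies inside (or on the edge of) the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def object_under_finger(
    objects: List[DetectedObject], x: float, y: float
) -> Optional[DetectedObject]:
    """Return the detected object at the touch point, or None if there is none.

    Assumption: when boxes overlap, the smallest one wins, so an object
    nested inside a larger box (a cat on a person's lap) stays reachable.
    """
    hits = [obj for obj in objects if obj.contains(x, y)]
    if not hits:
        return None
    return min(hits, key=lambda obj: obj.area())


# Example scene with made-up boxes: a person, and a cat inside the person's box.
scene = [
    DetectedObject("person", 0, 0, 100, 200),
    DetectedObject("cat", 20, 120, 60, 160),
]
```

With boxes stored this way, drawing every outline at once is just a loop over `scene`, which is why showing only the last-touched box looks like a design decision rather than a technical limit.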

On the whole, we’d give the app an “A” for effort. It genuinely attempts to provide new and interesting capabilities in an area where they’re sorely needed. The actual execution, however, is still quite lacking. Clearly, these models need more training before anyone could actually rely on them for navigating or object identification. Objects are not identified properly, they aren’t placed in proper context within the photo, and the numerous issues Seeing AI has with depth would make us nervous if we were genuinely attempting to use the app to navigate the physical world. Telling a blind person that there’s a plate over the top of a pan on the stove when no such plate exists could be downright dangerous. The ability to peruse photos by touch, while interesting, ultimately wasn’t nearly accurate enough to provide a compelling use case.

We support what Microsoft is trying to do here. We sympathize with the difficulty and hope the company continues. But while Seeing AI might be good for specific things, like properly identifying currency (I wasn’t able to test this aspect, not having any cash on hand just at the moment), it didn’t wow us with its ability to interpret the waking world.
