Microsoft’s Seeing AI App Sounds Like TapTapSee On Steroids

I haven’t tried it for myself just yet since this is the first I’ve heard of it, but if Microsoft’s Seeing AI app works as advertised, holy shit!

Seeing AI, a free app that narrates the world around you, is available now to iOS customers in the United States, Canada, India, Hong Kong, New Zealand and Singapore.
Designed for the blind and low vision community, this ongoing research project harnesses the power of artificial intelligence to open up the visual world and describe nearby people, text and objects.

The app uses artificial intelligence and the camera on your iPhone to perform a number of useful functions:

  • Reading documents, including spoken hints to help you line up all four corners so you capture the full page. It then recognizes the structure of the document, such as headings, paragraphs and lists, allowing you to rapidly skip through it using VoiceOver.
  • Identifying a product based on its barcode. Move the phone’s camera over the product; beeps indicate how close the barcode is – the faster the beeps, the closer you are – until the full barcode is detected. It then snaps a photo and reads the name of the product.
  • Recognizing people based on their face, and providing a description of their visual appearance, such as their gender, facial expression and other identifying characteristics.
  • Recognizing images within other apps – just tap Share, and Recognize with Seeing AI.

In addition to full documents and barcodes, it will also be able to read things like signs and labels, which, if done well, could be a pretty big step up from what the still awesome and useful TapTapSee does now. Oh, and it will even try to describe any picture you take in detail, a handy feature for anyone who has ever let a sighted friend borrow their phone, or had one take a photo for them, only to discover that they actually took 12 of them.

And remember, all of this is free. Maybe it’s only free because it’s a research project, but if it’s going to lead to greater accessibility in all sorts of mainstream applications down the line, who cares?

Join the Conversation


  1. I ran and grabbed it yesterday since it’s free. I haven’t had luck getting an entire document, or scanning barcodes yet, but when I first got the app, the default was on short text. I was flicking around the screen, not realizing that I had the camera aimed at my laptop, and lo, it read the tweet that was on the screen, which just happened to be Steve’s. I tried the experimental scene application and it guessed things like a kitchen table, a building next to a window (when I aimed it out the window), and a brown dog sleeping on a bed. He’s yellow, but whatever.

    I’m very excited about this app. It seems nearly effortless to use. Oh, and with VoiceOver on my Mac, there are sometimes things on the screen that VO doesn’t read, such as error messages; with Seeing AI, I think I’ll be able to troubleshoot those issues when they pop up now. So cool!
  2. You’re not the first person to say that some of the recognition is a little wonky (girls being boys, glasses where none exist, wrong hair colour), but I guess that’s where the words “ongoing research project” come in. I’m just amazed that we’re at a point where we can even try, and that much of it is actually working as intended with minimal effort.

    Meep meep.

  3. I have the app now and I can’t stop taking pictures of stuff. I’ve done myself (who apparently looks 41 if I don’t smile but 35 if I do), shots out some windows (from the kitchen, where it spotted a group of people I had no idea were down in the parking lot, and off the balcony, where it actually picked out the road close by as probably a road), and stuff in the house (it’s pretty decent at knowing what furniture is, although it can’t seem to tell a loveseat from a couch; they’re all sofas). I’ve had no luck with the barcode bit, possibly because Canadian barcodes and American ones are different, although it should have a database for all of its supported countries, so maybe I’m just not good with that bit yet. I scanned some envelopes that I still need to put through the real scanner and it was a bit hit and miss, but I also wasn’t fair to it. I aimed it at the whole stack and it still managed to give me just enough to know that the one on top was a bank statement, so maybe once I actually try it for serious it’ll do really well.

    Pretty cool app for sure.

  4. Yeah even here in the office I’ve been randomly pointing it at stuff to see what happens. So far I’ve been a woman and a man who is 22, my hair has been blond and brown, and I’ve had imaginary glasses…but still. I’m terrible…I can’t hold a smile long enough. I might set it to the task of identifying gift cards lol. That is probably setting it up for failure, but hey.

  5. I hadn’t thought to take a picture of myself haha! I tried taking a picture in the waiting room at physical therapy to see if the person flipping magazine pages was a man or woman, but the app got hung up and the request timed out. Darn. I get bad reception there. Guess I’m gonna have to take some selfies and see how old I look. Ha!

  6. I have only one question. It says it recognizes faces. So, since it’s Microsoft and Microsoft knows everything, is it going to do that thing our phones have started doing where it will, say, take a picture of Carin and then say “This is a picture of Maybe Carin”? Like, how much does it recognize? Does it just recognize that this thing is a face and not an air conditioner? Or does it recognize who the face might belong to? Cuz I wouldn’t put that past them. Otherwise, though, that sounds interesting. I could see how this would actually come in handy in my line of work, as I work with so many non-verbal people. I wonder if I could take a picture of, say, Kayl, and it would then tell me that he was sitting in the circle smiling. That would be useful stuff to know, since some people don’t remember that they need to give me feedback.

    1. You have to specifically train it to recognize certain faces. It doesn’t just do that on its own. I haven’t messed with that bit yet, but I think Carin did unless I’m misremembering something.

      1. Yeah, you have to take a picture of someone’s face 3 times and then name them. It does tell you when you’ve got what it thinks is a face. I trained it to know my face, since I was the only willing volunteer. So it might take some work, but I’d love to know how it goes.