Google’s Clips AI Camera Was Trained in Photography by Pro Photographers

Jan 27, 2018

Michael Zhang

In October 2017, Google announced Clips, a small, hands-free, AI-powered camera designed to capture your life’s memories without much human intervention. The camera isn’t on store shelves yet, but Google is revealing some interesting new details about it. One such detail is that the camera was trained with the help of professional photographers.

Clips uses artificial intelligence to automatically capture memorable moments in your life, so Google needed to teach it to recognize photos worth keeping while ignoring throwaway snapshots. The goal is to let people enjoy moments more while trusting Google Clips to preserve memories for them, instead of being so absorbed in capturing photos that they miss out on experiences.

“This year, people will take about a trillion photos, and for many of us, that means a digital photo gallery filled with images that we won’t actually look at,” Google writes. “This is especially true with new parents, whose day-to-day experience is full of firsts.

“During moments that can feel precious and fleeting, users are drawn to their smartphone cameras in hopes of capturing and preserving memories for their future selves. As a result, they often end up viewing the world through a tiny screen instead of interacting using all their senses.”

To train Clips in photography, Google turned to a strategy called “human-centered machine learning.” The company put out ambiguous job listings for photographers and ended up hiring a documentary photographer, a photojournalist, and a fine arts photographer. The group then began gathering footage from Clips team members and attempted to answer the question: “What makes a memorable moment?”

“We had romanticized conversations about depth of field, rule of thirds, dramatic lighting, match cuts, storytelling … but what I learned was that we should never underestimate the profound human capability to wield common sense,” writes Clips UX designer Josh Lovejoy. “Basically, we’re trying to teach English to a two-year-old by reading Shakespeare instead of Go, Dog. Go!.”

To establish and improve a baseline of quality, Google trained its AI on what bad photos look like: objects blocking the shot, blurry images, photos taken from inside pockets and purses, and so on.

Google then trained the AI on things like composition (e.g. stability, sharpness, and framing), recognizing social norms (detecting social cues and people who are consenting to photos), editing (picking out special photos instead of mundane ones), and more.

“Success with Clips isn’t just about keeps, deletes, clicks, and edits (though those are important), it’s about authorship, co-learning, and adaptation over time,” Google says. “We really hope users go out and play with it.”

Google Clips will cost $249 and will be available soon through the Google website (where you can currently put your email on the waitlist).