The Camera Versus the Human Eye

Nov 17, 2012

Roger Cicala

This article started after I followed an online discussion about whether a 35mm or a 50mm lens on a full frame camera gives the equivalent field of view to normal human vision. This particular discussion immediately delved into the optical physics of the eye as a camera and lens — an understandable comparison since the eye consists of a front element (the cornea), an aperture ring (the iris and pupil), a lens, and a sensor (the retina).

Despite all the impressive mathematics thrown back and forth regarding the optical physics of the eyeball, the discussion didn’t quite seem to make sense logically, so I did a lot of reading of my own on the topic.

There won’t be any direct benefit from this article that will let you run out and take better photographs, but you might find it interesting. You may also find it incredibly boring, so I’ll give you my conclusion first, in the form of two quotes from Garry Winogrand:

A photograph is the illusion of a literal description of how the camera ‘saw’ a piece of time and space.

Photography is not about the thing photographed. It is about how that thing looks photographed.

Basically, in doing all this research about how the human eye is like a camera, what I really learned is how human vision is not like a photograph. In a way, it explained to me why I so often find a photograph much more beautiful and interesting than I found the actual scene itself.

The Eye as a Camera System

Superficially, it’s pretty logical to compare the eye to a camera. We can measure the front-to-back length of the eye (about 25mm from the cornea to the retina) and the diameter of the pupil (2mm contracted, 7 to 8mm dilated), and calculate lens-like numbers from those measurements.

You’ll find some different numbers quoted for the focal length of the eye, though. Some are from physical measurements of the anatomic structures of the eye, others are from optometric calculations, and some take into account that the lens of the eye and the size of the eye itself change with the contractions of various muscles.

To summarize, though, one commonly quoted focal length of the eye is 17mm (this is calculated from the optometric diopter value). The more commonly accepted value, however, is 22mm to 24mm (calculated from physical refraction in the eye). In certain situations, the focal length may actually be longer.

Since we know the approximate focal length and the diameter of the pupil, it’s relatively easy to calculate the aperture (f-stop) of the eye. Given a 17mm focal length and an 8mm pupil, the eyeball should function as an f/2.1 lens. If we use the 24mm focal length and 8mm pupil, it should be f/3.0 (or about f/3.4 with a 7mm pupil). A number of studies in astronomy have actually measured the f-stop of the human eye, and the measured number comes out to be f/3.2 to f/3.5 (Middleton, 1958).
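The f-stop arithmetic is just focal length divided by pupil diameter. Here is a minimal Python sketch of that division using the figures quoted above; the function name is only illustrative, and it treats the eye as a simple lens, which is a simplification.

```python
def f_number(focal_length_mm: float, pupil_diameter_mm: float) -> float:
    """f-number = focal length / aperture (pupil) diameter."""
    return focal_length_mm / pupil_diameter_mm


# Focal lengths and pupil sizes quoted in the text above.
for focal in (17, 24):
    for pupil in (2, 8):
        print(f"{focal}mm focal length, {pupil}mm pupil: f/{f_number(focal, pupil):.1f}")
```

With the pupil fully dilated to 8mm this gives about f/2.1 at 17mm and f/3.0 at 24mm; a contracted 2mm pupil gives f/8.5 and f/12.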

At this point, both of you who read this far probably have wondered “If the focal length of the eye is 17 or 24mm, why is everyone arguing about whether 35mm or 50mm lenses are the same field of view as the human eye?”

The reason is that the measured focal length of the eye isn’t what determines the angle of view of human vision. I’ll get into this in more detail below, but the main point is that only part of the retina processes the main image we see. (The area of main vision is called the cone of visual attention; the rest of what we see is “peripheral vision”.)

Studies have measured the cone of visual attention and found it to be about 55 degrees wide. On a 35mm full frame camera, a 43mm lens provides an angle of view close to 55 degrees (measured across the frame diagonal), so that focal length provides very nearly the same angle of view that we humans have. Damn if that isn’t halfway between 35mm and 50mm. So the original argument is ended: the actual ‘normal’ lens on a 35mm SLR is neither 35mm nor 50mm. It’s halfway in between.
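For anyone who wants to check the angle-of-view numbers, here is a rough Python sketch. It assumes a rectilinear lens focused at infinity, a nominal 36mm x 24mm full frame, and an angle measured across the frame diagonal; those assumptions are mine, not from the studies mentioned above.

```python
import math


def angle_of_view_deg(focal_length_mm: float, sensor_dimension_mm: float) -> float:
    """Angle of view of a rectilinear lens focused at infinity, in degrees."""
    return math.degrees(2 * math.atan(sensor_dimension_mm / (2 * focal_length_mm)))


# Diagonal of a nominal 36mm x 24mm full frame sensor (about 43.3mm).
FULL_FRAME_DIAGONAL_MM = math.hypot(36, 24)

for focal in (35, 43, 50):
    angle = angle_of_view_deg(focal, FULL_FRAME_DIAGONAL_MM)
    print(f"{focal}mm lens: {angle:.1f} degrees diagonal")
```

Under those assumptions the three lenses come out at roughly 63, 53, and 47 degrees, so the 43mm lens lands closest to the 55 degree cone of visual attention.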

The Eye is Not a Camera System

Having gotten the answer to the original discussion, I could have left things alone and walked away with yet another bit of fairly useless trivia filed away to amaze my online friends with. But NOOoooo. When I have a bunch of work that needs doing, I find I’ll almost always choose to spend another couple of hours reading more articles about human vision.

You may have noticed the above section left out some of the eye-to-camera analogies, because once you get past the simple measurements of aperture and lens, the rest of the comparisons don’t fit so well.

Consider the eye’s sensor, the retina. The retina is almost the same size (about 32mm in diameter) as the sensor on a full frame camera (36mm wide). After that, though, almost everything is different.

The first difference between the retina and your camera’s sensor is rather obvious: the retina is curved along the back surface of the eyeball, not flat like the silicon sensor in the camera. The curvature has an obvious advantage: the edges of the retina are about the same distance from the lens as the center. On a flat sensor the edges are further away from the lens, and the center closer. Advantage retina — it should have better ‘corner sharpness’.

The human eye also has a lot more pixels than your camera, about 130 million pixels (you 24-megapixel camera owners feeling humble now?). However, only about 6 million of the eye’s pixels are cones (which see color), the remaining 124 million just see black and white. But advantage retina again. Big time.

But if we look further, the differences become even more pronounced…

On a camera sensor the pixels are laid out in a regular grid pattern. Every square millimeter of the sensor has exactly the same number and pattern of pixels. On the retina there’s a small central area, about 6mm across (the macula), that contains the densest concentration of photoreceptors in the eye. The central portion of the macula (the fovea) is densely packed with only cone (color sensing) cells. The rest of the macula around this central ‘color only’ area contains both rods and cones.

The macula contains about 150,000 ‘pixels’ in each 1mm square (compare that to about 24,000,000 pixels spread over a 36mm x 24mm sensor in a 5DMkII or D3x, which works out to fewer than 30,000 per square millimeter) and provides our ‘central vision’ (the 55 degree cone of visual attention mentioned above). In other words, the central part of our visual field has far more resolving ability than even the best camera.
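The density comparison is easy to work out. This is a back-of-the-envelope Python sketch using the figures quoted above; it assumes a nominal 36mm x 24mm frame and a round 24 million pixels, and the variable names are just for illustration.

```python
# Figures quoted in the text; the sensor size is a nominal full frame (assumption).
MACULA_RECEPTORS_PER_MM2 = 150_000
SENSOR_PIXELS = 24_000_000
SENSOR_WIDTH_MM, SENSOR_HEIGHT_MM = 36, 24

sensor_area_mm2 = SENSOR_WIDTH_MM * SENSOR_HEIGHT_MM        # 864 square mm
sensor_pixels_per_mm2 = SENSOR_PIXELS / sensor_area_mm2     # pixels per square mm
density_ratio = MACULA_RECEPTORS_PER_MM2 / sensor_pixels_per_mm2

print(f"Sensor: {sensor_pixels_per_mm2:,.0f} pixels per square mm")
print(f"Macula is about {density_ratio:.1f}x denser")
```

So the macula packs in something like five times as many receptors per square millimeter as that sensor does pixels.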

The rest of the retina has far fewer ‘pixels’, most of which are black and white sensing only. It provides what we usually consider ‘peripheral vision’, the things we see “in the corner of our eye”. This part senses moving objects very well, but doesn’t provide enough resolution to read a book, for example.

The total field of view (the area in which we can see movement) of the human eye is 160 degrees, but outside of the cone of visual attention we can’t really recognize detail, only broad shapes and movement.

The advantages of the human eye compared to the camera get reduced a bit as we leave the retina and travel back toward the brain. The camera sends every pixel’s data from the sensor to a computer chip for processing into an image. The eye has 130 million sensors in the retina, but the optic nerve that carries those sensors’ signals to the brain has only 1.2 million fibers, so less than 10% of the retina’s data is passed on to the brain at any given instant. (Partly this is because the chemical light sensors in the retina take a while to ‘recharge’ after being stimulated. Partly because the brain couldn’t process that much information anyway.)
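To put a number on that bottleneck, here is a quick Python sketch using the two counts quoted above. Treating each nerve fiber as a single channel is my simplification for illustration; fiber count is not the same thing as data rate.

```python
# Counts quoted in the text above.
RETINA_RECEPTORS = 130_000_000
OPTIC_NERVE_FIBERS = 1_200_000

fiber_fraction = OPTIC_NERVE_FIBERS / RETINA_RECEPTORS       # fibers per receptor
receptors_per_fiber = RETINA_RECEPTORS / OPTIC_NERVE_FIBERS  # receptors sharing each fiber

print(f"Fibers per receptor: {fiber_fraction:.2%}")
print(f"Receptors per fiber: about {receptors_per_fiber:.0f}")
```

By this crude count there is roughly one fiber for every hundred or so receptors, comfortably under the 10% figure.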

And of course the brain processes the signals a lot differently than a camera does. Unlike the intermittent shutter clicks of a camera, the eye is sending the brain a constant video feed which is being processed into what we see. A subconscious part of the brain (the lateral geniculate nucleus if you must know) compares the signals from both eyes, assembles the most important parts into 3-D images, and sends them on to the conscious part of the brain for image recognition and further processing.

The subconscious brain also sends signals to the eye, moving the eyeball slightly in a scanning pattern so that the sharp vision of the macula moves across an object of interest. Over a few split seconds the eye actually sends multiple images, and the brain processes them into a more complete and detailed image.

The subconscious brain also rejects a lot of the incoming bandwidth, sending only a small fraction of its data on to the conscious brain. You can control this to some extent: for example, right now your conscious brain is telling the lateral geniculate nucleus “send me information from the central vision only, focus on those typed words in the center of the field of vision, move from left to right so I can read them”. Stop reading for a second and without moving your eyes try to see what’s in your peripheral field of view. A second ago you didn’t “see” that object to the right or left of the computer monitor because the peripheral vision wasn’t getting passed on to the conscious brain.

If you concentrate, even without moving your eyes, you can at least tell the object is there. If you want to see it clearly, though, you’ll have to send another brain signal to the eye, shifting the cone of visual attention over to that object. Notice also that you can’t both read the text and see the peripheral objects — the brain can’t process that much data.

The brain isn’t done when the image has reached the conscious part (called the visual cortex). This area connects strongly with the memory portions of the brain, allowing you to ‘recognize’ objects in the image. We’ve all experienced that moment when we see something, but don’t recognize what it is for a second or two. After we’ve recognized it, we wonder why in the world it wasn’t obvious immediately. It’s because it took the brain a split second to access the memory files for image recognition. (If you haven’t experienced this yet, just wait a few years. You will.)

In reality (and this is very obvious) human vision is video, not photography. Even when staring at a photograph, the brain is taking multiple ‘snapshots’ as it moves the center of focus over the picture, stacking and assembling them into the final image we perceive. Look at a photograph for a few minutes and you’ll realize that subconsciously your eye has drifted over the picture, getting an overview of the image, focusing in on details here and there and, after a few seconds, realizing some things about it that weren’t obvious at first glance.

So What’s the Point?

Well, I have some observations, although they’re far away from “which lens has the field of view most similar to human vision?”. This information got me thinking about what makes me so fascinated by some photographs, and not so much by others. I don’t know that any of these observations are true, but they’re interesting thoughts (to me at least). All of them are based on one fact: when I really like a photograph, I spend a minute or two looking at it, letting my human vision scan it, grabbing the detail from it or perhaps wondering about the detail that’s not visible.

Photographs taken at a ‘normal’ angle of view (35mm to 50mm) seem to retain their appeal whatever their size. Even web-sized images shot at this focal length keep the essence of the shot. The shot below (taken at 35mm) has a lot more detail when seen in a large image, but the essence is obvious even when small. Perhaps the brain’s processing is more comfortable recognizing an image it sees at its normal field of view. Perhaps it’s because we photographers tend to subconsciously emphasize composition and subjects in a ‘normal’ angle-of-view photograph.

The photo above demonstrates something else I’ve always wondered about: does our fascination and love for black and white photography occur because it’s one of the few ways the dense cone (color only) receptors in our macula are forced to send a grayscale image to our brain?

Perhaps our brain likes looking at just tone and texture, without color data clogging up that narrow bandwidth between eyeball and brain.

Like ‘normal-angle’ shots, telephoto and macro shots often look great in small prints or web-sized JPGs. I have an 8 × 10 of an elephant’s eye and a similar-sized macro print of a spider on my office wall that even from across the room look great. (At least they look great to me, but you’ll notice that they’re hanging in my office. I’ve hung them in a couple of other places in the house and have been tactfully told that “they really don’t go with the living room furniture”, so maybe they don’t look so great to everyone.)

There’s no great composition or other factors to make those photos attractive to me, but I find them fascinating anyway. Perhaps because even at a small size, my human vision can see details in the photograph that I never could see looking at an elephant or spider with the ‘naked eye’.

On the other hand, when I get a good wide angle or scenic shot I hardly even bother to post a web-sized graphic or make a small print (and I’m not going to start for this article). I want it printed BIG. I think perhaps so that my human vision can scan through the image picking out the little details that are completely lost when it’s downsized. And every time I do make a big print, even of a scene I’ve been to a dozen times, I notice things in the photograph I never saw when I was there in person.

Perhaps the ‘video’ my brain is making while scanning the print provides much more detail, and I find that more pleasing than what the photo gives me when it’s printed small (or what I saw when I was actually at the scene).

And perhaps the subconscious ‘scanning’ that my vision makes across a photograph accounts for why things like the ‘rule of thirds’ and selective focus pull my eye to certain parts of the photograph. Maybe we photographers simply figured out how the brain processes images and took advantage of it through practical experience, without knowing all the science involved.

But I guess my only real conclusion is this: a photograph is NOT exactly what my eye and brain saw at the scene. When I get a good shot, it’s something different and something better, like what Winogrand said in the two quotes above, and in this quote too: