Science Spotlight: Computer Vision

The hallway on the sixth floor of McCardell Bicentennial Hall, home to the computer science department, is lined with posters with titles like “3D Capture of Complex Real-World Images” and “Game Development in Java.” The department proudly displays the accomplishments of past and present majors on its website. Anne Blasiak ’07 is featured in the alumni section for her 2008 National Science Foundation Fellowship award, and David Fouhey ’11 took a trip to Istanbul in 2010 to present his summer research project findings at an international conference.

Less prominent online is mention of the Middlebury Vision Benchmark, an internationally renowned collection of test data for stereovision research that has been built and maintained by Professor of Computer Science and Department Chair Daniel Scharstein.

Initially, Scharstein avoided discussion of the Middlebury Vision Benchmark. He spoke instead about the other two main threads of computer vision research that he and his students have pursued for the past several years: vision-based robot navigation and cell phone navigation.

Eventually, though, Scharstein opened up about the Middlebury Vision Benchmark.

“Actually, the thing that I’m most known for in the computer vision computing community is the Middlebury Benchmark,” he said. “If you just Google ‘Vision at Middlebury’ you’ll get to this collection of test data that I maintain here.”

Scharstein’s primary research interest is in stereovision: the ability to obtain three-dimensional information from two overlapping two-dimensional images. Humans rely on the same process to navigate, and it has proved difficult to replicate in computers. In fact, research on the problem of recreating human vision with a computer has been ongoing since the 1960s and 1970s.

Current methods use an algorithm (an ordered sequence of instructions that a computer follows) to measure how far each pixel shifts between two overlapping images. That shift is used to create a “depth map” of the scene portrayed in the two images. Algorithms can do this with varying degrees of accuracy. The problem is that without benchmark data to serve as a control, researchers cannot be sure how close their algorithm comes to the ground truth, the true answer.
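To make the idea of measuring pixel shift concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by Scharstein or by any algorithm submitted to the benchmark. It assumes a single row of grayscale values from each image, a simple window comparison (sum of absolute differences), and the textbook relation between shift and depth for two parallel cameras. The function names and parameters (`disparity_row`, `depth_from_disparity`, `max_disp`, `half_win`, `focal_px`, `baseline_m`) are hypothetical.

```python
def disparity_row(left, right, max_disp=4, half_win=1):
    """Estimate, for each pixel in one row of the left image, how far it
    has shifted in the right image (its disparity), by block matching."""
    width = len(left)
    result = []
    for x in range(width):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            # Compare a small window around x in the left row with the
            # window shifted d pixels to the left in the right row.
            cost = 0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), width - 1)      # clamp to the row
                xr = min(max(x + k - d, 0), width - 1)  # clamp to the row
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        result.append(best_d)
    return result


def depth_from_disparity(d, focal_px, baseline_m):
    """Assumed parallel-camera relation: depth = focal length * baseline / shift.
    A shift of zero corresponds to a point infinitely far away."""
    return float("inf") if d == 0 else focal_px * baseline_m / d
```

Larger shifts correspond to nearer objects, which is why a map of per-pixel shifts can be read as a depth map. Real stereo algorithms work on full two-dimensional images and use far more robust matching than this sketch.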

Assistant Professor of Computer Science David Kauchak explained that a benchmark data set “allows researchers from around the world to easily and quantitatively compare new algorithms to previous state-of-the-art approaches. In many other fields, where standardized benchmarks don’t exist, this can be a very painful and error-prone process.”

Scharstein was working on the problem with Richard Szeliski (who currently works for Microsoft in Redmond, Washington) in the 1990s. The two of them realized the need for benchmark data in the field of stereovision, so they created the first Middlebury Vision Benchmark in 2003.

“We had the idea to create ground truths for test data using alternative techniques that give us more information than can currently be generated with stereovision algorithms,” said Scharstein. “We basically have more information than all the researchers that we give the test data to. Then they can run our images with their methods, upload their results, and compare their results to other researchers around the world with our database. So the Middlebury Vision Benchmark is basically a performance indicator for researchers in the field.”
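The comparison Scharstein describes can be sketched as a scoring step. The example below is a hedged illustration in Python, not the benchmark’s actual evaluation code. It assumes each result is a flat list of per-pixel shift values, that pixels with no known ground truth are marked `None`, and that a pixel counts as wrong when its error exceeds a chosen threshold. The function names and the default threshold are assumptions made for this example.

```python
import math


def _errors(estimated, ground_truth):
    """Absolute errors at pixels where a ground-truth value is known."""
    errors = [abs(e - g) for e, g in zip(estimated, ground_truth) if g is not None]
    if not errors:
        raise ValueError("no pixels with known ground truth")
    return errors


def bad_pixel_rate(estimated, ground_truth, threshold=1.0):
    """Fraction of scored pixels whose error is larger than `threshold`."""
    errors = _errors(estimated, ground_truth)
    return sum(1 for err in errors if err > threshold) / len(errors)


def rms_error(estimated, ground_truth):
    """Root-mean-square error over the scored pixels."""
    errors = _errors(estimated, ground_truth)
    return math.sqrt(sum(err * err for err in errors) / len(errors))
```

Because every method is scored against the same ground truth with the same measure, two research groups can compare numbers directly; lower values mean a result closer to the true answer.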

In the 10 years that the database has been in existence, it has become the premier benchmark for stereovision research around the world.

“Right now we have 150 methods submitted to our database,” Scharstein said. “Anyone working in this field around the world accesses this database, and they’ve all heard of Middlebury.”

Students have also become involved in this project in recent years. This past summer, Nera Nesic ’13 and Xi Wang ’14 worked with Scharstein to create a new set of ground truths that will be published sometime in the coming year.

“Our goal was to add more realism to our scenes,” said Nesic. “Previous data sets have been built in the lab and can generally be described as an unlikely gathering of visually interesting objects. We wanted to move to environments a stereovision application would be more likely to encounter in the real world.”

Scharstein noted that by setting targets for researchers to strive for, the Middlebury Vision Benchmark is driving stereovision research forward. Ultimately, this project becomes a valuable experience for both professor and student researchers.

According to Scharstein, the process of designing a new dataset “involves a lot of experimentation. The students have really built these systems for me, systems that track pixels or tell the projectors to project patterns, or cameras to take pictures. And what’s unusual and valuable about this is how that student effort is impacting the field of stereovision.”

Listen to Professor Daniel Scharstein discuss his work with the Campus’ Will Henriques.
