Dear friends,
The image below shows two photos of the same gear taken under different conditions. From the point of view of a computer-vision algorithm — as well as the human eye — the imaging setup that produced the picture on the right makes a defect in the gear much easier to spot.
This example illustrates the power of data-centric AI development. If you want to improve a neural network’s performance, often improving the data it analyzes is far quicker and easier than tinkering with its architecture. In this case, adjusting the imaging setup made the difference.
How can you tell whether your imaging setup has room for improvement? If you can look at a physical object from a given angle and spot a defect, but you don’t see it clearly in a photo taken from the same angle, then your imaging setup likely can be improved. Parameters you can control include the following (a brief automated check is sketched after this list):
- Illumination: Is the scene well lit (with diffuse and/or spot lighting) at angles that make the features you want your model to recognize clearly visible? Have you controlled ambient sources, such as windows and reflections, that can vary from shot to shot? Are the resulting images consistent and free of glare?
- Camera position: Make sure the camera is well positioned to capture the relevant features. A defect in, say, a drinking glass or touch screen may be visible from one angle but not from another. And a camera that shakes or moves in response to surrounding vibrations can’t produce consistent images.
- Image resolution: The density of pixels that cover a given area should be high enough to capture the features you need to see.
- Camera parameters: Factors such as focus, contrast, and exposure time can reveal or hide important details. Are the features you aim to detect clearly in focus? Are the contrast and exposure chosen to make them easy to see?
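For readers who like to audit these properties programmatically, here is a minimal sketch (my own illustration, not from the article) of how one might flag blurry, clipped, or low-contrast captures using OpenCV and NumPy. The thresholds are placeholder assumptions that would need tuning for any particular camera and part.

```python
# Hypothetical sketch: flag images that look blurry, over/underexposed, or low-contrast.
# Thresholds below are illustrative guesses, not recommendations from the article.
import cv2
import numpy as np

def check_image_quality(path,
                        blur_threshold=100.0,   # variance of Laplacian; lower means blurrier (assumed value)
                        clip_fraction=0.05,     # max share of pixels allowed at the extremes (assumed value)
                        min_contrast=40.0):     # minimum std. dev. of gray levels (assumed value)
    """Return simple quality flags for one captured image."""
    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Focus: variance of the Laplacian drops when edges are soft.
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # Exposure: large fractions of near-black or near-white pixels suggest clipping.
    dark_fraction = np.mean(gray <= 5)
    bright_fraction = np.mean(gray >= 250)

    # Contrast: standard deviation of gray levels across the frame.
    contrast = gray.std()

    return {
        "sharpness": sharpness,
        "blurry": sharpness < blur_threshold,
        "underexposed": dark_fraction > clip_fraction,
        "overexposed": bright_fraction > clip_fraction,
        "low_contrast": contrast < min_contrast,
    }

# Example usage (paths are hypothetical):
# import glob
# for path in sorted(glob.glob("captures/*.png")):
#     print(path, check_image_quality(path))
```

Running a check like this over every batch of captures is one way to notice when lighting, focus, or camera position has drifted before inconsistent images reach your training set.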
While deep learning has been used successfully with datasets in which the examples vary widely — say, recognizing faces against backgrounds that range from a crowded concert hall to an outdoor campsite — narrowing the data distribution simplifies computer vision problems. For example, if you want to detect diseased plants, deep learning may be your best bet if you have pictures of plants taken at various distances and under various lighting conditions. But if all the pictures are taken at a fixed distance under uniform lighting, the problem becomes much easier. In practical terms, that means the model will be more accurate and/or need a lot fewer examples. With a consistent dataset, I’ve seen neural networks learn to perform valuable tasks with just 50 images per class (even though I would love to have had 5,000!).
Robotics engineers are accustomed to paying attention to the design of imaging systems (as well as audio and other sensor systems). Such attention can also benefit machine learning engineers who want to build practical computer vision systems.
Recently I had the pleasure of writing an article with machine vision guru David Dechow that describes these ideas in greater detail. The article focuses on manufacturing, but the approach it describes applies to many computer vision projects where you can influence the imaging setup. Please take a look!
Keep learning,
Andrew