Manufacturers who depend on automated inspection systems for process monitoring and quality control generate large databases of product and defect imagery. As these image repositories grow, the manufacturer’s ability to reuse the data is limited by its sheer volume and by the difficulty of automatically describing imagery for cataloging, searching, and retrieval.
The technique of content-based image retrieval (CBIR) can address the issue of image reuse. CBIR refers to methods used to index and retrieve images from databases based on their pictorial content. Pictorial content is typically defined by a set of features extracted from an image that describe the color, texture, and/or shape of the entire image or of specific image regions. For the manufacturing environment, we contend that image content encapsulates process history and experience, and that it can be used to effectively index this information for search and retrieval to identify and characterize manufacturing issues.
At Oak Ridge National Laboratory, we are developing a CBIR technology called automated image retrieval (AIR), which is based on the premise that manufacturing processes or phenomena that are similar are likely to generate images that are visually similar. This simple concept implies that statistical information about processes associated with imagery can be quickly gathered to locate and solve current manufacturing problems based on image content. Thus, AIR facilitates the retention and retrieval of expert knowledge for diagnosing and controlling processes and improving
AIR in action
Figure 9.1. Examples of defects that occur on integrated-circuit device layers during manufacturing. These include surface and embedded particles, scumming, and pattern problems.
The semiconductor industry relies heavily on automated inspection technologies for monitoring wafer fabrication, using methods such as confocal and optical microscopy, atomic-force microscopy, laser scattering, and scanning electron microscopy (SEM) to generate digital imagery for failure analysis between process steps (see figure 9.1). On average, more than 20,000 images are collected every week at a typical wafer fab. Fabrication engineers store this data in a data management system (DMS) and use it to diagnose and isolate manufacturing problems. The idea is good in principle, but the semiconductor industry currently has no direct means of searching a DMS using image-based queries. A system like AIR is necessary to optimize the usefulness of this data.
The defect mask is typically a binary representation that localizes the defect boundaries within the field of view. The defect mask is used in AIR to generate an extensive description of the defect region and the substrate region. There are currently 60 numerical features measured for the substrate that describe the color, texture, and structure. The defect itself is decomposed into 51 numerical features that describe the color, texture, and shape. The user can select various feature attributes when formulating a query so that, for example, a search can be accomplished to locate one defect shape on another product substrate by ignoring color attributes, which are likely to be highly variable from one process layer or product to the next. The user also can enable or disable other descriptive groups such as texture or shape.
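Enabling or disabling descriptive groups at query time can be pictured as masking columns of the feature vector before a distance is computed. The sketch below is illustrative only: the group names, the six-element toy vectors, and the use of Euclidean distance are assumptions made for the example, not details of the AIR implementation.

```python
import math

# Hypothetical layout: which indices of a feature vector belong to which group.
# AIR measures 60 substrate and 51 defect features; this toy example uses 6.
FEATURE_GROUPS = {
    "color": [0, 1],
    "texture": [2, 3],
    "shape": [4, 5],
}

def masked_distance(a, b, enabled_groups):
    """Euclidean distance over only the feature groups the user enabled."""
    indices = [i for group in enabled_groups for i in FEATURE_GROUPS[group]]
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in indices))

# Two defects with identical texture and shape but different color,
# as might happen for the same defect type on two process layers.
query = [0.9, 0.1, 0.5, 0.5, 0.2, 0.8]
candidate = [0.1, 0.9, 0.5, 0.5, 0.2, 0.8]

full = masked_distance(query, candidate, ["color", "texture", "shape"])
no_color = masked_distance(query, candidate, ["texture", "shape"])
```

With color included the two defects look far apart; with color disabled they are an exact match, which is the behavior the shape-on-another-substrate search relies on.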
Once the images in a database have been represented as features, the feature list is stored in the database and indexed for efficient retrieval. The goal of indexing is to organize the image features so that a ranked list of nearest neighbors can be retrieved without an exhaustive comparison against every record in the database. For AIR, the database is indexed by building a binary decision tree of the image features; for this work we have adapted an approximate-nearest-neighbor (ANN) indexing and search method that builds on kd-tree methods [4]. Whereas an exhaustive nearest-neighbor search of the n vectors (i.e., images) in the database would require O(n) computations, the search time of the ANN method is of order $O((1/\epsilon)^{d/2} \log n)$, where d is the dimension of the feature space, n is the number of data points, and $\epsilon$ is the nearest-neighbor error.
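To make the indexing idea concrete, here is a minimal pure-Python sketch of a kd-tree with an approximate k-nearest-neighbor search. It is a simplified stand-in, not the ANN method adapted for AIR: the function names, the dictionary-based tree, and the pruning rule (skip a subtree when its splitting plane is farther than the current worst match divided by 1 + eps) are assumptions chosen to keep the example short.

```python
import heapq
import random

def build_kdtree(points, depth=0):
    """Recursively split (vector, record_id) pairs on the median of one axis."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_search(node, query, k, eps=0.0, heap=None):
    """Collect up to k matches as (-squared_distance, record_id) in a max-heap.

    eps = 0 gives an exact search; eps > 0 prunes more aggressively and may
    return neighbors up to (1 + eps) times farther than the true ones.
    """
    if heap is None:
        heap = []
    if node is None:
        return heap
    vec, rec_id = node["point"]
    d2 = sq_dist(vec, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d2, rec_id))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, rec_id))
    diff = query[node["axis"]] - vec[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    knn_search(near, query, k, eps, heap)
    # Visit the far side only if it could still hold a sufficiently closer point.
    if len(heap) < k or diff ** 2 * (1 + eps) ** 2 < -heap[0][0]:
        knn_search(far, query, k, eps, heap)
    return heap

# Toy database: 500 random 4-dimensional feature vectors with integer ids.
random.seed(0)
db = [([random.random() for _ in range(4)], i) for i in range(500)]
tree = build_kdtree(db)
query = [0.5, 0.5, 0.5, 0.5]

exact = sorted((-neg_d2, rid) for neg_d2, rid in knn_search(tree, query, k=5))
approx = sorted((-neg_d2, rid) for neg_d2, rid in knn_search(tree, query, k=5, eps=0.5))
brute = sorted((sq_dist(vec, query), rid) for vec, rid in db)[:5]
```

With eps set to 0 the tree search returns the same five records as the brute-force scan while visiting only part of the tree; raising eps trades a bounded loss in accuracy for fewer subtree visits.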
The basic component of the AIR system is the indexing and retrieval engine, which is implemented as a Microsoft dynamic link library (DLL). In addition to the core AIR DLL, the system incorporates an ORACLE database, a set of interface DLLs and executables, and graphical user interfaces. The basic architecture of the AIR system includes a representation of the relational database that stores associated process data along with the image features.
Results from the field
In tests at two semiconductor manufacturing sites, the AIR system demonstrated good indexing and retrieval performance. For a typical database containing 80,000 optical and SEM images, the system is able to retrieve 128 nearest-neighbor images in about 8 s on a 750 MHz Windows PC. The time required to extract 111 defect and substrate features, associate them with the indexing structure, and add them to the database is approximately 0.7 s per image. These indexing and retrieval times are more than adequate for this data environment, and they demonstrate that the AIR system can maintain a sustained data rate of up to 100,000 images per day if required.
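The sustained-rate claim can be checked with simple arithmetic. The short calculation below assumes a single ingest process running continuously for 24 hours, which is an assumption made for illustration; the 0.7 s per image and 20,000 images per week figures come from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ingest_seconds_per_image = 0.7   # feature extraction + indexing + insert
images_per_week = 20_000         # typical wafer fab collection rate

# Upper bound on daily ingest if one process runs around the clock.
max_images_per_day = SECONDS_PER_DAY / ingest_seconds_per_image

# Average daily load implied by the weekly collection rate.
typical_images_per_day = images_per_week / 7

headroom = max_images_per_day / typical_images_per_day
```

Under that assumption the ceiling is roughly 123,000 images per day, which is consistent with the quoted figure of up to 100,000 and far above the roughly 2,900 images per day a typical fab produces.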
Of even greater interest is the system’s ability to predict information about the manufacturing process based on the visual similarity between a query image and retrieved results. This was tested by treating the AIR system as a k-nearest neighbor (k-NN) classifier. For this test we looked at a query image’s nonvisual information, e.g., the process layer or lot number from which a defect came, and attempted to predict this value by voting among the k nearest returns from the database. For process layers, where more than 100 classes were represented, the system classified 70% of queries correctly. The system performed well as a lot predictor, too, classifying 62% correctly across the approximately 1100 lots represented. Although the system would not necessarily be used this way in practice, the test supports our basic premise that similar manufacturing processes generate visually similar imagery.
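The voting step of such a k-NN test can be sketched in a few lines. This is a minimal illustration, assuming the retrieval engine returns (distance, label) pairs and that a simple unweighted majority vote is used; the function name and the layer labels are hypothetical.

```python
from collections import Counter

def knn_vote(neighbors, k):
    """Predict a label by majority vote among the k nearest retrieved records.

    neighbors: list of (distance, label) pairs returned for a query image.
    """
    nearest = sorted(neighbors, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical retrieval results: distance to the query and the process
# layer recorded for each retrieved image.
retrieved = [
    (0.12, "metal-1"),
    (0.15, "metal-1"),
    (0.21, "poly"),
    (0.30, "metal-1"),
    (0.42, "via"),
]

predicted_layer = knn_vote(retrieved, k=3)
```

The predicted label is then compared with the query image's recorded process layer or lot number, and the fraction of matches over many queries gives the classification rate.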
The AIR technology is now being integrated into commercial DMSs for the semiconductor industry, but development activities continue in parallel. For example, including nonvisual characterization data will provide an even stronger association of defect imagery with specific processes. Without the addition of content-based image retrieval, this large repository of semiconductor imagery will remain virtually untapped as a resource for rapidly resolving manufacturing problems. Progress to date, however, gives every indication that CBIR-based technologies will find an important home in the field of automated inspection.
(By Kenneth Tobin and Thomas Karnowski,
Oak Ridge National Laboratory, OEMagazine, 2001, July)