Segmentation based object categorization

Segmentation based object categorization

The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation.

Applications of Image Segmentation

* Image Compression
** Segment the image into homogeneous components, and use the most suitable compression algorithm for each component to improve compression.
* Medical Diagnosis
** Automatic segmentation of MRI images for identification of cancerous regions.
* Mapping and Measurement
** Automatic analysis of remote sensing data from satellites to identify and measure regions of interest.

egmentation using Normalized Cuts

Graph theoretic formulation

The set of points in an arbitrary feature space can be represented as a weighted undirected complete graph G = (V, E), where the nodes of the graph are the points in the feature space. The weight w_{ij} of an edge (i, j) in E is a function of the similarity between the nodes i and j. In this context, we can formulate the image segmentation problem as a graph partitioning problem that asks for a partition V_1, cdots, V_k of the vertex set V, where, according to some measure, the vertices in any set V_i have high similarity, and the vertices in two different sets V_i, V_j have low similarity.

Normalized Cuts

Let G = (V, E) be a weighted graph. Let A and B be two subsets of vertices.

Let:

w(A, B) = sum limits_{i in A, j in B} w_{ij}

ncut(A, B) = frac{w(A, B)}{w(A, V)} + frac{w(A, B)}{w(B, V)}

nassoc(A, B) = frac{w(A, A)}{w(A, V)} + frac{w(B, B)}{w(B, V)}

In the normalized cuts approach [Jianbo Shi and Jitendra Malik (1997): "Normalized Cuts and Image Segmentation", IEEE Conference on Computer Vision and Pattern Recognition, pp 731-737 ] , for any cut (S, overline{S}) in G, ncut(S, overline{S}) measures the similarity between different parts, and nassoc(S, overline{S}) measures the total similarity of vertices in the same part.

Since ncut(S, overline{S}) = 2 - nassoc(S, overline{S}), a cut (S^{*}, {overline{S^{*}) that minimizes ncut(S, overline{S}) also maximizes nassoc(S, overline{S}).

Computing a cut (S^{*}, {overline{S^{*}) that minimizes ncut(S, overline{S}) is an NP-hard problem. However, we can find in polynomial time a cut (S, overline{S}) of small normalized weight ncut(S, overline{S}) using spectral techniques.

The Ncut Algorithm

Let D be an n imes n diagonal matrix with d on the diagonal, and let W be an n imes n symmetrical matrix with W_{ij} = w_{ij}.

After some algebraic manipulations, we get:

min limits_{(S, overline{S})} ncut(S, overline{S}) = min limits_y frac{y^T (D - W) y}{y^T D y}

subject to the constraints:
* y_i in {1, -b }, for some constant -b
* y^t D 1 = 0

Minimizing frac{y^T (D - W) y}{y^T D y} subject to the constraints above is NP-hard. To make the problem tractable, we relax the constraints on y, and allow it to take real values. The relaxed problem can be solved by solving the generalized eigenvalue problem (D - W)y = lambda D yfor the second smallest generalized eigenvalue.

The partitioning algorithm:
# Given a set of features, set up a weighted graph G = (V, E), compute the weight of each edge, and summarize the information in D and W.
# Solve (D - W)y = lambda D y for eigenvectors with the smallest eigenvalues.
# Use the eigenvector with the smallest eigenvalue to bipartition the graph.
# Decide if the current partition should be subdivided.
# Recursively partition the segmented parts, if necessary.

Example

Figures 1-7 exemplify the Ncut algorithm.

Limitations

Solving a standard eigenvalue problem for all eigenvectors (using the QR algorithm, for instance) takes O(n^3) time. This is impractical for image segmentation applications where n is the number of pixels in the image.

OBJ CUT

OBJ CUT [M. P. Kumar, P. H. S. Torr, and A. Zisserman. Obj cut. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pages 18-25, 2005.] is an efficient method that automatically segments an object. The OBJ CUT method is a generic method, and therefore it is applicable to any object category model.Given an image D containing an instance of a known object category, e.g. cows, the OBJ CUT algorithm computes a segmentation of the object, that is, it infers a set of labels m.

Let m be a set of binary labels, and let Theta be a shape parameter(Theta is a shape prior on the labels from a Layered Pictorial Structure (LPS) model). We define an energy function E(m, Theta) as follows.

E(m, Theta) = sum phi_x(D|m_x) + phi_x(m_x|Theta) + sum Psi_{xy}(m_x, m_y) + phi(D|m_x, m_y) (1)

The term phi_x(D|m_x) + phi_x(m_x|Theta) is called a unary term, and the term Psi_{xy}(m_x, m_y) + phi(D|m_x, m_y) is called a pairwise term.An unary term consists of the likelihood phi_x(D|m_x) based on color, and the unary potential phi_x(m_x|Theta) based on the distance from Theta. A pairwise term consists of a prior Psi_{xy}(m_x, m_y) and a contrast term phi(D|m_x, m_y).

The best labeling m^{*} minimizes sum limits_i w_i E(m, Theta_i), where w_i is the weight of the parameter Theta_i.

m^{*} = arg min limits_m sum limits_i w_i E(m, Theta_i) (2)

The OBJ CUT algorithm

# Given an image D, an object category is chosen, e.g. cows or horses.
# The corresponding LPS model is matched to D to obtain the samples Theta_1, cdots, Theta_s
# The objective function given by equation (2) is determined by computing E(m, Theta_i) and using w_i = g(Theta_i|Z)
# The objective function is minimized using a single MINCUT operation to obtain the segmentation m.

Example

Figures 8-11 exemplify the OBJ CUT algorithm.

Other approaches

* Jigsaw approach [ E. Borenstein, S. Ullman: Class-specic, top-down segmentation. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 109-124, 2002.]
* Image parsing [Z. Tu, X. Chen, A. L. Yuille, S. C. Zhu: Image Parsing: Unifying Segmentation, Detection, and Recognition. Toward Category-Level Object Recognition 2006: 545-576]
* Interleaved segmentation [B. Leibe, A. Leonardis, B. Schiele: An Implicit Shape Model for Combined Object Categorization and Segmentation. Toward Category-Level Object Recognition 2006: 508-524]
* LOCUS [ J. Winn, N. Joijic. Locus: Learning object classes with unsupervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, 2005.]
* LayoutCRF [J. M. Winn, J. Shotton: The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR (1) 2006: 37-44]

References


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • Bag of words model in computer vision — This is an article introducing the Bag of words model (BoW) in computer vision, especially for object categorization. From now, the BoW model refers to the BoW model in computer vision unless explicitly declared.Before introducing the BoW model,… …   Wikipedia

  • Scale-invariant feature transform — Feature detection Output of a typical corner detection algorithm …   Wikipedia

  • Computer vision — is the field concerned with automated imaging and automated computer based processing of images to extract and interpret information. It is the science and technology of machines that see. Here see means the machine is able to extract information …   Wikipedia

  • Speech perception — is the process by which the sounds of language are heard, interpreted and understood. The study of speech perception is closely linked to the fields of phonetics and phonology in linguistics and cognitive psychology and perception in psychology.… …   Wikipedia

  • linguistics — /ling gwis tiks/, n. (used with a sing. v.) the science of language, including phonetics, phonology, morphology, syntax, semantics, pragmatics, and historical linguistics. [1850 55; see LINGUISTIC, ICS] * * * Study of the nature and structure of… …   Universalium

  • Arabic literature — Introduction       the body of written works produced in the Arabic language.       The tradition of Arabic literature stretches back some 16 centuries to unrecorded beginnings in the Arabian Peninsula. At certain points in the development of… …   Universalium

  • Technical features new to Windows Vista — This article is part of a series on Windows Vista New features Overview Technical and core system Security and safety Networking technologies I/O technologies Management and administration Removed features …   Wikipedia

  • Kadir Brady saliency detector — The Kadir Brady saliency detector is designed to detect representative region in images. It performs well in the context of object class recognition. It was invented by http://www.robots.ox.ac.uk/ timork/ Timor Kadir] and [http://www.robots.ox.ac …   Wikipedia

  • Harris affine region detector — In the fields of computer vision and image analysis, the Harris affine region detector belongs to the category of feature detection. Feature detection is a preprocessing step of several algorithms that rely on identifying characteristic points or …   Wikipedia

  • Information society — For other uses, see Information society (disambiguation). The aim of the information society is to gain competitive advantage internationally through using IT in a creative and productive way. An information society is a society in which the… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”