2014 IEEE 22nd Signal Processing and Communications Applications Conference (SIU), Trabzon, Turkey
SINGLE IMAGE SUPER RESOLUTION BASED ON SPARSE REPRESENTATION VIA DIRECTIONALLY STRUCTURED DICTIONARIES
Fahime Farhadifard, Elham Abar, Mahmoud Nazzal, Hüseyin Ozkaramanlı
Electrical and Electronic Engineering Dep.
EMU University
ABSTRACT
This paper introduces a single-image super-resolution algorithm based on selective sparse coding over several directionally structured learned dictionaries. The sparse code of a high-resolution (HR) image patch over an HR dictionary is assumed to be identical to that of the corresponding low-resolution (LR) patch over a coupled LR dictionary. The training patches are clustered by measuring the similarity between each patch and a number of directional template sets, where each template set characterizes variations possessing a specific directional structure. For each cluster, a pair of directionally structured dictionaries is learned, one for each resolution level. An analogous clustering is performed in the reconstruction phase: each LR image patch is assigned to a specific cluster based on its directional structure. This decision allows for selective sparse coding of image patches, with improved representation quality and reduced computational complexity [1]. With appropriate sparse model selection, the proposed algorithm is shown to outperform a leading super-resolution algorithm that uses a pair of universal dictionaries. Simulations validate this result both visually and quantitatively, with an average improvement of 0.2 dB in PSNR over the Kodak set and some benchmark images.
Keywords: super resolution, structurally directional dictionary.
1. INTRODUCTION
Due to the high demand for HR images in many applications, the problem of super-resolution has emerged as an active area of research. Although an HR image can be obtained using high-end cameras, such cameras are costly and impractical to install for some applications; moreover, the resolution of the acquired images is insufficient in some cases, such as medical imaging, satellite imaging and computer vision.
The most recent and successful approach, which overcomes the above difficulties, is to use sparse representation to enhance the quality of an image. According to the Sparseland model [2], a signal can be effectively approximated as a linear combination of a few prototype signals. Such signals form the columns of a dictionary, which is typically obtained by learning over natural image patches. In this approach, sparsity is effectively used as a regularizer for the ill-posed super-resolution problem. The key idea behind such regularization is the assumption that the linear relationships among HR images are preserved in their LR projections [3].
Algorithms for single-image super-resolution via sparse representation were proposed by Yang et al. [4,5] and Zeyde et al. [6]. The algorithm of [4] starts from a set of HR images; the corresponding LR images are obtained by applying blurring and downsampling operators. The mean value of each patch is subtracted, and the patches are then used to learn a dictionary pair: one HR and one LR dictionary. The main assumption behind these algorithms is that HR and LR patches share the same sparse representation. Finally, each HR patch is reconstructed using the HR dictionary together with the sparse representation of its corresponding LR patch.
In [6], the main focus is on super-resolving the high-frequency (HF) components of an image, the most difficult part of the task and the one where bicubic interpolation fails. In this algorithm, both the HR and LR dictionaries are dedicated to representing the HF components of image patches. The K-SVD algorithm [7] is used for dictionary learning. In the reconstruction stage, the sparse representations of the LR patches are computed with the OMP algorithm [8] and are then used together with the HR dictionary to recover the HR patches. The image formed by combining these patches is added to the mid-resolution image (the bicubic-interpolated version of the LR image) to produce the HR output.
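As an illustration of this pipeline stage, the HF residual that the HR dictionary is trained on can be sketched as follows. This is a minimal numpy sketch under stated assumptions: block averaging and pixel replication stand in for the blurring/downsampling and bicubic interpolation operators used in the paper.

```python
import numpy as np

def high_freq_residual(hr, scale=2):
    """Sketch of HF extraction: downscale, upscale back to a
    'mid-resolution' image, and return the residual HR - mid."""
    h, w = hr.shape
    # Downsample by block averaging (a crude stand-in for the paper's
    # blur + downsample operator).
    lr = hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    # Upsample back by pixel replication (standing in for bicubic
    # interpolation); this is the "mid-resolution" image.
    mid = np.kron(lr, np.ones((scale, scale)))
    # The HF component that the HR dictionary is trained to represent.
    return hr - mid
```

A flat image has no HF content, so its residual is identically zero, while edges and texture survive in the residual.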
In this paper, we adopt Yang et al.'s assumption that the HR and LR patches approximately share the same sparse coefficients, and extend it by designing directionally structured dictionaries and applying them to the super-resolution process. A structural classification of image patches based on directional structure is defined to capture information about edges, which are the most difficult parts of an image to super-resolve. Eight HR-LR dictionary pairs are then designed, one for each structural cluster. In addition, a dictionary pair is reserved for non-directional patches. The LR input image patches are super-resolved using the most appropriate HR dictionary, chosen according to the representation error over all LR dictionaries.
The remainder of this paper is organized as follows. The super-resolution via sparse representation approach is reviewed in Section 2.
Section 3 presents the proposed super-resolution algorithm. Quantitative and qualitative simulations are provided in Section 4, with the conclusion in Section 5.
2. SUPER RESOLUTION VIA SPARSE REPRESENTATION
Obtaining an HR image from a single LR image is known as "single-image super-resolution (SISR)". The LR image is a version of the HR image that has lost most of its high-frequency information during acquisition, transmission or storage. To solve this problem, which admits many solutions, two constraints are assumed: (i) a reconstruction constraint: according to the image observation model, the reconstructed image should be consistent with the observed LR image; and (ii) a sparsity prior: the patches of the HR image can be sparsely represented over an over-complete dictionary, and they can be recovered from their LR versions.
To be more precise, let the LR image $Y$ be a downsampled and blurred version of the HR image $X$, and assume that there is an HR over-complete dictionary $D_h \in \mathbb{R}^{n \times K}$, a large matrix learned from HR image patches, over which the vectorized patches of $X$, denoted by $x$, can be sparsely represented. Each patch can therefore be written as $x = D_h \alpha$, where $\alpha$ is a vector with few nonzero elements ($\|\alpha\|_0 \ll K$). The relationship between an HR patch and its LR counterpart $y$ can be expressed as

$$y = S H x = L x \quad (1)$$

where $S$ is a downsampling operator, $H$ is a blurring filter and $L = SH$ denotes their combined effect. Substituting the representation of $x$ into (1) and noting that $D_l = L D_h$, one obtains

$$y = L D_h \alpha = D_l \alpha. \quad (2)$$

An implication of (2) is that $y$ will also have the same sparse representation coefficients $\alpha$ over the LR dictionary $D_l$. Now, given the LR patches, one can obtain the representation coefficients using a vector selection method such as OMP. After obtaining the sparse representation coefficients, one can reconstruct the HR patch as

$$\hat{x} = D_h \alpha. \quad (3)$$
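The shared-coefficient reconstruction of (1)-(3) can be sketched as follows. This is a minimal illustration: the OMP routine is a textbook greedy variant, not the exact implementation of [8].

```python
import numpy as np

def omp(D, y, sparsity):
    """Minimal Orthogonal Matching Pursuit: greedily pick `sparsity`
    atoms of D, refitting the coefficients by least squares."""
    residual, support = y.astype(float).copy(), []
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares fit on the selected support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def reconstruct_hr_patch(D_l, D_h, y_l, sparsity=3):
    """Code the LR patch over D_l, then reuse the same coefficients
    with the coupled HR dictionary D_h, as in Eq. (3)."""
    alpha = omp(D_l, y_l, sparsity)
    return D_h @ alpha
```

With coupled dictionaries, the HR patch is recovered without ever coding at high resolution; only the LR sparse code is computed.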
3. THE PROPOSED SUPER RESOLUTION APPROACH
The proposed algorithm consists of two main phases, training and reconstruction, as summarized in Figures 1 and 2. During the
training phase, several sets of directionally structured HR and LR
dictionary pairs are designed. These dictionaries are then used to
reconstruct HR images during the reconstruction phase.
3.1. Training Phase
This phase starts by collecting a set of HR images. LR versions of these images are then constructed using blurring and downsampling operators. To match the HR image dimensions, the LR images are scaled up to the size of the HR images via bicubic interpolation; the results are termed mid-resolution images. This scaling mainly serves to simplify the coding step. The main concern in this phase is to learn dictionaries that accurately represent the edge and texture content of an image, since edge and texture regions constitute the HF components of an image.
Following this idea, in order to learn the HR dictionaries specifically from the HF components, the mid-resolution images are subtracted from the HR images. Local patches are then extracted and vectorized to form the HR training set. To ensure that the reconstructed HR patch is the best estimate, the calculated coefficients must fit the most relevant part of the LR images. Thus, two-dimensional (2D) Laplacian and gradient high-pass filters are employed to remove the low-frequency content of the LR images, as done in the approaches of [4-6]. This choice is reasonable because the human visual system is sensitive to the HF components of an image. In the same way as for the HR training patches, local mid-resolution patches (features) are extracted and vectorized to form the LR training set.
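The feature-extraction step can be sketched as follows. The exact gradient and Laplacian kernels are not specified in the text, so the ones below are assumptions in the style of [4-6].

```python
import numpy as np

# Assumed first- and second-derivative kernels (not given in the paper).
GRAD = np.array([1.0, 0.0, -1.0])
LAPL = np.array([1.0, 0.0, -2.0, 0.0, 1.0]) / 2.0

def _filter_along(img, kernel, axis):
    # 1-D convolution of every row/column with the kernel (zero-padded).
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def lr_features(mid):
    """Stack the four high-pass responses of a mid-resolution image;
    LR training vectors are patches drawn from these feature maps."""
    return np.stack([
        _filter_along(mid, GRAD, axis=1),   # horizontal gradient
        _filter_along(mid, GRAD, axis=0),   # vertical gradient
        _filter_along(mid, LAPL, axis=1),   # horizontal Laplacian
        _filter_along(mid, LAPL, axis=0),   # vertical Laplacian
    ])
```

Both kernels sum to zero, so flat (low-frequency) regions produce near-zero features and only edge-like content survives.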
Instead of using a single pair of HR and LR dictionaries, we propose the use of several directionally structured dictionary pairs. In this work, the directional patterns of the 2D space are accounted for by sampling this space into eight directional orientations. Each orientation is represented by a set of directional templates aligned in a certain direction, so the angular separation between two successive directions is 180°/8 = 22.5°. The role of these directional patterns is to cluster all the training patches according to their directional structure. Along with these directional clusters, a non-directional cluster is also used to hold patches that are not aligned with any of the aforementioned orientations. These are the patches whose correlation with the defined directional templates falls below a specific threshold, set empirically from a correlation histogram.
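The template-based clustering can be sketched as follows. The template construction (oriented Gaussian ridges) and the threshold value are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

def make_templates(patch=6, n_dir=8):
    """Hypothetical directional templates: one oriented ridge per
    angle, sampling [0, 180) degrees in steps of 180/n_dir."""
    ys, xs = np.mgrid[:patch, :patch] - (patch - 1) / 2.0
    temps = []
    for k in range(n_dir):
        theta = np.pi * k / n_dir
        d = xs * np.sin(theta) - ys * np.cos(theta)  # distance to the oriented line
        t = np.exp(-d ** 2).ravel()
        t -= t.mean()                                # zero-mean template
        temps.append(t / np.linalg.norm(t))          # unit norm
    return np.array(temps)                           # shape (n_dir, patch*patch)

def cluster_patch(vec, templates, threshold=0.3):
    """Assign a patch to the best-correlated direction, or to the
    non-directional cluster (index -1) when no correlation exceeds
    the (empirically chosen) threshold."""
    v = vec - vec.mean()
    n = np.linalg.norm(v)
    if n == 0:
        return -1                                    # flat patch: non-directional
    corr = templates @ (v / n)
    k = int(np.argmax(np.abs(corr)))
    return k if np.abs(corr[k]) >= threshold else -1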
After clustering, and before dictionary learning, Principal Component Analysis (PCA) is applied to the clustered patches as a dimensionality reduction tool. PCA has been shown to significantly reduce the dimensionality of the training patches while preserving most of their energy, and therefore to reduce the computational complexity of dictionary learning. The patches of each cluster are then fed to the K-SVD algorithm to train an LR dictionary. Moreover, the sparse representation coefficients of these LR patches, together with the corresponding HR patches, are used to calculate the corresponding HR dictionary. As outlined in [6], the HR dictionary of cluster $i$ is obtained as the least-squares solution

$$D_h^i = P_h^i (A^i)^T \bigl(A^i (A^i)^T\bigr)^{-1} \quad (4)$$

where $P_h^i$ is the matrix of HR training patches and $A^i$ is the coefficient matrix containing all the sparse coefficient vectors obtained over the LR dictionary of cluster $i$.
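A minimal sketch of the closed-form HR dictionary computation in (4), using the pseudo-inverse of the coefficient matrix (the symbol names are those of the sketch, not the paper):

```python
import numpy as np

def hr_dictionary(P_h, A):
    """P_h: HR training patches (n_h x N); A: sparse codes of the
    coupled LR patches (K x N). Returns the n_h x K HR dictionary.
    For a full-row-rank A, pinv(A) = A^T (A A^T)^{-1}, so this is
    exactly the closed-form least-squares solution of Eq. (4)."""
    return P_h @ np.linalg.pinv(A)
```

If the HR patches were generated exactly as D A, this recovers D, since A pinv(A) is the identity when A has full row rank.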
The main steps of the proposed directionally structured dictionary
learning phase are outlined in Algorithm 1.
Algorithm 1: The training phase
Input: a set of HR training images.
1. Obtain the LR images.
2. Extract patches and features.
3. Design eight directional templates.
4. Cluster the patches and features into eight directional clusters and one non-directional cluster.
5. Employ the K-SVD algorithm to learn the corresponding LR dictionaries.
6. Calculate the HR dictionaries.
Output: HR and LR directionally structured dictionaries.
Fig 1: Training phase of the proposed algorithm.
3.2. Reconstruction Phase
The reconstruction phase adopts the same rationale as done in the
training phase. Given a LR image to be reconstructed, it first needs
to be rescaled to the same size as the HR image. This is done by
bicubic interpolation. Using this so called mid-resolution image, it
is first filtered to extract the meaningful features in exactly the
1719
2014 IEEE 22nd Signal Processing and Communications Applications Conference (SIU 2014)
same way as done in the learning phase. Then, features (being
patches of an image interpolated to the suitable mid-resolution
level) are extracted to be recovered using the most suitable
dictionary.
Each LR feature is sparsely coded over all the LR dictionaries, and the representation error between the reconstructed feature and the original one is calculated. This error is adopted as the clustering criterion in this work: the LR dictionary with the smallest representation error determines which cluster the patch belongs to, and thus which dictionary pair is used for the sparse reconstruction of that particular patch. The rationale of the testing phase is outlined in Algorithm 2.
Algorithm 2: The testing phase
Input: an LR image and nine pairs of dictionaries.
1. Extract features.
2. Select a dictionary via the model-selection criterion.
3. Sparsely code each patch over the selected LR dictionary.
4. Reconstruct the HR patches.
Output: combine the HR patches and add the resulting image to the mid-resolution one to construct the HR image.
Fig 2: Reconstruction phase of the proposed algorithm.
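The error-based dictionary selection of Section 3.2 can be sketched as follows. This is a minimal illustration; the OMP routine is a textbook greedy variant, not the implementation of [8].

```python
import numpy as np

def omp(D, y, sparsity):
    # Minimal OMP sketch: greedily pick atoms, refit by least squares.
    residual, support = y.astype(float).copy(), []
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def select_and_reconstruct(y_l, lr_dicts, hr_dicts, sparsity=2):
    """Code the LR feature over every cluster's LR dictionary and
    keep the pair with the smallest representation error (the
    clustering criterion), then reconstruct with the HR mate."""
    best = min(range(len(lr_dicts)),
               key=lambda k: np.linalg.norm(
                   y_l - lr_dicts[k] @ omp(lr_dicts[k], y_l, sparsity)))
    alpha = omp(lr_dicts[best], y_l, sparsity)
    return hr_dicts[best] @ alpha, best
```

The dictionary that represents the feature with the least error decides the cluster, so no HR-side information is needed at test time.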
4. SIMULATION AND RESULTS
This section presents experiments examining the performance of the proposed algorithm compared with that of the baseline algorithm of Zeyde et al. [6] and bicubic interpolation. The algorithm of [6] is referred to as the baseline algorithm throughout this paper. Tests are conducted on the images of the Kodak set along with some other benchmark images. The results also include the performance of the proposed algorithm when it is equipped with correct model selection.
4.1. Quantitative Experimentation (PSNR)
Here, test images are blurred and downsampled to a quarter of their original size to obtain their LR versions. The LR versions are then scaled up to their original dimensions via the three aforementioned approaches. The Peak Signal-to-Noise Ratio (PSNR), defined in (5), is used for quantitative performance evaluation:

$$\mathrm{PSNR}(z, \hat{z}) = 10 \log_{10} \frac{255^2}{\mathrm{MSE}(z, \hat{z})} \quad (5)$$

where $z$ is the true image of size $M \times N$, $\hat{z}$ is its estimate, and $\mathrm{MSE}(z, \hat{z})$ denotes the mean squared error between $z$ and $\hat{z}$, defined as

$$\mathrm{MSE}(z, \hat{z}) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl(z(i,j) - \hat{z}(i,j)\bigr)^2. \quad (6)$$
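The PSNR measure of (5)-(6) can be computed as follows (assuming 8-bit images, so the peak value is 255):

```python
import numpy as np

def psnr(z, z_hat, peak=255.0):
    """PSNR in dB per Eqs. (5)-(6): 10 log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(z, float) - np.asarray(z_hat, float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```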
Table 1 lists the PSNR values of the conducted simulations. In these simulations, the proposed algorithm uses a sparsity level of 3 and 3×3 patches for the LR images (6×6 for the HR images), so that the trained HR dictionaries have dimension 36×130. Twenty K-SVD iterations are used to train the LR dictionaries. The baseline method is used with the same configuration, and its dictionary dimension is 9×(36×130).
According to Table 1, the proposed algorithm has almost the same performance as the baseline algorithm of [6]. It is worth mentioning that choosing the best HR dictionary based on features is not a trivial task, and the correct HR dictionary is not always chosen. With the designed directionally structured dictionaries and a model selection that always chooses the correct HR dictionary, the proposed algorithm achieves an average PSNR improvement of 0.2 dB over the baseline algorithm and 1.5 dB over bicubic interpolation.
From the results in this table, it can be observed that the PSNR improvements are noticeable for images with a directional nature. For example, with correct model selection, the PSNR improvement for the Barbara image is 0.5 dB over the baseline algorithm and 1.18 dB over bicubic interpolation. Likewise, the PSNR improvement for the Zone-plate image, a curvy directional image, is around 1 dB over the baseline algorithm and 1.54 dB over bicubic interpolation.
Table 1: PSNR (dB) comparison, corresponding to Bicubic
interpolation, the baseline algorithm, the proposed algorithm
and the proposed algorithm with correct model selection.
Image | Bicubic interpolation | Baseline algorithm | Proposed algorithm | Proposed algorithm with correct model selection
K.1 26.7 27.85 27.93 28.15
K.2 34.0 35.04 35.02 35.14
K.3 35.0 36.66 36.58 36.66
K.4 34.6 36.03 35.99 36.03
K.5 27.1 28.95 28.95 29.21
K.6 28.3 29.42 29.35 29.66
K.7 34.3 36.33 36.22 36.36
K.8 24.3 25.5 25.46 25.94
K.9 33.1 35.04 35.03 35.16
K.10 32.9 34.75 34.75 34.79
K.11 29.9 31.14 31.13 31.39
K.12 33.6 35.58 35.38 35.58
K.13 24.7 25.54 25.5 25.8
K.14 29.9 31.3 31.25 31.47
K.15 32.9 34.9 34.72 34.9
K.16 32.1 32.84 32.82 33.1
K.17 32.9 34.38 34.38 34.47
K.18 28.8 29.89 29.85 30.07
K.19 28.8 30.04 30.01 30.43
K.20 32.4 34.11 34.04 34.18
K.21 29.3 30.36 30.32 30.65
K.22 31.4 32.59 32.56 32.72
K.23 35.9 37.9 37.98 37.97
K.24 27.6 28.62 28.57 28.82
Baboon 24.9 25.46 25.46 25.77
Barbara 28.0 28.66 28.55 29.14
Foreman 34.1 33.78 33.75 33.97
Face 34.8 35.56 35.54 35.64
Lena 34.7 36.23 36.23 36.29
Man 29.2 30.51 30.48 30.69
Zebra 30.6 33.21 33.12 33.35
Zone-plate 12.7 13.27 13.2 13.97
Elaine 31.1 31.31 31.31 31.32
Average 30.29 31.59 31.56 31.77
Figure 3: Visual comparison of Barbara and Zone-plate; from left to right, the insets correspond to: the original, bicubic interpolation, the baseline method [6], and the proposed method with correct model selection.
4.2. Qualitative Experimentation
Figure 3 presents original scenes from the Barbara and Zone-plate benchmark images, compared with their reconstructions obtained with bicubic interpolation, the baseline algorithm and the proposed algorithm with correct model selection, respectively. Selected zoomed regions are shown to clarify the comparison. For both examples, the reconstruction produced by bicubic interpolation is blurry. The blur is less severe in the reconstruction of the baseline algorithm. However, the proposed algorithm reconstructs sharper edges; in particular, directions are more correctly reconstructed. These visual results are in line with the PSNR results presented in Table 1.
5. CONCLUSION
A single-image super-resolution algorithm based on sparse representation over directionally structured dictionaries has been presented in this paper. The key idea is to classify image patches according to their directional structure and to selectively code each patch using the most appropriate dictionary. In total, eight directional and one non-directional dictionary pairs are used; the K-SVD algorithm is used for dictionary learning, and the sparse representation error is the criterion used for the classification. The effect of dictionary redundancy is studied empirically, and a patch size of 3×3 with a dictionary size of 9×130 is found to be a good compromise between representation quality and computational complexity.
Experimental results indicate the usefulness of the proposed selective sparse coding over directionally structured dictionaries. Compared with bicubic interpolation, the proposed algorithm achieves an average PSNR improvement of 1.3 dB as tested over the Kodak set and some benchmark images. However, it produces results comparable to those of the baseline algorithm of Zeyde et al. [6] when only information from the LR image patches is used in the classification process. When the classification draws on information from the true HR image patches, the proposed algorithm outperforms the baseline algorithm with an average PSNR improvement of 0.2 dB. Visual tests verify this quantitative result. This result suggests the need for a more carefully designed sparse model selection for the testing phase.
6. REFERENCES
[1] M. Elad and I. Yavneh, "A plurality of sparse representations is better than the sparsest one alone," IEEE Transactions on Information Theory, vol. 55, pp. 4701-4714, 2009.
[2] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, pp. 3736-3745, 2006.
[3] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, pp. 1289-1306, 2006.
[4] J. Yang, J. Wright, T. Huang and Y. Ma, "Image super-resolution as sparse representation of raw image patches," in IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008, pp. 1-8.
[5] J. Yang, J. Wright, T. Huang and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, pp. 2861-2873, 2010.
[6] R. Zeyde, M. Elad and M. Protter, "On single image scale-up using sparse representations," in the 7th International Conference on Curves and Surfaces, Avignon, France, 2012, pp. 711-730.
[7] M. Aharon, M. Elad and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 4311-4322, 2006.
[8] Y. C. Pati, R. Rezaiifar and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1993.