Multi-Label Classification of Traffic Scenes

Ivan Sikirić, Karla Brkić, Ivan Horvatin and Siniša Šegvić

Abstract

This work deals with multi-label classification of traffic scene images. We introduce a novel labeling scheme for the traffic scene dataset FM2. Each image in the dataset is assigned up to five labels: settlement, road, tunnel, traffic and overpass. We propose representing the images with (i) bag-of-words and (ii) GIST descriptors. The bag-of-words model detects SIFT features in training images, clusters them to form visual words, and then represents each image as a histogram of visual words. The GIST descriptor, on the other hand, represents an image by capturing perceptual features meaningful to a human observer, such as naturalness, openness and roughness. We compare the two representations by measuring the classification performance of Support Vector Machine and Random Forest classifiers. Labels are assigned by applying binary one-vs-all classifiers trained separately for each class. Categorization success is evaluated over multiple labels using a variety of parameters. We report good classification results for easier class labels (road, F1 = 98%; tunnel, F1 = 94%), and discuss weaker results (overpass, F1 < 50%) that call for the use of more advanced methods.
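The per-label labeling scheme described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the hard nearest-word assignment by squared Euclidean distance and the L1-normalized histogram are assumptions made for illustration. The codebook is taken as given (in the paper it comes from clustering SIFT descriptors of training images), and each trained SVM or Random Forest is abstracted as a hypothetical plug-in callable that returns 0 or 1.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Map local descriptors (e.g. SIFT) to a histogram of visual words.

    Assumption: hard assignment to the nearest codebook entry by squared
    Euclidean distance, followed by L1 normalization.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def binary_targets(label_sets: Sequence[Set[str]], label: str) -> List[int]:
    """One-vs-all targets for one label: 1 if the image carries it, else 0."""
    return [1 if label in s else 0 for s in label_sets]


def predict_label_sets(
    features: Sequence[Vector],
    classifiers: Dict[str, Callable[[Vector], int]],
) -> List[Set[str]]:
    """Apply every per-label binary classifier independently.

    `classifiers` maps a label to any trained 0/1 predictor (an SVM or a
    Random Forest in the paper; hypothetical callables here). An image may
    receive any subset of the labels, including none.
    """
    return [{lab for lab, clf in classifiers.items() if clf(x) == 1} for x in features]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class, 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]  # toy two-word vocabulary
    print(bow_histogram([[0.1, 0.0], [0.9, 1.2], [1.0, 0.8]], codebook))
    # prints [0.3333333333333333, 0.6666666666666666]

    truth = [{"road", "tunnel"}, {"road"}, {"road", "overpass"}, set()]
    toy = {"tunnel": lambda x: 1 if x[0] > 0.5 else 0}  # hypothetical classifier
    pred = predict_label_sets([[0.9], [0.7], [0.1], [0.2]], toy)
    print(f1_score(binary_targets(truth, "tunnel"), binary_targets(pred, "tunnel")))
    # prints 0.6666666666666666
```

Because every label has its own binary classifier, an image can receive any subset of the five labels, and computing F1 separately for each label yields one score per class.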

DOI

10.20532/ccvw.2014.0011

https://doi.org/10.20532/ccvw.2014.0011

BibTeX

@InProceedings{10.20532/ccvw.2014.0011,
  author =       {Ivan Sikiri{\' c} and Karla Brki{\' c} and Ivan
                  Horvatin and Sini{\v s}a {\v S}egvi{\' c}},
  title =        {Multi-Label Classification of Traffic Scenes},
  booktitle =    {Proceedings of the Croatian Computer Vision Workshop,
                  Year 2},
  pages =        {9--14},
  year =         2014,
  editor =       {Lon{\v c}ari{\' c}, Sven and Suba{\v s}i{\' c},
                  Marko},
  address =      {Zagreb},
  month =        {September},
  organization = {Center of Excellence for Computer Vision},
  publisher =    {University of Zagreb},
  abstract =     {This work deals with multi-label classification of
                  traffic scene images. We introduce a novel labeling
                  scheme for the traffic scene dataset FM2. Each image
                  in the dataset is assigned up to five labels:
                  settlement, road, tunnel, traffic and overpass.  We
                  propose representing the images with (i)
                  bag-of-words and (ii) GIST descriptors. The
                  bag-of-words model detects SIFT features in training
                  images, clusters them to form visual words, and then
                  represents each image as a histogram of visual
                  words. On the other hand, the GIST descriptor
                  represents an image by capturing perceptual features
                  meaningful to a human observer, such as naturalness,
                  openness, roughness, etc. We compare the two
                  representations by measuring classification
                  performance of Support Vector Machine and Random
                  Forest classifiers. Labels are assigned by applying
                  binary one-vs-all classifiers trained separately for
                  each class. Categorization success is evaluated over
                  multiple labels using a variety of parameters. We
                  report good classification results for easier class
                  labels (road, $F1 = 98\%$ and tunnel, $F1 = 94\%$),
                  and discuss weaker results (overpass, $F1 < 50\%$)
                  that call for the use of more advanced methods.},
  doi =          {10.20532/ccvw.2014.0011},
  url =          {https://doi.org/10.20532/ccvw.2014.0011}
}