Classification of Detectors, Extractors and Matchers

I understand how FAST, SIFT and SURF work, but I can't seem to figure out
which of the above are only detectors and which are also descriptor extractors.

Basically, from that list of feature detectors/extractors (link to articles: FAST, GFTT, SIFT, SURF, MSER, STAR, ORB, BRISK, FREAK, BRIEF), some of them are only feature detectors (FAST, GFTT), others are both feature detectors and descriptor extractors (SIFT, SURF, ORB, BRISK), and some, like FREAK, are only descriptor extractors.

If I remember correctly, BRIEF is only a descriptor extractor, so it needs features detected by some other algorithm like FAST or ORB.

To be sure which is which, you either have to read the article describing the algorithm or browse the OpenCV documentation to see whether it is implemented as a FeatureDetector, a DescriptorExtractor, or both.
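
If you prefer to check it at runtime, here is a minimal sketch (assuming the OpenCV 2.4-style factory API, where asking for a name that is not supported by that interface is expected to yield an empty pointer rather than an error): ask both factories for the same algorithm name and see which ones succeed.

    #include <opencv2/opencv.hpp>
    #include <iostream>

    int main()
    {
        // FAST is registered as a feature detector only; there is no FAST descriptor,
        // so asking the extractor factory for it is expected to return an empty pointer.
        cv::Ptr<cv::FeatureDetector>     asDetector  = cv::FeatureDetector::create("FAST");
        cv::Ptr<cv::DescriptorExtractor> asExtractor = cv::DescriptorExtractor::create("FAST");

        std::cout << "FAST as detector available:  " << !asDetector.empty()  << std::endl;
        std::cout << "FAST as extractor available: " << !asExtractor.empty() << std::endl;

        // ORB implements both interfaces, so both factories should succeed.
        std::cout << "ORB as extractor available:  "
                  << !cv::DescriptorExtractor::create("ORB").empty() << std::endl;
        return 0;
    }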

Q1: How would you classify the types of detectors, extractors and
matchers: based on float and uchar, as mentioned, or by some other
type of classification?

Q2: What is the difference between the float and uchar classification
(or whichever classification is being used)?

Regarding questions 1 and 2, to classify them as float or uchar, the link you already posted is the best reference I know; maybe someone will be able to complete it. In short, float descriptors (e.g. SIFT, SURF) are real-valued vectors that are usually compared with the L2 distance, while uchar descriptors (e.g. ORB, BRIEF, BRISK, FREAK) are binary strings packed into bytes and compared with the Hamming distance.
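
As a small illustration of that float/uchar split (assuming the OpenCV 2.4-style API and an arbitrary test image called test.png), you can inspect the element type of the descriptor matrix each extractor produces:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main()
    {
        cv::Mat img = cv::imread("test.png", CV_LOAD_IMAGE_GRAYSCALE); // any test image

        std::vector<cv::KeyPoint> keypoints;
        cv::FeatureDetector::create("ORB")->detect(img, keypoints);

        cv::Mat orbDescriptors, briefDescriptors;
        cv::DescriptorExtractor::create("ORB")->compute(img, keypoints, orbDescriptors);
        cv::DescriptorExtractor::create("BRIEF")->compute(img, keypoints, briefDescriptors);

        // Binary (uchar) descriptors such as ORB, BRIEF, BRISK and FREAK come out as CV_8U
        // rows and are matched with the Hamming distance; float descriptors such as SIFT
        // and SURF come out as CV_32F rows and are matched with the L2 distance.
        std::cout << "ORB   rows are CV_8U: " << (orbDescriptors.type()   == CV_8U) << std::endl;
        std::cout << "BRIEF rows are CV_8U: " << (briefDescriptors.type() == CV_8U) << std::endl;
        return 0;
    }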

Q3: How do you initialize (in code) the various types of detectors,
extractors and matchers?

Answering question 3, OpenCV made the code for the various types quite similar: mainly, you have to choose one feature detector. Most of the difference is in choosing the type of matcher, and you already mentioned the three that OpenCV has. Your best bet here is to read the documentation, the code samples, and related Stack Overflow questions. Some blog posts are also an excellent source of information, like this series of feature detector benchmarks by Ievgen Khvedchenia (the blog is no longer available, so I had to create a raw text copy from its Google cache).
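
As a minimal sketch of the initialization (assuming the OpenCV 2.4 C++ API, where detectors, extractors and matchers are all created from a string name; SIFT and SURF additionally live in the nonfree module):

    #include <opencv2/opencv.hpp>
    // #include <opencv2/nonfree/nonfree.hpp>  // only needed for SIFT/SURF;
    //                                         // call cv::initModule_nonfree() once at startup

    int main()
    {
        // Detector, descriptor extractor and matcher are all created from a string name.
        cv::Ptr<cv::FeatureDetector>     detector  = cv::FeatureDetector::create("FAST");
        cv::Ptr<cv::DescriptorExtractor> extractor = cv::DescriptorExtractor::create("ORB");

        // Choose a matcher that fits the descriptor type:
        //   "BruteForce"         - L2 distance, for float descriptors (SIFT, SURF)
        //   "BruteForce-Hamming" - Hamming distance, for uchar descriptors (ORB, BRIEF, ...)
        //   "FlannBased"         - approximate nearest-neighbour search
        cv::Ptr<cv::DescriptorMatcher>   matcher   = cv::DescriptorMatcher::create("BruteForce-Hamming");

        return 0;
    }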

Matchers are used to find whether a descriptor is similar to another descriptor from a list. You can either compare your query descriptor with all other descriptors in the list (BruteForce) or use a better heuristic (FlannBased). The problem is that the heuristics do not work for all types of descriptors. For example, the FlannBased implementation used to work only with float descriptors and not with uchar ones (but since 2.4.0, FlannBased with an LSH index can be applied to uchar descriptors).
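
As a minimal sketch of that case (the LSH parameter values below are just commonly used ones, not prescribed by OpenCV):

    #include <opencv2/opencv.hpp>

    int main()
    {
        // A FLANN-based matcher configured with an LSH index, so that it can handle
        // binary (uchar) descriptors such as ORB, BRIEF, BRISK or FREAK.
        cv::FlannBasedMatcher matcher(new cv::flann::LshIndexParams(12,   // table number
                                                                    20,   // key size
                                                                    2));  // multi-probe level

        // It is then used like any other DescriptorMatcher:
        // matcher.match(queryDescriptors, trainDescriptors, matches);
        return 0;
    }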

Quoting this App-Solut blog post about the DescriptorMatcher types:

The DescriptorMatcher comes in the varieties “FlannBased”,
“BruteForceMatcher”, “BruteForce-L1” and “BruteForce-HammingLUT”. The
“FlannBased” matcher uses the flann (fast library for approximate
nearest neighbors) library under the hood to perform faster but
approximate matching. The “BruteForce-*” versions exhaustively search
the dictionary to find the closest match for an image feature to a
word in the dictionary.

Some of the more popular combinations are listed below (an end-to-end sketch of one of them follows the list):

Feature Detectors / Descriptor Extractors / Matcher types

  • (FAST, SURF) / SURF / FlannBased

  • (FAST, SIFT) / SIFT / FlannBased

  • (FAST, ORB) / ORB / Bruteforce

  • (FAST, ORB) / BRIEF / Bruteforce

  • (FAST, SURF) / FREAK / Bruteforce
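
Here is an end-to-end sketch of one of these combinations (FAST detector, ORB descriptors, brute-force Hamming matcher), assuming the OpenCV 2.4 C++ API; the image file names are just placeholders:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main()
    {
        cv::Mat img1 = cv::imread("scene.png",  CV_LOAD_IMAGE_GRAYSCALE); // placeholder file names
        cv::Mat img2 = cv::imread("object.png", CV_LOAD_IMAGE_GRAYSCALE);

        // 1. Detect keypoints with FAST.
        cv::Ptr<cv::FeatureDetector> detector = cv::FeatureDetector::create("FAST");
        std::vector<cv::KeyPoint> keypoints1, keypoints2;
        detector->detect(img1, keypoints1);
        detector->detect(img2, keypoints2);

        // 2. Describe them with ORB (binary / uchar descriptors).
        cv::Ptr<cv::DescriptorExtractor> extractor = cv::DescriptorExtractor::create("ORB");
        cv::Mat descriptors1, descriptors2;
        extractor->compute(img1, keypoints1, descriptors1);
        extractor->compute(img2, keypoints2, descriptors2);

        // 3. Match with a brute-force matcher using the Hamming distance.
        cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create("BruteForce-Hamming");
        std::vector<cv::DMatch> matches;
        matcher->match(descriptors1, descriptors2, matches);

        std::cout << "Found " << matches.size() << " matches" << std::endl;
        return 0;
    }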

You might have also noticed there are a few adapters (Dynamic, Pyramid, Grid) to the feature detectors. The App-Solut blog post summarizes their use really nicely:

(...) and there are also a couple of adapters one can use to change
the behavior of the key point detectors. For example the Dynamic
adapter which adjusts a detector type specific detection threshold
until enough key-points are found in an image or the Pyramid adapter
which constructs a Gaussian pyramid to detect points on multiple
scales. The Pyramid adapter is useful for feature descriptors which
are not scale invariant.
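
In the OpenCV 2.4 API these adapters are selected simply by prefixing the name passed to the detector factory; a minimal sketch:

    #include <opencv2/opencv.hpp>

    int main()
    {
        // The adapters are applied by prefixing the detector name.
        cv::Ptr<cv::FeatureDetector> plainFast   = cv::FeatureDetector::create("FAST");
        cv::Ptr<cv::FeatureDetector> gridFast    = cv::FeatureDetector::create("GridFAST");    // GridAdaptedFeatureDetector
        cv::Ptr<cv::FeatureDetector> pyramidFast = cv::FeatureDetector::create("PyramidFAST"); // PyramidAdaptedFeatureDetector
        cv::Ptr<cv::FeatureDetector> dynamicFast = cv::FeatureDetector::create("DynamicFAST"); // DynamicAdaptedFeatureDetector
        return 0;
    }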

Further reading:

  • This blog post by Yu Lu gives a very nice summary of SIFT, FAST, SURF, BRIEF, ORB, BRISK and FREAK.

  • This series of posts by Gil Levi also provides detailed summaries of several of these algorithms (BRIEF, ORB, BRISK and FREAK).

What is the difference between feature detection and descriptor extraction?

Feature detection

  • In computer vision and image processing the concept of feature detection refers to methods that aim at computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.

    Feature detection = how to find some interesting points (features) in the image. (For example, find a corner, find a template, and so on.)

Feature extraction

  • In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and it is suspected to be notoriously redundant (much data, but not much information) then the input data will be transformed into a reduced representation set of features (also named features vector). Transforming the input data into the set of features is called feature extraction. If the features extracted are carefully chosen it is expected that the features set will extract the relevant information from the input data in order to perform the desired task using this reduced representation instead of the full-size input.

    Feature extraction = how to represent the interesting points we found to compare them with other interesting points (features) in the image. (For example, the local area intensity of this point? The local orientation of the area around the point? And so on)

Practical example: you can find a corner with the Harris corner method, but you can describe it with any method you want (histograms, HOG, or the local orientation in the 8-neighbourhood, for instance).
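
A minimal sketch of that separation, assuming the OpenCV 2.4 factory API: detect Harris corners, then describe the very same keypoints with BRIEF (any other extractor would do just as well):

    #include <opencv2/opencv.hpp>
    #include <vector>

    int main()
    {
        cv::Mat img = cv::imread("test.png", CV_LOAD_IMAGE_GRAYSCALE);

        // Detection: find interesting points (here, Harris corners via GFTT).
        std::vector<cv::KeyPoint> keypoints;
        cv::FeatureDetector::create("HARRIS")->detect(img, keypoints);

        // Extraction: describe the neighbourhood of each detected point
        // (here with BRIEF, but any descriptor extractor could be plugged in).
        cv::Mat descriptors;
        cv::DescriptorExtractor::create("BRIEF")->compute(img, keypoints, descriptors);
        return 0;
    }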

You can find some more information in this Wikipedia article.

Feature detectors and descriptors comparison

I can advise you to use Hessian-Affine and MSER for detection if you need invariance to different factors (e.g., viewpoint change), or FAST if you need real-time performance.
FAST does a similar job to Harris, but much faster.

You can look into "Local Invariant Feature Detectors: A Survey" and "A Comparison of Affine Region Detectors", where many detectors are tested and described very well.

Update: "WxBS: Wide Baseline Stereo Generalizations" does extended benchmark of the novel and classical detectors and descriptors.

Second, the description part is usually slower than detection, so to run in real time you have to use a GPU or a binary descriptor like BRIEF or FREAK.

Update2: "HPatches (Homography Patches) dataset and benchmark" and corresponding workshop at ECCV 2016. http://www.iis.ee.ic.ac.uk/ComputerVision/DescrWorkshop/index.html .

Update3: "Comparative Evaluation of Hand-Crafted and Learned Local Features" Descriptors (and a bit detectors) evaluation on large-scale 3D reconstruction task CVPR 2017 .

Update4: "Interest point detectors stability evaluation on ApolloScape dataset" Detector evaluation on authonomous driving dataset, ECCVW2018 .

Update5: "From handcrafted to deep local invariant features" Huuuge survey-overview paper about handcrafted and learned features, 2018.

Update6: "Image Matching across Wide Baselines: From Paper to Practice" Large scale benchmark of the abovementioned and more recent methods for the camera pose estimation. IJCV, 2020.

One-stage vs two-stage object detection

Instead of "region detection + object classification", its "(1)region proposal + (2)classification and localization in two stage detectors.

(1 - region proposal) is done by what is called a Region Proposal Network (RPN, for short). The RPN is used to decide “where” to look in order to reduce the computational requirements of the overall inference process. It quickly and efficiently scans every location in order to assess whether further processing needs to be carried out in a given region. It does that by outputting k bounding box proposals, each with 2 scores representing the probability of an object being present or not at that location. In other words, it is used to find up to a predefined number (~2000) of regions (bounding boxes) which may contain objects.

An important problem within object detection is generating a variable-length list of bounding boxes. The variable-length problem is solved in the RPN by using anchors: fixed-size reference bounding boxes which are placed uniformly throughout the original image (see the sketch after the list below). Instead of having to detect where objects are, we model the problem in two parts. For every anchor, we ask:

  • Does this anchor contain a relevant object?
  • How would we adjust this anchor to better fit the relevant object?
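
Here is the sketch of the anchor grid mentioned above (plain C++; the stride and anchor sizes are made-up illustrative values, not those of any particular detector). Anchors are generated on a regular grid over the image, and the RPN later scores and refines each of them:

    #include <cstdio>
    #include <vector>

    // A bounding box in center / width / height form.
    struct Box { float cx, cy, w, h; };

    // Place fixed-size reference boxes (anchors) on a regular grid over the image.
    // The RPN then gives every anchor an objectness score and a coordinate refinement.
    std::vector<Box> makeAnchors(int imageWidth, int imageHeight)
    {
        const int   stride  = 16;                    // grid spacing (illustrative value)
        const float sizes[] = { 32.f, 64.f, 128.f }; // anchor side lengths (illustrative values)

        std::vector<Box> anchors;
        for (int y = stride / 2; y < imageHeight; y += stride)
            for (int x = stride / 2; x < imageWidth; x += stride)
                for (int s = 0; s < 3; ++s)
                {
                    Box b = { (float)x, (float)y, sizes[s], sizes[s] };
                    anchors.push_back(b);
                }
        return anchors;
    }

    int main()
    {
        std::vector<Box> anchors = makeAnchors(640, 480);
        std::printf("%u anchors generated\n", (unsigned)anchors.size());
        return 0;
    }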

After having a list of possible relevant objects and their locations in the original image, it becomes a more straightforward problem to solve. Using the features extracted by the CNN and the bounding boxes with relevant objects, we apply Region of Interest (RoI) Pooling and extract those features which would correspond to the relevant objects into a new tensor.

Next, in the second stage, the R-CNN module uses the above information to:

  • Classify the content in the bounding box (or discard it, using
    “background” as a label).
  • Adjust the bounding box coordinates (so it better fits the object).

Feature matching/detection on brain images

First of all, you should specify what kind of features you need, or for which purpose the experiment is going to be performed.
Feature extraction is highly subjective in nature; it all depends on what type of problem you are trying to handle. There is no generic feature extraction scheme that works in all cases.
For example, if the features point to some tumor classification or lesion, then of course there are different software packages you can use to extract and define your features.

There are different methods to detect the relevant features, depending on the application:

  • SURF (Speeded Up Robust Features)
  • PLOFS: a fast wrapper approach with subset evaluation
  • ICA or PCA
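
As a minimal sketch of the PCA route (the sample count, feature length and number of components below are purely illustrative assumptions), OpenCV's cv::PCA can reduce per-region feature vectors like this:

    #include <opencv2/opencv.hpp>

    int main()
    {
        // Each row is one sample's feature vector (e.g. the intensities of an image patch).
        cv::Mat features(200, 500, CV_32F);                          // illustrative: 200 samples x 500 features
        cv::randu(features, cv::Scalar::all(0), cv::Scalar::all(1)); // placeholder data

        // Keep only the 20 principal components with the largest variance.
        cv::PCA pca(features, cv::Mat(), CV_PCA_DATA_AS_ROW, 20);
        cv::Mat reduced = pca.project(features);                     // 200 x 20 reduced representation
        return 0;
    }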

This paper is a very good review of feature extraction from brain MRI data for tissue classification:
https://pdfs.semanticscholar.org/fabf/a96897dcb59ad9f04b5ff92bd15e1bd159ef.pdf

I found this paper very good for understanding the difference between feature extraction techniques:
https://www.sciencedirect.com/science/article/pii/S1877050918301297


