Before the actual content, we discussed the course groupwork. Each group
should submit their predictions to Kaggle as described in the
assignment text.
Next, we studied a demo of projecting 2D data to 1D with different degrees of class separation. The animation below recaps the idea: a good projection makes the classes clearly distinct in the 1D projected space. This is also seen in the Fisher score: a bigger score means better separation. Luckily, we do not need to try all directions (like in the demo), but can use the LDA formula to find the best direction with one line of code.
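As a rough sketch of the brute-force search in the demo (the synthetic data and variable names here are hypothetical, not the demo's), one can score candidate directions with the Fisher criterion and keep the best one:

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2D data: two Gaussian classes with different means.
X1 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))
X2 = rng.normal([3.0, 1.0], 1.0, size=(100, 2))

def fisher_score(X1, X2, w):
    # Project both classes onto the candidate direction w.
    w = w / np.linalg.norm(w)
    p1, p2 = X1 @ w, X2 @ w
    # Ratio of between-class separation to within-class scatter.
    return (p1.mean() - p2.mean()) ** 2 / (p1.var() + p2.var())

# Brute-force search over directions, as in the demo.
angles = np.linspace(0.0, np.pi, 180)
scores = [fisher_score(X1, X2, np.array([np.cos(a), np.sin(a)])) for a in angles]
best_angle = angles[np.argmax(scores)]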
The LDA projection vector can be found in two ways: 1) Solve a generalized eigenvalue problem, or, 2) use the simpler formula:
w = (S1 + S2)^-1 (m1 - m2),
where S1 and S2 are the covariance matrices of the two classes and m1 and m2 are the respective class means. It was also mentioned that if the covariance matrices are identity matrices, the vector w simplifies to the direction connecting the class means (m1 - m2); in this case the class distributions look circular. However, if the distributions are elliptic, this is not enough, and the covariance term is needed to correct the direction.
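A minimal sketch of the closed-form direction, assuming X1 and X2 are (n_samples, n_features) NumPy arrays for the two classes (for example the hypothetical data from the sketch above):

import numpy as np

# X1, X2: (n_samples, n_features) arrays for the two classes.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)

# w = (S1 + S2)^-1 (m1 - m2); solve() avoids forming the explicit inverse.
w = np.linalg.solve(S1 + S2, m1 - m2)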
It should be noted that LDA assumes that the class distributions are Gaussian. In real life this is seldom exactly the case, and how severe the violation is depends on the data (LDA may or may not work; you just need to experiment).
In the eigenvector-based approach, we discussed the use of LDA for dimensionality reduction. Namely, the more commonly used dimensionality reduction technique, PCA, compresses the data to the dimensions of maximum variance. However, a large variance does not necessarily indicate importance for classification. Therefore, LDA can provide an alternative, as it finds a set of directions that maximize class separation (instead of variance).
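For illustration, a small scikit-learn sketch of the two alternatives, using the Iris data purely as a stand-in (not the course data): PCA ignores the labels, while LDA uses them.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: directions of maximum variance; the labels are not used.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: directions of maximum class separation (at most n_classes - 1 of them).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)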
Friday, January 25, 2019
Jan 21: KNN and linear classifiers
Today we started our work with classification theory and related sklearn tools.
The first classifier was K-Nearest Neighbors, which finds the K (e.g., K = 5) nearest training samples and assigns the query sample the most frequent class label among them. Its benefits are simplicity and flexibility, but a major drawback is the slow speed at prediction time.
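A minimal scikit-learn sketch (the Iris data here is only a stand-in for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K = 5: training just stores the data; the neighbor search happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))   # fraction of correctly classified test samples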
Linear classifiers are a simpler alternative. A linear classifier is characterized by a linear decision boundary, and described by the decision rule:
F(x) = "class 1" if wTx > b
F(x) = "class 0" if wTx <= b
Here, w and b are the weights and constant offset learnt from the data and x is the new sample to be classified.
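A tiny sketch of the rule itself; here w and b are assumed to come from some training procedure (e.g., a linear SVM or logistic regression), which is not shown:

import numpy as np

def linear_classify(x, w, b):
    # F(x) = "class 1" if w^T x > b, otherwise "class 0".
    return "class 1" if w @ x > b else "class 0"

# Hypothetical learned parameters and a new sample.
w = np.array([1.0, -2.0])
b = 0.5
print(linear_classify(np.array([3.0, 1.0]), w, b))

In scikit-learn, for example, LogisticRegression and LinearSVC learn the weights (coef_) and the offset (intercept_) from data, and prediction applies essentially this rule.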
Thursday, January 17, 2019
Jan 17: Detection theory, ROC and AUC
Today we released the competition. This year the task is to predict the surface type on which a mobile robot is rolling on its wheels. This should be done using inertial measurement unit data (accelerometers etc.). The competition is open to everyone, and there is already a baseline that all teams should exceed. The exact requirements for passing the course (e.g., the form of the report, where to return it, etc.) will be defined later.
Next, we looked at an OpenCV human detection example. This nicely links detection theory with machine learning. The code uses a Histogram of Oriented Gradients (HOG) representation of the image and classifies each image location as "person" or "not person" using a support vector machine. The code is available at our GitHub, and the essential lines are below.
import cv2

# Initialize the HOG descriptor with the default people-detection SVM.
hog = cv2.HOGDescriptor()
detectorCoefficients = cv2.HOGDescriptor_getDefaultPeopleDetector()
hog.setSVMDetector(detectorCoefficients)

# Load a test image.
filename = 'person1.jpg'
img = cv2.imread(filename)

# Detect humans; hitThreshold tunes the detector sensitivity.
found, w = hog.detectMultiScale(img, winStride=(8, 8), padding=(32, 32),
                                scale=1.05, hitThreshold=-1)

# draw_detections is a helper from the course repository.
draw_detections(img, found)
During the example, a phone was circulating in the audience demonstrating in-device detection. If you dare, you may test an image detection/recognition demo using deep learning from Google's Tensorflow project. It is available for Android phones (just search Play store for "tensorflow object detection"). Below is one detection example from last weekend.
We also saw that the sensitivity of the detector can be tuned to balance between different kinds of errors: missed detections (we did not find all persons in the picture) and false alarms (we found a ghost in the picture). A more formal treatment of the error types can be found on Wikipedia.
After the example, we studied the model-based approach for detecting a sinusoid embedded in Gaussian noise. Since there is a model of the signal, the detection can be formulated mathematically, and detection scores are obtained exactly using integration. The same applies to the discrete case (e.g., human detection), but counting is used in place of integrals.
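As a rough numerical illustration (not the exact derivation from the lecture), assuming the frequency and phase of the sinusoid are known, one possible detection statistic is the correlation of the data with the sinusoid template; the threshold then sets the balance between misses and false alarms:

import numpy as np

rng = np.random.default_rng(0)
n = np.arange(200)
f = 0.05                                  # known normalized frequency (assumed)
template = np.sin(2 * np.pi * f * n)      # known signal shape (phase assumed known)

# Hypothetical measurement: a weak sinusoid buried in Gaussian noise.
x = 0.5 * template + rng.normal(0.0, 1.0, size=n.size)

# Matched-filter style statistic: correlate the data with the template.
score = x @ template
threshold = 10.0                          # tuning this trades misses vs. false alarms
signal_present = score > threshold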
At the end of the lecture, we studied the receiver operating characteristic (ROC) curve and the derived AUC accuracy metric. We also showed how to manually solve AUC tasks such as Question 5a in this exam.
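A small sketch of computing the ROC curve and AUC with scikit-learn, using made-up detector scores and labels:

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up detector scores and ground-truth labels (1 = target present).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, scores)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, scores)                # area under the ROC curve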
[Edit] In some exercise groups, the solution of task 2 was a bit unclear. Therefore, the "official" solution is shown below.
Monday, January 14, 2019
Jan 14: Estimation and detection
In the third lecture, we looked more thoroughly at estimation theory, which in our context is closely linked with regression: the prediction of numerical values (regression) as opposed to the prediction of classes (classification).
We repeated the main principle: write down the probability of observing exactly the samples x[0], x[1], ..., x[n-1] given a parameter value (e.g., A). Then the question is only how to maximize this likelihood function with respect to the unknown parameter (e.g., A). In most cases the trick is to take the logarithm before differentiation; otherwise the mathematics is too hard to solve.
On the second lecture, we saw an example MLE problem solved on the blackboard. The example was from the June 2017 exam and is as follows:

In this case, the maximum likelihood estimate for parameter theta is given as theta = 2N / sum(x[n]).
As the last item on parameter estimation, we looked at the German tank problem. In this case, it turns out that the maximum likelihood solution suffers from severe bias: the estimated total number of tanks is simply the largest ID of all tanks encountered so far. On the other hand, we saw that the minimum variance unbiased estimator (MVUE) solution was very close to the true number of tanks in wartime. MVUE is outside the scope of this course, but if you are interested, refer to my old SSP course.
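A quick simulation sketch of the bias (the numbers below are made up): the ML estimate is the largest observed ID, while the classical unbiased estimate m(1 + 1/k) - 1 corrects it upward.

import numpy as np

rng = np.random.default_rng(0)
N_true, k, trials = 300, 10, 10000     # hypothetical numbers for the simulation

mle, mvue = [], []
for _ in range(trials):
    # Observe k distinct tank IDs out of 1..N_true.
    ids = rng.choice(np.arange(1, N_true + 1), size=k, replace=False)
    m = ids.max()
    mle.append(m)                      # ML estimate: the largest observed ID
    mvue.append(m + m / k - 1)         # unbiased estimate: m(1 + 1/k) - 1

print(np.mean(mle), np.mean(mvue))     # the ML estimate is clearly biased low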
On the second lecture, we looked at the detection of a sinusoidal signal (with known frequency) embedded in Gaussian noise. This can be posed as a hypothesis testing question and solved optimally using a model of the data.
Sunday, January 13, 2019
Jan 10: Estimation theory
Today we studied estimation theory (recap of least squares and first look at maximum likelihood).
The least squares part is assumed to be familiar from earlier courses and was only discussed through an example. Namely, we looked at predicting house prices in the Hervanta region. There is a database from which one can scrape a set of realized house prices together with the attributes of each apartment. The example is available as a Jupyter Notebook. There is also a recent M.Sc. thesis on implementing house price prediction as a web service (at etuovi.com).
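A minimal least squares sketch with made-up apartment data (the real notebook uses the scraped database and more attributes):

import numpy as np

# Made-up apartment data: [size in m^2, build year] and realized prices in EUR.
X = np.array([[30, 1975], [45, 1982], [62, 1990], [80, 2005], [55, 2010]], dtype=float)
y = np.array([78000.0, 105000.0, 139000.0, 182000.0, 151000.0])

# Append a constant column and solve the least squares problem min ||Aw - y||^2.
A = np.c_[np.ones(len(X)), X]
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a new (hypothetical) 70 m^2 apartment built in 1995.
predicted_price = np.array([1.0, 70.0, 1995.0]) @ w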
Next, we looked at maximum likelihood estimation. The main principle is to first choose a model for the data, e.g., a noisy sinusoid:
x[n] = A * sin(2*pi*f*n + phi) + w[n]
Next, we write down the probability of observing exactly the samples x[0], x[1], ..., x[n-1] given a parameter value (e.g., A). Then the question is only how to maximize this likelihood function with respect to the unknown parameter (e.g., A). In most cases the trick is to take the logarithm before differentiation; otherwise the mathematics is too hard to solve.
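A small sketch of the idea, assuming the frequency and phase are known so that only the amplitude A is estimated (the numbers are made up). With Gaussian noise, maximizing the log-likelihood over A reduces to least squares and has a closed form:

import numpy as np

rng = np.random.default_rng(0)
n = np.arange(100)
f, phi, A_true = 0.05, 0.3, 1.7          # frequency and phase assumed known
s = np.sin(2 * np.pi * f * n + phi)
x = A_true * s + rng.normal(0.0, 1.0, size=n.size)

# With Gaussian noise w[n], maximizing the (log-)likelihood over A is the same
# as minimizing sum((x[n] - A*s[n])^2), which gives a closed-form estimate:
A_hat = (x @ s) / (s @ s)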
At the end of the week it turned out that the Friday exercise sessions are overcrowded. We will monitor attendance more strictly starting this week. Please go to the group for which you have signed up.
Jan 7: Course overview, intro to scientific Python
Today was the first lecture and we discussed course organization and took a brief look at using Python for Machine Learning.
Things to remember from today:
- Passing the course requires: 1) exercises (at least 60%), 2) the assignment (competition; details TBA), and 3) the final exam.
- Register for the exercises in POP.
- Remember to register a group for the competition (max 4 members). The deadline is 14.1.
- You can use the classroom (TC303) computers or your own laptop in the exercise sessions. If you use your own, we recommend installing Anaconda Python (or Miniconda with the appropriate packages).
The first hour concentrated mostly on the organization of the course. During the second hour, we looked at the beginning of the first slide set. First, we emphasized the difference between the model-based and the training-based approach to solving recognition and detection problems.
- If, for example, the problem is to detect whether a sinusoidal beep is present in an audio signal, there is no point in solving it by showing examples. This is because there is a perfect model (formula) for the sinusoid, and we can define mathematically exactly what we are looking for.
- On the other hand, if the task is to tell pictures of cats and dogs apart, the model-based approach is no longer useful: there is no formula that would describe all possible pictures of cats or dogs.