Computational Learning Theory
July 4th, 2012Authors: Michael Kearns and Umesh Vazirani
Avrim Blum taught a course on Machine Learning Theory last semester but as I was busy taking some other courses I didn’t have the chance to attend. I however am quite interested in the computer scientist’s perspective of machine learning and decided to read Kearns’ and Vazirani’s short textbook as an introduction to the subject. I really had very little exposure to computational learning theory (despite working in statistical learning theory), but I found the textbook to be quite a reasonable introduction to the field. Kearns and Vazirani highlight some of the main definitions and concepts and go over some key ideas but omit many other important results in the field, either for lack of space or because the field is quite active.
At the onset, Kearns and Vazirani must first define what learning means, and for this (and throughout the field) they use Valiant’s Probably-Approximately-Correct (PAC) learning framework. This is the most fundamental model of learning in the learning theory community, because it captures the essential aspects of the learning problem while abstracting away everything else. Numerous variants and extensions have been proposed and heavily studied, capturing additional aspects of learning including active learning, learning in the presence of noise, and the bayesian persepective.
A brief summary of the book follows: The PAC-learning is developed in chapter one via a series of insufficient definitions and subsequent refinements. Along the way, we see examples of some learning problems, algorithms, and proofs. Chapter 2 explores the principle of Occam’s razor and show thats algorithms that output simple explanations of the data are PAC-learning algorithms. Chapter 3 unifies all learnability results with the beautiful Vapnik-Chervonenkis (VC) theory, which gives both upper and lower bounds on sample complexity for PAC-learning a concept class in terms of
a quantity called VC dimension, which captures the complexity of class. Chapter 4 discusses the classical and practically popular idea of boosting, whereby “weak” learning algorithms can be grouped together intelligently into a PAC-learning algorithm. Chapter 5 introduces the notion of noise, the statistical query (SQ) model, and shows the connection between learning in SQ and learning in the presence of classification noise. Chapter 6 and 7 focus on unlearnability, first establishing concept classes that cannot be learned in polynomial time (under complexity-theoretic assumptions) and then showing how reductions can be used to prove more unlearnability results. Finally, Chapter 8 shows how active learning, where the learner can construct instances and ask for their labels, can be used to overcome this inherent unlearnability.
One of the reasons I read Kearns and Vazirani is that (in my opinion) in statistics, it is hard to capture notions of computational efficiency but it seems to be essential in many estimation problems. The large body of work from the optimization community (many estimators are solutions to optimization problems), has developed a nice, although different from the CS-theorist’s, characterization of computational efficiency, but when studying limits of estimation (i.e. lower bounds), statisticians have no way to capture computational constraints. We typically study the optimality of estimators via information theory, which ignores computational considerations, but allows one to derive fundamental limits on estimation. In many problems this is sufficient, since we can give efficient algorithms whose performance matches the information-theoretic limit, but in others we are not sure if we can. A natural question in this latter setting is if computational considerations makes the problem harder, so that no efficient algorithm can match the information theoretic limit. The fact that computational learning theory works with a good model of computation and can develop such unlearnability results makes me think that bridging the gap between statistical and computational learning can help answer these questions.
So, from that perspective I think it was a very worthwhile read. The book has gotten me excited about a new perspective on learning and acquainted me with a new community of researchers and I think it is a promising direction for answering questions about computational and statistical tradeoffs in learning. I will say that the book is certainly not comprehensive, but it is a good, short introduction and I found it fairly easy to read. I would recommend to anyone interested in the field, and suggest Professor Blum’s course notes for further reading.