There was a moment when logistic regression was used to classify just one thing: If you drink a vial of poison, are you likely to be labeled “living” or “deceased”? Times have changed. Today, calling emergency services provides a better answer to that question, but logistic regression is at the very heart of deep learning.
Poison control: The logistic function dates to the 1830s, when the Belgian statistician P.F. Verhulst invented it to describe population dynamics: Over time, an initial explosion of exponential growth flattens as it consumes available resources, resulting in the characteristic logistic curve. More than a century passed before American statistician E. B. Wilson and his student Jane Worcester devised logistic regression to figure out how much of a given hazardous substance would be fatal. How they gathered their training data is a subject for another essay.
Fitting the function: Logistic regression fits the logistic function to a dataset in order to predict the probability, given an event (say, ingesting strychnine), that a particular outcome will occur (say, an untimely demise).
- Training adjusts the curve’s center location horizontally and its middle vertically to minimize error between the function’s output and the data.
- Adjusting the center to the right or the left means that it would take more or less poison to kill the average person. A steep slope signifies certainty: Before the halfway point, most people survive; beyond the halfway point, sayonara. A gentle slope is more forgiving: lower than midway up the curve, more than half survive. Farther up, less than half.
- Set a threshold of 0.5 between one outcome and another, and the curve becomes a classifier. Just enter the dose into the model, and you’ll know whether you should be planning a party or a funeral.
More outcomes: Verhulst’s work found the probabilities of binary outcomes, ignoring further possibilities like which side of the afterlife a poison victim might land in. His successors extended the algorithm.
- Working independently in the late 1960s, British statistician David Cox and Dutch statistician Henri Theil adapted logistic regression for situations that have more than two possible outcomes.
- Further work yielded ordered logistic regression, in which the outcomes are ordered values.
- To deal with sparse or high-dimensional data, logistic regression can take advantage of the same regularization techniques as linear regression.
Versatile curve: The logistic function describes a wide range of phenomena with fair accuracy, so logistic regression provides serviceable baseline predictions in many situations. In medicine, it estimates mortality and risk of disease. In political science, it predicts winners and losers of elections. In economics, it forecasts business prospects. More relevant to deep learning, it drives a portion of the neurons, in which the nonlinearity is a sigmoid, in a wide variety of neural networks.