Here's a summary of the themes discussed on Hacker News regarding backpropagation and its origins:
The Relationship Between Automatic Differentiation and Calculus
A central theme is the clarification of what reverse-mode automatic differentiation (AD), the mathematical engine behind backpropagation, actually is. While some users initially equated it to integration or PID loops due to the "forward and reverse" terminology, others emphasized it's a method of calculating derivatives.
- Eigenspace clarified: "Reverse mode automatic differentiation is not integration. It's still differentiation, but just a different method of calculating the derivative than the one you'd think to do by hand." They went on to explain that it applies the chain rule in reverse, which makes it efficient for functions with many inputs and few outputs (a minimal sketch of this appears after the list).
- Imtringued contrasted forward and reverse modes: "Forward mode automatic differentiation creates a formula for each scalar derivative. If you have a billion parameters you have to calculate each derivative from scratch. ... Reverse mode automatic differentiation starts from the root of the symbolic expression and calculates the derivative for each subexpression simultaneously." They likened the difference to recursive vs. iterative Fibonacci calculation, highlighting avoidance of redundant work.
- Uoaei suggested a nuanced view: "You could, I suppose, argue that the big innovation is the application of vectorization to the chain rule (by virtue of the matmul-based architecture of your usual feedforward network) which is a true combination of two mathematical technologies." (The second sketch after the list illustrates this in matrix form.)
- Dicroce questioned its fundamental novelty: "Isn't it just kinda a natural thing once you have the chain rule?"
- Bjornsing humorously suggested historical figures could have "invented" it: "The chain rule was explored by Gottfried Wilhelm Leibniz and Isaac Newton in the 17th century. Either of them would have 'invented' backpropagation in an instant. It's obvious."
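To make the first two points above concrete, here is a minimal, illustrative sketch of reverse-mode AD (not from the thread; the `Var` class and the example function are invented for illustration). It records the computation graph during the forward pass, then applies the chain rule once from the output back toward the inputs, so a single backward pass yields every partial derivative; forward mode would instead need one pass per input.

```python
# Minimal reverse-mode AD sketch (illustrative only). Each Var remembers the
# nodes it was built from and the local derivative with respect to each of them.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])


def backward(output):
    """Apply the chain rule in reverse: one pass gives d(output)/d(every input)."""
    # Topologically order the graph so each node's gradient is complete
    # before it is pushed further back toward the inputs.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local   # chain rule, accumulated in reverse


# f(x, y, z) = (x*y + z) * x  ->  df/dx = 2xy + z, df/dy = x^2, df/dz = x
x, y, z = Var(2.0), Var(3.0), Var(4.0)
f = (x * y + z) * x
backward(f)
print(f.value, x.grad, y.grad, z.grad)   # 20.0 16.0 4.0 2.0
```

With a million inputs and one scalar output (a loss), the backward pass above is still a single sweep over the graph, which is the efficiency argument made in the quotes.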
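Uoaei's vectorization point can be sketched the same way (again illustrative; the shapes and the toy loss are assumptions, not from the thread): for a single dense layer, the reverse pass of the chain rule is itself just matrix products, which is what lets the whole computation run as matmuls.

```python
import numpy as np

# Illustrative sketch of the chain rule in matrix form for one dense layer
# y = W @ x with a toy loss L = 0.5 * ||y||^2.

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))      # layer weights
x = rng.standard_normal((5, 1))      # input column vector

y = W @ x                            # forward pass
L = 0.5 * np.sum(y ** 2)             # scalar loss

dL_dy = y                            # dL/dy for this particular loss
dL_dx = W.T @ dL_dy                  # chain rule back through the layer
dL_dW = dL_dy @ x.T                  # weight gradient, one outer product

# Finite-difference spot check on a single weight.
eps = 1e-6
W_pert = W.copy()
W_pert[1, 2] += eps
L_pert = 0.5 * np.sum((W_pert @ x) ** 2)
assert abs((L_pert - L) / eps - dL_dW[1, 2]) < 1e-4
```

Stacking layers repeats this pattern (plus the elementwise derivatives of any nonlinearities), which is the "vectorized chain rule" reading of backpropagation.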
Historical Roots in Control Theory and Aerospace
A significant portion of the discussion revolves around claims that backpropagation's core ideas, particularly methods for optimizing trajectories, originated in 1950s and 1960s control theory, with some applications even extending to the Apollo space program.
- Pncnmnp initiated this line of inquiry by referencing an essay by Michael Jordan, which states: "Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon." They sought primary sources for this claim.
- Costates-maybe offered a link to Ben Recht's discussion on the topic: "Ben Recht has a discussion of the relationship between techniques in optimal control that became prominent in the 60's, and backpropagation: https://archives.argmin.net/2016/05/18/mates-of-costate/"
- Dataflow shared references surfaced by ChatGPT, which suggested that "what you’re thinking of is the “adjoint/steepest-descent” optimal-control method (the same reverse-mode idea behind backprop), developed in aerospace in the early 1960s and applied to Apollo-class vehicles." The specific papers cited included work by Henry J. Kelley and by A.E. Bryson & W.F. Denham.
- Drsopp followed up with a direct link to a potential source: "Henry J. Kelley (1960). Gradient Theory of Optimal Flight Paths. [1] https://claude.ai/public/artifacts/8e1dfe2b-69b0-4f2c-88f5-0..."
- Pncnmnp, after looking up Kelley, found further connections: "I looked up Henry J. Kelley on Wikipedia, and in the notes I found a citation to this paper from Stuart Dreyfus (Berkeley): "Artificial Neural Networks, Back Propagation and the Kelley-Bryson Gradient Procedure" (https://gwern.net/doc/ai/nn/1990-dreyfus.pdf)."
- Duped suggested other related concepts: "They're probably talking about Kalman Filters (1961) and LMS filters (1960)."
- Pjbk noted similarities with optimization in control systems: "To be fair, any multivariable regulator or filter (estimator) that has a quadratic component (LQR/LQE) will naturally yield a solution similar to backpropagation when an iterative algorithm is used to optimize its cost or error function through a differentiable tangent space."
- Cubefox, however, questioned the interpretation of the initial quote ("... first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon."): "I think 'its' refers to control theory, not backpropagation."
The Nature of Invention and Credit Attribution (The "Schmidhuber Problem")
A significant undercurrent, and perhaps the primary driver of the discussion of the linked article, is the contentious issue of credit attribution in AI and the alleged attempts by some (notably Jürgen Schmidhuber) to reclaim credit for foundational work that later became widely known through other researchers. This leads to a discussion about "reinventions" and perceived disputes over intellectual priority.
- Aaroninsf noted a common teaching: "When I worked on neural networks, I was taught David Rumelhart."
- Cs702 directly addressed the perceived stance of the author of the linked article: "Whatever the facts, the OP comes across as sour grapes. The author, Jürgen Schmidhuber, believes Hopfield and Hinton did not deserve their Nobel Prize in Physics, and that Hinton, Bengio, and LeCun did not deserve their Turing Award."
- Icelancer echoed this sentiment: "Didn't click the article, came straight to the comments thinking 'I bet it's Schmidhuber being salty.' Some things never change."
- Empiko interpreted the underlying tension: "I think the unspoken claim here is that the North American scientific establishment takes credit from other sources and elevates certain personas instead of the true innovators who are overlooked."
- Mindcrime acknowledged the cyclical nature of invention and credit: "Who didn't? Depending on exactly how you interpret the notion of 'inventing backpropagation' it's been invented, forgotten, re-invented, forgotten again, re-re-invented, etc, about 7 or 8 times." They pointed to "Talking Nets: An Oral History of Neural Networks" as a source for such history.
- PunchTornado saw a deliberate slight in the article's omissions: "Funny that hinton is not mentioned. Like how childish can the author be?"
- Uoaei framed the issue as mislabeling: "Calling the implementation of chain rule 'inventing' is most of the problem here."
- Anon84 humorously summarized the credit issue: "Can we back propagate credit?"
The Role and Perception of LLM Output in Discussions
A brief but notable tangent emerged regarding the use of LLMs (like ChatGPT) to find information and contribute to discussions, sparking debate about its perceived "effort," value, and how it should be presented.
- Throawayonthe expressed a negative view: "It's rude to show people your llm output."
- Danieldk elaborated on this: "Because it is terribly low-effort. People are here for interesting and insightful discussions with other humans. If they were interested in unverified LLM output… they would ask an LLM?"
- Drsopp defended its use, especially if helpful: "Who cares if it is low effort? I got lots of upvotes for my link to Claude about this, and pncnmnp seems happy."
- LcnpylgdnU4H9of countered: "It's a weird thing to wonder after so many people expressed their dislike of the upthread low-effort comment with a down vote... The point is that a reader may want to know that the text they're reading is something a human took the time to write themselves. That fact is what makes it valuable."
- Aeonik offered a different perspective, appreciating LLM contributions when clearly marked: "I don't think it's rude, it saves me from having to come up with my own prompt and wade through the back and forth to get useful insight from the LLMs... Also, I quite love it when people clearly demarcate which part of their content came from an LLM, and specifies which model."
Broad Historical Context and Alternative Perspectives
The discussion also touched on other potentially related or contributing mathematical concepts and historical viewpoints, suggesting that backpropagation might not be an isolated invention but rather an emergent property of existing mathematical tools.
- Imtringued's Fibonacci analogy (quoted above) frames reverse-mode AD's contribution as one of efficiency, avoiding redundant work rather than introducing new mathematics.
- Digikata suggested crossover with control theory: "There are large bodies of work for optimization of state space control theory that I strongly suspect has a lot of crossover for AI, and at least has very similar mathematical structure."
- Convolvatron asked about adaptive filters: "don't undergrad adaptive filters count? https://en.wikipedia.org/wiki/Adaptive_filter doesn't need a differentiation of the forward term, but if you squint it looks pretty close" (a rough LMS sketch follows below).
- Fizz_buzz shared a personal learning experience: "For me it was the other way around. I always knew how to compute the chain rule. But really only understood what the chain rule means when I read up on what back propagation was."
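To ground convolvatron's comparison, here is a rough LMS sketch (illustrative; the signal model and step size are assumptions, not from the thread). The LMS weight update is a stochastic-gradient step on the squared prediction error of a linear filter, which is why it resembles backpropagation collapsed to a single linear layer.

```python
import numpy as np

# LMS adaptive filter sketch (illustrative). The update w += mu * e * x is a
# stochastic-gradient step on the instantaneous squared error 0.5 * e**2 of a
# one-layer linear model.

rng = np.random.default_rng(1)
true_w = np.array([0.5, -0.3, 0.8])   # unknown system the filter is tracking
w = np.zeros(3)                       # adaptive filter weights
mu = 0.05                             # step size (learning rate)

for _ in range(2000):
    x = rng.standard_normal(3)        # current input window
    d = true_w @ x                    # desired (reference) signal
    e = d - w @ x                     # prediction error
    w += mu * e * x                   # LMS update = gradient step on 0.5*e**2

print(w)                              # close to true_w after adaptation
```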