WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...

Published 2024-04-01
Dr. Paul Lessard and his collaborators have written a paper on "Categorical Deep Learning and Algebraic Theory of Architectures". They aim to make neural networks more interpretable, composable and amenable to formal reasoning. The key is mathematical abstraction, exemplified by category theory: using monads to develop a more principled, algebraic approach to structuring neural networks.

We also discussed the limitations of current neural network architectures in terms of their ability to generalise and reason in a human-like way. In particular, we focused on the inability of neural networks to perform unbounded computation, as a Turing machine can. Paul expressed optimism that this is not a fundamental limitation, but an artefact of current architectures and training procedures.
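As a concrete illustration of the unbounded-computation point (my own sketch, not an example from the episode), consider checking whether a string of parentheses is balanced. Deciding this for inputs of arbitrary length requires maintaining a counter with no a priori bound, the kind of unbounded state a fixed-depth forward pass cannot maintain for all input lengths; the "Neural Networks and the Chomsky Hierarchy" paper linked below studies exactly this class of limitation empirically.

```python
def balanced(s: str) -> bool:
    """Check whether parentheses in s are balanced.

    The counter 'depth' has no fixed upper bound: for input "((((...",
    it grows with the input. This is Turing-machine-like unbounded state,
    as opposed to the constant amount of sequential computation a
    fixed-depth network performs per forward pass.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a closing paren with no match
                return False
    return depth == 0          # every opening paren was closed

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

The point is not that networks cannot approximate this on bounded inputs, but that no fixed computation budget suffices for all input lengths.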

We also touched on the power of abstraction: it allows us to focus on the essential structure of a problem while ignoring extraneous details, which can make certain problems far more tractable to reason about. Paul sees category theory as providing a powerful "Lego set" for productively thinking about many practical problems.

Towards the end, Paul gave an accessible introduction to some core concepts in category theory, such as categories, morphisms, functors and monads. He explained how these abstract constructs can capture essential patterns that arise across different domains of mathematics.
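To make one of those concepts slightly more concrete, here is a minimal sketch of the Maybe monad in Python (my own illustration; the paper itself works with far more general categorical machinery). A monad packages a way to wrap values (`unit`) together with a way to chain computations (`bind`) so that an effect, here possible failure, propagates automatically through composition.

```python
# Minimal Maybe monad sketch (illustrative only).
# A value is either ("Just", x) or ("Nothing",).

def unit(x):
    """Wrap a plain value into the monad."""
    return ("Just", x)

def bind(m, f):
    """Chain a possibly-failing computation f onto monadic value m.
    Failure short-circuits: Nothing stays Nothing."""
    if m == ("Nothing",):
        return m
    _, value = m
    return f(value)

def safe_div(x, y):
    """A partial function made total by returning Nothing on division by zero."""
    return ("Nothing",) if y == 0 else unit(x / y)

ok = bind(unit(10), lambda x: safe_div(x, 2))   # ("Just", 5.0)
bad = bind(ok, lambda x: safe_div(x, 0))        # ("Nothing",)
print(ok, bad)
```

The monad laws, e.g. `bind(unit(x), f) == f(x)`, are what guarantee this chaining behaves well under composition; that algebraic regularity is the kind of structure the paper proposes to exploit when reasoning about architectures.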

Paul is optimistic about the potential of category theory and related mathematical abstractions to put AI and neural networks on a more robust conceptual foundation to enable interpretability and reasoning. However, significant theoretical and engineering challenges remain in realising this vision.

Please support us on Patreon. We are entirely funded from Patreon donations right now.
patreon.com/mlst
If you would like to sponsor us, so we can tell your story, reach out at mlstreettalk at gmail

Links:
Categorical Deep Learning: An Algebraic Theory of Architectures
Bruno Gavranović, Paul Lessard, Andrew Dudzik,
Tamara von Glehn, João G. M. Araújo, Petar Veličković
Paper: categoricaldeeplearning.com/

Symbolica:
twitter.com/symbolica
www.symbolica.ai/

Dr. Paul Lessard (Principal Scientist - Symbolica)
www.linkedin.com/in/paul-roy-lessard/

Neural Networks and the Chomsky Hierarchy (Grégoire Delétang et al)
arxiv.org/abs/2207.02098

Interviewer: Dr. Tim Scarfe

Pod: podcasters.spotify.com/pod/show/machinelearningstr…

Transcript:
docs.google.com/document/d/1NiHJKTkeqYdpcgr6lGCTwq…

More info about NNs not being recursive/TMs:
   • Can ChatGPT Handle Infinite Possibili...  

Geometric Deep Learning blueprint:
   • GEOMETRIC DEEP LEARNING BLUEPRINT  

TOC:
00:00:00 - Intro
00:05:07 - What is the category paper all about
00:07:19 - Composition
00:10:42 - Abstract Algebra
00:23:01 - DSLs for machine learning
00:24:10 - Inscrutability
00:29:04 - Limitations with current NNs
00:30:41 - Generative code / NNs don't recurse
00:34:34 - NNs are not Turing machines (special edition)
00:53:09 - Abstraction
00:55:11 - Category theory objects
00:58:06 - Cat theory vs number theory
00:59:43 - Data and Code are one and the same
01:08:05 - Syntax and semantics
01:14:32 - Category DL elevator pitch
01:17:05 - Abstraction again
01:20:25 - Lego set for the universe
01:23:04 - Reasoning
01:28:05 - Category theory 101
01:37:42 - Monads
01:45:59 - Where to learn more cat theory

All comments (21)
  • @johntanchongmin
    I like Dr. Paul's thinking - clear, concise and very analytical. LLMs don't reason, but they can do some form of heuristic search. When used on some structure, it can lead to very powerful search over the structure provided and increase their reliability.
  • @aitheignis
    This is an amazing video. I really love this tape. The idea of building a formal language based on category theory to reason about systems isn't limited to applications in neural networks, for sure. I can definitely see this being used in gene regulatory pathways. Thank you for the video, and I will definitely check out the paper.
  • @AliMoeeny
    Yet another exceptionally invaluable episode. Thank you Tim
  • @jonfe
    Reasoning for me is like having a giant graph of "things" or "concepts" in your brain and learning the relationships between them through experience. For example, you can relate parts of one event to a different one just by finding correlations in the relationships between their internal parts, and by doing that you can transfer the learning from one event to the other.
  • @alivecoding4995
    @Tim What paper are you referring to when speaking about "type-two generalization" and reasoning?
  • LLMs store information in giant matrices of weights. Is there any model that can process a large amount of text and create a relational database structure, where the tables and fields are generated by the model as well as the data in them?
  • @derricdubois1866
    The point of abstraction is to enable one to achieve a view of some particular forest by avoiding being blinded to such by the sight of some trees.
  • @chadx8269
    Professor Van Nostram do you allow questions?
  • @FranAbenza
    Is human biological machinery better understood as a functional-driven system or OO? Why? from cell to cognition?
  • @thecyberofficial
    As an abstract handle theorist, everything is my nail, my screw, my bolt, ... :) Often, the details thrown away by categorisation are exactly what matters, otherwise you just end up working with the object theory in the roundabout Cat (or Topoi) meta-language.
  • what was that template metaprogramming hack to pick the correct sorting algorithms? any references for that please? sounds super interesting
  • @jabowery
    Removing the distinction between a function and data type is at the heart of Algorithmic Information. AND gee guess what? That is at the heart of Ockham's Razor!
  • @jumpstar9000
    With regard to inscrutability around the 26 minute mark: my personal feeling is that the issue we face is with overloading of models. As an example, let's take an LLM. Current language models take a kitchen sink approach where we are pressing them to generate both coherent output and also apply reasoning. This doesn't really scale well when we introduce different modalities like vision, hearing or the central nervous system. We don't really want to be converting everything to text all the time and running it through a black box. Not simply because it is inefficient, but more that it isn't the right abstraction. It seems to me we should be training multiple models as an ensemble that compose from the outset, where we have something akin to the pre-frontal cortex that does the planning in response to stimuli from other systems running in parallel. I have done quite a bit of thinking on this and I'm reasonably confident it can work. As for category theory and how it applies: if I squint I can kind of see it, but mostly in an abstract sense. I have built some prototypes for this that I guess you could say were type safe and informed by category theory. I can see it might help to have the formalism at this level to help with interpretability (because that's why I built them). Probabilistic category theory is more along the lines of what I have been thinking.
  • @oncedidactic
    Great stuff! I enjoyed Paul’s way of talking about math — first the precise definition and then why do we care, part by part. Good work dragging it out until the pump primed itself 😅
  • @alvincepongos
    Say you apply category theory to NNs and you do find a geometric algebra that operationally formalizes the syntax and semantics of the system. Is it possible that the resulting algebra is exactly what's built in: compositions of activated linear equations? If that is the case, no insights are gained. To prevent this problem, how are CT/ML scientists posing the approach such that category theory's insights go deeper than that?
  • @mobiusinversion
    Apologies for the pragmatism, but is this applicable in any realistic engineering driven effort?
  • @Daniel-Six
    This was an incredibly good discussion. Tim and company are definitely on to something elusive to articulate but crucial to appreciate regarding the real limitations of current machine "intelligence," and I can at least vaguely fathom how this will be made clear in the coming years.
  • @darylallen2485
    1:57 - It's been several years since I took calculus, but I remember being exposed to some functions that calculated the area of a shape where the domain of the function was negative infinity to positive infinity, yet the area was a finite number. Mathematically, it seems it should be possible to achieve finite solutions with infinite inputs.
  • @hnanacc
    Why is nature infinite? What if it's just the same things repeating, but with some variance? A plausible assumption, then, is that there is a large amount of information to be memorized, which needs further scaling, but the model can emulate the variance.
  • @erikowsiak
    I love your podcasts it seems you get all the right people to talk to :) just when I needed it :)