The origin of set theory - Ciencias Básicas

Grouping objects that share a characteristic, and defining basic notions such as membership and inclusion, seems so simple that you might assume the idea has always existed in people’s minds. The origin of set theory, however, is more complex and more consequential than that simplicity suggests.

The set theory we typically study in introductory mathematics courses — with its Venn diagrams, union and intersection operations, and the convenient universal set — is extraordinarily useful and necessary. With it we can define relations, functions, algebraic structures, and virtually the entire edifice of modern mathematics.

But why was this theory created in the first place? Georg Cantor did not invent it on a whim. Three concrete problems compelled him to develop it:

Mathematizing infinity. Cantor needed well-defined infinite sets as complete entities, not as mere endless processes, in order to solve advanced problems in mathematical analysis, such as the structure of the derived sets that arise in the study of trigonometric series.
Overcoming Galileo’s paradox. Back in the seventeenth century, Galileo had noticed something disturbing: the natural numbers $ (1, 2, 3, 4, \ldots) $ and their perfect squares $ (1, 4, 9, 16, \ldots) $ can be matched one-to-one, even though the squares are “only a part” of the naturals. This contradicted the intuition that “the whole is greater than its parts.” Cantor, rather than shying away from this strangeness, turned it into a definition: a set is infinite precisely when it can be placed in one-to-one correspondence with a proper subset of itself.
Discovering the hierarchy of infinities. Cantor proved that not all infinities are equal: there exist different “sizes” of infinity (the transfinite numbers). The set of real numbers is strictly “larger” than the set of natural numbers — a result that changed our understanding of infinity forever.
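
Galileo’s pairing can be made concrete with a small sketch. The Python snippet below (the function name `galileo_pairing` is an illustrative choice, not something from the original argument) lists the first few pairs $ n \leftrightarrow n^2 $ and checks that distinct naturals receive distinct squares. It only inspects a finite prefix, so it illustrates the idea rather than proving it.

```python
def galileo_pairing(limit):
    """Return the pairs (n, n*n) for n = 1..limit."""
    return [(n, n * n) for n in range(1, limit + 1)]


pairs = galileo_pairing(8)
squares = [sq for _, sq in pairs]

# Distinct naturals are matched with distinct squares on this prefix.
assert len(set(squares)) == len(squares)

# Yet the squares are only part of the naturals: 2, 3, 5, ... are skipped.
missing = [n for n in range(1, 17) if n not in squares]

print(pairs[:4])  # [(1, 1), (2, 4), (3, 9), (4, 16)]
print(missing)    # naturals up to 16 that are not perfect squares
```

Every natural gets a partner and no partner is shared, even though the list `missing` shows that most naturals are not squares themselves.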

Yet this theory has limits. The original version, known as naïve set theory (or “elementary” set theory), contained logical cracks that eventually forced mathematicians to rebuild its foundations from scratch. That rebuilding is what preserved the theory you and I use today.

This article tells that complete story: where the idea of infinity came from, how infinity helped Cantor build his theory, what Cantor did to get his theory accepted, the paradoxes that shattered it, how far the elementary version reaches today, and how the formal axioms (ZFC, NBG) repaired it.

Infinity before Cantor: two thousand years of caution

Before examining Cantor’s reasons for creating this theory, it is essential to understand the context in which mathematicians of his time viewed infinity. If we look back into history, we find a tradition of more than two millennia that treated infinity with deep mistrust. Infinity was not always an object of study; for centuries it was, at best, a process — and at worst, a taboo.

Aristotle’s veto (4th century BC)

The story begins with Aristotle, who introduced a distinction that would dominate Western thought for over 2,000 years, summarized in this table:

	Potential infinity	Actual infinity
What it is	A process that never ends	A completed totality
Example	“I can always add one more”	“The set of all natural numbers”
Exists as…	A possibility, a tendency	A finished object
Aristotle’s verdict	✔ Acceptable	✘ Rejected

Aristotle needed this distinction to answer Zeno of Elea’s paradoxes, such as the famous Achilles and the tortoise: if a distance already contains infinitely many points, how is it possible to traverse it in finite time?

Aristotle’s solution was ingenious: a line is divisible infinitely (potentially), but it is never “already divided” into infinitely many parts, because that is an act that can never be completed — there are always more line segments yet to divide. Infinity exists as a growing capacity, not as a reality.

Analogy: It is like saying “I can always cut a cake in half” versus “here I have infinitely many slices of a cake.” The first is a process; the second is a completed object. Aristotle accepted only the first, because he held that the second could never actually be realized.

The Middle Ages: infinity as a divine attribute

During the Middle Ages, actual infinity did not disappear entirely — it took refuge in theology. Scholastic philosophers such as Thomas Aquinas held that:

Only God, being infinite, could grasp an infinite totality all at once.
The human mind, being finite, could only comprehend processes that extend without end (potential infinity).
Treating infinity as a finished mathematical object was considered almost a blasphemy — an attempt by finite reason to usurp an exclusively divine attribute.

This stance kept actual infinity “under lock and key” for centuries: it existed, but only in the mind of God, beyond the reach of mathematics.

Infinitesimal calculus: “tending toward…” (17th–18th centuries)

When Newton and Leibniz developed infinitesimal calculus, they used “infinitesimals” (infinitely small quantities) and “infinities” (quantities that grow without bound). But always under the umbrella of potential infinity:

The method of exhaustion (used since Eudoxus and Archimedes) approximates the area of a circle using polygons with increasing numbers of sides. One never said “the circle is a polygon with infinitely many sides”, but rather “as the number of sides increases, the difference tends to zero.”
Variables “tend to infinity” ($ x \to \infty $): they grow beyond any assigned finite quantity, but remain within the realm of the potential. They never “arrive.”

Infinity was therefore a direction, not a destination.

Gauss’s horror (1831)

Even at the dawn of modern mathematics, the greatest geniuses rejected actual infinity. In 1831, Carl Friedrich Gauss — the “Prince of Mathematicians” — wrote in a celebrated letter:

“I protest against the use of an infinite quantity as a completed thing, which is never permitted in mathematics. Infinity is merely a way of speaking… it is a limit toward which certain ratios approach as closely as desired.” — C. F. Gauss, letter to Schumacher (1831)

For Gauss, $ \infty $ was a convenient shorthand, not a real number or a legitimate object.

Infinity as a verb

In short, for more than two millennia the consensus was clear:

Era	Key figure	Stance on infinity
Antiquity	Aristotle	Exists only as a process (potential)
Middle Ages	Thomas Aquinas	An exclusive attribute of God
17th–18th centuries	Newton, Leibniz	A tool of calculus, never a totality
19th century (before Cantor)	Gauss	Explicitly forbidden as a completed object

Before Cantor, infinity was a verb (something happening). After Cantor, it would also become a noun (an object that can be studied directly).

Cantor and the revolution of infinity

Everything changed with Georg Cantor (1845–1918). In the late nineteenth century, Cantor dared to do what Gauss had forbidden: treat infinity as a complete and finished object.

Actual infinity: Cantor’s wager

Cantor proposed that we can consider the complete collection of natural numbers $ \{ 1, 2, 3, 4, \ldots \} $ as a finished mathematical object and work with it as if it were a “thing.” This is actual infinity.

Analogy: Think of an imaginary hotel with infinitely many rooms (Hilbert’s famous Hotel). Potential infinity says “I can always build one more room.” Actual infinity says “the hotel already has all its rooms, and every one of them already carries its number.”

Not all infinities are equal

Cantor’s most astonishing discovery was that some infinities are larger than others. He proved that:

$ \mathbb{N} $ (the naturals) and $ \mathbb{Q} $ (the rationals) have the same cardinality (number of elements): $ \aleph_0 $. Both are countably infinite (they can be placed in one-to-one correspondence with the natural numbers).
$ \mathbb{R} $ (the reals) has a strictly greater cardinality: $ 2^{\aleph_0} $. It is an uncountable infinity.

As an aside, what is the difference between $ \infty $ and $ \aleph_0 $? Although both symbols represent infinity, they differ in some important ways:

	$ \infty $	$ \aleph_0 $
What is it?	A directional symbol: “grow without bound”	A concrete cardinal number: the size of $ \mathbb{N} $
Where is it used?	In calculus and analysis ($ \lim_{x \to \infty} $, $ \sum_{n=1}^{\infty} $)	In set theory (cardinality)
Is it a number?	No: it is an idea of “without bound”	Yes: it is the first transfinite number
Type of infinity	Potential (“never arrives”)	Actual (“already complete”)

In short: the symbol $ \infty $ tells us “this keeps growing without bound,” while $ \aleph_0 $ tells us “this completed collection has exactly this many elements.” Cantor introduced $ \aleph_0 $ because $ \infty $ was not the right symbol for comparing the sizes of infinite sets, as the table above shows.

Back to the main thread. To prove that the reals are uncountable, Cantor invented his celebrated diagonal argument. Through a proof by contradiction, he shows that any countable list of real numbers between 0 and 1 omits at least one number, by changing the digits along the diagonal of the list (a method we will explain in detail in a future post). This proved that the set of real numbers is strictly larger than the set of natural numbers. The following table illustrates — without full explanation — the central idea of Cantor’s argument:

Naturals	Real number in $ (0, 1) $
1	0.26549874495010086…
2	0.08755531372264108…
3	0.40052195295295490…
4	0.72157637954770104…
5	0.84880125940120761…
6	0.92651926519702301…
7	0.21222469637354102…
8	0.17050000232864141…
⋮	⋮

Cantor’s infinite list

Note: Cantor’s diagonal argument is a profound result that will be developed with full rigor in a later post dedicated to the real numbers and transfinite cardinal arithmetic. For now, the key takeaway is simply this: not all infinities are the same size.
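
As a rough preview of the diagonal idea, the sketch below applies it to the eight rows of the table. The rule used to change each digit (write 4 if the diagonal digit is 5, otherwise write 5) is one common choice and an assumption of this example; any rule that changes the digit and avoids 0 and 9 would do. The function name `diagonal_escape` is likewise illustrative.

```python
# Decimal digits (after "0.") of the eight numbers listed in the table.
rows = [
    "26549874495010086",
    "08755531372264108",
    "40052195295295490",
    "72157637954770104",
    "84880125940120761",
    "92651926519702301",
    "21222469637354102",
    "17050000232864141",
]


def diagonal_escape(digit_rows):
    """Build a digit string that differs from row i in its i-th digit."""
    new_digits = []
    for i, row in enumerate(digit_rows):
        diagonal_digit = row[i]
        # Only 4 and 5 are ever written, which sidesteps the
        # 0.4999... = 0.5000... double-representation issue.
        new_digits.append("4" if diagonal_digit == "5" else "5")
    return "".join(new_digits)


escaped = diagonal_escape(rows)
print("0." + escaped)  # 0.55545555
```

The resulting number disagrees with the first row in its first digit, with the second row in its second digit, and so on, so it cannot appear anywhere in the list. Cantor’s argument runs the same construction on an infinite list.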

The opposition: Kronecker against Cantor

Not everyone accepted these ideas. Leopold Kronecker, one of the most influential mathematicians of the era (and a former teacher of Cantor), opposed them fiercely. Kronecker was a constructivist: he believed that only mathematical objects that could be constructed in a finite number of steps were admissible — essentially, Gauss’s stance taken to its extreme.

His celebrated phrase sums up his position:

“God made the integers; all else is the work of man.” — Leopold Kronecker

Kronecker dismissed Cantor’s ideas as “mathematical madness,” blocked the publication of his papers in Crelle’s Journal, one of the most influential journals of the time, and publicly vilified him, calling him a “corrupter of youth.” This persecution, compounded by the general incomprehension of the academic community, had a devastating impact on Cantor’s mental health; he suffered episodes of depression for the rest of his life.

Even after Kronecker’s death, the resistance continued. In the early twentieth century, L.E.J. Brouwer led the intuitionist movement, which attempted to return mathematics to potential infinity: since we cannot “finish constructing” an infinite set in our minds, it cannot be considered “realized.”

Despite everything, history vindicated Cantor. But this did not happen overnight.

From “charlatanism” to the foundation of mathematics: how Cantor’s theory was accepted

The transition of set theory — from a rejected idea to a universal foundation — was a process that took decades and required strategic allies, unexpectedly practical applications, and ultimately a generational shift.

Support from the “heavyweights”

Cantor was fortunate to have influential allies at a moment when the criticism threatened to bury his work:

Richard Dedekind: More than a colleague, he was an intellectual collaborator. Dedekind published parallel work that helped rigorously define what an “infinite set” is, validating Cantor’s intuition with formal tools.
David Hilbert: The most influential mathematician of his generation, Hilbert elevated Cantor’s theory to an almost sacred status. His defense was both political and technical: he included Cantor’s problems (such as the Continuum Hypothesis, which we will examine below) in his celebrated list of 23 problems for the twentieth century (1900), effectively forcing the entire mathematical community to work on them. His famous remark on the matter became a rallying cry:

“No one shall expel us from the paradise that Cantor has created for us.” — David Hilbert (1926)

From metaphysics to practical utility

Cantor’s great problem was that talking about infinities “larger than others” sounded like theology, not mathematics. But the theory became indispensable for eminently practical reasons:

Real analysis: Mathematicians were stuck trying to understand trigonometric series and the points of discontinuity in functions. Set theory made it possible to “classify” and “measure” these complex point-sets in a way that had previously been impossible.
The Lebesgue measure: Henri Lebesgue used Cantor’s ideas to revolutionize the concept of the integral. Without set theory, much of the advanced calculus used today in physics and engineering would not exist.

The search for a universal language

In the late nineteenth century, mathematics was an archipelago of separate islands: geometry on one side, arithmetic on another, algebra on a third.

It was discovered that virtually any mathematical object — a number, a line, a function, a relation — could be defined as a set. This promised a true “Theory of Everything” for mathematics. By accepting Cantor’s sets, mathematicians finally had a unified language in which everything fit together.

Axiomatization: damage control

The initial rejection that greeted this theory had a legitimate basis: Cantor’s original (“naïve”) theory permitted devastating paradoxes, such as Russell’s paradox (which we will examine shortly). To save the theory without surrendering its power:

Zermelo and Fraenkel established clear rules (axioms) that prohibited the formation of sets that were “too large” or contradictory.
By setting precise limits, the theory shed its known inconsistencies. It went from an intuitive but flawed idea to a system of precise logical rules, now called ZFC, which remains the standard foundation for advanced mathematics. Elementary set theory, for its part, remains perfectly useful in more basic areas of mathematics.

We will examine these flaws and their corrections in the sections that follow.

The generational shift and the fall of Kronecker

The opposition to Cantor was not purely intellectual — it was a war of academic power and influence.

The outcome: When Kronecker died in 1891, the main barrier collapsed. The new talents — such as Felix Klein and Hilbert himself — did not share the philosophical prejudices of the previous generation. They saw in Cantor’s work a creative freedom (what Cantor himself called “the freedom of mathematics”) that their predecessors had denied them.
Over time, set theory went from being a contested curiosity to the invisible foundation on which all of modern mathematics is built.

The paradoxes of naïve set theory

Cantor’s naïve theory rested on an apparently harmless rule called the Unrestricted Comprehension Principle: for any property $ P(x) $, one can form the set of all objects satisfying it, $ \{ x \mid P(x) \} $, without any restriction. This principle works perfectly well with “normal” properties (such as “being a prime number”). But when the property refers to the set itself, everything collapses, as the following three paradoxes demonstrate.

Russell’s Paradox (1901)

Bertrand Russell formulated the most famous and devastating contradiction in naïve set theory. His reasoning is strikingly simple.

He asked: Can a set contain itself as an element?

The set of all the books in a library is not a book, so it does not contain itself.
But if we define $ T = \{ \text{all sets with more than 3 elements} \} $, and $ T $ itself turns out to have more than 3 elements, then $ T \in T $. It contains itself!

Russell then defines a special set:

\[ R = \{ x \mid x \notin x \} \]

That is, $ R $ is “the set of all sets that do not contain themselves.”

According to the Comprehension Principle, this set should exist. But when we ask whether $ R $ belongs to itself, we reach a contradiction:

Assumption	Consequence	Outcome
$ R \in R $	Then $ R $ satisfies $ x \notin x $, so $ R \notin R $	Contradiction
$ R \notin R $	Then $ R $ satisfies the property, so $ R \in R $	Contradiction

We are left with: $ R \in R \iff R \notin R $. A logical absurdity.

The Barber Paradox (an everyday version of the same idea):

In a town there is a single barber who shaves all the men who do not shave themselves, and only those men.

Question: Does the barber shave himself?

If he does → he should not (he only shaves those who do not shave themselves).

If he does not → he should (he shaves all those who do not shave themselves).

Conclusion: Such a barber cannot exist. And in exactly the same way, such a set $ R $ cannot exist. But the Comprehension Principle says it must exist. This means the principle is flawed.
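
The barber’s dilemma is small enough to check exhaustively. The sketch below (the name `barber_rule_holds` is hypothetical, chosen for this example) tries both possible answers to “does the barber shave himself?” and tests each against the town’s rule.

```python
def barber_rule_holds(shaves_himself: bool) -> bool:
    """Apply the town's rule to the barber himself.

    The rule says: the barber shaves a man exactly when that man
    does not shave himself. For the barber, this becomes
    "he shaves himself" if and only if "he does not shave himself".
    """
    return shaves_himself == (not shaves_himself)


# Try both possibilities and keep those compatible with the rule.
consistent_options = [s for s in (True, False) if barber_rule_holds(s)]
print(consistent_options)  # []
```

Neither answer survives, which mirrors $ R \in R \iff R \notin R $: no truth value can satisfy a condition that equates a statement with its own negation.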

The Burali-Forti Paradox (1897)

Even before Russell, the Italian mathematician Cesare Burali-Forti detected an anomaly involving ordinal numbers. While cardinals measure “how many” elements a set has, ordinals measure “what position” each element occupies within a well-ordered sequence: 1st, 2nd, 3rd, …, and beyond infinity: $ \omega $ (the first infinite ordinal), $ \omega + 1 $, $ \omega + 2 $, and so on.

If we form the “set of all ordinals” $ \Omega $, this set would have its own ordinal (call it $ \alpha $), which by definition would be greater than any ordinal in $ \Omega $. But $ \Omega $ contains all ordinals, so $ \alpha $ would have to be inside $ \Omega $. This forces $ \alpha < \alpha $, which is absurd.

In plain terms: The ordinals form an infinite staircase. If you try to put them all into a box, the box itself generates a new step that should have been inside — but cannot be.

Cantor’s Paradox (ironic, isn’t it?)

Cantor himself discovered a paradox involving the power set theorem — the very theorem that made him famous. Recall that the power set $ \mathcal{P}(A) $ is the set of all subsets of $ A $. For example, if $ A = \{ 1, 2 \} $, then $ \mathcal{P}(A) = \{ \emptyset, \{ 1 \}, \{ 2 \}, \{ 1, 2 \} \} $ — it has 4 elements, more than the 2 in the original. The theorem generalizes this:

Cantor’s Theorem: For any set $ A $: $ |\mathcal{P}(A)| > |A| $. That is, the power set is always strictly larger than the original set.

If there existed a “universal set” $ U $ containing absolutely everything, then every element of $ \mathcal{P}(U) $ would itself be a set and hence an element of $ U $, so $ \mathcal{P}(U) \subseteq U $, which would imply $ |\mathcal{P}(U)| \leq |U| $. But Cantor’s theorem says $ |\mathcal{P}(U)| > |U| $. Contradiction.

In plain terms: There cannot be a “set of everything,” because one can always construct something larger using the power set. There is no supreme infinity.
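
For a finite set, Cantor’s Theorem can be checked by brute force. The sketch below is an illustration under stated assumptions, not a proof: it takes $ A = \{ 1, 2 \} $, tries every possible map $ f $ from $ A $ to $ \mathcal{P}(A) $, and confirms that the “diagonal” subset $ D = \{ x \in A \mid x \notin f(x) \} $ is never one of the values of $ f $. The helper names `power_set` and `diagonal_set` are choices made for this example.

```python
from itertools import chain, combinations, product


def power_set(a):
    """Return all subsets of `a` as a list of frozensets."""
    items = list(a)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def diagonal_set(a, f):
    """Subset of `a` whose members x do not belong to f[x]."""
    return frozenset(x for x in a if x not in f[x])


A = [1, 2]
subsets = power_set(A)
print(len(subsets))  # 4, that is 2 ** len(A)

# Try every map f: A -> P(A); there are 4 ** 2 = 16 of them.
missed_every_time = True
for images in product(subsets, repeat=len(A)):
    f = dict(zip(A, images))
    if diagonal_set(A, f) in images:
        missed_every_time = False

print(missed_every_time)  # True
```

Since every candidate map misses at least one subset, no map from $ A $ onto $ \mathcal{P}(A) $ exists. The same diagonal set drives the proof for infinite sets.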

Summary of the paradoxes

Paradox	Year	Mechanism	What it shows
Burali-Forti	1897	Set of all ordinals	There can be no maximum ordinal
Cantor	~1899	Universal set + power set	There is no “set of everything”
Russell	1901	Self-reference + negation	The Comprehension Principle is inconsistent

How far does naïve set theory reach?

Despite these failures, naïve set theory remains perfectly valid in the vast majority of applications. When can we use it with confidence, and when can we not?

Field	Level of risk	Is naïve set theory enough?
Primary and secondary education	None	✔ Entirely sufficient
Engineering and computing	Low	✔ Finite and bounded models
Real analysis and probability	Medium	✔ With implicit conventions
Mathematical logic	Very high	✘ Requires formal axioms (ZFC)
Category theory	Critical	✘ Requires classes or universes

Practical rule: If you are working with subsets of a fixed, well-defined universe (such as $ \mathbb{R} $, or the students at a school, or the bits of a program), naïve set theory is safe. Problems arise only when you try to talk about the “set of all sets” or about self-referential collections.

What naïve set theory cannot resolve

Beyond the classical paradoxes, there are contemporary problems that reveal the limits of naïve set theory.

The Continuum Hypothesis

What is cardinality?

Before stating the hypothesis, we need a key idea: the cardinality of a set is, informally, its “size” — the number of elements it contains. For finite sets this is straightforward: $ \{ a, b, c \} $ has cardinality 3. But for infinite sets, things get strange.

Our intuition tells us that a “larger-looking” set should be “bigger.” Yet Cantor showed that this intuition is deeply misleading when it comes to infinite sets:

Set	Naïve intuition	Actual cardinality
$ \mathbb{N} = \{ 1, 2, 3, 4, \ldots \} $	The baseline	$ \aleph_0 $
$ \mathbb{Z} = \{ \ldots, -2, -1, 0, 1, 2, \ldots \} $	“Should be twice as big” (includes negatives)	$ \aleph_0 $ — the same
$ \mathbb{Q} $ (the rationals)	“Should be much larger” (infinitely many between 0 and 1)	$ \aleph_0 $ — the same
$ \mathbb{R} $ (the reals)	“Obviously bigger”	$ 2^{\aleph_0} $ — strictly larger

This is counterintuitive: between 0 and 1 there are infinitely many rational numbers ($ 1/2, 1/3, 2/5, 7/11, \ldots $), which makes $ \mathbb{Q} $ seem “enormously larger” than $ \mathbb{N} $. Yet Cantor proved that the rationals are enumerable — that is, they can be placed in one-to-one correspondence with the natural numbers — using an ingenious zigzag traversal technique that we will cover in a future post. The same applies to the integers $ \mathbb{Z} $: although they include the negatives, they are also countable.

Fundamental lesson: The “size” of an infinite set is not measured by how “dense” or “spread out” it seems, but by whether it can be put in one-to-one correspondence with the natural numbers. If it can, it has cardinality $ \aleph_0 $ (it is countable). If it cannot — as is the case with $ \mathbb{R} $ — then it is strictly larger.
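
The two countability claims above can be sketched as Python generators. This is a minimal illustration, assuming one common way of listing each set: the integers in the order $ 0, 1, -1, 2, -2, \ldots $, and the positive rationals by walking the diagonals $ p + q = 2, 3, 4, \ldots $ of the grid of fractions $ p/q $ and skipping any fraction not in lowest terms. It is a simplified variant of the zigzag idea, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def integers():
    """Yield 0, 1, -1, 2, -2, ... so each integer appears exactly once."""
    yield 0
    for n in count(1):
        yield n
        yield -n


def positive_rationals():
    """Yield each positive rational exactly once, diagonal by diagonal."""
    for total in count(2):          # total = p + q
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:      # skip duplicates such as 2/2 or 2/4
                yield Fraction(p, q)


print(list(islice(integers(), 5)))            # [0, 1, -1, 2, -2]
print([str(r) for r in islice(positive_rationals(), 5)])
# ['1', '1/2', '2', '1/3', '3']
```

The position at which a number is yielded is the natural number it is paired with, which is exactly what “countable” means.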

Cantor’s conjecture

With this context in place, the Continuum Hypothesis reads as follows: Cantor conjectured that there is no cardinal between $ \aleph_0 $ (the naturals) and $ 2^{\aleph_0} $ (the reals). In other words, the “next” infinity after the countable ones would be directly the infinity of the reals, with nothing in between.

This question turned out to be undecidable: assuming the standard axioms (ZFC) are consistent, it can be neither proved nor refuted from them.

Kurt Gödel (1938): Proved that the Continuum Hypothesis cannot be refuted within ZFC.
Paul Cohen (1963): Proved that it cannot be proved either.

Analogy: The axioms of ZFC are like the rules of chess. The Continuum Hypothesis is a position on the board where the rules do not determine whether you win or lose. You can add a new rule declaring it true, or one declaring it false, and the game remains consistent in either case.

This reveals that set theory, even in its axiomatic form, is not a complete system: there are statements about infinity that its axioms can neither prove nor refute.

Category theory and collections “too large” to be sets

There is an advanced branch of mathematics — category theory — that needs to include collections such as “all sets” or “all groups.” But as we have just seen, collections of that size recreate exactly the paradoxes of Cantor and Russell. This forced the development of extensions of ZFC capable of handling these “giant” collections without contradiction. We will examine the main such extension — the NBG system — in the sections that follow.

The solution: formal theories of sets

Zermelo-Fraenkel with Choice (ZFC)

To eliminate the paradoxes, mathematicians replaced the Unrestricted Comprehension Principle with a system of carefully designed axioms. The dominant system today is ZFC (Zermelo-Fraenkel with the Axiom of Choice). It is presented here as 9 axioms, each serving a specific role, plus the Axiom of Replacement discussed in the note after the tables:

The 9 axioms of ZFC

Group 1 — The ground rules (what a set is and how sets are compared):

#	Axiom	What it says (intuitively)	What problem it solves
1	Extensionality	Two sets are equal if and only if they have exactly the same elements.	Defines when two sets are “the same.” Without this, there would be no way to compare sets.
2	Empty set	There exists at least one set: the empty set $ \emptyset $.	Guarantees that at least one set exists, so the theory has a starting point.
3	Regularity (Foundation)	Every nonempty set contains an element that shares no members with it.	Rules out self-membership: it implies $ x \notin x $ for every set. (Russell’s Paradox itself is blocked by Separation, axiom 6.)

Group 2 — The construction tools (how to build new sets from existing ones):

#	Axiom	What it says (intuitively)	What problem it solves
4	Pairing	Given two sets $ a $ and $ b $, the set $ \{ a, b \} $ exists.	Allows the construction of pairs. Without this, we could not group two objects together.
5	Union	Given a set of sets, there exists a set that “flattens” everything into a single collection.	Allows merging collections: $ \bigcup \{ \{1, 2 \}, \{ 3 \} \} = \{ 1, 2, 3 \} $.
6	Separation (Specification)	Given a set $ A $ and a property $ P $, the set $ \{ x \in A \mid P(x) \} $ exists.	Replaces Unrestricted Comprehension. Only creates subsets of already-existing sets, preventing sets “from nothing.”
7	Power set	For any set $ A $, the set $ \mathcal{P}(A) $ (the set of all its subsets) exists.	Allows “leveling up” in size. Guarantees the existence of larger infinities.
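
For finite sets, the four construction axioms behave like ordinary operations on Python’s `frozenset`. The sketch below is only an analogy under that assumption (the axioms assert that sets exist, while the code merely computes small finite examples), and the function names `pair`, `union`, `separate`, and `power_set` are chosen for this illustration.

```python
from itertools import chain, combinations


def pair(a, b):
    """Pairing: the set {a, b}."""
    return frozenset({a, b})


def union(family):
    """Union: flatten a set of sets into a single set."""
    return frozenset(chain.from_iterable(family))


def separate(a, prop):
    """Separation: the elements of an existing set `a` that satisfy `prop`."""
    return frozenset(x for x in a if prop(x))


def power_set(a):
    """Power set: the set of all subsets of `a`."""
    items = list(a)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


family = pair(frozenset({1, 2}), frozenset({3}))
print(union(family) == frozenset({1, 2, 3}))        # True

evens = separate(frozenset(range(10)), lambda x: x % 2 == 0)
print(sorted(evens))                                # [0, 2, 4, 6, 8]

print(len(power_set(frozenset({1, 2}))))            # 4
```

Note that `separate` always needs an existing set as its first argument. That requirement is the whole difference from Unrestricted Comprehension.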

Group 3 — The power tools (for handling infinity and choice):

#	Axiom	What it says (intuitively)	What problem it solves
8	Infinity	There exists at least one infinite set (essentially, $ \mathbb{N} $).	Without this axiom, the other axioms could not prove that any infinite set exists. It guarantees that infinity exists as an object.
9	Choice	Given a family of nonempty sets, one can always “choose” one element from each simultaneously.	Allows infinitely many choices at once. It is essential for proving many theorems in analysis, algebra, and topology.

Note: The tables above omit the Axiom of Replacement (the image of a set under a definable function is also a set), which is a standard part of ZFC; counting it brings the total to 10. This axiom was added by Fraenkel and is what allows the construction of the full hierarchy of ordinals and transfinite infinities that Cantor envisioned.

The key correction: Unrestricted Comprehension → Separation

The essential difference between naïve set theory and ZFC comes down to a single change:

	Unrestricted Comprehension (Cantor)	Axiom of Separation (Zermelo)
Formula	$ B = \{ x \mid P(x) \} $	$ B = \{x \in A \mid P(x) \} $
Requirement	None — any property creates a set	Requires a prior set $ A $ from which to “separate”
Result	Can create sets “from nothing”	Only creates subsets of existing sets
Safety	✘ Inconsistent (paradoxes)	✔ No known contradictions

Why does this fix Russell’s Paradox?

With Separation, $ R = \{ x \in A \mid x \notin x \} $ simply says that certain elements of $ A $ do not contain themselves. This produces no contradiction — it simply proves that $ R \notin A $, meaning $ R $ does not belong to the set from which it was extracted. No absurdity, just a result about the relationship between $ R $ and $ A $.
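
The step to $ R \notin A $ can be spelled out as a short derivation, using only the definition of $ R $ given above. Suppose, for contradiction, that $ R \in A $. Then membership of $ R $ in itself is decided by the defining property alone:

\[ R \in R \iff (R \in A \ \text{and}\ R \notin R) \iff R \notin R \]

This is impossible, so the assumption $ R \in A $ must be false, and therefore $ R \notin A $. What was a contradiction in the naïve theory becomes an ordinary theorem here: no set $ A $ contains every set.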

Complementarily, the Axiom of Regularity directly prohibits any set from containing itself:

\[ \forall x \neq \emptyset,\ \exists y \in x : y \cap x = \emptyset \]

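This implies $ x \notin x $ for every set $ x $. Regularity is not what blocks Russell’s Paradox (Separation already does that); it rules out self-membership and endless descending membership chains, which keeps the universe of sets well-founded.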
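
The implication $ x \notin x $ follows from Regularity in a few lines. The derivation sketch below uses the singleton $ \{ x \} $, which exists by the Axiom of Pairing (take $ a = b = x $). Applying Regularity to $ \{ x \} $, whose only element is $ x $, forces $ y = x $ and gives:

\[ x \cap \{ x \} = \emptyset \]

If $ x \in x $ held, then $ x $ would be a common element of $ x $ and $ \{ x \} $:

\[ x \in x \implies x \in x \cap \{ x \} \implies x \cap \{ x \} \neq \emptyset \]

This contradicts the previous line, hence $ x \notin x $.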

In short: Naïve set theory had a single all-powerful rule (Comprehension) that turned out to be contradictory. ZFC replaces it with a small set of specialized axioms that, taken together, allow virtually all of mathematics to be built without falling into the known paradoxes. Each axiom has a specific, limited job.

The problem of category theory and the Von Neumann-Bernays-Gödel (NBG) solution

First, what is category theory?

Before explaining NBG, it helps to understand why it is needed. Category theory is a branch of mathematics that studies mathematical structures not by what they contain, but by how they relate to one another. Rather than examining the individual elements of a group, a space, or a set, category theory “zooms out” and observes the transformations (called morphisms) between all structures of the same type.

Analogy: If set theory is a microscope (it examines the elements inside each set), category theory is a satellite map (it sees how sets, groups, spaces, and other structures connect to one another at a large scale).

The problem is that, to do this, category theory needs to talk about things like “the category of all sets” or “the category of all groups” — collections so vast that ZFC forbids them (they would be “sets of all sets,” recreating the paradoxes). This is where NBG comes in.

The solution: sets vs. proper classes

The NBG system introduces an elegant distinction between two levels of collection:

	Sets	Proper classes
Size	“Small”	“Too large”
Can it be an element of another collection?	Yes	No
Examples	$ \mathbb{N}, \mathbb{R}, \{ 1, 2, 3 \} $	The class of all sets, the class of all ordinals

This distinction resolves the paradoxes elegantly: the collection of all sets exists as a proper class, but since it cannot be an element of anything, it cannot be applied to itself — cutting off the self-reference that feeds the paradoxes.

Conclusion

The elementary set theory we study in class is like a reliable car: it gets us everywhere in everyday life. But if we try to take it to a Formula 1 race (advanced logic, category theory, infinities of infinities), we need something more robust.

Russell’s, Burali-Forti’s, and Cantor’s paradoxes did not destroy set theory — they improved it. The result was axiomatic systems such as ZFC and NBG, which:

Are robust: They eliminate the paradoxes by restricting which collections can be sets.
Are compact: They reduce the full complexity of mathematics to a handful of fundamental axioms.
Are honest: They acknowledge their own limits (such as the undecidability of the Continuum Hypothesis).

For the student of basic mathematics, the most important lesson is this: the theory we use in class is correct within its domain. The paradoxes arise only when we try to form sets that are “too ambitious” — such as the set of all sets or the set of all ordinals. As long as we work within a well-defined universe and avoid self-reference, the ground is firm. And as a reminder: