copycat/slipnet_analysis/slipnet_depth_analysis.tex
Alex Linhares 50b6fbdc27 Add slipnet analysis: depth vs topology correlation study
Analysis shows no significant correlation between conceptual depth
and hop distance to letter nodes (r=0.281, p=0.113). Includes
Python scripts, visualizations, and LaTeX paper.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:58:15 +00:00


\documentclass[11pt,twocolumn]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage[margin=1in]{geometry}
\usepackage{natbib}
\usepackage{float}
\usepackage{caption}
\usepackage{subcaption}
\title{No Significant Relationship Between Conceptual Depth and Graph Distance to Concrete Letter Nodes in the Copycat Slipnet}
\author{
Slipnet Analysis Project\\
\texttt{slipnet\_analysis/}
}
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
The Copycat system, developed by Douglas Hofstadter and Melanie Mitchell, employs a semantic network called the slipnet where each node has a ``conceptual depth'' parameter intended to capture its level of abstraction. We investigate whether conceptual depth correlates with the topological distance (hop count) from abstract concept nodes to concrete letter nodes (a--z). Using breadth-first search on an undirected graph representation of the slipnet, we computed minimum hop distances for 33 non-letter nodes, assigning unreachable nodes a penalty distance of $2 \times \max(\text{hops})$. Statistical analysis reveals no significant correlation between conceptual depth and hop count (Pearson $r = 0.281$, $p = 0.113$; Spearman $\rho = 0.141$, $p = 0.433$). The coefficient of determination ($R^2 = 0.079$) indicates that conceptual depth explains only 7.9\% of the variance in hop distance. These findings suggest that conceptual depth and network topology function as largely independent design dimensions in the Copycat architecture.
\end{abstract}
\section{Introduction}
The Copycat project, developed by Douglas Hofstadter and Melanie Mitchell in the 1980s and 1990s, represents a landmark effort in computational cognitive science to model analogical reasoning \citep{mitchell1993,hofstadter1995}. The system operates on letter-string analogy problems of the form ``if abc changes to abd, what does ppqqrr change to?'' While the domain is deliberately simple, the underlying cognitive architecture embodies sophisticated principles about how concepts are represented and manipulated during reasoning.
\subsection{The Slipnet Architecture}
Central to Copycat's operation is the \emph{slipnet}, a semantic network containing 59 nodes representing concepts relevant to the letter-string domain. These concepts span multiple levels of abstraction and include:
\begin{itemize}
\item \textbf{Concrete letters}: The 26 lowercase letters (a--z), representing the atomic units of the problem domain
\item \textbf{Numeric lengths}: The numbers 1--5, used to describe group sizes
\item \textbf{Positional concepts}: \texttt{leftmost}, \texttt{rightmost}, \texttt{first}, \texttt{last}, \texttt{middle}
\item \textbf{Relational concepts}: \texttt{successor}, \texttt{predecessor}, \texttt{sameness}
\item \textbf{Category concepts}: \texttt{letterCategory}, \texttt{bondCategory}, \texttt{groupCategory}
\item \textbf{Meta-concepts}: \texttt{opposite}, \texttt{identity}
\end{itemize}
The slipnet contains 202 directed links connecting these nodes. When converted to an undirected graph, this yields 104 unique edges after removing directional duplicates.
\subsection{Conceptual Depth}
Each slipnet node has a \emph{conceptual depth} parameter, a numeric value between 10 and 90 representing its level of abstraction. Hofstadter and Mitchell intended this parameter to capture the ``deepness'' of a concept, that is, how far removed it is from surface-level, perceptual features. Representative values are:
\begin{itemize}
\item Letter nodes (a--z): depth = 10 (most concrete)
\item \texttt{letter}: depth = 20
\item \texttt{letterCategory}, numbers 1--5: depth = 30
\item \texttt{leftmost}, \texttt{rightmost}, \texttt{middle}: depth = 40
\item \texttt{predecessor}, \texttt{successor}: depth = 50
\item \texttt{first}, \texttt{last}, \texttt{length}: depth = 60
\item \texttt{stringPositionCategory}, \texttt{directionCategory}: depth = 70
\item \texttt{sameness}, \texttt{samenessGroup}, \texttt{group}: depth = 80
\item \texttt{opposite}, \texttt{identity}, \texttt{bondFacet}, \texttt{objectCategory}: depth = 90 (most abstract)
\end{itemize}
The conceptual depth influences Copycat's behavior in several ways: it affects activation spreading dynamics, it modulates the system's preference for discovering ``deep'' versus ``shallow'' analogies, and it contributes to the calculation of conceptual similarity between structures.
\subsection{Research Question}
A natural hypothesis is that deeper (more abstract) concepts should be topologically farther from concrete letters in the network. After all, if conceptual depth represents abstraction level, one might expect that reaching abstract concepts requires traversing more edges from the concrete letter nodes. We test this hypothesis using hop count---the minimum number of edges to traverse---as an Erd\H{o}s number-style metric, with letters serving as the ``center'' analogous to Erd\H{o}s himself.
\section{Methods}
\subsection{Data Extraction}
The slipnet structure was extracted from a Python implementation of Copycat and serialized to JSON format. The extraction preserved all 59 nodes with their attributes (name, conceptual depth, intrinsic link length) and all 202 directed links with their attributes (source, destination, fixed length, type, optional label).
\subsection{Graph Construction}
We constructed an undirected graph $G = (V, E)$ from the slipnet using the NetworkX library. Each node in the slipnet became a vertex in $G$, and each directed link became an undirected edge. When multiple directed links existed between the same pair of nodes (e.g., both \texttt{a}$\to$\texttt{b} and \texttt{b}$\to$\texttt{a}), they were collapsed into a single undirected edge. This yielded $|V| = 59$ vertices and $|E| = 104$ edges.
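The collapsing step can be sketched in dependency-free Python. This is an illustration, not the project's extraction or analysis code: it assumes each link has already been read from \texttt{slipnet.json} as a (source, destination) pair, and the toy link list is hypothetical. NetworkX's \texttt{Graph} class behaves the same way, since adding an edge that already exists in an undirected graph does not create a second edge.
{\footnotesize
\begin{verbatim}
```python
def undirected_edges(links):
    """Collapse directed (source, destination)
    links into undirected edges: a->b and b->a
    both map to the same frozenset {a, b}."""
    return {frozenset(link) for link in links}


# Hypothetical toy input, not the real slipnet.
links = [
    ("a", "b"), ("b", "a"),
    ("a", "first"),
    ("letterCategory", "a"),
    ("a", "letterCategory"),
]
edges = undirected_edges(links)
print(len(edges))  # 3 undirected edges
```
\end{verbatim}
}
Representing each edge as a \texttt{frozenset} makes the pair unordered, so both directions of a link become the same set element.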
\subsection{Hop Count Computation}
For each non-letter node $v \in V$, we computed the minimum number of edges to reach any letter node $\ell \in L$ where $L = \{a, b, c, \ldots, z\}$:
\begin{equation}
\text{hops}(v) = \min_{\ell \in L} |P(v, \ell)| - 1
\end{equation}
where $P(v, \ell)$ is the shortest path (sequence of vertices) from $v$ to $\ell$. The subtraction of 1 converts path length (number of vertices) to hop count (number of edges).
This metric is analogous to an Erd\H{o}s number, with the 26 letter nodes collectively playing the role of Erd\H{o}s. A node with hop count 1 is directly connected to at least one letter; a node with hop count 2 is connected to a node that is connected to a letter; and so on.
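Because the minimum in the hop-count equation is taken over all 26 letters, it can be computed in a single pass with a multi-source breadth-first search that starts with every letter in the queue at distance 0; the first time the search reaches a node is along a shortest path from its nearest letter. The sketch below is a dependency-free illustration of that idea, not the code in \texttt{compute\_letter\_paths.py}; the adjacency-list representation and the toy edges are assumptions.
{\footnotesize
\begin{verbatim}
```python
from collections import deque
from string import ascii_lowercase

LETTERS = set(ascii_lowercase)


def adjacency(edges):
    """Undirected adjacency lists, in edge order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def hops_to_letters(adj):
    """Multi-source BFS seeded with every letter.

    Returns {node: hops to the nearest letter};
    unreachable nodes are simply absent."""
    dist = {v: 0 for v in adj if v in LETTERS}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit = shortest
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical toy graph, not the real slipnet.
edges = [("letterCategory", "a"),
         ("letterCategory", "b"),
         ("a", "first"), ("first", "leftmost"),
         ("group", "objectCategory")]
dist = hops_to_letters(adjacency(edges))
print(dist["leftmost"])  # 2
print("group" in dist)   # False
```
\end{verbatim}
}
Seeding the queue with all letters at once yields the minimum over $\ell \in L$ without running 26 separate searches.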
\subsection{Handling Unreachable Nodes}
Five nodes in the slipnet are topologically disconnected from the letter subgraph. Rather than exclude these nodes from analysis, we assigned them a penalty distance:
\begin{equation}
\text{hops}_{\text{unreachable}} = 2 \times \max_{v \in V_{\text{reachable}}} \text{hops}(v)
\end{equation}
With the maximum observed hop count among reachable nodes being 4, unreachable nodes were assigned $\text{hops} = 8$. This keeps all 33 non-letter nodes in the analysis and places every disconnected node farther from the letters than any reachable node. Because any finite value stands in for an infinite distance, the exact penalty is a modeling choice.
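A minimal sketch of the penalty rule, assuming hop counts are held in a dictionary that, like the output of a breadth-first search, simply omits unreachable nodes. The function name and the toy values are hypothetical.
{\footnotesize
\begin{verbatim}
```python
def with_penalty(dist, nodes, letters):
    """Hop count for every non-letter node.

    dist omits unreachable nodes; they get
    2 * (largest reachable hop count)."""
    penalty = 2 * max(dist.values())
    return {v: dist.get(v, penalty)
            for v in nodes if v not in letters}


# Hypothetical toy values, not the real slipnet.
dist = {"a": 0, "letterCategory": 1,
        "first": 1, "leftmost": 2, "middle": 4}
nodes = list(dist) + ["identity", "opposite"]
hops = with_penalty(dist, nodes, {"a"})
print(hops["identity"])  # 8
```
\end{verbatim}
}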
\subsection{Statistical Analysis}
We computed both Pearson's correlation coefficient $r$ (measuring linear relationship) and Spearman's rank correlation $\rho$ (measuring monotonic relationship) between conceptual depth and hop count. Statistical significance was assessed at $\alpha = 0.05$.
Linear regression was performed to characterize any trend:
\begin{equation}
\text{hops} = \beta_0 + \beta_1 \times \text{depth} + \epsilon
\end{equation}
The coefficient of determination $R^2$ was computed to quantify the proportion of variance in hop count explained by conceptual depth.
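The coefficients reported in the Results section can be re-derived from the per-node values listed in the appendix. The sketch below is not the project's \texttt{compute\_stats.py}: it transcribes the appendix table, computes Pearson's $r$ directly, obtains Spearman's $\rho$ as Pearson's $r$ on mid-ranks (tied values share their mean rank), and fits the ordinary least-squares line. It omits $p$-values, which require the $t$ distribution; in practice \texttt{scipy.stats.pearsonr}, \texttt{spearmanr}, and \texttt{linregress} return them.
{\footnotesize
\begin{verbatim}
```python
from math import sqrt

# Depths of the 33 non-letter nodes, keyed by
# hop count (appendix table; 8 = penalty).
GROUPS = {
    1: [30, 60, 60],
    2: [40, 40, 60, 80, 80, 90],
    3: [30] * 5 + [40, 40, 50, 50, 70, 80, 80],
    4: [40, 40, 40, 50, 50, 70, 80],
    8: [20, 80, 90, 90, 90],
}
depth = [d for h, ds in GROUPS.items() for d in ds]
hops = [h for h, ds in GROUPS.items() for _ in ds]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my)
              for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mid_ranks(x):
    """1-based ranks; ties share their mean rank."""
    s = sorted(x)
    return [s.index(v) + (s.count(v) + 1) / 2
            for v in x]


r = pearson(depth, hops)
rho = pearson(mid_ranks(depth), mid_ranks(hops))

# OLS fit: hops = b0 + b1 * depth.
n = len(depth)
mx, my = sum(depth) / n, sum(hops) / n
b1 = (sum((d - mx) * (h - my)
          for d, h in zip(depth, hops))
      / sum((d - mx) ** 2 for d in depth))
b0 = my - b1 * mx

print(f"r={r:.3f} rho={rho:.3f} R2={r*r:.3f}")
# r=0.281 rho=0.141 R2=0.079
print(f"hops = {b1:.3f} * depth + {b0:.2f}")
# hops = 0.026 * depth + 2.14
```
\end{verbatim}
}
For a simple linear regression, $R^2$ equals $r^2$, which is why the sketch prints $r^2$ rather than computing residuals separately.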
\section{Results}
\subsection{Network Connectivity}
Of the 59 total nodes, 26 are letter nodes (which have hop count 0 by definition) and 33 are non-letter concept nodes. Among these 33 nodes, 28 are reachable from at least one letter and 5 are disconnected from the letter subgraph. The five disconnected nodes are:
\begin{itemize}
\item \texttt{identity} (depth = 90)
\item \texttt{opposite} (depth = 90)
\item \texttt{objectCategory} (depth = 90)
\item \texttt{group} (depth = 80)
\item \texttt{letter} (depth = 20)
\end{itemize}
\subsection{Hop Distribution}
Table~\ref{tab:hops} shows the distribution of hop counts among all 33 non-letter nodes.
\begin{table}[H]
\centering
\caption{Distribution of minimum hops to letter nodes}
\label{tab:hops}
\begin{tabular}{ccp{4.5cm}}
\toprule
Hops & Count & Example Nodes \\
\midrule
1 & 3 & \texttt{letterCategory}, \texttt{first}, \texttt{last} \\
2 & 6 & \texttt{leftmost}, \texttt{length}, \texttt{bondFacet} \\
3 & 12 & Numbers 1--5, \texttt{sameness}, \texttt{groupCategory} \\
4 & 7 & \texttt{bondCategory}, \texttt{predecessor}, \texttt{middle} \\
8 & 5 & \texttt{identity}, \texttt{opposite}, \texttt{letter} (unreachable) \\
\bottomrule
\end{tabular}
\end{table}
Most nodes (28 of 33) lie within 4 hops of a letter; the remaining 5 are disconnected from every letter node.
\subsection{Descriptive Statistics}
Table~\ref{tab:descriptive} summarizes the distributions of conceptual depth and hop count.
\begin{table}[H]
\centering
\caption{Descriptive statistics for analyzed nodes ($n=33$; population standard deviations)}
\label{tab:descriptive}
\begin{tabular}{lcc}
\toprule
Statistic & Depth & Hops \\
\midrule
Minimum & 20 & 1 \\
Maximum & 90 & 8 \\
Mean & 55.76 & 3.61 \\
Std. Dev. & 21.89 & 2.04 \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Correlation Analysis}
The correlation analysis yielded the following results:
\begin{itemize}
\item Pearson correlation: $r = 0.281$, $p = 0.113$
\item Spearman correlation: $\rho = 0.141$, $p = 0.433$
\item Coefficient of determination: $R^2 = 0.079$
\item Linear regression: $\text{hops} = 0.026 \times \text{depth} + 2.14$
\end{itemize}
Neither correlation is statistically significant: both $p$-values (0.113 and 0.433) exceed $\alpha = 0.05$. The $R^2$ of 0.079 indicates that conceptual depth explains only 7.9\% of the variance in hop count, a weak effect at best.
The regression slope of $0.026$ suggests that a 10-point increase in conceptual depth predicts only a 0.26 increase in hop count---modest compared to the 2.04 standard deviation of hops.
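The reported Pearson $p$-value can be checked by hand with the usual $t$-test for a correlation coefficient, which here has $n - 2 = 31$ degrees of freedom:
\begin{equation}
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.281\sqrt{31}}{\sqrt{1-0.079}} \approx 1.63.
\end{equation}
This lies below the two-sided critical value $t_{0.975,\,31} \approx 2.04$, in line with $p = 0.113 > 0.05$. The check assumes the reported $p$-value comes from this standard test for Pearson's $r$.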
\subsection{Visualization}
Figure~\ref{fig:scatter} displays the scatter plot of conceptual depth versus minimum hops. Unreachable nodes (hops = 8) are shown in red. The wide spread of depths at each hop level and the shallow slope of the regression line are consistent with the absence of any strong relationship.
\begin{figure}[H]
\centering
\includegraphics[width=\columnwidth]{depth_hops_correlation.png}
\caption{Scatter plot of conceptual depth versus minimum hops to nearest letter node. Reachable nodes (blue) and unreachable nodes (red, assigned hops=$2 \times 4 = 8$) are distinguished. Points are jittered vertically for visibility. The dashed line shows the linear regression fit.}
\label{fig:scatter}
\end{figure}
\subsection{Counterexamples}
The data reveal striking counterexamples to any depth-distance relationship:
\begin{enumerate}
\item \textbf{High depth, few hops}: \texttt{bondFacet} (depth=90, the maximum) is only 2 hops from a letter. Similarly, \texttt{samenessGroup} and \texttt{alphabeticPositionCategory} (both depth=80) are also just 2 hops away.
\item \textbf{Low depth, many hops}: The \texttt{letter} node (depth=20) is completely disconnected from actual letters despite being the object-type concept for them. The number nodes 1--5 (depth=30) all require 3 hops to reach a letter.
\item \textbf{Same depth, different hops}: At depth=90, \texttt{bondFacet} needs only 2 hops while \texttt{identity}, \texttt{opposite}, and \texttt{objectCategory} are completely unreachable---a dramatic difference.
\item \textbf{Same hops, different depths}: Nodes at 2 hops have depths ranging from 40 (\texttt{leftmost}) to 90 (\texttt{bondFacet}), a 50-point spread.
\item \textbf{Unreachable nodes span depths}: The 5 disconnected nodes have depths of 20, 80, and 90---covering most of the depth range despite all being topologically equivalent (infinitely far from letters).
\end{enumerate}
\section{Discussion}
\subsection{Orthogonal Design Dimensions}
The weak, non-significant correlation ($r = 0.281$, $p = 0.113$) is consistent with conceptual depth and network topology being largely independent dimensions of the design. Such independence would be architecturally meaningful:
\begin{enumerate}
\item \textbf{Network topology} determines which concepts can activate each other through spreading activation. Two nodes connected by an edge can directly influence each other's activation levels during reasoning.
\item \textbf{Conceptual depth} modulates how the system values discoveries at different abstraction levels. Deeper concepts, when activated, contribute more to the system's sense of having found a ``good'' analogy.
\end{enumerate}
By keeping these dimensions independent, the slipnet can connect concepts that need to interact (regardless of depth) while separately encoding their semantic abstraction level.
\subsection{The Disconnected Cluster}
The five disconnected nodes fall into two groups:
\begin{itemize}
\item \texttt{identity} and \texttt{opposite}: These exist primarily as labels on slip links, not as endpoints in the graph. They track activation for meta-level relationship concepts.
\item \texttt{letter}, \texttt{group}, \texttt{objectCategory}: These form an isolated cluster representing the object-type hierarchy. They classify workspace objects but don't connect to the letter-category network.
\end{itemize}
Notably, the \texttt{letter} concept (depth=20, relatively concrete) is disconnected while \texttt{letterCategory} (depth=30) is directly connected to all 26 letters. This distinction between ``letter-as-type'' and ``letter-as-category'' further illustrates how topology and depth serve different purposes.
\subsection{Hub Structure}
Analysis of the shortest paths reveals that routes to letters converge on gateway nodes:
\begin{itemize}
\item \texttt{first} $\to$ \texttt{a}: direct access via the property link \texttt{a}$\to$\texttt{first}
\item \texttt{last} $\to$ \texttt{z}: direct access via the property link \texttt{z}$\to$\texttt{last}
\item \texttt{letterCategory} $\to$ any letter: Instance links to all 26 letters
\end{itemize}
The \texttt{letterCategory} node, with instance links to all 26 letters, acts as a central hub: it is the primary gateway between abstract concepts and concrete letters, which explains why many shortest paths route through it.
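One way to make the gateway observation concrete is to record, during a multi-source breadth-first search from the letters, which hop-1 node each concept is first reached through, and then count how often each gateway appears. This is a hypothetical sketch, not a description of how the shortest paths were analyzed here; when a node has several shortest paths, the search keeps whichever it finds first, so the counts depend on neighbor order. The toy graph is illustrative only.
{\footnotesize
\begin{verbatim}
```python
from collections import Counter, deque
from string import ascii_lowercase


def gateways(adj, letters=frozenset(ascii_lowercase)):
    """Map each non-letter node reached by BFS to
    the hop-1 node on its (first-found) shortest
    path to a letter."""
    dist = {v: 0 for v in adj if v in letters}
    gate = {}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in dist:
                continue
            dist[w] = dist[u] + 1
            # hop-1 nodes are their own gateway
            gate[w] = w if dist[w] == 1 else gate[u]
            queue.append(w)
    return gate


# Hypothetical toy graph (adjacency lists).
adj = {
    "a": ["letterCategory", "first"],
    "b": ["letterCategory"],
    "letterCategory": ["a", "b", "length"],
    "first": ["a", "leftmost"],
    "length": ["letterCategory"],
    "leftmost": ["first"],
}
print(Counter(gateways(adj).values()))
```
\end{verbatim}
}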
\subsection{Implications}
Our findings have implications for understanding and extending the Copycat architecture:
\begin{enumerate}
\item \textbf{For analysis}: Inferring conceptual depth from topology, or vice versa, would be unreliable, since the two appear to encode different information.
\item \textbf{For extensions}: New concepts added to the slipnet can be placed topologically based on needed associations, with depth set independently based on abstraction level.
\item \textbf{For interpretation}: The slipnet's representational power may come from having multiple largely independent dimensions rather than a single unified hierarchy.
\end{enumerate}
\subsection{Limitations}
Several limitations should be noted:
\begin{enumerate}
\item \textbf{Sample size}: With 33 nodes, statistical power is limited, though this represents the complete population of non-letter nodes.
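\item \textbf{Penalty assignment}: The choice of $2 \times \max(\text{hops})$ for unreachable nodes is arbitrary, and the two coefficients respond to it differently. Spearman's $\rho$ depends only on ranks, so it is unchanged for any penalty greater than 4. Pearson's $r$ instead grows with the penalty, because four of the five unreachable nodes have depth 80 or 90: it is about 0.32 at $3 \times \max = 12$ (still $p > 0.05$), and at $10 \times \max = 40$ its $p$-value falls just below 0.05. An infinite penalty cannot be used with Pearson's $r$ at all. The rank-based result is therefore the more robust of the two.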
\item \textbf{Undirected assumption}: We treated edges as undirected. Analysis of directed paths might differ.
\item \textbf{Single metric}: Hop count is one of many possible graph metrics. Centrality measures or spectral properties might reveal different patterns.
\end{enumerate}
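How the two coefficients respond to the penalty value can be checked directly from the appendix values. In this sketch (a re-analysis of the paper's own data, not one of the original scripts), the 28 reachable nodes are fixed and the five unreachable nodes receive a penalty $P$. Every $P > 4$ preserves the ordering of hop counts, so Spearman's $\rho$ is identical for all such $P$, whereas Pearson's $r$ depends on the actual value.
{\footnotesize
\begin{verbatim}
```python
from math import sqrt

# Depths by hop count for the 28 reachable nodes
# (appendix table), plus the 5 unreachable ones.
REACHABLE = {
    1: [30, 60, 60],
    2: [40, 40, 60, 80, 80, 90],
    3: [30] * 5 + [40, 40, 50, 50, 70, 80, 80],
    4: [40, 40, 40, 50, 50, 70, 80],
}
UNREACHABLE = [20, 80, 90, 90, 90]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my)
              for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mid_ranks(x):
    """1-based ranks; ties share their mean rank."""
    s = sorted(x)
    return [s.index(v) + (s.count(v) + 1) / 2
            for v in x]


def coefficients(penalty):
    """(Pearson r, Spearman rho); penalty > 4."""
    groups = dict(REACHABLE)
    groups[penalty] = UNREACHABLE
    depth = [d for ds in groups.values() for d in ds]
    hops = [h for h, ds in groups.items() for _ in ds]
    return (pearson(depth, hops),
            pearson(mid_ranks(depth), mid_ranks(hops)))


for p in (8, 12, 16):
    r, rho = coefficients(p)
    print(f"P={p}: r={r:.3f} rho={rho:.3f}")
# P=8: r=0.281 rho=0.141
# P=12: r=0.318 rho=0.141
# P=16: r=0.330 rho=0.141
```
\end{verbatim}
}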
\section{Conclusion}
There is no statistically significant relationship between conceptual depth and hop distance to letter nodes in the Copycat slipnet. With Pearson $r = 0.281$ ($p = 0.113$), Spearman $\rho = 0.141$ ($p = 0.433$), and $R^2 = 0.079$, conceptual depth explains less than 8\% of the variance in topological distance---and this weak positive trend fails to reach significance.
This finding supports the view that the slipnet employs two orthogonal representational dimensions: network topology (governing associative access and activation flow) and conceptual depth (governing abstraction-level preferences in reasoning). This separation allows independent tuning of each dimension and may contribute to the slipnet's representational flexibility.
\section*{Data Availability}
All analysis scripts and data are available in the \texttt{slipnet\_analysis/} directory:
\begin{itemize}
\item \texttt{slipnet.json}: Complete network with computed paths
\item \texttt{compute\_letter\_paths.py}: Hop computation script
\item \texttt{plot\_depth\_distance\_correlation.py}: Statistical analysis and plotting
\item \texttt{compute\_stats.py}: Detailed statistics computation
\end{itemize}
\appendix
\section{Complete Data}
Table~\ref{tab:complete} presents all 33 analyzed nodes sorted by hop count and depth.
\begin{table}[H]
\centering
\caption{All analyzed nodes sorted by hop count, then depth}
\label{tab:complete}
\small
\begin{tabular}{lccc}
\toprule
Node & Depth & Hops & Reachable \\
\midrule
letterCategory & 30 & 1 & Yes \\
first & 60 & 1 & Yes \\
last & 60 & 1 & Yes \\
\midrule
leftmost & 40 & 2 & Yes \\
rightmost & 40 & 2 & Yes \\
length & 60 & 2 & Yes \\
samenessGroup & 80 & 2 & Yes \\
alphabeticPositionCategory & 80 & 2 & Yes \\
bondFacet & 90 & 2 & Yes \\
\midrule
1 & 30 & 3 & Yes \\
2 & 30 & 3 & Yes \\
3 & 30 & 3 & Yes \\
4 & 30 & 3 & Yes \\
5 & 30 & 3 & Yes \\
left & 40 & 3 & Yes \\
right & 40 & 3 & Yes \\
predecessorGroup & 50 & 3 & Yes \\
successorGroup & 50 & 3 & Yes \\
stringPositionCategory & 70 & 3 & Yes \\
sameness & 80 & 3 & Yes \\
groupCategory & 80 & 3 & Yes \\
\midrule
middle & 40 & 4 & Yes \\
single & 40 & 4 & Yes \\
whole & 40 & 4 & Yes \\
predecessor & 50 & 4 & Yes \\
successor & 50 & 4 & Yes \\
directionCategory & 70 & 4 & Yes \\
bondCategory & 80 & 4 & Yes \\
\midrule
letter & 20 & 8 & No \\
group & 80 & 8 & No \\
identity & 90 & 8 & No \\
opposite & 90 & 8 & No \\
objectCategory & 90 & 8 & No \\
\bottomrule
\end{tabular}
\end{table}
\section{Link Type Distribution}
The slipnet contains five distinct types of directed links, summarized in Table~\ref{tab:links}.
\begin{table}[H]
\centering
\caption{Slipnet link type distribution}
\label{tab:links}
\begin{tabular}{lcp{4cm}}
\toprule
Type & Count & Purpose \\
\midrule
nonSlip & 83 & Lateral associations that don't allow conceptual slippage \\
category & 51 & Upward hierarchy (instance to category) \\
instance & 50 & Downward hierarchy (category to instance) \\
slip & 16 & Links allowing conceptual slippage \\
property & 2 & Intrinsic attributes (\texttt{a}$\to$\texttt{first}, \texttt{z}$\to$\texttt{last}) \\
\bottomrule
\end{tabular}
\end{table}
\begin{thebibliography}{9}
\bibitem{mitchell1993}
Mitchell, M. (1993). \textit{Analogy-Making as Perception: A Computer Model}. MIT Press.
\bibitem{hofstadter1995}
Hofstadter, D. R., \& FARG. (1995). \textit{Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought}. Basic Books.
\end{thebibliography}
\end{document}