\documentclass[a4paper]{article}
%% Language and font encodings
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
%% Sets page size and margins
\usepackage[a4paper,top=3cm,bottom=2cm,left=3cm,right=3cm,marginparwidth=1.75cm]{geometry}
%% Useful packages
\usepackage{listings}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[colorinlistoftodos]{todonotes}
\usepackage[colorlinks=true, allcolors=blue]{hyperref}
\definecolor{lightgrey}{rgb}{0.9, 0.9, 0.9}
\lstset{ %
backgroundcolor=\color{lightgrey}}
\title{Distributed Behavior in a Fluid Analogy Architecture}
\author{Lucas Saldyt, Alexandre Linhares}
\begin{document}
\maketitle
\begin{abstract}
This project investigates the simulation of the intelligent processes behind fluid analogy-making through increasingly distributed decision-making.
Specifically, the Parallel Terraced Scan, copycat's humanistic search algorithm, is modified and tested.
[Enumerate changes made to the Parallel Terraced Scan]
The answer distributions produced by each resulting branch of the copycat software were then cross-compared with Pearson's $\chi^2$ test.
Based on this cross-comparison, [Result Summary].
\end{abstract}
\section{Introduction}
This paper stems from Melanie Mitchell's (1993) and Douglas Hofstadter's \& FARG's (1995) work on the copycat program.
This project focuses on effectively simulating intelligent processes through increasingly distributed decision-making.
In the process of evaluating the distributed nature of copycat, this paper also proposes a ``Normal Science'' framework.
Copycat's behavior is based on the ``Parallel Terraced Scan,'' a humanistic-inspired search algorithm.
The Parallel Terraced Scan is, roughly, a mix between a depth-first and breadth-first search.
To switch between modes of search, FARG models use the global variable \emph{temperature}.
\emph{Temperature} is ultimately a function of the workspace rule strength and of the importance and happiness of each workspace structure.
Therefore, \emph{temperature} is a global metric, but is sometimes used to make local decisions.
Since copycat aims to simulate intelligence in a distributed manner, it should use local metrics for local decisions.
This paper explores the extent to which copycat's behavior can be improved through distributing decision making.
Specifically, the effects of temperature are first tested.
Then, once the statistically significant effects of temperature are understood, work is done to replace temperature with a distributed metric.
Initially, temperature is removed destructively: any line of code that mentions it is deleted, simply to see what effect it has.
Then, a surgical removal of temperature is attempted, leaving affected structures intact or replacing them with effective distributed mechanisms.
To evaluate the distributed nature of copycat, this paper focuses on the creation of a `normal science' framework.
By `normal science,' this paper means the term coined by Thomas Kuhn: the collaborative enterprise of furthering understanding within a paradigm.
Today, ``normal science'' is simply not done on FARG architectures (nor on most computational cognitive architectures; see Addyman \& French 2012).
Unlike mathematical theories or experiments, which can be replicated by following the materials and methods, computational models generally have dozens of finely tuned variables, undocumented procedures, multiple assumptions about the user's computational environment, and so on.
It then becomes close to impossible to reproduce a result, or to test some new idea scientifically.
This paper focuses on the introduction of statistical techniques, reduction of ``magic numbers,'' improvement and documentation of formulas, and proposals for statistical human comparison.
Each of these methods will reduce the issues with scientific inquiry in the copycat architecture.
To evaluate two different versions of copycat, the answer distributions each produces on a problem are compared with Pearson's $\chi^2$ test.
Using this, the degree of difference between distributions can be calculated.
Then, desirability of answer distributions can be found as well, and the following hypotheses can be tested:
\begin{enumerate}
\item $H_1$: Centralized variables constrict copycat's ability.
\item $H_0$: Centralized variables either improve or have no effect on copycat's ability.
\end{enumerate}
\subsection{Objective}
The aim of this paper is to create and test a new version of the copycat software that makes effective use of a multiple-level description.
Until now, copycat has made many of its decisions based on a global variable, \emph{temperature}.
...
\subsection{Theory}
Since computers are universal machines and have improved vastly over the past five decades, it is clear that they are capable of simulating intelligent processes.
[Cite Von Neumann].
The primary obstacle blocking strong A.I. is \emph{comprehension} of intelligent processes.
Once the brain is truly understood, writing software that emulates intelligence will be a comparatively simple engineering task.
In making progress towards understanding the brain fully, models must remain true to what is already known about intelligent processes.
Outside of speed, the largest difference between the computer and the brain is the distributed nature of computation.
Specifically, computers as they exist today have central processing units, where essentially all computation happens.
Brains have some centralized structures, but certainly no single central location where all processing happens.
Luckily, the speed advantage and universality of computers makes it possible to simulate the distributed behavior of the brain.
However, the software that is meant to emulate the behavior of the brain must be programmed with concern for this distributed nature.
This distribution is more of a design issue than a speed issue.
Making copycat truly parallel would only provide marginal performance gains.
It is clear from basic classical psychology that the brain contains some centralized structures.
For example, Broca's area and Wernicke's area are specialized for linguistic input and output.
The hippocampi are another good example.
If any of these specialized chunks of brain are surgically removed, for instance, then performing certain tasks becomes impossible.
To some extent, the same is true for copycat.
For example, removing the ability to update the workspace would be \emph{roughly} equivalent to removing both hippocampi from a human.
However, replacing the centralized structure of temperature with distributed multi-level metrics may improve copycat's ability to solve fluid analogy problems.
Other structures in copycat, like the workspace itself, or the coderack, are also centralized.
Arguably, these centralized structures are not constraining.
Still, their unifying effect should be taken into account.
For example, the workspace must be atomic, just as centralized structures in the brain, such as the hippocampi, are atomic.
From a functional-programming perspective (i.e., that of LISP, copycat's original language), the brain simply carries out the same function in many locations (mapping \texttt{neuron.process()} across each of its neurons, if you will).
Note that this is more similar to the behavior of a GPU than a CPU.
However, this model doesn't work when code has to synchronize to access global variables.
If copycat can be run such that -- during the majority of the program's runtime -- codelets may actually execute at the same time (without pausing to access globals), then it will much better replicate the human brain.
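As a minimal sketch of this style, using hypothetical names rather than copycat's actual API: each codelet consults only the structure it acts on, so codelets can overlap in time without synchronizing on a global.
\begin{lstlisting}[language=Python]
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Structure:
    happiness: float  # local quality metric in [0, 1]

@dataclass
class Codelet:
    structure: Structure
    urgency: float

    def run(self):
        # Decide locally: consult only the structure this codelet
        # acts on, never a shared global such as temperature.
        if self.structure.happiness < self.urgency:
            self.act()

    def act(self):
        pass  # build or break the structure

# With no global reads or writes, sampled codelets may genuinely
# execute at the same time instead of pausing to access globals.
codelets = [Codelet(Structure(happiness=0.4), urgency=0.7)]
with ThreadPoolExecutor() as pool:
    pool.map(Codelet.run, codelets)
\end{lstlisting}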
The convolution in the temperature calculation is \emph{unnecessary}.
Ideally, a future version of copycat, or an underlying FARG architecture, will remove this convolution and make the temperature calculation simpler, streamlined, documented, and understandable.
A global description of the system is, at times, potentially useful.
However, in summing together the values of each workspace object, information is lost regarding which workspace objects are offending.
In general, the changes that occur will eventually be object-specific.
So, going from object-specific descriptions to a global description and back to an object-specific action seems like a waste of time.
This is not to say that the global description should be \emph{obliterated} (removed entirely); rather, it should be reserved for when global actions are taking place.
For example, when deciding that copycat has found a satisfactory answer, a global description should be used, because deciding to stop copycat is a global action.
However, when deciding to remove a particular structure, a global description should not be used, because removing a particular offending structure is \emph{not} a global action.
On the other hand, a global description has some benefits.
For example, the global formula for temperature converts the raw importance value for each object into a relative importance value for each object.
If a distributed metric were used, this importance value would have to be left in its raw form.
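In symbols (ours, not copycat's source notation): if $I_k$ is the raw importance of workspace object $k$, the global pass yields a relative importance
\[
\tilde{I}_k = \frac{I_k}{\sum_j I_j},
\]
whereas a purely local metric would have access to $I_k$ alone.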
%% Alternatively, codelets could be equated to ants in an anthill (see anthill analogy in GEB).
%% Instead of querying a global structure, codelets could query their neighbors, the same way that ants query their neighbors (rather than, say, relying on instructions from their queen).
%%
\section{Methods}
\subsection{Formula Documentation}
Many of copycat's formulas rely on magic numbers and are only marginally documented.
This is less of a problem in the original LISP code, and more of a problem in the twice-translated Python3 version of copycat.
However, even in copycat's LISP implementation, formulas have redundant parameters.
For example, given the two formulas $f(x) = 2x$ and $g(x) = x^2$, a single formula can be written: $h(x) = f(g(x)) = 2x^2$ (the composed and then simplified formula).
Ideally, the adjustment formulas within copycat could be reduced in the same way, so that much of copycat's behavior rested on a handful of parameters in a single location, as opposed to more than ten parameters scattered throughout the repository.
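As a sketch of such a consolidation (the module name, parameter names, and values here are illustrative placeholders, not copycat's actual constants):
\begin{lstlisting}[language=Python]
# parameters.py -- a single home for copycat's tunable constants,
# replacing magic numbers scattered throughout the repository.
# All names and values below are illustrative placeholders.
PARAMETERS = {
    'initial_temperature': 100.0,  # starting global temperature
    'urgency_bins': 7,             # coderack urgency levels
    'fizzle_threshold': 0.5,       # breaker-fizzling cutoff
}
\end{lstlisting}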
Also, parameters in copycat often have no statistically significant effect.
As will be discussed in the $\chi^2$ distribution testing section, any copycat formulas without a significant effect will be hard-removed.
\subsection{Testing the Effect of Temperature}
To begin with, the existing effect of the centralizing variable, temperature, will be analyzed.
With the probability adjustment formulas used by default, temperature has very little effect.
To evaluate the effect of temperature-based probability adjustment formulas, a spreadsheet was created that showed a color gradient based on each formula.
[Insert spreadsheet embeds]
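As one illustrative family of such adjustments (a stand-in for exposition, not copycat's exact curve), a raw probability $p$ can be flattened toward chance as the temperature $T \in [0, 100]$ rises:
\[
p' = \tfrac{1}{2} + \left(p - \tfrac{1}{2}\right)\left(1 - \frac{T}{100}\right),
\]
so that at $T = 100$ every binary decision becomes a coin flip, while at $T = 0$ the raw probability is used unchanged.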
Then, to evaluate the effect of different temperature usages, separate usages of temperature were individually removed and answer distributions were compared statistically (See section: $\chi^2$ Distribution Testing).
\subsection{Temperature Probability Adjustment}
Once the effect of temperature was evaluated, new temperature-based probability adjustment formulas were proposed that each had a significant effect on the answer distributions produced by copycat.
Instead of representing a temperature-less, decentralized version of copycat, these formulas are meant to represent the centralized branch of copycat.
[Insert formula write-up]
\subsection{Temperature Usage Adjustment}
Once the behavior based on temperature was well understood, experimentation was made with hard and soft removals of temperature and features that depend on it.
For example, the probability adjustments based on temperature were removed first.
Then, the new branch of copycat was $\chi^2$ compared against the original branch.
Then, breaker-fizzling, an independent temperature-related feature, was removed from the original branch, and another $\chi^2$ comparison was made.
The same process was repeated for non-probability temperature-based adjustments, and then for the copycat stopping decision.
Then, a temperature-less branch of the repository was created and tested.
Then, a branch of the repository was created that removed probability adjustments, value adjustments, and fizzling, and made all other temperature-related operations use a dynamic temperature calculation.
All repository branches were then cross-compared using a $\chi^2$ distribution test.
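A sketch of the tallying step that precedes each comparison (the entry-point name and problem encoding are our assumptions):
\begin{lstlisting}[language=Python]
from collections import Counter

def answer_counts(run_copycat, problem, iterations=1000):
    """Run one branch's copycat entry point repeatedly on a
    problem (e.g. 'abc:abd :: ijk:?') and tally its answers."""
    return Counter(run_copycat(problem) for _ in range(iterations))
\end{lstlisting}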
\subsection{$\chi^2$ Distribution Testing}
To test each different branch of the repository, a scientific framework was created.
Each run of copycat on a particular problem produces a distribution of answers.
Distributions of answers can be compared against one another with Pearson's $\chi^2$ test.
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]
where $O_i$ and $E_i$ are the observed and expected frequencies of answer $i$, and $k$ is the number of distinct answers.
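A minimal sketch of such a comparison follows; the helper name, the pseudocount smoothing, and the use of \texttt{scipy.stats.chisquare} are our illustrative choices rather than the project's actual harness.
\begin{lstlisting}[language=Python]
from scipy.stats import chisquare

def compare_branches(counts_a, counts_b, pseudocount=1):
    """Pearson chi-squared comparison of two answer distributions.

    counts_a and counts_b map answer strings (e.g. 'ijl') to
    observed frequencies; a pseudocount guards against zero
    expected frequencies."""
    answers = sorted(set(counts_a) | set(counts_b))
    obs = [counts_a.get(a, 0) + pseudocount for a in answers]
    exp = [counts_b.get(a, 0) + pseudocount for a in answers]
    # chisquare requires the observed and expected totals to agree.
    scale = sum(obs) / sum(exp)
    exp = [e * scale for e in exp]
    return chisquare(obs, f_exp=exp)  # (statistic, p-value)
\end{lstlisting}
A small $p$-value then indicates that the two branches' answer distributions differ significantly.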
\subsection{Effectiveness Definition}
Quantitatively evaluating the effectiveness of a cognitive architecture is difficult.
However, for copycat specifically, effectiveness can be defined as a function of the frequency of desirable answers and equivalently as the inverse frequency of undesirable answers.
Since answers are desirable to the extent that they respect the original transformation of letter sequences, desirability can also be approximated by a concrete metric.
A natural metric for desirability is the existing temperature formula, or some variant of it.
So, a given version of copycat is quantitatively better if it produces lower-temperature answers more frequently.
However, recognizing lower-quality answers is also a sign of intelligence.
So, the extent to which copycat produces poor answers only at low frequency could be accounted for as well.
Arguably, though, copycat isn't explicitly programmed to do this.
For simplicity, desirability will be measured as the frequency of lower-temperature answers.
Luckily, the definition for desirability of answer distributions is modular, such that each branch of copycat could be evaluated for answer desirability on each separate problem.
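A minimal sketch of this measure (the linear weighting and copycat's 0--100 temperature scale are our assumptions):
\begin{lstlisting}[language=Python]
def desirability(answers):
    """answers: a list of (final_temperature, count) pairs, one
    per distinct answer. Returns the frequency-weighted mean of
    (100 - temperature), so that frequent low-temperature answers
    score highest."""
    total = sum(count for _, count in answers)
    return sum((100 - t) * count for t, count in answers) / total
\end{lstlisting}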
\section{Results}
\subsection{Cross $\chi^2$ Table}
The table below summarizes the results of comparing each copycat variant's answer distribution with that of every other variant.
[Insert cross $\chi^2$ table]
\section{Discussion}
\subsection{Distributed Computation Accuracy}
[Summary of introduction, elaboration based on results]
\subsection{Prediction}
Even though imperative, serial, centralized code is Turing-complete just like functional, parallel, distributed code, I predict that the most progressive cognitive architectures of the future will be created in functional programming languages that run in a distributed fashion and in true parallel.
I also predict that, eventually, distributed code will be run on hardware closer to the architecture of a GPU than of a CPU.
Arguably, the brain is more similar to a GPU than a CPU given its distributed nature.
\bibliographystyle{alpha}
\bibliography{sample}
\end{document}