Moves notes into paper structure
papers/draft.tex
@@ -26,49 +26,45 @@
\maketitle

\begin{abstract}
[Write Abstract]
We investigate the distributed nature of computation in a FARG architecture, Copycat.
One of the foundations of FARG models is the \emph{Parallel Terraced Scan}, a psychologically plausible mechanism that enables a system to move fluidly between different modes of processing.
Previous work has modeled decision-making under the Parallel Terraced Scan using a single central variable, \emph{temperature}.
However, it is unlikely that this design decision accurately replicates the corresponding processes in the human brain.
This paper proposes several changes to the copycat architecture that aim to increase its modeling accuracy.
\end{abstract}

\section{Introduction}

This paper stems from Mitchell's (1993) and Hofstadter and FARG's (1995) work on the copycat program.
This project focuses on effectively simulating intelligent processes through increasingly distributed decision-making.
In the process of evaluating the distributed nature of copycat, this paper also proposes a ``normal science'' framework.

First, copycat uses the ``Parallel Terraced Scan'' as a human-inspired search algorithm.
The Parallel Terraced Scan corresponds to the psychologically plausible behavior of briefly browsing, say, a book, and delving deeper whenever something sparks one's interest.
In a way, it is a mix between depth-first and breadth-first search.
This type of behavior seems to change the intensity of an activity fluidly, based on local, contextual cues.
Previous FARG models use centralized structures, like the global temperature value, to control the behavior of the Parallel Terraced Scan.
This paper explores how to maintain the same behavior while distributing decision-making throughout the system.
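
As a concrete illustration, the following is a minimal sketch of a parallel terraced scan; the function names, the \texttt{probe}/\texttt{deepen} callbacks, and the budget scheme are assumptions made for illustration and are not taken from the copycat source.

\begin{lstlisting}[language=Python]
import random

def parallel_terraced_scan(options, probe, deepen, budget=100):
    # Illustrative sketch only (not copycat code).
    # probe(option)  -> rough promisingness estimate in [0, 1]
    # deepen(option) -> one unit of deeper exploration of that option
    interest = {option: probe(option) for option in options}  # shallow pass
    for _ in range(budget):
        # Options are revisited probabilistically, in proportion to their
        # current interest, so the scan is neither purely depth-first nor
        # purely breadth-first.
        option = random.choices(list(interest),
                                weights=list(interest.values()))[0]
        deepen(option)
        interest[option] = probe(option)  # re-estimate after the deeper look
    return max(interest, key=interest.get)
\end{lstlisting}

When nothing looks especially interesting the scan stays shallow and broad; a single promising option quickly attracts most of the remaining budget.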

Specifically, this paper attempts several refactorings of the copycat architecture.
First, the probability adjustment formulas based on temperature are changed.
Then, we experiment with two methods for replacing temperature with a distributed metric.
Initially, temperature is removed destructively, deleting any lines of code that mention it, simply to see what effect this has.
Then, a surgical removal of temperature is attempted, leaving affected structures intact or replacing them with effective distributed mechanisms.

To evaluate the distributed nature of copycat, this paper focuses on the creation of a ``normal science'' framework.
By ``normal science,'' this paper means the term coined by Thomas Kuhn: the collaborative enterprise of furthering understanding within a paradigm.
Today, ``normal science'' is simply not done on FARG architectures, nor on most computational cognitive architectures (see Addyman \& French 2012).
Unlike mathematical theories or experiments, which can be replicated by following the materials and methods, computational models generally have dozens of hand-tuned variables, undocumented procedures, multiple assumptions about the user's computational environment, and so on.
It then becomes close to impossible to reproduce a result, or to test a new idea scientifically.
This paper focuses on the introduction of statistical techniques, the reduction of ``magic numbers,'' the improvement and documentation of formulas, and proposals for statistical comparison with human subjects.

We also discuss, in general, the nature of the brain as a distributed system.
While the removal of a single global variable may initially seem trivial, one must realize that copycat and other cognitive architectures have many central structures.
This paper explores the justification of these central structures in general.
Is it possible to model intelligence with them, or are they harmful?

\section{Theory}
\subsection{Normal Science}
\subsubsection{Scientific Style}
\subsubsection{Scientific Testing}
\subsection{Distribution}
\subsubsection{Von Neumann Discussion}
\subsubsection{Turing Completeness}
\subsubsection{Computers Can Simulate Brains}
\subsubsection{Simulation of Distributed Processes}
\subsubsection{Efficiency of True Distribution}
\subsubsection{Temperature in Copycat}
\subsubsection{Other Centralizers in Copycat}
\subsubsection{The Motivation for Removing Centralizers in Copycat}
\section{Methods}
\subsection{Formula Adjustments}
\subsubsection{Temperature Probability Adjustment}
\subsubsection{Temperature Calculation Adjustment}
\subsubsection{Temperature Usage Adjustment}
\subsection{$\chi^2$ Distribution Testing}
\section{Results}
\subsection{$\chi^2$ Table}
\section{Discussion}
\subsection{Distributed Computation Accuracy}
\subsection{Prediction}

\section{Body: Distributed Decision Making and Normal Science}

\subsection{Distributed Decision Making}

The distributed nature of decision making is essential to modeling intelligent processes [..]

\subsection{Normal Science}

An objective, scientifically oriented framework is essential to making progress in the domain of cognitive science.
[John Von Neumann: The Computer and the Brain?
He pointed out that there were good grounds merely in terms of electrical analysis to show that the mind, the brain itself, could not be working on a digital system. It did not have enough accuracy; or... it did not have enough memory. ...And he wrote some classical sentences saying there is a statistical language in the brain... different from any other statistical language that we use... this is what we have to discover. ...I think we shall make some progress along the lines of looking for what kind of statistical language would work.]
Notion that the brain obeys statistical, entropic mathematics.

\subsection{Notes}

Given that computers are universal machines and have vastly improved over the past five decades, it is clear that, despite the differences we can enumerate between brains and computers, computers are capable of simulating intelligent processes.
[Cite Von Neumann].
@@ -90,6 +86,7 @@
Even though copycat uses simulated parallel code, if copycat were actually parallelized, the global variable of temperature would prevent most copycat codelets from running at the same time.
If this global variable and other constricting centralized structures were removed, copycat's code would more closely replicate intelligent processes and could be run much faster.
From a functional-programming perspective (LISP being the original language of copycat), the brain should simply be carrying out the same function in many locations (mapping \texttt{neuron.process()} across each of its neurons, if you will).
Note that this is more similar to the behavior of a GPU than a CPU.
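
As a toy illustration of this functional view (the class and the update rule are hypothetical, not copycat code), the same local function can be mapped over every unit without consulting any global state:

\begin{lstlisting}[language=Python]
# Hypothetical sketch: map one local function over every unit.
class Neuron:
    def __init__(self, activation):
        self.activation = activation
        self.neighbors = []

    def process(self):
        # Purely local update: only this neuron and its neighbors are
        # consulted; no global variable (such as temperature) is read.
        if self.neighbors:
            mean = sum(n.activation for n in self.neighbors) / len(self.neighbors)
            self.activation = 0.9 * self.activation + 0.1 * mean

neurons = [Neuron(a) for a in (0.2, 0.8, 0.5)]
neurons[0].neighbors = [neurons[1], neurons[2]]
list(map(Neuron.process, neurons))  # the same function, applied everywhere
\end{lstlisting}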
However, in violating this model with the introduction of global variables...

Global variables seem like a construct that people use to model the real world.
@@ -118,8 +115,6 @@

Clearly, creating a model of copycat that doesn't have centralized structures will take an excessive amount of effort.

\break
.....
\break

The calculation of temperature is itself extremely convoluted (in the Python version of copycat).
@@ -161,7 +156,7 @@
For example, the global formula for temperature converts the raw importance value for each object into a relative importance value for each object.
If a distributed metric were used, this importance value would have to be left in its raw form.

\subsubsection{Functional Programming Languages and the Brain}
\break

The original copycat was written in LISP, a mixed-paradigm language.
Because of LISP's preference for functional code, global variables are conventionally marked with surrounding asterisks.
@@ -181,108 +176,6 @@
Alternatively, codelets could be equated to ants in an anthill (see the anthill analogy in GEB).
Instead of querying a global structure, codelets could query their neighbors, the same way that ants query their neighbors (rather than, say, relying on instructions from their queen).

\subsection{Initial Formula Adjustments}

This research began with adjustments to probability weighting formulas.

In copycat, temperature affects the simulation in multiple ways:

\begin{enumerate}
\item Certain codelets are probabilistically chosen to run
\item Certain structures are probabilistically chosen to be destroyed
\item ...
\end{enumerate}

In many cases, the formulas \texttt{get-adjusted-probability} and \texttt{get-adjusted-value} are used.
Each curves a probability as a function of temperature.
The desired behavior is as follows:
at high temperatures, the system should explore options that would otherwise be unlikely.
So, at temperatures above half of the maximum temperature, probabilities with a base value less than fifty percent are curved higher, up to some threshold.
At temperatures below half of the maximum temperature, probabilities with a base value above fifty percent are curved lower, down to some threshold.
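
A minimal sketch of this desired behavior is given below; the linear interpolation scheme and the \texttt{strength} parameter are assumptions for illustration only and do not reproduce the original \texttt{get-adjusted-probability}.

\begin{lstlisting}[language=Python]
def curved_probability(p, temperature, max_temp=100.0, strength=0.5):
    # Illustrative sketch of the desired behavior only; not copycat's
    # get-adjusted-probability.  `strength` controls how hard we curve.
    t = temperature / max_temp              # 0.0 (cold) .. 1.0 (hot)
    if t > 0.5 and p < 0.5:
        # Hot system: pull unlikely options upward, toward (but below) 0.5.
        return p + strength * (t - 0.5) * (0.5 - p)
    if t < 0.5 and p > 0.5:
        # Cold system: pull likely options downward, toward (but above) 0.5.
        return p - strength * (0.5 - t) * (p - 0.5)
    return p                                # otherwise, leave p unchanged
\end{lstlisting}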

The original formulas used to do this were overly complicated.
In summary, many formulas were tested in a spreadsheet, and the one that best replicated the desired behavior was chosen.

The original formula for curving probabilities in copycat:
\lstinputlisting[language=Python]{formulas/original.py}

An alternative that seems to improve performance on the abc$\rightarrow$abd, xyz$\rightarrow$? problem is shown below; it produces probabilities that are not bounded between 0 and 1, and these are generally truncated.
\lstinputlisting[language=Python]{formulas/entropy.py}

Ultimately, it wasn't clear to me that the so-called ``xyz'' problem should even be considered.
As discussed in [the literature], the ``xyz'' problem is a novel example of a cognitive obstacle.
Generally, the best techniques for solving the ``xyz'' problem are discussed in the publications around the ``Metacat'' project, which gives copycat a temporary memory and levels of reflection upon its actions.
However, it is possible that formula changes targeting improvement on other problems may also produce better results for the ``xyz'' problem.
Focusing on the ``xyz'' problem, however, will likely be harmful to the improvement of performance on other problems.

So, the original copycat formula is overly complicated and does not perform optimally on several problems.
The entropy formula is an improvement, but other formulas are possible too.

Below are variations on a ``weighted'' formula.
The general structure is:

\[ p' = \frac{100-T}{100} \cdot S + \frac{T}{100} \cdot U \]

where $S$ is the convergence value when $T = 0$ and
$U$ is the convergence value when $T = 100$.
The formulas below simply experiment with different values for $S$ and $U$.
The values of $\alpha$ and $\beta$ can be used to provide additional weighting in the formula, but are not used in this section.

\lstinputlisting[language=Python]{formulas/weighted.py}

[Discuss inverse formula and why $S$ was chosen to be constant]

After some experimentation and a reading of the original copycat documentation, it was clear that $S$ should be chosen to be $0.5$ and that $U$ should implement the probability curving desired at high temperatures.
The following formulas let $U = p^r$ if $p < 0.5$ and $U = p^{1/r}$ if $p \geq 0.5$.
This controls whether and when curving happens.
The parameter $r$ then simply controls the degree to which curving happens.
Different values of $r$ were tried (values between $10$ and $1$, at increasingly smaller step sizes).
$2$ and $1.05$ are both good choices at opposite ``extremes.''
$2$ works because it is large enough to produce novel changes in behavior at extreme temperatures without totally disregarding the original probabilities.
Values above $2$ do not work because they make probabilities too uniform.
Values below $2$ (and above $1.05$) are feasible, but produce less curving and therefore less distinctive behavior.
$1.05$ works because it very closely replicates the original copycat formulas, providing very smooth curving.
Values beneath $1.05$ essentially leave probabilities unaffected, producing no significant temperature-dependent behavior.

\lstinputlisting[language=Python]{formulas/best.py}
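
For readers without the \texttt{formulas/} files at hand, the following is a minimal sketch of the weighted rule described above; the function and argument names and the defaults are assumptions, and this is not the literal contents of \texttt{formulas/best.py}.

\begin{lstlisting}[language=Python]
def weighted_adjusted_probability(p, temperature, r=2.0, s=0.5):
    # Illustrative sketch of the weighted rule described above; not the
    # literal contents of formulas/best.py.  Names and defaults are assumed.
    # U implements the curving associated with the T = 100 end of the scale.
    u = p ** r if p < 0.5 else p ** (1.0 / r)
    # Linear blend between S (the T = 0 limit) and U (the T = 100 limit).
    return (100.0 - temperature) / 100.0 * s + temperature / 100.0 * u
\end{lstlisting}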

Random thought:
it would be interesting not to hardcode the value of $r$, but instead to leave it as a variable between $0$ and $2$ that changes depending on frustration.
However, this would be much like temperature in the first place.
$r$ could itself be a function of temperature. That would be... meta.

\break
...
\break

And ten minutes later, it was done.
The ``meta'' formula performs as well as the ``best'' formula on the ``ijjkkk'' problem, which I consider the most novel.
Interestingly, I noticed that the parameterized formulas aren't as good on this problem. What did I parameterize them for? Was it well justified?
(Probably not.)

At this point, I plan on using the git branch \texttt{feature-normal-science-framework} to implement a system that takes in a problem set and provides several answer distributions as output.
Then, I'll do a massive cross-formula answer distribution comparison with $\chi^2$ tests. This will give an idea of which formula and which changes are best.
I'll also be able to compare all of these answer distributions to the frequencies obtained in the temperature-removal branches of the repository.
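
As a sketch of what that comparison could look like (the answer strings and counts are invented for illustration, and \texttt{scipy} is assumed to be available):

\begin{lstlisting}[language=Python]
# Sketch of the planned cross-formula comparison: a chi-squared test of
# homogeneity between the answer distributions of two formula variants.
from scipy.stats import chi2_contingency

answers         = ["wyz", "xyd", "xyz", "yzz"]   # hypothetical answer labels
counts_original = [520, 310, 120, 50]            # runs per answer, formula A
counts_entropy  = [480, 350, 140, 30]            # runs per answer, formula B

chi2, p_value, dof, _ = chi2_contingency([counts_original, counts_entropy])
print("chi2 = %.2f, dof = %d, p = %.4f" % (chi2, dof, p_value))
# A small p-value would suggest the two formulas produce genuinely
# different answer distributions over the same problem.
\end{lstlisting}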

\subsection{Steps/Plan}

Normal Science:
\begin{enumerate}
\item Introduce statistical techniques
\item Reduce magic-number usage; document reasoning and math
\item Propose an effective human-subject comparison
\end{enumerate}
Temperature:
\begin{enumerate}
\item Propose formula improvements
\item Experiment with a destructive removal of temperature
\item Experiment with a ``surgical'' removal of temperature
\item Assess the different copycat versions with and without temperature
\end{enumerate}

\subsection{Semi-structured Notes}

Biological or psychological plausibility only matters if it actually affects the presence of intelligent processes.
For example, neurons don't exist in copycat because we feel that they aren't required to simulate the processes being studied; instead, copycat uses higher-level structures to simulate the same emergent processes that neurons do.
However, codelets, and the control of them, rely on a global function representing tolerance to irrelevant structures.
Other higher-level structures in copycat likely rely on globals as well.
Another central variable in copycat is the ``rule'' structure, of which there is only one.
While some global variables might be viable, others may actually obstruct the ability to model intelligent processes.
For example, a distributed notion of temperature would not only increase biological and psychological plausibility, but also increase copycat's effectiveness at producing acceptable answer distributions.

We must also realize that copycat is only a model, so even if we take goals (level of abstraction) and biological plausibility into account...
@@ -298,25 +191,118 @@ While there is a good argument about copycat representing an individual with bia

Let's simply test the hypothesis.
$H_i$: copycat will have an improved answer distribution if temperature is turned into a set of distributed metrics, where ``improved'' means significantly different, with increased frequencies of more desirable answers and decreased frequencies of less desirable answers (desirability to be determined by some concrete metric, such as the number of relationships that are preserved or mirrored).
$H_0$: copycat's answer distribution will be unaffected by changing temperature to a set of distributed metrics.

\subsection{Random Notes}

This is all just free-flow unstructured notes. Don't take anything too seriously :).

Below is a list of relevant primary and secondary sources under review:
Biological/Psychological Plausibility:
\begin{verbatim}
http://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(16)30217-0
"There is no evidence for a single site of working memory storage."
https://ekmillerlab.mit.edu/2017/01/10/the-distributed-nature-of-working-memory/

Creativity as a distributed process (SECONDARY: Review primaries)
https://blogs.scientificamerican.com/beautiful-minds/the-real-neuroscience-of-creativity/
cognition results from the dynamic interactions of distributed brain areas operating in large-scale networks
http://scottbarrykaufman.com/wp-content/uploads/2013/08/Bressler_Large-Scale_Brain_10.pdf
\end{verbatim}

\bibliographystyle{alpha}
\bibliography{sample}