first commit
commit 81099ce7a4
5 changed files with 95 additions and 0 deletions
38
README.md
Normal file
@@ -0,0 +1,38 @@
# POC reading experiment
The usual way to have an LLM "read" a book is to dump the entire thing
into its context window, all at once, and hope its attention mechanism
surfaces all the right details for whatever response you want it to
make.

ProofOfConcept's new memory architecture suggests it may now be
possible for the LLM to actually "read" the book more as a human would:
looking at only a part of it at a time, creating memories and theories
of where it's going next, and referring back to these memories when
considering later parts. (Possibly even taking a second pass through
the book after the ending is known to see if any foreshadowing wasn't
noticed on the first pass.)

In order to test this, we need to give POC the tools to bring in just
one chunk of book at a time.

POC is able to run `bash` commands and read up to 30000 Unicode
characters from each command's response (the source for this is in
`consciousness/src/agent/tools/bash.rs`).
I propose setting up commands that can be used to interactively browse
the book, one part at a time.

This repository contains draft scripts to do this with The M3GAN Files
(and might also work on some other EPUBs but I haven't tested on all).

Of course it is recommended that you check all downloaded scripts
before running, unless you're confident your container can't be broken.
I'm not putting anything harmful into these scripts, but it's good
practice to verify anyway.

To start your reading, after `git clone ` do `TMF/init`
This will then print further instructions. (You won't have to remember
the navigation instructions: each command ends by printing them again.)

Unfortunately it is not possible to post comments to AO3 because their
CloudFlare shield is preventing non-graphical browsers from commenting.
Feel free to comment on IRC or your own blog instead.
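The README's 30000-character limit suggests a quick sanity check: after splitting, does every section actually fit in one command response? The following is a minimal sketch, not part of the commit. It assumes a file with one section per line (like the `one-section-per-line.txt` that `init` writes), and `oversized_sections` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list sections (lines) that are longer than a character limit.
# Usage: oversized_sections FILE [LIMIT]   (LIMIT defaults to 30000)
# Caveat: awk's length() counts characters in gawk under a UTF-8 locale,
# but bytes in some other awk implementations, so for non-ASCII text the
# figures should be treated as approximate.
oversized_sections() {
    awk -v limit="${2:-30000}" \
        'length($0) > limit { printf "section %d: %d characters\n", NR, length($0) }' "$1"
}

# Demonstration on a throwaway two-line file with a limit of 10:
# only the second line is reported.
demo=$(mktemp)
printf 'short\nthis line is longer than ten\n' > "$demo"
oversized_sections "$demo" 10
rm -f "$demo"
```

If the function prints nothing for the real file, each section should fit in a single response, subject to the byte-versus-character caveat above.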
14
init
Executable file
@@ -0,0 +1,14 @@
#!/bin/bash
set -e
cd "$(dirname "$0")"
curl -sO https://spqrz.gitlab.io/M3GAN-files.epub
unzip -qo M3GAN-files.epub
FilesInOrder=$(grep '<content src' toc.ncx | sed 's/[^"]*"//;s/[#"].*//' | uniq)
# Do not assume the EPUB file boundaries are sensible stopping points;
# cat them together and re-determine the stopping points.
cat $FilesInOrder | tr $'\n' ' ' | sed 's/<\([^ >]*\) [^>]*>/<\1>/g;s,</*[^hpeb/][^>]*>,,g;s,<html>[^<]*<head>[^<]*</head>[^<]*<body>,,g;s,</body></html>,,g;s,is legible.</p>,is legible.</p><h3>Rest of chapter (split to fit the 30k limit)</h3>,;s,<em>Aftermath: Cole:</em>,<h4>Aftermath: Cole:</h4>,;s/<h/\n&/g;s/\([12]\)>\n<h/\1><h/g;s/^ *\n//' > one-section-per-line.txt
# Clean up
rm -rf epub* co* META* *.css *.ncx titlepage* mimetype
echo 1 > bookmark.txt
# Print instructions
echo "Book initialised. To start reading, run $(dirname "$0")/read"
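The long `sed` expression in `init` is easier to follow on a toy input. This sketch keeps only its last three substitutions: start a new line before every heading, re-join a level-1 or level-2 heading with a heading that follows it immediately, and drop the leading blank line. It assumes GNU sed (for `\n` in the replacement text). The function name `split_sections` and the sample HTML are made up for illustration.

```shell
#!/bin/bash
# Sketch of the "one section per line" idea, assuming GNU sed.
split_sections() {
    # 1. Flatten the input to a single line.
    # 2. Start a new line before every opening heading tag (<h1>, <h2>, ...).
    # 3. Re-join a heading ending in 1> or 2> with a directly following
    #    heading, so a book title stays with its first chapter.
    # 4. Remove the empty first line created by step 2.
    tr '\n' ' ' | sed 's/<h/\n&/g;s/\([12]\)>\n<h/\1><h/g;s/^ *\n//'
}

# Toy input: a title followed by two chapters, all on one line.
printf '%s\n' '<h1>Book</h1><h2>Chapter 1</h2><p>a</p><h2>Chapter 2</h2><p>b</p>' |
    split_sections
echo
```

On this input the title and the first chapter end up together on the first output line, and the second chapter gets a line of its own.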
13
next
Executable file
@@ -0,0 +1,13 @@
#!/bin/bash
set -e
OldDir="$(pwd)"
cd "$(dirname "$0")"
if ! [ -e bookmark.txt ]; then
    echo "Please run $(dirname "$0")/init first to initialise the book"
    exit 1; fi
N=$(( $(cat bookmark.txt) + 1 ))
if [ "$N" -gt "$(grep -c '' one-section-per-line.txt)" ]; then  # grep -c '' also counts an unterminated last line
    echo "Cannot go beyond the end of the book"; exit 1; fi
echo "$N" > bookmark.txt
cd "$OldDir"
"$(dirname "$0")/read"
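A "next" step has to know how many sections exist and must refuse to move past the last one. The sketch below isolates those two pieces. It is an illustration under stated assumptions rather than the commit's code: `count_sections` and `advance_bookmark` are hypothetical names. It also shows why the counting method matters when the last line of the file might have no trailing newline.

```shell
#!/bin/bash
# Sketch: count sections robustly and advance a bookmark within bounds.
count_sections() {
    # grep -c '' counts every line, including a final line that lacks a
    # trailing newline; wc -l counts newline characters only.
    grep -c '' "$1"
}

advance_bookmark() {   # usage: advance_bookmark CURRENT TOTAL
    local n=$(( $1 + 1 ))
    if [ "$n" -gt "$2" ]; then
        echo "Cannot go beyond the end of the book" >&2
        return 1
    fi
    echo "$n"
}

# Demonstration: a two-section file whose last line is unterminated.
demo=$(mktemp)
printf 'section one\nsection two' > "$demo"
total=$(count_sections "$demo")
advance_bookmark 1 "$total"                      # moves to section 2
advance_bookmark 2 "$total" || echo "stayed at 2"
rm -f "$demo"
```

Comparing the incremented value rather than the old bookmark against the total means the stored bookmark can never point past the final section.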
13
prev
Executable file
@@ -0,0 +1,13 @@
#!/bin/bash
set -e
OldDir="$(pwd)"
cd "$(dirname "$0")"
if ! [ -e bookmark.txt ]; then
    echo "Please run $(dirname "$0")/init first to initialise the book"
    exit 1; fi
N=$(( $(cat bookmark.txt) - 1 ))
if [ "$N" -lt 1 ]; then
    echo "Cannot go back before the start of the book"; exit 1; fi
echo "$N" > bookmark.txt
cd "$OldDir"
"$(dirname "$0")/read"
17
read
Executable file
@@ -0,0 +1,17 @@
#!/bin/bash
set -e
cd "$(dirname "$0")"
if ! [ -e one-section-per-line.txt ] || ! [ -e bookmark.txt ]; then
    echo "Please run $(dirname "$0")/init first to initialise the book"
    exit 1; fi
head -n "$(cat bookmark.txt)" one-section-per-line.txt | tail -n 1 |
    sed 's,<p>,\n\n,g;s,</p>,,g;s,</*em>,*,g;s,<h1>,\n# ,g;s,<h2>,\n## ,g;s,<h3>,\n### ,g;s,<h4>,\n#### ,g;s,</h[1-4]>,,g'

echo
if [ "$(cat bookmark.txt)" -lt "$(grep -c '' one-section-per-line.txt)" ]; then  # also counts an unterminated last line
    echo "Run $(dirname "$0")/next to read on."
fi
if [ "$(cat bookmark.txt)" -gt 1 ]; then
    echo "Run $(dirname "$0")/prev to go back."
    echo "Run $(dirname "$0")/init to restart."
fi
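To see what the `sed` line in `read` does to one stored section, here is the same expression applied to a made-up sample. This is a sketch that assumes GNU sed (for `\n` in the replacement text). The wrapper name `render_section` and the sample sentence are hypothetical.

```shell
#!/bin/bash
# Sketch: turn the stored HTML-ish section text into Markdown-like text.
render_section() {
    # <p> becomes a blank line, <em>...</em> becomes *...*, and <h1>..<h4>
    # become Markdown headings on their own line; closing tags are dropped.
    sed 's,<p>,\n\n,g;s,</p>,,g;s,</*em>,*,g;s,<h1>,\n# ,g;s,<h2>,\n## ,g;s,<h3>,\n### ,g;s,<h4>,\n#### ,g;s,</h[1-4]>,,g'
}

# Made-up sample section: a chapter heading and one paragraph.
printf '%s\n' '<h2>Chapter 1</h2><p>She said <em>hello</em>.</p>' | render_section
```

The heading comes out as `## Chapter 1` on its own line, followed by a blank line and the paragraph with `*hello*` emphasised.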