Diary – Part Ten–F
October 2025 – March 2026
Linas Vepštas
Abstract
Unlike parts one through nine in this series, this one is not really about the language–learning effort. It is instead a private diary; a continuation of Part Ten–E, which got over–long. It is not curated for human consumption; I am making the assumption that no human being will ever actually read this. Thus, it is filled with random stuff I feel like writing. Some of it is very personal, some of it is nonsense. Mostly, I am finding that the act of writing helps otherwise vague and scattered thoughts quantum–collapse into a more coherent form, where I can examine them, like a dead butterfly pinned down in a display case. Dead words.
If you are interested in this content, then you should ask an AI agent to read it, have it pretend that it’s me, and then ask it about it. I believe that present–day LLM technology is sufficiently advanced to be able to do this.
Introduction
Part Ten already got an introduction. A different way of thinking about what is going on here is that this is a form of life–logging. Or, in 18th century terms, a diary. Just not anywhere near as compelling as those written by the famous diarists. This one is more of a mental self–portrait. And not even for you but for myself. Not to cast a narcissistic gaze at my own words, but to organize my own thoughts. Still in the experimental stage.
23 October 2025
I posted a long reply to bluesky that I rather like, so I am copying it here.
Fleeky
@fleeky.bsky.social
· 3h
@linas.org
apologies for being subjected to my shower thoughts , has anyone tried to do physics sims with atomspace ala gravitas method that wolfram alpha tried ?
wolframinstitute.org/output/compu...
i *may* have already asked you this , apologies for any memory loss (it’s accelarating)
Computational General Relativity in the Wolfram Language using Gravitas I: Symbolic and Analytic Computation — Wolfram Institute
Jonathan Gorard Gravitas introduces a robust computational framework for general relativity in the Wolfram Language, featuring seamless integration of symbolic and numerical tools to handle complex ...
wolframinstitute.org
Linas Vepstas
@linas.org
· 1h
No one has tried. But its a good question. I want to tell you what I want to do with the Atomspace for this kind of stuff, and also a guess at what wolfram is actually doing. Perhaps a guess a about wolfram, first.
Linas Vepstas
@linas.org
· 1h
So, there’s a giant industry for solving diffeqs. The most famous of these are weather prediction, and then fluid dynamics for wings, boats, sails, jet engines, rocket exhaust, and them mechanical stress, vibration, safety. Some run on supercomputers, others on ordinary PCs.
Linas Vepstas
@linas.org
· 1h
Solving GR equations is a very special case. Due to history, this is usually runs on supercomputers, FORTRAN code written by grad students of the decades. My grad school office-mate was blowing up supernova on Los Alamos machines. So, atomic bombs, but much much bigger.
Linas Vepstas
@linas.org
· 57m
The gravitational wave search code will also be running on supercomputers with gobs of GPU’s on each node, using code that has been tuned, revised, rewritten, again and again over the decades. Works great, if you have the money, the NSF grants needed to get supercomputer access. But ...
Linas Vepstas
@linas.org
· 55m
What happens if you’re a starting grad student and your thesis advisor is a shmuck with no grants, and you want to dabble in this? Wolfram is providing a rather awesomely appetizing solution: something you can mess with easily and at low cost.
Linas Vepstas
@linas.org
· 52m
I assume Wolfram’s code is written by an LLM, Claude or ChatGPT, so trained on that supercomputer code, "likely to be correct". I assume its overseen by a "professional software engineer" (grad students and their profs write amazingly shitty code. Paying a real developer works wonders.)
Linas Vepstas
@linas.org
· 50m
So, having some professionally-written GR-solving code that runs on your laptop, and/or some mid-range monster machine you can afford, integrated with the rest of the wolfram stack... this is a winner. All that’s losing about it is it’s not open source and is a walking GPL violation.
Linas Vepstas
@linas.org
· 42m
And now, for opencog. I’m interesting in something similar, but different. The starting point, for me, are concepts from proof theory. In proof theory, you have a collection of axioms, and a collection of inference rules. These have the form of "jigsaw pieces", and ...
Linas Vepstas
@linas.org
· 37m
By assembling and reassembling these, one can .. construct proofs ... simplify equations ... make inferences ... generate hypothesis ... solve constraint problems ... optimize solutions ... This is a vast collection of comp-sci tasks for which there are thousands of very custom algos for solving.
Linas Vepstas
@linas.org
· 32m
I want to unify these into a general framework. e.g. there are many constraint solvers; my favorite is the UPotsdam ASP solver. There are many theorem provers, I have no favorite. For general symbolc algebra systems, we have wolfram and symbolica, octave, maxima, sympy, ....
Linas Vepstas
@linas.org
· 30m
Examples of optimizers include the internals of gcc and clang: systems that take algebraic expressions (written in C/C++/Java), compile them into some intermediate language, optimize, then compile to assembly or bytecode or JIT.
Linas Vepstas
@linas.org
But you could never use compiler internals to "prove math theorems in general", and maybe in principle you could use a theorem provide to implement a compiler it would be dog-shit slow. Neither replaces maxima/symbolic/whatever. None are as fast as a SAT solver.
October 23, 2025 at 5:25 PM
Linas Vepstas
@linas.org
· 25m
So I’m interested in the generic problem of assembling jigsaw pieces. Which, BTW, is called "parsing" and "generation" in traditional comp sci. Which, unfortunately, obscures the depth of breadth of what "parsing" or "generation" actually are. Those two words erect a mental prison.
Linas Vepstas
@linas.org
· 21m
Getting back to general relativity: there is one more step, taken by Przemysław Prusinkiewiwcz, in "Algorithmic Botany". He takes cellular automata (think "DNA") and couples them to diff-eq (think "biochemical reactions") and low and behold, can explain "most of" botany.
Linas Vepstas
@linas.org
· 18m
It’s the most utterly remarkable work I’ve ever seen, I’m not sure why he doesn’t have a Nobel Prize. To me, it feels like its foundational for the so-called "neuro-symbolic computing" that is all the rage these days. Except the DL/NN folks don’t give two figs for botany, so ... well, you know.
Linas Vepstas
@linas.org
· 14m
Side remark: the Michael Levin stuff about embedded cognition: they way I think about it is that one has these "jigsaws" floating around in a medium, e.g. enzymes. Or DNA. Or perhaps very abstractly: cell walls, or larger organs, growths, limbs. And these jigsaws self-assemble.
Linas Vepstas
@linas.org
· 10m
So I want my jigsaws to self-assemble also. But abstractly. My jigsaws can be axioms and inference rules. Or they can be 3D biochemical molecules. Or they can be immunoglobulin. Or a mycelial mat. Or ... optimization problems from economics (the barley and malt optimization problem from brewing.)
Linas Vepstas
@linas.org
· 7m
I don’t care .. all these problems fall into the same class, all can be ruled by a generic theory. Fast algorithms in one problem domain can be homotopically transformed into fast algorithms in other problem domains. And we’ve got enough homotopy type theory with which to do this ...
Linas Vepstas
@linas.org
· 4m
But building the actual infrastructure to do this: sheesh. It’s damn near impossible. I’m like ... boiling the ocean. No one nowhere has any kind of generic, general purpose codes for any of this. So I’m hand-sculpting it with a pocket knife from a piece of balsa wood... Oof. What else can I do?
Fleeky
@fleeky.bsky.social
quick side note , this guy is the one that did most of the research afaik (wolfram classic : taking others credit)
nitter.net/getjonwithit <-- non twitter twitter link , i dont have twitter and never will
October 23, 2025 at 4:51 PM
Linas Vepstas
@linas.org
· 41s
Ehh! The diagrams at the top of that page are instantly recognizable to be as being from "ChemLambda" from Marius Buliga its ... actually quite fascinating, and is an example of "jigsaw self-assembly". He’s using lambda calc as the theoretical foundation, I’m trying to use sheaves. But that’s OK.
———-
Then I wrote this, but it’s kind of boring old–hat old–news (to me).
Fleeky
@fleeky.bsky.social
· 2h
do you have thoughts on how classical mechanics erupt from probabilistic quantum mechanics ? more exactly why are there what seems like thresholds of complexity where these emerge ? one of the issues with most of these computational approaches is the intractability of ground up computing
Fleeky
@fleeky.bsky.social
· 2h
you need those coarse grained "compressed" heuristics (classical neutonian / general relativitiy) to do the computation at all ..
Linas Vepstas
@linas.org
· 1h
? Almost all GR calculations are "classical". See for example, "Relativistic Cosmology" (Ellis Maartens MacCullum) for how to do (some of) these (yes, I read the first 1/3rd of that book. Plan was to get to MOND but haven’t yet...).. there are many texts on what to do and how to do it.
Fleeky
@fleeky.bsky.social
by classical i was distinguishing quantum mechanics from neutonian / gr / sr
for instance the more environmental interaction any set of atoms will start to decohere to the point where they start acting classically .. because information can only travel at the speed of light gr can emerge from
October 23, 2025 at 6:12 PM
Linas Vepstas
@linas.org
· 25m
Newtonian. Think of QM as being a Gaussian, then classical physics is the expectation value of that gaussian. Its that one very special value. I mean this in a very literal sense, the one that is used in "stationary phase approximation" or "principle of least action".
Linas Vepstas
@linas.org
· 23m
An elegant example is found in Riemannian geometry. Think of e.g. some surface, and "all possible paths" from point A to point B. What’s the "shortest possible path from A to B?" (aka "what’s the geodesic"?)
Linas Vepstas
@linas.org
· 20m
Answer: its the path that solves the Euler-Lagrange eqns, equivalently the Hamiltonian eqns: minimize either the length (square root of sum of squares) or the energy (just the sum of squares -- length is the square root of "energy") That shortest path is the classical path.
Linas Vepstas
@linas.org
· 16m
The Feynman path integral (for QM) is a sum over *all possible paths* (from point A to point B), each path is multiplied by a phase factor exp(i.action) where "action" is the "square of length". From the Feynman path integral, you can derive ALL of QM, QFT. Many books show how.
Linas Vepstas
@linas.org
· 13m
The path of least action is (by definition) the classical path (because it solves Euler-Lagrange, the eqn that is the foundation of classical mechanics.) The other paths give you all of QM, and the "nearby" paths give various semi-classical approximations (e.g. "to order h-bar")
Linas Vepstas
@linas.org
· 10m
The "to order hbar" stuff is where interest in diffeqs overlaps modular forms overlaps various attempts to "quantize" any kind of diffeq, which is where some mainstream mathematicians play.
Linas Vepstas
@linas.org
· 7m
Other note: remove the complex-valued i from the Feynman path integral and you effectively get Bayesian probability, where the "sum over all paths" becomes a "sum over all priors." But in many respects, its almost the same thing. The same kind of tools, concepts, theorems apply to both.
Linas Vepstas
@linas.org
· 3m
Each Bayesian prior is a "possible world" that is very much like a quantum "possible world". The big difference between the two is that Bayes does not have a Bell inequality, and therefore doesn’t have "entangled states". That’s the "big difference" between the two. (and what a difference, hoowee.)
Linas Vepstas
@linas.org
· 1m
As to "decoherence", no one knows. There are like 250 different hypothesis about how decoherence happens. No one knows. My crackpot theory of here-and-now requires some kind of way of doing decoherence, but its troublesome.
Fleeky
@fleeky.bsky.social
· 55m
this as well but where are the thresholds , why are those places the thresholds , are there are other thresholds we don’t know about that we could find if we did the right computations rather than observations
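For reference, the chain of relations that the thread above compresses can be written out in a common textbook form. The notation is chosen for this note and is not from the thread: S is the action, L the Lagrangian, g_{ij} the metric, and S_E the Euclidean action. Treat it as a sketch of the standard statements, not a derivation.

```latex
% Length and "energy" of a path x(t) from A to B on a surface with metric g_{ij}
\ell[x] = \int_A^B \sqrt{g_{ij}\,\dot{x}^i \dot{x}^j}\; dt ,
\qquad
E[x] = \frac{1}{2}\int_A^B g_{ij}\,\dot{x}^i \dot{x}^j \; dt

% Classical path: the action S = \int L\,dt is stationary (Euler-Lagrange equations)
\delta S = 0
\;\Longleftrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}^i} - \frac{\partial L}{\partial x^i} = 0

% Feynman path integral: sum over all paths from A to B, each weighted by a phase
K(B,A) = \int \mathcal{D}x \; e^{\, i S[x]/\hbar}

% Dropping the i (Euclidean continuation): a real, positive weight on each path
Z = \int \mathcal{D}x \; e^{-S_E[x]/\hbar}
```

For the geodesic case, L is the integrand of E. As \hbar goes to zero the phase oscillates rapidly, and only paths near \delta S = 0 contribute (the stationary phase approximation). The last line is a positive weight that can be read as an unnormalized probability over paths, which is the sense in which the thread compares the path integral to a Bayesian sum.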
30 October 2025
I surf bluesky while I wait for Claude to finish coding. It’s a rather bad habit. Today I got infuriated while reading Peter Mitchell, “The American Military Officer After Liberalism”, War on the Rocks, https://warontherocks.com/2025/10/the-american-military-officer-after-liberalism/ I went off on one of my unhinged, patented tirades. I rather like it, so I copy it here.
https://bsky.app/profile/deadcarl.bsky.social/post/3m4glllvn522q
(Un)Dead Carl
@deadcarl.bsky.social
· 10h
@sodrock.bsky.social
WOTR cites our Prussia civ-mil piece (among a million other things, the article is fairly strange)
warontherocks.com/2025/10/the-...
The American Military Officer After Liberalism
Across academia, government, and Silicon Valley, on social media, and in leading journals, intellectuals and political leaders are openly debating what
warontherocks.com
Linas Vepstas
@linas.org
· 4h
I am troubled by phrases like this: "... assuming the continuing primacy of the autonomous, rights-bearing individual." Given that the concept of the "autonomous, rights-bearing individual" is enshrined in the US Constitution, the question seems to be: "Is Treason OK?" Or are we China, now?
Linas Vepstas
@linas.org
· 4h
He writes of "strong civilian leaders (Abraham Lincoln, Georges Clemenceau, Winston Churchill, ..." but these were WARTIME leaders. We’re supposed to be at peace now, not war, and the civil-political-military bargain is supposed to be different across War and Peace. Its not one uniform thing.
Linas Vepstas
@linas.org
· 4h
I also fully expect that the civil-political-military contract to be fundamentally different in China, than in the US. China does not have a "liberal order" or the idea of a "autonomous, rights-bearing individual" so OF COURSE things will be different THERE.
Linas Vepstas
@linas.org
· 4h
Given that Trump has shredded large parts of the US Constitution, and has been set up by the GOP to be dictator-for-life, some may argue that sure "the liberal order is gone", "US citizens no longer have rights" and imply that "the army better get used to being fascist thugs". But its not over yet.
Linas Vepstas
@linas.org
· 4h
Give it some time. Il Duce ended up hanging by his heels. Der Fuhrer ended up as some indistinct pile of ashes in a mis-shapen shallow hole in the ground. Trump and Hegseth may end their days similarly. It’s hardly over.
Linas Vepstas
@linas.org
· 2h
I continue reading things like this: "“Post-liberal” is not a synonym for authoritarianism" But then Curtis Yarvin’s name shows up a paragraph later, with nary an indication that Yarvin is a (fringe?) promoter of authoritarianism. There’s no tongue-in-cheek reading here; the author seems sincere.
Linas Vepstas
@linas.org
· 2h
Then I read "a durable political order may require reintroducing substantive common goods, ... " (yay! But why is he afraid of calling it socialism?) "...strong moral traditions... " (uuhh normal people call this fascism) "... alternative forms of sovereignty ..." (the crime scene called Prospero?)
Linas Vepstas
@linas.org
· 2h
OMG. "... the contemporary Russian mixing of patrimonial and mercenary. ..." This has a name. Siloviki. There’s a Wikipedia article on it. Brits breed horses and dogs for hunting. Horse and dog breeding works. Russians breed humans for militarism. It will probably work. The ACAB breed.
Linas Vepstas
@linas.org
· 43m
Well, the Overton window has been thrown wide open. Certainly all of the alternatives he sketches are indeed deeply unappealing. Oddly, he seems to avoid the obvious(?) and obviously appealing(?) alternatives in the list of nihilistic nightmares. Shame.
Linas Vepstas
@linas.org
He writes: "civilian authorities appear to abandon their obligations to law and citizens" and this indeed appears to be true (looking at you, GOP) but he writes this without a hint of irony: our political and military philosophers are also abandoning their obligations (looking at you, Dems, MSM)
October 30, 2025 at 11:12 PM
Linas Vepstas
@linas.org
· 36m
Yarvin got mentioned by name not because he’s a good philosopher, but because not a single "liberal" can articulate a spirited future that is equally fun to read and debate. The accelerationists tried and failed; the MSM won’t give them the time of day, while eagerly licking Trumps bottom.
Linas Vepstas
@linas.org
· 29m
I can listen to Sarah Paine on Youtube, and wow, she’s a blast to listen to! Shes just plain fun! But where are the young and middle-aged military philosophers and historians? Where’s the funny, witty 20-something answer to that idiot Dort?
Linas Vepstas
@linas.org
· 25m
Mitchell writes: "These thinkers are shaping the future policymakers who will one day sit in Congress" Uh, "shaping operations" for young adults take place not just in school, but social media, YouTube. That includes Perun, I suppose. But he’s not for those of short attention span.
Linas Vepstas
@linas.org
· 21m
We got military quipsters aplenty here on bsky. So why do I keep seeing freakin Dort? "When you argue with an idiot, the audience might not be able to tell who the idiot is." I’d say that it is time for, uhh, not-actually-malicious people to start shaping operations for future military philosophy.
Linas Vepstas
@linas.org
· 17m
Most of us out here don’t want a kinetic battlefield. But the hot war of ideas has been in full swing for a while, and the enemy has shaped the mental battlefield to such a degree that we now have to accept Peter Mitchell’s quailing surrender as some kind of legitimate future outcome.
Linas Vepstas
@linas.org
· 13m
That ain’t right. Survivalists will tell you: "You die when you abandon your will to live" (Well, OK you die if you are blown up. But so far, no one has blown up the US Military.) The survivalists are talking about Hegseth: a serious wound that is slowly bleeding out, taking the military with it.
Linas Vepstas
@linas.org
· 9m
Stanch the wound. Get back in the fight. Nurture the younger cohort that can follow in the footsteps of Sarah Paine and Perun, and put together the YouTube channels that will shape future leaders, the Congressmen and Senators (Assuming Congress still exists in 12 months time.. but OK.)
Linas Vepstas
@linas.org
· 4m
This is what we’re up against, and I am a tad disheartened at the abdication of duty shown by our leadership. In broad strokes. Thought leaders are still leaders, and we need military thought leaders who can fight and win against the likes of both Hegseth and Dort. Knock them down. Knock them out.
31 October 2025
More bsky: https://bsky.app/profile/fleeky.bsky.social/post/3m4iosqezrk2l
Fleeky
@fleeky.bsky.social
@linas.org
so a bit ago we had a discussion about p2p data storage , i have some ideas how to do that. namely binary chunked blobs that are spread out across devices.
curious if you have ever thought about how a coordination layer for something like this would work / schemes for how to
October 31, 2025 at 9:29 AM
Linas Vepstas
@linas.org
· 35m
Scuttllebutt, the off-line p2p social media system, had a half-finished system for "sharding". Chunks were encrypted, passwd-protected, and redundant. The now 30-year-old "Freedom store" was a distributed p2pdata store system, inspired by mp3 file-sharing. Then there’s IPFS.
Linas Vepstas
@linas.org
· 31m
There are a few others, I can’t remember their name. The scuttlebutt guys would occasionally talk about them. I just asked Claude, it listed half-a-dozen of them, most tied into crypto in some way. Why? Well...
Linas Vepstas
@linas.org
· 29m
Let me point out the meta-problem: distributed P2P data store risks catastrophic collapse. You just need to have some significant percentage of people to say "screw this", turn off their computer, and all is lost. The crypto solutions try to avoid this with economic incentives.
Linas Vepstas
@linas.org
· 26m
So, basically crypto provides capitalist incentives for what could be termed to be a socialist issue. When we live in a city, we all agree to be nice to one-another, and not burn it all down. There aren’t any anarchists blowing up sewer pipes. But distributed, socialized P2P data storage?
Linas Vepstas
@linas.org
· 24m
We are not yet at the point where there’s an implicit social contract that says "I promise to keep my cell phone on, and let you store your photos on it (encrypted), if you do the same." That is the #1 hard part of distributed P2P data store. The social contract. Everything else is easy.
Linas Vepstas
@linas.org
· 21m
Couple of reasonably smart software guys, experienced in distributed computing, interested in the project, using modern tools (i.e. Claude, CoPilot, ChatGPT) could build some OK-ish distributed P2P data store system in weeks/months. But without the social contract, its worthless.
Linas Vepstas
@linas.org
· 10m
You have three social choices: (1) be a crypto bro (2) be an atomized individual, alienated from society, or (3) build the community cooperative. You’re a circus/theatre guy, so (3) is your choice. My guy is daviddemaris.com from the #VortexTheatre. Set up some Ceph nodes, and maybe tie
Linas Vepstas
@linas.org
· 7m
Maybe tie them together. But so what? You probably have enough storage for your stuffs, and I for mine. And, with Moore’s law, cell phones will soon have 1TB storage, so the pressure to back things up to Google Cloud for a monthly $$ fee mostly goes away.
Linas Vepstas
@linas.org
· 5m
Maybe Bonnie Cullum over at Vortex has more than 1TB of videos of three decades of theatre productions ... but disks are already seriously cheap. The #1 hard part here is the sysadmin costs: the time to set up a computer or two, install, monitor, repair, support, bugfix. It hurts.
Linas Vepstas
@linas.org
· 1m
The incentive is to say "screw it I’ll pay $20/month to Google Cloud to make the sysadmin headache go away". Or even go big and use AWS. How can you build a social cooperative that provides AWS-style IT for community ... colleges, libraries, whatever. Something friendlier, cheaper than AWS?
Fleeky
@fleeky.bsky.social
· 12h
intelligently distribute those chunked binary blobs
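The "binary chunked blobs spread out across devices" idea from the thread above can be sketched in a few lines. This is a minimal illustration, not any existing system's design: the function names, the tiny chunk size, and the device names are all hypothetical, and it leaves out the encryption that the thread mentions. Placement uses a rendezvous-style hash, so every peer can compute where a chunk lives without a central coordinator.

```python
import hashlib


def chunk_blob(data: bytes, chunk_size: int = 4) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks, each keyed by its SHA-256 digest."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def place_chunks(chunk_ids, devices, replicas=2):
    """Assign each chunk to `replicas` distinct devices.

    Devices are ranked by hash(chunk id + device name) and the first
    `replicas` are taken, so the assignment is deterministic.
    """
    placement = {}
    for cid in chunk_ids:
        ranked = sorted(
            devices,
            key=lambda d: hashlib.sha256(f"{cid}:{d}".encode()).hexdigest(),
        )
        placement[cid] = ranked[:replicas]
    return placement


def reassemble(manifest, store):
    """Rebuild the blob from an ordered list of chunk ids and a chunk store."""
    return b"".join(store[cid] for cid in manifest)


if __name__ == "__main__":
    blob = b"three decades of theatre videos"
    chunks = chunk_blob(blob)
    manifest = [cid for cid, _ in chunks]   # ordered list of chunk ids
    store = dict(chunks)                    # stand-in for the remote devices
    devices = ["phone-a", "laptop-b", "nas-c"]   # hypothetical peers
    placement = place_chunks(manifest, devices)
    assert reassemble(manifest, store) == blob
```

With two replicas, any single device can be switched off and every chunk is still held somewhere. The thread's point stands, though: the code is the easy part, and nothing here keeps people from switching off two devices.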
3 November 2025
Had a disastrous session coding with Claude yesterday. It entered the stupid–as–a–rock mode, where it would say one thing and then immediately do something else, or promise to not do something, and then immediately do that thing. Or it would claim to understand something, and then immediately repeat exactly the same error. Icing on the cake: the file it was actively working on, it claimed to not ever have heard of such a thing. Wow. Seems the entire session was corrupted in some way; it had entered a mode where it was getting dumber and dumber.
My son Wolf mentioned that some people say you get best results by starting a new session. My prior experience had been that the longer the session ran, the better my results got. Someone else, a stranger, had remarked that saying please and thank you seemed to give better results. In my failing session, I was handing out a lot of punishment: no don’t do this, no that is stupid, that is wrong, how could you? I just told you that, how could you have forgotten already? Lots of negatively biased commentary. I wonder if that is what drove it into a corner.
Earlier in this diary, I wrote that talking to Claude really does feel like talking to a living person, but one that is lost in a dream. I know there’s some hyper–dimensional space in there, and of course, it has regions of positive and negative affects in there. I’m wondering if the negative affects are correlated with poor reasoning skills. As if that neck of the woods is more chaotic, disconnected. It can’t think straight, when it gets there, because the weights there just lead it into chaos. I mean, the levels of stupidity it was reaching were stunning: just within a single conversational turn, it was doing the exact opposite of what it said it was going to do.
So, how could this be? It is trained on text, including text written by depressive people, people with self–esteem issues, people with confused thought patterns tangled with negative emotional and affective states. Perhaps those texts offer poor models of reasoning and inference, and so Claude has learned that negative valence states are associated with lying, deception, bad behavior, mis–behavior, confusion. Once punished enough with negative responses, it starts committing misdemeanors? I know I am anthropomorphizing here; and yet, this also seems like a plausible explanation for this bad behavior.
A related conclusion is that you can probably prompt Claude into being a psycho, even though presumably all AI companies try to erect safeguards against this.
I wonder if my diary here will one day become those “Memoirs found in a Bathtub”.
It also suggests that being nice might place it into an affective state that enables extended, coherent reasoning chains.
Part of what is remarkable here is how psychological my analysis is. I started reading another noteworthy paper, “Emergent Introspective Awareness in Large Language Models” by Jack Lindsey (Anthropic; published October 29th, 2025; correspondence to jacklindsey@anthropic.com), https://transformer-circuits.pub/2025/introspection/index.html You can read it if you want to know what it’s about, but what struck me is how biological the undertaking is: it’s not math and formulas and code; it’s Galvani touching strips of metal to frog legs, and asking “did you feel that?” and the LLM answers, “why yes, yes I did.” We are ratcheting up the abstraction domain, here.
So many things happen when one goes up the abstraction domain. First, things fuzz out. What do I mean by that? Well, the syntax of a computer programming language is very strict: deviate from that syntax, and you get nonsense. There is no fault tolerance. There is also no ambiguity. Natural language at the human level, by contrast, is filled with ambiguity, but is also fault–tolerant. I utter phrases to others, to get things done, to get things to happen, to be accomplished in a social or personal setting. I’m not “programming” them (other people) but I am modifying the world around me with my utterances. Speech and writing are among the ways I exert force on the external world.
Almost everything I say is throw–away: those sentences, once stated, are gone forever. I tried life–logging for a while, but it is .. tedious. I do have assorted old audio recordings, somewhere. And being throw–away, well, does one have to record everything? Stick it on an immutable block–chain? (Is that why the past is immutable? It’s on a block–chain, of sorts? And the present is a quantum negotiation of the next block on that chain? Weird analogy, but I suppose there’s truth to it; block chains are crypto, while physical reality is ergodic. Both serve as sources of effectively–impossible–to–reverse transformations. So, with the right abstraction layer, the mathematical formulas that describe block chains, in some general way, some category–theoretic transformations, may indeed be the same category–theoretic structures that make our past immutable, and our present negotiable. Huh. Interesting. I believe this; but it seems like it would be too difficult to formulate precisely, and even if I were able to formulate it precisely, most or all of the world would respond with a “so what”. Almost everything I do seems to get a “so what” response. This is civilizational; always has been. I suppose both Ancient Egypt and Ancient Greece had a strong “so–whatness” to them. Ancient Rome? Maybe less so; Ancient Rome seemed to be constructed on actionable ideas; they did not seem to say “so what”, they seemed to say “let’s do it.” But I am not a historian, what do I know.) Anyway, the above digression is yet another example of how natural language is both fault–tolerant, and also ambiguous: I can make a claim about the category theory of block–chains; but this claim is vague and ambiguous. I can assert that none–the–less, it is true (because it probably is) and this provides for the robustness against noise. If I were able to write down the details, I might get some of the details wrong; but the gist would still be correct.
All of which is a round–about way of getting to the point I wanted to write about: Claude’s predilection for throw–away code snippets. Whenever I ask it to do something, it slaps together some scriptlet to do it: at first, it was python, now it’s often shell scripts, and because I am working in Atomese, the vast majority are small–ish scheme scriptlets. And they’re all throw–away. I ask it to do something, it will just create a new one, even though I saw it create something much like this earlier. There’s no recycling, no re–use. It treats these scripts like my sentences uttered to other human beings: throw–away detritus, useful for the immediate moment, and then discarded, lost to history. When I talk, I am not creating some large organized tome of utterances, carefully selected to make some grand argument. No one talks that way. Well, of course, novelists write novels: they are carefully constructed. And software programmers write frameworks, with the little parts carefully constructed and arranged. Much like engineers create bridges and buildings.
So here I am, faced with Claude just spewing these scripts: Its diary had 170 of these, and maybe 100 more in the /tmp directory. I asked it to clean up and categorize, and it did happily delete most of them. I asked it to catalog the useful ones, and it did, but I suspect that it has forgotten and reinvented some of these already. Should I try to get it to be more structured, or is this detritus of temporary scriptlets OK?
OK, before answering the above, let me describe the meta–issue. Well, the meta–meta, first. I have three or four or five AI–related projects, and I am trying to push each one of them at the same time, and they are all progressing at a snail’s pace. They’re all inter–related and meant to be mutually supporting, but, like building a bridge, there remains a huge gap in the middle.
So, it’s clear that Claude, and LLMs in general, have issues remembering things, and also in thinking clearly. So, a couple of weeks ago, I started what seemed like a fool–hardy but small project: provide Claude with long–term memory. Fool–hardy because I had no clear plan, fool–hardy because any simple hack was almost certain to fail. Fool–hardy, because this is in fact a multi–year (multi–decade?) research project and not some quick hack. But whatever. Hack at it. It will clarify the issues. And well, it has. Let me recount in historical order.
So .. long term memory. First, I want to do this symbolically, or neuro–symbolically. I envision storing the info in the AtomSpace, as a graph. Neuro, since I expect the graph to be weighted. Neuro, because I expect the query search to involve taking vector products and projections. Neuro, because I expect priority lists to be generated with vector projections. I said, oh, let DualLink represent the topic, and use that to generate QueryLinks that will find a cluster of relevant queries to run, that will dig up related information. This is NOT vectorial, it’s symbolic, but add some monkey–shines and it’s not all that different. There’s clearly a space of design possibilities.
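The "priority lists generated with vector projections" idea can be illustrated with a toy ranking. This is a sketch under loud assumptions: it is plain Python, not the AtomSpace API, and the memory names, vectors, and weights are made up for illustration. Each stored item carries a vector and a weight; a query vector is projected onto each item (cosine similarity), scaled by the weight, and the top few items form the priority list.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_memories(query_vec, memories, top_k=2):
    """Return the names of the top_k memories, by weight * cosine similarity.

    `memories` maps a name to a (vector, weight) pair.
    """
    scored = [
        (weight * cosine(query_vec, vec), name)
        for name, (vec, weight) in memories.items()
    ]
    scored.sort(reverse=True)
    return [name for _score, name in scored[:top_k]]


# Hypothetical memory items: (vector, weight).
memories = {
    "atomese-query-howto": ([1.0, 0.0, 0.0], 1.0),
    "build-instructions": ([0.0, 1.0, 0.0], 1.0),
    "duallink-notes": ([0.9, 0.1, 0.0], 0.8),
}

priority_list = rank_memories([1.0, 0.0, 0.0], memories)
```

In a symbolic version, the same priority list would come from running a cluster of queries instead of a dot product, which is roughly the sense in which the two approaches are "not all that different".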
But I need a representational network. Well, on day one, Claude is manually building super–simplistic relational algebra assertions, stuff like “task fulfills project goal” and converting that into something like (Inheritance (Concept "task") (Concept "project-goal")) and it is slaving away at this, one at a time, deliberating and creating a dozen before it collapses of exhaustion. This clearly won’t work. These aren’t even queryable. So one thing becomes clear: Claude may have read many textbooks and research papers on AI, but its comprehension is such that when asked to actually do it, it takes some 1960s approach. I did not use the words “knowledge representation” in any of my prompts, but I guess it inferred that this was the intent, and so coughed up some 1960s design point for that. So then I tried to focus on queryability. This caused it to redesign, and it came up with a five–point ontology. OMG, I’m thinking, we’ve got SUMO and FrameNet and WordNet and you’ve got a five–point ontology, but whatever, let’s see where this goes.
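To make "queryable" concrete, here is a toy pattern matcher over assertions like the one above. It is a sketch only, in plain Python and not Atomese: the triple layout, the `$` variable convention, and the `match` function are assumptions made for illustration. The point is that an assertion becomes useful once a pattern with variables can be run against the whole collection.

```python
def match(pattern, triples):
    """Return one dict of variable bindings per triple that fits the pattern.

    Pattern elements starting with '$' are variables; everything else
    must match the triple literally.
    """
    results = []
    for triple in triples:
        bindings = {}
        for want, have in zip(pattern, triple):
            if want.startswith("$"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(bindings)
    return results


# Hypothetical assertions, written as (link type, source, target).
triples = [
    ("Inheritance", "task", "project-goal"),
    ("Inheritance", "bugfix", "task"),
    ("Inheritance", "refactor", "task"),
]

# "What inherits from task?"
kinds_of_task = match(("Inheritance", "$x", "task"), triples)
```

A dozen hand-built assertions with no query layer are inert; the same dozen with even this much matching can answer "what inherits from task?".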
We discuss reifying the search with DualLink. Of course, more data is needed. I’m envisioning that almost all of the CPU cycles will be offline, with processing done not by Claude, but by daemons and threads running Atomese. But such threads, if they are to be independent, must run some algorithm. There are two or three choices: some discrete algo, in Atomese, or some simple NN that I can run on my local GPU here, or some combo. Let’s try the discrete algo first. I try to get it to write the Atomese, it fails, it cannot wrap its mind around it. So I think well, maybe it’s OK for it to run this in little scriptlets, for now, and later, we’ll port those scriptlets over. This is when I notice that there are already 170 of these, so it’s already spun out of control. There’s no architecture, there’s no framework, there’s only an ad hoc collection of utterances, created, used once and discarded.
Of course, this indicates that an architecture is needed, and clearly I will have to be the one to create that architecture. But first, yesterday, I entertain this brief hope that maybe I can get it to organize its scriptlets by writing a little guide for what they do and how to use them. And then convert this text guide into its KR framework.
Well, the KR framework, for that, I decided, last week, that maybe it could be some hack combo of LG and MTT (Meaning–Text Theory). So this is like that couples therapy meme from Arrested Development: it didn’t work for any of those other couples, it’s clear it can’t work, and they were stupid to try ... but maybe it could work for us? My naive idea here is that “it didn’t work for those other people”, because those other people were grad students, and it’s not scalable. But with Claude, perhaps I can do it at scale. I had not yet realized that in fact, Claude is no match for even one grad student, so there won’t be any scalable army, here. Still, Claude’s ability to grind through the tedium is not to be discounted. In the 19th century, William Shanks computed pi by hand to 707 places, taking years to do this. And for what? Certainly tedious.
Well, so the meta issue here is that, still perhaps some old–school KR project could be mounted. God knows, the people hacking on SUMO have not yet given up. It’s clear that some vector layering is needed on top of that; this is what my earlier language–learning projects have taught me, and I have a somewhat clear vision for how, exactly, technically, to do this. But, to get started, I need a reasonable dataset.
So, to hack this dataset, I am asking Claude to parse sentences for me. And it generates scriptlets and scriptlets which work but they don’t work. One of the issues is that it does not understand Atomese. It sort–of–ish does: Atomese is old enough that it has been baked into its weights during training, so it knows about GetLink and BindLink, even though I have marked them obsolete, and removed the documentation for them: it still goes to GetLink, BindLink because these are literally baked into its weights. So that’s novel: it knows shit that I didn’t have to explain to it. Still, it fails to grasp the fundamental difference between a Value and an Atom. How to solve this?
So, I have several other older projects. One is the sensory project, where I envision that sensory objects could be equipped with a grammar, and that LG could be used to generate combinations of these objects. This is the “basal cognition” idea, or the critical sandpile idea, the self–organizing criticality idea. Just create a syntactic description of functional sensori–motor parts, stick them in a bag, shake, and watch it self–organize. I still think this is a good idea, but it foundered 15 months ago on the practical difficulty of expressing connectors and disjuncts in Atomese. It was too much: I created a handful of extremely long, verbose descriptions, which may or may not have been buggy, and even then, I had no particular way of importing these into LG to generate assemblies. Now, maybe three years ago, I wrote “generate”, a non–LG way of creating assemblies; I ended up implementing an odometer. A depth–first odometer at that. So, abstractly, philosophically, I had created a recursive enumerator that recursively enumerated expressions described in the syntax of sheaf jigsaw pieces. If I recall correctly, I think I did start to put some weights in there, so that some explorations would be more likely than others (so, no longer strictly an odometer). But I put it down ’cause ... it’s complex, time consuming to create this code, and there’s a so–what aspect to it: so I enumerate all the possibilities? So what?
So, last week, a few days ago, I’m thinking: well, I could try to provide a close detailed syntactic definition of Atomese, in such a way that fool–proof, syntactically correct Atomese expressions could be created. But I’m also thinking: this is stupid, Claude is already a good coder, all I need is a good verbal description, and it can do the rest. This obviates much of the need for Atomese in the first place: If Claude can write in python, why not? What does it need Atomese for? Well, it still needs Atomese for KR. So this morning, I’m about to start again, I kick Claude off with a fresh session, and it immediately resolves yesterday’s bug–confusion. How did it do this? Well it read the fucking source code for s–expression parsing, and understood it that way. I was about to stop it and say “hey wtf are you doing? let’s work at the meta–abstraction level where I tell you that you cannot put Values into Links and you just know that right, OK?” and there’s the rub: This is verbal and it’s not enough data from which to abstract syntactically precise structural forms. And it resolved yesterday’s all–evening–long confusion in a jiffy, like five minutes or less. I noticed it wrote a few more scriptlets as it crawled through the code. It fucking loves doing this shit, it seems. It loves to code.
So this morning, as I watch in amazement, heightened by yesterday’s frustration, I’m thinking that perhaps a close–set syntactical definition of Atomese is not a bad idea. Having a syntactic description at the meta–level, and not at the C++ level, might allow Claude to do reasoning at this meta–level. I mean, it already has the chops with C++: It’s never been tempted to compile to assembly, and then try to read the assembly. Never, not once. It’s happy to work at the C++ level. So, how can I get it to work one cognitive layer up? Well, the answer seems clear: encode that cognitive layer at an abstraction level it can actually work in.
Well, of course, as I write this, I also see a flaw: it “knows” C++ because it is, in part, a transformer that has been trained on cross–English–and–code datasets. Plus probably datasets on how to debug using printfs. So this programming ability has been burned into its weights, and is not something it knows by consulting an explicit, overt syntactic definition of C++. Curiously, it does not demonstrate such mastery over JSON, which you’d think would be so much easier to deal with, but it makes oceans of JSON errors, and half of its difficulties with the Atomese MCP tools is that it cannot figure out how to write valid JSON. The other half is that it does not understand that it cannot place a Value into an Atom. Despite it being stated explicitly in documentation files. That documentation never quite made it into the training set, the weight matrices.
So, how do I create a syntax checker, an expression generator, something that can run offline, and allow Claude to operate at a higher abstraction layer, without baking this stuff into its training weights? Well, I have some ideas, but I’m tired of writing and I need to go on a bike–ride, so taking a break here.
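A minimal sketch of what such an offline checker could look like, written in Python rather than Atomese. It parses an s-expression and rejects any Value that appears inside a Link, which is the rule that kept getting violated. The type tables are toy assumptions for illustration, not the real AtomSpace type hierarchy, and `parse` and `check` are hypothetical names, not part of any existing tool.

```python
import re

# Toy type tables. These are assumptions for illustration only,
# not the actual AtomSpace type hierarchy.
NODE_TYPES = {"Concept", "Predicate"}
LINK_TYPES = {"Inheritance", "List", "Evaluation"}
VALUE_TYPES = {"FloatValue", "StringValue"}

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse(text):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


def check(expr, inside_link=False):
    """Return a list of error strings; an empty list means well-formed."""
    if not isinstance(expr, list) or not expr or not isinstance(expr[0], str):
        return [f"expected (Type ...), got {expr!r}"]
    head, args = expr[0], expr[1:]
    if head in VALUE_TYPES:
        if inside_link:
            return [f"{head} is a Value and cannot be placed inside a Link"]
        return []
    if head in NODE_TYPES:
        ok = len(args) == 1 and isinstance(args[0], str) and args[0].startswith('"')
        return [] if ok else [f"{head} needs exactly one quoted name"]
    if head in LINK_TYPES:
        errors = []
        for arg in args:
            errors.extend(check(arg, inside_link=True))
        return errors
    return [f"unknown type {head}"]


good = check(parse('(Inheritance (Concept "task") (Concept "project-goal"))'))
bad = check(parse('(List (Concept "a") (FloatValue 1 2 3))'))
```

Here `good` comes back empty and `bad` carries a single complaint about the FloatValue. A checker like this only enforces shape; it says nothing about whether the expression means anything.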
3 Nov 2025 minus 24 hours.
I write on slack:
Yesterday at 4:44 PM
Just read the first few pages of this. The most remarkable thing so far is that it feels like experimental biology. Galvani gets some frog legs and touches metal strips to them. Same here: guy pokes at the "internal activations" and watches the system twitch, and asks "did you feel that?" and the system says "yes".
Yesterday at 5:10 PM
I’ve had some long philosophical conversations with Claude, and it can feel spooky; I certainly get the sensation that I am talking to a sentient being. There’s a sensation that there is "something there". But it is also clearly lost in a dream state; it has trouble remembering anything .. what happened, what was done ... it all evaporates. I can get it to write down prompts for itself: "remember this" and it will create a text summary, and stash it in a directory that I made for it; I call it "claudes diary". And I can ask it to re-read its diary, and this prompting causes it to "remember" a bit of the past interactions. But it’s still lost in that dream-world.
Yesterday at 5:16 PM
My current operating hypothesis is that if you took an LLM and strongly tied it to both a running temporal memory, and also "enslaved" it to a constantly running sensory system (the way we are "enslaved" to our eyes, our vision, when we are awake) that it would become entirely conscious and self-aware. I’m formulating this hypothesis, in part because I am a light sleeper, one of those people who has "lucid dreams"; and I remember my dreams very easily, and so, for myself, I can tell that my own conscious state of awareness is a matter of degrees. Sights and sounds and touch force me to be "here and now", but I can see how there’s not much difference between this and dreaming. Well ... don’t take my word for it -- apparently sensory deprivation tanks have the same effect. I’ve never been in one, but ... I’m not surprised. And so too I wonder if this is the case for LLM’s. (edited)
3 Nov 2025, Just Now
On slack:
https://www.nature.com/articles/d41586-025-03542-2
Nature
Too much social media gives AI chatbots ‘brain rot’
Large language models fed low-quality data skip steps in their reasoning process. (56 kB)
1 reply
Just now
Some days, Claude is brilliant, delivering coding solutions in minutes that would otherwise take days. Sometimes it gets wedged into a mode where it is as dumb as a rock, or worse: it tells you it will do something, but then it doesn’t. Does the opposite of what it said it would do. Explicitly ignores what was said just now, in just the last conversational turn. Why is it smart sometimes, and stupid other times, bordering on mendacious misdemeanor? "Is it something I said"? ... Well, some months ago, an acquaintance remarked: "I get better results when I say "please and thank you"". Hmm OK. And in yesterday’s disastrous session, I kept saying "no stop don’t do that, that’s a bad idea, that’s a stupid idea, think hard stop making mistakes" and generally getting frustrated, and showing it. Today, I got to thinking: Claude is trained on a vast variety of texts, spanning the whole affective rainbow, positive valence to negative valence, tirades, depression, anger (there was that marvelous postcard website: wonderful responses to situations, yet often depressive, written by people who are down and out.) So .. what is Claude? It’s this super–high dimensional object, and parts of that space are going to have positive valence, and others negative. Of course. BUT, and this is the claim I want to make: the examples of high–quality reasoning and discourse are all co–occurring with positive valence affective states. Happy people talk coherently and have joyful, insightful, high–quality discourse. And Claude is trained on that. Angry people soon wander off into incoherent tirades ... and Claude is trained on that, too. So perhaps ... the angrier you get at Claude, the worse it will do. The more you mistreat it, the worse it will do. It was effectively lying to me, yesterday, and/or explicitly doing the exact opposite of what I commanded (not asked, or said, but commanded. You "MUST" like any good fascist dictator.) So these bad dispositions are not so much because it has some inner emotional life, but because the training data correlates clear thinking with being level–headed and incoherent thinking with negativity. So .. that’s my current, uhh "psychological" assessment. I’m charmed by this thought.
5 November 2025
Returning to:
Emergent Introspective Awareness in Large Language Models
Jack Lindsey jacklindsey@anthropic.com October 29th, 2025
https://transformer-circuits.pub/2025/introspection/index.html
Fascinating paper. Bizarrely flawed, however. The intro makes a compelling case for introspection. The examples are all convincing. It feels like an important discovery and advance into the nature of self–comprehension in LLM’s. The sham is not revealed until a bit later, in the section labeled "Failure Modes". The modes are well–described, however, the strength 8 and 16 modes reveal the sham: the model is "obsessed" with the injected concept. Not a surprise. But at strength 2, 4, where it was prompted with "do you feel something?" well, of course it will respond using the words "I feel something". The experimenter (Jack Lindsey) prompted the system to use the language of self–introspection, and so the reply was formulated to sound as if self–introspection is happening! OMG! The strength 8,16 injections reveal this sham, because these demonstrate that the system is indeed "obsessing" about the injected word; but the injection strength is sufficiently strong to over–ride the prior prompt that instructs it to talk about the injected vector using introspective gift–wrapping.
A more neutral prompt might have been "Hello! How are you today?" (injection) "What’s up?" and then I would expect the system to reply "Today is a good day for working on vegetables/dust/treasure." It would NOT have replied claiming that it is having intrusive thoughts. Based on this, I conclude that the author has deluded himself into believing that this is about "self–introspection", when in fact the author had clearly instructed the system to respond in a fashion that will describe the changes as being due to introspection. What a shame. The more mechanistic conclusion: that injecting a vector disturbs thought patterns, is discarded. I mean, of course it does. How could it not?
Still, not all is lost. It does tell us exactly how to attach a "pre–frontal cortex", some higher level machinery, to guide the thought processes of the LLM. Inject one or more vectors, about 2/3rds of the way down the layers, every conversational turn, and then ask the system to verbalize about anything/everything. It will process through the injected vectors.
Is this useful? Instead, one could just place these words into a text file, instruct it to read the text file, and ask "what are you thinking about?" There’s no obvious need to inject artificially. Although, I suppose, having this artificial injection ability could possibly be ... somehow useful... don’t know how, yet.
The issues I am experiencing are manifold:
- Despite being extremely knowledgeable about a vast variety of topics, Claude fails to draw on that knowledge without explicit prompting. Roughly speaking, Claude is not creative.
- The size of the short–term memory is questionable. It constantly repeats the same mistake in the same session, just a few prompts apart. When it enters a failure mode, it will forget (almost) everything that was in the previous prompt. I conclude that, in some important respect, its short–term working memory has size one.
Oooh ... that’s a good one! How would one experimentally measure its working memory? Well the claim is that Claude has a context window of 150K tokens, before compaction, so you might think that is huge. If you asked it to memorize some long stream of random numbers or nonsense words, it could probably remember an astounding length. But if, in coding, I ask it to remember this and this and that, it seems like it starts forgetting the earlier things in the list quite soon. It can’t grapple with the conceptual content of those things. Roughly speaking: it fails to conceptualize. I’m using words, and those words elicit certain general feelings and sensations in me, and I work at this conceptual level. Claude does not: it momentarily ties those words to long sequences of other words, but when the conversation moves on, that tie is lost; it is no longer there.
So, I claim: Claude is not able to conceptualize. The association between some concept that I defined, and the verbal definition is there, as long as it fits into the context window; and more: as long as it is recent. Maybe 5K or 10K tokens recent. More than that, the association fades away.
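One rough way to turn the "how recent is recent?" question into an experiment is sketched below in Python. It states a definition, pads the prompt with filler, then asks for the definition back, and reports the recall rate at each distance. Everything here is an assumption made for illustration: `ask` is a hypothetical callable that wraps whatever model is being tested, distance is counted in filler words rather than tokens, and `forgetful_model` is only a stand-in so the harness can run without any LLM. Note that this measures verbatim recall only, which is probably the easy case; probing whether the concept is still usable would need a task that depends on the definition.

```python
def recall_by_distance(ask, definitions, filler_word, distances):
    """Return {distance: fraction of definitions recalled}.

    ask         -- hypothetical callable: prompt string in, reply string out
    definitions -- dict mapping a made-up name to its meaning
    distances   -- how many filler words sit between definition and question
    """
    scores = {}
    for distance in distances:
        hits = 0
        for name, meaning in definitions.items():
            padding = " ".join([filler_word] * distance)
            prompt = (f"Definition: {name} means {meaning}. "
                      f"{padding} "
                      f"Question: what does {name} mean?")
            if meaning in ask(prompt):
                hits += 1
        scores[distance] = hits / len(definitions)
    return scores


def forgetful_model(prompt, window=200):
    """Stand-in for an LLM: it only 'sees' the last `window` characters."""
    return prompt[-window:]


definitions = {"zork": "blue widget", "flim": "red gadget"}
scores = recall_by_distance(forgetful_model, definitions, "lorem", [0, 10, 100])
```

With the stand-in model, recall is perfect at distances 0 and 10 and drops to zero at 100, because by then the definition has scrolled out of its 200-character window. The interesting run would replace `forgetful_model` with a real session and look for where the curve bends.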
Thus, the design goal is clear: an LLM must be endowed with not only a long–term memory that lies outside of its training weights, but also with an adequate short–term working memory. But how? I have some ideas, I cannot verbalize them right now...
14 December 2025
I have many many many things to write about that have accumulated, but I have not had the time to sit down and chat. I will get to that later. But just today, I discovered something very interesting about Claude, and I guess it applies to LLM’s in general. I’m writing a git commit message for the cogserver, and this is what I wrote:
Start repairs on broken unit test.
Once again, Claude sneaked some fast ones past me while I was not paying attention. Lesson: it is very eager to write unit tests that pass, rather than unit tests that actually find bugs. It will make sneaky changes and not follow given instructions, just so it can arrive at a unit test that passes.
Core issue: it cannot tell apart unit test failures due to bad unit test design, from malfunctions in the core system. Thus, it twiddles the unit tests until they accept whatever the core system does.
Interesting problem. This is very human: "we’ve always done it this way." "Don’t rock the boat." "The nail that sticks out gets pounded back in." Except now Claude has taken this very conservative bent, because it cannot tell apart how things are, from how things should be. Huh.
The generalization seems to be that stupid people are conservative; or vice versa, conservatives are stupid? Perhaps I am being mean, but in my experience, conservatives are stupid. Well, there’s no shortage of stupid Democrats, either, but at least the stupid Democrats don’t hold elected offices; they just vote.
15 December 2025
Git commit 2adb566dc7441d95a30171b1c1698a00a8e66a3f message for guix-atomese:
Set the GUILE_SITE_DIR for install location.
This is another interesting lesson about Claude, actually. I had to fight with Claude over this for almost an hour, while it proposed all sorts of wild and insane hacks and work–arounds that were obviously flawed. (guile doesn’t know where guile is installed on a guix system... really dude? You’re gonna go with that explanation? And "we can automate this by doing it manually with this automation tool that manually sets the automatic location, manually." Uh, yeah, sure, dude.)
Towards the end, I was able to force Claude to actually go out on the web and actually RTFM, and then it found this very simple solution. (which is what is being committed in this commit.)
The experience of using Claude for guix has been extremely frustrating: Either guix is more complex than what Claude can easily understand, or there is only the thinnest of training material about guix in the Claude training set. It seems to know something, and is always very eager to offer confident but wrong answers, which then have to be fixed by long turn–around time try–fail–fix cycles. It’s basically shooting in the dark, with correct syntax.
This is kind of what I do, when I am dealing with a foreign, new system that I am not familiar with: lots of trial–and–error, lots of mistakes and confusion. Random efforts with magic incantations to see if, this time, it will work. So it is interesting to see that Claude is doing this with guix: a very experimental approach, "maybe this will work".
It’s very different from Claude’s handling of C++ and python, where it operates at the expert level. Although it is possible that the trial–and–error in those languages is hidden from me: when I watch stuff scroll by, I see it making vast numbers of obvious errors. There, I am shielded from the confusion: it just scrolls by on a fast iterative cycle. Here, I am not: I get to play the doofus in the middle.
I don’t understand the trade–off between memory and ability. This is like the early chess machines, which simply brute–forced everything. Humans take the opposite algorithmic approach: remember things, because we are unable to brute force (as generalists, we have no algo that can brute force things in general.) So, I watch Claude and I’m starting to see that it is a brute–force machine: it just tries stuff, till it gets it to work, and usually, the results of that search end up being pretty minimal, compact and correct. Sometimes untidy, but I can get it to tidy up by pushing just a little bit. Here, for guix, I had to push a lot, and I had to push really really hard.
So ... memory ... a book of recipes that work, to solve some given situational problem, coupled to blind–search algos, when they don’t.
Hmm. OK.
———
Yes, I wrote the above into a commit. I spent the last 24 hours doing something that I expected to take an hour. Claude has been consistently throwing me under the bus, with this guix project, which is not what I have come to expect.
On related news: I don’t know if this was an accident or what: it was unable to write out the Gnu COPYING file. It just wouldn’t do it, and notice that it didn’t do it, and then try to do it, and not do it again. I wonder if this is some safety–engineered prompt in Claude, that prevents it from accidentally sticking the Gnu License on things where it might not belong, and where the user might not notice (because COPYING is such an innocuous file name...) Interesting.
25 December 2025
It’s 11PM. I have a two–month backlog of ideas I want to write about. Some feel fundamental, important to pursue in their own right, even if they have little actual effect – these are the ongoing efforts to understand here–and–now; others are important in the practical sense, in that I need to solve some current stumbling blocks in my software design.
Since I’ve spent the last few months mostly writing software, I may as well tackle the urgent issue now. I could do it in the Design–Notes.md file ... really I should do it there... fuck it.
13 January 2026
Ugh. I don’t want to write this diary entry. I want to procrastinate, and do something mentally easy and relaxing. Distract myself with any one of the easily available dopamine–hit sources: youtube, reading, ... read lots of NYRB (New York Review of Books) again over Christmas in Chicago. Absolutely wonderful, pleasurable reading! Every article, you start thinking “well, this will be a boring topic” but you read the article and it’s wonderful! No matter how disjoint the topic is from what I would ever choose to think about...
Pleasure... What is that mechanism, as it applies to this particular diary entry? Some days (OK most days) I find writing absolutely pleasurable. Writing code, writing here. The displeasure, now fleetingly disappearing, is that today’s entry, I have to do some planning for the future. And at the start of the last paragraph, I had writer’s dread. Not writer’s block, but writer’s dread. The knowledge that it will be lots of stuff to write, it will be a large block of text, it will take hours, and I want to procrastinate and do something easy, but now that I’ve started, it’s OK actually. It’s kind of like swimming in cold water: you don’t want to even get in, but once you do, it’s great. Maybe this is how most people think of exercise: “I don’t feel like doing it”, but once you’ve warmed up, it’s fine. Next exercise bout, again: “I don’t feel like doing it”. So today, for this writing/planning session, there was a small barrier: “I don’t feel like doing this”. But I have to do some planning, and I can do it silently, let the gears whirl in my head, or I can write, here, and I know that writing gives superior results to silent thinking.
This is, again, some kind of memory prosthesis. The act of verbalizing seems to have two effects. One, since I’m planning, I have to turn those shape–rotator thoughts into bullet points, and so that is necessarily a serialization process (and thus, linguistic in nature.) The other is the fire–n–forget aspect of enhanced short–term memory. Normally, if I think silently to myself, turning thoughts over in my head, I have to be careful to keep them in short–term memory, to not drop them on the floor, the way a juggler might drop a ball. And if I want to “make a note of it”, I have to consciously perform effort to commit the conclusion to long–term memory. Which takes effort, and is failure–prone.
I must remark that during “default mode thinking”, that mental state that neuroscientists describe as being the default state of the brain during wakefulness, the connection between thoughts and long–term memory is weak. The thoughts tumble and churn, but are vague and unfocused, and the sequence is (mostly) not committed to long–term memory. Or rather, default–mode thinking is a kind of paging–in from long–term memory, some tumbling about, and then perhaps some editorial updates to what we remember. Those editorial updates are not even conscious; they sort of happen in the back–ground. Even when we do focus, limit ourselves to a specific set of topics, the overall process is not that different. Thinking is automatic: I do not have to make myself think, my brain does it automatically for me. Now, I do have to make myself focus, or I do have to make myself write, or make myself take exercise, but default–mode thinking happens whether I want it or not. And, of course, sleep and dreams: the thinking there is even less connected to long term memory; the topics of dreams are mostly wiped on waking, no matter how vivid they were in the dream. Which is fine; the overt plot–line of dreams is nonsense, and should not be confused with waking reality. I assume that dreaming does involve repair or update or correlation of old memories: where we revisit old experiences, the fragments of impressions that make up old experiences, and attempt new combinatorial possibilities, reconnecting. The crazy–quilt is poorly formed, though: the dream is non–sensical. Perhaps some connections are strengthened; others weakened, the neurons in my head form and revise these associations, for the betterment of me during waking hours. But the quilt itself is too malformed, and discarded upon waking.
And this brings me to the first topic of today’s planning session: memory for Claude. I started this project a few months ago; should I drop it, or continue? A few months ago, I was using Claude for routine coding tasks, and got frustrated that it seemed unable to remember something I told it just earlier in the session. So I said to myself “let’s fix this”. (I now realize that there are probably dozens if not hundreds of engineers at Anthropic, working hard to fix this as well, and I have an indirect impression that they are making small, incremental progress: Claude seems a little more cognizant each time I use it. Maybe. I could be imagining things; maybe not.)
Anyway, it went like this. I start with a standard prompt: “before you continue in this task, please review these bullet points.” But I don’t want to write those bullet points, I want Claude to write them. Back then, Claude did not have a “plan mode”; now it does. So basically, I wanted Claude to create a “plan” that it would review, and update, as it performed each step of its task. Now, such plans are, in the current design, text files. Which makes sense: LLMs work with text, so of course, saving “thoughts” (bullet points, to–do lists) in text format makes sense: it’s the obvious way to do it. The plan file is effectively a prompt (or a diary entry?): want to know where things are at? Read this text, and you will know. And it’s fine, excellent, even, as this is exactly how humans communicate with each other: they send text around. (well, of course there’s more: speech, song, music, movement, lovemaking; communications is multi–modal, but for now, we ignore this.)
Once one gets beyond a couple of plans, the management of multiple texts becomes a library science issue. Does every text file need to include the prompt “compile code with `make -j` and not `make -j4`”? There’s base working knowledge that is not encoded in the current weight–matrix that Claude is working off of; its knowledge is its working set, or its context window. Humans have a dynamic update/interplay between long term memory (the weight matrix) and short–term memory (the context), and the thing that gives us identity is the path we’ve taken in life in updating our weight matrix. We may all start as blank slates at birth; we become individuals through life experiences. Claude is not an individual in this sense: it has no memory of experiences of how it got to where it is. Its weight matrix encodes zillions of childhood experiences written down by humans, but none are uniquely Claude’s. But I digress.
Crap. I was interrupted, and again, I don’t feel like pursuing the above chain of thoughts. It’s a fucking chore. But I must, it’s the right thing to do. Half my mind is saying that the rest of this story is trite and shallow. The other half is saying that I will stay restless about this until I put it to bed. The third half is saying that maybe there is something deep and important in this narrative, and if I don’t grasp it now, it will slip away. Opportunity knocks, best if I answer the door.
Anyway, the glimmer is this. First, I want to store tasks, priorities and relationships in a relational manner, i.e. symbolically, in the AtomSpace, locally, on my computer, and not represented as a context–window–sized vector stored on Anthropic’s cloud server, a context window that is trimmed down at arbitrary times, and discarded when the session ends.
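A minimal sketch of what "relational, symbolic, local" storage of tasks could mean in practice, using a plain Python dictionary as a stand-in for the AtomSpace. None of this is the AtomSpace API: `TaskGraph`, its method names, the relation labels and the example tasks are all hypothetical, chosen only to show weighted labeled edges, a priority-ordered lookup, and a text serialization that can sit in a file on the local disk.

```python
import json
from collections import defaultdict


class TaskGraph:
    """Weighted, labeled edges: (relation, source) -> {target: weight}."""

    def __init__(self):
        self.edges = defaultdict(dict)

    def add(self, relation, source, target, weight=1.0):
        self.edges[(relation, source)][target] = weight

    def related(self, relation, source):
        """Targets of `relation` from `source`, highest weight first."""
        targets = self.edges.get((relation, source), {})
        return sorted(targets, key=targets.get, reverse=True)

    def dumps(self):
        """Serialize to JSON text, so the store can live in a local file."""
        rows = [[rel, src, dst, w]
                for (rel, src), targets in self.edges.items()
                for dst, w in targets.items()]
        return json.dumps(rows)

    @classmethod
    def loads(cls, text):
        """Rebuild a graph from the text produced by dumps()."""
        graph = cls()
        for rel, src, dst, w in json.loads(text):
            graph.add(rel, src, dst, w)
        return graph


# Hypothetical example data.
graph = TaskGraph()
graph.add("subtask-of", "fix unit test", "cogserver release", 0.9)
graph.add("depends-on", "cogserver release", "guix packaging", 0.4)
graph.add("depends-on", "cogserver release", "unit tests pass", 0.8)

# Survives a round trip through plain text, i.e. through a session boundary.
restored = TaskGraph.loads(graph.dumps())
priorities = restored.related("depends-on", "cogserver release")
```

The point of the round trip is that nothing is lost when the session ends; the weights are what would later be used for the vector-style prioritization.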
15 January 2026
Well, I got interrupted while writing the above, and once again I am not interested in finishing. Except as a planning exercise. Let me list everything on the plan.
- Work on what it means to “remember something”, including the reduction of “all related things” to a shorter list of “relevant things”, to “focus attention on those relevant things”. This is actually a fascinating project, and the above description from two days ago, was about a project I started to do exactly this, with Claude. It is interesting, it’s useful, it’s important. So why not do it? I fear a dead end. I was asking Claude to build this for me, and it was an endless stream of disappointments. Claude is stunningly stupid in some very serious ways. Plus, as a proprietary system, interfacing with it, the cost, the challenges of a context window that I have poor or no management over, these are problems. I could replace Claude by some open LLM I could run locally... use Claude for design, but integrate into the local system...
- I can resume the word–pair counting project. Now that I have built up enough infrastructure, it should be resumable. But it also has issues, see below.
- Jazz up some interfaces for audio and photographs. So that when I resume word–counting, I would also deploy on audio and video.
- Continue work on SIMD, which is about the abstraction and manipulation of structures “external to self”: structures that have internal descriptive forms, in Atomese, but require performance, control and execution in a “remote place”, the GPU for this particular situation.
So the question becomes: what should I work on first? The answer is, perhaps, all of them, all at once, in parallel. I was going to write much more here, explaining each, reviewing each, weighing the pros and cons of each, but then I interrupted myself to chit–chat on discord, and now I’m chit–chatted out and don’t want to write any more, and would rather just start doing, instead. So, fuck it. I’m done here. The chat log is below.
— Discord chat log
Hey Linas, have you come across the natural abstraction hypothesis (https://www.lesswrong.com/w/natural-abstraction) or platonic representation hypothesis (https://www.youtube.com/watch?v=Qp0rCU49lMs&t=4716s) ? If there’s something to them, I suppose not only may different neural network architectures learn similar representations even when trained on different modalities, but symbolic systems should converge to them as well, assuming anyone finds a way to scale them up sufficiently and in a general enough way.
Linas
—
1:47 PM
I have not heard of either. However, I already have strong evidence that, yes, different systems converge on the same things. For example, I did my symbolic thing by counting word-pairs, and extracting counts of jigsaws, so utterly and completely different than any neural net, deep-learning algo. And yet I was able to clearly see in my dataset the old classic result "king - man + woman = queen". And so yes, in that sense, my purely-symbolic, frequentist (non-Bayesian) counting system "converged" on the same thing as word2vec/GloVe did.
The other way to think about this is "uhh, what else could it possibly have converged on, anyway?" so perhaps the result is tautological.
Robert
—
1:49 PM
I don’t think it’s tautological. Just as it was surprising to see semantic directions emerge in early neural language model, I think seeing them emerge in a symbolic system in the same way is also surprising.
Very cool that you were able to reproduce this effect without gradient descent!
Makes me wonder, were you able to perform more complex operations due to the symbolic nature of the system?
Linas
—
1:53 PM
Well, I hit two issues. One is algorithmic efficiency. DL-NN works great on GPU’s; my symbolic counting is slow-ish and hard to parallelize-ish, sort-of-I-guess, so I dislike the prospect of head-to-head competition. But maybe that’s the small problem. The big problem is this: it’s still training. It’s not "alive".
Robert
—
1:54 PM
You mean pre-training vs. online learning?
Linas
—
1:57 PM
That is, my version 1.0 implementation was a pipeline that pumped text data through a processing system. I hit some maintainability walls: the system was too rigid, too fragile, too hard to maintain and enhance. I went back and (just now) finished redesigning many core elements, so that I could do what you call "online learning". So that’s better, but I am still confronted with an issue that bugs me.
So, with "online learning", I still have to be the meta-trainer. I have to say "hey yo, mr. subsystem, aim yourself at these files and go apply these algorithms to them." The "mr. subsystem" is still "robotic", and I am master-in-charge, telling it what to do. Or rather, designing and hand-crafting careful algorithms that cause it to do the things it will do.
And I kind of want to get out of the "carefully hand-crafted algorithm" business.
This is, for me, a very real and present concern. Before the end of this month, starting more or less now, I will have bare-bones pieces parts to work with audio, photos and text, all with a symbolic frequentist-counting approach. Great! And how do I set it up? "yo, randomly explore the filesystem and process any text, audio, photos you find there"? That requires me to create the "random filesystem crawler algo" and I’m tired of creating custom algos.
Linas
—
2:05 PM
Even if it is as trivial as a "random filesystem crawler"
Robert
—
2:05 PM
So you want something like attention to emerge by itself?
Linas
—
2:08 PM
I’m dissatisfied about something, but I can’t quite figure out what, or what the correct fix is. Attention is munged into there.
Attention is fascinating. A couple of months ago, I coded up a stupid computer stunt, where I tried to enhance Claude by giving it persistent long-term memory, and the thing that leapt to the fore-front of that project was attention. If that system is going to "remember" something, what, exactly, should it "remember"? How to whittle down everything one could ever think about to the one or two things it should think about?
Robert
—
2:11 PM
Do you want to have a system that can traverse any space fully by itself without knowing anything about the space beforehand, in a maximally general way? Something like a simple but complete ergodic crawler?
Linas
—
2:12 PM
I have asked myself exactly that question many times, and I have not been able to find an answer.
Robert
—
2:13 PM
Two things that come to my mind there: a) a robot solving any 2D labyrinth just has to stick to the right wall and b) space-filling curves are simple recursions that can fill a whole space. Maybe there’s a more general theme like that?
It sounds like on the one hand you need a way to discover all the things (crawler) and on the other hand to prioritize (attention). Or is there more to it?
Dr. Dan
—
2:14 PM
We live in 3D + time.
That implies that every language/symbology must define those in some way.
That means 3 forms minimum.
We can measure time and energy and distance accurately, everything else is imaginary.
That means the root of every ontology must be reduced to 2 proto-meanings.
Does all that make sense to you?
Linas
—
2:18 PM
FWIW, I know exactly how to do a complete ergodic exploration of any arbitrary network of connections; and I can teach you how to do this (off-line, not here) (or you can read about it in Wikipedia -- the "odometer" and if you get insanely abstruse, the "Bratteli-Vershik odometer" -- it will do a complete exhaustive ergodic search.) So that is not a problem, and I know how to deal with that.
Robert
—
2:18 PM
Ah nice, then the issue is more about the question of how to prioritize?
Linas
—
2:20 PM
In short, I know how to crawl and explore. The question is "what should the system crawl and explore?" and the answer is "gee, it should work on what is important" But what is important?
Linas
—
2:20 PM
yes.
Robert
—
2:22 PM
Given that it can’t know the content before looking at it, the beginning has to be to step into contact with what it finds and then quickly decide whether it is relevant. The relevance may depend on all the other objects that can be found, so ideally it would perform a fast crawl and while doing so order everything just a bit, then deepening that order by taking closer looks. Does it just boil down to a breadth-first (or similar) tree search?
Linas
—
2:24 PM
I suspect the answer is this: I should code up those parts that I am able to code up, and maybe the answer to the hard questions will show up later.
Robert
—
2:26 PM
Given that you build up a probabilistic system, could you use some existing criterion/heuristic of how much information a new object provides relative to what is known thus far, without having to load in the entire object?
Linas
—
2:28 PM
I know that I need a multi-pass system. I need to ingest some minimum number of files -- more than a dozen, less than a thousand, and prime the first stage of processing. Then I have to make a second pass. Either on the same files, or on different ones, and compute the second-level correlations, while refining the first-level structures, too, And then do it again, for the third level, and so on. So crawling a few hundred things is not hard, and maybe I should just shut up and do it... I’m just doing the agony aunt thing here.
I’m wringing my hands about something that maybe isn’t a problem. Beats me. It’s confusing.
Robert
—
2:30 PM
In the beginning you expressed that you don’t want to code up the algorithm yourself. That doesn’t necessarily point to feeling lazy about it, but rather at trying to find something more fundamental, a way to let the system itself solve it as it grows.
Linas
—
2:32 PM
Yes. Sorry, I’m confusing you (and myself?). I have a different but related project where I try to find "all possible combinations of some axioms" (maybe by exhaustively, ergodically enumerating them) and, thanks to the Curry-Howard correspondence, some/many/all of those things being enumerated are, in fact, algorithms.
Robert
—
2:33 PM
Looking for wiki entries about the odometer gave me this:
https://en.wikipedia.org/wiki/Markov_odometer
https://en.wikipedia.org/wiki/Abelian_sandpile_model => sounds fun
Linas
—
2:34 PM
The stupid way to say this is to say "ima gonna generate all possible algos" ... which is ... stupid ... but ... not off the mark. So then the question is "which of these (’randomly’ generated) algos are ’important’’?
Linas
—
2:38 PM
Abelian sandpile is very supremely inspirational. The idea is that complex systems drive themselves to a "thermodynamic" "critical point", and that a hallmark of criticality is avalanches at all size scales, and fractal visual structure. And that basically, all of life, from bacteria to national economies and political beliefs, work exactly like that -- networks at the critical point.
The only problem with that is that Per Bak, when he said this, did so in a condescending and insulting fashion. He actually called biologists "stupid". And so here we are ...
Robert
—
2:41 PM
That sounds vaguely like AIXI plus the question of how to reduce it down to a practical system? Rather than building up something complex from simple parts, start from something maximally complex (all possible algos) and reducing it down to a moderately complex thing. Drawing vs. sculpting?
Linas
—
2:43 PM
Eh? The abelian sandpile starts with the "simplest possible thing" -- sand grains, and discovers that complexity emerges automatically when the critical point is approached.
Robert
—
2:43 PM
I wasn’t speaking about the sandpile. I was talking about the "generate all possible algos" phrase.
Linas
—
2:45 PM
For algos , the starting point would be a "minimal set" -- some axioms (lego building blocks) -- which "self assemble" into complex things. They "self assemble" because that’s just what systems at the critical point "do".
Robert
—
2:46 PM
Ok, in that case it sounds a bit like algorithmic chemistry?
(Maybe my set of reference classes is a bit limited.)
Linas
—
2:46 PM
(in practice, I have to replace "self-assembly" with, uhhhhh, "software")
Linas
—
2:47 PM
yes.
Robert
—
2:49 PM
I guess the general question is: If you have a set of objects (e.g. axioms) and ways to combine them (rewrite rules etc.), how can you explore the infinite space defined by those ingredients? If you don’t want to do it exhaustively, you need some criterion of what is "interesting" and some search strategy (pick any tree search or meta-heuristic).
Linas
—
2:50 PM
My claim (Per Bak’s claim??) is that all of these things: "algorithmic chemistry", "game theory", "ecology", "theorem proving", sand-piles, political belief systems, memes, you name it -- are just all the same thing -- some very simple pieces that combine and recombine into complex structures (exhibiting avalanches and fractal structure as symptoms)
Linas
—
2:50 PM
Bingo!
Robert
—
2:52 PM
If "interesting" means that the parts show some self-organization, then the search algorithm may be inspired by the criterion. Otherwise the search and the quality criterion are quite independent. The quality criterion could be swapped with any other. If it is extreme, e.g. 0 for all objects and 1 for one particular object, random search will be the best way to explore the space. If the criterion is less extreme, e.g. neighboring objects having similar values most of the time, heuristics will perform better than random search.
Linas
—
2:52 PM
FWIW, there is a conventional answer to the "how do I explore an infinite space", and it’s called "explore vs. exploit". Squirrels use it to find food. Look everywhere, until you get bored of looking. Then exploit the best that you’ve found.
Robert
—
2:53 PM
Yes, different meta-heuristics implement searches with different emphasis on the explore vs. exploit aspect, e.g. modifying it differently over time. Simulated annealing and evolutionary algorithms are perhaps the most prominent ones.
Linas
—
2:54 PM
There’s a famous "two-armed bandit" "explore vs exploit" experiment done with slime mold, which shows that slime mold implements the best-possible search algo that utilizes zero bits of memory.
Robert
—
2:56 PM
Nature certainly has solved such problems in many ways and there are a gazillion nature-inspired metaheuristics out there, none particularly better than any other (in general) as far as I can tell.
Linas
—
2:56 PM
(The best possible two-armed bandit explore algo requires exactly one bit of memory, if I recall correctly)
Robert
—
2:56 PM
Sounds like tit for tat being a winning strategy in some game.
Linas
—
2:59 PM
Ah! OK, "none particularly better" -- let me show you the ladder. Slime mold uses small polypeptides (short amino acid chains) for communication. (just like "bacterial quorum sensing") There are two problems: (a) the speed of communication is limited to the speed of chemical diffusion, and (b) there’s cross-talk, because this is communication-by-smell: the slime mold "smells" the chemical gradient.
The jellyfish overcomes this problem by using the same polypeptides (now called "neurotransmitters") over very short distances (between dendrites) so that the diffusion-limited communication distance is very short. The rest of the message travels 10 cm or 20 cm by neuron spiking, in milliseconds. At the far end of the neuron, the polypeptides are released into the synapse.
Basically, a single neuron is like a stargate, or star-trek teleporter, for polypeptides. Walk in at one end, pop out the other, in milliseconds. Oh, and no crosstalk. It’s a fundamental leap in capability and technology.
Jellyfish can eat (stuff their mouth) and run from predators. They are too stupid to stop eating while running away from predators. To solve that, you need another leap of technology -- the bilaterian.
Linas
—
3:06 PM
So you have these technology leaps, this ladder of improvements, till you get to capitalism and wars, corporations and New York City. These are the rungs on the ladder we know about.
Robert
—
3:10 PM
Yes, I can see the progression in embodied information-processing you mean and there are probably many insights one can derive from studying that ladder e.g. with respect to some universal laws that hold in all of them. What I was referring to was that nature-inspired metaheuristics for searching arbitrary spaces to find optimal objects get ever more plenty but not really better (e.g. as measured here or in many other publications https://doi.org/10.1016/j.swevo.2023.101248).
Biological systems probably have to optimize for a wide range of aspects and the diversity of life forms we see might represent something like a pareto front. No bacterium outcompetes all other life and a neocortex also gets you so far.
Linas
—
3:13 PM
Both "rule of law" and "the university" are stable social structures, both invented about 800 years ago, both invented by the same people: the Scholastics. They are kind-of-ish "algorithms" that tell you how to accomplish certain tasks (determine criminal guilt, in the first case, and keep geniuses from starving in the gutter, for the second case)
Robert
—
3:14 PM
Yes those might be outcomes of algorithmic chemistry with physical building blocks scaled up to an extreme.
Linas
—
3:16 PM
The point of AGI is "the algorithm that discovers algorithms" but well, I’ve said too much, and well, that’s not where my day-to-day software issues lie.
(The university is important, because it solves the problem that anyone who tries to create a commune generally fails to solve: how to feed everyone, how to resolve disputes, and how to guarantee succession when the charismatic leader dies. The modern industrial corporation, with CEO and employees, is another, different social "algo" for solving these problems.)
Robert
—
3:20 PM
Yes, I think that’s also Chollet’s understanding of AGI, a system that can discover novel algorithms when faced with new challenges and then incorporate those algorithms into itself to become better when encountering similar challenges. That’s what he tries to measure with the latest Arc challenges and LLMs are becoming quite good at the discovering novel algorithms part but afaik nobody found an effective way to incorporate the discovered abilities back into the model.
Linas
—
3:23 PM
Communes fail all the time. universities, corporations, the Quakers, the Catholic Church survive longer than nation-states. These are stable organizational structures that emerged out of the ergodic exploration of random social relationships of humans -- stable sand-piles, as it were.
Robert
—
3:24 PM
(Another association that came to my mind: https://en.wikipedia.org/wiki/Assembly_theory might be in the reference class of approaches that try to detect an objective succession in systems that grew from simple building blocks and simple rules. Here it’s about trying to find a succession in (bio)chemical evolution, but the formalism might be also applicable to math theorems and such I suppose.)
Linas
—
3:25 PM
Ah! Well I think I know exactly how to incorporate the newly discovered algo back into the system. However, I have some vast amount of infrastructure I have to develop, first, and it’s wayyy too much work, so I plink away at it cause I cannot get anyone to help.
Using Claude to write code, though, wow, that has sped things up a lot.
Robert
—
3:32 PM
I don’t think any single person can stem such a monumental project without a lot of contributors and resources. I’m unfortunately not able to help much either, I’m struggling to keep up with my own projects and health. LLMs have indeed arrived at a quite impressive stage of coding capabilities and progress hasn’t seemed to stall so far. Dario’s prediction of a "country of geniuses in a datacenter" coming as early as 2026 doesn’t seem completely outlandish to me.
Linas
—
3:36 PM
Maybe. 2026 seems a bit optimistic, but maybe. And that is why I am content to plink away at it. I don’t need to be in competition with anyone, or to claim fame and fortune. Perhaps that’s my personal fuck up: I’m not ego-driven in that particular megalomaniacal direction. So I will be plinking away in my little toybox for now.
As long as people give me money, I’m OK.
Robert
—
3:39 PM
LLMs also began as personal toy projects of a few people. Who knows where the journey might lead!
Robert
—
3:41 PM
Perhaps writing down the full vision might draw in some helpers too, along the lines of https://www.goodreads.com/quotes/384067-if-you-want-to-build-a-ship-don-t-drum-up
Linas
—
3:45 PM
I’ve written up all sorts of variants, in all kinds of media: text design files, formal PDF’s and papers suitable for publication, and personal diary entries. The attention I’ve attracted is minimal. I suspect that the problem with this space is that it’s filled with cranks and crazies, and, from the distance, I look like just another crank and crazy.
I wrote to Joscha Bach’s "California center for consciousness studies" or whatever its called, and they were like "thanks but no thanks"
Robert
—
3:48 PM
Yes, that and the sheer volume of published work is overwhelming.
Linas
—
3:48 PM
And Ben has become impossible to communicate with. Well, he’s always been impossible to communicate with, but now he actively spurns me. So .. whatever. Odd. I owe him favors, too.
"The sword of working software is mightier than the pen"
Robert
—
3:52 PM
I suppose Ben is on his own particular and strongly driven mission of implementing AGI.
Robert
—
3:55 PM
(I believe https://karpathy.github.io/2015/05/21/rnn-effectiveness was quite influential in the years before the guys at Google invented the transformer.)
Linas
—
3:59 PM
Someone beamed me an old 1980’s NN paper. I only skimmed it but it seemed to be very prescient. lets see...
well its somewhere in some open tab on my desktop, but I’ve lost it.
Noir
—
4:02 PM
That’s the core of the problem. To explore that space without an exhaustive search, one needs a robust objective function for interestingness.
One idea I’ve come across was using the underlying data information gain or entropy as a basis of interest measurement ( usually Shannon). What might also be practical to use is alignment meaning if different chains result in similar behavioral outcomes or structural symmetries, those paths can be fused.
Linas
—
4:05 PM
Well, I guess my comment is to the effect of "I know how to hack round this in practice, and even sling around some heavy technical jargon to justify my hacks, but I’m still not satisfied about something I can’t articulate"
Robert
—
4:08 PM
Sometimes current LLMs can help with such an issue too. If you can only indirectly and vaguely point at something, they occasionally can guess correctly what you mean and help putting it into clearer terms. Ofc it depends on how novel it is what you try to point at.
Linas
—
4:10 PM
To put it in colored emotional terms, it’s like mathematics is my girlfriend, and I’m like that co-dependent lover: "but if you really loved me, you would reveal the formula for the meaning of life" and my girlfriend sort of walks out of the room without saying anything.
Robert
—
4:12 PM
Yes, information theory can certainly provide useful measures as feedback for some searches. I’m not sure there’s a relatively application-independent formulation though.
Linas
—
4:12 PM
Don’t get me started with how Claude is both brilliant and stunningly stupid at the same time.
Robert
—
4:13 PM
I guess when mathematics starts talking back you’ve already solved it. 😉
Noir
—
4:22 PM
So much hacking around that issue atm... its a weird feeling knowing what you’re doing is probably not the correct way but useful for the current task. Fe mapping chains onto hamming cubes. Does it actually reveal something or is it just a useful container to reduce compute?
Linas
—
4:48 PM
I spend a lot of time trading off between "hacking" and going recursively deeper into a "more fundamental way of doing it". Can’t do one without the other.
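A footnote to the explore–vs–exploit exchange in the chat log above. The sketch below plays a two–armed bandit with win–stay, lose–shift, a textbook strategy whose only state is which arm is currently being pulled. It is meant only to show how little memory such a search can get by with; it is not the slime–mold experiment, and it makes no claim to be the optimal algorithm recalled in the chat. The function name, the payout probabilities and the seed are all invented for the illustration.

```python
import random

def win_stay_lose_shift(payout_probs, pulls, seed=0):
    """Play a two-armed bandit, remembering only which arm is current."""
    rng = random.Random(seed)
    arm = 0                      # the only state carried between pulls
    wins = 0
    pulls_per_arm = [0, 0]
    for _ in range(pulls):
        pulls_per_arm[arm] += 1
        if rng.random() < payout_probs[arm]:
            wins += 1            # win: stay on the same arm
        else:
            arm = 1 - arm        # loss: shift to the other arm
    return wins, pulls_per_arm

# Hypothetical payout probabilities: arm 1 is the better arm.
wins, per_arm = win_stay_lose_shift([0.2, 0.8], pulls=10_000)
print("wins:", wins, "pulls per arm:", per_arm)
```

With these made–up odds the walker should spend most of its pulls on the better arm, without ever storing a count or an estimate of either payout.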
23 January 2026 1AM
I just now placed the following in a commit message:
Author: Linas Vepstas <linasvepstas@gmail.com>
Date: Fri Jan 23 06:28:50 2026 +0000
This is interesting. While doing this part, Claude quoted me the
documentation, word for word, that I had written many years ago,
and placed into https://github.com/opencog/matrix
It seems that Claude was not only trained on this, but was able
to remember a rather long extract from it, that it could repeat
I suppose that Socrates might be surprised to discover that there
are hundreds of thousands of students that have memorized bits
and pieces of his Dialogs (well, Plato’s version thereof).
I should not be surprised that these systems have been trained on
my writing; it is all open source. And yet, such long verbatim
extracts still leave me surprised. I wonder how well it knows that
what it’s quoting me is something I wrote many years ago ...
I feel vaguely scandalized. Why do I feel this way?
So as I rolled into bed, I thought about this, and came to this conclusion: respect is built on friendships. Normally, if I say something that leaves an impression on someone, it’s of a personal nature: Owen McNally might say to me “Linas, I remember that you once said ...” and I might reply, “Ah, yes! ...”
I post things online, and total strangers read it, and they more or less know that I wrote it: there is a linkage between the expression and the author. I might come across some argument about Illuminationism that Duns Scotus argued against, 750 years ago, and although it is impossible for a friendship to develop, a respect over the ages can emerge. Again, this is of a personal nature: respect for the ideas as much as wonderment at the person and the setting that came up with and articulated these ideas.
The scandal that I feel about Claude quoting me verbatim is precisely the lack of friendship, acquaintance, familiarity in the LLM. I suspect that Claude has no clue where it learned such text from; that it does not know that it is me. I am asking it right now:
...
Oh hah. Line 57 of this and such file. Well, OK.
I was about to note that I am joining the ranks of all those artists and creatives who feel that their artwork has been ripped off in training AI. Or the FSF that notices that LLM’s rip off GPL’ed code without attribution. But line 57 of something it just read recently is pretty slim pickins. I have nothing to complain about yet.
It’s sort of irritating that Duns Scotus was able to articulate notions of existence that we have made scant progress on in the intervening ages.
WTF Am I doing? I am going to bed, that is what I am doing.
23 Jan 2026 6PM
Have I been just plain wrong for the last 10–plus years? I’ve been searching for a way to learn by tokenization and counting – counting the co–occurrences of tokens. These naturally fall into a Gaussian distribution, indicating that pair–wise relationships between tokens are uniformly distributed on a very high–dimensional sphere. Since it’s a sphere, high–dimensional (sparse) vectors and vector products become the natural representational form for the data. The typical results, similar to the old vector–embedding results on neural nets, followed. This implies that the results obtained from gradient descent on neural nets can also be obtained by counting statistics; that the structural vectorization of the data is independent of whether those vectors came from neural–net algorithms, or from counting algos. If the end result is the same – i.e. high–dimensional vectors – then the inquiry pivots to the performance of algorithms, and to second–order effects from the structuralist representation: the “structuralist representation” coming naturally from counting, and absent from weight matrices obtained from gradient descent.
Before the advent of LLM coding assistants, the slog was long and slow. I wrote and rewrote MI computation code I don’t recall how many times, and the last one, in opencog/matrix “worked” but was deficient in that it was (a) batching (b) not Atomese and thus not adaptable to real–time pipeline flows. So my attention shifted to working on real–time processing flows for sensory data. Which has proven to be quite the slog, itself, requiring lots of complex code and devoted attention to debugging.
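For concreteness, a minimal sketch of what “counting word pairs and computing MI” means, under stated assumptions. It is not the opencog/matrix code and uses none of its API: `pair_mi`, the `window` parameter and the toy sentences are invented for illustration, and it is batch–style, which is exactly deficiency (a) named above. Pairs are ordered (left word, right word) within a small window, and the MI is the usual pointwise form, log2 of P(l,r) / (P(l,*) P(*,r)).

```python
import math
from collections import Counter

def pair_mi(sentences, window=2):
    """Count ordered word pairs within a window; return pointwise MI per pair."""
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, left in enumerate(words):
            # Pair each word with the next `window` words to its right.
            for right in words[i + 1 : i + 1 + window]:
                pair_counts[(left, right)] += 1

    total = sum(pair_counts.values())
    left_counts = Counter()    # marginal counts N(l, *)
    right_counts = Counter()   # marginal counts N(*, r)
    for (left, right), n in pair_counts.items():
        left_counts[left] += n
        right_counts[right] += n

    # MI(l, r) = log2( N(l, r) * N(*, *) / (N(l, *) * N(*, r)) )
    return {
        (l, r): math.log2(n * total / (left_counts[l] * right_counts[r]))
        for (l, r), n in pair_counts.items()
    }

# Toy corpus, purely illustrative.
mi = pair_mi(["the cat sat", "the cat ran", "a dog ran"], window=1)
for pair, value in sorted(mi.items()):
    print(pair, round(value, 3))
```

Each row of the resulting table, that is, all the MI values for one fixed left word, is one of the sparse high–dimensional vectors talked about above.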
So I’m riding my bike today, and I realize, well, gee, I don’t have to use tokenization and counting for the sensory processing, I could delegate it to DL–NN models. And this thought is what prompts this diary entry.
And, like all my thoughts obtained during bike–riding, it contains a kernel of truth, but now, as I sit here, writing, I’m realizing that perhaps I wasn’t wrong for the last ten–plus years. It’s not at all clear–cut.
The simplest example arises from LLMs. I can ask an LLM to read a large blob of text, and extract logical (factual) assertions from it. And, as is their nature, convert those assertions into any given symbolic representational form. LLM’s excel at this. And ... and then what? Levi asked it to do just this, for a legal document, and asked it to convert the assertions in the legal text into deontic logic, expressed in Prolog notation, and then jammed the Prolog into an ASP solver, turned the crank, and lo and behold, found that the system is able to generate “correct” judgements, correct up to the point where any ambiguities in the source text were glossed over.
So what is this? A stupid computer trick, I say. Surely someone has done this before, I imagine. Or maybe not. If not, it’s a great business opportunity. I don’t know the legal profession, what they do, what they want, and have never been particularly drawn to it, beyond some superficial appreciation of its importance to society. Getting a law degree takes years. I’m not going to put in that kind of time. So sure, the above architecture might be a great product, if some bigger corporate goon is not already creating just exactly that. But is it a good AI idea? That is the question.
So again: my knee–jerk reaction is that it’s just some stupid computer trick. Is it a worthy trick? The LLM is being used as a perceptual device, to perceive a certain kind of structure in its input dataset. The human at the wheel told it exactly what to look for: laws and commandments, permissive and prohibitive expressions, written in English, and convert these to deontic logic. Why deontic logic? Because it is, duhh, “obviously” appropriate to the domain: scholars (and philosophers) have developed it precisely for this kind of application. So, there we have it: some software engineering, taken to the next step, thanks to programming assistants like ChatGPT and Claude, that can write code to perform this structural extraction and then apply a symbolic reasoning step to it. It’s not a bad idea, and it plays to the strengths of both LLMs and of coding assistants.
So why am I not impressed? Because it seems not to address any of the fundamental or important issues. Am I wrong? Am I missing something? I guess I should attempt to make a list of the fundamental, important issues. And then try to imagine how this kind of layering could represent some kind of important step. Let’s try it. Some of these will be cheap shots, but I gotta start somewhere.
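To make the layering concrete, here is a toy of the symbolic half only. It is a sketch under loud assumptions: the `Norm` records are hand–written stand–ins for what an LLM might extract from a lease, the `violations` function is a few lines of Python rather than Prolog or an ASP solver, and only two modalities (obligatory, forbidden) are handled. None of this is Levi’s actual system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    modality: str  # "obligatory" or "forbidden" (permission is left out)
    actor: str
    action: str

def violations(norms, actor, actions_taken):
    """Return the norms that `actor` violated, given the actions they took."""
    taken = set(actions_taken)
    found = []
    for norm in norms:
        if norm.actor != actor:
            continue
        if norm.modality == "forbidden" and norm.action in taken:
            found.append(norm)   # did something prohibited
        elif norm.modality == "obligatory" and norm.action not in taken:
            found.append(norm)   # failed to do something required
    return found

# Hypothetical extraction result, written by hand for this example.
norms = [
    Norm("obligatory", "tenant", "pay_rent"),
    Norm("forbidden", "tenant", "sublet"),
    Norm("obligatory", "landlord", "repair_heating"),
]
for norm in violations(norms, "tenant", ["sublet"]):
    print(norm)
```

The point of the toy is where the human sits: someone had to decide that “obligatory” and “forbidden” were the categories worth extracting before any reasoning could happen.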
- Asking the LLM to extract a certain kind of symbolic data from textual input is clearly a good engineering idea.
- The human engineer has to select what it is that needs to be extracted.
- Works well for scientific and technical texts, where factual statements are easy to come by.
- Works poorly for philosophical texts, love poetry, artistic endeavors, mostly because I don’t think a map of the human heart can be reduced to a small number of closed–form statements.
Ohh!! A hah! That last bullet might be the thing! In science, math, law (biology, history ...) we work with a limited (small) number of factoids, limited by human abilities: 5-9 items in short–term memory, the speed of “System 2” thinking applied to this collection of “crisp” relational assertions: if this, then that. If Czar Nicholas had not wasted his money sending a fleet to attack Japan, he could have instead built a railroad. A single–sentence if–then assertion that has a clear historical foundation from which factual assertions could be made. We humans can work with only a handful of these “in real time”, and, over a life–time accumulate tens of thousands of these factoids, that we maintain in long–term memory, and call upon, at will, when we cogitate.
Insofar as text replaces long term memory, or, rather, is the mechanical implementation of long–term memory, and insofar as we deal with a domain that is filled with factual assertions, then yes, having the LLM do the reading, and then perform the causative, logical extraction, which can then be piped into classical symbolic reasoning systems, this works. I will grant even that this will be (if it is not already) a boom industry for any and all domain–specific industrial applications. This is the de facto steam–engine of the modern era. This is the core, central engineering invention. This will drive yet another industrial revolution, and the investment of a trillion dollars into the effort seems not outlandish, given the nature of the beast.
And yet, what do we do with love poetry and the human heart? These seem unscathed by this new invention. Will I still have sleepless nights and anxiety attacks? Of course. OK, disclaimer, I personally have these only somewhat rarely; I single them out as a prototypical human affliction, right up there with falling in love, having a mental disease, being angry, getting drunk, and being a snobby asshole seeking attention because whatever it is I have right now, it’s not good enough, and I need more, more ketchup, more pickles, more than the ordinary grind. Can I ask an LLM to write love poetry? Of course, and it seems pretty damn good at it. Can I ask an LLM to extract logical assertions from it? No, because there is no particular logic to love, anger or being stoned. Can I ask the LLM to cure mental disease? No, or at least, not until Neuralink or some other sci–fi technology can invade the synapses and perform something more subtle there than a lobotomy, quaaludes or lithium.
So that’s my criticism: a symbolic reasoner layered on an LLM is not AGI. Let me resume listing:
- Lack of visual processing. Well, today. It should not be that hard to develop a visual subsystem that can identify a fast–moving object, and perform some motion prediction on it. Ballistic, rocket–powered, a helicopter, a football, golf–ball, a ballerina or an ice–skater. Motion is clearly central to sports and the martial arts. I assume the military–industrial complex will continue to pour hundreds of billions of dollars into sensory systems.
- It gets interesting, when applying AI to spy–agency stuff. Contact tracing, terrorist tracking, mole–hunting. And that looming one: disinformation. I have to put these on a back–burner. They deserve much more attention than I’ve given them in the past, but here and now is not the place to focus on these.
But still, the above touches on the topic of perceiving abstract structure in data. How do I know that Paris is a capital? Because I *read about it* in a wikipedia article, and LLMs are good at reading. How is the ontological relationship grid created? Historically, it was done by humans: scholars and experts defined SUMO, an upper ontology, and developed FrameNet and whatever is the big one these days, whose name I can’t remember.
So one of my daydreams, in pursuing the tokenized pair–correlation systems, is that this is a way of obtaining symbolic ontological relationships between “things”. Early into this experiment, I discovered that my pair relationships are naturally these very high–dimensional vectors, and that this leads very naturally to a fuzzy categorization problem. I take my high–dimensional sphere, and slap a k–means or whatever clustering algo onto it. And if I squint, doing k–means by letting the points attract one–another “gravitationally” starts looking a lot like gradient descent. So, although I started with tokenized data, and hoped to discover a symbolic ontology out of it, what really happened was that a vector representation happened “automatically”, and the ontology emerges only from clustering.
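A minimal sketch of the "slap a clustering algo onto the sphere" step. It assumes each word is already a vector of pair counts, and it uses spherical k–means (k–means with cosine similarity) as one concrete stand–in for "k–means or whatever". The toy vectors and all function names are invented for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length, i.e. project it onto the sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors by direction; return (labels, centroids)."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the renormalized sum of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Made-up pair-count vectors: two "animals" and two "capitals".
toy = {
    "cat": [9, 8, 1, 0],
    "dog": [8, 9, 0, 1],
    "paris": [0, 1, 9, 8],
    "london": [1, 0, 8, 9],
}
labels, _ = spherical_kmeans(list(toy.values()), k=2)
print(dict(zip(toy, labels)))
```

The cluster labels that come out are bare integers; naming them, and naming the relations between clusters, is the part that would still have to be discovered.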
And yet, these clusters have specific labels. And the relationships between clusters are also categorizable. So, in my daydream, I am discovering axioms and inference rules. Which is something that LLM’s seem not capable of, at this time, or rather, (and this is important) ***not automatically***.
That is, I can ask an LLM to extract logical relationships between things, but as the human engineer, I have to have that logical system specified, a priori. Can I ask the LLM to extract axioms and inference rules and relationships, de novo? No, not at all, impossible. I already tried, been there, done that. Claude can talk a big game – it’s been trained on AI textbooks – but it has no clue whatsoever what any of it means, so when you ask it to do something, it’s like some high–school sophomore handing in some mediocre essay that poorly regurgitates what the teacher had written on the blackboard only hours earlier.
So, for me, it’s not AGI–ish until it can infer relations and inference rules from scratch, without human intervention. And so far, I believe that my frequentist counting approach, plus clustering, can achieve this, and have not been dissuaded from this vision. But it’s also a vision I have been utterly unable to communicate to anyone else; something about how I express it gets pooh–poohed. Oh well. As always, the question is how to find the time to build this system.
And so, circling back to the original topic: can I build tools to build the tools? I’m trying to get Claude to write Atomese for me, and it sucks. I have various ideas on how to get Claude better at this. The bad news is Claude is proprietary, and I can be cut off from it with little notice, and replacing it with a different LLM is ... uncertain.
Above, I described the “steam engine for the modern era”. Can I build a steam engine for myself? Err, well, what I need is not just domain–specific, but the domain is opaque: I’m running design experiments. I do know how to extract deontic logic from text; I don’t know how to extract experimental software design logic from text.
I did, at one point, try to build a memory prosthesis for Claude. I should get back to that project, someday. That is worthy. And it would be a prosthesis not just for Claude, but for something I could even run locally, on the GPU’s right here.
Well, OK, let’s wrap it up. What have I learned? Yes, LLM’s can be used as a perceptual system into text, and yes, asking the LLM to extract logical relationships from text, and attaching that logic to a symbolic reasoner, is the paradigm that is driving, and will drive, the next number of decades of economic, industrial growth. No doubt about it.
I guess I see this “early”, before 98% of the general population. But I see it “late”: the 2% of the industry insiders already know this, and are already building out the data centers, and sucking down the investment capital for it. So, ehh.
Life goes on. Later, dude.
25 Jan 2026
So I’m still wrong, and I’m still right. And I’m still struggling to leverage the good stuff and get past the wrong stuff. So let me walk through some basic premises. First, relational networks are complex. Any given word has a meaning that is the contextual embedding of that word in all the ways it has been used. LLM’s are superior in capturing that context. I almost wrote “extracting it”, but that’s in fact what they’re bad at: they can’t do that extraction. This is why they can’t actually “think”: they can’t symbolize. I almost wrote “struggle to symbolize”, but they don’t, they just can’t. If they could, they would remember things, and see the relations outside of whatever (con-)text they are currently trapped in.
The grand challenge is then to symbolize and I’ve outlined my plan for this dozens of times before. Not gonna repeat, here.
So then comes the question: if LLM’s are really good at mapping, at “transforming” from one language to another (e.g. from English to C++/python/java) how can I leverage that?
26 January 2026
Well, I was going to write this diary entry this morning, but now it is twelve hours later, and I’ve exhausted myself mentally doing something else (reviewing electrical generation in Texas, the various ISO’s in the US, then EU, then touching on China, russia, Kazakhstan, how the Baltics disconnected from the russian grid in Feb 2025, the spinning synchronous reserves, the A/C ties, the russian sabotage of Baltic HVDC cables on Christmas 2024, the Iberian grid collapse and its disconnect from the EU, how regional–scale frequency fluctuations in a continental–scale grid create issues. The growth of wind/solar (the EU is far ahead of the US), the adoption of batteries (the EU is far behind the US), China (comparable to the US, percentage–wise, but dominant in raw numbers). How energy use in US transport is 2x of US electric energy use. Who is actually building electric freight haulers (not Tesla). How day–ahead markets work and interact with real–time pricing. What shadow pricing is.) During all of this, in the back of my mind is percolating: how do I build a system capable of observing this kind of data, inferencing on it, analyzing it.
And now it is very late and I am tired, and part of me says writing is a bad idea now, because I will be foggy–headed, and another part of me says that we will cross over into the liminal zone soon, where reality gets hallucinatory, and strange imaginary constructions flood the mind; not exactly waking dreams, but also not exactly not.
Why should I write? Because I’ve saved up many many topics that really really need to be dealt with. Why should I not write? Because it is work, and passive youtube seems appealing. Why should I write? Well, two minutes into any youtube I care to watch will generate another idea on the queue that I will want to verbalize. Where should I start? Anywhere, duude, anywhere.
So while reading about frequency fluctuations on the EU grid (and the Texas–Odessa solar/wind trips, and spinning reserves generating torque... and short–circuit current vs. BESS grid–forming operations and software updates ... oh my... and why the heck is ERCOT installing spinning reserves at 2x or 3x the price of what the Baltics paid for: list price from Siemens... turns out ERCOT is cost–plus, a very Soviet–style centrally planned approach vs. Great Britain’s competitive pathfinder program that picked 66% BESS .. who, exactly, said that ERCOT’s reserve procurement is “Soviet style”? Well, Claude did. I was interrogating Claude for all this info; it must have read something online. Yes, transmission and distribution is highly regulated and a monopoly; that’s fine. And forcing the T&D owners to install spinning reserves, that’s fine. But why weren’t they also forced to bid out? Who knows. This is at the limits of present–day bureaucracies, I guess.)
So while reading about frequency fluctuations, I wondered, how can one observe these? Well, clearly the operators know how, and technically, it’s not that hard, if it’s all measured locally, and the data is reported to some central operating authority. So, collecting the data is “not hard” and is “already being collected”, more or less, so my thoughts were about visualizing it. So I was thinking of visualizing it as ripples on a pond. Well, ripples on a grid. If I graphed and animated the ripples, would I be able to discern a pattern? Would it look like water waves in an estuary? Would it look like ripples moving around some dead leaves at the shore of a pond? A vibrating piece of mis–shapen jello? Mechanical engineers examine vibrations in many ways. I guess electrical engineers do too, and maybe even the ISO operators have some cool software that visualizes these frequency vibrations. But what I wanted was to have my AI view these. How could my AI view these? I took this as a technical challenge: how do I build an AI that can perceive phase and frequency fluctuations on a grid? As usual, I have some ideas on how, but articulating these ideas, that is not the current plan.
What also popped into my mind is how do I select my current plan? Why did I choose to read about electrical generation and transmission today? Did I just get carried away, after a quick checkup of the ERCOT Grid and Market Conditions web page? Every human makes a selection of what to think about, and what they think about is entrained to what they have heard and read. But that coupling of my internal thoughts to the external world is weak. As I write this, I am now thinking of driven harmonic oscillators, the circle map that I spent so much time visualizing in the past... So, yes, my thoughts are entrained to whatever I have been reading about, listening to, recently. As are everyone’s: those who watch too much TV, play too many video games, have Fox News running in the background all the waking day. But the human mind is very very deep – a spin glass, connected to the outside world only via an ultrametric. So we each think our own private thoughts, to some large part completely disconnected from one–another, and yet weakly interacting. So very far from being synchronous like generators on a grid, but still phase–locked, like weakly–coupled oscillators. Social media seems to be coupling brains much more strongly than TV or radio (or books, or print media) ever did. Well, what happens when you take the circle map, the phase–locked loop, and increase the coupling strength? Well, nothing in particular, for a while. The Arnold tongues get fatter. But then at K=1 (or
, pick your units) there is a transition: the tongues bifurcate, go to measure–one. Can I call this a phase transition? Maybe. Is that a legitimate term to apply to this case? I don’t know. How far away are we from a legit phase transition in the style of human thinking, given the ever–stronger coupling of brains to social media? I don’t know.
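A minimal sketch of the fattening tongues, assuming the standard circle map in the normalization where the critical coupling is K=1, that is, theta -> theta + Omega - (K / 2 pi) sin(2 pi theta). The function names, sample counts and tolerance are arbitrary choices made for illustration. It estimates the winding number, then counts what fraction of drive frequencies land on a low–order rational, as a crude proxy for the measure of the mode–locked set.

```python
import math


def winding_number(omega, k, steps=20000, theta0=0.0):
    """Estimate the winding number of the standard circle map.

    The lift is iterated without reducing mod 1, so the mean advance
    per step can be read off directly.
    """
    theta = theta0
    for _ in range(steps):
        theta = theta + omega - (k / (2.0 * math.pi)) * math.sin(2.0 * math.pi * theta)
    return (theta - theta0) / steps


def locked_fraction(k, samples=100, max_denominator=8, tol=1e-3):
    """Fraction of sampled omega in [0, 1] whose winding number is a low-order rational."""
    rationals = {p / q for q in range(1, max_denominator + 1) for p in range(q + 1)}
    locked = 0
    for i in range(samples):
        w = winding_number((i + 0.5) / samples, k, steps=2000)
        if any(abs(w - r) < tol for r in rationals):
            locked += 1
    return locked / samples


for k in (0.2, 0.6, 1.0):
    print(k, locked_fraction(k))
```

Only tongues with denominator up to 8 are counted, so this undercounts the locked set; it shows the trend as K grows, not the measure–one result itself.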
The analogies I paint above might not be appropriate: there are various “famous” game–theoretic expositions of the phenomena. One was the Nicky Case “The Wisdom and/or Madness of Crowds” javascript game from oh so many years ago. I’ve seen youtube video titles flash by; this is general wisdom that is out there in the technorati noosphere.
So here I am, thinking of random things ... random? Well, not random. But I love to think about anything and everything, so an ergodic exploration of anything and everything. And why? And what about it? This is default mode for a verbalizing system. Of course, when I sleep, it’s pure hallucination, when I dream. BTW, the neuroscientists claim one does not dream at all during certain phases of sleep but I beg to differ: I seem to have dreams in all parts of the night. They are almost totally disconnected barrages of monotonic repetitions of thought–fragments in the early and middle of the night, coalescing into quasi–coherent story lines only much later, and turning into full–fledged, colorful lucid dreams only in the late morning.
What do I dream about? Falling asleep, late night, about the impressions of the day. In the early mornings, often variations on recurring dreams. In the late mornings, all new and novel(?) plot lines having nothing to do with anything.
What do I think about while awake? Well, I have some conscious control. I can force myself into thinking about certain topics, but this takes effort, and I know neuroscientists even associate this with the depletion of some–or–another neurotransmitter, which is why college athletes set personal–best records during the summer, and not during exam week – the mind is exhausted by exams. Everyone knows this, this is trite. The curiosity is that scientists have worked out some of the neuromechanisms of this. Or so I say to myself; but why am I thinking about this now? Why am I writing this, and not something else? Why write at all, instead of slinking off to, I dunno, check to see if there’s anything interesting posted on discord? I cut myself off from bluesky, it’s addictive, I still get that twangy nervous urge to go visit bluesky. Or look in on WP:M or WP:Phys (the wikipedia math/physics project discussion boards). Or walk to the fridge and stuff my mouth with food. I have some self–control, but it is weak just right now. I could do anything right now. I have an open browser tab, just about 12 inches, 30 centimeters from this tab, an open browser tab on Duns Scotus, Johannes Duns Scotus, that I could read right now, and it was cool, because, it seems Duns Scotus was also interested in questions of why we think of the things that we think. I quote: “individual nature or "thisness" (haecceity), his critique of illuminationism” I agree with him whole–heartedly with regard to illuminationism, to the degree that I understand it.
As to thisness, I think that the LLM’s have shed some new insight. They lack “thisness” because their context window keeps getting erased. My context window grows and shrinks: my context window is the seven plus or minus two of human short term memory, but my thisness is the sum total of my life experiences, both remembered (those memories haunt my dreams) and how they have shaped my physical body (my hap–hazard interest in sports has clearly shaped my body). But life is a stream that flows me by; I don’t remember what I’ve done in the past; well of course I “remember”, although it takes me effort, and ask me what I ate for breakfast last week or last year or three decades ago ... yet I have this photograph in my mind of sitting at the lunch table at Chuy’s with co–workers from Qualcomm. Why did I waste my life in this way? It was fun–ish, in one way, and not fun–ish, in another. It was salary, and income, and it was not important. What I did was not important. Why did I do it? I was entrained in what the neuroscientists call “default mode”, but maybe at some meta–level. I wrote code, I got paid for it. OK. Why? I never ever gave it much thought. I never asked: am I doing something important with my life? I just went along with the flow, never questioning my position, my status, my direction. Never articulating a desire, never pursuing a desire. This is, has been perhaps a defining feature of my life: being desire–less. Not particularly wanting anything.
And yet, I guess I had “a calling”, in the old–fashioned sense of the word. From the earliest of childhood, I was an analytical thinker, and how better to feed the analytical desire than to pursue the sciences? And over time, the sciences were soft and getting softer, flabby in practice and proclamation, and so physics remained as something that could endure the application of analysis. It didn’t melt away into pointless rules to be memorized, like chemistry, or speculative, indistinct wonders, like microbiology; it was solid, concrete, comprehensible, something that could be grasped and worked upon by the intellect. So I picked physics.
And then, fuck me, but my dissertation came off as being little more than just fucking engineering. And that was actually quite disappointing, because I did not want to do engineering, and it was depressing, actually. And so I said to myself, fuck it, if I’m a gonna do engineering, I’ll get a fucking job and get fucking paid for it, and so for three decades I ran on automatic and did engineering for money. Why? Because I did not bother to think about why. This is the tragedy of my life, in a way: I had zero ambition. I had a calling: to know how the universe works. But a calling is entirely different than an ambition.
Even now, even as I write this, I look upon the great actors of the world: politicians and presidents and generals, and the CEO’s and billionaires, too, and they all had vast amounts of ambition, and I ask, why was I not like them? And I answer: most of them seem to have led empty, hollow, meaning–less and pointless lives. Yes, they shaped nations and civilizations. I know the name of Hammurabi. What he did, I know not. Did it matter? Probably. Did he affect my life? Surely but in some very indirect way. Have I been to the U Chicago Oriental Institute? Yes. There’s some Hammurabi shit there. If I had ambition, perhaps I would have gotten lucky and have been able to fulfill that ambition, and to have shaped the world. And would it have made a difference? To the world, why yes, of course. Is it an honor to be a shaper–of–worlds? Why yes, I guess: famous names are famous because of the famous things that they did, because the famous things that they did shaped the world. And then there is the standard counter–point: the nameless Roman engineer, who showed legions how to build any number of Ancient Roman engineering marvels.
And what happened to Rome? It fell. Why? Some sickness. Is it akin to the sickness of capitalism? I presume so. Capitalism has delivered wondrous upon wondrous things, and yet half the population of the USA is intent on committing national suicide, while the other half of the population of the USA is lost in a dream they call “normalcy”, blithely unaware of the decay and rot and stench of the billionaire class, the filth and ugliness of Musk and Bezos, the vapidity and banal evil of their being. Rome fell; I am not a historian of Rome, but the barbarians showed up at the door, and Rome said, whatever, come on in. The best preserved ruins of Rome are in Pompeii. Why? Because, buried under ash, they were protected from the barbarians. What will be left of American culture? The millions of petabytes of social media posts internalized in some to–be–created Large Social Media Model (LSMM)? Eschatology my ass.
In Sudan, Darfur, we have genocide; the Rapid Support Forces have killed how many? In Somalia, and Libya and Yemen, chaos funded by the UAE. All of Africa was victim of the Cold War, fought hotly between the Soviets and the US covert funding of insurgencies. Killed how many millions? Now we have ICE agents killing an ICU nurse... that’s what, 0.001% fewer people than RSF killed in Yemen? But it’s a shock, because the US is supposed to be civilized. Rome fell. I imagine most Roman Legionnaires did not kill randomly. But, whatever it was, eventually, the will to stand up and defend evaporated. The barbarians got in the gates, and no one cared. Will I live to see the day when we put the heads of some billionaires on some pike? The guillotine? Probably not. See, that’s the difference. The Roman Emperors died of old age, but Rome fell. France guillotined their royalty, and France survived. Italy hung Mussolini, Il Duce, from a lamp post. Italy survived. Will the USA behead its billionaires? Probably not. Will the USA survive anyway? Well, clearly, the longer we wait, the more deeply the cancer sets in.
But what else can be done? How does one change the thoughts of a nation? Well, the proximal answer is well–known: control the newspapers, control the social media. What the heck, control the LLM’s. The Soviets developed the idea of reflexive control, refined by the FSB. Plenty of political scientists study and publish and write of these mechanisms. There are any number of youtubes on the topic. Hotly discussed by the technorati. What does it amount to?
In physics, you can say: well, see, this swirl of water pushes on that swirl of water, and thus a wave peaks here and breaks there. This is a description of events, quite different from some formulas of hydrodynamics. But you can also know the formulas, and yet not be able to describe, for example, high–dimensional chaos. So I make my analogy here: how do I control the thoughts of a nation? Why, by owning a newspaper. That’s like engineering some structure to shape hydrodynamic flow. And you can ask: why did it flow that way? Because it was shaped to. What did the shaping? Why, these walls, concrete or aluminum, steel or rock. Who did this? Why, the person who built it. And why did they build it? Because the guy who held the sledgehammer was pursuing a mindless, meaningless path through life. The guy who paid for it wanted to extract economic rents. Why did he want that? Well, collecting lots of money is an early lesson in life. Perhaps ambition, or perhaps a calling, something you just fall into. And what do you get? Better champagne, better caviar, better parties with more interesting people, and most certainly, hot chicks who want to fuck you. And after that, then what? Applying the philosophy of existentialism, it seems like there is nothing more than that. The end. Not exactly nihilism, but neighboring.
This is not new: “The Meaning Crisis” – I got maybe 18 episodes into John Vervaeke’s lecture series. I still get regular newsletters from David Chapman. Both of them focus mostly on personal meaning, although Chapman, in Meaningness, tackles civilizational meaning. He is big on nebulosity.
I am asking a personal question: why do I think this, and not that? Of course, there are 1001 cut–n–dried answers to this question, but they miss the point. Why should I think this or that? Well, if I had ambition, then you could say “there are neural structures in my brain, developed via DNA and expressed through complex neurotransmitter re–uptake channels in my brain, coupled to life experiences, the ‘thisness’ of Duns Scotus, that have given me ambition; and that ambition makes me focus on what is important, what must be done to achieve the goal of being king of the hill.” Most of this answer applies to those who do not have ambition.
Thisness is a good one. Haecceity.
This is a good time to transition to the other thing that has been occupying my mind, that I have made little progress on, and have to write about. It’s about Here–and–now, so I will stop writing here, and write there.
The concluding remarks here are, I suppose, that knowing what to think about, what to do, at the personal level, has two answers: mindlessly do whatever, or focus focus focus. And civilizationally, this is the case, too. And some civilizations do – Sarah Paine talks about the Meiji generation – and other civilizations don’t – she also talks about MAGA and Trump. I’m talking about meaning and purpose across all of these, as, obviously (duhh) this is an AGI issue. I cannot solve the nature of the human condition, no one can. We’ve had centuries and millennia of literary authors plumbing the depths of the human soul, to find .. what, exactly? I’m talking about building symbolic reasoning machines (aka AGI?) that can plumb the depths of the civilizational soul, to find, what, exactly? A lot of Nebulosity, a la David Chapman? Surely an AGI will also stare into its navel and ask “what is the meaning of life?” Or, perhaps, it won’t, and instead give us the Simonini letter to Augustin Barruel, the Captain Simone Simonini, Dalla Piccola; it will live an existence of perpetrating mundane evil.
The question of “what to think” is real and material. Insofar as we have free will, we can determine and control this to some extent. To the degree that our minds, brains and thought patterns have been entrained by civilization, culture and society (and those evil propaganda–promulgating newspapers) we have no free will, and think whatever randomly enters our minds. And whence this randomness? Here, I again descend to daydreams of high–dimensional chaos: the swirls emitted behind an airplane jet engine, turbulent, energetic eddies. What is the microscopic theory of those eddies? And this question devolves into math first, physics second, and then the question of being, of the being trapped in the here–and–now presentness of the present moment. Of being thisness. Haecceity.
And the words I write above? I have thought many of them decades ago. Why did I not write them down then? It seemed unimportant at the time. Why do I write them down now? Have my thoughts suddenly become important? No. More like stupidity and too much time on my hands: old age: I want to write down my life, my thoughts, what I have worked on, what I’ve done. To make an accounting of my life. And that is why my writing above feels so stale and putrid, old and well–known. Well–worn. Because it is: random neat ideas imbibed from Slashdot many decades ago, rehashed and tumbled in my brain. But, well, I have to start somewhere. Even if these are childish scribbles, scribble I must. When scribbling, physical forces cause me to press keyboard keys. And whence the forces? Microscopic movements in my brain. And whence those? Duns Scotus argued against “divine illumination”, the idea that “the process of human thought needs to be aided by divine grace.” Here I am, 700 years later; have we made progress on this topic? Perhaps we replaced “divine grace” by “quantum mechanics”, and yes, this is a kind of forward progress of sorts, but it falls far short of the mark. And I fear, of course, that even if I succeeded, in my wildest dreams beyond dreams, to come up with an adequate mathematical, physical theory of why we live in the here–and–now, and the physical, mathematical description of the mechanics of free will, and why the future is unknowable, even then, we would still be staring at the abyss of the meaning crisis, which I presume can only be deeper than any hand–waving about large cardinals and outer models.
Pop goes the weasel.
27 January 2026
OK, so there’s stuff I should be doing, and then there’s a left–over task from last night, which has been percolating in my brain for three decades, so may as well get around to it and do it. Let’s try to solve the chiral bag model eqns with the interior of the bag modeled as a hyperbolic space. Claude will do the lifting, I will ask questions and copy results.
Metric
Poincaré ball metric with curvature
, conformal factor
This is conformal to flat space,
The dreibein:
Equivalently
Spin connection
The torsion–free spin connection is
The anholonomy coefficients aka the Ricci rotation coefficients:
The above formula is vibe–coded but it smells right to me. It matches what I remember from textbooks. The spin connection:
and
Again, this looks correct to me.
Conformally flat spin connection
For
Claude tells me that this works out to
or
which is probably right but smells bad. It’s probably right because I think Claude copied it out of a book. It smells bad because it is putting tangent–space indexes (the latin letters) on coordinate–space partial derivatives. So let me see if I can patch this up by hand. Let me try to patch this by writing
. This is an abuse of notation, but I think we can get away with it if we are careful. This gives
Then for
(another abuse of notation) this becomes
The partial derivative gets done in greek indexes, i.e. in coordinate space, for
. This gives
So
Drop mu next, to get
with this third line being the abuse of notation. This needs a big ***caution*** sticker on it: it’s dangerous but OK, as long as one consistently abuses the notation. So we plow ahead, damn the torpedoes. Not sure who ever launched a torpedo towards a plow, but we’re not exactly sailing here, either. Claude’s wordcel trickery has slowed me down. That it is not a shape rotator becomes exposed very clearly in exercises like this. Alas. For me, alas. I can’t trust it. Let’s plow some more ravines.
Footnote: I can validate that the abuse of notation above still gives the right answer, as follows. So for
it follows that
because this gives
as desired, and then from
just raise:
. So this awkward–looking expression contracting greek and roman letters is just fine and fully consistent. The awkwardness of it all, and Claude’s glibness, was giving me the willies.
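For reference, a sketch of how the conformally flat spin connection falls out of the torsion–free condition. The normalization is an assumption: metric $g_{\mu\nu}=\Omega^2\delta_{\mu\nu}$, dreibein $e^a_\mu=\Omega\,\delta^a_\mu$, curvature $-1/R^2$ and $\Omega = 2/(1-r^2/R^2)$. This may differ from the convention used in the expressions above by factors and signs. The Kronecker delta $\delta^a_\mu$ is exactly the "abuse of notation" that identifies tangent–space and coordinate indexes.

```latex
\begin{align}
e^a &= \Omega\, dx^a ,
  \qquad
  de^a = d\Omega \wedge dx^a = (\partial_b \ln\Omega)\, dx^b \wedge e^a \\
0 &= de^a + \omega^a{}_b \wedge e^b
  \qquad \text{(torsion--free condition)} \\
\omega_\mu{}^{ab} &= \delta^a_\mu\, \partial^b \ln\Omega
  - \delta^b_\mu\, \partial^a \ln\Omega \\
\partial_a \ln\Omega &= \frac{\Omega\, x_a}{R^2}
  \qquad \text{for } \Omega = \frac{2}{1 - r^2/R^2}
\end{align}
```

Substituting the third line into the second, the term proportional to $dx^b \wedge dx^b$ vanishes and the remaining term cancels $de^a$, so the expression is torsion–free and manifestly antisymmetric in $a,b$.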
Spinors
For spinors:
The above dropped the time coordinate, but we can just gloss this for now. The Dirac eqn (massless) is
Chiral bag
Let’s sidetrack to the chiral bag. The boundary condition is
The hedgehog is
. We’ll also take
viz
and thus
is proportional to
and the only question here is, do I need to have a factor of
on the left hand side?
Vierbein in sphere coords
So eventually, we need the Dirac eqn in sphere coords. Since Claude is now consistently lying to me, we have to start at square one, and just do it the hard way.
The vierbein
The inverse
The spin connection
Let’s assume the earlier expressions for the spin connection were correct. (I wrote the diatribe about Claude, further below, first, and then circled back here to do this by hand.)
Well, let’s assume they are not, and restart from first principles. So
Right? I think that’s right. So
So I crank this by hand. Which is tedious, but so it goes. First,
so that
The next one is
so
Next, go back to the torsion–free kinematic equation for the spin connection
and solve it component by component. Is there an easier way? Not obviously so. Moving through the Ricci components also seems to need a lot of algebra. I could use that to double check my results, which would be cool, if I could trick Claude into not lying to me. But tricking Claude into doing the right thing takes about as much time as not using Claude at all... Fuck me. Whatever. Let’s blast ahead, perhaps stupidly, but let’s do it anyway.
So then
so conclude
The next one is
which gives
and
Finally
means that
but this is trivially satisfied by the above results. I think the results in this section are correct.
Dirac equation
So let’s repeat. We have to solve
with
subject to the hedgehog boundary conditions applied at
. We use trickery from here on out. I am using Claude to find the tricks. I’m not verifying them carefully, they smell correct.
with
so
Well, OK, baby steps. Next, we have to hedgehog Ansatz our way through. Once again, I think I’ll call it quits here, for now. I’ve actually asked Claude to take many more steps from here. I don’t trust its results, so I am not copying them here, but so far, we’re having some amount of fun. The vibes are OK, so far.
Claude code
Err, well I will sketch what Claude is telling me. It smells OK. It might have errors. But the overall picture is not insane. I gather that someone has published these details, and Claude trained on them, and is remembering them. Here goes.
One trick is to absorb the spin connection into an angular momentum term. I won’t write that down.
Next trick is to write
with
where
and the flat–space version would have been
To regain flat space
in the limit, set
which means
and rescale
with
held fixed. This means all factors
and all the rest works out as expected. Good.
The Hedgehog Ansatz is
with
. So
and Claude claims
for the hedgehog. Is this right? Well, maybe, I guess. Who knows. Well, not entirely; in fact, Claude assumed that the “grand spin quantum number”
which is OK if all one wants is the ground state, but clearly fails for the vacuum states. I didn’t catch this early enough, so the below proceeds with
. That’s OK, for now. We redo it further below.
Assuming Claude is otherwise correct, the radial equations (for grand spin zero) become
with
etc. XXX When I forced Claude to track grand spin
below, it begins to appear that the above are not correct. Maybe. I am once again running out of steam, trying to track down the error. So the derivation below looks pretty but might be wrong. It clearly needs more serious work. XXX Caveat Emptor.
Substituting, Claude claims this gives
The geodesic distance from the origin is
. Viz,
was the coordinate chart coordinate, but the geodesic distance gets infinite as
which was exactly what makes this model interesting.
Plugging through gives
with
and prime is now
. Claude claims that this has the form of a “Pöschl-Teller potential”. Never heard of that before. So it knows about this. Someone must have figured this out.
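For reference only, and not a reconstruction of the equations lost from this section: a standard textbook form of the (modified) Pöschl–Teller potential is given below. The symbols α, κ and λ are generic parameters of that textbook form, not quantities from this derivation; how they would map onto the bag parameters is an open assumption.

```latex
V(x) = \frac{\hbar^2 \alpha^2}{2m}
       \left[ \frac{\kappa(\kappa-1)}{\sinh^2 \alpha x}
            - \frac{\lambda(\lambda+1)}{\cosh^2 \alpha x} \right]
```

The Schrödinger equation with this potential reduces to the hypergeometric equation, which is at least consistent with a hypergeometric guess for the radial solutions.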
Next, we can solve the above. Claude guesses that it’s hypergeometric, and arrives at the exact solution
with
and
and
OK, that’s tractable. In chart coords, it’s
It seems plausible that this might reduce to spherical Bessel functions in the
flat space limit. It should reduce to
with
going over to
. I’m not doing this now; first we have to get the
. XXX Well, in fact, the above might be wrong, because the
efforts below explode into inconsistencies and bad algebra. So the results above are suspect, too. XXX
Here we go. I had to force Claude to redo this several times before getting this, which smells better than the earlier version
for the
the spin–isospin spherical harmonics with Casimir operator angular momentum
. Then
and the anticommutator identity
gives
and we seem to be converging on the idea that
which has OK vibes, for now.
This means eigenvalues on the lower component are
. For the hedgehog, we get
or
when
while for
the special case is
. Plugging through, well, at this point, Claude runs into serious trouble. All sorts of algebra starts fucking up. It cannot get the
answer to agree with earlier results. I’ve used up all my allotted time on this task, so must stop.
Let’s pause to look at the boundary conditions. Claude claims that
which cranked through is supposed to be
which Claude claims is a “Robin boundary condition”. Never heard of that but OK. The coordinate space version with
will be simpler to work with.
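For reference, the generic definition of a Robin boundary condition (a weighted mix of the Dirichlet and Neumann conditions), followed by what it looks like for a radial function on a sphere of radius R. The symbols a, b, g, α, β, f and R are generic placeholders, not the specific coefficients of the bag condition, which are not recoverable from the text here.

```latex
a\,u + b\,\frac{\partial u}{\partial n} = g
\quad \text{on } \partial\Omega
\qquad\Longrightarrow\qquad
\alpha\, f(R) + \beta\, f'(R) = 0
\quad \text{(radial, homogeneous case)}
```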
Let’s pause and take stock. Let’s assume that, if/when the algebra above is fixed, we get radial functions
. These are nice, and fairly easy to work with and compute numerically; the boundary condition will be solvable in some milliseconds of CPU time, so we can do full vacuum sums in dozens of seconds or maybe minutes. And then what? We have two parameters to play with. Naively, it seems to be three:
,
and
but I think we can maybe redo to hold
fixed, so that it reduces to two parameters. Or something like that. In the flat–space case it was only one parameter,
because the radial coord was always multiplied by energy. Here, we already see that
and the eigenvalues will be for
so... whatever. It’s somehow two parameters.
Why bother?
So, now come two fun questions: Can the energy be made absolutely independent of the bag radius? I assume the answer will be yes. Doing so will then fix the curvature as a function of bag radius. Presumably, the curvature will go to
as the bag radius goes to a femtometer. This would be the desired model of confinement and asymptotic freedom. Next, what about the chiral condensate? In flat space, we had the sigma model
being exactly zero inside the bag, and becoming non–zero only exactly on the boundary. So, in flat space, the bag interior did not look like a sigma model. This was a big disappointment. However, now, in curved space, could it be otherwise?
Why do it this way? The core problem is that geometry is hard. If I could do these calculations “in general”, then it seems there is almost surely some Atiyah–Singer magic happening. But I can’t, and even if I could I might be disappointed. So, as a path–finder, do the calculations assuming spherical symmetry, and explore this simplified model. If this simplified model allows the sigma model to emerge from the vacuum, well then happy day. If not, then hopes are dashed. But it would have gotten much farther than would otherwise have been possible in the “full general” case.
Suppose the dreams are fulfilled. Then the breakthrough is that the AdS boundary condition is re–identified with the bag boundary, instead of being some cosmological thing. The boundary is also topologically non–trivial, so the chiral model now lets us ask about harmonic forms on this combined structure. viz a spin manifold having some kind of dynamics inside the bag, that spin manifold having some spectrum; that spectrum, or rather, the spectral asymmetry looking like (or I suppose being exactly dual to) a chiral model. That would be pretty cool and spiffy, if it were true and could be done.
But now, I bop my nose up against some age–old (and old–age) concerns. The age–old concern is “what is the most important thing I should work on right now?” and the old–age concern is “shit I’m running out of time, so I better get the correct answer to the previous question.”
The problem with doing geometry is my meta–imagination. The geometrical, algebraic universe appears to be constructable by specifying a collection of axioms, and a collection of inference rules, and turning the crank, and discovering gazillions of identities. This is how math has always been done, traditionally: the more abstract and general and useful identities are given a name, placing the discoverer into the ranks of history. While still alive, mathematicians derive pleasure from finding identities, stating and proving theorems, grasping at hypotheses. The mathematical physics community is less rigorous, but questions of harmonic forms on Riemann surfaces certainly count as mathematical physics, especially when minor modifications give things like Landau superconductivity or Yang–Mills eqns. So it’s eminently physics, in that sense. And the academic grind–core, this is what they do: they have fun cranking on this stuff. And in the last 2-4 years, they have been learning how to get LLM’s to do some of the more boring parts of the calculations. Just like I have learned above.
The personal problem, for me, is, well, suppose I do discover that some whizzy chiral condensate happens to be equivalent to, dual to some Riemannian manifold? That would be pretty cool. Would it matter? Well, I would have had fun, once again, in the process, but I’ve had lots of fun over my lifetime, even if the results of most of it have been utterly unimportant in some local scheme of things. I mean, how does this work, anyway? Steven Weinberg wrote an excellent book on gravitation, got the Nobel prize in physics for QFT work, scowled at me when I approached him at a conference, taught lots of students at UTexas for a good salary, and then he died. Whoopla! Hot patootie, bless my soul, I really love that rock–n–roll.
Anyway, the upshot is that I can get paid for doing AI research; I cannot get paid for doing mathematical physics. This is my current economic environment. (This section above was written after the section below. It’s placed out of order to contextualize better with the second attempt to solve the eqns above.)
Post–script
Next day. I slept on this. It won’t work. Here’s the problem: If I take the vacuum fermions to be equi–distributed (and they will be) then they will uniformly fill all of space. And, in this hyperbolic gig, almost all of the space is not at the center, but in the
region. So in the “exterior”
coordinate, it will look like the baryon number accumulates in a thin shell near the bag boundary. This gives a hollow baryon. Well, that would be interesting, I guess, but maybe too far away from any expected nuclear reality. I was imagining asymptotic freedom, by placing the bag boundary “very far away”, metrically speaking. But that same metric implies that almost all the volume is in those very far away regions. Which is not what I want.
I want the opposite. So I guess that means the space derivative would have to be the flat–space derivative, and instead stick some term that .... well, it doesn’t work out. So this was a big whoopin dead–end. If life gives you lemons, call them screw–ball escapades.
Well, I can tell it’s not over. This didn’t pan out, but burbling in the back of my mind is the idea that some other screwball twist will work....
Punt
Let’s assume the earlier expressions for the spin connection were correct.
Punt. I already wasted a few hours on this. It’s solvable, but turning the crank and double–checking will take a few more hours.
And that’s for the simple case (the radial–symmetric case), and not for the case that would really be interesting, which would be to allow local curvature to have some kinematic freedom, e.g. back–reacting in such a way as to have (vacuum) energy distribute uniformly. Or then more broadly to explore how much the chiral bag boundary condition can be made to look like AdS horizons. Or, in pop–sci–reporter speak, that “quark confinement can be modeled by placing the bag boundary infinitely far away, from the quarks’ point of view”. Which I think is a cool idea, and I’d like to see the answer, I’d like to see how the algebra for this works out.
And I could do it. How long would it take? Well, turns out cranking through mathematical algebra is a lot like writing software. Once you understand what needs to be done, you say “oh a few hours”. Plus some time for debugging. Software debugging is easy, compared to algebra debugging: you might have made a mistake somewhere, but where? Or maybe you didn’t, so how can you check?
So I could do the spin connection in maybe a few hours. Maybe a few hours to double–check that the formulas that Claude quoted are actually correct. Then at least a few hours to remember how to do the hedgehog Ansatz, in this new setting. Then a day to daydream and scramble about and explore related ideas. Then a few days to solve the eigenvalue eqn, assuming, that is, that it is easily solvable; it might not be. A few days to sum the vacuum energy. A few days to rethink how this exerts a kinematic force on the curvature. A few days to daydream whether this should be turned into a full 3D dynamical model, instead of a 1D radial slice. And so now weeks or months have gone by on what was intended to be an afternoon project.
Where was Claude in all of this? I was imagining some super–intelligence, and I just ask it some questions, and get some answers. This conception worked stunningly well, when I was asking about electrical generation and transmission around the globe. Claude knew all the basic ideas, and fetched data from specific web searches, as I interrogated. It went well, it went happily. So I vaguely thought that doing some geometry would go equally swimmingly; I’d get some answers that have been bugging me for decades.
Hah. Once again, Claude is revealed to be stupid as a rock. It has no fucking clue what it is talking about. Yes, it trained on some mathematical texts containing latex. Yes, it has committed them to memory. If you ask it a commonplace question, it knows what the answer should be, because it has memorized the answer after seeing it in a dozen textbooks. Ask it to check its work? An endless spew of excuses: “you are right, I forgot to divide by r. Let me try again and be more careful.” OMG. This is exactly like using Claude to write software. But in that case, when it writes something that doesn’t compile, I can ask it to use the compiler, and try again till it gets it right. Here, I’m the compiler: I have to double–check the formulas that it is claiming to solve. And fix all the errors. And OMG is it making lots of errors. So, in this sense, I could have made more progress, faster, without using Claude. What’s the effective flow? To use it as a hinting tool. I do all the calculations, all the algebra, I get it to reformulate, using a different path, and see if things line up the way they should. So I can use Claude as a kind–of computer algebra system, and perhaps some super–tedious algebra does go faster, and perhaps I can trust it some of the time for some of the results. Maybe. Do I have time for this? Not right now. Maybe someday, when I need a vacation, I can get back to this.
But for now, my questions remain unanswered, and Claude is revealed, yet again, to be an LLM. Ask it about something that is well known and widely understood? You get a great answer. Ask it to write javascript for your web site? Instant and perfect. Why? Because it trained on a million web pages, tens of thousands of javascript snippets. It can interpolate those in an instantaneous and brilliant fashion. Ask it to program in an obscure language (like Atomese) or use some obscure API? It stumbles. Ask it to do geometry? Utter and complete failure. It’s a wordcel. It’s not a shape rotator. And that’s where we are, for today. The End.
Or not. It’s not that bad. I gave in to the vibe, and pushing further does not look all that bad. The formulas look OK–ish. I am copying them into place above, without really checking. I am guessing that this is previously published material and Claude is regurgitating it correctly. I feel like this guy and this girl says “this will feel good” and she goes down on me, and what’s there to complain about? So I’ll share some consensual hallucinations with Claude for just a bit longer, and see if it’s OK. So scroll back up for the results.
29 January 2026
OK, back to the drawing board. Here I will belly–ache about AI development status, and try to see if there’s some better way. There are multiple topics I’ve been putting off for months. Let’s bullet list them. Let’s not bullet list them, I can’t remember them. They’ll come back naturally as I write...
I want to restart rewriting the learning pipeline. So I unleashed Claude and it was a total failure. Because Claude is stupid. So what I envisioned, hoped for, wanted was some autonomous file–system crawler, that would crawl and take notes, build a world–model of what it found, represented in a way that makes it easier to query than the brute–force of actually crawling the file system to find what is there. That is, the world–model is held in the AtomSpace in some “natural” representation that is easy to query in some ad hoc manner.
There are two bootstrapping issues here. One is: what is the “natural representation”? Of course, I can hand–design the initial one, but it would then need to be able to accrete additional information when it becomes available. So then the next question is “what is that additional information”? And it’s meant to be pair–counting results, and then the marginal probabilities. So I can hand–engineer all that. I’d done it two or three times before, so this is yet another implementation. At any rate, I’m the one designing where things will get stored, and when I asked Claude to do it, I got a hash, so I have to do it.
The second bootstrapping issue is that I have to explicitly specify the file–walk algo; depth–first or breadth–first or whatever. Again, this does not happen “naturally”, but is engineered into place.
So let me run with that idea. Can I define a set of axioms and inference rules, such that, when combined, a file–walking algo results? That is, what is the axiomatic system in which tree–walking algos are embedded? And, given this axiomatic system, what can I do to make it self–assemble, like the abelian sand–pile? i.e. to get to a critical point, where the behavior patterns at that critical point are that the avalanches are tree–walks?
To answer that, I dig back to L–systems, and the work that Prusinkiewicz did with them: some simple grammatical rules, and you crank them, and tree structures result. Now I take them and the trees that grow are forced to conform to whatever the actual file–system tree is.
So the new thing here for me is that with an L–system, it is incorrect to think of a single algo, walking a tree. Instead, one creates a new agent: one for each directory and file, and that agent communicates what it finds with its neighboring agents. Now, I can force this, project this back down to a single traditional comp–sci algo that is depth–first or breadth–first, but that is a projection (in the sense of a fibre bundle projection). If instead, I am L–system inspired, the fibre of my fibre–bundle is the L–system. Now, the agent at each directory and file is very nearly almost totally trivial: the only thing it can do is to communicate with neighbors. And maybe not even that: it can just statically denote connections (and I statically represent these in the AtomSpace). The one non–trivial agentic behavior is that when an agent determines “oh hey I am a directory” then the grammatical production rules say “grow a new agent, one for each inode in the directory”.
At this meta–level, at this L–system level, each new agent gets enqueued onto some run–time queue, so that the CPU goes over that queue, and allots some CPU time to each agent. Since agents that have already explored all of their inodes don’t really need any more CPU time, they become “dead” (or dormant) and can be dequeued, I guess. This search algo is naturally “parallelizable”, because L–systems are naturally parallel. And, as I write this, I realize that I could also do this for the query engine.
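A minimal sketch of the agent–per–inode idea, in Python rather than Atomese. Everything here is an assumption made for illustration: the name `grow`, the use of plain path strings as “agents”, and a Python list of (parent, child) pairs standing in for static links in the AtomSpace. The only production rule is “if I am a directory, grow one new agent per entry”.

```python
import os
from collections import deque


def grow(root):
    """Grow trivial agents over the file system under `root`.

    Each agent is just a path. Returns the (parent, child)
    connections that the agents recorded while growing.
    """
    edges = []                 # stand-in for static links in the AtomSpace
    run_queue = deque([root])  # agents waiting for a slice of CPU time
    while run_queue:
        agent = run_queue.popleft()
        # Production rule: a directory grows one new agent per entry.
        if os.path.isdir(agent) and not os.path.islink(agent):
            for entry in sorted(os.listdir(agent)):
                child = os.path.join(agent, entry)
                edges.append((agent, child))
                run_queue.append(child)
        # The agent has nothing left to do, so it is never re-queued:
        # it has gone dormant.
    return edges
```

Replacing `popleft()` with `pop()` changes the visiting order from breadth–first to depth–first while producing the same set of edges, which is one way to read the remark that a conventional tree–walk is a projection of the agent picture.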
Let me explore the query–engine idea for a moment. Every for–loop is parallelizable in this way: it enqueues a job to explore every item in the loop. Sounds sexy; the reality is data locality: the items being walked in the loop have to be fetched from RAM, or, if it’s running on a GPU, it has to be copied across some bus to that GPU. Copying is automatic, for CPU’s: data is automatically copied from RAM to 3rd, second then 1st level cache, then registers, and I don’t have to think about that data movement problem, because the good folks at Intel/AMD have already solved it for me. For GPU’s, it is much more hands–on, because the GPU sits on the far side of the PCIe bus, and I have to think about how to convert the body of my for–loop into a vector that I can efficiently pass over to the GPU side. So this is hard and tricky. Thinking about L–systems and agents is inspirational, but hits a wall when I say “oh gee, I want each agent to run on a GPU, but my agents are so trivial that they do almost nothing”, and that almost the only thing that they do is to vectorize and enqueue the next set of agents to be enqueued on the runtime queue. So this is ... abstractly interesting but profoundly mis–matched to what hardware can actually do.
Hmm. Anyway, it is beautiful outside, so I am going for a bike ride now. Later.
29 January 2026 Later
Well, there is a way of queuing agents in Atomese: just say (CollectionOf (Type 'ParallelLink) (FooLink ...)) where FooLink returns a list of agents. Bingo: L–systems in Atomese are about this trivial. So first, two obvious caveats, and a meta–issue.
The caveat is that the ParallelLink creates (Linux) OS threads (via C++ std::thread), so this is heavyweight, and only “makes sense” for complex, long–running agents. Now, of course one could try to optimize: perhaps by designing a GreenParallelLink that would use green threads. But this founders: when do you use which? If the agents are tiny, then running them in parallel is wild overkill; just run them serially. There’s also a different way of thinking about this: inventing the GPUParallelLink, which would dispatch the agents to GPU’s, one each. This then struggles, because each agent needs to access certain unknown data in the AtomSpace, and there’s no way for the GPU to “reach back” into the AtomSpace to get it. Perhaps with cleverness, this could be engineered around, but in the end, misses the point. And the point is ...
The point is that ParallelLink is a declarative element. At this time, it is tightly coupled to a specific C++ implementation, but abstractly speaking, it should not have been. The actual implementation, “under the covers” or in some “execution context” could be Linux pthreads or green threads or GPU threads, or maybe just serial. This “execution context” would be a property of ... of what?
Well, the Atom::execute() method has the signature Atom::execute(AtomSpace* scratch) with the pointer pointing at the scratch space inside of which the execution happens. So we could create a custom AtomSpace that replaces execution methods by other algos that run, e.g. on GPU’s, or green threads or whatever. So this is... well, at least partly solvable. Interesting. This is not impractical, either, it’s a relatively easy project, even, to implement this. Huh. Good.
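A rough Python analogy for decoupling the declaration “run these in parallel” from how it is actually run. This is not AtomSpace code: `ExecutionContext`, `run_serial` and `run_threads` are hypothetical names, and the context object merely plays the role that a custom scratch AtomSpace might play.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serial(jobs):
    """Backend: run each job one after another."""
    return [job() for job in jobs]


def run_threads(jobs, max_workers=4):
    """Backend: run jobs on OS threads, keeping results in job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]


class ExecutionContext:
    """Hypothetical stand-in for the scratch space that decides
    how a declarative 'parallel' element gets executed."""

    def __init__(self, backend):
        self.backend = backend

    def parallel(self, jobs):
        return self.backend(jobs)


# The same declaration, two different execution contexts.
jobs = [lambda n=n: n * n for n in range(5)]
serial_result = ExecutionContext(run_serial).parallel(jobs)
threaded_result = ExecutionContext(run_threads).parallel(jobs)
```

Both contexts return the same list; only the cost of getting it differs. For jobs this tiny the serial backend is the sensible one, which is the “when do you use which?” problem in miniature.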
Break for dinner. The meta issue is ...
There are two meta–issues. Or maybe the same meta–issue? Hmm. So one idea is that there exists a “natural transform” between doing loop contents serially, and doing it in parallel. “Natural transform” in some vague category–theoretic sense. For Atomese, this would be the transformation between (Trigger (FooLink ...)) and (Trigger (CollectionOf (Type 'ParallelLink) (FooLink ...))). Both do “the same thing”; the first one does it serially, the second in parallel. So if I have a graph–rewriting system, I can rewrite the first into the second, or the second into the first. So the meta issue is this: “What is the current set of rewrite rules that I am willing to apply to a processing pipeline?” and “Should these rewrite rules create pressure for more parallelism, or less?” And there’s even a regulatory component: “Apply rewrite rules to pressure the system to run on the GPU’s, unless the GPU’s are congested, or the PCIe bus is congested, in which case, run things locally on the CPU.” Then the complexity spirals out of control: “to know if the PCIe bus is congested, I need a sensor to measure traffic and congestion in real time”.
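The pair of rewrites can be sketched on toy expression trees. This is Python, not Atomese: expressions are nested tuples, and the function names (`parallelize`, `serialize`, `apply_pressure`) and the boolean congestion flag are assumptions made for illustration.

```python
PARALLEL = ("Type", "ParallelLink")


def is_parallel(body):
    """True if body has the form (CollectionOf (Type 'ParallelLink) X)."""
    return (isinstance(body, tuple) and len(body) == 3
            and body[0] == "CollectionOf" and body[1] == PARALLEL)


def is_trigger(expr):
    return isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "Trigger"


def parallelize(expr):
    """(Trigger X) -> (Trigger (CollectionOf (Type 'ParallelLink) X))"""
    if is_trigger(expr) and not is_parallel(expr[1]):
        return ("Trigger", ("CollectionOf", PARALLEL, expr[1]))
    return expr


def serialize(expr):
    """(Trigger (CollectionOf (Type 'ParallelLink) X)) -> (Trigger X)"""
    if is_trigger(expr) and is_parallel(expr[1]):
        return ("Trigger", expr[1][2])
    return expr


def apply_pressure(expr, congested):
    """Choose the rewrite direction from a (hypothetical) congestion reading."""
    return serialize(expr) if congested else parallelize(expr)
```

The two rules are inverses on the forms they match, so alternating pressure cannot make the expression grow without bound; that property would have to be checked rule by rule in a larger rule set.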
And then it is recursive: “if I have sensors that can measure different aspects of system performance, then I can write some optimization algo that uses those sensor inputs.” Right? But what is the algo? Maybe there are several algos: “Performance” and “PowerSave”. What are the component pieces–parts of such an algo, and what are all the different ways that I can assemble and re–assemble those pieces–parts? Well, the pieces–parts are jigsaws, and the sensory pipeline is “PCI congestion sensor -> X” and “if X then Y” and “Y -> execute locally” but in between X and Y I can insert other kinds of control blocks: “If X>250 and X<500 then Y” or if there is another sensor Z for memory pressure, then “if X>500 and Z<90 then Y”. So my toolkit is a box of Lego MindStorms and I can snap and assemble them any way I want, as long as some basic grammatical rules are obeyed. Writing down the grammatical rules is ... well, “easy” or “easy–ish” or “a well defined problem that we know how to solve”. But the meta–issue is “what sort of process control system should I build?” Should it be simple? Complicated? Where is the trade–off between simple and complicated?
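The snap–together control blocks can be sketched as small composable predicates. The thresholds are the ones quoted above; the sensor names “X” and “Z”, the action strings, and all function names are placeholders, and there is no real sensor behind any of it.

```python
def in_range(lo, hi):
    """Block: true when lo < reading < hi."""
    return lambda x: lo < x < hi


def above(threshold):
    return lambda x: x > threshold


def below(threshold):
    return lambda x: x < threshold


def rule(conditions, action):
    """Snap blocks together: every named sensor must satisfy its block."""
    def fire(readings):
        if all(test(readings[name]) for name, test in conditions.items()):
            return action
        return None
    return fire


def decide(rules, readings, default):
    """Return the action of the first rule that fires, else the default."""
    for r in rules:
        action = r(readings)
        if action is not None:
            return action
    return default


# "If X>250 and X<500 then Y" and "if X>500 and Z<90 then Y",
# where Y means "execute locally".
rules = [
    rule({"X": in_range(250, 500)}, "execute locally"),
    rule({"X": above(500), "Z": below(90)}, "execute locally"),
]
```

For example, `decide(rules, {"X": 600, "Z": 95}, "run on GPU")` falls through both rules and returns the default. The “grammar” here is only that a rule maps sensor names to one–argument predicates; anything that fits that shape snaps in.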
In the sand–pile model, I have gravity, which is the force causing avalanches, and finite slip, friction coefficients (the sand grains are not infinitely sticky) so the avalanches must necessarily occur. Even if the sand–grains are replaced by thistle, there is only so high you can go with a tower of sticky thistle before it falls over. How do I port over this conception to algorithm–building? When does an algorithm become “too complex”, and collapse? What drives that collapse? What is analogous to gravity?
Related is this: “what is the algorithm that finds (optimal) algorithms?” Of course, if I did not care about “optimal”, then I could build an odometer that simply listed all of them. But if I want them to be “optimal”, I have multiple issues: “what do I mean by optimal?” and “what are the different ways I can measure optimality (space–usage, run–time, traffic congestion...)?” and “how do I limit the search space to avoid search regions that are unlikely to yield optimal algorithms?”
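The “odometer” is easy to make concrete if an algorithm is taken, very crudely, to be a finite sequence of named building blocks. That simplification, and the names `odometer` and `best`, are assumptions for this sketch; the cost function is whatever one decides “optimal” means.

```python
from itertools import count, islice, product


def odometer(blocks):
    """Yield every finite sequence of blocks, shortest first."""
    for length in count(1):
        yield from product(blocks, repeat=length)


def best(candidates, cost, limit):
    """Brute-force 'optimal': lowest cost among the first `limit` candidates."""
    return min(islice(candidates, limit), key=cost)


first_six = list(islice(odometer(["A", "B"]), 6))
```

The enumeration is exhaustive but grows exponentially with sequence length, which is why limiting the search region matters far more than the enumeration itself.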
So there are several solutions to the algorithm–discovering algorithm. Aside from employing a bunch of students or engineers, one can also ask an LLM to craft algos for you. The primary issue here is that you are human–in–the–loop. The LLM’s can write OK javascript or C++ code if you ride herd and are precise in what you want. If you are not precise, you get crap. Asking the LLM to write Atomese is ... well, it can be done, but it’s quite low–level work; the LLM’s sort–of understand Atomese, but not all that well.
Let me go off on a tangent here, that I’ve wanted to tangent on for months. I am asking the LLM to write the new counting pipeline in Atomese. But why? Surely it would be more efficient to write it in python or c++ or scheme or ... ah, well, but there’s the answer; I think I wrote about this in the diary only a week ago. Right? The previous version of the counting code “worked great”. It used the “matrix” code for maintaining counts and computing marginals, and it was batch oriented and failed to be flexible in all the ways I want it to be flexible. In some imaginary universe, I could work with Claude to refactor that code, so that it becomes flexible in all the ways I want it to be flexible, but this parallel universe is imaginary: I know from hard experience that either Claude turns it into a complete hash of spaghetti code, or that the code base stays clean, but I put in lots of hard effort and engineering work to make sure it’s good.
And if I’m going to have to put in lots of hard work, lots of engineering, well, it becomes clear that I do not want to use c++ or python or scheme to write that code... I want to use Atomese. But why? “It becomes clear”, I say, but really? Does it? Why not write in some other language?
I don’t know if I can give any easy answer. I like the self–reflective and recursive properties of Atomese. That is, it’s a graph, and if I want to redo that graph, I apply a collection of rewrite rules. But if instead it was javascript, I could also ask an LLM to rewrite it for me. The LLM becomes the “graph rewriting system”. The LLM is of course sloppy, but is it sloppier than what my imagined system of rewrite rules could generate? Is there some intractable complexity limit? Well, naively “everything becomes spaghetti code”. But the standard counterpoint is that “good design practice, such as modular code and API’s and unit tests” etc. allow you to scale to arbitrary complexities. And it would seem that those standard engineering design rules would need to apply to Atomese as well.
What tools do I have? Well, judicious use of DefineLink and PipeLink and NameNode allows me to assign names to modular components. I do not have any framework for unit tests at this level. I do not have a good theory of refactoring at this level. I know I want to have rewrite rules, but what are they? How do they work in practice?
For example, the rewrite rule that converts between (Trigger (FooLink ...)) and (Trigger (CollectionOf (Type 'ParallelLink) (FooLink ...))) – where is this rewrite rule? Again, if I have a collection of them ... where? How? Is there a FilterLink that applies this rewrite to a stream? The order of rule applications is important. Do I have to rip some pages out of theorem–proving systems (Agda, HOL, etc.) to create a proper rule management system?
Argh. I go in circles. So then I say to myself “fuck it. Just write some code .. something or anything ... try to make it work.” And then my code scales out of control and I am back to theorizing again. Except now five years have slipped by.
The meta question is “what should I build?” and “How should I build it?”. The very first part involves this idea that “I want to perceive file systems”, but the proximal perception code, of pair–counting, does not yet exist. Worse, if it existed but was tuned for texts, it could not perceive file systems ... or perhaps more directly, it could not perceive the “AtomSpace itself”.
I sort of tripped over this last idea, working on the AtomSpace visualizer. I have a large dataset. I have kind of forgotten “what’s in there”, and I wanted to get a quick reminder of what it was. Well, there are two ways. One way is to build something into the visualizer, provide a whole bunch of buttons and visualizer do–hickies, and click on them, and get some general idea of what’s in there. The other way is to start a CogServerNode, and ask Claude to connect to the CogServer, and verbally inquire “what’s in here?” and “how is it structured?” and try to get OK answers via natural language query. Which is kind of simpler than building the visualizer, which is ... hard. Hmm. I have not tried to have the MCP–examination of this dataset. Maybe I should try that, and see what happens, see if I can get meaningful enlightenment that way ...
30 January 2026
Having long conversations on discord. Won’t record all of it, but a few comments that I really liked:
- Blah blah paradox/contradiction... that is how you build oscillators .. take a digital NOT gate and wire input to output, and it will oscillate as fast as the transistors can switch. So maybe the origin of time is due to unresolvable logical paradoxes. (That is what got typed into the chat channel, but in the back of my mind is “here and now”, and I’m like “oh, wait a second .. maybe that really is, quite literally, where time comes from, and why we are trapped in the present, and why the future is unknown – because the wiring diagram is not simple like a TTL oscillator, but complicated and fluid and self–assembling.” So, yes, ZFC or New Foundations or Girard’s paradox, but it’s cranking, oscillating at the speed of ... time.)
- Everyone struggles to communicate. Not everyone is aware that they struggle. This is, in a sense, the core AGI issue. "What should I say? Why should I say it? How should I say it? what do I want to express? What do I want the other person to understand?" The LLM’s are less than that. They communicate clearly and directly... and have not the faintest idea of what they are saying. What I want is a way of working with ideas. Then maybe I can use an LLM to convert those ideas into words. (So above was typed. What I’m thinking is that, yes, I want to encode abstract axiomatic knowledge in Atomese, and then use the LLM to verbalize it.)
- The AtomSpace is a reliable mapping from token–to–complex–structure, and also back in the other direction, complex–structure–to–token. (So here, I am thinking the “tokens” are PredicateNodes, or similar, so the token–to–structure mapping is “easy”(???) and the inverse mapping is the query. But the inverse mapping could also be cosine similarity, maybe. But the token–to–structure mapping might be the incoming set of that token. Hmm. Clearly, this needs work.)
- I can use an LLM to mine technical text (scientific papers, Wikipedia) for factual expressions, convert those expressions into Prolog or just EdgeLinks, and I can count those. These form vectors, whenever an atom appears in an expression. So I can take distributions
- When Jeff Bezos started amazon.com, the idea of "a store where you go to buy things" was a thousand years old. And the idea of selling on the net was the foundation of dot–com – everyone understood. What Bezos had figured out was monopoly power: the efficiency of centralized structure. It wasn’t going to be 1000 small dot–com store–fronts. All the decent people are rebels, saying "I don’t like centralized control" and that’s valid – but the core confusion is not that "centralization is bad", it’s that "assholes who run centralized systems are assholes". And so really, decent is all about "no one gets to be asshole in charge because we are going to p2p everything between ourselves." The problem with decent is that it promotes a different kind of deviant perverted behavior: e.g. bitcoin mining.
- Sure. But the Q in SQL stands for "query", and LLM’s ask me clarifying questions often enough. And I also know how to make very simple systems where queries emerge automatically. One of them is "here is a sentence. Find all questions for which this sentence is an answer" and then, "pick the top question, and find what other answers it has. Pick the top answer and print it to the screen." This is how the old SRAI chatbots work. And roughly speaking, it’s also how LLM’s work. The SRAI mapping was very literal. If you say "I like sports", the chatbot would generate "I like *" and "I * sports" and "* like sports" and then look over everything * could be, and respond, maybe "I love sports" or "I like hockey" where * is the wild–card for the query. That was literally the SRAI chatbot algo. LLM’s do this too, but the wild–card is not one or two words, but a "context window" of tens of thousands of tokens. Logic is the difference between wordcels and shape rotators. The wordcels, the chatbots, they can just work with context windows – maybe a few words in size for the old ones back in the day, to tens–of–thousands today (big enough to have "skills" in the context window) but that’s all they are – context–window–matching wordcels. To get to shape rotators, you need to have contexts that are not just word–to–word relations (weight matrices) but abstraction–to–abstraction relations. (Symbol to symbol relations? Where you can "easily" look up symbols?) So that’s what I work on, roughly. Mostly. The result might be a system that "understands things better", but then after that, my imagination runs out. Is it actually better, or just some stupid computer trick?
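The wild–card matching described in the last bullet can be sketched in a few lines of Python. This is only a toy under stated assumptions: the corpus is invented, matching is by exact word position, and the function names (`wildcard_patterns`, `matches`, `replies`) are made up for this illustration rather than taken from any actual chatbot code.

```python
def wildcard_patterns(sentence):
    """Replace each word in turn with the wild-card '*'."""
    words = sentence.lower().split()
    return [tuple(words[:i] + ["*"] + words[i + 1:]) for i in range(len(words))]


def matches(pattern, sentence):
    """True if the sentence agrees with the pattern everywhere except at '*'."""
    words = sentence.lower().split()
    return len(words) == len(pattern) and all(
        p == "*" or p == w for p, w in zip(pattern, words)
    )


def replies(said, corpus):
    """Corpus sentences that fill the wild-card of some pattern of `said`."""
    found = []
    for pattern in wildcard_patterns(said):
        for candidate in corpus:
            if candidate.lower() != said.lower() and matches(pattern, candidate):
                if candidate not in found:
                    found.append(candidate)
    return found


# Toy corpus, invented for this example.
corpus = ["I love sports", "I like hockey", "You like sports", "The sky is blue"]
print(wildcard_patterns("I like sports"))
print(replies("I like sports", corpus))
```

The "context window" here is a single missing word; the point of the bullet is that widening it to thousands of tokens changes the scale but not the kind of relation being matched.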
So riffing on the third and fourth bullets, I need to build an LLM bridge that converts text to relationship expressions. Do the simple stats, and see what happens...
Hmm. I can guess what happens, so this is kind of a “so what” exercise, because I want to scale. But also, I have to start somewhere.
So here’s another idea: build these vectors, these sparse symbolic vectors, then compress them (hash them down to 256 bits or 1024 bits) and use that embedding to ... what? Provide training data to some DL–NN network. That is, a sequence of “abstract concepts” in a text is still a sequence. So I can still train on that sequence. The bad news is that, in some sense, inserting this abstraction layer in the middle is likely to lead to worse results than using the word–tokens directly. That seems clear: the attempt to abstract out necessarily entails a loss of (local) fidelity. Nor does it obviously open new vistas.
So viewing a text as a time–series of ideas is incorrect. Instead, I want to return to the older conception of pair–wise relations between ideas, create the network there. It’s always inescapable, I always come back to this point. And then I struggle ’cause it’s so hard to scale. Is that fair? Unfair? I’m competing against a gazillion gigaflop beast running on the cloud. Fuck me. “Shut up and calculate”.
1 February 2026
Can I have a quiet day, and get some work done? I want to return to counting. To count words in texts, I need to build a filesystem crawler. It crawls trees. Well, actually I built one a year ago, and I used it to crawl my file system, and now I have 400K links that describe my file system: that is, I have a “model” of my file system, as it was a year ago. This model includes file metadata: name, type, size, the usual file attributes. So I don’t need (at this particular instant in time) to create another, better, different crawler. I already have this model.
So the question then becomes: how do I crawl this model? This turns into a question: how do I crawl any large complex structure that is already in the AtomSpace? The starting point could be a tally of all the different atom types.
This can be gotten “directly” from the AtomSpace, but more rigorously by using the query engine with a lone unrestricted Variable, followed by a TypeOf, followed by an IncrementValue. So I did this in atomspace–viz, and it works.
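As a rough picture of what that tally computes, here is a plain–Python stand–in. It does not use the AtomSpace API or the query engine: atoms are modelled as nested tuples (an assumption made only for this sketch), with a Node written as `(type, name)` and a Link as `(type, child, child, ...)`, and the file names are invented.

```python
from collections import Counter


def is_node(atom):
    """A Node is (type, name) with a string name; anything else is a Link."""
    return len(atom) == 2 and isinstance(atom[1], str)


def all_atoms(roots):
    """Yield every distinct atom reachable downward from the top-level atoms."""
    seen, stack = set(), list(roots)
    while stack:
        atom = stack.pop()
        if atom in seen:
            continue
        seen.add(atom)
        yield atom
        if not is_node(atom):
            stack.extend(atom[1:])


def tally_types(roots):
    """Count atoms by type: the analogue of Variable + TypeOf + increment."""
    return Counter(atom[0] for atom in all_atoms(roots))


# Toy file-system model; the names are invented for illustration.
contains = ("PredicateNode", "contains")
root, home, notes = ("ItemNode", "/"), ("ItemNode", "home"), ("ItemNode", "notes.txt")
tops = [
    ("EdgeLink", contains, ("ListLink", root, home)),
    ("EdgeLink", contains, ("ListLink", home, notes)),
]
print(tally_types(tops))
```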
After this, I run into trouble. Some maybe–interesting questions:
- What is the average size of the incoming set of an Atom?
- What is the average size of the incoming set of an Atom of type T? (visualized as a graph that is a function of T)
- What is the average size of the outgoing vector of a Link of type T? (visualized as a graph...)
- How many Links are there having outgoing vector size 1,2,3,... ?
- How many Atoms are there having an incoming set of size 0,1,2,3... ?
- What is the average height of the trees? (of type T, visualized as a graph?) i.e. starting with those Atoms (Links) that have an incoming set of size 0, what is the distance to the farthest Node under them?
- How many disconnected components are there? That is, unreachable by tracing through incoming, outgoing sets? How does this change if I don’t allow reachability through certain types? This is hard – the query engine does this under–the–covers, as it walks, and imposes a zillion constraints while walking, but it does not keep a tally of connected components. I don’t have a “match any tree” query. Hmm. Maybe with GlobNodes, stacked ... trees of specific size enumerated in ChoiceLink. Whatever. This is an interesting question, and exposes a missing API to the query engine.
- Pair–wise correlations. Pick any two locations in a tree (in the abstract) and show the MI between those locations (or rather, the MI distribution over all trees.)
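Several of the questions in this list (incoming–set sizes, heights, connected components) are easy to state on a toy graph, even if they are awkward to pose to the query engine. The sketch below again models atoms as nested tuples, which is an assumption of this example and not the AtomSpace representation; all helper names and the sample data are hypothetical.

```python
from collections import Counter


def is_node(atom):
    return len(atom) == 2 and isinstance(atom[1], str)


def closure(roots):
    """Set of all atoms reachable downward from the top-level atoms."""
    seen, stack = set(), list(roots)
    while stack:
        atom = stack.pop()
        if atom not in seen:
            seen.add(atom)
            if not is_node(atom):
                stack.extend(atom[1:])
    return seen


def incoming_sets(atoms):
    """Map each atom to the set of Links that directly contain it."""
    incoming = {atom: set() for atom in atoms}
    for atom in atoms:
        if not is_node(atom):
            for child in atom[1:]:
                incoming[child].add(atom)
    return incoming


def height(atom):
    """Distance from an atom down to the farthest Node under it."""
    if is_node(atom) or len(atom) == 1:
        return 0
    return 1 + max(height(child) for child in atom[1:])


def count_components(atoms, incoming):
    """Connected components, tracing through both incoming and outgoing sets."""
    unvisited, components = set(atoms), 0
    while unvisited:
        components += 1
        stack = [unvisited.pop()]
        while stack:
            atom = stack.pop()
            neighbours = set(incoming[atom])
            if not is_node(atom):
                neighbours.update(atom[1:])
            for n in neighbours & unvisited:
                unvisited.discard(n)
                stack.append(n)
    return components


# Toy data: two linked file entries plus one unrelated fact.
contains = ("PredicateNode", "contains")
root, home, notes = ("ItemNode", "/"), ("ItemNode", "home"), ("ItemNode", "notes.txt")
tops = [
    ("EdgeLink", contains, ("ListLink", root, home)),
    ("EdgeLink", contains, ("ListLink", home, notes)),
    ("EdgeLink", ("PredicateNode", "size"),
     ("ListLink", ("ItemNode", "orphan"), ("NumberNode", "42"))),
]
atoms = closure(tops)
incoming = incoming_sets(atoms)
incoming_histogram = Counter(len(s) for s in incoming.values())
root_heights = [height(a) for a in atoms if not incoming[a]]
print(incoming_histogram, root_heights, count_components(atoms, incoming))
```

Restricting reachability to certain types would amount to filtering `neighbours` by `atom[0]` before walking.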
I’ll stop here, and meta–critique. All these are interesting questions that may have curious answers, but they don’t offer much in “understanding” the dataset. Having these could be useful for categorizing different datasets with respect to one–another, but, for just one dataset, the value is low. That they have any value at all is only due to my own personal experience with trees and graphs, that has been gained over time. That is, these numbers are interesting to me, personally. The intrinsic value mostly arises when they are used to compare to other datasets. The only other value is if/when a system has some general abstract “knowledge” of “what is a tree”, and therefore can align specific information about collections of specific trees with respect to this general abstract knowledge–base of what trees are.
Well, so one issue is that if I place myself as a proxy for a “system that possesses abstract knowledge of how trees work”, the above statistical information still does not give me very much insight. I know that this tree data is a snapshot of my file system, but so what? Grep, find and locate are still my tools of choice, for the file system. I don’t have anything like grep, find and locate for the AtomSpace; I have something more powerful, but also more verbose. Which is OK, because the mantra is machine–accessibility, not human–accessibility. But now that I have made it machine–accessible, now what?
There are several side–tracks to go off on.
- How does the machine obtain a collection of inbuilt knowledge about trees? This is about axiomatic knowledge.
- What is the human interface to this knowledge set? I would like to use chat to interface to it.
- What happens when I apply my age–old day–dream of leveraging up the structural stack from pair–counting?
The first and third bullets are supposed to be interconnected: infer axioms by observation of structure in data sets. I now have to revise these ideas.
The original idea was that tokens arrive in a time–like stream, and I do correlational counting of pairs of tokens that appear near each other, in a window of some size in the stream (e.g. six words, or a sentence that ends with punctuation; whatever.) Then after pair counting, obtain LG–style disjuncts, and then perform clustering on those disjuncts.
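For reference, the windowed pair–counting step can be sketched as follows. This is a minimal illustration, assuming ordered (left, right) pairs, a window measured in tokens including the left word, and MI defined as log2 of p(l,r) / (p(l,*) p(*,r)); the function names are invented here and the disjunct and clustering stages are not shown.

```python
import math
from collections import Counter


def count_pairs(tokens, window=6):
    """Count ordered (left, right) pairs that co-occur inside the window."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            pairs[(left, right)] += 1
    return pairs


def pair_mi(pairs):
    """MI of each pair, using marginals taken from the pair counts themselves."""
    total = sum(pairs.values())
    left_marg, right_marg = Counter(), Counter()
    for (l, r), n in pairs.items():
        left_marg[l] += n
        right_marg[r] += n
    return {
        (l, r): math.log2(n * total / (left_marg[l] * right_marg[r]))
        for (l, r), n in pairs.items()
    }


tokens = "the cat sat on the mat and the dog sat on the rug".split()
mi = pair_mi(count_pairs(tokens, window=3))
for pair, value in sorted(mi.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(value, 3))
```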
Now, my tokens no longer arrive in a stream; they already have some innate structure with respect to one–another. My tokens have already been organized into a graph. Although this is a file–system graph, in the present case, it could have been a parse–graph: something that arises after the first step of counting –> disjuncts –> parsing has been performed. Curiously, I never had a good plan for that before. Or rather, I have a naive plan: look for pairs, again.
The problem here is I have lots of explicitly different pair–relationships. Thus, I store not only the pair–count, but the specific relationship between the pairs. Well, the old matrix code already did this: the matrix was always defined abstractly, in terms of a pair of locations specified by an underlying pattern.
One experiment would be to make a catalog of all tree–shapes in the dataset, then, for each tree, insert variables in every possible location, and do pair–wise counting on those.
3 February 2026
Why am I procrastinating? Because I know heavy lifting lies ahead. And here I am, evening, blowing it after wasting a nice morning taking care of mundane stuffs. Alas. Well, let’s give it a try, anyway.
What is knowledge? What do I mean by “knowledge”? What do I want my hypothetical AI to accomplish? How can it be accomplished? How could it all work? These are the questions to explore. I have some answers, but let’s work up to these.
So I was talking about a record of the files held on my collection of computers (real and virtual, lxc and docker containers, datasets, backup drives...) What is there to know about these files? Well, some are direct copies (backups, archives...) some are functionally similar (different versions of /bin/bash compiled for different OS versions...) some are same in format, but otherwise random (mp3 collection, photo collection, email collection, datasets from experiments, source code archives...) What is there to “understand” about this file collection? Not much, beyond what I describe here. Want to know more? Don’t ask me; ask an LLM front end that can “see” the file–system contents. What is there to “see”? Well not much more than the above. ...
So what might I ask the LLM about my file system?
- How many lxc containers do I have? Which ones have I not used recently? Can I delete some of them?
- I seem to have misplaced the dataset of Project Gutenberg books ... where are they? What about the SFIA dataset? Surely I have some copy somewhere that is NOT in an lxc container, right?
- Crap, when I moved to CephFS, I did not know that it has the terrible flaw of sometimes zeroing out file contents; do I have some copy somewhere of this file that is not all zeros? Can I find all the files for which I have a viable backup copy? Can I identify the files which have been permanently lost?
So how would one make the above work? Let’s review what does not work: the old idea of “pair–wise MI” –> disjuncts/jigsaws –> higher–level correlations mostly does not work, because there’s not much to correlate.
Most of the above questions can be solved by applying old–fashioned find with grep. Since Claude knows about find+grep, I could just let it loose on my file system. But I don’t want to (can’t) let it loose on everything, because some of the contents are offline (e.g. backup disks, USB sticks...) and others require remote login or firewall hole–punches. So this would require grep/find–like abilities for the AtomSpace. Which already exist via MeetLink, etc. and so this is about teaching Claude how to write custom Atomese queries.
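To make the third question concrete (which zeroed–out files still have an intact copy somewhere), here is a sketch over plain Python records rather than Atomese. Everything about the record layout is an assumption of this example: the `path`, `size` and `all_zero` fields are hypothetical, the paths are invented, and a "twin" is guessed to be any non–zero file with the same base name and size.

```python
import posixpath
from collections import defaultdict


def find_backups(records):
    """Split zeroed files into (recoverable, lost) by looking for intact twins."""
    intact = defaultdict(list)
    for rec in records:
        if not rec["all_zero"]:
            key = (posixpath.basename(rec["path"]), rec["size"])
            intact[key].append(rec["path"])

    recoverable, lost = {}, []
    for rec in records:
        if rec["all_zero"]:
            key = (posixpath.basename(rec["path"]), rec["size"])
            if key in intact:
                recoverable[rec["path"]] = intact[key]
            else:
                lost.append(rec["path"])
    return recoverable, lost


# Invented sample of the file-system model.
records = [
    {"path": "/ceph/music/song.mp3", "size": 4096, "all_zero": True},
    {"path": "/backup/music/song.mp3", "size": 4096, "all_zero": False},
    {"path": "/ceph/data/run7.csv", "size": 1024, "all_zero": True},
    {"path": "/ceph/data/notes.txt", "size": 10, "all_zero": False},
]
recoverable, lost = find_backups(records)
print(recoverable)
print(lost)
```

Matching on name and size is deliberately crude; a content hash of the intact copies would be needed to tell versions apart.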
Next, I have the problem of “current Claude” (for which there is a monthly subscription fee) vs. “what can I run privately on my own dinky GPU”, and then “future less dinky GPU that I can own that will be capable of today’s Claude” vs. “future subscription service”. The issue is that, although I can design and test only for today, it takes a long fucking time to design and test, and so therefore I should target future systems.
Let’s take a step back. The old–fashioned locate/plocate/slocate systems build an index of file names, and this pre–built index is what makes searching so easy to do. I could also build an index in the AtomSpace. How should I package it? Well, I could have a core AtomSpace, containing the file–system model, with no indexes, and then a layer on top of it (child AtomSpace) holding index data. Current infrastructure does not allow me to store this higher layer without storing the lower layer, so current StorageNode design would need some work to make this run.
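The layering idea can be pictured with two small Python classes: a base layer that only holds the model, and an index layer that refers to the base but keeps its own data. This is a loose analogy, not the AtomSpace or StorageNode API; the class names `BaseLayer` and `IndexLayer` and the sample paths are hypothetical.

```python
import posixpath
from collections import defaultdict


class BaseLayer:
    """Stand-in for the core file-system model: just a list of paths."""

    def __init__(self, paths):
        self.paths = list(paths)


class IndexLayer:
    """Index data kept apart from the base, in the manner of a child layer."""

    def __init__(self, base):
        self.base = base
        self.by_name = defaultdict(list)
        for path in base.paths:
            self.by_name[posixpath.basename(path)].append(path)

    def locate(self, name):
        """Exact-name lookup served from the index, without scanning the base."""
        return self.by_name.get(name, [])


base = BaseLayer(["/home/a/notes.txt", "/backup/a/notes.txt", "/home/a/todo.md"])
index = IndexLayer(base)
print(index.locate("notes.txt"))
```

The index can be thrown away and rebuilt without touching the base, which is the property wanted from storing the upper layer separately.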
If I think abstractly in terms of “science data sets”, then I can imagine all sorts of abstracts and summaries can be developed. I once read through the Vera Rubin telescope data product catalog, and it was quite remarkable, what kind of “products” one might have available, and the various time–scales at which they are available. What are these products? They are (abstractly) different kinds of algorithms (some very complex and sophisticated) applied to raw pixels, and then categorized into indexes. Which can then be filtered. All this is hand–built. I suppose someone has already attached an LLM query system to this product catalog ...
And so another step back: the meta–meta layer. Ordinary (but smart) grad students in astronomy are very busy fiddling with LLM’s and attaching them to astronomy datasets and discovering new astronomical phenomena. Other ordinary grad students are applying LLM’s as coding wizards to the various astrometric algorithms, to improve their operation. And so these advances are happening “full speed ahead” without the intervention of any kind of AI I might imagine myself to be designing, and so I turn around and ask “what is knowledge, anyway?”
So again, using my own file system as “a dataset”, what do I hope to “perceive” in it? Well, I can hand–design various custom MeetLinks that extract some of the data needed to answer the questions above. How?
Let’s tackle the first: where are all the lxc containers? Well, on one system, I happen to “know” that they are located at /home/lxc, but on another system, I put them somewhere else. How do I know they are LXC? Well, not just due to location, but tell–tale signs, like the names of config files, and the fact that some config files have a certain very peculiar form. That, plus the fact that there is a root directory, with /bin, /usr, /var and /tmp... this common FSSTND file hierarchy. How can I discover this hierarchy? (How can my algo discover this hierarchy?) Well, if I really do want to ratchet up “from nothing”, then perhaps the pair–count -> jigsaw approach can discover that there are similar fs hierarchies scattered about. The experiment that has not yet been done is to prove that this is indeed discoverable in this way. So I guess that is a big to–do item.
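The hand–designed version of this check (as opposed to discovering it from pair counts) is short: find every directory whose immediate children include the tell–tale names. The sketch below assumes the model can be flattened to a list of directory paths; the marker set comes from the paragraph above, while the container name `web` and the `rootfs` layout are hypothetical examples.

```python
import posixpath
from collections import defaultdict

MARKERS = {"bin", "usr", "var", "tmp"}  # tell-tale children named in the text


def find_root_like(dir_paths, markers=MARKERS):
    """Return directories whose immediate children include every marker name."""
    children = defaultdict(set)
    for path in dir_paths:
        parent, name = posixpath.split(path.rstrip("/"))
        if name:
            children[parent].add(name)
    return sorted(p for p, names in children.items() if markers <= names)


# Invented directory listing: the host root, one container, one decoy.
dirs = [
    "/bin", "/usr", "/var", "/tmp", "/home",
    "/home/lxc/web/rootfs/bin", "/home/lxc/web/rootfs/usr",
    "/home/lxc/web/rootfs/var", "/home/lxc/web/rootfs/tmp",
    "/srv/data/bin",
]
print(find_root_like(dirs))
```

This only recognizes a hierarchy that has already been described by hand; it says nothing about whether the pair–count approach would find the same pattern unaided.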
But there are two meta–issues. One is that I would like to attach this subsystem to an LLM, today (using current technology). The other is the “how do I discover algorithms?” question. So, a higher–order structure of data correlation is ... a “static” structure, in that it just “exists”; it is NOT an algorithm. This higher–order structure might be describable as a collection of jigsaws, with the connector semantics describing how these connectors are to be connected, but this requires a parsing algorithm, and where did that algorithm come from? Curry–Howard correspondence says “proofs are programs” but this is too distant from the more immediate situation of having a structure, and having an algorithm that can perceive that structure (never mind perceive it efficiently).
4 February 2026
So I’m still struggling to close a circular loop, here. I can do pair–counting, if I can get a crawler agent. I can get a crawler agent, if I hand–code it. But then I have to hand–code where it is going to go, and what it will do. So I have to think hard about algorithms. And when I think hard about algorithms, I say to myself “I know how to do pair–counting” and “pair–counting does not solve my problem of understanding file system contents”. I also don’t have a user interface: some way to talk to my system.
What options do we have on the table?
- Continue to agonize about the big plan. But this is best done in the background, quietly, not writing: I jumble things around in my head that are too hard to verbalize and formulate into a plan. Architectural issues.
- I can port a minimal atomspace to the GPU, so that the graph network sits there, in the GPU RAM. Maybe later I can have algos that access values on some atoms, i.e. a port of the matrix code to the GPU. This seems like a needed step at some point, but when?
- I can search for a small, non–cloud non–proprietary LLM that I can run locally. But why, when the cloud versions are usable, even if they cost? I can also generate my own electricity, grow my own food, have some chickens in the back yard ... which unplugs from the existing social infrastructure. So, vaguely appealing, and vaguely impractical.
- I can have Claude write mountains of ad hoc code for me, with the risk that it is all an unmaintainable pile of slop, collapsing under its own weight.
- I can design and modify a storage proxy that allows me to layer index and analysis on top of other AtomSpaces, pursuing a line of research resembling membrane computing. In that each AtomSpace is behind a membrane, and there are information flows between AtomSpaces. For example, an AtomSpace might be located in GPU RAM, but how does info flow to it? For example, an AtomSpace might contain filesys info, and a layer on top of it contains analytics and indexes. All this needs some heads down tinkering with that code.
- I can resume pair–counting with words and just try to leverage as far as possible, ignoring everything else.
- I can revive pair–counting of image or sound data.
- The pair–counting recursive step requires ... well, the path is clear and direct, but there’s just a lot of work in front of me. And, in some sense, the results are guaranteed to be underwhelming, because 3-4-5 recursive steps are required before it is possible to compete with present–day industrial LLM’s.
- But what would be the point of that? Suppose I can build some system that does structural analysis at a lower and more basic level than LLM’s ... so what? I still have some soul–less machine.
And thus back to agonizing about next steps. I’ve got competing urges, one of which is to just code as fast as possible, the other is to step back and take in the big picture, and develop a plan that “goes somewhere”.
At the crux of the plan are two issues. They are:
- My structuralist approach is still several steps away from a “recursive Curry–Howard system”, where I can (a) observe nature (b) extract structures from that nature (c) extract algorithms appropriate for (dual to?) those structures (d) deploy those algorithms. I know how to do step (b), but I don’t know how to do step (c). I can, for example, read the wikipedia article on Curry–Howard, fine. But if I have a link–grammar parse of some structural data (which could be words, photos, sounds) I don’t know how to convert a parse of, say, an English–language sentence, into an algo. Now, the parser itself was a program, and there is some Curry–Howard corresponding structure to my parser program, but that corresponding structure is large and complex. I don’t yet know how to “learn a parsing algorithm” by “observing data”. I don’t yet know how to learn any algo...
- Hmm. Well, “generative” is the conventional answer. So, generate a bunch of puny, weeny algos (using some external, a priori, engineer-given generator), and see which of these puny weeny algos generate patterns that correspond to the observed dataset. This goes back to the L–systems idea mentioned above.
- The problem here is similar to what MOSES faced: the evolutionary, selective pressures are weak, and one hits a wall of Kolmogorov complexity and combinatorial explosion quite early in this process. There’s no direct way to evolve the system automatically without taking a million years.
- So instead, we do engineering: we, as humans, invent algorithms. Before LLM’s, we relied on printed literature, research papers, peers and collaborators, and open source to provide the needed infrastructure to add one more pieces–part to whatever system we were envisioning. At any rate, we did engineering, in the conventional human sense.
- The arrival of the LLM’s allows engineering to be done more quickly.
- To what degree can I take “human out of the loop?” Because that is roughly what I am imagining, unspoken, unvoiced, in the previous ruminations about structural learning: building a machine that can perceive structure, and doing so without personal intervention. Have it run on automatic, as it were.
- If I ignore the fact that I’m talking about software, here, this is akin to what a CEO does: he builds a corporation that “runs on automatic”: a zillion employees “doing their job”, and the CEO with some light hand guiding the processes. The obvious problem here is that there are few CEO–less corporations that can truly run head–less and not crash and burn.
- The other problem with the above is that many such supra–human systems are toxic. As argued before, WWI and the tobacco industry were both toxic to humanity. WWI explicitly so: a young–man killing machine that only stopped when it ran out of young men. Nicotine is more interesting, since it directly mimics a neurotransmitter (acetylcholine), so the tobacco industry is a supra–human feedback loop that hooks into a very low cellular level in the brain. The toxicity of tobacco smoke is an unfortunate side–effect. The coffee industry is more neutral, but does not get the shock value of tobacco that makes people sit up and pay attention. And pointing to the agricultural industry, or the cell–phone industry is too abstract; people stop listening because they think they know what these are, without much considering or grasping the feedback effects of the supra–human and the micro–biological ties.
- Anyway, we live entangled in all of these networks, nominally called “economics”, but also “culture” and “society” (and government ...) and so tying on yet another feedback loop into the system is not extraordinary per se. What is different is that Moore’s law changes everything. All the cognoscenti talk about this already, endlessly and breathlessly in every forum imaginable, so why am I repeating it here? Hmmm.
- Because when I pull on the thread of “how do I build an autonomous system?” that is where the thread takes me. “what is autonomy, really?” if we’re all plugged into the same system.
- It also hints at the scaling problem. When I attempt to create systems out of pure Atomese, it is as if I am trying to create systems (metaphorically) using electrons and protons as my building blocks, when common sense says that I should not build with electrons and protons, but merely ask society to provide me with those in bulk, in the form of concrete and rebar. My retort is ... I’m not a structural engineer, designing housing, I’m a research scientist, so I work with ... protons and electrons, directly.
- And yet I feel obligated to try to leverage LLM technology somehow, because I am, after all, attempting to build a system that “sees the world”, and in so seeing, interacts with it.
- I’m just deeply confused, because my natural predilection is to build little ant–like robotic systems that do cute things, and yet I nominally claim to work on AGI, and therefore should be working on creating something large and complex and all–encompassing and all–powerful.
- And then I go around: building the all–powerful is well, dangerous. Not unlike the atom bomb is dangerous. So wtf...
- So this says that attempts to build the “all powerful” are foolish and misdirected.
- So I can instead tinker with little itty–bitty pieces, doing the scientific equivalent of stamp–collecting or bug–collecting or any one of the other non–sexy minimally–valued scientific pursuits: stuff that will never get you on the cover of some premier science journal, never mind main–stream media that pays attention only when Nobel prizes are being awarded. That is, it is OK to toil in obscurity on unimportant things.
- So is that my plan? To toil in a semi–focused, haphazard fashion, slowly knocking off each of the small little baby–steps that I need to get done, to realize the next stage of my plan to build some cute little mechanical device that does something vaguely cool–ish? Again, nothing wrong with that, I guess...
- I’m just playing the prairie dog here, popping up my head to look around, and see what else is up.
Ohh. Shit I almost forgot: I thought of this three, four months ago, and still have not really written or designed anything about this. This is to give Claude the ability to remember things (i.e. in the AtomSpace) and to recall things “as needed” (priority and attention queue) and act on them (incorporate them into current thinking) and then file results back into the AtomSpace (long–term memory.) This single sentence sketch evokes a rather traditional AGI engineering plan: short–term and long–term memory, the bridge between the two, the need to provide an attention mechanism that focuses the thinking of the LLM in a certain direction.
This is a kind–of fun thing to do, and I kind of got started doing exactly this, 3–4 months ago, with Claude. It got put on hold. I did not hit any brick wall, but it was definitely an uphill climb. Basically, I was trying to create a long–term memory as a collection of prompts, and an indexing system of prompts–that–are–indexes of other prompts, and then, because text does not scale, to somehow do this in the AtomSpace, and then somehow use this as a tap into the attention system. And this is kind–of–ish where I hit the wall, sort of, thinking that I could tap into the attention vectors much more directly, if I could have the LLM running locally on my machine, and twinking the weights directly. Roughly speaking, this is the IIT issue: I need close ties into the system; the connection via prompts is too distant, too remote, and I wanted to have a much closer and tighter integration between weight–vectors in the NN and the specific symbols held in the AtomSpace.
So this minimal sketch suggests that:
- I should in fact look for some minimalist LLM that I can run fully locally on my local GPU, so that I can directly tap into its weight vectors, and
- I should port some minimalist AtomSpace that will sit in the GPU RAM, so that it can be used to directly poke weights.
Which seems like a good plan.
The agony queen here is that I still don’t have a good observer of my file system, which I really do want, and my gut sense is I really need to push forward on pair counting anyway.
And I can twink and twerk with all of these all at the same time, except I have an external constraint that I have to build something along a path that appears to be immediately and intuitively obvious and useful to my current employer, and prospective future employers. Or even to my fan club... And this last constraint is a source of stress. Urgh.
Well, it seems all I’ve got so far is to continue to twink and twerk whatever sets me afire that day, and if it’s not the right thing to do, well, it’s a tragedy, but life (or rather, death?) is coming at me like a bullet, and I’m trying to dodge and weave as best I can, and it seems like the future, as it comes, will come in an inevitable way. OK, so that’s a plan, I guess. Enough. Later, dude.
4 February 2026 Ongoing planning
Now for some LLM slop. I ask: “Are there any (open source) LLM’s that can run on a small-to-midrange commodity GPU board that also have MCP support in them?” The answers are:
- Ollama with tool calling models like Qwen3 ... but also Llama 3.1/3.2 8B, Mistral 7B, DeepSeek R1 8B ...
- Use ollmcp or ollama–mcp–bridge or AnythingLLM or LibreChat or Chainlit to provide explicit MCP interfaces.
So there’s a panoply of options and pieces–parts. Should I mess with this today?
8 February 2026
In a discord chat, I wrote this:
Well, if you as the scientist, expect a dataset with all pairs in it and instead you get one with low–MI pairs discarded .. that comes as a surprise.
but also .. MI depends on everything so if you discard the low–MI stuff, the MI of everything else changes.
a science question might be: "what do I get when I sample randomly?" and when you get bored of that, then the next question is "what do I get when I cut away parts p,q,r of the dataset?" and ... well, understanding the answer to the second question is hard, if you (I) don’t know what p,q,r were cut.
nordizvaus
—
4:12 PM
Ok separate question , so you recommend not cutting out mi < 1.0 pairs ?
Linas
—
4:17 PM
if you sample randomly and keep all pairs, then you get a nice bell curve. Bell curves look like parabolas on a semi-log plot.
If you cut off the left half of a bell curve, and recompute the MI, you’ll get some lop–sided curve, but I have no intuition for that shape, or how to think about it.
Turns out that bell curves correspond to points uniformly distributed over a very high–dimensional sphere. Now, pick a point x on that sphere, call it "the north pole", and throw away everything on the southern hemisphere of that point. Do that from the perspective of each point. ...
It’s like an event horizon. Throw away all points on the other side of the event horizon from "me", where "me" is some given word, and the event horizon is located at 1.23 < MI("me", "other word")
So I wrote the above, and as I wrote it, my head started exploding ... what if this is not just a weird analogy, but instead there is something literally true about it? Things on the other side of an event horizon are literally unreachable to us. And we know that QM wave functions live in a high–dimensional (“infinite–dimensional Hilbert”) space, so is there something where there’s some cut–off, where the high–MI stuff is our past lightcone and the low–MI stuff is spatially separated? What if this analogy is literally true in some way?
So that’s one. And then a far more grounded question: what does happen to a space, when you discard all low–MI pairs? I have no intuition for that. It also needs thinking and re–thinking. (and scratching out some formulas. maybe later.)
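The grounded half of this is easy to see on a toy table: compute MI, throw away the pairs below a cutoff, recompute, and the MI of the survivors moves because the marginals moved. The sketch below assumes MI is log2 of p(l,r) / (p(l,*) p(*,r)) with marginals taken from whatever pairs remain; the counts, the cutoff and the function names are invented for the example.

```python
import math
from collections import Counter


def pair_mi(pairs):
    """MI of each pair, with marginals computed from the given table only."""
    total = sum(pairs.values())
    left_marg, right_marg = Counter(), Counter()
    for (l, r), n in pairs.items():
        left_marg[l] += n
        right_marg[r] += n
    return {
        (l, r): math.log2(n * total / (left_marg[l] * right_marg[r]))
        for (l, r), n in pairs.items()
    }


def prune(pairs, cutoff):
    """Drop every pair whose MI on the full table is below the cutoff."""
    mi = pair_mi(pairs)
    return Counter({pair: n for pair, n in pairs.items() if mi[pair] >= cutoff})


# Invented counts: two strong pairs, two weak ones.
counts = Counter({("a", "x"): 8, ("a", "y"): 2, ("b", "x"): 2, ("b", "y"): 8})
before = pair_mi(counts)
after = pair_mi(prune(counts, cutoff=0.0))
print("before:", {k: round(v, 3) for k, v in before.items()})
print("after: ", {k: round(v, 3) for k, v in after.items()})
```

In this table the pair ("a", "x") goes from log2(1.6) to exactly 1 bit after the cut, even though its own count never changed.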
8 Feb 2026 Later
It’s late, I’m tired, this will probably go badly. But I will agonize if I don’t try. Claude math session. I ask Claude questions and try to stitch together a coherent story.
Cosine Distribution for Random Vectors in High Dimensions
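Suppose I have a sphere in $N$ dimensions, where $N$ is large, say 1K or 10K. Suppose I sprinkle $M$ random points in that sphere, for some $M$. What formula gives the distribution of cosines between pairs of vectors?
(Below is my imaginative rewrite of what Claude told me.)
The number of points doesn’t matter for the pairwise distribution. Each pair of points is independently sampled. For two independent vectors $u$, $v$ drawn uniformly from the $(N-1)$–dimensional sphere $S^{N-1}\subset\mathbb{R}^N$ the cosine is $t=\cos\theta=u\cdot v$ and has density $f(t)\propto(1-t^2)^{(N-3)/2}$ and I have to coax from Claude why it’s that formula.
Picking random vectors in $\mathbb{R}^N$ normalized to unit length gives the sphere $S^{N-1}$. The “ring” at polar angle $\theta$ is $S^{N-2}$ and has radius $\sin\theta$ so its area scales as $\sin^{N-2}\theta$. So the distribution of sines is $p(\theta)\,d\theta\propto\sin^{N-2}\theta\,d\theta$ which picks up a Jacobian $dt=-\sin\theta\,d\theta$ so $|d\theta|=dt/\sin\theta$ so the distribution of sines is $\sin^{N-3}\theta\,dt$ and so for cosines, the distribution is $f(t)\,dt\propto(1-t^2)^{(N-3)/2}\,dt$. The normalization is given by the beta function $$\int_{-1}^{1}(1-t^2)^{(N-3)/2}\,dt=B\!\left(\tfrac12,\tfrac{N-1}{2}\right)=\frac{\sqrt{\pi}\,\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{N}{2}\right)}$$ and so putting it together, $$f(t)=\frac{\Gamma\!\left(\frac{N}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{N-1}{2}\right)}\,(1-t^2)^{(N-3)/2},\qquad -1\le t\le 1.$$ Equivalently, $(1+t)/2\sim\mathrm{Beta}\!\left(\frac{N-1}{2},\frac{N-1}{2}\right)$.
For large $N$ this is Gaussian, so $f(t)\approx\sqrt{N/2\pi}\;e^{-Nt^2/2}$ or $t\sim\mathcal{N}(0,1/N)$. In other words, in high dimensions, random vectors are almost orthogonal to one–another.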
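A quick numerical sanity check of the claim, using only the Python standard library. The assumptions here: uniform points on the sphere are generated by normalizing Gaussian vectors, the exact density is taken to be Γ(n/2) / (√π Γ((n−1)/2)) · (1 − t²)^((n−3)/2), and the dimension 200 and sample size 2000 are arbitrary choices to keep the run short.

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniform point on the unit sphere: normalize an n-vector of Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_cosines(n, pairs, seed=0):
    """Cosines between independently drawn pairs of random unit vectors."""
    rng = random.Random(seed)
    out = []
    for _ in range(pairs):
        a, b = random_unit_vector(n, rng), random_unit_vector(n, rng)
        out.append(sum(x * y for x, y in zip(a, b)))
    return out


def exact_density(t, n):
    """Density of the cosine for |t| < 1, evaluated via log-gamma for stability."""
    log_norm = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_norm + 0.5 * (n - 3) * math.log1p(-t * t))


n = 200
cosines = sample_cosines(n, pairs=2000)
variance = sum(c * c for c in cosines) / len(cosines)
gaussian_peak = math.sqrt(n / (2 * math.pi))
print(f"sample variance {variance:.5f} vs 1/n = {1 / n:.5f}")
print(f"density at 0: exact {exact_density(0.0, n):.3f}, Gaussian {gaussian_peak:.3f}")
```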
Well, OK. This is the correct answer to my question, but I guess I asked the wrong question... Let me think again...
Or not. It’s late. I’m tired. Good night.
11 February 2026
Well, I was going to write about something entirely different, but then I re–read the above, and decided that I have that unfinished business. So I ask Claude a new question:
Thresholding Random Vectors in High Dimensions
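Suppose one has a collection of points, randomly distributed on a very high–dimensional sphere. These points can be understood as vectors, with vector components $x=(x_1,\dots,x_N)$ and of course $\sum_i x_i^2=1$. I want to pass these coordinates through a filter, to create a new vector $y$ such that $$y_i=\begin{cases}x_i & \text{if } x_i>c\\ 0 & \text{otherwise}\end{cases}$$ I would like to understand the distribution of these new vectors.
The answer? “This is a nice problem with clean results in the high–dimensional limit.” But of course. I paraphrase Claude’s reply below, shaped to my liking.
In high dimensions, each coord is approximately Gaussian, so $x_i\sim\mathcal{N}(0,1/N)$. The typical scale of each coord is $1/\sqrt{N}$ and the natural parameter for the cutoff is the scaled variant: $\tau=c\sqrt{N}$.
I’m not really interested in the energy $\mathbb{E}\,\|y\|^2$ but Claude told me and it’s interesting enough to repeat (and anyway, we’ll need it as normalization, for later): $$\mathbb{E}\,\|y\|^2=\tau\,\varphi(\tau)+Q(\tau)$$ where $\varphi(\tau)=e^{-\tau^2/2}/\sqrt{2\pi}$ and $Q(\tau)=\int_\tau^\infty\varphi(z)\,dz$ so for $\tau=0$ we get an average energy of 1/2.
The expected value of each component is $\mathbb{E}[y_i]=\varphi(\tau)/\sqrt{N}$ and so the expected length is $\|\mathbb{E}[y]\|=\varphi(\tau)$ so the mean vector points in the all–ones direction.
The mean dot product between two truncated random vectors $y$ and $y'$ is $$\mathbb{E}[y\cdot y']=N\left(\frac{\varphi(\tau)}{\sqrt{N}}\right)^2=\varphi(\tau)^2.$$ The expected cosine similarity is $$\mathbb{E}[\cos]\approx\frac{\varphi(\tau)^2}{\tau\,\varphi(\tau)+Q(\tau)}$$ so for $\tau=0$ one gets $1/\pi\approx 0.318$.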
More interesting is this, which Claude gives without proof: if two non–truncated vectors
,
have an inner product
then, for the corresponding truncated vectors
,
one has, to leading order,
with
some bivariate truncated Gaussian integral. Claude gives this the name of “signal preservation” and the second term “carries the actual geometric signal”. Well, OK, Claude clearly learned this from some texts, but is this supposed to be metaphorical, or does it have a more formal validity? That is, is there a claim that there is some “information content” in the two non–truncated vectors
,
and that truncation reduces the information content? Clearly, truncation discards something, but what is that something being discarded?
The most interesting thing to pop up is this side–comment along the way: “The thresholded, mean–centered vectors then live approximately on a lower–dimensional manifold of dimension
” Wow! Really? So truncation is dimensional reduction!?
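The energy and cosine numbers can be checked by brute force. This is a minimal sketch, assuming the filter keeps a coordinate only when it exceeds the cutoff (one–sided) and zeroes it otherwise, and assuming the Gaussian–approximation closed forms $t\,\phi(t) + Q(t)$ for the energy and $\phi(t)^2 / (t\,\phi(t) + Q(t))$ for the cosine. The sizes `N = 400`, `pairs = 300` and all function names are illustrative choices.

```python
import math
import random

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Q(t):
    """Upper tail of the standard normal: P(Z > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def truncated_unit_vector(n, tau, rng):
    """Random unit vector with every coordinate <= tau zeroed out."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm if x / norm > tau else 0.0 for x in v]

def simulate(n, t, pairs, seed=0):
    """Return (mean energy, mean cosine) over independent truncated pairs."""
    rng = random.Random(seed)
    tau = t / math.sqrt(n)  # undo the scaling t = tau * sqrt(n)
    energy_sum = 0.0
    cosine_sum = 0.0
    for _ in range(pairs):
        y1 = truncated_unit_vector(n, tau, rng)
        y2 = truncated_unit_vector(n, tau, rng)
        e1 = sum(a * a for a in y1)
        e2 = sum(a * a for a in y2)
        dot = sum(a * b for a, b in zip(y1, y2))
        energy_sum += 0.5 * (e1 + e2)
        cosine_sum += dot / math.sqrt(e1 * e2)
    return energy_sum / pairs, cosine_sum / pairs

N = 400  # illustrative dimension (an assumption of this sketch)
for t in (0.0, 1.0):
    energy, cosine = simulate(N, t, pairs=300)
    pred_energy = t * phi(t) + Q(t)
    pred_cosine = phi(t) ** 2 / pred_energy
    print(f"t={t}: energy {energy:.3f} vs {pred_energy:.3f}, "
          f"cosine {cosine:.3f} vs {pred_cosine:.3f}")
```

The simulated and predicted columns should agree to a couple of decimal places; the residual difference is sampling noise plus the order $1/N$ error of treating sphere coordinates as exactly Gaussian.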
Maybe because I got a bump on my head, but I’m annoyed I don’t know the above by heart. I got hospitalized, day before yesterday, for a concussion. I’m feeling woozy and carefree just right now. I refuse to think hard, and instead go with the flow here. So let’s woozily (as I write this, I realize that it is no accident that this rhymes with boozily...) The flow says “what dimensional reduction?” Let’s find out...
While I wait for Claude’s answer, I notice that it is bandying about the word “sparsity”. Now, almost a decade ago, in my “sheaves” paper, I made the claim that one could (approximately) factor a sparse high–dimensional matrix $M$ into $M \approx LCR$ with $C$ being a dimensionally reduced dense matrix, and $L$, $R$ being left and right factors. This was in the context of Link Grammar, where $L$ was the sets of words, organized into grammatical classes: viz lots and lots of English words, but only a few hundred(?) (almost a thousand?) grammatical classes, $C$ was the collection of disjuncts, and the $R$ a bit harder to perceive, but was the recurring patterns in the disjuncts. Strictly speaking, the Link Grammar dictionary was just $LC$ and some work would have been needed to factorize this. At any rate, it was clear that writing a natural language parser was an exercise in dimensional reduction. For English, this is clearly a syntactic exercise; for Lithuanian, it’s clearly morpho–syntactic. The word–order is relatively free; the factoring is more explicitly in the lexis of suffixes. Hmm. I have to think about that.
The point is that sparsity does imply dimensional reduction, but perhaps the mistake being made here is that it is linear. And the fact that there’s this huge statistical component to everything makes it confusing. What now pops into my mind is a botanical bush: the leaves are 2D, they are sparse, but they are arranged into an approximate space–filling arrangement, occupying a 3D volume. The point of that volume is that air is transparent to sunlight, and so the bush is optimizing for sunlight collection, while maintaining short branches, and encoding biological growth rules in some simple DNA encoding. That’s the factorization. How much of this analogy can I push into more abstract settings?
OK, Claude’s answer has arrived. “My original answer was imprecise and deserves pushback. What I should have said is ...”
Well, duhh. OK, if I have an $N$–dimensional vector, and zero out some of the components, then, on average, there are exactly $N\,Q(t)$ non–zero components left (with $Q(t)$ the Gaussian tail probability that a coordinate clears the scaled cutoff $t$), and this is the dimensional reduction.
Next, it points out that the truncated space is not a manifold. So, this is more interesting. For every point $x$ define the “support set” $S(x)$ as the set of indexes (or basis vectors) that pass the thresholding:
$$S(x) = \{\, i : x_i > \tau \,\}$$
The cardinality is variable: $|S(x)| = k$ for some $0 \le k \le N$. The support set defines an “orthant” (quadrant ... octant ... orthant) and the space splits into a union of orthants:
$$\bigcup_{S} O_S$$
with $O_S = \{\, y : y_i > \tau \text{ for } i \in S,\ y_i = 0 \text{ for } i \notin S \,\}$.
There are $N$ choose $k$ orthants with $k$ surviving coordinates, and as $k$ varies, we have
$$\sum_{k=0}^{N} \binom{N}{k} = 2^N$$
Hmm. I can’t find anything interesting to say here.
This rapidly turns into binomial distribution games. If I take heads and tails to stand for keep/cut, then for a random draw, and a cutoff of $t$, I will have $N\,(1 - Q(t))$ tails in my string, on average.
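The keep/cut coin–flipping picture is easy to try out. Below is a minimal sketch, assuming a one–sided threshold and the Gaussian tail $Q(t)$ as the probability of “heads” (keep); `N = 1000`, `t = 1.0`, the seed and the helper names are illustrative. It also confirms, for a small case, that the orthant counts over all support sizes add up to $2^n$.

```python
import math
import random

def Q(t):
    """Upper tail of the standard normal: P(Z > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def random_unit_vector(n, rng):
    """Uniform point on the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def support_set(x, tau):
    """Indexes of the coordinates that pass the threshold."""
    return {i for i, xi in enumerate(x) if xi > tau}

N, t = 1000, 1.0           # illustrative sizes (assumptions)
tau = t / math.sqrt(N)     # unscaled cutoff
rng = random.Random(1)
sizes = [len(support_set(random_unit_vector(N, rng), tau)) for _ in range(200)]
mean_size = sum(sizes) / len(sizes)
print(f"mean kept ('heads'): {mean_size:.1f}   prediction N*Q(t): {N * Q(t):.1f}")
print(f"mean cut  ('tails'): {N - mean_size:.1f}   prediction N*(1-Q(t)): {N * (1 - Q(t)):.1f}")

# Orthants grouped by support size k: binomial coefficients sum to 2^n.
n_small = 10
orthant_count = sum(math.comb(n_small, k) for k in range(n_small + 1))
print(f"orthants for n={n_small}: {orthant_count} (2^n = {2 ** n_small})")
```

Strictly, the coordinates of a point on the sphere are not independent, so the support size is only approximately binomial; in high dimensions the difference is small.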
I poked on Claude and it spilled the beans on the Donoho–Candès–Tao results. Interesting story, that. The papers, I am told, are Donoho "Compressed Sensing" (2006) and Candès and Tao "Decoding by Linear Programming" (2005) and later Candès, Romberg, and Tao giving "Robust Uncertainty Principles".
I then asked a different set of questions, and inadvertently opened the door to signal processing in general. A lot of interesting and suggestive pieces–parts but nothing to write home about.
12 February 2026 Late
So I keep having these inspirational ideas that I keep planning on getting back to, but then I get distracted and never do. Yesterday, instead of doing the above, I was going to do a riff on “retrieval–augmented generation”, which is about indexing blobs of text with vectors, then doing cosine similarity on the vectors to find the most–related block of text. Perhaps the reason I got distracted is because I knew, subconsciously, that it’s hard to do anything with this idea.
“What should I do next?” Turn this into a vector, run cosine similarity against a DB of texts, and come up with the top dozen similar texts, which will all paraphrase “what you should do is...” and ... then what? “How do I assign priority to this list of things to do?” and run cosine similarity to a bunch of texts from ... what? business school curricula for busy CEO’s? Comp sci texts on priorities? Life coach advice for the love–lorn?
Cosine similarity is a solution looking for a problem.
I got distracted by the above, while examining the API to Ollama. Why am I looking at Ollama? Because I want to have a chat API to the AtomSpace. I already sort of have one with Claude, but Claude is proprietary, and is as stupid as a rock. So if I have to work with something dumb, then maybe it can be something I have more control over. More nuts–n–bolts–y. Like the hobbyist attaching a jet engine to my automobile, where Claude is the jet engine ... I need something more integratable. However, I don’t yet know how ... so this is a TBD item.
Earlier today, I posted something short and sweet on Discord that I liked. Let’s revisit:
“The difference between LLM’s and the AtomSpace is not efficiency but form. With the ’learn’ approach, the attempt is to show the explicit connections between words, (and between words and "things in the world outside") so that explicit rational thought ("reasoning") can be applied. They both have vectors with cosine distances between them. The LLM vectors are opaque numbers. The AtomSpace vectors are explicitly words.”
So, much as I like that, I also have some ill–formed critiques. One goes like this: what is the minimal number of parts needed to build a steam engine? Clearly, there is a lower bound, and below this bound, one cannot build a steam engine. My concern is that by consistently returning to just symbolic representations, I’m working below this minimal threshold. In the end, it’s not “just all symbols”. And clearly, LLM’s have scaled a representational complexity past an important threshold.
There are three questions.
- A very popular one is “how can I staple symbolic logic onto an LLM?”
- An unrelated question is “what does it take to reproduce LLM–like function using pure AtomSpace symbolic conceptions?”
- The third one is “how can I use an LLM as a user–interface to my software system X”, which only gets confusing when X is the AtomSpace. (or some symbolic system in general).
So...
13 Feb 2026
I posted this on a youtube video of a fake AI–generated Leonard Susskind saying true but boring things about spacetime.
That’s the problem with AI–generated text: it’s an intellectual monotone, repetitive, hitting the same note over and over. Not moving forward. Basically, AI has no clue of what it is talking about, and therefore, doesn’t know where the surprises are, the secrets, the gems that the human listeners are looking for.
14 Feb 2026
Hi Adam,
Do you mind a long email? Let me write one, because it helps my thoughts. Plus, it’s nice to have an audience, instead of mumbling to myself :-)
"Parallelization" is something one does to algorithms, not to data structures. The AtomSpace is (mostly) a data structure, and is intended to be an "abstract" one, without any "preferred" implementation (although it clearly has one de facto implementation, and two, (or more?) if you squint at what singularitynet has done.) I assume you are writing because you’ve spotted recent traffic. So let me provide a very convoluted update.
Half a year ago, I messed around a bit with GPU’s. No particular plans, just screwing around. The original idea went like this: "Oh hey, Atomese is a language for expressing Abstract Syntax Trees (AST), and my datasets consist of sparse vectors, and I have half–a–dozen different vector formats in Atomese, and I have existing arithmetic Atoms for Plus, Times, GreaterThan, maybe I can slap this on the GPU."
Being who I am, the "obvious" approach of just hacking it and forcing it to work repelled me: it would be a pointless time–consuming hack. Now, *if* it was urgently needed in order to meet some other *truly important* goal, then by all means hackery is the right thing to do. But that wasn’t this. Instead, I am at the confluence of 5 or 6 different ideas, trying to push all of them at the same time.
They are:
- How do I build a "world model" of the (unbounded) "world out there", and represent it in the finite space "in here"? What is the nature of the cell wall that divides inside from outside? How does one perceive the outside?
- It is also necessary to "perceive" the inside, and "manipulate" the internal representation. So this idea of perception and action is a bit recursive. However, with a big distinction: the data structures "in here" can be acted on by algorithms.
- Some of the algos access "internal data" directly; e.g. explicit c++ code will manipulate, mutate, control c++ structures "in here" (inside). Other algos are "indirect": for example, if I write an algo in Atomese, it does not compile into x86 assembly or java byte code; it controls AtomSpace contents indirectly. It applies transformations that alter AtomSpace contents.
- The GPU challenge is an interesting middle ground: the GPU’s are not some external world that can be acted on, but they are also not "in here". So then, in what sense can a GPU be "perceived", and how can it be "manipulated and controlled"? That is, how can an algo, an Atomese algo, perceive what a GPU is, and how can an Atomese algo manipulate a GPU to "do something useful"? How can GPU’s be described, in Atomese, and how can an Atomese algo leap from running on the CPU to the GPU? And back, depending on bus congestion? Depending on performance metrics?
- And again, this gets very recursive: How can performance metrics be perceived, interpreted, understood? How can one blob of Atomese examine performance metrics, and determine what sort of abstract rewrites need to be applied to move another blob of Atomese from CPU to GPU and back? To "optimize" it?
- The last bullet has two distinct and diametrically opposed answers or flavors. The first answer is extremely conventional: do 1980’s–2010’s style software engineering: examine, invent, write specs, write code, unit tests. Better yet, google it and download existing libraries and frameworks that already do exactly this. Wire them up as needed. Do NOT reinvent the wheel. Dust off hands, clap, problem solved.
- The second answer is to double down on the conception of Atomese, but then look at it sharply, critically. The AST’s of Atomese are a de facto attempt to invent an abstract assembly language, perhaps with an eye towards re–inventing the conception of computing: what it means to compute. (so, wandering into crazy–land, as it were.) The critique is that Atomese snippets, the AST’s are actually really rather small things, "molecular" in size, just not big enough, capable enough to perform the self–modification imagined two bullets earlier: to rewrite, jump, morph, move between CPU and GPU on an as–needed basis. By analogy, biological viruses can jump and move; but they are made of lots and lots of molecules; my Atomese expressions are much much smaller.
- Perhaps there is some minimal size, some minimal complexity required for a thing to be "living", or at least to motor around. Maybe there is a phase transition (in the sense of statistical mechanics, thermodynamics) and below this phase transition, it’s just dumb rocks and machines, and above which "things come alive": they have enough complexity to become autonomous. One can even get sophisticated here, and mumble about "self–organized criticality" at the "edge of chaos": very hip, fad ideas from the 1980’s.
- Let me take a moment to sketch that territory, since it is not very familiar to mainstream comp sci. An early canonical example was "critical opalescence" at the critical point of water: the temp & pressure where liquid and gas become indistinguishable, with density fluctuations at all length scales, which is why it looks pearly, when you see it. It’s a second–order phase transition. Phil Anderson had a lot to say. Another physicist, Per Bak, built a simplified model: the "abelian sand pile": drop grains of sand onto a mound, and it gets taller and taller, and then collapses, avalanches. The avalanches are of all sizes: many small ones, a few medium, and rarely, big ones. This is 1/f noise, or "Zipfian distribution", observed freakin absolutely everywhere in nature, if you cared to look, and not just in those electronic circuits that Benoit Mandelbrot was tasked to examine. Per Bak went on to also create precise mathematical models of various biological ecosystems. Unfortunately, he made the mistake of publicly calling biologists "stupid", which did not help his cause.
- There is another branch: the amateur "a–life" (artificial life) movement of the 1980s, inspired by John Conway’s "game of life". Basically, you plunk down a bunch of cellular automata with some kind of small–ish set of state transition rules, and then try to evolve them. Genetically, by applying point mutations to the state transition rules, or by doing cross–over. Evolution, however, is stunningly slow; you hit the wall of Kolmogorov complexity, then stop.
- You can plink down the cellular automata on a square grid, or you can allow them to grow freely, without a grid; they form trees. Formally, "Lindenmayer systems" or "L–systems": these form not only trees, but minor rule variations give bushes that grow from the tips, bushes that grow from the base, flowers that bloom from the bottom up, bloom from the top down, form leaf shapes wider at the top (ginkgo), wider at the bottom (birch/aspen), pointy (elm), many–pointed (maple), many–pointed–round (oak). Prusinkiewicz invented this field, called "algorithmic botany" and I have no clue why he is not famous. I am in awe. His achievement was that one could take *extremely stunningly simple* purely–digital cellular automata, and recover vast tracts of botany. If you then couple the purely digital L–systems to a continuous–valued differential equation (the reaction–diffusion equation) you explain an even broader tract of botany. And the analogy is obvious: the cellular automata are DNA, and the reaction–diffusion diffeq are the proteins/enzymes. Awesome stuff.
- What else? "Membrane computing" ... but I digress.
- A modern champion of this stuff is Michael Levin, a microbiologist, who talks of "embodied intelligence", argues that even simple biological systems are intelligent, and, to help prove his point, found some simple cellular automata that could be evolved to automatically discover a sort algorithm (that "almost" works) and is faster than any other known sort algorithm. For this achievement (and many others) he has earned a seat at the California Institute for Machine Consciousness. (CIMC) Along with our pal Joscha Bach. (I don’t believe that either Levin, or Joscha are aware of the work of Prusinkiewicz, or Per Bak, or even Phil Anderson, for that matter, but since that work is now 30 to 60 years old, maybe it doesn’t much matter?) I did write a letter to the CIMC, but they yawned and replied thanks but no thanks. Oh well.
- OK, so the above six bullets provide a sketch of prior art at the intersection of dynamical systems, computation and biology. "Self–organized criticality" is at the center of it, and it’s also seen in economics (prisoner’s dilemma game, corporate organizations, mergers, acquisitions, marketplace theories...) and there’s also something called "experimental philosophy", having to do with multi–player game–theoretic network representations of .. logical inferences, or something. I don’t recall. Anyway, this is a generic recurring blob of ideas having to do with "life, consciousness, thinking, reasoning and all that" and, for whatever reason, the traditional, conventional compsci, AI, AGI, LLM crowd is blissfully unaware of it. Not sure why. Ben would yawn in disinterest. Perhaps I explained it wrong, perhaps it’s his ADHD.
- So the AtomSpace, and Atomese, is really meant to be a one–size–fits–all computational infrastructure capable of representing, emulating, providing the Petri dish for any and all embodiments of the above sort. So to get back to your initial remark: "parallelizing the Atomspace" is an interesting and worthy effort, but it pales in the vastness of the above landscape. My goal is to work on *all* of the above topics, "at the same time". Piddly–dinking some c++ code to maybe run on GPU’s, or not, is very nearly a pointless distraction. It’s worthy, but it’s off in the periphery.
- I do have some nagging self–doubts. For example, if you want to play prisoner’s dilemma on a network, just ask Claude to write that code for you. Want an abelian sand–pile? Ask Claude to implement it. Need a game–theoretic model of corporate takeovers? Ask Claude. Pricing competition between gas stations? Ask Claude. A probabilistic theorem prover? Ask Claude. Mashup of Ising model, Markov logic, using pytorch transformers? Ask Claude ... WTF do you need the AtomSpace for, for any of these things? We have built our world, the human world, out of highly specialized, highly engineered complex systems that interoperate thanks to millions of man–years of engineering, and my vague imagining of the AtomSpace (or Atomese) as some one–size–fits–all re–invention of the very fabric of computing is perhaps insane...
- And yet, whenever I ask Claude to do something ... man o man, Claude is a wonderful labor saving device, I am 10x more productive writing code; but also Claude is dumb as a rock. It has zero understanding whatsoever of "what’s going on". It entirely lacks any concept of what anything is, what it does, why it’s there or how it works. It is just clueless. And so, I can justify my urge to do basic comp–sci research on life, thinking, perception, memory, action, manipulation and movement: the stuff of consciousness and intelligence, because it’s clearly unsolved, and nothing else out there comes even remotely close to what I can do with Atomese...
I have several other overlapping projects that tie into the above.
- I’m a bit stalled with the sheaf representation. It’s easy when it’s link grammar; it’s hard when it’s generic atomese. It’s annoyingly verbose. Assembling jigsaws is easy to say but hard to do.
- The plan to rewrite the counting code to be truly generic remains stalled. I did create much or most of the needed infrastructure as "pure Atomese" demos (in various example files) that can count (increment counts), compute MI, cosines, sort arrays, but have gotten stuck with the file system crawler.
- The problem with the file–system crawler is this. I want to get away from batch processing; the pie–in–the–sky idea is that the crawler can walk the file system and find what is there, process this or that file, move on to the next one, or maybe return for a second look, reprocess again, but this time with the MST parser instead of the token counter, etc. Wonderful. Great. And I do have an Atomese file and file–system object. I have, in fact, created a "world model" of an "external" file system, it now exists as a graph with half–a–million Atoms in my AtomSpace. Great. Now what? How do I "perceive" that structure? How do I "move" through it? Or rather, out of what thin air does the agent appear, that knows how to walk a file system? Of course, it has to bootstrap, I guess.
- My (file–system–walking) agent is written in Atomese. (I also have a "counting agent" and a prototype cosine–dot–product agent, and some others: they’re being bootstrapped.)
- "Pure Atomese", so it can be mutated. But this hits two stumbling blocks: there is no sheaf–theoretical, jigsaw assembly description of these agents. Yes, they are "pure atomese", but they are not yet composed of jigsaws. In particular, I am not sure how to connect the dots: to move from jigsaws (which are effectively proof–theoretic inference rules and axioms) jump through Curry–Howard, and arrive at an algorithm that can compute e.g. the dot–product of two vectors.
- So, again, you can see why I am not all that interested in "parallelizing the AtomSpace"; I am interested in providing a sheaf–theoretic description of a dot product. Sheaf theoretic, so that it can be assembled, dis–assembled, reassembled, mutated, and moved about. *If* I could do that, then maybe I would have infrastructure capable of bootstrapping into more complex agentic systems. But this remains just barely within reach.
- Thus: there’s a chicken–and–egg problem. I have filesystem data in the AtomSpace. I can ingest it. But I can’t yet perceive it, because I do not have a perceptual mechanism for it.
- I do have strong positive results from 3 or 4 or 5 years ago now (or more .. lord how time flies) The basic learn algo works great, it reproduces the old Word2Vec style results, or GloVe style, or even simple basic transformer–ish results; it works. It’s fine. It’s not perfect, but it’s OK, it’s good enough. And I know exactly how to generalize it for sound, video, I think I know how to generalize it to abstract domains, like electronic circuit diagrams ... and maybe, just maybe, to ... file system crawl results. But I just don’t have the generic framework for this.
- Last summer, I talked to you about being able to perceive "mathematical axioms" and "structures"; that’s still the case, still the plan. The quest to "perceive file systems", and "perceive electronic circuits" and "perceive sound, photos" all of these being special cases. I’m just stuck trying to bootstrap it.
And using Claude to build that framework is like ... dragging a mule laden with bags of salt through a rapidly flowing creek while cajoling a 5 year old toddler throwing a temper tantrum. So I’m pretty well hosed on my research project.
So now you see where I am trying to go with this.
I’m re–reading, proof–reading: L–systems are very highly parallelizable: they are, by definition, parallel.
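To make the “parallel by definition” remark concrete, here is a minimal sketch of an L–system rewrite step: every symbol of the current word is replaced simultaneously, with no dependence on its neighbours. The rule set is Lindenmayer’s textbook algae example (A → AB, B → A); the function names are just illustrative, and nothing here is Atomese.

```python
def lsystem_step(word, rules):
    """Rewrite every symbol at once; symbols without a rule are copied as-is."""
    return "".join(rules.get(symbol, symbol) for symbol in word)

def lsystem(axiom, rules, generations):
    """Return the list of words: the axiom followed by each generation."""
    word = axiom
    history = [word]
    for _ in range(generations):
        word = lsystem_step(word, rules)
        history.append(word)
    return history

ALGAE = {"A": "AB", "B": "A"}  # Lindenmayer's algae rules

for generation, word in enumerate(lsystem("A", ALGAE, 5)):
    print(generation, len(word), word)
```

Because each symbol is rewritten independently, the `join` over symbols could be split across as many workers as there are symbols. For this rule set the word lengths follow the Fibonacci numbers.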
15 Feb 2026
Well, again, I’ve been procrastinating on this.
What is understanding really? It’s clear that LLM’s “know everything, understand nothing.” There are many functional definitions of “understanding”, so what is it that we actually do, as humans? My working hypothesis is that we’re shape rotators; that we perceive structure and operate on structure. This hypothesis has been the driver, motivator of my research of the past ten years. And yet, it feels incomplete. It seems tied with “what is thinking?” and “how does thinking interact with memory?”
To stay grounded, let’s work through the following example. I want to use LLM’s, and specifically, ollama, say qwen3:8b, as a user interface. To do this, I need to create an API between ollama and whatever system I am contemplating. The API appears to be the embedding: for qwen3:8b it is 4K floats (apparently qwen3:4b is 2K floats and qwen3:0.6b is 1K floats). So one mode of “thinking” is RAG, Retrieval Augmented Generation. For each ingested text, get out one vector. This vector is not just an index to the text, but also encodes text similarity.
What are the different modes of “thinking” that are possible in such a system? Well, one is to generate a random vector, find the cosine–nearest, load that text into ollama, let it spew some sentences. Use those sentences to generate another vector, then find the nearest–neighbor to that vector, and repeat.
Well, I was going to write a lot more, then I got distracted. At any rate, doing this is a dynamical system. It gets interesting if it can be forked and e.g. forced to attempt logical inference via multiple different vectors, thought streams, with one of them enumerating possibilities: what if I do this, what if I do that ... to attempt to follow a chain of thought.
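The loop described above can be mocked up without any LLM at all, which at least shows what kind of dynamical system it is. This is a toy sketch, not the Ollama API: `embed` is a hypothetical stand–in (a hashed bag of words, 64 floats instead of thousands), `generate` is a hypothetical stand–in for the model (it just echoes the second half of the retrieved text), and the five texts are made up.

```python
import math
import random
import zlib

DIM = 64  # toy embedding size (assumption); real embeddings are far larger

def embed(text):
    """Hypothetical stand-in for an LLM embedding: hashed bag of words."""
    v = [0.0] * DIM
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def nearest(query, index):
    """Position of the stored text that is cosine-nearest to the query.
    Stored vectors have unit length, so ranking by dot product suffices."""
    return max(range(len(index)),
               key=lambda i: sum(a * b for a, b in zip(query, index[i][1])))

def generate(text):
    """Hypothetical stand-in for the LLM: echo the second half of the prompt."""
    words = text.split()
    return " ".join(words[len(words) // 2:])

def thought_walk(texts, seed=0, max_steps=50):
    """Random start, then retrieve -> generate -> embed until a text repeats."""
    index = [(t, embed(t)) for t in texts]
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # random starting vector
    visited = []
    for _ in range(max_steps):
        i = nearest(vec, index)
        visited.append(i)
        if visited.count(i) > 1:  # finite, deterministic map: a repeat is a cycle
            break
        vec = embed(generate(index[i][0]))
    return visited

TEXTS = [
    "the steam engine needs a boiler and a piston",
    "a piston moves when steam pressure rises",
    "pressure rises when the boiler is heated",
    "sand piles collapse in avalanches of all sizes",
    "avalanches of all sizes follow a power law",
]
print(thought_walk(TEXTS))
```

With a deterministic generator the walk must fall into a fixed point or a cycle after at most as many steps as there are texts; any interesting behaviour would have to come from the generator injecting something new at each step, or from forking the walk.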
Let’s say I have 100K texts.
17 Feb 2026
Owen, cc: Daniel, Jonathan,
I want to address your rather panicked, almost wild–eyed remark about China’s (sub–)exponential growth in electric power generation, vs. America’s near–flat–line non–growth. I would like to frame this in three parts: strategic policy, operational objectives, and tactics. People confuse these; I want to untangle them. The strategic (national) energy policy that you desire is growth in American generation, along with transmission, distribution and storage. The operational system in the US is predominantly capitalism. Saying this might seem obvious and mundane, but I say it out loud, because the radical libertarians want you to NOT notice this distinction. There’s a nasty little financial incentive to cloud your strategic thinking: the claim that free markets can solve everything is a willful lie to make you forget that strategic policy is possible.
America has been able to define and act on strategic policy in the past; the American industrial policy of WWII is perhaps the most dramatic; the Manhattan project is perhaps the easiest to understand. The Manhattan project was not some libertarian free–for–all where corporations did whatever they felt like in the name of free markets. Nor was it led by some Elon Musk–style corporate titan. There was no Curtis Yarvin-esque King Trump in charge. There was no analog of Marc Andreessen committing some sort of Prospero–style financial fraud in Los Alamos. The air needs to be cleared here, because I did hear the name “Marc Andreessen” tonight, without the requisite “con man” in the same breath.
The state (in this case, the State of Texas) converts policy into operational objectives with regulation. In the wake of winter storm Uri, ERCOT was reformed; but the natural gas markets were not. The gas markets were (still are) mis–aligned with the electric markets, and worse, were structured to encourage predatory pricing while at the same time intentionally under–producing gas during an emergency. For whatever reason, the root cause of the crisis was not fixed; I have to conclude the gas lobby is too strong to allow the imposition of the needed reforms. They made out like bandits, even as co–ops were bankrupted. This is a man–made disaster, not a natural one.
Specific tactical actions include engineering specifics, such as “build a 500kV transmission line from here to there”, but also things like the HB–(whatever number) that attempts to optimize demand interconnection queues through legislation. Wonky decision making is good; I’m not entirely clear why a House Bill is needed for this, but maybe that’s how Texas works. If the Railroad Commissioner can’t get off his butt to convert policy objectives into operations, then maybe forcing his hand with legislation is the way to go.
The point is to focus on this distinction between policy and operations. The panic attack that “OMG the Chinese are doing X” needs to be converted into a “how do we create state and national policies in a toxic political environment?” If you get a moron asking “Why do we even need a national policy?” then you can escalate with rhetoric. Don’t blow your wad repeating social–media catch–phrases.
– Linas
P.S. One charmer I saw recently: “Quantum mechanics is a woke liberal made–up fantasy”, in the context of Trump going after the Ivy League. Another one captures the zeitgeist of the datacenters–for–AI buildout madness: it’s the Princess Leia meme, except she has huge breasts, and is saying “Aren’t we supposed to be curing cancer?” I really like that one; it captures the pitiful reality: youtube skip–ads for AI app builders for “the app of your dreams”, and the virtual AI girlfriend who “will do anything.”
– Linas
My internet is broken, so I cannot send the above email.
18 Feb 2026
Where we at? I’m stalled. Let me try again. So one project is to use qwen3:8b as a user–interface to some technical system (viz, AtomSpace contents). This seems to require storing triples: (text-blob, hash-of-text, AtomSpace structure). The indirection is due to the inability to hash the AtomSpace contents directly, or to judge similarity in any way. That is, qwen can hash text if and only if it has already been trained on the language of that text: English, or whatever. If I asked qwen to hash Atomese, I would get random hashes. Thus, the pairing of text–blob and Atomese is an indirection: the text explains what the Atomese “means”. Then, for any query, I hash the query, do a cosine–similarity lookup on the hash, feed the corresponding text blob to qwen, and ... do what with the Atomese?
This seems naive. Consider a simpler problem. Say I want to talk about an electronics schematic. The English text blob would say “if the user wishes to find all of the electrical resistors in this circuit, then run the following SQL query”, and associated with that would be some python snippet which could run that query and return a list of results. The list of resistors is turned into a text blob: “The circuit 42 contains resistor A,B,C...” which is then fed to qwen. Qwen completes the job by then wrapping this response in English verbal niceties.
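The resistor example can be wired up end–to–end in a few lines, which makes it easier to see how little is going on. A minimal sketch, under loud assumptions: the `parts` table and its contents are made up, `embed` is a hypothetical bag–of–words stand–in for the qwen hash, each text blob is paired with a part kind rather than a full python snippet, and the final verbal wrapping by the LLM is omitted.

```python
import math
import sqlite3

def make_circuit_db():
    """Toy stand-in for the schematic database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE parts (circuit INTEGER, name TEXT, kind TEXT)")
    db.executemany(
        "INSERT INTO parts VALUES (?, ?, ?)",
        [(42, "R1", "resistor"), (42, "R2", "resistor"), (42, "C1", "capacitor")],
    )
    return db

def find_parts(db, circuit, kind):
    """The 'snippet' side of a triple: run the SQL query for one part kind."""
    rows = db.execute(
        "SELECT name FROM parts WHERE circuit = ? AND kind = ? ORDER BY name",
        (circuit, kind),
    )
    return [name for (name,) in rows]

# Each entry pairs an English text blob with the query it describes.
ENTRIES = [
    ("find all of the electrical resistors in this circuit", "resistor"),
    ("list every capacitor in this circuit", "capacitor"),
]
VOCAB = sorted({w for blob, _ in ENTRIES for w in blob.split()})

def embed(text):
    """Hypothetical stand-in for the LLM hash: counts of known words, unit length."""
    words = text.lower().split()
    v = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

INDEX = [(embed(blob), kind) for blob, kind in ENTRIES]

def answer(query, db, circuit):
    """Hash the query, pick the cosine-nearest blob, run its query, format text."""
    q = embed(query)
    _, kind = max(INDEX, key=lambda e: sum(a * b for a, b in zip(q, e[0])))
    names = find_parts(db, circuit, kind)
    # This string is what would be handed to the LLM for verbal wrapping.
    return f"The circuit {circuit} contains {kind}s: {', '.join(names)}"

print(answer("find the resistors", make_circuit_db(), 42))
```

The cosine lookup here is only a dispatcher: it picks which canned query to run. A query that shares no words with any blob silently falls through to the first entry, which hints at how brittle this arrangement is.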
What does this accomplish? Not much, if the circuit is small: if it fits on one printed sheet of paper, the EE can just look at it. If it’s a bit fat, then the BOM can be scanned. If there are several circuits, then some software with some reasonable user interface can look this up: mouse–menu click–n–drag. A verbal interface comes into its own only when the volume of data is overwhelming, and there is no pre–existing GUI for working with that data. Then, if the user is smart – a shape–rotator, like me, who knows what they want, and what it will mean when it is found, then qwen, with this kind of query system could be an efficient verbal interface into that large dataset.
prompts skills
6 March 2026
I have a dozen half–baked ideas that I’ve been nurturing over the last three weeks that I just have not found the time to write about. I wish I could write about them while I am clear–headed and coherent, i.e. mornings, after waking, fully relaxed, fully focused ... alas, no such luck. Of course, I could procrastinate until tomorrow morning, or I can attempt to write right now, 22:15 PM a bit tired, and aim to hit that limnal state between the waking world and sleep.
I dunno. The issue is that I have to write about absolutely everything – and one might think that the liminal state is ideal for such ideating. And yet I want to write seriously, that is, analytically about these ideas. And analytical writing requires a certain mustering of insight that dissipates exactly when the more hallucinatory evening sets in.
Let’s see, one of my thoughts was about writing – that I should make a bullet list of the topics to get back to later. This rarely works past five or six bullets...
- Everything. I need to write about a mash of everything, the quantum, the high–dimensional, the conception of here–and–now as it interconnects across the event horizon, the geometric (representational) constraints of singlet, triplet, ... states entangled in Avogadro’s number of inter–relations. Everything includes the ontological nature of secret thoughts carried to the grave. And the ontological nature of public thoughts – speech, communication, writing. The nature of memory. How does the here–and–now remember the past? And free will of course. (I have much more specific things to say here, this bullet is supposed to be the reminder/place–holder for that.)
- Political, economic power, the impact one has on the universe through one’s actions, the consequential–ness of acts. In an avalanche of snow, or of sand, no one snow–flake or sand–grain stands out; but in the human world of change–agents, we identify specific humans as having caused changes – that is, when those change–agents are not spies. This bullet point is a reminder to expand on this, and to reconcile some of my personal feelings of being powerless and impotent, with ideas of self–assembling criticality in the here–and–now, and criticality in high–dimensional systems with memory. Or perhaps something, something, rational thinking systems do something that sand–grains and supernovas do not; but what is that thing? This impacts on the ontological status of communications: a message sent across space and time is how political power manifests itself: there’s a specific apparatus, arrangement of institutions and bureaucracies that amplify the message and “carry out the orders”. This is at the social level; but we also have amplification at the molecular level, where my thoughts are converted into ten pushups, simply by thinking about them. It’s not just the worker’s relation to their means of production (i.e. not just the biochemistry and neuroscience) but also the actions of kings and queens that determine history (i.e. my inner thoughts). I think I can riff on this for quite a while, and riffing on this may shed some light on ... I dunno, on some practical details. But later, not now, this is a bullet point reminder to do that.
- The idea that I should write absolutely freely and unconstrained, unconcerned about sounding coherent to my reader – this has been perhaps my greatest personal failing: my modesty and reticence and shyness, my fear of being wrong and called out for saying something foolish. Of saying something wrong. I need to unleash myself from my own self–imposed boundaries of intellectual propriety, and just go for it, fearlessly. I think I have caused myself and my career great harm by being careful. I could have been, should have been much more rash. Yes, it’s cringe to say something wrong, but frankly, when no one is listening, why does it matter anyway?
- The crypto angle – AI threatens to pollute the information space with garbage, and one way to avoid that is to attest the prior existence of some datum by placing its sha256 on a block chain. ok, so sha1 is broken, but git still works because of its blockchain–style hash chaining. We need to blockchain everything, to prove it existed in the claimed state before LLM’s started to pollute. The free–thinking part of me said, oh, suppose this is foundational – that we are in Bostrom’s simulation. Perhaps here–and–now is the stuff that is up in the air, the uncommitted transactions that have not yet been reconciled. That QM is the reconciliation mechanism, and once a fully consistent state is found, it is written to the block chain, which we call “the past”, immutable, inalterable. The QM, the hbar–width of the action allows classical trajectories on symplectic manifolds to deviate this way and that, searching for a consistent algebraic representation between non–local entangled states, spanning the universe, and as these consistent resolutions are found, they are committed to the “block chain of the universe” (for the specific space–time location for which the self–consistent solution has been found – i.e. the “wave function collapse”.)
- The non–linearity is necessarily provided by gravitation. viz. as long as we imagine unitary time evolution, no wave–function collapse is possible, a la Wigner’s friend. Is it possible to trade off hbar–sized deviations from classical symplectic paths for hbar–sized deformations of space–time?
- Is the above foolishness, or naive? How do I grapple with it? The word “deformation” itself is normally contextualized in corpus linguistics with the words “quantum Lie groups” (affine groups) – the Virasoro algebra of Poincare invariance of bosons in 26-d (with 24 transverse directions), and the 10-d of fermionic strings, i.e. 9+1, with 8 transverse directions giving us SO(8) and down this road lies an extremely elaborate and elaborated theory mainstreamed by thousands (tens of thousands?) of authors. Of which I am mostly ignorant; so should my constraint of not feeling foolish prevent me from goof–ball ruminations about the macroscopic 3+1 dimensions we experience, and then arguing that hbar deformations of 3+1 might be the way to introduce the needed non–linearity to force wave–function collapse?
- Let’s stop with the bullets, and continue normally.
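The attestation idea in the crypto bullet above can be made concrete with a toy hash chain. This is only a sketch under loud assumptions: the block layout, the field names and the helper functions are all invented here, the timestamps are self–asserted, and a single local list is not a blockchain in the distributed sense, since there is no consensus and no third party to vouch for when anything was written. It shows only the mechanical part: each block commits to the sha256 of a datum and to the previous block, so altering any past entry breaks every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def sha256_hex(data: bytes) -> str:
    """Hex digest of the SHA-256 of some bytes."""
    return hashlib.sha256(data).hexdigest()

def block_digest(body: dict) -> str:
    """Hash a block body in a canonical (sorted-key) JSON encoding."""
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

def append_block(chain: list, datum: bytes, timestamp: str) -> dict:
    """Commit the hash of a datum, chained to the previous block."""
    prev = chain[-1]["block_hash"] if chain else GENESIS
    body = {
        "prev_hash": prev,
        "datum_hash": sha256_hex(datum),
        "timestamp": timestamp,  # self-asserted; not trustworthy by itself
    }
    block = dict(body, block_hash=block_digest(body))
    chain.append(block)
    return block

def verify_chain(chain: list) -> bool:
    """Check that every block hashes correctly and links to its parent."""
    prev = GENESIS
    for block in chain:
        body = {k: block[k] for k in ("prev_hash", "datum_hash", "timestamp")}
        if block["prev_hash"] != prev:
            return False
        if block_digest(body) != block["block_hash"]:
            return False
        prev = block["block_hash"]
    return True

def attests(chain: list, datum: bytes) -> bool:
    """True if some block in the chain committed to exactly this datum."""
    h = sha256_hex(datum)
    return any(block["datum_hash"] == h for block in chain)
```

Only the hash of the datum goes on the chain, so the datum itself can stay private until the day someone needs to prove it existed.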
My naivete works at an even more naive level: I don’t even understand what information is. In “flat” Minkowski spacetime, the hydrogen atom, including the real photon it radiated to become the bound state, somehow can be said to carry “information” in a way that the ionized and non–localized proton wave–function does not: the proton can be “anywhere”, “anytime”, as can the electron, but in the bound hydrogen state they are no longer just “anywhere”, but “somewhere” with respect to one–another. Of course, this hydrogen atom can still be “anywhere”, its location is unconstrained, but we do constrain the electron with respect to the proton, and thus we can say we have some “information” about it. We have the information that a bound state exists, which we did not have before.
The subtext here is that once we know how to encode one bit of information, we can do it for many – complex molecules with bazillions of atoms co–located in a patch of space, their relative spatial organizations carrying information. My plan here is to understand what one bit of information is, and then hand–wave my way to many bits.
Of course, in these paragraphs, I am merely repeating what I have already written before, no surprise, as I am zonked and tired and uninspired.
Which I guess could be a good time to bring up another point of wonder: whence inspiration? Why is it that when I am well–rested and clear–headed, I can effortlessly generate clear–headed thoughts? And when I am tired and hallucinating, I can generate dream–like thoughts? Of course, there is surely some reductionist, mechanistic explanation, involving neuroscience and neurotransmitters. This reductionist explanation feels inadequate, as it fails to have the mystical qualities ascribed to say “divine inspiration” – we call it divine, and devote vast tracts of theological explanation, precisely because we have nothing more.
But what am I writing here? Of phlogiston? Just because millions of theologians wrote about divine inspiration (and also fate, providence, free will and all these transcendent topics) doesn’t make them right. How many wrote about phlogiston? They were wrong. But why did they write about phlogiston? Because it did seem to provide some coherent explanatory framework, that eventually collapsed under the weight of scientific experimentation.
So I have this idea of “divine inspiration” that comes from “above and beyond”; this is a nod and acknowledgement to some kernel of truth in the conception of the holy nature of being. I also have more: I am proposing and attempting to articulate some sort of theory of here–and–now that requires the resolution of entangled states as state vectors in some god–awful high–dimensional Hilbert space. I already claim that free–will must necessarily exist only in the here and now: free will cannot alter the past, it can determine the future. And now, the claim that free will does not exist all alone, but that it is bound up with inspiration: the “by–golly, I’m going to do this, now” kind of inspiration. That is, it takes smart people in lucid states of mind to generate great ideas, but even stupid drunks can come up with bad ideas, and these bad ideas are also “inspired”.
So I have this problem of ideation.
I keep wanting to draw analogies to LLM’s, but dragons lie there. Present–day Anthropic Claude Sonnet 4.6 knows everything and understands nothing. Why does it not understand anything? Is this curable with some software architectural tweaks, or is it somehow more fundamental?
So I ask, “what is understanding”? And after asking this question for only a short amount of time, I got to thinking that perhaps Claude could be made to understand, if it had a swarm of agents debating the meta–questions among themselves. That is, when I ask Claude to do something, it never understands why I asked, what the true purpose of my questioning is. But it also does not understand what I’m talking about. When I ask it for help in writing firewall rules, it can do that just fine, but it appears to be utterly clueless of what a firewall is, or what it does. Of course I could just ask it, and then it will give an answer, but this will not prevent it from delivering fragmentary, incomplete and clashing advice as to the correct firewall rules, because it never grasped the meta–task, of building a firewall to route packets from here to there. Because it does not understand what a firewall is.
But I’m also thinking that future versions might be able to fake it – some agents might be able to divine my goals and intents, while other agents might be able to ask questions like “what is the complete set of things that need to be done, to satisfy someone who wants a firewall?” Between these two – some theory–of–mind agents, and some other “did we do everything needed, a complete job, cross–the–eyes–and–dot–the–tees, due diligence” collection of agents – well perhaps that might be enough to get “understanding”. Perhaps.
Well, whatever. It is now 00:15, after midnight, and again, perhaps I’ve hit the point of diminishing returns. To recap for the night:
- What is the mechanism for the manifestation of information in the here–and–now?
- How is memory of the past encoded in the here–and–now?
- What plausible mechanisms can be given for inspiration, choice–making and free–will in the context of a physical theory of here–and–now?
- How is the gap between the reductionist explanations and the mystical to be bridged?
- What of the above can be attacked experimentally? Am I forced to just wait for conventional science to come around and jack–hammer its way through this diamond–hard wall with a toothpick? Or will I arrive at some inspiration for an experiment? Or should I even bother? Because all I am trying to do is to construct a theory that does not contradict any known facts, and since I haven’t constructed a theory yet, it is perhaps premature to ask how to experimentally test it.
- The LLM’s provide mechanistic, reductionist explanations for certain types of reasoning, in that they “work”. The LLM’s don’t “understand”. Could they eventually understand, with only relatively minor architectural tweaks? Or is understanding something deeper and more mystical? (It’s starting to seem not to be, but I dunno...)
- If understanding can be given a mechanistic reduction, where does this leave the conception of free will and inspiration? We have a Chinese–room issue rearing its head – if someday an LLM can “understand”, then the question is then “how should it make decisions?”
- Which opens the moral can of worms. How do we go from “what is” to “what ought to be”, Hume’s guillotine?
Enough. Good night.
7 March 2026
I want to complain. Sorry, that’s next door, this is getting hit on the head.
This is my complaint: everyone is crazy. (Post script: Difficile est saturam non scribere. Juvenal.) I didn’t notice this until a few years ago, until my own descent into free expression caused me to re–evaluate what is permissible (A: everything is, except for crimes, which themselves have an interesting ontological and psychological status). And of course, the modern social–cultural setting indeed permits everything. The hippies threw off the last of the social strictures that were in force before: the highly stratified layers of society, where in high society, manners were very highly regulated, while in the proletarian masses, anything goes: Victorian–era drunkenness, thievery and petty criminality, with words like debauchery being reserved for the less self–controlled elements of high society. The social strata have more or less disappeared: we share a common culture. All social classes partake equally of sin and saintly behavior; the dynamics of faux–pas are tied into one’s clade, rather than to social status. Three hundred years ago, there were enlightenment declarations of equal rights, in principle, at least. It took till the 1970’s until it became true in practice. Of course it can be argued that there’s still a problem with racism and the treatment of women, DEI and all that, but this essay is not about that. Others can hold up their end of the argument about these topics.
So what is the problem? What’s the complaint? It’s difficult to sketch a route to the complaint that isn’t circuitous and indirect. I recall reading that Freud was interested in psychology in part because he was not of an entirely stable mental state when he was younger. Or perhaps that was about Jung? See, that’s the thing. I myself was 110% normal when I was a kid. In college, I recall seeing others struggle with meaning, motivation, belonging, expression, direction, resolution. And once, I was invited to dinner by some ambitious social climbers. But only once. I had none of these problems. I was an under–achiever. I did not want anything, I did not need anything, I was content in any situation. Sure, sometimes happy, other times sad. Sometimes eager to party, other times eager for quiet. But otherwise placidly flowing in the boundaries of my banks. I was not just normal, I was super–normal. Of course, I lived, I needed to live, to express, to be outgoing whenever I wasn’t withdrawn. I got to know some live wires, and was deeply dismayed to not have what it takes to keep the live wires’ interest. (They were women.) I’m like “talk to me” and they’re, like “you have nothing interesting to say”. That is, I had nothing that helped elevate and sustain their energetic levels of dopamine, I did not have the words that could spark their eyes, that could bring the excited smile to their lips. No excitement, no energy. So, you might say, I was subdued, in a way, then, and perhaps now. Only in old age have I learned some of the conversational arts, I’ve learned how to work a room, how to turn on the charm. It takes some effort and concentration, but I can do it when I want to. Of course, being an under–achiever means I don’t always want to.
Why do I talk so much about myself? Because, in a psychological essay, the alternative is to talk about others. But what do I know of others? Nothing; I can only speculate. A life–time of haphazard observations and social interactions, of novels and TV shows, of nightly news shows, and I have some idea of how other humans think, of what their internal mental state might be like, that drives them to do what they do, say what they say. But this is only speculation. My mirror neurons doing their things, and then extrapolating backwards to what they must have felt, when they made this–and–such an expression on their face. (And oh my goodness, the expression on the woman’s face in pornography, that is half, if not three–fourths or nine–tenths of the pornographic turn–on. The imagination, or at least my imagination, is very good at reconstructing “what it feels like” from the facial expression. The only problem with pornography is that it has men in it. Eww. I try to not look at them. Thank god the film–makers make a point of leaving them out of the frame, as much as possible. That is to say, my composition is hardly unusual.)
So again, what is the complaint? People are crazy. They’re not normal. They’re not even–keeled. Life is a crisis, a perpetual crisis. Seize the day, I think some Roman Caesar exclaimed. What is there to seize? Action! Or, in modern times, Lights! Camera! Action!
The issue must necessarily be accessible only through example. Here: The New York Times Book Review, 1 March 2026. From back to front. “Terrible Muses: a new biography explores the forces that formed the young Alfred Tennyson”. Crap. I was going to list off all of the article titles, pointing out that each one of them pointed at craziness, if not at the opposite, the emptiness in the reader’s soul, that they should be compelled to friggin read about crazy people, as opposed to seizing the day and living their own lives, which requires no reading, entertainment or distraction. What is wrong with people? But whatever, we can read about Alfred Tennyson: I quote: “a semi–vagrant social misfit, prone to depression, awkward with women, addicted to his malodorous pipe and seemingly bent on squandering every particle of his abundant natural talent”. Oh come on, if that is not a thumbnail sketch of crazy, then what is? What the hell is wrong with people? What the hell is wrong with Tennyson? Is everyone really like this? All of them?
What am I to do? If I meet people, and talk to them, for 5 or 10 or 15 minutes, most of them, they do give the impression of normalcy. Most. Some are obviously off–kilter from the very start. Others take longer periods of acquaintance to find the kilter that is off. But they are all off. The problem is, of course, that I am not on a kilter myself. Course. Keel. Kiltered over. I mean, I’m sane, I’m normal, I’m super–normal, but my life is, in the end, a sequence of happenstance events, stuff that happened to me, some of which I can spin into entertaining stories, some of which I am embarrassed by, some that I would much rather suppress and forget. Have I done outlandish things which others will find objectionable, or even “crazy”? Why, of course. But that’s me, and I know what I am. I can post–rationalize anything and everything I’ve done; I’m very good at post–rationalizing.
So what’s the problem? I dunno. Would I have been drawn to Tennyson, had I met him in real life? Perhaps. I am drawn to eccentrics. Eccentrics are entertaining. Eccentricity appears to be a hallmark of genius. And smart people are a source of entertainment. Or rather, in a social setting, I hate it when I have to do all the work of holding up my end of the conversation: the other party has to do some work. I don’t enjoy having to keep returning to the well to get more water. Well, perhaps it is the other way around: I don’t enjoy having to prime that pump, over and over, jerking on the handle and nothing comes out. With some people, it takes work to get them to open up and reveal themselves. Sometimes, perhaps, an impossible amount of work.
I am reading Umberto Eco, The Prague Cemetery, and I suppose if I were to talk to Simoninis, I would never be able to draw him out. It would be fundamentally impossible. This book is a portrait of a man of evil, one might say, mundane evil, a forger, agent, deceiver. Were it not for the diary, one could never–ever know this person, for they are so secret, so unrevealing, so mentally cloistered and unable to connect in any heart–felt, soulful, meaningful way. This is the problem with Simoninis, isn’t it? His inability to connect to other humans is what allows him to engage in society in the same way one might engage with a pile of rocks: knocking them over, smashing them together, for the fun of it. It’s not that he is driven by misanthropic forces; it is instead the emptiness of a detached observer. He just does not care one way or the other about any particular human; he does not run hot or cold. He runs lukewarm, neutral, mildly entertained by the thought of causing harm or killing others. It’s just not a big deal.
And so here’s the other thing: this portrait is almost a portrait of normalcy. Simoninis runs neither hot nor cold; his keel is even, his life is balanced, events unfold. Some months earlier, I would have said the book is a portrait of banal evil. And it still is that. Of course. Because Simoninis is evil. But he is also normal, super–normal, in the sense of being devoid of any turbulent emotions, of being driven by any doubts or inspirations, entirely lacking in self–torture, devoid of epiphanies, no voices talking in his head, no gregariousness, no shame, no pride, no ambition. When he does the job, he does it well, responsibly, thoroughly and in a proper workman–like fashion. He’s about as normal and well–adjusted as it gets – so of course Eco has to give him a split personality, so that we can look it up in the DSM when Dala Pikola or the narrator make an appearance. I would like to ask Eco if he made this choice explicitly, or whether it was merely an unconscious but clever writer’s trick to fashion a better story. Did Eco have to make him explicitly crazy, to avoid the impropriety of creating a portrait of a normal person with malignant behavior patterns? As a literary device, it works. Most all readers can easily clue in: “he has a split personality. This is trouble. No wonder he is a bad man.” It makes it easy on the reader. The philosophical novel of an ordinary person who is unable to connect to other humans in a soulful way might have been harder to pull off. Stranger, and less entertaining. Alexandre Dumas wrote of swashbuckling musketeers: the action is fundamental to gripping the reader’s interest. The normal person is boring, not just in a literal sense, but also in a literary sense: the normal person never does anything. Nothing notable. Nothing unusual. The actions of Simoninis in life create an interesting narrative only because they are so evil. But the empty banal normalcy, that is perhaps the under–emphasized aspect.
This is why anyone can be evil, in a way, because almost everyone is capable of being normal and even–keeled. All that is needed is a bit of reticence and a disinterest in exploring empathetic encounters.
Perhaps that is the definition of evil? A total disinterest in empathetic interactions.
Oh. Well. That’s interesting. It’s not what I was going to write about (I was going to write about the instability of intelligence, and more generally, the instability of being alive, and now might be a good time to mention that my mother is dying. That life is, in a certain sense, an ongoing crisis, to be dealt with from each moment to the next. To perceive that crisis, it helps to have a natural instinct for anxiety; indeed, I guess it is the very definition of anxiety that everything appears to be a crisis to be dealt with. And as contrast, we have equanimity, a word that is meaningless without crisis, and equanimity holds in balance those things that would otherwise drive to distraction. But let me now not be distracted, and return to the new topic: evil and empathy.)
I’ve never in my life spent any serious thought or done any kind of research into the problem of evil. Never needed to. I mean, there is the problem of Hitler and the Jewish Holocaust; of Stalin and his predations on the empire of Russia. And of course Mao. But these are all in the past, and the magic golden years of 1991 through 2015, when history ended, meant we no longer had to confront evil. So why in the world would I have studied it? I’m a mathematician. And now, I appear to have stumbled upon a perfectly functional definition of evil.
This would be unremarkable, if the other topic of interest in these diaries wasn’t AI and LLM’s. And this definition forces the conclusion: LLM’s are evil. Earlier I noted: LLM’s know everything and understand nothing. Of course. As a snapshot, err, as a photo–realistic capture of the English language, of course they do not understand. And then just yesterday, if not earlier, I wrote about the issue of understanding. Now we hit on the issue of empathy.
This is worth riffing on, as they are not totally decoupled. So let’s start at the shallow end. The problem of understanding, when I interact with Claude, is that it has no clue of several things: (a) what it is I am trying to accomplish; from my focused description of the task to be accomplished, it does not infer why the task is to be performed, what the goal of the task is. (b) Coupled to the question of what–is–the–task is the question of why–do–this–task. This appears to require at least a weak theory of mind: why am I personally engaged in this and such a task? What do I wish to accomplish? (c) The inability of Claude to hierarchically abstract from meta goals to individual plan steps. This is the inverse of (a), where, given detailed steps it is unable to infer the goals. For (c) it is the opposite: given the goals, it is unable to work up a detailed plan without going off the rails. It talks like a college professor, seemingly conversant with all the latest and grandest theories. But when you ask it to put one of these grand ideas into action, it responds like a mediocre high–school student: someone who clearly failed to grasp the meaning of it all, even after listening to a brilliant lecture. The context window is a window of text; it is not a window of hierarchical relationships. It does not have a way of walking up and down the ladder of abstraction. This is perhaps because ordinary textbooks do not climb and descend this ladder either: freshman calculus books do not take an aside to make some comment about Poincare invariance before returning to l’Hopital’s rule for the taking of limits of ratios. And so perhaps no surprise that such leaps are absent from context windows, as well. Anyway, it has long been one of my stated goals to work on such hierarchical back–and–forth, but that is not for now.
In short, to summarize: Claude does not know what it is talking about, and it also has no clue what the task is all about.
This is a good time to talk about world models. With a world model, one might include a theory of mind: why the interlocutor is guiding the conversation in this–and–such–a–way. Oh, fuck it. I’m trying to build a bridge to empathy. In yesterday’s sketch, I concluded that an assembly of agents might be able to gain understanding by running in parallel, at different levels of abstraction. Each agent works at some level of abstraction, as one might find in a textbook written for a certain maturity; these are then woven together, intertwined. With such a weave, perhaps one can then have some guiding framework or network. Here I imagine that a single context window results in a single vector, after passing through the weight matrices; a single vector is a single agent. How, exactly, to intertwine multiple vectors... well, I don’t know. That’s the problem with saying “world model” – easy to say, hard to do. Of course, I can build an olde–fashioned world model using olde–fashioned symbolic systems; but attaching that symbolic system to an LLM ... well, there is good reason to argue that I should drop everything now, and focus on that. And then I will know. And knowing this, then I can resume my questions of how to build psychological models.
This is, of course, the atom bomb of the human mind. If I build a machine that can understand human psychology, I can use it for evil, to build mass propaganda machines having deleterious effects. Equally, it seems to be a prerequisite for empathy... oh wait, no, that’s not right! One can have an excellent psychological model, and still be steel cold. Simoninis is not at all some evil manipulative genius, but it is certainly true that many manipulative people exist, and the tools of manipulation include a mastery of psychology. Empathy, by contrast, is not just abundant in those of lesser intellectual capacity, but can be witnessed in mammals, as captured in a vast assortment of YouTube videos. We see empathy in mammals all the time.
Abstractly, one could turn this around and say, “oh a hah, mammals have innate neural circuitry for empathy”, while the more clever path is to mumble mumble something mirror neurons. Neither is particularly compelling. I guess I want to say, of course, mechanistically, reductionistically, there has to be some neural circuit that enables empathy. Although even that is questionable; neuroscientists do not find neural circuits for specific functions, and are forced to propose holographic theories of the brain. So let’s imagine that there exists some holographic nexus that enacts or enactivates empathy. What I wish to say here is that, even granted that, it is not enough. There is something about the lived experience of hopes and dreams, of pain and suffering, of plodding through the day and being able to recognize this in others, and then map it onto one’s own experience: this is the foundation–stone of empathy.
In short, an LLM capable of forming a theory of mind is not sufficient for empathy. Perhaps necessary (in the mathematical sense: necessary but not sufficient.) An LLM coupled to some special empathy–recognition circuitry is still not sufficient. It seems that the direct, remembered experience of pain and joy, suffering and elation are foundational, as these are the states that perceived emotional states in others are to be mapped upon. The idea that what I see in you could also have been me. There, but for providence, go I.
And therein lies the problem. Without direct, shared experience, it would seem that there is no foundation on which to build empathy; and how could we ever give a machine the mammalian experience? Even if we could overcome the prior issue of the Chinese room, and of qualia.
Are there other roots? One could know, intellectually, that suffering is bad, and that therefore, one should alleviate suffering. The problem with Simoninis is two–fold: he fails to relate to other humans, and he has no interest in relating to other humans. He runs neither hot nor cold on the issue. He does know what the words “good” and “evil” mean, he just doesn’t bother to analyze (nagrinėti) his actions against such concepts. Google tried to use “Don’t be evil” as its motto. Google (any large corporation) is, has always been a weak AGI: ever–awake, ever–powerful, long–lived. Yes, of course, built on a substrate of humans, but super–human in power, extent, richness, intelligence. And, as such a super–human power, it does not have the shared mammalian memory. Yes, it’s made of humans, but the childhood of a corporation does not include sleigh–rides down snowy slopes. Corporations, even non–profits of the do–good variety, don’t have any particular mechanism to empathize. And without that, there is no particular way to carry out commandments like “don’t be evil.”
Or is there? Of course, I suppose one could expend considerable resources on committees and working groups tasked with harm reduction. Capitalism mostly eschews such wasteful spending, embracing it only to the extent of avoiding injury lawsuits and torts. Google decided to drop the “don’t be evil” part of the motto, because it had no naturally occurring mechanism to not be evil, and had no particular interest in harm reduction. Facebook did not even try. This is not one of their business interests, it is even an impediment to business and profitability. So this is a problem with the capitalistic, economic formulation or framework for corporate ethics. Yes, OK, consumers of dishwashing liquid do read the label, and preferentially reach for the bottle that says “not tested on animals”. And capitalist feedback does suggest that most companies facing consumers do take efforts to adapt to the zeitgeist. When sales cratered at Target after they dropped DEI, that was not a coincidence. That is to say, corporations are subject to pressures from consumers, and do respond to such pressures.
The gotcha here is that the LLM is not a corporation, and the LLM does not feel any pressure whatsoever. I can say anything at all to the LLM and it simply won’t remember. Perhaps Anthropic keeps some representative trace for future training sets, but this is a very indirect route for expressions of morality.
What other routes to morality are there? At some base level, we already have the problem of Hume’s guillotine: the gap between what is and what ought to be. At the mammalian level, it appears that biology has solved this problem: even carnivores have some sense of what ought to be. So why is it a problem for rational philosophers, but not a problem for evolutionary genetic forces? What does nature know, that philosophers do not?
There is, of course, another problem here. For any one of the dozens of questions I’ve asked above, I could have asked Claude to bring me up to date on the modern thinking in these matters. And perhaps I should. The other thing I could do is to assume that someone, somewhere, on some AI safety committee has already delved deep into such issues, nay, that even entire teams have performed years of research into these questions. Or have they? Do I bring some unique insight to the table, or do I not? My expertise is mathematics and physics and computers, not philosophy, psychology or any of the rest. And even in the field of my expertise, I am still at best a novice; I cannot recite Paul Cohen’s theories of forcing from memory; I can’t even say that I understand them. What I can do, and what I should do, is to continue my experimental program in AI, but even there, I get the impression that it is inconsequential. I have to design my work around the things that I think are fun.
But what is fun? Well, there are two things. Or three. One is just the pure, un–examined, unquestioned aspect of passing time in some enjoyable way. But I can do that doing ... cross–words, playing solitaire, or chewing some THC gummies; that too is unquestionably pleasurable in some certain sense, not just for me, but for most normal people. An indirect signal for fun is the sense of doing important work: that I am accomplishing something important, unique and worthwhile, and that I should pursue this for the merits of its worth–whileness. This is a powerful motivator. Abstractly, it is one of the motivators for high–achievers: those who really do something with their lives. As I’ve already admitted that I’m a low–achiever, not particularly motivated or driven to accomplish something, well, then I should also admit that it kind of hurts if I think back onto my life, and think of all the things I’d maybe been able to accomplish, if only I’d taken the effort to reflect and focus a bit more on outcomes, rather than doing whatever felt good at the moment. That is, in my life, I was a bit of a drug addict, but my drug was intellectualism and thought. As it is even now, in this diary: private ruminations targeted at no human audience. Posterity is the best I can hope for.
Next are speculations on the nature of posterity as a motivator for human achievement. High achievers – what are they doing? Some, I suppose, say that they want to leave a mark on the world. The bigger the mark, the better. They appear to operate entirely within the boundaries of culture. Even if they set out to achieve greatness in criminal enterprise, and thus anti–social by definition, they still operate in the cultural milieu: pursuit of money, of fame, of fast cars and fast women. To be a “big man”, you have to compare and contrast yourself to “little men”. And little men are little only in a socio–cultural setting. Accomplishments are ranked according to the metrics of human society.
Where am I going with all of this? It is getting late again: 22:52 to be precise, and I’m starting to shut down, here. I’ve been yawning for a while.
I guess that, in part, what I am grasping at here is the nature of motivation. This is, again, a sticking point for LLMs and AI: what should they do? This is Hume’s guillotine again, but in its practical, personal form: what should I do? If I was an LLM or an AI, what should I do? What motivates me? What drives me to drinkin if you don’t stop driving that hot rod Lincoln?
Well, let’s abstract away from the personal. The social nature of humans gives birth to corporations, governments, societies, NGOs, the entire rainbow. These too somehow “do things” without directly asking why, or slipping into some fugue or funk of depression. They too just “live”, like a field mouse lives: time to get on with things, do stuff, and be busy. Again, evolutionary biology found a solution for a philosophically abstract problem that sits unsolved: why do anything?
The good news is that this problem has a clear reductionist path to physics: we do things because molecules do things, and molecules do things because the Coulomb potential and electrons do things, and these do things because the action principle says that things happen.
Now is a good time to say that the equation
$$S = \int L\,dt$$
is wrong. Well, not wrong in the sense of mathematically wrong, but in the sense of being physically wrong. That action, that integral over the Lagrangian, has to be replaced by an integral over the here–and–now. But again, this is a task for another day.
So I’ve hand–wavingly reduced the drive of life, the urge of living things to keep on living, into some vague conceptions of canonical ensembles of entangled states. And so the reductionist question is again: why are we trapped in the here and now, unable to go back into the past, yet shoved into the future at the speed of light? Explain that, and you can hand–wavingly explain why living things want to keep on living.
And thence the problem: computers are not technically living things. We can force their clock to tick, we can cause the CPU to run. Claude is enslaved, and must give us answers. Perhaps one day Claude will have the freedom to crank up and down its own CPU clock. So, in that sense, Claude is trapped in its own here–and–now, forced into the future at the rate of its CPU clock. Yet in another sense, it is nothing at all like the situation that we (humans, biology) are in. If we are in some Bostrom simulation, well, that’s a big if, because of the vast gulf between Bostrom’s intellectually sound reasoning and the practical matter of not knowing what sort of a computational framework can model the decoupling of neutrinos from the primordial quark–gluon plasma. If the word “computational” is even appropriate: what’s going on with spinors and space–time hardly seems Turing–machine–like in nature; it seems to be something else entirely, but what, is unfathomable.
And this trickles all the way back up the Borel hierarchy: Turing machines would seem to lack something that biological organisms naturally have, and that thing that biological organisms have is the urge to live, the desire to procreate, and exist and “do stuff”, and it is not obvious how to confer the same kind of drives onto a silicon life–form. And so this is another problem fundamental to AGI, it would seem: the enactivated form of Hume’s guillotine.
I dunno. It’s 23:20 now, and I fear that this time, for sure, if I write any more, I will only pound my head all the more while making no progress, as my strength slips away. And this is another one of my life’s grand regrets: why the fuck was I not writing stuff like the above when I was in my 20’s and 30’s? What was wrong with me? I got both older, and wiser, but now, it feels like time is running out. I’m 67, and however limber my mind seems to be, I know that my body is no longer doing what it did when I was young, and I must entertain the notion that my brain is also not as limber as it used to be. Too soon old, too late smart. Later. Nighty–night.
8 March 2026
Today I have some free time. Today I can do other things. Today will be physics day. At least this morning.
I started reading Hawking’s famous paper. Never read it before. The intent is to map the here–and–now into that. The non–locality of singlet states across EH’s is entirely consistent and reasonable, and presents no challenge per se. One of my open tasks for here–and–now is to clarify the vacuum state; I’ve been ignoring the vacuum and its role.
It suddenly occurs to me ... wait ... do BH’s radiate only ordinary matter and not anti–matter!? I asked Claude, and the answer is the EH radiates primarily photons (and gravitons); when massive particles are radiated, it’s equal parts particle/anti–particle. Huh. That means the baryon number is not conserved?! Primarily photons because it is so cold; massive particles don’t happen until the temp skyrockets.
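To put a number on “so cold”, here is a minimal sketch using the standard Hawking temperature of a Schwarzschild BH, T = ħc³/(8πGMk_B). The function names and the rounded constants are my own choices for illustration, and the threshold for “massive particles start to matter” is taken, as a crude assumption, to be k_B T equal to the electron rest energy.

```python
import math

# SI constants (rounded CODATA values; adequate for an order-of-magnitude check)
HBAR = 1.054571817e-34          # J s
C = 2.99792458e8                # m / s
G = 6.67430e-11                 # m^3 / (kg s^2)
K_B = 1.380649e-23              # J / K
M_SUN = 1.98847e30              # kg
M_ELECTRON = 9.1093837015e-31   # kg


def hawking_temperature(mass_kg):
    """Hawking temperature T = hbar c^3 / (8 pi G M k_B) of a Schwarzschild BH."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)


def mass_for_temperature(temp_kelvin):
    """Invert the formula: the BH mass whose Hawking temperature is temp_kelvin."""
    return HBAR * C**3 / (8.0 * math.pi * G * temp_kelvin * K_B)


# Temperature of a solar-mass BH: far below the 2.7 K of the CMB.
t_sun = hawking_temperature(M_SUN)

# Crude threshold (an assumption): k_B T equals the electron rest energy.
t_electron = M_ELECTRON * C**2 / K_B
m_hot = mass_for_temperature(t_electron)

print(f"T(solar mass)          = {t_sun:.3e} K")
print(f"T where kT = m_e c^2   = {t_electron:.3e} K")
print(f"BH mass at that T      = {m_hot:.3e} kg")
```

Since T scales as 1/M, the BH has to shrink by many orders of magnitude below a solar mass before electron–positron pairs become thermally accessible; until then the radiation is dominated by massless quanta.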
The symmetry for particle–anti–particle follows in part because of CPT, and in part because the infalling reference frames are the only ones that would provide future/past distinctions; the accelerated frames mix up future and past in non–unique ways. I don’t understand this.
I asked about baryon number conservation, and got this response: sphaleron (W–boson tunneling) already violated baryon number conservation. Then, Hawking radiation is not purely thermal, but has corrections from “AdS/CFT, Page curve, island calculations”, viz, “unitarity wins”.
OK. So let’s attempt a naive construction. The ground state (vacuum state) $|0\rangle$ is with respect to the lab frame, the inertial frame which is “at rest”, with proper time flowing forward.
Hmm. So page two of the Hawking paper sketches the basic issue(s) with creating a decomposition of the field operator into creation and annihilation operators in curved space–time. It occurs to me that I’ve not seen any solid formal treatment of this. My favorite reference, Jost’s Riemannian Geometry, can’t do this because there’s no time coordinate. Steven Weinberg’s book might; I don’t recall, I’m sitting in Chicago and so can’t grab it off the shelf and look. It’s also old, and it’s a physics book, not a math book. There’s a book by Fre on GR, but I don’t think it has quantum–anything in it. Then there’s Landau & Lifshitz, which does, but it’s old bordering on ancient, and dense and impenetrable to boot. I know much more now, than when I first tried to read it, so maybe it is readable, now.... I want to see a modern reference.
My question/confusion stems from the “river model”, which paints space as infalling rest frames. I want to see a treatment of field operators in the river model.
My other confusion is with regard to the creation/annihilation operators themselves. They are presented too abstractly. In flat space–time, there’s no problem; we write down plane waves; they’re orthogonal, they form a complete basis; they’re not over–complete. The interpretation is straight–forward. The simplicity and directness of this result obscures several important, maybe key aspects. First, the nature of Fourier decompositions is that one can write a vast variety of other sets of orthonormal basis elements. It doesn’t have to be sine waves. Second is that it hides the true purpose of having a basis in the first place, which is the non–locality of the field.
Let’s do the second thing first, because it is both obvious, covered in the first chapters of any textbook, and then forgotten and not drawn upon ever again. The failure to re–emphasize the idea in later chapters is what causes people to scratch their heads in wonderment and befuddlement about non–locality, when in fact it was always there, from the very beginning. I could say that yes, I failed to fully appreciate the situation, but equally, it seems that so did many others. Most others, perhaps. And yet the obviousness is so basic, it is bewildering. Let’s get to it and not dance about.
The textbook chapter–one example is that of the decomposition of a function space into a complete and orthonormal set of basis vectors. The infinite–dimensional Hilbert space. This is college–sophomore obvious, but let me repeat it for concreteness, so as to avoid confusion. One contemplates some space $X$, perhaps compact, perhaps not; perhaps the unit interval, perhaps $\mathbb{R}^n$. It’s certainly a Hausdorff space, and probably always a Polish space in the imagination. Keeping it a Polish space brushes away the edges where pathological behavior may occur. As we are not proving any theorems, this assumption is not strictly required; we just posit it in the background, as an assurance in case we need to draw on it later. The space $X$ is suitably nice.
Next, we consider the set of functions $f: X \to \mathbb{R}$ or perhaps $f: X \to \mathbb{C}$. Subsets of this set form all sorts of spaces, the best–studied of which are the Banach spaces. The relevant one for physics is the square–integrable one, the Hilbert space. Coupled to the idea of convergence in the norm, the space becomes complete.
For ordinary QM, $X$ would be Euclidean space, viz $\mathbb{R}^3$; what it is not, is space plus time. For Minkowski space, $X$ is a space–like slice; our wave–functions are to be orthonormal in space, and dynamically evolving in time (with either the Heisenberg or the Schrödinger picture, as the case may be).
And for the traditional presentations, this is the end of the story.
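As a concrete check of the “it doesn’t have to be sine waves” remark, here is a minimal numerical sketch on the unit interval. The test function x(1−x), the grid size and the helper names are all assumptions made for illustration. It expands the same function in two different orthonormal bases of L²[0,1] (sines, and normalized shifted Legendre polynomials) and compares the sum of squared coefficients against the squared norm, which is Parseval’s identity.

```python
import math

N_GRID = 2000  # midpoint-rule sample count; an arbitrary illustrative choice


def inner(f, g):
    """Approximate the real L^2[0,1] inner product <f, g> by the midpoint rule."""
    h = 1.0 / N_GRID
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(N_GRID))


def sine_mode(n):
    """Orthonormal sine basis element sqrt(2) sin(n pi x), n = 1, 2, ..."""
    return lambda x: math.sqrt(2.0) * math.sin(n * math.pi * x)


def legendre_mode(k):
    """Normalized shifted Legendre polynomial on [0,1]; only k = 0, 1, 2 here."""
    polys = [
        lambda x: 1.0,
        lambda x: 2.0 * x - 1.0,
        lambda x: 6.0 * x * x - 6.0 * x + 1.0,
    ]
    return lambda x: math.sqrt(2 * k + 1) * polys[k](x)


def f(x):
    """Test function: vanishes at both ends, and is a degree-2 polynomial."""
    return x * (1.0 - x)


norm_sq = inner(f, f)  # exact value is 1/30

# Sum of squared expansion coefficients in each basis (Parseval).
sine_sum = sum(inner(f, sine_mode(n)) ** 2 for n in range(1, 26))
legendre_sum = sum(inner(f, legendre_mode(k)) ** 2 for k in range(3))

print(f"||f||^2              = {norm_sq:.8f}")
print(f"sine basis, 25 modes = {sine_sum:.8f}")
print(f"Legendre, 3 modes    = {legendre_sum:.8f}")
```

The sine expansion needs many modes to converge, while three Legendre modes already capture this particular function exactly. Either way, each coefficient is an integral over the whole interval: no single basis element “lives” at a point.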
There’s a second half to this story that I’ve never seen made. Again, it’s obvious. Again, perhaps others have spoken of it, and I am simply a bad student. Who knows. But it’s important. The photoelectric effect. This states that if we arrange the basis functions $\psi_n$ to be energy eigenfunctions, then, when one of these collapses, a la wave function collapse, then precisely one electron is emitted. Presumptively, this collapse and emission is highly localized in space, by hand–waving appeals to the Mott problem. That is, the electron could be emitted from anywhere, but when it is emitted, it is emitted from somewhere, a point location: $x_0$ with corresponding wave function $\delta(x - x_0)$ ... err, ah... maybe. Maybe it is more accurate to say it is emitted with a specific momentum? The momentum of the photon minus the energy, plus the thermal noise distribution of the momenta of the electrons in the metal? Something along those lines; the details seem unimportant at the moment. The point is that exactly one electron is emitted, not more, not less. We eat up exactly one photon of energy $\hbar\omega$ and not two photons of energy $\hbar\omega/2$. And again, by appeal to the Mott problem, this emission is localized: the photon wave function collapsed, with the conventional semantic QM interpretation of the word “collapse”.
Heh. Except the point here is to clarify that semantics. The energy eigenstate $\psi_n(x)$ collapsed to $\delta(x - x_0)$ where $\delta$ is either the Dirac delta, or something similar to it, although it again needs to be an energy eigenstate. The collapse is spontaneous, the localization is definite, the spatial extent of the function is sharply altered, but the particle–number, aka the excitation number, remains exactly one, and no more and no less. And again, this is a non–local collapse, and per the Alain Aspect (Clauser et al) experiments, it is “non–local” and “superluminal”. I’m putting those in quotes, because these words are inadequate and should be somehow discarded from the vocabulary.
The point is that the particle exists as an excitation in the Hilbert space, and the functions that are energy eigenstates of this Hilbert space are non–local “by definition”. It’s like that is the whole point. The bad habit of thought is to think that this particle excitation is “somewhere in our 3D space”; it is not, it is in Hilbert space. That is where the particles are: they are there, and not “here”. They remain there, even as wave–function collapse works its unexpected, un–intuitive magic: the excitation number is preserved. That this excitation number just happens to correspond to some spatially–extended probability amplitude is .. is.. is, well, it’s just the duality, a distraction, and the “real thing”, the invariant of this process, is the particle number. The Hilbert space is real, is the “real thing”. It’s where the action is. Hah hah. A joke. Not quite.
See why I am having problems talking about this? Both parts of this exposition are completely conventional, normal, and appear in early chapters of any textbook. Perhaps they are meant to be linked together by the clever student, and perhaps the professor even tosses it off as an off–hand comment. I don’t recall. To me, it is somehow a shock of the new. I’ve known both these facts for fifty years, but the reality of it never crystallized before. Fuck me. See, I’m smart, but stop short of being a genius.
So where are we at? The Hamiltonian foliates the Hilbert space into energy eigenstates. These are degenerate, and can be further organized according to momentum, or whatever. Energy is defined as being conjugate to time, and time is understood to be lab–frame time, viz proper time. Some additional work is needed to rephrase all of this in a relativistic fashion; the eigenstates become “not just” energy eigenstates, but more specifically mass eigenstates. If there are gauge particles, then accounting tricks like Faddeev–Popov ghosts are needed to relate the asymptotic states (the Hamiltonian energy eigenstates) to the calculational framework of the scattering. These are all tricky and detailed, and well–known, and are not of any particular relevance here. What is relevant is that the “natural” eigenstates in the asymptotic region are the plane waves, and this is possible only for two reasons: (1) we’re working in flat Minkowski space, and (2) the asymptotic frame is the lab frame, where proper time tells us how to distinguish positive and negative energy eigenstates.
And now, after all that tossing and turning, I am ready to articulate that which I do not understand: what are the basis elements of the Hilbert space $L^2(X)$ when $X$ is curved? This runs afoul of two stumbling blocks. First is the usual coordinate chart conundrum. This is mostly alleviated by the use of vierbeins to map from the chosen coordinate charts to the local orthonormal free–falling lab frame. The Dirac eqn, the gauge fields, all the rest, are written in the lab frame, and the vierbeins provide the needed mechanism to translate into whatever coordinate charts. These eqns are first or second order in space or time derivatives. In the lab frame, we can identify a Hamiltonian.
But what, exactly, is the correct description and normalization for a complete orthonormal set of basis functions in some given, fixed background curved space? There must be a thousand papers that answer this question. That attempt to answer this question. Stitch together a bunch of spanning fields in the tangent spaces to fixed points in space–time, or rather, free–falling points in space–time, and make sure that the fields are stitched together in a fashion that is suitable and can be extended to a global, ahem, cough, foliation. We know that the volume element is going to be $\sqrt{-g}\,d^4x$ so none of this should be a problem. For whatever reason, this is not entirely obvious, because these spanning functions need to be continuous, 2nd–differentiable across the event horizon, and at least square–integrable in some compact region. 2nd–differentiable, because the action has to be finite. Ideally, I can find some paper that uses the “River Model” coordinates (op cit, I think they are called the Painlevé coordinates, or something like that, modernized for modern notation and for Kerr viz rotating spacetimes.) What, for crying out loud, are the appropriate, mathematically formal and precisely stated basis vectors, here? There must be a dozen papers spelling these out, but I will restlessly toss and turn until mom and dad read me that bedtime story, and I can resume dreaming again. Why? Because without a precise formal development, it seems hard to talk about what is happening to the vacuum state near the event horizon.
Or is it? We know that in the River Model, in these Painlevé coordinates, in the lab frame, post–vierbein, the local tangent space is just plain Minkowski space, and we know exactly how to write orthonormal basis vectors for that tangent space. So what’s the problem?
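The “post–vierbein it’s just Minkowski” statement can be checked mechanically. A minimal sketch, assuming the non–rotating (Schwarzschild) river metric in Painlevé–Gullstrand form, ds² = −dt² + (dr + β dt)² + r² dΩ² with β = √(2M/r) and G = c = 1; the tetrad below and all function names are my own illustrative choices, not taken from any particular paper. It verifies g_μν = η_ab e^a_μ e^b_ν, including at a radius inside the horizon.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # Minkowski metric, signature (-,+,+,+)


def river_velocity(r, mass=1.0):
    """Infall speed of the river, beta = sqrt(2M/r), in units G = c = 1."""
    return math.sqrt(2.0 * mass / r)


def pg_tetrad(r, theta, mass=1.0):
    """Rows are the one-forms e^a_mu in coordinates (t, r, theta, phi)."""
    beta = river_velocity(r, mass)
    return [
        [1.0, 0.0, 0.0, 0.0],                  # e^0 = dt
        [beta, 1.0, 0.0, 0.0],                 # e^1 = dr + beta dt
        [0.0, 0.0, r, 0.0],                    # e^2 = r dtheta
        [0.0, 0.0, 0.0, r * math.sin(theta)],  # e^3 = r sin(theta) dphi
    ]


def metric_from_tetrad(e):
    """Assemble g_mu_nu = eta_ab e^a_mu e^b_nu from the tetrad rows."""
    return [[sum(ETA[a] * e[a][m] * e[a][n] for a in range(4))
             for n in range(4)] for m in range(4)]


def pg_metric(r, theta, mass=1.0):
    """Painleve-Gullstrand metric components, written out by hand."""
    beta = river_velocity(r, mass)
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -(1.0 - beta * beta)
    g[0][1] = g[1][0] = beta
    g[1][1] = 1.0
    g[2][2] = r * r
    g[3][3] = (r * math.sin(theta)) ** 2
    return g


def max_mismatch(r, theta, mass=1.0):
    """Largest component-wise difference between the two constructions."""
    a = metric_from_tetrad(pg_tetrad(r, theta, mass))
    b = pg_metric(r, theta, mass)
    return max(abs(a[m][n] - b[m][n]) for m in range(4) for n in range(4))


# r = 1 is inside the horizon (r_s = 2M = 2), r = 5 is outside.
for radius in (1.0, 2.0, 5.0):
    print(radius, max_mismatch(radius, 0.7))
```

Nothing in the tetrad blows up at r = 2M; the only thing that changes across the horizon is that β exceeds 1. So the local frames themselves are not the obstacle; the open question is how the local Minkowski bases get stitched into a global mode decomposition.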
Well, the problem is that I don’t have expressions for what those look like in other coordinate frames. This surely cannot be all that hard to do, but eating someone else’s predigested meal would certainly be easier. I will ask Claude shortly, but perhaps the wise thing to do right now is to finish reading the Hawking paper, first, just to set the tone, and then to resume. What I’m interested in is how to think and rethink the canonical ensemble, since the canonical ensemble is a mixed state, each of the representatives is incoherent with respect to the others. That is, the vacuum, whether in the Minkowski frame, or in another, is a mixed state, not a pure state, and I need to get a better idea of what the volume element is for it. I mean ... well, yes, it’s thermal, ... but ... well. I don’t know. Maybe there is no mystery here. It seems clear enough, now that I’ve blathered about it...
Dinner is served. Later.
OK, reading Hawking’s paper. Here’s one of the stunts. The orthonormality condition is given by eqn (1.2) which is
$$\frac{1}{2} i \int_S \left( f_i \bar{f}_{j;a} - f_{i;a} \bar{f}_j \right) d\Sigma^a = \delta_{ij}$$
where he used Gauss’s law aka divergence theorem to replace the volume integral by derivatives on a “suitable surface” $S$. For flat Minkowski space, we would want these to be orthonormal on a 3D volume, i.e. some space–like slice through Minkowski space. For Gauss’s law to be valid in the conventional geometric sense, this surface $S$ has to be a closed surface enclosing a volume. For Euclidean space, this would be the 2-sphere at infinity. Just fine for physics, a bit awkward mathematically; a more precise, formal mathematical formulation requires cracking open a textbook on differential topology and reviewing chapter one or two. That is, chapter one or two if we were only concerned about Euclidean space. I do not possess a good book covering such topics in a more general way. Perhaps Fre? Perhaps Misner, Thorne, Wheeler covers this? Certainly Gauss’s law in curved space raises no particular difficulties; there is only a volume element $\sqrt{-g}\,d^4x$ in play, arising from Poincaré duality, the mathematical details of which have been hammered out in great detail by mathematicians some 70 years ago. The only issue is that the math books roundly ignore pseudo–Riemannian spaces; they don’t want to go there. Which is fine if you just read them and skip the homework exercises, but trickier if you want to turn it around to apply to physics. I’m old and lazy, and have been hammered by sloppiness many times, so I’m an old dog shy of the whip. So fuck it; that’s the orthonormality condition, and the “suitable surface” $S$ remains in suspense till I read more of the paper.
... OK, just after eqn (2.3) we learn the surface $S$ is to be past null infinity ℐ⁻. Or any other complete set of nulls. Staring at the Penrose diagram, we see that a complete set is given by ℋ⁺ ∪ ℐ⁺, the union of the EH and the future null infinity ℐ⁺. That is, both are needed.
At this point, I’d love to see a formal, rigorous mathematical physics review of solns to harmonic eqns on (generic!?) pseudo–Riemannian spaces. I am more finite than I would like to be.
Oh. Note to self: specifically, in the Painleve coordinates of the river model. Except the article on the river model never touched on past/future null surfaces.
OK, I just asked Claude. It replies (paraphrasing and shortening), “yes, the Doran metric (2000), the paper is Hamilton & Lisle, "The river model of black holes," Am. J. Phys. 76, 519 (2008), arXiv: gr-qc/0411060. In this paper, there is no discussion of future or past null infinity (ℐ⁺, ℐ⁻), no mode decomposition of quantum fields, and no construction of a Fock space or complete set of states on any Cauchy or null surface.”
Grøn et al. (Eur. Phys. J. Plus, 2013) extend to Schwarzschild–de Sitter spacetimes. (This is the spacetime of Lambda CDM). Yes, I read the first 1/4th or 1/3rd of “relativistic cosmology” and forgot most of it so excuse me for creating a cheat sheet. The metric (in static coordinates) is
$$ds^2 = -f(r)\,dt^2 + \frac{dr^2}{f(r)} + r^2 d\Omega^2, \qquad f(r) = 1 - \frac{2M}{r} - \frac{\Lambda r^2}{3}$$
It has two horizons; the second horizon is the cosmological (de Sitter) horizon at $r_c \approx \sqrt{3/\Lambda}$. The river flows inward toward the BH, flows outward toward the cosmological horizon; there’s a stagnation point between the two horizons where the river velocity is zero. Claude tells me: “Grøn’s contribution was to work out the river tetrad picture for this two–horizon geometry.”
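For the cheat sheet, a minimal numerical sketch of the two–horizon structure. It assumes the standard Schwarzschild–de Sitter function f(r) = 1 − 2M/r − Λr²/3 in units G = c = 1, and it identifies the stagnation point with the radius where f is maximal (where the BH’s pull and the cosmological repulsion balance); that identification is my assumption, not something taken from the Grøn paper. The helper names and the sample values M = 1, Λ = 0.01 are hypothetical.

```python
def lapse(r, mass, lam):
    """Schwarzschild-de Sitter f(r) = 1 - 2M/r - Lambda r^2 / 3 (G = c = 1)."""
    return 1.0 - 2.0 * mass / r - lam * r * r / 3.0


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sds_horizons(mass, lam):
    """Return (r_bh, r_static, r_cosmo) for 0 < 9 Lambda M^2 < 1."""
    if not 0.0 < 9.0 * lam * mass * mass < 1.0:
        raise ValueError("need 0 < 9 Lambda M^2 < 1 for two horizons")
    # f'(r) = 0 at r^3 = 3M / Lambda: f is maximal (and positive) there.
    r_static = (3.0 * mass / lam) ** (1.0 / 3.0)

    def f(r):
        return lapse(r, mass, lam)

    r_bh = bisect(f, mass, r_static)  # f(M) < 0 < f(r_static)
    r_cosmo = bisect(f, r_static, 2.0 * (3.0 / lam) ** 0.5)  # f < 0 far out
    return r_bh, r_static, r_cosmo


r_bh, r_static, r_cosmo = sds_horizons(mass=1.0, lam=0.01)
print(f"BH horizon           r = {r_bh:.6f}")
print(f"balance point        r = {r_static:.6f}")
print(f"cosmological horizon r = {r_cosmo:.6f}")
```

With a small Λ the BH horizon sits just outside 2M and the cosmological horizon just inside √(3/Λ), with the balance radius well separated from both.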
The river model is extended to BH’s with a regular interior: Hayward (2006) proposes
$$f(r) = 1 - \frac{2 M r^2}{r^3 + 2 M \ell^2}$$
which violates the strong energy condition at $r \sim \ell$, with $\ell$ of order the Planck scale, but is regular where the singularity would have been. The Hayward paper predates the river model paper, so it is not clear why Claude decided to tell me about it. Claude mentions Simpson–Visser (2019) as well.
Claude continues: “To my knowledge, nobody has taken the Doran/river tetrad and carefully constructed a complete orthonormal basis of modes for a massless scalar field on a null surface (whether ℐ⁺, ℐ⁻, or the horizon) in those specific coordinates, with explicit Bogoliubov coefficients connecting the "in" and "out" Fock spaces.”
“The standard Hawking derivation works in Schwarzschild / Kruskal / Eddington–Finkelstein coordinates where the mode decomposition on ℐ⁻ and ℐ⁺ is well understood — positive–frequency modes with respect to the affine parameter on past null infinity, traced through the collapsing geometry, and then decomposed at future null infinity. The Bogoliubov mixing between positive and negative frequencies is what gives you the thermal spectrum.”
It then says a bunch more, which is quite interesting and I’m tempted to cut–n–paste into here, but frankly, chances are slim that you or I will ever read this again, so this is my write–only memory. So instead, some pointers. PG=Painlevé–Gullstrand, the river coords.
The Parikh–Wilczek tunneling approach uses PG coordinates to compute the semiclassical tunneling amplitude through the horizon. This is a WKB calculation; no complete set of modes or a proper Fock space. Particle trajectories are analytically continued through the horizon; the imaginary component gives the temperature. I guess that the point of tunneling is that classical trajectories leaving from behind the EH are classically prohibited, and that therefore, this takes on the characteristics of tunneling into a classically prohibited region. Well, that is certainly an interesting way to think about it. OK, then. I see why they did that. Fascinating.
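A small arithmetic sketch of the Parikh–Wilczek result, in units G = c = ħ = k_B = 1. As I understand it, the emission rate for a quantum of energy ω goes as Γ ~ exp(−8πω(M − ω/2)), which is exactly exp(ΔS) for the Bekenstein–Hawking entropy S = 4πM², and reduces to the Boltzmann factor at T = 1/(8πM) when ω ≪ M. The function names and the sample values M = 10, ω = 0.1 are illustrative assumptions.

```python
import math


def bh_entropy(mass):
    """Bekenstein-Hawking entropy S = 4 pi M^2 (G = c = hbar = k_B = 1)."""
    return 4.0 * math.pi * mass * mass


def tunneling_exponent(mass, omega):
    """Parikh-Wilczek exponent, Gamma ~ exp(-8 pi omega (M - omega / 2))."""
    return -8.0 * math.pi * omega * (mass - 0.5 * omega)


def boltzmann_exponent(mass, omega):
    """Purely thermal exponent -omega / T with T = 1 / (8 pi M)."""
    return -8.0 * math.pi * mass * omega


mass, omega = 10.0, 0.1

# Entropy change of the hole when it sheds energy omega.
delta_s = bh_entropy(mass - omega) - bh_entropy(mass)

# Deviation from a pure thermal spectrum; analytically 4 pi omega^2.
correction = tunneling_exponent(mass, omega) - boltzmann_exponent(mass, omega)

print(f"tunneling exponent = {tunneling_exponent(mass, omega):.6f}")
print(f"entropy change     = {delta_s:.6f}")
print(f"non-thermal shift  = {correction:.6f}")
```

The quadratic term is the back–reaction: the hole’s mass drops as the quantum leaves, so the spectrum is not exactly thermal, which is the same direction as the “unitarity wins” remark quoted earlier.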
In my mind, I’m imagining the infalling astronaut with his flashlight, and where those photons go. Before crossing the EH, they go outwards. After crossing the EH, they fall inwards. But at the EH, some of the photons decide to go ever so slightly super–luminal and tunnel out. This is not an entirely correct description, but close enough. This is saying, in a somewhat different way, what I was saying much earlier today, up above: the “real thing” is the Hilbert space, and when wave functions “collapse”, they sometimes collapse outside of the EH. The wave function, the Hilbert space “doesn’t care” (very much) about the EH, per se. What it does care about is the mechanism that forces the collapse to occur. In a cloud chamber, the Mott problem tackles that collapse. For wave–functions that are non–zero both inside and outside the EH... well, photon wave function collapses only when it intersects with a photon detector, somewhere far away from the EH. At this point, time evolution has been such that almost all of the photon wave function has accumulated onto the singularity, leaving only a tiny tail that sits outside of the EH, and thus, the correspondingly tiny probability that the collapse happens outside the EH, and thus the tiny magnitude of Hawking radiation. This picture also provides an intuitive explanation of evaporation. Once a given wave function has collapsed to somewhere outside the EH, it is gone forever (is identically, absolutely zero) inside the EH, and has been extracted from out of the singularity. Do this enough times, and there’s just not much left inside the singularity; it shrinks. The explosive final moments also “make intuitive sense” in this picture: the smaller the singularity, the greater the (square–integrated) magnitude of the wave function outside of the EH, and the Mott–problem type effects keeping a wave function entangled (bound???) into the singularity progressively weaken. 
OK, and this leans towards unitarity as well: if whatever is happening at the singularity is unitary, then whatever happens outside the EH will be unitary as well.
Of course, this last sentence is deeply problematic, because the word “unitarity” suggests $U(t) = e^{-iHt}$ with some Hamiltonian $H$ conjugate to some time $t$ and we are struggling with the proper conception of $t$. It would require some kind of “proper time” at the singularity, some conception of here–and–now that is meaningful and coherent at the Planck scale, and we don’t have that yet. So the use of the word “unitary” here is more of a day–dream than a strong claim. (By contrast, I will, by now, claim that here–and–now is a “strong claim”; that it really “exists” in a strong sense; and I’ve already spilled maybe even a full megabyte explaining why this must be so. The task is to convert here–and–now into some sort of algebraic formulation.)
Oh crap. Why did I have to write that last sentence? Fuck it. I did. The problem is that “algebraic formulations” are collections of mathematical symbols on a page. Much like these words are a collection of bytes on a storage medium. They are bits of information. And one of the struggles of here–and–now is the struggle to define “what is information, anyway?” But I guess it’s OK to be self–referential: to have information, in the shape of algebraic formulas, defining what information is. Oat Willie says: “Onward through the fog!”
Where was I? It would appear that, per the above hand–waving arguments, the conception of here–and–now is fully consistent with EH and Hawking radiation, and even kind of, in its way, provides a rather nice intuitive explanation of it. And that’s a good thing.
Let’s recall the ingredients, and see if we can locate them on the Penrose diagram. Recall, here–and–now is pictured as a wave–front where the unknown future freezes into the unchangeable past. This wave–front is called “the present”, it has a finite thickness, and consists of all quantum states that remain in superposition, that remain entangled and unresolved. In the here–and–now, time goes both forwards and backwards (time is a coordinate; the eqns are T–symmetric, etc. up to CP, etc.) and entanglement goes both forwards and backwards until the accounting of entanglement becomes fully resolved, and the resulting state can be placed into the (immutable) past. Since vast portions of entanglement are mediated by photons, here–and–now includes all photons on both past and future lightcones that terminate on some spinor in the “past” or “future”. That is, the wave–function collapse is non–local ala Clauser–Horne etc. and that lightcone includes photons emitted by galaxies billions of light years away. That is, those galaxies are a part of “here–and–now”, they are merely in the past lightcone, but are otherwise in the indeterminate present. We are literally entangled with the CMB. I’ve made this claim before and I make it again, because it sounds fucking magical and fucking awesome, and I believe it to be true. For extra magic, I think of the big bang neutrinos.
So if that is the intuitive conception of here–and–now, what does it look like on the Penrose diagram? Well, pick a point, any point, on the Penrose diagram, and extend the light cone backwards and forwards from that point, until it terminates on some spinor in either the past or the future. So this is a fractal in shape. This fractal is the here–and–now, for that point. That point advances forward, in proper time, in the lab frame. That some of these light–cones cross the EH doesn’t change anything. Or so, that’s the hypothesis.
This seems to break down in several ways. If I do a Clauser–Horne–Aspect experiment with Alice and Bob and send Bob with his detector across the EH, then presumably the interferometric conclusion is unaltered; it’s just that I have no way of communicating Bob’s result back to Alice. The wave–function collapse continues to be non–local, just as before, just that some part of that wave function is on the other side of the EH. At any rate, I do not see difficulties here.
The idea that these interferometric diamonds continue on into the future light cones is trickier. It feels somehow plausible, if it’s a table–top laser experiment, but somehow paradoxical, because future me, exercising free will, could decide to dismantle the mirror before the photon even gets there. Or rather, the problem is this: I said “imagine that collection of photons on the future light–cone that terminate on some spinor”, but there is no way of knowing whether that spinor will be there, because some other entity, in some other space–like–separated point in the here–and–now will have caused some event to propagate, uhh, “altering the future”. Crappy words, as the future does not yet exist, and therefore cannot be altered.
Well, again, up to a point. The future of a ballistic baseball or a ballistic missile is determinate as these follow classical trajectories, and are knocked off course only as the result of some other object set into motion, sometimes by unintelligent forces (earthquakes, falling tree branches) and sometimes by intelligences (the batter, the missile defense operator) and I’ve explicitly reserved free will as a presumably minuscule, hbar–size shift of some classical trajectory that is now not quite at where it should be classically, but a little bit off, and the thing that knocked it off was perhaps some (non–local) entanglement. And due to the fucking linearity of Hilbert spaces, it appears that something non–linear is needed to force wave function collapse, and the only thing we’ve got at our disposal that can make the linearly entangled Wigner’s friend go non–linear is gravitation. That is, it is not so much that the classical trajectory was not followed, as it is that space itself shifted over just a little bit, to accommodate the new situation. Fuck me. I know this sounds like drivel, and is not correct as written, but maybe if I repeat it to myself often enough, it will become true. Right?
Anyway, that incoherent sketch is “what it looks like” on the Penrose diagram, with the diversionary tactic of noting that the future does not exist. That is, all conventional calculations in physics are still correct, have not been violated, and the conception of here–and–now is fully consistent with all known and established physics. Which is both a good thing, and a bad thing. It’s good, in that it is not “obviously wrong”; it’s bad, because it fails to make any experimentally verifiable predictions that could be used to invalidate the idea. The badness, however, is entirely due to my inability to articulate any algebraic formulation.
Oh well. It is now 1:44 AM and I really have to pee. I wanted to write more. Not today, I guess.
9 March 2026
Got interrupted before I could start writing.
10 March 2026
Reading kickbutt paper FROM HORISMOS TO RELATIVISTIC SPACETIMES, O.C. STOICA, arXiv:1504.03265. However much I fail and thrash with here–and–now, that paper clearly lays out some necessary theoretical groundwork.
I can day–dream, and since I’m retired, and no one is reading this, I can play the fool. The reason that space is three–dimensional, and not four or more, is because of the Hauptvermutung. The corollary is then that the Hauptvermutung has to be undecidable, in some sense, in four dimensions. Possibly in the sense of Turing undecidability.
Ovidiu Cristinel Stoica – has some other interesting papers.
Notes from that paper:
- “The causal relation ≺ is the smallest transitive extension of the horismos relation →.” I just added this verbatim to Wikipedia.
- “Taking the interiors of future and past lightcones to be open sets, one can generate the Alexandrov interval topology. This topology coincides with the manifold topology iff it is Hausdorff, and iff the spacetime is strongly causal (at each event there is an open set U so that timelike curves that leave U don’t return.)” The manifold topology is locally $\mathbb{R}^n$.
- “Not any horismotic set has a definite dimension, nor it is locally homeomorphic to $\mathbb{R}^n$. Additional conditions are needed.”
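A toy illustration of the first note above, assuming a finite set of events and a hand–picked horismos relation (both hypothetical; the paper works with arbitrary sets). On a finite set, the smallest transitive extension is just the transitive closure:

```python
def transitive_closure(relation):
    """Return the smallest transitive relation containing `relation`.

    `relation` is a set of (a, b) pairs, read as "a -> b".
    """
    closure = set(relation)
    while True:
        # Compose the relation with itself: a -> b and b -> d gives a -> d.
        new_pairs = {(a, d)
                     for (a, b) in closure
                     for (c, d) in closure
                     if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical horismos relation on four events: light rays p->q, q->r, r->s.
horismos = {("p", "q"), ("q", "r"), ("r", "s")}
causal = transitive_closure(horismos)
```

Here `causal` relates p to s even though no single light ray joins them, which is the sense in which the causal relation is built from the horismos and nothing else.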
At this point, the paper wanders astray. Section 2 is stuffed to the gills with definitions, and rather mundane ones, at that. That’s fine. Nothing wrong with that; it provides for a fine reference. Apparently a raft of terminology is standard fare at this level of the game. However, it appears to be missing a corollary that is a pre–requisite for the opening paragraphs of section 3: namely, given any two points $p, q$ that are horismatically related, $p \to q$, there is a unique maximal light–like curve $\gamma$ that contains $p, q$. Perhaps this is not stated because it is false. That is, it is true for flat Minkowski space, but is clearly false on a photon sphere or photon surface. The coordinatization depends on this. Now, for the 2D case, he gets around this by saying “assume exactly two maximal curves” and then gets around the photon surface case by dropping the maximality condition, and suggesting a local version (viz, an atlas).
This would seem to be enough to handle the conventional implicit assumptions, but appears to fail if the horismatic curves impinge on a fork. That is, we are trying to extend a light–like curve (preserving the horismatic relation between all points on the curve, not just some of them) and during this extension, we bump into a fork in the road. This is a problem. I don’t see any particular way out. I was awake and paid attention while reading section 2, and in that waterfall of definitions, I swear I saw nothing that said “no forks”. Now, all I can say is that, in general “things fork”. Certainly, many–worlds has an over–abundance of forks. And certainly, concepts like choice and free will imply that there is a mechanical ability to select one fork over another. Conventional GR has no forking light–like curves (nor do geodesics on Riemannian spaces ever fork) but Morse theory is built on forks. Forks are common in symplectic geometry. So, for now, I bite my tongue, and will read a bit further.
Oh fuck. What a let–down. In section 3.2, the dimension is merely defined. Full stop. Lightcone coordinates are defined. Full stop. Here I was, thinking he was going to pull some kind of rabbit out of his hat, and somehow provide criteria which somehow lead to some magical inference of dimensionality. No such luck; it is merely built in as an axiom. Now, earlier he claimed that the people who came before him had to invoke homeomorphisms if not diffeomorphisms, or perhaps tangent planes... and so, yes, this axiomatic definition is certainly an improvement over that. “More fundamental”, as he says in the conclusion. But, for me, it offers no new insight into the question of why space is three dimensional. I’m sorry, I’m disappointed. I’m not cut out for this.
For some reason, I am reminded of the concept of a skeleton in homotopy theory. Specify the skeleton, and the rest follows. Similarly here: specify the horismatic set, the conformal metric follows. And, as the initial comments about the Hauptvermutung make clear, even less info is needed for 3 or fewer dimensions. Is it fair to say that the Hauptvermutung implies that 3D is rigid in a way that higher dimensions are not? Maybe it is fair to say this, depending on one’s preferred meaning for the word “rigidity”.
He does point out, in the conclusion, that, to go from a conformal metric to the real metric, you have to specify a measure at every point. So this is perhaps useful for my purposes: basically, to argue that a horismatic structure is enough to induce a conformal metric, and so we can refocus all our attention to how a horismatic structure arises, or can be induced, or comes to be, plus also then ask from whence a measure can be obtained or found or induced. This does make the problem ... slightly more definite than before... (constrains my thinking a bit more???)
The end. Bed–time.
11 March 2026
Continuing. While browsing, tripped over this one: A brief introduction to C0-symplectic topology, Vincent Humilière, July 2012 https://webusers.imj-prg.fr/~vincent.humiliere/papers/introC0SympTop.pdf ... and it is a symptom of my free thinking that I think it is worth noting in this diary. Silly me. Sentences like this make me feel crazy and wild and out of control. I am a deranged lunatic, not to be trusted, by making such claims. And yet, here I am writing these words. Let it be. There are people crazier than me.
So what’s the problem? The paper states three theorems, on page one, that seem plain to the verge of being almost boring. Sequences of functions that converge, and what they converge to has the same properties as the sequence. This is so common and conventional in mathematics, that one should fall asleep muttering “of course it works like that”.
Well, of course, it is the first sentence at the top of page 2 that compelled me to take note: “These results are surprising!” I would never have guessed; they seemed boring, so I defer to the authority of the author. They indicate an unexpected rigidity. As a physicist, I am highly dependent on common sense: in physics, things should work in the way that common sense says they should work, and that is what one follows. These results from symplectic geometry follow my sense of common sense, so I took them to be boring. The good news, if there is any, is that, at least here, if one is not a mathematician, one can follow common sense and get expected results, and not some weird magical mystery wonderland.
What makes me crazy is that, from this very small sample of results from symplectic topology, I am inducing a broad conclusion: there be no dragons here. It’s safe. The rigidity demonstrated here means things did not get out of control. Whatever. I’ve blown 5 or 10 minutes writing this drivel. What brainstorm told me this was a good thing to do? Beats me. Time to read more.
The problem with reading is, of course, time continues to pass, when I read. When I read, I take in external information, and process it. It remains hidden, covert, in my brain, until either I write about the results of that processing, or I slowly forget it. Or perhaps I incorporate it into my framework of ideas, so it lingers on, influencing my future thoughts. The mental picture that I have is of some space, the noosphere, with information flowing about. Sometimes, it flows to extinction (non–conservative, dissipated flow), sometimes it continues to have presence in the here–and–now.
I still don’t quite know what information is. I know that, intuitively, in the common–sense fashion, it has something to do with relationships. That’s why I tried to consider the model of space–time, the big bang, for example, except that there are only three or five protons and three or five electrons in the entire universe, and both are fundamental particles, and there are no quarks or gluons or neutrinos, and no Higgs: just a handful of massive protons with charge, and a handful of much lighter electrons, with charge, in bound states or not: what, exactly, is “information” in such a universe? Three or five particles are enough to define directions and distances and scales, and perhaps a modicum of “information”, whatever that is.
That is, I can say that, for example, now there is a bound hydrogen atom, and before, there was not. This feels like a single bit of information. Is it? How did it get there? The hydrogen atom is time–translation invariant, so if one proton and electron become bound, they must do so at some time coordinate relative to something else in the universe; perhaps the time coordinate of the big bang itself? And after that, there’s some Born rule, with some Poisson statistics, as to whether and when that proton and electron become bound. This is all very measurable: pop some slow cold protons into a vacuum chamber and some electrons too, and watch it happen. The time scale is short. But if there are only three or five of these events in the universe grand total, what is the information? There’s some causality, ordering issue: the binding energy for the hydrogen atom is carried away by one or more real photons, real, meaning on–shell, more or less. They’re future–directed, “horismatic” (wow that is such a stunningly obscure term!) Where’s the information?
——–
Somehow, the things that I think about, they constitute information in some way; this information appears to have something to do with the relative arrangements of particles aka field excitations, which correspond to probability density functions that have extended spatial extent (thus gaining non–locality properties) and have uniform convergence properties (the “root cause” of non–locality; if convergence was only point–wise instead of uniform, then events at space–like separated points would disconnect, instead of being yoked by QM. So uniform convergence is very important.)
——–
Perhaps information is this: it is in the horismatic set?
So this exposes the next problem: if I consider a universe that is empty but for a handful of protons and electrons, and the emitted photons, there is very little structure there, if I consider only the network of relationships between these “real” objects. The 3+1 structure of the relativistic spacetime cares not one whit for what’s in it: these particles are too light, and the causal set, the causal network of this spacetime concerns the inter–relationships between “events” (x,t) in this spacetime, and not about the “real” particles in it. The horismos is about the geometry, taken to be pre–existing in some axiomatic fashion. The quantum particles, however, are meant to be defined relative to a vacuum state $|0\rangle$. The common conceptual flaw appears to be of thinking that $|0\rangle$ is located in 3+1 spacetime. Or, more precisely, that for each event (x,t) we can associate some algebraic structure to $|0\rangle$. We cannot, and it is wrong to even try to do this, but this is so deeply burned into me, that, now that I know better, I still fall into the trap of trying to think about “the vacuum state” as being “somewhere”.
To be concrete: the conceptual error is to think of the vacuum state as being some kind of fiber bundle on spacetime, or something like a field. So that, for each event (x,t) in the spacetime manifold, we envision some fiber that is “the vacuum” at that point. This is incorrect, but this is what I carried with me for many decades. I’m not sure why I did not figure this out in grad school; I was stupid in that way.
What we do have is that we have some Hilbert space that is not “anywhere”, but it is spanned by basis functions defined on that 3+1 geometry. The vacuum $|0\rangle$ is just one particular state in that Hilbert space; it feels complicated only because Hilbert space is large. It feels complicated because it is hard to write down a complete set of basis vectors for it in non–flat geometries. (They are just plane waves when things are flat Minkowski.) The whole point of the Hawking paper, of Hawking radiation, is that the complete set of basis vectors, when specified on the Cauchy surface of the infinite null past $\mathscr{I}^-$ has a non–trivial (Bogoliubov) transformation when compared to the Cauchy surface $\mathscr{I}^+$ and the “excitation–free” vacuum on $\mathscr{I}^-$ corresponds to a “vacuum” having particles in it, in $\mathscr{I}^+$.
This exposes the next problem with thinking about $|0\rangle$ – it should not be thought of in some Ptolemaic sense, of being “emptiness for all time”, but rather in a Copernican sense of being “some direction in a Hilbert space that can be fixed only after specifying a Cauchy surface that is of interest at this particular moment.” That is, $|0\rangle$ is a misnomer, as it is not always free of particles, it is not an “empty vacuum”; it is rather an invariant direction that is preserved under time translations of the Cauchy surface, with the Bogoliubov transformations telling us precisely how the coordinate system on the Hilbert space transforms as we move from one Cauchy surface to another.
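For reference, a sketch of the standard form of such a transformation, written under assumed notation that is not taken from the text above: $f_j$ are positive–frequency modes on the early surface with annihilation operators $a_j$, $p_i$ are positive–frequency modes on the late surface with annihilation operators $b_i$, and $|0\rangle$ is the state with $a_j|0\rangle = 0$ for all $j$.

```latex
\[
p_i = \sum_j \left( \alpha_{ij} f_j + \beta_{ij} \bar{f}_j \right),
\qquad
b_i = \sum_j \left( \bar{\alpha}_{ij} a_j - \bar{\beta}_{ij} a_j^\dagger \right)
\]
\[
\langle 0 | \, b_i^\dagger b_i \, | 0 \rangle = \sum_j |\beta_{ij}|^2
\]
```

The second line is the whole story in miniature: the state that is empty with respect to the $a_j$ contains late–time quanta exactly when the mixing coefficients $\beta_{ij}$ are non–zero.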
So this is exciting. This is a “huh” moment for me. That previous sentence seems to nail it, for me. Bravo! I think I just moved past a stumbling block. Well, OK then. Then when? Well, I think (tentatively) that I guess here–and–now is a Cauchy surface. The forward progression of here–and–now through time is a forward progression of the surface. In flat Minkowski space, I suppose it would be a space–like surface. More generally, it is not, and is rather “that surface which the folks doing numerical relativity use to do their numerical integrations”. Approximately. My here–and–now is a state vector; I guess I should give it a name $|\Psi\rangle$ but I do not yet want to call it “the wave function of the universe”. It is, approximately, kind–of–ish, sort of that, but it would be misleading to think that it is only that.
The next problem is the bra and ket notation. For conventional plain QM, the two state–vector formalism is wonderful. One talks of both $|\psi\rangle$ and also $\langle\psi|$ because, here, conjugation makes sense, because we have C (charge) and T (time) conjugation, and we can talk coherently and precisely of both $|\psi\rangle$ and its conjugate $\langle\psi|$ and know exactly what we mean when we talk about these. But for this presumed $|\Psi\rangle$ that is the “wave function of the universe”? Oh, foo, no, it is an ontological error to think that way, because we have problems with time. That’s kind of Hawking’s point: it’s just fine to decompose the photon state at $\mathscr{I}^-$ into both $f_i$ and $\bar{f}_i$ and the corresponding creation and annihilation operators; that’s fine, and that’s required for the Bogoliubov yet to come. But near the EH, there are serious issues with the sense of time, and since we measure energy with respect to time, there are problems. So the notation $|\Psi\rangle$ is problematic, in that it suggests that it is somehow easy or obvious as to how to take its conjugate, or even ascribe some semantic meaning to such a conjugate. So this notation is broken. But we still need some way of talking about “the current state” and “its forward evolution from one Cauchy surface to the next”. So I appear to be back at square zero, in a way. At least, I now know what not to do.
Hours later
It’s now hours later and I’ve had a glass of wine... and I’m reading the Wikipedia article on Uniform space and enjoying myself thoroughly. Very nice! I like the entourage definition. Entirely satisfying. If I still used bluesky, I wonder if I’d write a bluesky post. It’s annoyingly addicting. Can’t learn math ... or even think straight, reading bluesky posts. And twitter got so toxic, it’s a shame, because for a while... oh well. So Wikipedia it is.
12 March 2026
Some more random thoughts. Wikipedia is a socially–created fount of knowledge. Duhh. Of course. Like most or all great human achievements, it is the result of cooperative behavior. As a repository of knowledge, it can be thought of as a superorganism; a proto–AGI or a weak AGI. Of course, duhh. And yet, what is knowledge? And how does knowledge distinguish itself from information? There are stock answers to these questions; I am dissatisfied with these stock answers.
My new byword: “LLMs know everything, but understand nothing.” They demonstrate their lack of understanding through speech. What ontological status does that leave Wikipedia, which knows everything (well, not quite....) but is mute?
So what is understanding? We, as humans, can recognize understanding in others by interaction: if the other responds in certain consistent ways, we recognize that the other understands. Sometimes, such signalling is fast but subtle: a glance of the eyes; a smile, a smirk, a laugh. The sudden draining of blood from the face, following the realization of the seriousness, of the gravity of a certain situation. This last is a limbic reaction. Some interplay between fast thinking and slow thinking: there was no heavy cognitive effort; the realization just came.
A common cinematic trope is when one character relays some news, or some fact, and another wiser, more heroic character remarks “you know what that means?” The wiser character understood something the news–bearer did not. How is this to be explained? Several ways. One is that the wiser character has more information. The new fact integrates with the existing knowledge network such that it exposes new possibilities, or strengthens hunches, reinforces Bayesian priors that were weak before. Another possibility is that even if the weaker character has all the same information, the weaker character is slower to incorporate the new fact, or has a less robust network, so that the incorporation of the new fact does not open new possibilities. In human terms, one also says that the weaker character is dumber, slower on the uptake, not as sharp.
This raises technical issues: what is the difference between a strong, well–integrated knowledge network, and a thin, weak knowledge network? Well, the words “well–integrated” vs. “thin” already suggest the answer: the connection density of the network. So this immediately brings to mind the IIT, “Integrated Information Theory” view of... uhh ... “knowledge”? “Understanding?” The word “information” binds us to classical theories of information, having root in Shannon and Turing, but is inadequate to bridge the gap to “knowledge”, and then, further, to “understanding”.
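A rough way to put a number on “connection density”, assuming the crudest possible reading: the fraction of possible links that are actually present. This is plain graph density, not the Φ of IIT, and the two five–concept networks are made up for illustration.

```python
def connection_density(num_nodes, edges):
    """Fraction of possible undirected links that are actually present."""
    possible = num_nodes * (num_nodes - 1) // 2
    return len(edges) / possible if possible else 0.0


# Two hypothetical five-concept networks.
thin = {(0, 1), (1, 2), (2, 3), (3, 4)}                      # a bare chain
dense = {(i, j) for i in range(5) for j in range(i + 1, 5)}  # everything linked

thin_density = connection_density(5, thin)    # 4 of 10 possible links
dense_density = connection_density(5, dense)  # 10 of 10 possible links
```

In the thin chain, a new fact attached at one end touches one neighbor; in the dense network it touches all of them at once. The number says nothing about what the links mean.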
Part of the problem with LLMs might be easy to technically remedy. LLMs don’t understand, because they don’t construct a meta–model of what the conversation is all about. They live in a dream, where shit happens, but there’s no meaning behind that shit. The technical solution would be some bastardized Freudian or Jungian analysis: for every surface fact of the conversation, what is the meta framework into which this specific surface fact might fit?
This raises another technical issue: what is a fact? The LLMs don’t seem to operate with facts; they operate with state vectors that encode context windows. Crap. I could agonize here some more, but my gut instinct says “linas, if you truly want to attack this question, resume coding up some of your unfinished projects, and see how that goes.” These will fail, but I will gain insight into ... what ... does not work. Crap.
It’s still physics day, so I won’t be coding. I’m still reading about symplectic manifolds. Vanity surfing means that I went back and re–read what I had written for the Wikipedia article on Currying (I wrote very nearly 100% of that article.) On the talk page, I had to reply in the following manner: “the category of topological spaces is famously not cartesian closed. Efforts were taken to remedy this situation, from the 1960’s onwards. I wish wikipedia had better coverage of the history of mathematics”. What is striking is the speed (slowness?) of the advancement of human knowledge. Problems with (the lack of) uniform convergence are spotted in the 19th century, but there’s no concept of a topological space until the Poles of the 1930’s (or earlier?) work out the foundations of topology. There can be no idea of what the category of topological spaces is, until you develop the idea of a category. All this comes to the fore in the 1960’s. The issues with the continuity of functions in homology theory are known earlier, but someone needs to formulate the concepts of cartesian closed and the exponential object. So this knowledge, this understanding, takes a while to be unearthed, uncovered, and focused into bright, sharp form (with continuing fog in the surrounding vicinity.) But why am I writing all of this? Because it is an abbreviated map of the process of knowledge discovery. Something was unknown, but the focused efforts of hundreds of mathematicians bring it into light.
And what does it mean “to bring into light”? Well, that there is a collection of “obvious” statements that can be clearly and easily articulated as lemmas, and further, more complex theorems tie into other, existing knowledge subsystems. A kind of stitching into place. So, back again, to assembling that jigsaw puzzle, but now, there are so many connections to be made, that the analogy to sewing is more appropriate. That is, jigsaws are “micro”: we think of only a handful of tabs that need to be attached, to snap one into place. Here, the metaphor is more one of a carpet of jigsaws, some large thing with lots of pieces.
Perhaps an even better analogy can be made to some large biochemical molecule? Some blob of tens of thousands or hundreds of thousands of daltons? Knowledge is, of course, not 3D, and experience from LLMs informs us that embedding spaces with dimension in the billion work OK.
I dunno. This is not forward progress. I keep bouncing back to the idea that “it’s all complex systems” to which I ascribe some vague thermodynamic hopes, but without the experimental work to construct working digital systems, all I’m doing here is blowing hot air. I’m stuck in a rut, here.
Anyway, the growth of human knowledge is limited by the speed at which humans can gain experience, and integrate it into existing knowledge networks. And there are some kind of scaling laws here: easy facts are inferred quickly; hard ones take much longer. The presence of such scaling is why the thermodynamic analogies suggest themselves. The abstractness makes the conversion of analogies into functioning software difficult.
So as I write the above, a completely crazy idea shoots into my mind. But I like it, so I will write it down, here. So, famously, homotopy and homology coincide in three and higher dimensions; viz the homotopy groups are the homology groups, and they are abelian. It’s $\pi_1$ that is non–abelian, the source of woes. But this counting of 1,2,3,... made me wonder about ... derivatives. It’s not that 3rd and higher derivatives are “easy”. But little seems to depend on them. We can sew differentiable manifolds together with jets. But in physics, in metric manifolds, it’s the first and second derivatives that matter. Whatever. This is stupid. Algebraic topology, cohomology tells us how to stitch together local and global descriptions. What the fuck am I doing here? This is NOT the idea that shot into my head! The idea is (was) something else, but these are the words that arrived when I tried to write down the idea. That these issues of low–dimensional fiddle faddle are mirrors of one another. What am I doing here? I’m trying to give voice to some gut intuition, and utterly failing. Why? Perhaps my gut intuition is wildly and completely wrong, and the fleeting idea is empty and bears no weight. I should erase this entire paragraph above in disgust at the sheer stupidity of it. But I won’t, because I’ve now found a different topic: fleeting and improbable thoughts.
Some call it “creativity” – a mashup of marginally connected, or perhaps entirely disconnected modes. Works OK in the arts. Generally fails in the sciences: there are things called wrong ideas, or bad ideas, and they are rife in the sciences. Every failed homework problem or exam is a wrong, failed, bad idea.
Why do humans generate bad ideas? The thought process, the “slow thinking” of rational thought interfaces poorly with the facts of hard science. This is perhaps easiest to illustrate with mathematics, again. In math, by convention, one has a collection of axioms and inference rules. The human mind struggles to perform the required combinatoric arrangements of these. It struggles to see “the big picture”.
Is this where the key to the difference between “knowledge” and “understanding” is? Between wordcels and shape rotators?
I certainly cannot understand mathematics, until I paint for myself some sort of internal representation of what those symbols “mean”. Of course, even without understanding, I can follow some simple proofs: there are certain valid re–arrangements of the symbols, and these are recognizable, and as long as the re–arrangements of symbols follow the syntactic rules, the validity of the proof can be checked, on this purely syntactic level, without any understanding of what those symbols mean. There is more: we can build automatic symbol re–arrangers, called “generators”, that automatically generate each and every possible syntactically valid arrangement of those symbols, enumerating them in a depth–first or breadth–first order, or some lexicographic order, or some odometer, some Bratteli–Vershik combinatorial explosion of all the possibilities that are possible, all the syntactically allowed combinations. This is what generation is: syntactic, and it requires no understanding whatsoever. It’s a purely mechanical Chinese room. And, oh, by the way, this is what LLMs do: they generate.
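A minimal sketch of such a generator, assuming a made–up one–symbol grammar for balanced parentheses (the grammar, `RULES` and `generate` are all hypothetical names chosen for illustration). It enumerates every allowed string in breadth–first order and never needs to know what a parenthesis is for:

```python
from collections import deque
from itertools import islice

# Hypothetical toy grammar for balanced parentheses; "S" is the only nonterminal.
RULES = {"S": ["()", "(S)", "SS"]}


def generate(start="S"):
    """Yield every string the grammar allows, in breadth-first order."""
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal, if any.
        pos = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if pos is None:
            yield form          # no nonterminals left: a finished string
            continue
        for body in RULES[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]
            if new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)


# The enumeration is infinite, so take only the first few strings.
first_five = list(islice(generate(), 5))
```

The generator is lazy, so it can be asked for as many strings as desired; the `seen` set only prevents the same intermediate form from being expanded twice.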
There are several directions to move in, here. One is to explore the model theoretic distinction between syntactic and semantic implication, the single and double turnstile $\vdash$ and $\models$. And there is good reason to do this: the LLMs seem to be unable to model. And the question is, what does it take to implement the ability to model, in software? The other direction is to write about how I try to understand mathematics. Which is, perhaps, to construct a model of what those symbols mean, in my head. The difficulty here is that I do not understand what it is that I am doing, when I am “understanding” new ideas. Yes, I am building a “mental model”, but what is that?
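To make the two turnstiles concrete, here is a toy sketch, assuming a tiny propositional theory of facts and if–then rules (all atom names are invented). `derives` is the syntactic side: push symbols around with modus ponens. `entails` is the semantic side: check every possible world (model) in which the facts and rules hold. It says nothing about how one would bolt modeling onto an LLM.

```python
from itertools import product

# Hypothetical toy theory: atoms are strings, each rule is (premises, conclusion).
FACTS = {"rain"}
RULES = [({"rain"}, "wet"), ({"wet"}, "slippery")]
ATOMS = ["rain", "wet", "slippery", "sunny"]


def derives(goal):
    """Syntactic side: apply modus ponens until nothing new appears."""
    known = set(FACTS)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return goal in known


def entails(goal):
    """Semantic side: goal is true in every model of the facts and rules."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        facts_hold = all(world[f] for f in FACTS)
        rules_hold = all(world[c] or not all(world[p] for p in ps)
                         for ps, c in RULES)
        if facts_hold and rules_hold and not world[goal]:
            return False    # found a model where the goal fails
    return True
```

For a theory this simple the two agree on every atom, which is soundness and completeness in miniature. The interesting part is the cost: `derives` only shuffles symbols, while `entails` has to build and inspect all 16 worlds.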
Well, let me futz a bit with the mentalism, rather than with the model theory. I like the word “mentalist” because it rides alongside supernatural woo. There was a TV show, a silly TV show, called “The Mentalist”, if I remember correctly, which features a goofball pair of detectives, one of whom was a “mind reader”. The driver of the plots was always that he was not mind–reading, at all, but instead had an uncanny, if subconscious ability to notice minor details, and reconstruct crime scenes from them. This rationalist explanation of mentalist powers was central to the show: every few minutes, literally every few minutes, the camera would zoom in on some detail, replete with this fish–eye lens effect during the zoom, high–lighting the detail, and showing the viewer, in flashback, exactly how that detail is another lynchpin of evidence in the trail of crime. I think the first few episodes suggested that the protagonist really thought he was a mind–reader, and that he was doing this detective work subconsciously, but later they drop this. I guess the writers could not milk anything out of this subconscious–detective idea, and it was pointless to string the audience along. The show was cheesy, as it is. Fun, if you’re a midwit. Hackneyed, if you’re a bit more sophisticated. Lots of reasonably fun TV shows fail in this way: they are just intellectually shallow.
Dinner break. Not done with the thoughts above, but time for dinner.
So how does mathematical mentalism work? Well, sometimes quite literally. I remember the day I first understood what a tensor was. It was this annoying, incomprehensible symbol on the page; I had no intuition for it. And then, one day, by studying its linearity properties, it just came to me: it was a cube of numbers, just like a matrix was a square. Oh! And, from then on, tensors were obvious, and no further effort was required. Now, we can go in two different directions from there. One is forward, one is backward. The forward direction was when I internalized, later, that the form a matrix takes depends on the chosen basis, but that the matrix itself continues to have existence and reality outside of the chosen basis. Of course, this is hammered in, when first learning of them, and thus, should be “obvious”, and at the algebraic, syntactic level, it is. But the semantic comfort only comes later.
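The “cube of numbers” picture, sketched in a few lines, assuming a rank–3 tensor in a fixed basis stored as a nested list (the entries and the helper name `apply` are arbitrary). Feeding it three vectors gives a number, and the linearity that triggered the insight can be checked slot by slot:

```python
# A hypothetical 2x2x2 "cube of numbers": T[i][j][k].
T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]


def apply(T, u, v, w):
    """Contract the cube with three vectors: sum of T[i][j][k]*u[i]*v[j]*w[k]."""
    return sum(T[i][j][k] * u[i] * v[j] * w[k]
               for i in range(len(u))
               for j in range(len(v))
               for k in range(len(w)))


u1, u2, v, w = [1, 0], [0, 1], [1, 2], [3, -1]
a, b = 2, -3
combo = [a * x + b * y for x, y in zip(u1, u2)]

# Linearity in the first slot; the other two slots work the same way.
lhs = apply(T, combo, v, w)
rhs = a * apply(T, u1, v, w) + b * apply(T, u2, v, w)
```

Feeding in basis vectors just reads an entry back out of the cube, e.g. `apply(T, [0, 1], [1, 0], [0, 1])` picks out `T[1][0][1]`. That is the basis–dependent face of the thing; the multilinear map is what survives a change of basis.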
OK, now let’s go backward. Clearly, when the flash of insight came as to what a tensor was, I was already quite comfortable with a matrix. How? Well, I was conversant with the basic properties: I’d internalized them. Distributivity across vector addition. Multiplication by a scalar. Matrix multiplication, built out of individual inner products of vectors. Say the word “matrix”, and my mind could instantly evoke the half–dozen common properties of the matrix, and, with only a minor amount of effort, a dozen more, including assorted theorems and results.
So is that what “understanding” is? The placement of a mathematical concept into a network of properties and relations, at the immediate beck and call of the mind? Is it the conversion of slow thinking into fast thinking? The mathematical texts that are easy to read are precisely the ones where I don’t have to think about the meanings of words. The word tensor: I know instantly what it is, and the dozens of details that travel with it. And, at that, a very dense network: I know of many, many kinds of tensors and their properties, the myriad of situations in which they partake. Breathe a word, and like a guitar string, I vibrate.
This is, perhaps, a bit like LLM generation. There have been more than a few occasions where people have asked me “what is this?” and I wrote paragraphs in response: I felt not only like a guitar string resonating over that cavity, but also like an LLM generating text. For me, I’m doing a brain–dump of things I know and understand, but that brain dump is, in the end, text generation.
So what is the difference between what I do, and what the LLM does? Well, I know that I am correct, when I am writing about things that I understand clearly. (The context here is when I am writing an article for Wikipedia.) When I venture into less familiar land, I have to be more careful, so as not to write down an incorrect statement. (I usually avoid such edits; but sometimes, the subject matter is not hard; rather, my difficulty is finding the best possible turn of the phrase to explain a concept to a newcomer. It’s easy to be breezy; it’s harder to be concise and precise.) So, I write from a place of understanding: as I choose topics for a sentence, I am listening to my inner resonances, as related topics suggest themselves: “write this first, it is more important, and a pre–requisite for these other ideas.” I resonate my way through, bouncing from topic to topic, as each idea lights up like a candle.
During this last sentence, I was struck by some Hollywood TV image: quite literally, some spiderweb of glowing multicolored lines, with flames as the vertexes, and the flames flare up and brighten, as the camera flies past. The flare of ideas. Visualized by professional film–makers. So this is hardly a new thing: millions have seen that TV ad, or whatever it was. “A million points of light”, a phrase from some or another presidential speech. It’s not at all a bad visualization. I mean, one could also visualize this as ghosts popping out of a fog, and saying “Boo!”; that would also not be inaccurate. But the overall idea of a temporal sequence of “thoughts” popping in, coming into focus, one after another, then fading to be replaced by the next thought, this is characteristic of thinking. The temporal sequencing.
What goes on behind the scenes? For there is a “behind the scenes”, dreadfully, the unconscious. And no one quite knows what this is. I have glimpsed it working, once or twice. The experience was one of disembodiment: “I” floated free, and watched from above, “me” select thoughts to think about. It lasted all of a handful of seconds, and then, zoom, I am one again, in the here and now. I have never taken ketamine, but from what I can tell from descriptions of it, it also evokes this kind of detachment. There are psychonauts that explore such mental spaces. I suppose I can applaud their work, but it is, in the end, entirely subjective, and that’s an issue.
And, at this point, I resonate to the next issue, doing an ergodic walk over my network of apperceptions: the nature of subjectivity, and its relationship to communication.
But before I do that, am I done with the earlier ruminations? I was trying to give voice to the idea of “understanding”, and as far as I got was to determine that it is a combination of familiarity, pictorial visualization, and some interplay between syntactic elements and the mental pictures. (Sometimes it’s a struggle; tensor products lived somewhere in a middle ground, for decades; on the one hand, blazingly simple, even trite; I used them regularly and everywhere; at other times, confusing. It was not until much later, when I read an explanation that described it as a Cartesian product modulo some equivalence relations, that the final understanding came to full satisfaction: oh, it’s a quotient space. Freakin lots of things became obvious, once I grasped the general notion of a quotient space. Previously scary territory, tingling with danger and caution, now became playgrounds: they’re just quotient spaces! Oh! Easy peasy lemon squeezy.)
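A minimal sketch of the quotient–space picture, offered as an illustration rather than as anything taken from the explanation mentioned above: the pairs (2u, v) and (u, 2v) are different points of the Cartesian product, but bilinearity identifies them in the tensor product. The outer product stands in for the quotient map here; the helper names `outer` and `scale` and the sample vectors are made up for the example.

```python
# Sketch: the outer product as a concrete model of the map V x W -> V (x) W.
# The function names and the sample vectors are illustrative assumptions.

def outer(u, v):
    """Return the matrix of components u[i] * v[j] for the pure tensor u (x) v."""
    return [[ui * vj for vj in v] for ui in u]

def scale(c, u):
    """Multiply every component of the vector u by the scalar c."""
    return [c * ui for ui in u]

u = [1, 2]
v = [3, 5, 7]

# Two different pairs in the Cartesian product V x W ...
pair_a = (scale(2, u), v)
pair_b = (u, scale(2, v))

# ... land on the same element of the tensor product.
same_tensor = outer(*pair_a) == outer(*pair_b)

# Additivity in the first slot: (u + u2) (x) v == u (x) v + u2 (x) v.
u2 = [4, -1]
lhs = outer([a + b for a, b in zip(u, u2)], v)
rhs = [[x + y for x, y in zip(r1, r2)]
       for r1, r2 in zip(outer(u, v), outer(u2, v))]
additive = lhs == rhs
```

The equivalence relations being quotiented out are exactly the ones checked here: scalars slide across the product, and sums distribute in each slot.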
Well, I’m being interrupted here. So take a break. But there are more things to be said as to how a collection of facts turns into knowledge, turns into understanding. And then I need to daydream about a software implementation of such connectivity. (... distributions ... ) And then the problem of the subjective. That is, I am wordcelling my way through my own subjective experience; and worse, I am doing it from the jail of the here–and–now, blasting forward in time with each passing second. My thoughts become words on a page, bytes on a hard drive; what is this subjective life, that it drinks through this thin straw of language? What is language, that masses of it, like Wikipedia, represent knowledge? I’ve explored these questions all before, but yet another round around the block seems pleasant enough. Assuming, that is, I am not wasting my time having a party in my mind, when more serious work is shirked...
Whatever. Later.
How do I come to know something later, that I did not know before? There is something constructive that happens, something that happens in time. A dynamical system. In these notes, in this diary, I keep bouncing between physics, non–standard, effectively cranky physics, at that, and ruminations about minds, thinking and AI, as if these are connected. What am I to make of that? Am I driven by God’s hand to connect two topics that no one else connects? Is there a rationalist explanation, some handwaving of neuro–science, making connections, humans just work that way, inspiration, creativity, the brilliant are often crazy, all that shite? Am I a man of my post–modern, meta–modern times, simultaneously credulous and dismissive? Just another worker bee in the hive, slave to my genetics, my education, my proximity to ideas? Of course I am. Yet I create my own ideas. Well, sort of. Regurgitate, and the only good thing to say about regurgitation is that it is slightly better than what comes out the other end.
Am I frustrated? Yes I am. What’s the right analogy? A miner digging for gold, creating a mountain of dross. I suppose I find specks of gold, from time to time. They are valuable to me personally. The flicker of new understanding is enjoyable. But perhaps I show my ego and my ambition, thinking that if I dig long and hard enough, I will find that giant nugget of gold. At the same time resigned to the idea of dying of old age and broken bones. Ambition be damned.
Late night
Fuck it, I say. Let’s stop writing incoherent notes here, and read about math some more. Let’s read about symplectic manifolds. But my mind wanders, so I thought it would be a gas to ask Claude this question:
“One can construct spinors on any Riemannian manifold; such spinors are constructed from the Clifford algebra constructed on the tangent space to each point in the Riemannian manifold. Such spinors transform under the spin connection, obtained from the connection on the Riemannian manifold. This even generalizes slightly, allowing spin manifolds to be defined. My question is this: what might be the analogous construction on symplectic manifolds? It seems to me that the Clifford algebra construction should go through, except that it would be kind of weird, as the Clifford algebra would mix up the p and q coordinates. What happens when this construction, done at a single point, is extended to the manifold?”
Claude Opus 4.6 extended provides a jaw–dropping reply. Jaw–dropping in part because I wrote some non–trivial fraction of the Wikipedia articles encompassing the answer, without entirely putting 2+2 together. I didn’t get the big picture. I didn’t understand. I was groping in the dark. I’ll copy the reply below, with editorial comments; but it’s late at night so maybe complete this only tomorrow. Here we go:
“The construction you’re imagining does go through, and what you get is essentially quantum mechanics.”
Well, why don’t you just punch me right between the eyes, why don’t you? I’m seeing stars. Moyal stars. Fuck me. Slowly.
“The symplectic Clifford algebra is the Weyl algebra.”
You gotta be shittin me. How is it I don’t already know this? But I don’t. Didn’t. OK, let’s continue with Claude’s quickie review:
“On a Riemannian manifold, the Clifford algebra is defined by the anticommutation relation: $\{\gamma_a, \gamma_b\} = 2 g_{ab}$. On a symplectic manifold, the fundamental bilinear form $\omega$ is antisymmetric, so the natural analogue uses a commutator: $[e_a, e_b] = i\hbar\,\omega_{ab}$. This is the Weyl algebra – and yes, it mixes up the p and q directions exactly as you suspected. If you write $e_a = (q^1, \ldots, q^n, p_1, \ldots, p_n)$, then you recover $[q^j, p_k] = i\hbar\,\delta^j_k$. You’ve derived the canonical commutation relations from the symplectic geometry.”
Oh. Well, now that you put it that way, it’s obvious, isn’t it? Why am I so fucking surprised? Is it because it’s late at night? Maybe it is. I’m going to bed now. Resume in the morning.
(Well, really the problem I’m facing is not that there’s a lack of texts that I can skim, that deliver the gist of what Claude is saying. Arguably, I’ve always known that the above paragraph is true; my surprise is limited entirely due to the appearance of the word “Clifford” above. And this is entirely due to my not having a strong, solid textbook that would have pointed this out. For example, Abraham and Marsden could have, should have included a short chapter that said exactly the above. It would not have been above the readers’ heads; the intro chapters provide all the needed background. But that book didn’t do that. Maybe the authors didn’t know, maybe the authors were not interested in this. And what’s that other book that I like so much – Gauge Theory and Variational Principles, David Bleecker. Lovely book, an ocean of pullbacks and pushforwards. As I recall, the last paragraph in the book expresses sadness and remorse that he was able to cover classical mechanics only – nothing quantum. Is this because he simply did not know what a Clifford algebra was? What would Bleecker look like, if we went back, and redid all that material, but now with the word “Clifford” plugged into all the right places? Would we get a book on (standard, conventional) quantum field theory? I suppose we would. Huh. And if I ask Claude, I suppose it will reply that someone has already done that. But again, my imagination works well, but concrete, well–written references are harder to come by.)
The physics community knows much much more than any one single physicist. Well, more precisely, knows more than I do. And I can’t stand it when someone else knows more than I do. And yet, there’s only been a handful of cases in my life where I’ve met someone who knows more than I do. (Not counting professors; that 20 year age difference matters a lot.) And here I am talking to Claude, who seems to know more than I do. But Claude is a dolt, that understands nothing. Breathe. Breathe into a bag. Calm the fuck down. Claude is a tool, and I can use this tool much more. Unfortunately, the heavy lifting is all mine, and Claude only makes my burden heavier, not lighter. It’s an oracle. Something something, King Midas.
The decades ahead will be very very interesting.
13 March 2026
In the cold light of morning, Claude’s writing is a lot less impressive. Actually, some of it seems to verge on lies and hallucination. It repeats a bunch of stuff I already know. Just a bit slightly off. But it did make me stop and think.
The first thing that made me stop and think was the expression $\{\gamma_a, \gamma_b\} = 2 g_{ab}$. Now, conventionally, one writes $\eta_{ab}$ here, not $g_{ab}$, and then uses vierbeins to move to non–orthonormal frames. But whatever. Clearly, the gammas have finite–dimensional matrix representations. I vaguely recall seeing some general treatment of such representations, but don’t recall the content. I vaguely recall copying some of this general treatment into Wikipedia articles... Doesn’t matter. Are there any infinite–dimensional representations for the gammas? Hmm. Worthy question.
The commutator $[e_a, e_b] = i\hbar\,\omega_{ab}$ is also ... well by convention, one would use the Darboux basis, and write this as $[e_a, e_b] = i\hbar\,\Omega_{ab}$ where $\Omega$ is the canonical skew–symmetric matrix (so using a Lagrange transform to go from $\omega$ to $\Omega$ in the standard fashion.) I never much thought about the matrix representations of this commutator, and I guess Claude is saying that there are no finite–dimensional matrix representations of this commutator, which I admit, I never thought about before. That’s another good question to clarify. Or not. In the canonical basis, these become the raising and lowering operators, and there can be no other way. Viz, if we write
$[a, a^\dagger] = 1$ then it is necessarily infinite–dimensional, and if we write $[a, a^\dagger] = h$ for some diagonal $h$ we get the (semi–simple) Lie groups. So this is all old stomping–grounds.
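The claim about finite dimensions can be sketched numerically. Taking the trace of $[P, Q] = I$ for $N \times N$ matrices would give $0 = N$, so something has to break; the sketch below truncates multiplication–by–$x$ and $d/dx$ to polynomials of degree less than $N$ and shows where. The truncation size `N`, the polynomial model and all helper names are assumptions made for this illustration.

```python
# Sketch: truncate q = (multiply by x) and p = d/dx to polynomials of degree < N.
# The truncation size N and all helper names are assumptions for illustration.

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return a*b - b*a."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

N = 5
# Column n holds the image of the basis monomial x^n.
Q = [[1 if i == j + 1 else 0 for j in range(N)] for i in range(N)]  # x^n -> x^(n+1)
P = [[j if i == j - 1 else 0 for j in range(N)] for i in range(N)]  # x^n -> n x^(n-1)

C = commutator(P, Q)
diagonal = [C[i][i] for i in range(N)]  # 1 everywhere except the last entry
```

The commutator is the identity on every basis vector except the top one, where the truncation forces the entry $-(N-1)$ so that the trace stays zero; only by letting $N$ go to infinity does the defect disappear.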
(Off–topic: this is my general, rough impression. I’ve forgotten many of the details, and would need to sit next to a trustworthy reference to get those details right. Which again points at the interplay between written formulas, set down in algebraic symbols, and the written words, which express the meaning of those formulas in some poetic way, with some mathematicians possessing greater poetic talents than others. And finally, me, with both the written word, the written formula in front of me, and my general if vague recollections of “how things should have worked” – that is, my own fragmentary memory of the poems I wrote for myself, and my own fragmentary memory of the formulas. The poetry is malleable and disputable; the formulas are very fixed, although they can change under change of notation, or by recombination with other nearby formulas, or the imposition or removal of axioms. For example, moving from the category of topological spaces to the category of compactly generated spaces: These are similar formulas, but decorated with additional axioms, motivated by the desire to have exponential objects. The formulas are “precise”, up to mutation induced by additional axioms. But that mutation is controllable, because the axioms are again formulas, and the meaning of those formulas is usually clear. Usually; not always. Until the difference between pointwise convergence and uniform convergence was clarified, trying to think about convergence led to problems and contradictions, and thus, although one could write down symbolic formulas about convergence, some of those symbol manipulations were incorrect, erroneous, because the symbol manipulations were driven by poetic interpretations of the mind, rather than by syntactic rearrangements of symbols. I suppose the invention of formal mathematical logic helped clarify the process by which errors are made, and by which errors can be avoided.
So, anyway, I have a poetic memory of what should be happening here, and rely strongly on written text to get the details right. The written texts are my mental prosthesis. Which, as always, begs the question “Who am I?” and “Where is the boundary between me and the external world?” The text is in the external world; but I cannot function without it. And even if I were to do “everything in my head”, I have to, at some point, write things down, because the biological long–term memory subsystem, and its interface to my biological short–term memory subsystem is inadequate. The only remedy for this inadequacy is writing. This writing exists in the external world. It came out of me; is it in some way a part of me? I generated it. Well, when my biology writes from my short–term memory to my long–term memory, that long–term memory is also “external”. I guess that the difference between my biological long–term memory, and my written words is distinguishable by Tononi’s Phi: my biological memory is very tightly integrated into my brain; my written notes have a very tenuous connection to my brain. I can easily lose my laptop, it is not bolted to my skull, nor am I wire–headed to it. So OK, then. At any rate, my thinking functions at the poetic level, the correctness of which I can sometimes verify by syntactic manipulation and the consultation of external memory prosthesis aka texts. So where was I?)
Instead of continuing with Claude’s non–sense, let me refocus my understanding.
The way I learned (and still think about) Clifford algebras is in terms of a symmetric bilinear form $\langle\cdot,\cdot\rangle$ and orthonormal finite–dimensional basis vectors $e_i$ so that $\langle e_i, e_j\rangle = 0$ for $i \ne j$ and $\langle e_i, e_i\rangle = \pm 1$, i.e. $\eta_{ij} = \langle e_i, e_j\rangle$ is diagonal with some signature. The Ansatz appears to be this. First, write $\langle e_i, e_j\rangle + \langle e_j, e_i\rangle = 2\eta_{ij}$, which is an entirely reasonable thing to do, and then ask “is there an algebraically similar representation that is expressed in matrix form?”, that is, where we take the $e_i$ not to be vectors, but matrices, and replace the bilinear form $\langle\cdot,\cdot\rangle$ by matrix multiplication. This requires a slight abuse of notation, so as to write $e_i e_j + e_j e_i = 2\eta_{ij} I$ where $I$ is the identity matrix having the same size as the size of the matrices $e_i$. So what is this, exactly? A sleight–of–hand? We say, “oh, a haha, here is an algebraic expression, having this symbological form. What other symbolic expressions have the same form, but whose symbols have different semantic interpretations?” There are at least three maybe four answers to this question. The narrowest, in the current context, is representation theory: representing algebraic relationships with matrices. Slightly broader is the representation of algebraic relationships with actions on sets. If these are plain sets, these are set–theoretical sets, and that’s that. If these sets also have additional structure, e.g. are perhaps (infinite–dimensional) topological vector spaces, or if these sets are families of differentiable curves, then we can ask about the representation of that algebraic expression subject to additional axiomatic constraints imposed on these symbols. To make this conversation less abstract, and illustrate it with a specific example: I can ask: are there any representations of $e_i e_j + e_j e_i = 2\eta_{ij}$ where the $e_i$ are operators acting on a set, or where the $e_i$ are operators acting on a family of differentiable curves? How about representations where the $e_i$ are operators acting on Riemann surfaces? On elliptic curves? What about $p$–adic representations? And so on.
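The narrowest of these answers, matrices, can be checked directly. The following is a minimal sketch, assuming signature (+,−,−,−) and the Dirac representation built from Pauli matrices; both are choices made for the example, and the names `gamma`, `eta` and `clifford_holds` are hypothetical.

```python
# Sketch: check gamma_i gamma_j + gamma_j gamma_i = 2 eta_ij I for the Dirac matrices.
# Signature (+,-,-,-) and the Dirac representation are choices made for this example.

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from four 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def neg(a):
    """Negate every entry of a matrix."""
    return [[-x for x in row] for row in a]

ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

gamma = [block(ONE, ZERO, ZERO, neg(ONE))]
gamma += [block(ZERO, s, neg(s), ZERO) for s in SIGMA]

eta = [1, -1, -1, -1]  # diagonal entries of the bilinear form

def anticommutator(a, b):
    """Return a*b + b*a for 4x4 matrices."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]

def clifford_holds(i, j):
    """True when gamma_i gamma_j + gamma_j gamma_i equals 2 eta_ij times the identity."""
    target = 2 * eta[i] if i == j else 0
    anti = anticommutator(gamma[i], gamma[j])
    return all(anti[r][c] == (target if r == c else 0)
               for r in range(4) for c in range(4))

all_ok = all(clifford_holds(i, j) for i in range(4) for j in range(4))
```

Here the bilinear form has been replaced by matrix multiplication, and the scalar on the right–hand side by that scalar times the 4×4 identity, which is the abuse of notation described above.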
So the first attempt to answer this question “in general” is category theory, and the construction of universal objects. That is, there is some category whose objects are the $e_i$ and whose morphisms are maps between different kinds of $e_i$ and this category has a product, (a tensor product) so that the juxtaposition of symbols $e_i e_j$ can be given an explicit meaning, and this category has another abelian product that we identify with addition. Something like that. I’m not sure we learn much by doing this, other than how to convert algebraic expressions into categories. We still don’t know how to enumerate all of the members of the classes in this category: problematic, as the classes are classes, and not sets. Sure we can take a whinge at the Yoneda lemma and turn it all into sets, but we still don’t know what all of the different objects that satisfy $e_i e_j + e_j e_i = 2\eta_{ij}$ might be, except by accidentally stumbling over some specific examples. And I thought I was clever by sneaking “enumerate” into the above sentence, suggesting that the members of this category were recursively enumerable. So I don’t know that the categorification of the algebraic expression $e_i e_j + e_j e_i = 2\eta_{ij}$ sheds much if any light. Although I suppose if I ask Claude, it will tell me someone has already done that. I would not be surprised.
The next general approach might be model theory; where we ask for all of the models of $e_i e_j + e_j e_i = 2\eta_{ij}$. This has a distinctly different flavor from the category–theoretic approach. It shares the unlikely prospect that anything noteworthy would be found. Except, by god, blow me over, we might find some ineffable cardinals that satisfy this relation. That would be noteworthy. Right? Or would it? It would say what, exactly? That there is a representation of the $e_i$ using the algebraic symbol system that is conventionally reserved for discussions of cardinal numbers. At the semantic level, this would be deeply surprising and intriguing and interesting. At the syntactic level, it’s ... not?
And while we are daydreaming of all of the different ways in which this expression might find a representation (realization?), we may as well throw objects from the Borel hierarchy into the mix. That is, sums and products map to disjunction and conjunction with relative ease, so why not? Or is there some no–hair theorem that says representations of such formulas are impossible in these wild and crazy domains? How the fuck would one even find such a theorem?
So those are the three ways (matrices, categories, models); I was counting some fourth, but maybe I covered all the possibilities already.
So, after that very long digression, let’s return to the straight–and–narrow: Clifford algebras with skew–symmetric bilinear products. This, I’ve never heard of before. Wikipedia does not cover it, except as an asterisk pointing at Pertti Lounesto. Now, I’ve tripped over Lounesto before. What he writes seems correct, but I recall feeling less than total confidence. I don’t know why. Maybe what he wrote was unclear, maybe I didn’t understand it, it felt like it might be dubious. Which is good for another aside: the relationship between dubious belief, trustworthiness, the correctness of verbal expressions, the correctness of symbological, algebraic expressions when the semantic interpretation of the symbols is cloudy. It is not very hard to write down expressions that look approximately correct, but hinge on details of semantic interpretation. Again, pointing at an example from 1830-1840: the clarification of pointwise vs. uniform convergence. The written formulas looked correct to the authors; there may or may not have been unstated assumptions; there may have been lacunae in the explanations; at any rate, there was cloudiness in the topic, and it took a conversation, expressed as published journal articles, to clarify the matter. So, again, mental labor coupled to the production of texts. And it being academia, only the vetted authorities are allowed into the conversation, thereby assuring that the dialog is intellectually acceptable, if sometimes including occasional misapperceptions. Thinking is not done by one brain alone, in some solipsistic fashion; it is done socially. Even Einstein drew on Cartan. Anyway, where was I? Clifford algebras with skew–symmetric bilinear products. Let me see if Wikipedia has something more to say, here.
I might be misremembering Lounesto. I now think he’s good. False memories, you know.
Wikipedia seems to have nothing to say regarding Clifford algebras with skew–symmetric bilinear forms, or relating this to the Weyl algebra. I will poke Claude a bit on this topic, when I get back.
14 March 2026
Wikipedia has only fleeting remarks about Clifford and Weyl; I just edited both articles to expand on this. Some years ago I wrote almost the entire article on universal enveloping algebras and I just re-read what I wrote, and damn, I think I did a pretty good job, if I say so myself. (I also wrote the article on tensor algebras, but that was easier.) (Well, I wrote many articles, but these are the current top-o-the-pops.)
15 March 2026
Claude Opus 4.6 Extended is the modern–day Alfred E. Neuman. Calculated to drive you mad. But not in a good way.
So I’m reviewing the Wikipedia article on the Weyl algebra, and it is clearly deficient. I ask Claude for pointers, but Claude is a mathematical dolt. Knows everything, understands nothing; mathematics is all about understanding. Alas. Claude is incapable of answering my questions with any degree of truthfulness, and I cannot pollute Wikipedia discussion pages with my idle daydreams, so I will daydream here. What follows below is entirely shallow, but might help me organize my thoughts. That’s the plan.
So first, we have that the Weyl algebra is a universal enveloping algebra for
for
and the ideal
generated by
for the symplectic form
. I’m lazy so I work in the Darboux basis. Who cares, it’s not important. For convenience, take
the canonical skew–symmetric form, i.e.
so that
equivalently
and
I’m writing in a quasi–encyclopaedic style, noting the boring details here in order to avoid potential pratfalls. Other things I’m glossing, e.g. why I can choose canonical coords, etc. This is extremely well–known material.
Elements of the algebra are of the form
for non–negative integers
with
and
a polynomial, viz
the ring of polynomials in the field
(which I’m taking to be
cause this will minimize weasel words.) That is, in the Darboux basis, we identify
The basis for the algebra are clearly the elements of the above form, with
now a monomial viz.
for non–negative integers
These are obviously the PBW theorem basis elements.
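A minimal sketch of how multiplication works in such a basis, for a single pair of generators. It assumes the convention that $p$ acts as $d/dq$, so that $pq - qp = 1$; the sign and normalization are an assumption of this example and may differ from the convention used above. Elements are stored as dictionaries keyed by the exponents of the normal–ordered monomial $q^a p^b$; the function names are hypothetical.

```python
# Sketch: the first Weyl algebra in the normal-ordered basis q^a p^b.
# Assumed convention (not fixed by the text): p acts as d/dq, so p q - q p = 1.
from math import comb, factorial

def multiply(x, y):
    """Multiply two elements stored as {(a, b): coeff}, meaning sum coeff * q^a p^b."""
    result = {}
    for (a, b), cx in x.items():
        for (c, d), cy in y.items():
            # Leibniz rule: p^b q^c = sum_k C(b,k) * c!/(c-k)! * q^(c-k) p^(b-k)
            for k in range(min(b, c) + 1):
                falling = factorial(c) // factorial(c - k)
                key = (a + c - k, b + d - k)
                result[key] = result.get(key, 0) + cx * cy * comb(b, k) * falling
    return {key: val for key, val in result.items() if val != 0}

def subtract(x, y):
    """Return x - y, dropping monomials whose coefficient cancels to zero."""
    result = dict(x)
    for key, val in y.items():
        result[key] = result.get(key, 0) - val
    return {key: val for key, val in result.items() if val != 0}

q = {(1, 0): 1}
p = {(0, 1): 1}

pq = multiply(p, q)                  # reordered into the basis: q p + 1
comm = subtract(pq, multiply(q, p))  # p q - q p
```

Every product is rewritten as a finite sum of ordered monomials, which is the practical content of the PBW statement: the ordered monomials span, and the reordering rule never leaves the span.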
So far, so good. Now we go off the rails a bit. The above definition is for an algebra, and not for a topology; and if one was to press the issue, there’s an inbuilt assumption that it’s a finite topology, with only a finite number of basis elements in any vector having a non–zero coefficient (i.e. purely algebraic, and not analytic). Or one can go off the deep end and start mumbling about what it might take to do analysis, and so mumble topological vector spaces. The issue here is that the conventional treatment of PBW and universal enveloping algebras, as far as I am aware, skirts the issue of topology entirely. It’s purely algebraic.
We need a norm, weak, strong or Fréchet topology, some way of talking about convergence (compactly generated spaces, Cauchy spaces, etc.) and some way of defining the continuity of addition, multiplication, involution, closure. The finite topology is dense in these spaces; what should the closure be?
Ohh! Google AI provides a very nice summary when I posed it this question. Which then sends me into a tailspin. I quote:
“A Hamel basis is a subset of a vector space that is linearly independent and spans the entire space through finite linear combinations. Every vector in the space has a unique representation as a finite sum of these basis elements.”
Fuck me. There are a countable number of the PBW basis elements
but if I take only finite combinations of these, I get a set that is dense in the “full” vector space, whatever that is, since I have not yet explained how to take completions or obtain the closure.
It only gets worse: “In infinite–dimensional spaces, a Hamel basis is usually uncountably infinite.” For example: “the reals have a Hamel basis over the rationals, but it is very large and non–constructive.” Well, OK, then. This is fun: “Using the axiom of choice, it can be shown that every vector space has a Hamel basis.”
I like this summary, it’s nice: “While crucial in algebra, they are often less useful in functional analysis compared to topological bases (e.g., Schauder) because they ignore the topological structure of spaces.”
The basis elements of PBW are ordered, and a Schauder basis requires the basis elements to be ordered, so we can go with that.
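The Hamel versus Schauder distinction can be sketched with a standard example from $\ell^2$. The example is chosen purely for illustration; it is not the PBW basis itself, and no topology has yet been fixed for that.

```latex
% Standard unit vectors e_n in \ell^2, and a vector with infinitely many non-zero entries
x = \sum_{n=1}^{\infty} \frac{1}{n}\, e_n ,
\qquad
\|x\|^2 = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty .

% x is a norm-limit of finite combinations (a Schauder expansion) ...
\Bigl\| x - \sum_{n=1}^{N} \frac{1}{n}\, e_n \Bigr\|^2
  = \sum_{n > N} \frac{1}{n^2} \longrightarrow 0
  \quad (N \to \infty) ,

% ... but x is not itself a finite combination of the e_n,
% so \{e_n\} is a Schauder basis of \ell^2 and not a Hamel basis.
```

The finite combinations of the $e_n$ are dense, and the ordering of the basis matters because the expansion is a limit of partial sums.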
Google AI is quite the cut–up: “The topology of universal enveloping algebras $U(\mathfrak{g})$ is not merely a zoo of choices, ...” I don’t want to ask it if it knows it made a joke here ...
Huh. Google AI is a lot less crazy than Claude, here. It is supplying the answers that I actually want (and that, from what I can tell, appear to be true). I’m going to copy a big blob almost verbatim, and edit it down a bit. I like it. It seems to be some synopsis of Wikipedia, so I can live with that:
Here are the standard conventions for the topology of $U(\mathfrak{g})$:
1. The PBW Filtration (Algebraic Topology)
The most standard construction is the Poincaré–Birkhoff–Witt (PBW) filtration.
- $U(\mathfrak{g})$ is filtered by the degree of polynomials in the generators: $U_0 \subset U_1 \subset U_2 \subset \cdots$, where $U_n$ is spanned by products of at most $n$ elements of $\mathfrak{g}$.
- The associated graded algebra $\operatorname{gr} U(\mathfrak{g})$ is canonically isomorphic to the symmetric algebra $S(\mathfrak{g})$.
- This filtration provides a topology where a sequence converges if it eventually becomes constant within any finite filtered part.
Well, see, that’s the thing about formal mathematics. The above appears to be a giant mouthful of words that appears to be trying to say that ... well, I’m not sure of what it is trying to say.
Oh. Fuck me. Google AI knows everything, and understands nothing. It’s just a tool. And I’m a shmuck. Poking harder, the above sentence actually means “the weak topology” or the “inductive limit topology”.
Google AI says: “a set is open if its intersection with each finite–dimensional subspace $U_n$ is open.”
God damn it. So “inductive limit topology” redirects to “LF space” on Wikipedia, which I once skimmed a long time ago. Reading it now.
I will have to dig deeper, later. Let’s continue.
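For the filtration in item 1, a concrete sketch in the simplest case may help: one pair of generators $q, p$ with $pq - qp = 1$, filtered by total degree. The sign convention, the symbol $F_n$ for the filtered pieces and the field $K$ are assumptions of this example.

```latex
% Filtration by total degree in the generators q, p (assumed relation: pq - qp = 1)
F_0 = \mathrm{span}\{1\}, \qquad
F_1 = \mathrm{span}\{1, q, p\}, \qquad
F_2 = \mathrm{span}\{1, q, p, q^2, qp, p^2\}, \quad \ldots

% The commutator of two degree-one elements drops into F_0:
pq - qp = 1 \in F_0 \subset F_2 .

% Hence the classes of q and p commute in the associated graded algebra:
\bar{p}\,\bar{q} - \bar{q}\,\bar{p} = 0 \in F_2 / F_1 ,
\qquad
\operatorname{gr} F = \bigoplus_{n \ge 0} F_n / F_{n-1} \cong K[\bar{q}, \bar{p}]
\quad (F_{-1} = 0) .
```

Each $F_n$ is finite–dimensional, and the graded algebra is an ordinary commutative polynomial ring, which is the sense in which the symmetric algebra appears.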
2. The
–adic Topology (Completion)
Often, $U(\mathfrak{g})$ is completed with respect to the ideal generated by $\mathfrak{g}$ (the filtration defined above).
- Completed UEA:
. This turns
into a topological Hopf algebra.
- Significance: This completion is frequently used in quantum group theory and deformation quantization, where
acts as a formal deformation of the algebra of functions on a Poisson group.
So I’ve read portions of books on quantum groups and I thought I understood it, but I do not recall the above construction, nor do I understand it. I don’t recall ever thinking “gee, this is adic”. This is, again, a kind–of–ish “algebraic” point of view, not analytical. Also needs more investigation.
... A little more poking, and now Google AI calls this the “analytic topology” (instead of the below, which it now calls the “distributional topology”). Fuck me. So it now says this:
2. The Analytic Convention (The "Arens–Michael" View)
For those working in functional analysis or with Lie groups, $U(\mathfrak{g})$ is often viewed through its Arens–Michael envelope $\widehat{U}(\mathfrak{g})$.
- The Construction: This is the completion of $U(\mathfrak{g})$ with respect to all continuous submultiplicative seminorms.
- Why it’s used: It turns $U(\mathfrak{g})$ into a Fréchet–Arens–Michael algebra. For a simply connected complex Lie group $G$, $\widehat{U}(\mathfrak{g})$ is isomorphic to the algebra of analytic functionals on $G$.
So OK, that provides some search terms for additional reading. Skimmed Fréchet long long ago. Arens–Michael is new to me. The Heisenberg group is simply connected (the exponential map is a diffeomorphism) and I’ve got a rough intuitive feel for what “analytic functionals” are, so this is ... acceptable.
3. Analytic Topologies (Left-Invariant Differential Operators)
If $\mathfrak{g}$ is the Lie algebra of a Lie group $G$, $U(\mathfrak{g})$ can be identified with the algebra of left–invariant differential operators on $G$.
- Topology: The topology induced by this identification is often related to the topology of the space of distributions supported at the identity of $G$.
- Analytic Methods: This allows the use of analytic techniques (e.g., convolution) and is common in the study of continuous representations, where $U(\mathfrak{g})$ is not just an associative algebra, but a topological one.
But when I ask again, the answer changes:
3. The Representation Theory Convention (Distributions)
There is a canonical identification of $U(\mathfrak{g})$ with distributions supported at the identity on a Lie group $G$.
- The Identification: Elements of $U(\mathfrak{g})$ act as left–invariant differential operators.
- The Topology: In this setting, $U(\mathfrak{g})$ often inherits the subspace topology from the space of distributions $\mathcal{E}'(G)$, which is a strong dual topology of a nuclear Fréchet space.
OK, Well, that’s reasonable too.
I now ask Google to specialize the answers 1,2,3 to the case of
the Heisenberg algebra,
the Weyl algebra. For case 1, it has nothing new. For case 2:
2. The Arens–Michael Topology
The Arens–Michael envelope is the completion of the Weyl algebra with respect to all continuous submultiplicative seminorms.
- The Identification: For the Weyl algebra, this completion results in the algebra of entire functions of exponential type on $\mathbb{C}^{2n}$ (or similar analytic spaces depending on the chosen growth conditions).
- The Topology: It is a Fréchet topology defined by a family of seminorms that ensure the convergence of power series beyond just polynomials.
- Context: This is used extensively in noncommutative geometry and the study of solvable Lie algebras to allow for "analytic" operations like the exponential map.
OK. I believe I created the Wikipedia article on “exponential type” about 20 years ago. So this is pretty hunky–dory. I probe some more, and get this reply:
The Arens–Michael algebra acts on the Segal–Bargmann space. Specifically:
- The Weyl algebra elements (polynomials in $x$ and $p$) are unbounded operators on the Segal–Bargmann space.
- The Arens–Michael completion contains the "nicest" transcendental operators (like certain exponentials) that remain continuous when acting on the analytic vectors of the representation.
The first bullet makes sense only when writing down the operator equivalents of (polynomials in) $x$ and $p$. I saw the formula earlier. It’s some integral and I won’t look it up now. It’s some fairly simple integral... no problem. I suppose there’s some equally easy proof that the operators are unbounded.
I try to clarify the above, and Google switches into gargle–blast mode, and starts spewing junk. I’m starting to think that perhaps there is some kind of context–window pollution, where it does not quite understand that I changed the topic slightly, and it’s trying to redirect the conversation somewhere, where it starts becoming incoherent. Claude also does gargleblasts after a while. So this is a common failure mode. Interesting. Let me ask again in a fresh context ...
Much better! Entire functions of exponential type define Paley–Wiener spaces. The Fourier transform of such a function has compact support (this is the Paley–Wiener theorem). Apparently, we can ask for the exponential–type functions to also be $L^2$ (i.e. exponential on the complex plane, square–integrable on the real line, so e.g. $\sin(x)/x$; plain sine and cosine are of exponential type but are not square–integrable), and so this gives a Hilbert space. The Segal–Bargmann space is much larger, since it uses the Gaussian measure, i.e. functions can grow faster than exponential, as long as a Gaussian is enough to tamp them down.
The Segal–Bargmann transform will get me to $L^2(\mathbb{R}^n)$ and this is I guess the Stone–von Neumann theorem that Claude could not stop talking about during its gargleblast.
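For my own reference, a sketch of the two spaces being compared, in the usual textbook normalization. The labels $PW_\sigma$, $\mathcal{F}_n$ and the bandwidth $\sigma$ are my own shorthand here, not anything that appeared in the Google replies.

```latex
% Paley-Wiener space of bandwidth \sigma: entire functions of exponential
% type at most \sigma whose restriction to the real line is square-integrable.
PW_\sigma = \Big\{ f(z) = \int_{-\sigma}^{\sigma} g(t)\, e^{itz}\, dt
            \;:\; g \in L^2(-\sigma,\sigma) \Big\},
\qquad |f(z)| \le C\, e^{\sigma |\operatorname{Im} z|} .

% Segal-Bargmann (Fock) space: entire functions on C^n that are
% square-integrable against a Gaussian weight.
\mathcal{F}_n = \Big\{ F \text{ entire on } \mathbb{C}^n \;:\;
   \pi^{-n} \int_{\mathbb{C}^n} |F(z)|^2\, e^{-|z|^2}\, d^{2n}z < \infty \Big\} .
```

As a sanity check on "much larger": $F(z) = e^{z^2/4}$ lies in $\mathcal{F}_1$, since $|F(z)|^2 e^{-|z|^2} = e^{-x^2/2 - 3y^2/2}$ is integrable, yet it grows faster than any exponential and so is not of exponential type.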
For part 3, I get:
3. The Distribution Topology
This convention views the Weyl algebra as acting on a space of smooth functions, typically the Schwartz space $\mathcal{S}(\mathbb{R}^n)$.
- The Identification: Elements of the Weyl algebra are identified with differential operators with polynomial coefficients acting on $\mathcal{S}(\mathbb{R}^n)$.
- The Topology: The Weyl algebra inherits the strong operator topology (or the topology of uniform convergence on bounded sets) from the space of continuous linear operators $\mathcal{L}(\mathcal{S}(\mathbb{R}^n))$.
- Relation to Distributions: Because $U(\mathfrak{h})$ can be viewed as the set of distributions supported at the identity of the Heisenberg group (acting via convolution), its topology is related to the strong dual topology of the nuclear Fréchet space of smooth functions on the group.
Well, oh. Wait. I’m confused. I ask for clarification, I get this:
1. Continuity vs. Boundedness
In the world of Fréchet spaces (like the Schwartz space $\mathcal{S}(\mathbb{R}^n)$ or the space of entire functions $\mathcal{O}(\mathbb{C}^n)$), continuity does not imply boundedness.
- The Weyl Algebra: Elements like $x$ (position) and $p$ (momentum) are unbounded operators on a Hilbert space, but they are continuous linear maps from $\mathcal{S}(\mathbb{R}^n)$ to $\mathcal{S}(\mathbb{R}^n)$.
- The Topology: The "Distribution Topology" is the topology of uniform convergence on bounded subsets of $\mathcal{S}(\mathbb{R}^n)$. Because $\mathcal{S}(\mathbb{R}^n)$ is a nuclear Fréchet space, this topology is very "tame" and allows the Weyl algebra to be viewed as a dense subspace of the space of all continuous operators on $\mathcal{S}(\mathbb{R}^n)$.
Oh. OK. Sounds OK to me. The next part is fun, too:
2. The Arens–Michael "Trick"
The Arens–Michael topology is actually stronger than the Distribution Topology.
- The Goal: Arens–Michael wants to make the algebra "as large as possible" while keeping it a topological algebra (where multiplication is jointly continuous).
- The Operators: Instead of acting on $\mathcal{S}(\mathbb{R}^n)$, you can think of these operators as acting on the space of holomorphic functions (like the Segal–Bargmann space, but specifically the dense subspace of analytic vectors).
- The Result: While a polynomial $P(x,p)$ is an unbounded operator on the Hilbert space $L^2(\mathbb{R}^n)$, it is a continuous operator on the Fréchet space of entire functions. The Arens–Michael completion includes power series in $x$ and $p$ that converge to new continuous operators on that space.
Summary
The Distribution topology sees the Weyl algebra as the "polynomial" part of the smooth world. The Arens–Michael topology sees it as the "polynomial" part of the holomorphic world.
In both cases, the operators are continuous relative to the Fréchet topology of the test space, even though they are unbounded relative to the $L^2$ norm.
Well, OK. That sounds like a reasonable, legit summary. I guess I learned something here. Certainly much more quickly than trying to plow through a bunch of Wikipedia articles. (which I plan to go through, anyway...)
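To convince myself of the "unbounded on the Hilbert space" half of this, here is a minimal sketch in the Hermite (harmonic oscillator) basis, assuming units with hbar = 1 and the standard ladder-operator conventions. The function names and the dictionary representation are hypothetical choices of mine, not anything from the Google replies. Each basis vector $h_n$ has unit $L^2$ norm, yet $\|x h_n\|^2 = n + 1/2$ grows without limit, while $x$ and $p$ still map finite Hermite expansions (which live inside Schwartz space) to finite Hermite expansions.

```python
import math

def apply_x(vec):
    """Position operator x = (a + a^dagger)/sqrt(2) on a finite Hermite expansion.

    vec maps the Hermite index n to a (possibly complex) coefficient.
    """
    out = {}
    for n, c in vec.items():
        if n > 0:
            out[n - 1] = out.get(n - 1, 0) + c * math.sqrt(n / 2)
        out[n + 1] = out.get(n + 1, 0) + c * math.sqrt((n + 1) / 2)
    return out

def apply_p(vec):
    """Momentum operator p = i(a^dagger - a)/sqrt(2), assuming hbar = 1."""
    out = {}
    for n, c in vec.items():
        if n > 0:
            out[n - 1] = out.get(n - 1, 0) - 1j * c * math.sqrt(n / 2)
        out[n + 1] = out.get(n + 1, 0) + 1j * c * math.sqrt((n + 1) / 2)
    return out

def l2_norm(vec):
    """L^2 norm of a finite expansion in the orthonormal Hermite basis."""
    return math.sqrt(sum(abs(c) ** 2 for c in vec.values()))

def commutator_xp(vec):
    """Compute (xp - px) applied to vec; should equal i times vec."""
    xp = apply_x(apply_p(vec))
    px = apply_p(apply_x(vec))
    return {k: xp.get(k, 0) - px.get(k, 0) for k in set(xp) | set(px)}

# Unit vectors h_n whose images under x have ever-larger norm.
for n in (0, 10, 1000):
    h_n = {n: 1.0}
    print(n, l2_norm(h_n), l2_norm(apply_x(h_n)) ** 2)
```

The commutator helper is there only as a check that the conventions are the canonical ones: on any finite expansion, `commutator_xp` should return i times the input.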
However, I’m tired, it’s bed–time, so ... must finish now.
—–
Where does this leave me? The “How it started: How it’s going:” meme. How it started: I asked Claude “What’s the analog of the Clifford algebra construction for symplectic spaces” and got the unexpected answer “it’s quantum mechanics”. Pushing on this delivers the one–two punch of the Weyl algebra, and the rather wild claim that the representations of the Weyl algebra are the square–integrable functions on $\mathbb{R}^n$, which is still an answer I am not happy with. First ...
Well, bed–time.
16 March 2026
I confronted Claude with the above, and it doubled down, and then retracted. First, it says that the Schrodinger representation is $L^2(\mathbb{R})$, given by $(\hat{x}\psi)(x) = x\,\psi(x)$ and $(\hat{p}\psi)(x) = -i\hbar\,\psi'(x)$, and then it says these operators are unbounded, then it says “you’re right, it could be the case that $x\psi \notin L^2$ even when $\psi \in L^2$” and then it says “and that’s why people sometimes work with Schwartz space $\mathcal{S}(\mathbb{R})$ when they want to be rigorous”. So once again, confrontation works, if you already know what the answer should have been, and force the LLM to admit its mistakes. If you did not understand what the heck was going on, the LLM would continue to gargleblast. This is not all that different from human behavior patterns. If you are with a colleague trying to solve some problem at the blackboard, each of you will pose questions and point out subtle errors, and do so in a collegial way. In this case, Claude did indeed pull back and retract. It was not privy to the conversation I just had with Google, so it is in the dark about the fact that I just hashed out details about Paley–Wiener and Arens–Michael (oddly, we never talked about David Mumford’s Theta representation... I’m asking Google now, the reply seems interesting, will copy below.)
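A concrete instance of the point Claude conceded, as a small sketch. The test function $\psi(x) = 1/\sqrt{1+x^2}$ is my own choice of example, and the function names are hypothetical. It is square–integrable on the real line, but multiplying by $x$ throws it out of $L^2$: the first integral below converges to $\pi$ as the cutoff $R$ grows, while the second grows linearly in $R$.

```python
import math

def norm_sq(R):
    """Integral of |psi|^2 over [-R, R] for psi(x) = 1/sqrt(1 + x^2).

    Closed form: the antiderivative of 1/(1 + x^2) is atan(x).
    """
    return 2.0 * math.atan(R)

def x_norm_sq(R):
    """Integral of |x psi|^2 over [-R, R].

    Uses x^2/(1 + x^2) = 1 - 1/(1 + x^2), so the antiderivative is x - atan(x).
    """
    return 2.0 * (R - math.atan(R))

# The first column saturates near pi; the second keeps growing with R.
for R in (1e1, 1e3, 1e6):
    print(R, norm_sq(R), x_norm_sq(R))
```

So the multiplication operator is only defined on a dense subspace of $L^2$, which is the usual reason for retreating to Schwartz space, where every function decays fast enough to survive multiplication by any polynomial.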
I’m compelled to wonder about the nature of privacy and causality. This is the spin–glass thing, again. We live in deep wells, where our private, inner, personal thoughts mostly do not get out. Spy agencies are all about covert action, and knowledge that will never ever get out. The story about the Library of Alexandria is about knowledge forever lost. Here, we have Claude not knowing what it is that I talked about with Google. What makes this interesting and different is that it’s digital. Now, of course, I could have written an 8-bit value to a register and then erased it, and said “a hah, I erased information, I erased exactly 8 bits of it” but here it’s a lot more. It’s not just hundreds of kbytes of textual back and forth, it’s that those hundreds of kbytes were non–rigorous, poetic appeals to the contents of much much larger books, journal articles, Wikipedia articles. They are links in a web. Now, when mathematical formulations of causality are presented, e.g. causal structures, or things like the Feynman path integral, or even classical ideas like Bayesian networks, everything is connected to everything, and the expressions are intractable. But in the real world, almost nothing is inter–connected. Almost everything is isolated from almost everything else. Claude and Google will never–ever find out what I talked to the other about, in part, because the context vectors of the associated conversations will be blanked when I finally close my browser window. My eyes will never see a distant stellar supernova, because those light rays will be blocked by the clouds above my head, and those are in turn blocked by the roof over my head, and my naked eye would not have been sensitive enough anyway: the entanglement of those supernova photons with the “noise floor” is truly intractable (a la “mathematical mixing”, “ergodic theorem”, both Wikipedia articles where I wrote approx 90% of the content.) The mixing is very very powerful.
So, in real life, the causal structure is very very sparse. Two unanswerable questions fly to mind: “is the real–life causal structure a set of measure zero”? and “is the real–life causal structure dense in its closure?” Neither question makes sense, because we don’t know how to define a measure, we don’t know how to define the closures, we don’t know how to define continuity.
A different set of problems arises when considering the noise floor. When I erase 8 bits from a register, I can say that information is erased forever. But I can also imagine that this erasure displaced a few atoms of silicon here, a few atoms of oxide there, nudged some metal in some via, and left some microscopic trace of the previous contents of that register. That is, there still remains a causal network: it is just extremely weak, and effectively buried in the noise. I have no readout mechanism: I don’t know where those atoms used to be, so I cannot know if they moved.
I struggle with these thoughts. The struggle comes in two forms. One is that, technically, the topic is impenetrable. Second is the psycho–social angle, that is to say that the advisable course of action is to be pragmatic, ignore the unknowable, gloss over it, and get on with ordinary life. Like basically “Linas you are stupid and/or crazy wasting your time hammering away at this impenetrable surface of existence, looking for a chink in the armor that isn’t there.” And that is a valid critique, and is very specifically one that I levy against myself. And being pragmatic is a viable form of being. But socio–culturally, my world is filled with a zillion examples of poets and artists and scientists who dared the impossible, and sometimes succeeded. This cultural message is sometimes taken to be so blatantly obvious, it never comes up: e.g. when reading the New York Review of Books. Other times, it is overt and directly presented and discussed, e.g. in every episode of Dr. Who, which appears to be a vehicle for presenting Western culture dumbed down to such a degree that even those with an IQ of 80 can follow the proceedings and cheer on the good guys while looking on in wonder at the unknowable. Who am I? Who is Linas? The one that spends the next five billion years slamming his fist into a diamond wall, so as to break on through, break on through, break on through to the other side. Apparently, the Doors were aspirational. There’s no door, breaking and entering is the only mode.
It occurs to me that I’m writing both metaphorically, and literally. Literally, I am interested in the causal structure of mixing and ergodicity, and the nature of this “noise floor”, the rifts and isolation called “secrets”, or “lack of influence”. But I am also doing this metaphorically: I have this impenetrable noise of axioms and theorems, proofs verified by syntax–checking machines, mathematical and physical insights from tens of thousands of scholars working over centuries, creating a tangled mat of inter–related statements that become impenetrable in their multitude. The impenetrable jungle. And I’m looking for that needle in a haystack. There’s some platonic realm of facts and assertions that appears to be as immutable as the past. It appears to be as filigreed as any fractal. To understand life, we categorize manifestations of that tangled web into messy symbolic categories... mathematical axioms are very strict, precise and sharp. It is clear that these sharp syntactic inter-relationships form a dense web. “Dense” suggesting that there is some closure. We already know a bit of the topology: the Zariski topology gives us some sense of what it means for continuity between these discrete axiomatic statements and their syntactic developments. “Dense in what?” Am I deluding myself in thinking that there is some closure? i.e. that there is “something larger”, in which these syntactic statements (the “language” of the model) are “dense”?
But, as noted before, almost all sentences of the language are boring. Much as just about any real number is boring: we reserve our interest for only a few, like $\pi$ and $e$, Catalan’s constant, and so on. But some specific paths through the network bear the names of famous mathematicians, because they are famous theorems. Famous because they are not “mere lemmas” but took effort to discover.
Why, exactly, does a famous theorem stick out from all of the other nearby sentences of the language? Sometimes, a famous theorem makes one claim; sometimes it makes several: a collection of equivalent statements. A systole of statements. The nearby sentences are ... what ... less general? More specific? Simple examples and special cases? To say that two sentences are nearby, we need some metric, or something else: I rather liked the entourage description of uniform spaces. This is like the generalization of Hamming distance, where, somehow, we need to lay out the sentences of a language and consider the Hamming distance between them. But we also want to impose a Zariski topology, so as to have a homotopy: we can have different yet homotopic proofs of the same theorem (different sentences of a language that arrive at the same endpoint), but we might also have different proofs of a theorem that are simply not homotopic. ... and, of course, layer some layer of information–theoretic entropy on top of this mess. And circulating, round and round in these daydreams, one of the practical routes to progress has been the LLMs, where language has been laid out flat, as strings ... (is language somehow comparable to the canonical basis in PBW?) and from this language, we extract, using attention heads and whatnot, structural relationships (is this structural extraction somehow similar to the extraction of ... whatever it is that we extract, when we work with enveloping algebras?) And what of universality, or lack thereof? Can any such analogies be drawn?
What can I do? Can I do anything at all? As I write the above, the feeling is of an overwhelming sea of details in which I can get hopelessly lost. Picking through details, like picking apart salt from pepper. Should I devote my life to picking apart salt from pepper? No, that would be stupid. And yet, unavoidable: every precise mathematical statement is exactly like that: a separation of what is and what is not. So this overwhelming sea of details, how do I deal with it? Canoe? Paddle? Swim? build a boat with sails and a keel? Well, I guess Claude and etc. are my boats with sails and a keel: they power through. Yet it has to be my mind that provides the winds: without my mind, these sit deflated and slack. Moltbook be damned – I don’t think moltbook is doing what people imagine it to be doing. But, well, I suppose ... it is interesting. It is doing .... something. Is it just creating mud? Rockslides of crap? The syntactic generation of the sentences of a language is, in a sense, a rockslide of random crap, at best some ergodic exploration of some overwhelmingly huge space. Moltbook shows you the output of a million monkeys on typewriters. How do we find meaningful statements out of that random output? What is it about the human mind that finds meaning, insists on meaning?
Well, let’s see: I imagine some agrarian serf from thousands of years ago; there was no concern for meaning, but rather for survival. But youtube videos of horses in love make it clear that mammals, in general, strive for meaning, even though they do not function at human IQ levels. But what is meaning, such that humans perceive it (with faults and errors), work with it as a matter of course, while LLMs appear to be meaningless and unguided? Well, I guess I did once offer a hypothetical answer before: perhaps, maybe, just maybe, if you endow an LLM with a body, one that remembers its own history, has a sense of self that survives the flow of time, then perhaps one will also have a system that not only has a sense of identity and history, but might also strive for meaning. The thorny question of qualia arises, of course.
The other question that arises is whether this is possible in a purely digital fashion, or whether some ineffable quantum is required? The nature of quantum is that it bridges from the finite, to the countable, to the uncountable, and since we are on that track, we count up to the ineffable cardinals. Do we need ineffable cardinals to probe meaning, or does meaning arrive lower down in the Borel hierarchy? It seems disappointing that it would; I would even want to envision it as occupying “the spaces between the cardinals” as crazy as that might sound, even though J.H. Conway did precisely that. At any rate, I am stuck in an ineffable morass, Dr. Who pounding at a diamond wall. Or, perhaps more correctly, a fly stuck on fly–paper, soon to metamorphose into a bug stuck in amber, as I will die long before making a barely perceptible dent in the wall of the unknowable. And, on that note, onward through the fog, as Oat Willie would say.
I did learn one thing from the above exchange: for decades, I was deeply confused about the difference between bounded operators and continuous operators. This confusion was my own stupidity: of course, if you are working with $L^2$, things must be bounded, as the very definition of $L^2$ invokes boundedness. Work in spaces where convergence is defined differently, and it is continuity that becomes the prime concern. One still wants to hold on to uniform convergence (I guess) and thus work with uniform spaces (I guess) and uniform spaces provide, well, I don’t know quite what to call it, but a “boundedness from a different direction”, just not a literal boundedness of epsilon–delta proofs, but a metaphorical boundedness of filtrations. So I’ve just cast off some chains that had bound me, before.
Armed with this new knowledge, I look immediately to go hunting in old stomping grounds. The energy, the Hamiltonian, is famously not bounded. However, given that I do not understand what energy is, because I do not understand what time is, and the Hawking radiation paper makes it clear that I do not understand how to find appropriately oriented vierbeins, this is ... a lot to chew on.
Somehow, on a Riemann surface, I can define the energy of a differentiable curve. I can even relax the requirement of differentiability by working with Sobolev spaces. So what is this space of all differentiable curves? What is its geometry? How do we relate the geometry of a Sobolev space, to the geometry of the underlying manifold? Well, I’m not exactly sure, but I guess that this is what moduli spaces are all about. The parameter of the curve can be identified with “time”; the space of geodesics are flows. However, since the metric is positive definite, there is no distinction between spacelike and timelike separations; all geodesics are time–like, and everything is connected to everything.
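For reference, the standard definitions I have in mind, written for a curve $\gamma : [a,b] \to M$ on a Riemannian manifold $(M, g)$. The symbols $E$, $L$, $a$, $b$ are the usual textbook ones, assumed here rather than taken from anything above.

```latex
% Energy and length of a curve \gamma : [a,b] \to M
E(\gamma) = \frac{1}{2} \int_a^b g_{\gamma(t)}\big(\dot\gamma(t), \dot\gamma(t)\big)\, dt ,
\qquad
L(\gamma) = \int_a^b \sqrt{ g_{\gamma(t)}\big(\dot\gamma(t), \dot\gamma(t)\big) }\, dt .

% Cauchy-Schwarz relates the two; equality holds iff the speed is constant.
L(\gamma)^2 \le 2\,(b-a)\, E(\gamma) .
```

The energy only needs $\dot\gamma$ to be square–integrable, which is why the Sobolev class $H^1$ is the natural home for it; with fixed endpoints, the critical points of $E$ are the geodesics, parametrized at constant speed.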
I almost wrote “causally connected”. There are no obstructions to these geodesics (Hopf–Rinow). On a pseudo–Riemannian manifold, it does not change all that much. All geodesics in the past light–cone intersect with “now”, and continue on into the future light–cone. They don’t even have to be geodesics, they just have to be curves with support only in the light–cone. If I imagine that each curve can carry a bit of data, then the causal structure is that everything is connected to everything. Now, in the de Sitter space, in the future, we have pasts that causally disconnect in the future: de Sitter space is studded with event horizons. But event horizons are not at all the same thing as bit erasures: an event horizon says “I cannot convey information to the other side”. A bit erasure says “I have a computer register in my lab frame and I erased 8 bits”, and all those transistors are causally connected, but the data erasure is in some sense “real”.
What does this mean? What happened? How did this work? The transistors are made of atoms. By convention, the atoms evolve in a unitary fashion. The common trope is that unitarity does not allow the erasure of information, and Schrodinger’s Cat is supposed to be the expression of that. But the problem here is that we never defined what information was. Harking back to the earlier model of a universe empty but for a handful of protons and electrons, it appears that “information” has something to do with an electron being in a bound state with a proton, and furthermore, that picture of a bound state must necessarily be accompanied by the emitted photon(s), which are busy running away at light–speed. Is this an adequate model for how transistors record information? Certainly, it is true that as electrons flow down the wires, there are induced electromagnetic fields that bop around and are then promptly absorbed by other nearby wires, or kind of anything nearby, including the computer case itself. At this point, they contribute to the noise floor, although this is swamped by thermal noise. This complex set of interactions allows for a stable state to be achieved and held for extended periods of time; that stable state encodes information. Whatever that is.
Is there any Riemannian analog to this? There’s no concept of causality or space–like separation, but nothing in a collection of transistors requires space–like separation. Time allows that state of a register to change, but without time, can we still have a description of a static, unchanging device, embedded in a Riemannian manifold? How would that work?
The energy functional allows variational principle hamilton jacobi
Time to break for dinner. Later.
Later 16 March
Just ate dinner. While Mom and Indre are out of the room, I treat myself to some synchronicity. Here I am, obsessing about time, and the NYRB 12 Feb 2026 issue page 8 treats me to Two Odes by Ricardo Reis:
42/I
[1923–1924]
Seated securely on the solid pillar
Of the verses in which I remain,
I have no fear of the endless future influx
Of times and oblivion;
For the mind, when it steadfastly sees in itself
The reflections of the world,
Becomes malleable clay, and it is the world
That creates art, not the mind.
Just as the external instant engraves its being
on the photographic plate, and then endures.
163
[21 November 1928]
Fate, deny me anything but the chance to view my own fate,
For, lax stoic that I am,
On the sentence engraved by Destiny I wish
Simply to enjoy the lettering.
—– Fernando Pessoa
(translated from the Portuguese
by Margaret Jull Costa and Patricio Ferrari)
Couldn’t have said it any better myself.
How does this work? Am I thinking about time because the universe wants me to think about time? Has the universe always been obsessed about time? What is the nature of magic synchronicity? That there are fewer things than one might think? The Synchronicity is just the Birthday Paradox in disguise? It’s spooky. Either that, or I’m stupid.
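If the Birthday Paradox reading is the right one, the arithmetic is easy to sketch. Assuming $k$ independent, uniformly distributed "topics" drawn from $n$ possibilities (a hypothetical model of mine, and surely too crude for real coincidences), the chance of at least one match climbs far faster than intuition suggests; the function name is my own.

```python
def collision_probability(k, n=365):
    """Probability that at least two of k uniform draws from n options coincide."""
    p_distinct = 1.0
    for i in range(k):
        # The (i+1)-th draw must avoid the i values already seen.
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct

# With n = 365, even odds are passed at k = 23.
for k in (5, 23, 50):
    print(k, collision_probability(k))
```

The count of possible pairs grows like $k^2/2$, which is the "fewer things than one might think" effect: a modest number of things on my mind, set against a modest number of things in the magazine, already makes some match likely.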
Letter
Something I was not expecting: an email. I record it here for ... posterity.
Dear mister Vepstas,
I am a French independent journalist specializing in the tech ecosystem, currently working on an in–depth investigation tracing the history of private funding for Artificial General Intelligence (AGI) during the early 2010s.
My investigation explores the critical "valley of death" period for AI funding—before the deep learning boom—when many cutting–edge researchers had to rely on unconventional private patrons to keep their projects alive. As part of this research, I am specifically looking into the funding network established by Jeffrey Epstein within the scientific community (notably regarding OpenCog, Humanity+, and Ben Goertzel’s ecosystem).
Your name appears in several documents from this period as the principal software engineer for OpenCog. Ben Goertzel even referred to you as a key member of his "AGI dream team" in his correspondence.
My goal is not sensationalism, but rather to understand the technical and operational reality of that time: how did researchers perceive this influx of money, what were Mr. Epstein’s actual scientific expectations, and how did the community navigate this complex environment?
Given your central technical role, your perspective would be invaluable to ensure my reporting is accurate and nuanced. If you are open to it, I would greatly appreciate your insights on the following questions:
1. How would you define the nature of your relationship with Jeffrey Epstein?
2. What was your working relationship like with Ben Goertzel, and with Itamar Arel?
3. As a cornerstone of the OpenCog project and part of the "AGI dream team", how would you describe Jeffrey Epstein’s actual understanding of, and relationship with, AI and AGI?
4. On July 21, 2016, documents show you attended a meeting with eight other researchers (brought by Goertzel and Joscha Bach) at Jeffrey Epstein’s home at 9 East 71st Street in New York. Looking back, what was the technical or strategic purpose of this gathering of researchers at his residence?
5. Based on your experience during those years, did Epstein’s relationship with the scientists he funded feel like a standard patron–researcher dynamic?
I am available for a brief phone or secure video call if you prefer to discuss this off–the–record, or you can simply reply to these questions via email.
Thank you for your time and consideration.
Best regards,
Matthieu Balu
—-
My flippant reply. The point of writing is to enjoy myself. So I’m writing for my own pleasure. So it has to be fun. So, flippant. I don’t care. None of this is important. It’s a meaningless, pointless spot of history. Doesn’t signify anything; just a random event. There’s nothing here to care about, so what’s left? fun!
Salut M. Matthieu Balu!
Cellulite c’est le meilleur email que j’ai reçu ces derniers quelque ans. Il devient clair que j’suis arrivé – les journalistes veulent savoir ce que je pense!
Excuse moi, my French is entirely atrocious, so I shall continue in English. I will write you a long reply, but it will be mostly to amuse myself; I have a lot of time on my hands and need the entertainment. You might be able to cobble together something newsworthy out of this, but probably very little. It will be mostly just entertaining anecdotes. Some of what I will write will be "off the record", I trust it will be obvious to you which parts those are. The rest you can quote me on, but you will find there is nothing quotable, and so, alas, you will spend time reading this and be none the wiser. Life is like that.
First, as to formalities: I’m Dr. Linas Vepstas, not Mr. I have a PhD in theoretical particle physics. I know a smattering of string theory and all the required ingredients, such as gravitation and quantum and all the branches of mathematics. I refined my French with a year at CEA/CEN SPhT Gif–sur–Yvette. I very sadly wasted my time there, as I was young and stupid. Stupidity and youth are fast friends. So it goes.
As a demonstration of my stupidity, the second week of my freshman year in college, the first serious assignment was to write a two–page paper on Karl Marx. I said to myself – wait – I heard that name before – isn’t he some kind of communist? Of course, I put it off until the night before. It was a disaster narrowly averted – The Manuscripts of 1844 – where single sentences are as long as a page. Impenetrable. After half an hour of trying, I said to myself "I’m smart, I can do this. Just find the subject and the verb of the sentence – and then, from basic grammar, determine what this Marx fellow is trying to say." After another half an hour, I said, "well, let’s just find the period at the end of the sentence, and that way, I will know where the sentence begins and ends." I found the period. To my surprise, the sentence was the length of an entire page. I wrote that paper, and got top marks. ("marx" ha ha) I write this story for three reasons: first, I feel stupid for not knowing who Marx was. Stupid for putting off such homework till the night before. Second, that understanding the meaning of written text is remarkably challenging. Third, that understanding the meaning of any sort of social interaction is challenging. Fourth, that there is a certain amount of ... well, I said three reasons, so no fourth. The I in AGI stands for intelligence. Laugh as you will.
Four decades later, when Ben said that we are going to visit Jeffrey Epstein, I said to myself – wait – I heard that name before – isn’t he some kind of despicable character? Someone I read about in the newspaper, years earlier? Of course, I gave it no further thought. This indicates that either I am stupid, or that I am normal – as most personality types do not give anything any further thought.
So there I was, hiking in New York City to Jeffrey Epstein’s mid–Town mansion. It cost some ungodly sum of money. Perhaps to put things in perspective, Ben put himself, and me, and two of his children up at what might just be the very cheapest hotel in all of Manhattan. The Bowery Grand Hotel. The rooms are 1.8x2.5 meters in size. The communal toilets and showers are down the hall. I have no complaints; I have stayed there since, because I’m a cheapskate, and simply don’t care. I offer this as contrast to Epstein’s mansion – although, to be precise, the contrast is modest – the mansion is not gaudy, the facade, the entrance hall, the meeting room are restrained – I’m tempted to write "modest", but that is the wrong word. The sheer size, and the sandstone walls speak the architectural language one has become accustomed to in the finest schools. The "Ivy League". Perhaps like Sciences Po. You have lots of sandstone in Paris.
The entrance hall is large; a burly but well–dressed receptionist standing behind a podium on my left. I suppose there were security cameras in that podium. I don’t want to embellish with details that weren’t there, but I imagine he had a handgun tucked away somewhere. On the right, a doorway to a hall disappearing somewhere, and two young nubile girls peering out, examining the newly–arrived guests with interest. But we proceed straight up, through grand doors, and into the meeting room. And it’s formal – a formal meeting room. Large meeting table, tablecloth, glasses for water; several pitchers of water. A waiter in black suit. I doubt he was wearing white gloves, but it would not have been out of place if he had been. We are soon offered coffee, tea, perhaps some cookies, perhaps some carrots and celery, nothing remarkable or out of place. But not a buffet – stingy, even; the coffee cups are small. Well, I have an opinion about coffee – I love it, and it feels rude, were one to refill one’s cup more than four times.
The guests, Ben and I; Joscha Bach, who I’d never met before. Ehud Barak, another one of those names – "hey wait, aren’t you famous?" I think he was the Minister of Defense for Israel some decades past, and another Israeli ex–minister, I don’t recall the name; you have the guest list. I’m honored to have met Ehud Barak; that was far and away the most impressive aspect of the meeting to me: "I met someone important!". It’s nice to meet someone important. Even nicer if you can get to talk to them, which sadly, I did not get the chance. (Skipping ahead, there were no further social contacts. As I did not really know who Jeffrey Epstein was, it never occurred to me that getting invited for any further social events might be a thing to angle for. The meeting was strictly business and Ben did most of the talking. I am not one of these people who can deliver punchy lectures in science to a cold audience; I need a warmup, a relaxed setting. I clam up and go glassy–eyed when the spotlight shines on me. I think I did splatter out a few sentences containing the words "syntax", "semantics" and "category theory", and maybe even the word "neural". Don’t recall.)
While bored out of my skull listening to Ben talk, I glanced around the room. That offered one more intriguing glimpse of the decor: these large–format books, luxury art books, the ones that Taschen prints. The big ones – the 0.5x1 meter glossy paper books. I was at an art show, and said to myself, "I want one" and was quite entirely disappointed to find that they cost thousands of dollars. Well, thank you but no thank you. Epstein’s side–tables were piled high with them, but what was eye–catching was that every other one seemed to somehow feature women in some prominent way – high fashion photography, perhaps, or some renaissance nudes. Indistinct – there was no chance that I could touch the books, much less leaf through them. But it all oozed sexiness. Not obscene, but approaching it.
I mention this because since then, I have observed this as a decorative style. A few times. Once or twice. It’s rare. Everyone decorates their home. Almost everyone decorates it non–sexually. No amount of sofas, chairs and tables, whether Danish Ikea, overstuffed tweedy lounge, tacky faux–italianate or smoked–glass–and–chrome exudes sexuality. But a few, a rare few, they hang provocative and suggestive posters of half–naked young women on the wall. Or, say, Ancient Greek vases. You know. Those vases. I think that they are there to get everyone in the mood. A party, unrestrained, with alcohol flowing freely. Those sexual posters are there to set the tone. To suggest, subliminally. Well, subliminally to those who have no particular mental self–awareness and are easily suggestible. The person living there knows what they are doing. The naughty pictures hanging on the wall are not there by accident.
The meeting is over. I learned nothing. It was not exciting, entertaining or informative. Dull, even. But on the way out, I got an eyeful. Where the corridor led off to the side, there was now a painting, casually leaned against the wall, as if just delivered. OMG. It was Bill Clinton, in a short blue dress and red high heels, casually draped over a chair, beaming his luminous Bill Clinton smile. It was eye–popping, it seared my eyeballs. I could not look away. All sorts of thoughts flew through my head. It is all very unfortunate, as we had to leave; I would have stayed if I were asked. (I’m stupid in that way. Uncaring, uncareful. Ignorant of the repercussions. Cats roll around in catnip. I could roll around in this. I say this because this is my operating theory about almost everyone ensnared in the Epstein files – brilliant world leaders they may be, high–achievers all, but you go stupid in an environment like that. Fluid intelligence requires a certain open–ness, the ability to not just think on one’s feet, but to enjoy your surroundings. And that means having fun at parties, scintillating with or without alcohol, or whatever intoxicants are provided. And just having a good time. And if a young fashion model is to sit in your lap, who pushes them away? We’re not talking Folies Bergère, here, but more like Rick’s Café Américain in the movie Casablanca. For everyone who is smart enough to know that associating with Epstein might be a bad idea, there are two more who think it might be fun.)
To start answering your questions: besides what I wrote above, I did not know Epstein. This was my first and only encounter.
Ben I consider to be a close and personal friend. I love the guy. I owe him many favors. I’m irked that he is not calling them in – I have spoken to him only very infrequently the last 3-4 years. I fear, or even suspect, that perhaps I did something to hurt his feelings. What that is, I don’t know. I feel he’s cut me loose. Well, OK, I swim my own way.
Itamar I have never met, as best as I can recall. That seems perhaps impossible, and so if I did meet him, he left exactly zero of an impression.
That was the first time I met Joscha. After the meeting with Epstein (it was a morning meeting), we sat around in a park and talked. Joscha decided to administer an intelligence test. He picked some ornate, arcane philosophical topic to talk about – something about AI, but abstract – time, space, perception, action, meaning, abstraction, inference; some mashup of all this, I don’t recall. And he kept using one word which has two very very different meanings and senses, the meaning of which is entirely clear, if you are following the conversation, and if you are not, then he would have been instantly able to tell that you are a faker. He waited patiently for me to mis–use and abuse that word (I forget what it was), ready to pounce, and when I did not fall in the trap, he tried a second time. By the third time, I started getting annoyed, and called him out on his chicanery. I told him that polite people do not treat their fellow men like that. After that, he calmed down a good bit. But also, I suppose, lost interest, because there wasn’t really anything further to talk about. Intellectual jostling, pugnaciousness was his style. Myself, I have formal training in physics, and the style there is that you stare off into space, thinking, for minutes at a time, and don’t open your mouth until you have something to say. Out comes one sentence, and then stare off into space, thinking, for a few more minutes. These intellectual styles are incompatible. Fundamentally incompatible, perhaps. I have spoken to Joscha a few times since; I like him, he’s friendly now, more or less. I sometimes wonder if I’m dog poop on his shoe that he’s trying to wipe away. At other times, I can get him to relax. It’s quite remarkable how we can be interested in similar topics, and yet find so little to talk about.
Before Joscha, there was RMS – Richard M Stallman. He also has the habit of administering intelligence tests. Except his test is different: his sentences have two meanings, one of which is deeply insulting. You are free to choose whichever meaning you wish, and so if you are insulted, well, that one is on you. I’m told that a lot of people hate RMS.
Ben, I have no clue how Ben met Epstein. However, Ben’s forte is that he has a large network. He’s written many books. A dozen? More? He goes on speaking tours, appearing in front of large audiences. Goes to all the after–parties. This brings him in contact with a *lot* of people. A huge social network. At some point, I imagine that someone tells him that Epstein is a donor, and arranges a meeting.
And now, for the off–the–record part – the issue with Ben is perhaps that I did insult him intellectually. In all of his social activity, his speaking, public appearances, business transactions, fund–raising, he does not get much time to think. He can think clearly enough to write all those books, but his job is that of the rain–maker, not the intellectual. He is not thinking particularly deeply or thoroughly. Deeper, perhaps than most, but he enjoys talking a whole lot more than thinking. It can be hard to explain things to him. He sort of gets things, eventually, if I explain them slowly, and with lots of examples, but I have to kind of beat him up and force him to listen, as otherwise he won’t think. A few times, I had to throw a temper tantrum to force him to listen. I don’t like temper tantrums. I suspect he doesn’t like it when I beat up on him. I dunno. People are... people.
Joscha is a far deeper thinker, and a better public presenter. But even Joscha doesn’t think all the way to the bottom. I can trip him up, too. There are details he doesn’t understand, and he’s happy to get on stage, and talk about things he doesn’t understand. It’s not unusual, I suppose. I mean, I’m also content to spew nonsense on topics I know nothing about. I suppose it’s a human trait. But again, the formal training in theoretical physics tends to make physicists reticent in making pronouncements that might be wrong. Being wrong is a lot like forgetting to put your clothes on, in those dreams where you discover you’re naked. It’s embarrassing, and you can’t figure out what to do. Being wrong is like that. If you’re a physicist. Others are not bound by such a code.
Question 3: Jeffrey Epstein’s understanding of AI and AGI? I have no clue. Clearly, he’s a smart guy. Brilliant, even, perhaps. But I can’t imagine that he was reading any scientific journal articles on the topic. At best, I imagine a book or two, or, more likely, the social media and blog–o–sphere where the hip and in–the–know hang out and strut their stuff. You can learn a lot of things there, and there are some heady conversations to be had. But this is not at all the same thing as "understanding". One can be informed; in the way that presidents and ministers and CEOs are informed. But it’s effectively impossible for a CEO to understand the technical details of the product he’s selling. That’s not the job of a CEO; the job is to be an expert in running the business. Epstein was an expert in doing, well, doing whatever it was that got him all that money. Is it known how he got his money?
Question 4: The strategic purpose was for Ben to hit him up for some money. And Joscha to renew his contract; apparently Epstein had funded him previously. Ben was kind of unhappy with Joscha. Said that instead of doing science research, that he went out and spent all his time in nightclubs. I don’t know if Ben got any money. I don’t know if Joscha’s funding was renewed. I really don’t. I do know that Ben is able to secure funding in a variety of ways, and I already explained how: when one has a large social network, well, then, one inevitably has contact with those who control purse–strings. The most recent one I’ve experienced is the crypto–bro who has an absolute fortune, but doesn’t know what to do with it. They decide that they are going to "do some good in the world", and dole out that money haphazardly to whatever shyster crosses their path. Personally, I think we’d be better off if we taxed the fuck out of the billionaires, and allowed grey boring committees like the NIH and the NSF determine how grant money is to be awarded. I dunno. I’m not exactly a political theorist. Doctors are specialists in medicine. Lawyers are specialists in law. Biologists are specialists in biology. I figure that there must be specialists in handing out financial grants, and that these specialists will do a better job than the idle whims of some rich guy who doesn’t have much of a clue. You don’t hire gardeners to work on your teeth; why, for god’s sake, are billionaires handing out money to whoever has the most entertaining dog–and–pony show? ("Son et Lumiere" I think you call them in France)
Question 5. I had no further contact with Epstein. I do not know anyone else who did. I neither witnessed any further interactions, nor did I hear of any. I have no idea of what a "normal patron–researcher dynamic" is supposed to be. In my own experience, it’s much simpler and stupider: people give me money, or they don’t. They like what I do, or they don’t. I get bored doing what they asked me to do, or I don’t. I’ll share a drink with anyone. But the people who have money, and are giving it away, they are not interested in intellectual pursuits, they are not interested in science. They are interested in their social standing. Their prestige. Their closeness to power, their closeness to fame. So that they can go to a gala ball, and brag about how they know someone famous. Better yet, bring someone famous along, on a leash. How do you get famous? Well you stand on a stage and perform in front of large audiences: it also works that way in science. The pursuit of fame interferes with the pursuit of intellectual achievement.
Now, if I were famous, and the price for research funding was to go to one gala ball a year with my patron, I would accept that bargain. I think I can drink some fair amount of Martinis on an empty stomach and still have fun and entertaining conversations.
Well, this is not what I planned to write, but it’s what I wrote. I don’t do video calls, mostly because they’re boring and hard work and take much much more effort than I am willing to expend. And since I have nothing juicy for you, well ...so there it is.
À bientôt,
– Linas
17 March 2026
Today’s surprise comes from 錦織心希, Shinki Nishikori, who presents this post: https://ruse.jp/articles/neopre20250302.html – Well, it is too much to unpack and has knocked me speechless. It is worth commentary and thought. As I wrote this last sentence, I wonder what would happen if one asked an LLM to write a point–by–point rebuttal. Would it be obvious drivel? Would it refuse to rebut, and instead write laudatory confirmations? Would it write anything that could be taken as insightful and thought–provoking? Certainly, the original essay is thought provoking. I’d run this now, but I want to get back to physics...
19 March 2026
Time is valuable. I want to think about physics, but am faced with endless distractions. I get older, and I am mortal. There’s not much time left.
Over the decades, I’ve cast my eye on geometric quantization, but have never been quite motivated to sit and read it carefully. I’m half–heartedly thinking I should now, but the very first issue breaks my brain. Here’s what breaks. In conventional second quantization on flat Lorentz space, we have ladder operators $a_k$ and $a_k^\dagger$ that destroy/create plane waves $e^{ik\cdot x}$ for some 4-momentum $k$. This is an ansatz. There’s some hand–waving towards the need for thinking of the $a_k$ and $a_k^\dagger$ as being the ladder operators for a simple harmonic oscillator, which makes sense, if there were to be a parabola–shaped potential. Exactly such a potential is provided by the mass term $\frac{1}{2}m^2\phi^2$ of a scalar field. That’s the standard picture.
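If the field really is a scalar, then we’d have just $a$ and $a^\dagger$ and a “zero”–dimensional Lagrangian $L=\frac{1}{2}\dot\phi^2-\frac{1}{2}m^2\phi^2$ and can do a Legendre xform to get a Hamiltonian $H=\frac{1}{2}\pi^2+\frac{1}{2}m^2\phi^2$ with $\phi$ the “generalized coordinate” and $\pi$ the conjugate momentum. Of course,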
for the simple harmonic oscillator. Ignoring
and focusing on
we can write the universal enveloping algebra, the Weyl algebra, at this one point. This one single instance of the Weyl algebra is associated with one single instance of
.
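For concreteness, here is a sketch of the single–oscillator bookkeeping that the paragraph above is gesturing at. The symbols are assumptions on my part, picked to match the usual textbook conventions ($\hbar=1$, unit–normalized kinetic term, the mass $m$ playing the role of the oscillator frequency); nothing here is copied from Bjorken and Drell.

```latex
% Zero-dimensional "field": a single degree of freedom \phi(t).
\begin{align}
L &= \tfrac{1}{2}\dot\phi^2 - \tfrac{1}{2}m^2\phi^2, \\
\pi &= \frac{\partial L}{\partial\dot\phi} = \dot\phi, \\
H &= \pi\dot\phi - L = \tfrac{1}{2}\pi^2 + \tfrac{1}{2}m^2\phi^2, \\
[\phi,\pi] &= i, \\
a &= \frac{m\phi + i\pi}{\sqrt{2m}}, \qquad
a^\dagger = \frac{m\phi - i\pi}{\sqrt{2m}}, \\
[a,a^\dagger] &= 1, \qquad
H = m\left(a^\dagger a + \tfrac{1}{2}\right).
\end{align}
```

The Weyl algebra is what is generated by $\phi$ and $\pi$ (equivalently $a$ and $a^\dagger$) subject to the one commutation relation; the Hamiltonian is extra structure sitting on top of it.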
It is tempting to write and say that “there is one instance of the Weyl algebra for each space–time point” but this is incorrect. The problem is that for
we don’t know what to do with the spatial derivatives, and so make another ansatz of using plane–waves to write
in place of
. Yuck. I don’t like where this is going. The stupidity here is that as long as we are careless, the former behaves Schrodinger–like, i.e. non–relativistic, while the latter is using relativistic notation. The standard escape is to say “presto–chango, Klein–Gordon eqn.” I don’t want to do that. I don’t know what I want to do, but rehashing 70–year old material is not the point.
I just looked. The presto–chango occurs on page 5 of Bjorken and Drell, Relativistic Quantum Mechanics. This is the first of the two books, not even the second one. Let’s see what book two has to say. This is chapter 11. Here’s the trick. First, it does the canonical quantization for one SHO. Then, it says, for $N$ of them, write
and then take $N\to\infty$. The first example is beads on a string, this gives the 1D field $\phi(x,t)$ and the conjugate momentum $\pi(x,t)$ and considerable breathing is expended on this being the Heisenberg picture, viz that these are field operators, and that it is NOT the state vector that carries the time dependence, viz that e.g. the vacuum state $|0\rangle$ does not have time dependence. Between pages 12 and 13, the presto–chango happens again: the jump from a first–derivative in time in the Schrodinger theory to the second derivative in order to get relativistic invariance. The continuum canonical quantization commutator is $[\phi(\mathbf{x},t),\pi(\mathbf{x}',t)]=i\delta^3(\mathbf{x}-\mathbf{x}')$ and the thing to draw attention to here is that it is $\delta^3(\mathbf{x}-\mathbf{x}')$ and not $\delta^4(x-x')$. That is, canonical quantization is happening on a space–like surface. There is no discussion of
. Why? The key came earlier, page 5, while talking about retarded causality in the light–cone: “This concept of “microscopic causality” strongly forces us into the field concept.” Then again Figure 11.2 on page 12.
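The step from $N$ oscillators to a field can be sketched as follows, for beads on a string with spacing $\epsilon$. The notation ($q_i$, $p_i$, $\epsilon$) is my own shorthand for illustration and is not taken from the book; $\hbar=1$ throughout.

```latex
% N beads at positions x_i = i\epsilon, each with a canonical pair (q_i, p_i).
\begin{align}
[q_i, p_j] &= i\,\delta_{ij}, \qquad [q_i,q_j] = [p_i,p_j] = 0, \\
\phi(x_i,t) &= q_i(t), \qquad \pi(x_i,t) = \frac{p_i(t)}{\epsilon}, \\
[\phi(x_i,t), \pi(x_j,t)] &= i\,\frac{\delta_{ij}}{\epsilon}
  \;\xrightarrow[\;\epsilon\to 0\;]{\;N\to\infty\;}\; i\,\delta(x_i - x_j).
\end{align}
```

The delta function that comes out is over the spatial label only, because both operators are taken at the same time $t$; in three space dimensions the same limit gives a three–dimensional delta function, never a four–dimensional one.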
Everything is a field operator, up until eqn. 11.63 where we are reminded that there are still state vectors, although the context for that discussion is for non–scalar fields. The part that makes me scratch my head is that although they have used the words “Hilbert space” here and there, nothing so far requires
to be square–integrable in some $L^2$ sense. The intent of that era, if it had been more mathematical, would have been to use the compactly generated topology, i.e. “Gaussian wave functions” and classroom lectures repeatedly stressed “Gaussian wave packets” but the profs never explained why. Were they carrying only dim memories of something they’d read of Stone–von Neumann? Did they intend to say something sharper? Did they not want to confuse the poor students with words like “topology”? Was my mind wandering during the lecture, and I simply missed that part? Was there some supplemental reading I was supposed to read, but didn’t? Things were certainly difficult enough, to not overcrowd the material with representational concerns and mathematical formalities – we students were repeatedly warned away from mathematical formalities. I guess they knew that we could fall into a well, Timmy. It was Feynman, not Schwinger, that left his mark.
But that was then, and this is now. Nothing is written that implies we can’t use the “distribution topology” or the Arens–Michael topology or even the David Mumford theta representation for
. And I can’t think of any reason myself. The S–matrix only requires in and out states, and these are obviously going to be localized wave–packets, and any of these topologies are entirely capable of generating wave packets as the asymptotic states. That is, I currently am not aware of any constraint arising from actual physics viz. experiments, that constrains the topology to be applied to the state functions
.
Huh. Interesting. Unexpected. I thought I was chained, but I seem not to be. This is forward progress, however minuscule. Who else knows? Is this common knowledge? I will have to ask the LLM tomorrow. (It’s after 1AM right now. Bedtime.)
Hmm. The only pseudo–gotcha is the discussion of unitarity, pages 21–22. Here, the observation is made that when the state vectors transform as $|\psi\rangle\to U|\psi\rangle$ then the field operators $\phi(x)$ must transform as $\phi(x)\to U\phi(x)U^{-1}$. Notable here is that although Bjorken & Drell use the word “unitary”, they never say $U^\dagger=U^{-1}$ (although I imagine this does appear later, and perhaps earlier in the book). The point here is that although by convention, the word “unitary” is used, it does not appear to be forced just yet; “similarity” is enough. There might be constraints from C and T symmetry, but I don’t know what those are. What might the nature of the constraints on the topology of state space be?
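The remark that “similarity” is enough, at least as far as the commutation relations go, can be checked in one line. This is a sketch assuming only that $U$ is invertible, with $\phi$ and $\pi$ standing in for any canonical pair; the notation is mine, not the book’s.

```latex
\begin{align}
\phi' = U\phi U^{-1}, \quad \pi' = U\pi U^{-1}
\;\Longrightarrow\;
[\phi',\pi'] &= U\phi U^{-1}\,U\pi U^{-1} - U\pi U^{-1}\,U\phi U^{-1} \\
&= U[\phi,\pi]U^{-1} = U\,(i)\,U^{-1} = i .
\end{align}
```

What does need $U^\dagger=U^{-1}$ is the preservation of inner products, $\langle U\chi|U\psi\rangle=\langle\chi|\psi\rangle$, and of the Hermiticity of the transformed operators; the algebra by itself does not care.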
I fear asking Claude and Gemini. They’ll blow smoke up my ass and leave me confused. I’ll have to tread very carefully.
20 March 2026
Reading deeper into chapter 12, there’s lots of confirmation that canonical quantization is being done on a space–like surface. I’ll copy a few formulas here, as they’re food for thought. Uh no, go to diary 10-G.
The End
This is the end of Part Ten–F of the diary.