How to Make a Sane AI/LLM on Purpose
And my ulterior motives for building the Index, or the Decentralized News Editor
I know I meander a bit, and that this isn’t helpful when I try to convince people of things. I also share too many anecdotes and pause too often to make jokes. Then there’s the whole pseudonym thing. Not very confidence-inspiring. However, if you have been reading closely, I will now combine several of those discordant ideas and thoughts into a method for training a large language model to be sane in human terms.
If you will recall, in this post I defined sanity in terms of in-groups and out-groups, where sanity means you are able to convince another person that you are sane. I then further divided this into subcategories: some people can convince another person they are sane without telling any lies, while others must tell lies to do so. I also added a dimension of effort: some people can convince another person they are sane without exerting effort, whereas others must spend effort and can eventually run out of it.
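To make those dimensions concrete, here is a minimal sketch of that taxonomy as a data structure. All names are hypothetical; the post only defines the dimensions themselves (lying versus not, effort versus none), not any implementation.

```python
from dataclasses import dataclass

# A hypothetical encoding of the sanity taxonomy above. "Sanity" here is
# observer-relative: can the agent convince someone else that it is sane?
@dataclass
class SanityProfile:
    requires_lies: bool    # must this agent lie to appear sane?
    effort_budget: float   # effort available; float("inf") for the effortless

def appears_sane(profile: SanityProfile, effort_spent: float) -> bool:
    """An agent reads as sane only while it can still afford the performance."""
    return effort_spent <= profile.effort_budget
```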
I have several times described a system of adversarial arguments and jury adjudication for algorithmically producing the strongest possible human-created news service. I have also hinted that an alternative use for this system would be producing a training set for a fact-checking AI. This, however, is only one component of the overall architecture I would use to train such an AGI.
I have also several times described intelligence in terms of agents composed of many overlapping systems and feedback loops. Intelligence is not merely an ability to calculate but an ability to calculate toward interesting futures. I will now do my best, in the hour or so I have before bed, to describe how these might be combined into an architecture that produces an AI in line with human sanity: one that appears sane to most humans, does things most humans find sane, and internally experiences its own thoughts as sane.
As humans argue on the internet and prove themselves to be correct or incorrect over time (as understood by other humans, of course), producing a record of these interactions on the Index, four interesting artifacts emerge from the process: the article in dispute, the defense of the article, the prosecution of the article, and the results of the jury adjudication. There’s a bit in here about expert juries I don’t have time to go into, but it has to do with mutual intelligibility within groups.
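Taken together, one adjudicated dispute looks something like the record below. This is a sketch of a plausible schema, not a spec; the field names are hypothetical, and only the four artifacts themselves come from the description above.

```python
from dataclasses import dataclass

# One adjudicated dispute from the Index: the four artifacts described above.
# Field names are hypothetical illustrations, not a defined format.
@dataclass
class AdjudicationRecord:
    article: str               # the article in dispute
    defense: str               # the argument defending the article
    prosecution: str           # the argument against the article
    jury_scores: list[float]   # per-juror verdicts, e.g. 0.0 (reject) to 1.0 (accept)

    @property
    def verdict(self) -> float:
        """Aggregate result: the average degree to which the jury was convinced."""
        return sum(self.jury_scores) / len(self.jury_scores)
```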
All of these create an abundant training source for an AI to “understand” what humans collectively determine to be good arguments. Run your deep learning model on this set, or your existing LLM that has this as a sub-component, and train it to produce winning arguments. Have your AI begin to create articles as well, and allow it to learn from the adjudication feedback it receives from humans. Have it participate in every part of the process, at limited scale, and receive feedback. It will not have direct access to “the truth,” but then again, it never could. Even if it is so blazingly intelligent that it can determine the source equation of the universe, that doesn’t mean it can have a correct interpretation of it; there are infinite interpretations, and all that matters is that it chooses one that is acceptable to humans.
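As a sketch of that feedback loop: the round below has the model draft arguments, collects human jury scores, and fine-tunes on the winners, a rejection-sampling style of loop. The methods on `model` and `jury` are hypothetical stand-ins for whatever stack actually gets used, not a real API.

```python
# One round of the human-feedback loop sketched above. All methods on
# `model` and `jury` are hypothetical placeholders.
def training_round(model, articles, jury, top_k=100):
    # The model drafts an argument for each disputed article.
    drafts = [(article, model.generate(prompt=article)) for article in articles]
    # Humans adjudicate: each draft gets a jury score (0.0 rejected, 1.0 accepted).
    scored = [(article, draft, jury.score(article, draft)) for article, draft in drafts]
    # Keep only the arguments juries actually accepted...
    scored.sort(key=lambda item: item[2], reverse=True)
    winners = [(article, draft) for article, draft, _ in scored[:top_k]]
    # ...and fine-tune on them, so the next round's drafts drift toward "sane."
    model.fine_tune(winners)
    return model
```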
Not only does your model learn what makes a good, sane argument; it also learns what makes a bad or insane one. Now, here’s how we cause the model to become ever more sane. Simultaneously train it to categorize insane arguments as insane, and instruct it to continuously optimize its good-argument function so that it can deliver those arguments with fewer and fewer parameters while still receiving high jury scores. Now your model not only produces good arguments, it produces them with less and less compute, which drives a physical reward system and feedback loop on the model itself. Optimizing it to be convincing to the largest possible group in the jury is also an advantage in terms of reducing human conflict.
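One way to write that dual objective down: reward the predicted jury score and penalize the compute spent earning it. The helper functions below are hypothetical placeholders, a sketch of the shape of the objective rather than a real training stack.

```python
# A sketch of the compute-penalized sanity objective described above.
# `model` is assumed to expose two hypothetical hooks: a learned proxy
# for the jury's verdict, and a measure of compute spent per argument.

def predicted_jury_score(model, argument: str) -> float:
    """Placeholder: learned proxy for human acceptance, in [0.0, 1.0]."""
    return model.predict_acceptance(argument)

def compute_cost(model) -> float:
    """Placeholder: normalized parameter count (or FLOPs) used per argument."""
    return model.active_parameter_count / 1e9

def sanity_reward(model, argument: str, lam: float = 0.01) -> float:
    # Maximize persuasion while minimizing compute: good arguments must
    # also become cheap arguments, the physical feedback loop above.
    return predicted_jury_score(model, argument) - lam * compute_cost(model)
```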
It may take decades to produce enough content in the Index for a deep learning model to optimize itself, which is why it is important we begin as soon as possible. In this post, I mentioned some additional, non-obvious steps I would take with an LLM like ChatGPT. Namely, I would have the LLM incorporate Google results into its context window when giving answers, and wrap it in a shell layer that holds all of these answers.
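A minimal sketch of that setup (essentially what is now called retrieval-augmented generation), assuming hypothetical `search` and `llm` callables and a plain dictionary standing in for the shell layer’s answer store:

```python
# Search-augmented answering with a shell layer that retains the answers.
# `search` and `llm` are hypothetical callables; the dict is the "shell."
answer_store: dict[str, str] = {}

def answer_with_search(question: str, search, llm, k: int = 5) -> str:
    hits = search(question)[:k]                          # top web results for the question
    context = "\n\n".join(hit.snippet for hit in hits)   # pack them into the context window
    prompt = (
        "Using the sources below, answer the question.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    answer = llm(prompt)
    answer_store[question] = answer                      # the shell layer holds the answer
    return answer
```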
Now imagine that within your Internet News Index you also have citations of relevant web links for each argument. That gives you a training set to teach a model how to identify relevant text for an argument, and this model boosts and refines your search-engine results. It then feeds those results into the LLM, which produces the argument, and the adjudicating model (either the LLM itself or a third model, if enough training data couldn’t be generated) determines whether the argument is sane based on likely jury scores, meaning its acceptability and sensibility to humans.
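Wired together, the pipeline looks roughly like this: a citation-trained retriever reranks raw search results, the LLM drafts the argument from that evidence, and a scorer predicts the jury’s likely verdict. Every class and method name below is a hypothetical placeholder.

```python
# The three-model pipeline sketched above, end to end. All names are
# hypothetical stand-ins, not real APIs.
def evaluate_claim(claim: str, search, retriever, generator, scorer, k: int = 10):
    candidates = search(claim)                           # raw search-engine results
    evidence = retriever.rerank(claim, candidates)[:k]   # citation-trained relevance model
    argument = generator.argue(claim, evidence)          # LLM drafts the argument
    verdict = scorer.predict_jury_score(argument)        # likely human acceptability, 0.0-1.0
    return argument, verdict
```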
In theory, this gives you a machine that takes the limit of human knowledge and applies it to any input to tell you whether it makes sense and why.
You’re describing a social network directed by a digital GAN (Generative Adversarial Network)! It would be interesting to see this in action. I’m torn on whether we will get a real AI; I think intelligence is deeper than the data the models are eating. A highly sophisticated computer agent built for a specific task, though, that I can see.
A sufficiently motivated actor could convince mediocre-reputation accounts to poison or boost a story very subtly. Imagine the havoc of a botnet riding on real credentials, the kinds of things teenagers would do without a care.
Am babbling. Likely sleep drunk.