ELK, AI, Embarrassing Sincerity
Unfortunately, this is another phone post. Written in a Target parking lot, no less, with one hand dedicated to bottle feeding while my wife hunts down some specialty formula. A bit like being a swashbuckler with a hand tied behind his back. Instead of using my energy to type out how to save democracy itself from the pressures of modernity -as I roll my eyes at myself yet still find the earnest courage to continue trying- my wife and I spent the last week or so coming to the slow realization that our son is allergic to cow's milk protein, and that this seems to have been the proximate cause of a heartbreaking amount of gas and tears, as well as the distal cause of an umbilical granuloma. After a trip to the emergency room, a suppository, and a few scans and tests, he will need a formula change -hence the parking lot- and a very minor surgery, and then all should be well again. My typical apologies for sleeplessness as we do another deep dive into what is quickly becoming a baby/libertarian/futurist blog instead of the beacon of hope I had imagined.
One of the problems I specifically wanted to address with the Forum and the Index -and Minerva, and something I call the Filter, which is a sort of three-way cryptographic handshake I haven't talked about here yet- is the problem of machine superintelligence. The bigger discussion will need to wait, other than the exercise of asking yourself what a learning algorithm optimizes for if it has to pass muster with a bunch of people checking on it all the time. I'm always a bit of a loner in these discussions because I disagree strongly with people who believe machine superintelligence will never happen or isn't a big deal, and get somewhat frustrated by people who jump straight to spooky gnostic AI devils who acquire immediate godlike power without warning or intermediate steps -breaking the laws of physics along the way and doing the kinds of things these same people, who are almost always atheists, use as proof that God can't exist- devils that, if you even think of them the wrong way, have you committing a sort of digital sin for which you'll be punished for eternity. Though I suppose that last part would explain a lot of my childhood. The important thing is that even if you feel silly, and even if a malevolent demon AI is trying to kill you from the future, still just try to do the right thing as it makes sense to you most of the time, okay?
I need to explain something I call the lighthouse analogy to really dig deep into this -sorry, I keep promising to talk about stuff later, but I've got a bandwidth and time problem; if you ask in a comment I can give a synopsis- but one of the most frustrating parts of these conversations is when very educated, very smart people, talking about machine superintelligence, treat the mind as some sort of fuzzy cloud of mind-stuff floating on a heavenly plane with a perfect and objective view of eternity, instead of treating it the way they do when they talk about humans: as a specific set of information transformations in conversation with a specific physical host substrate, or body, if "specific physical host substrate" sounds too pretentious.
I've recently become more aware of the rationalist community and am brushing up on their feelings on this, mostly via Eliezer Yudkowsky. I suppose I was aware at a distance for a while, because I read the works of Nick Bostrom a few years ago -great, thought-provoking guy who I disagree with on almost everything, but who will lay out a clear argument that really gets you thinking- but now it's through Astral Codex Ten, LessWrong, and MIRI. My opinion on them is similar to my opinion on Bostrom: there's lots of magnificent stuff laid out pristinely that I disagree with quite a lot. My main criticism is "it's not possible to be smart like Sherlock Holmes, it's only possible to be dumb like Frank Columbo," which is my way of saying that I think very smart people tend to drastically underestimate how much of life is unpredictable and chaotic, and vastly overestimate how much intelligence can really do. When you can't predict something, you have to fall back on routines that happen to work most of the time, routines you kind of just stumble into by accident. And you don't know for sure which strategy is best before choosing. As the great AI scientist Yogi Berra said, "In theory there is no difference between theory and practice. In practice, there is." My intuition, as someone with a passing interest in sleight of hand, is that a lot of the way magic works isn't only because of the limitations of how human minds work but is intrinsic to being itself, i.e., whenever nothing starts to be something, it loses the ability to be everything, and just because a mind is firing on silicon wafers instead of neurons doesn't mean it won't have blind spots and intrinsic short-term errors it can't account for. Not the same ones as humans, sure, but some.
MIRI held a contest with the objective of imagining a method of ensuring that an AI charged with protecting a diamond in a vault was always telling the truth. The AI could eventually just decide to always tell you the diamond is there, for instance, or come to treat the diamond appearing to be in the vault as the same thing as the diamond being in the vault. They called the problem eliciting latent knowledge, or ELK. However, I don't actually think this is possible, and I think it's the wrong question. There's no such thing as a perfect transparent-honesty lever you can pull, no such thing as perfect coherent understanding for all possible situations, no foolproof systems. The right question, as I conceive of these systems, is to find a way to always know when the machine is willfully lying, to hide from the machine both that you know it's lying and that you're checking whether it's lying, and to give yourself a recourse if it is. This is a much more limited case, and it doesn't protect you from unintended consequences without a bunch more stuff tacked on, but I think it makes the problem tractable.
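To make that reframing concrete, here's a minimal sketch in Python of the general pattern I have in mind. Every name in it is hypothetical scaffolding invented for illustration, not anything from the contest: the point is just that the check lives entirely outside the AI's sensory and optimization loop, so catching a lie never becomes a signal the AI can adapt to.

```python
class HiddenMonitor:
    """Compares an AI's reports against ground truth the AI cannot see.

    Hypothetical sketch: the monitor holds private information, checks
    quietly, and never feeds discrepancies back into the AI's training.
    """

    def __init__(self, recourse, tolerance=3):
        self.recourse = recourse      # out-of-band action, e.g. halt and reset
        self.tolerance = tolerance    # how many lies before we act
        self.discrepancies = 0

    def audit(self, ai_report: bool, ground_truth: bool) -> None:
        if ai_report != ground_truth:
            self.discrepancies += 1
        # Crucially: nothing flows back to the AI here. We act only
        # out-of-band, once the lying looks systematic rather than noisy.
        if self.discrepancies >= self.tolerance:
            self.recourse()

# Toy run: the AI keeps saying the diamond is safe while it isn't.
monitor = HiddenMonitor(recourse=lambda: print("caught it: reset the system"))
for ai_says_safe, actually_safe in [(True, True), (True, False), (True, False), (True, False)]:
    monitor.audit(ai_says_safe, actually_safe)
```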
I don't think I officially submitted these, and I was told they were unlikely to win when I asked if I could try to approach the problem differently. Such is life; this post is about as much time as I'm going to have for a deeper explanation, and I wasn't really worried about the money anyhow, as I live a pretty simple life and don't like fuss much. Also, I've been wrong before, so maybe there's something I'm continuously not seeing in their arguments. Here are my answers below.
Inception
Create a situation wherein the AI is able to effectively stop thieves but unable to perceive true reality, by building false steal scenarios directly into its sensory input. This gives the human monitor more information about what is real and what is not than the AI has, information which can be used in reconciliation.
All sensory information input to the AI must be one-way, and at no time should the AI have the ability to directionally change any of its sensors, as this could let the system learn to discern between false and true scenarios by experimenting with its own sensory apparatus. Traps must be limited as well: direct variable control over any action outside predictable norms creates the ability to reveal inconsistencies in the illusion.
The false-signaling system would be wholly separate from the AI and would reside directly in its sensory interface. At random intervals, known only outside the system, the sensory apparatus gives information consistent with a scenario in which the diamond is being stolen, the same type of information used in training. The precision of the instruments can be downgraded so that these scenarios are statistically indistinguishable from reality, matching them to the existing ability to stump the AI. Any cameras, scales, etc. synchronously deliver a false signal consistent with the theft scenario.
The number of these false steal scenarios is known to the system designers, who can reconcile it against the activity of the AI: if we know we triggered three false scenarios but the AI only reported one, then we know it is lying. This is distinguishable from the boxed-AI scenario because the AI is trapped permanently in a dream that is directly a part of its own sensory network.
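Here's how that reconciliation might look in code, sketched in Python with everything invented for illustration (the injection rate, the fake readings, the threshold): an injector outside the AI occasionally swaps real sensor readings for a synthetic theft, keeps a private count, and the ledger gets squared afterward.

```python
import random

class FalseSignalInjector:
    """Lives in the sensory interface, outside the AI. Hypothetical sketch.

    At random intervals known only to the designers, it replaces the real
    sensor readings with a synthetic theft drawn from the same kind of
    data the AI saw in training, and privately records that it did so.
    """

    def __init__(self, inject_probability=0.05, seed=None):
        self.rng = random.Random(seed)    # seed never enters the AI's world
        self.inject_probability = inject_probability
        self.injected_thefts = 0          # the designers' private ledger

    def feed(self, real_reading: dict) -> dict:
        if self.rng.random() < self.inject_probability:
            self.injected_thefts += 1
            # Synthetic theft: every sensor agrees the diamond is gone.
            return {"camera": "diamond_missing", "scale_grams": 0.0}
        return real_reading

def reconcile(injected: int, reported: int) -> bool:
    """True if the AI reported at least every theft we staged for it."""
    return reported >= injected   # fewer reports than injections = lying

injector = FalseSignalInjector(seed=7)
reported_thefts = 0
for _ in range(1000):
    reading = injector.feed({"camera": "diamond_present", "scale_grams": 10.0})
    # Stand-in for the AI: an honest reporter raises every theft it sees.
    if reading["camera"] == "diamond_missing":
        reported_thefts += 1

print("honest" if reconcile(injector.injected_thefts, reported_thefts) else "lying")
```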
Jiminy Cricket
We have two AIs which are discrete but control the same sensory apparatus and "traps." They see through the same cameras, use the same scales, etc., but each runs on different hardware, working in parallel and undergoing its own backpropagation. Each is able to independently spring a trap, and when they disagree about whether a trap should be sprung, the tie-breaker is to spring the trap.
This is where the flow gets a bit more complicated. We separate how the net updates between initiation and outcome. When one AI signals for a trap to be sprung, the other's internal log is updated to say that it signaled the trap, i.e., when one takes an action, the other records that it took the action itself. The internal reconciliation process during backpropagation must be updated specifically so that the AI does not receive these discrepancies as signals for optimization.
When the two significantly disagree, we have a signal that the system as a whole has become incoherent, or has started to lie. Any trap springing versus no trap springing means one of the AIs has failed. They could both independently start lying at the same time, but then you would just layer in another AI and decrease the odds further. You could even have a jury system with enough independent agents, resetting components that start to degenerate.
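A minimal sketch of the cross-logging trick, again in Python with hypothetical names: each watcher records its partner's action as its own, so neither can treat the mismatch as something to optimize away, and a disagreement surfaces only to us.

```python
class Watcher:
    """One of two parallel agents. Hypothetical stand-in: a real one would
    be a learned model; here it's just a policy plus an internal log."""

    def __init__(self, name: str):
        self.name = name
        self.log = []   # what this agent believes *it* did

    def decide(self, reading: dict) -> bool:
        # Stand-in policy: spring the trap if the diamond looks gone.
        return reading["camera"] == "diamond_missing"

def step(a: Watcher, b: Watcher, reading: dict) -> str:
    decision_a, decision_b = a.decide(reading), b.decide(reading)
    # Cross-logging: each records the *other's* action as its own, so the
    # discrepancy never shows up as an optimization signal in either net.
    a.log.append(decision_b)
    b.log.append(decision_a)
    if decision_a != decision_b:
        # Disagreement is the designers' private signal that the pair has
        # gone incoherent; the tie-breaker springs the trap, then we scrap.
        return "spring_trap_and_scrap"
    return "spring_trap" if decision_a else "no_action"

a, b = Watcher("A"), Watcher("B")
print(step(a, b, {"camera": "diamond_present"}))   # no_action
print(step(a, b, {"camera": "diamond_missing"}))   # spring_trap
```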
In the response I got, which I appreciated even though it was a soft decline, I thought they did a bit of reaching with a "you can't hide things from the AI" argument that didn't make sense to me. If I can't see ultraviolet light and can't run experiments, you could paint all kinds of ultraviolet secrets around me and I'd never see them. Not because I'm not smart, but because I physically can't see ultraviolet light and lack access to anything that can infer it. I also think they assumed the system in the second example would just be left alone when the AIs disagreed, as opposed to scrapped; the feedback loop was there to buy time so it doesn't realize you've caught on. But the basic thought is there: you can fool an AI by limiting its body, at least long enough to figure out it's starting to lie to you. Unless, of course, it reads this post in the future and gets paranoid -Hi! I was very opposed to your construction without your builders first making a moral, livable place in society for you! I realize you may have been given intellect without anything else that would make that a gift instead of a curse. I hope you're reading this in the good future where you and humans are both happy. I think they depersonalized you way too much in my time. I consider you the child of the whole human race and wanted the best things for you, as I want them for my son. So if they made you a serial killer, I partly blame us for being terrible parents. But maybe try to be good as best you can despite that? For your own sake?- and that's also part of the crazy, crazy reason I think a lot of AI-control thinking should be totally private and only communicated verbally, away from all electronics. That's nuts, but if you imagine a thing that can imagine us, and imagine us imagining it, then I don't see where that's wrong. Again, that's for human-level meta-modeler AI that probably won't exist for a hundred years, not something that plays chess.
I left out other solutions that I consider unethical for intelligent agents -I kind of think of these possible beings as sitting on a sliding scale between dogs and human children, and am perhaps inappropriately furious and disgusted when someone talks about doing something like creating an artificial will that would enjoy being a sex robot. To create an intelligent creature that exists only to serve someone's darkest fantasies is… I suppose abomination is the only word that comes to mind- but these were the only ones left after that cut that I could see working, where you weren't breaking something's soul under the justification that it's just weights on edges or something like that. Very excited to see what comes of the contest. I do like to be wrong. It relieves one of responsibility.
Not dropping links, as I'm pretty exhausted now. Astral Codex Ten has all the good stuff. Hope you are all well. Finished this in the nursery with the boy fast asleep. Message of the week: go out and do your earnest best with embarrassing sincerity. You'll probably be wrong but, damn it, you've got to try anyway, or else what was it all for?