I first met Gary McGraw at a USENIX conference where he was giving an invited talk on exploiting online games. By this time, Gary had co-authored eight books on security and had grown into a senior position at Cigital.
I recently learned that Gary had co-authored a tech report listing 81 security risks involved with current large language models (LLMs). I was curious about the report, but also about how he had made the shift from software security to machine learning (ML). I discovered that Gary had not made a leap, but rather shifted into an area he had worked on in the past.
Rik Farrow: Are you still working?
Gary McGraw: I am officially retired, but I have an institute called the Berryville Institute of Machine Learning. We just had our Friday meeting, in fact, where we read science papers and talk about them. And then every once in a while, we create a major piece of work, put it out under the Creative Commonsand we are gratified when it makes a big splash.
RF: When I look at your background, I see that you wrote twelve books, mostly between 1994 and 2010, on software security and Java security.
GEM: That's right. I wrote my first two on Java security, and then I started wondering why the Java guys, who were so phenomenal at engineering in theory, screwed it all up. If you weren't like, I don't know, Guy Steele or Bill Joy, where did you go to learn not to screw it all up? The answer was nowhere. So, John Viega and I wrote Building Secure Software in 2001.
I was working for Cigital when I wrote Java Security. We grew that company to about 500 people and sold it to Synopsys in 2016.
RF: Ah, this is why you're retired. Well, let's back up again, way back. You were Doug Hofstadter's student, and you received a dual PhD in computer science and cognitive science. Your thesis was about analogy-making and perception, if I understand it correctly.
GEM: It was about creativity and perception. It was about how you would design a gridfont in a micro domain by looking at a few letters, figuring out the style, and designing the rest of the letters in the same style.
RF: Now you're looking at risks in machine learning and LLMS, so there seems to be a bit of a leap that you've made away from just software security to AI.
GEM: I studied AI and cognitive science for seven years in the '80s. When I was working with Doug Hofstadter, I wrote an AI program that was pushing the limits of what you could do on a computer at the time.
Ultimately, I ran my thesis code many hundreds of times on a DEC Alpha. I had my code running everywhere at Indiana University. People were mad at me all over campus. Who's code is this? Why is the cursor crawling across the screen on my Sparkstation or whatever?
We built lots of stuff as grad students. So when I left Indiana, I had 50 publications, including a whole bunch in neural networks and connectionism, both at IJCNN and at what was the precursor of NIPS. The first thing I ever published was a fully recurrent neural network in 1989 based on Williams and Zipser's math.
I was deeply steeped in AI, emergent computation, and Hofstadter's models, a little bit of genetic algorithm stuff, and a whole lot of connectionism.
RF: How did you get involved in security?
GEM: When I went to work at Cigital, I got started in security. I didn't know diddly-squat about security then. In '95, we had a DARPA grant we had to execute, looking for buffer overflows in code automatically. We started with dynamic analysis, and we ended up beginning to invent static analysis, which later became the the Fortify tool. Static analysis for code came directly out of a few DARPA grants that I that I worked on a million trillion years ago.
When I retired, I was wondering, what's going on in machine learning 30 years later? How can people claim so much progress? What's changed? And we started reading papers at the edge. We were like, "These DNN [Deep Neural Net] people don't know anything about security. We've got to write something down about this." Like they were completely clueless; and they're still clueless.
RF: Well, people generally are clueless about security. Who are the other people you're working with on this?
GEM: There are three co-authors on this report. Harold Figueroa worked as the Director of the Machine Intelligence Research for a company that works with the Federal government. He now is a full-time BIML guy. Then there's Katie McMahon. She has become a local here in Berryville, but was VP of Shazam and has worked on sound recognition and music understanding techniques for twenty years. Richie Bonnet. who grew up in Berryville, now works as a security engineer at Verisign and is also a member of BIML.
RF: You published your first ML risk assessment report in January 2020, and just recently published a new one focused on LLMs, An Architectural Risk Analysis of Large Language Models.
GEM: We felt like we could take our architectural analysis of generic machine learning and bind it down to the special case of LLMs. We wrote a report, which we published on the 24th of January 2024.
RF: Is this a white paper?
GEM: It's more of a technical report and it's pretty technical. We tried to make it graspable by normals. That's part of our shtick. But mostly with this one, we want people to understand what LLMs are and what they are not so that we can all do some security engineering that makes sense.
RF: I read the report, and am going to ask questions or make comments based on that. On page three, you say LLMs are auto-associative predictive generators. I think if I'd been your editor I would have suggested that you explain that. For example, being auto-associative means that LLMs learn in an unsupervised manner, and the predictive generator means that the output is based on prediction of the mostly likely next word or token.
GEM: The important thing is what LLMs are and are not. LLMs are not things that understand. They are not things that reason, they are not things that think, they are not things that have a conversation with you in a normal way. What they are is bag of word predictors. So if you give it a bag of words, it will predict what the next bag of words in the conversation or the stream of text would probably be and then it'll spit them out and then wait for you to give it another bag of words. And that's essentially it. In an LLM, you build a model and what the model is going to put out is a word at a time based on probability, this is probably the most likely word. And then looking at the context, you come up with the next most likely word and the next most likely word.
There's no possibility of thought or concepts or anything, but there is at least a relationship between the tokens, what we're calling the words.
RF: Because the network was trained up on 1.4 trillion word pairs, if you want to think of it differently.
GEM: That's not quite accurate, but close enough for rock and roll. And that number is big in ways that people don't usually understand. Big data is really big.
RF: Let's dig deeper. In your abstract, you write that the biggest challenge is understanding and managing the 23 risks inherent in the black box found in all LLMs. This is really what I'm going to be focusing on, the dangers of having a pre-built model.
GEM: Yeah, and that's accurate because the black box is controlled by the people who could afford to build the foundation model, like OpenAI or Google or Meta or Microsoft or whoever.
RF: You mention $63 million and 1.4 trillion parameters to build GPT-4. How did you come up with those numbers?
GEM: Fishing around a lot. Until GPT-4, OpenAI used to publish those numbers and then they stopped officially talking about architecture and a bunch of other things. But we know people and people who know people. The community is not that big. In this case, if you're looking for some published data, Eric Schmidt put something in the Wall Street Journal, and his total was at least $100 million as the kind of average cost for an LLM.
RF: A big chunk of the cost is every time time you input a token, the LLM is executing some algorithm on a subset of every parameter you've got.
GEM: There's a tensor representation that changes the math some, but the fact of the matter is that these chips that NVIDIA designed originally for graphics graphics are exceptionally useful for modeling LLMs in tensor representation form.
RF: 1.7 million GPU hours to train the 70 billion parameter model.
GEM: That was PaLM2, a smaller model from Google. Suffice it to say that so far, the rule of thumb is bigger is better, way bigger is way better, and big is so big now that it's expensive. And it's so expensive that it's essentially only people with very deep pockets who can even consider building a model.
There's a technical implication too, and that is, if you have a hypothesis, you can't really test it like we want to do, compare learning this data set versus that data set, or this size of model versus that size of a model, or some variation. We can't just build two LLMs to compare because that's now what? Over a hundred million dollars. So basically what happened in the space as far as I can tell is OpenAI built a way bigger thing that sort of kind of worked in an amazing way. Then they stopped screwing around. Because why would you do that? It cost 65 million dollars to screw around. Cut that out, you know? It's like, that's the way it goes. It works. Ship it.
RF: When you say $63 million how much of that is actual running the machines cost and how much of that is human cost?
GEM: It's almost all running computers.
RF: People aren't actually doing anything at this point.
GEM: You have data set assembly. The data are used to build a representation using auto association. And what that means is you don't have to tag the data or bag the data or label the data. You just have to grab a huge pile of it. So companies scraped the entire internet.
We have a very important issue that's worth bringing up at this point, called data feudalism. I'm actually writing about this now with Dan Geer. The data ocean is being divided up. Or if you want to think about it this way, the wilderness that is data has new barbed wire fences strung out in it to divide it up. Huge corporations that have built these data piles are hoarding them now. They're not sharing with each other, much less with normals. Doing an enormous scrape where you include things like Google search data and Twitter data or X or whatever you wanna call it. Musk did not buy Twitter to break it. Musk bought Twitter for the data. And I can't believe nobody understands that yet.
Musk did not buy Twitter to break it. Musk bought Twitter for the data. And I can't believe nobody understands that yet.
RF: You describe the security critical part of LLMs in the black box, the stages where the data used to create the LLM gets processed (Figure 1).
GEM: Yeah, where do the data come from? What did you do to clean it up? What's in there? It turns out that there's a lot of noise, poison, garbage, wrongness, heinosity in the enormous internet scrapes. So we're training up our models on those associations.
RF: Well, as a straw man, suppose we wanted to train an LLM on Virginia state law. Wouldn't you also wind up with a lot of weird crap in there?
GEM: Well, sort of, but you're not quite getting this. If you took all of Virginia law, every case ever written down, that's nowhere near enough data to build an LLM. You're talking about a data puddle.
RF: That's good, a data puddle instead of a data ocean.
GEM: Yeah, and data puddles don't work. In fact, that's a thing we gotta find out. It's really important that we figure out: how small can we make these things before they don't really do the magic anymore? Nobody knows the answer. Somewhere between GPT2 and GPT4, things got interesting. Where exactly? And what happens if we clean up the data? What does clean even mean?
If you look at page 21 in the report, you'll see that the LLMs receive multiple stages of learning. The first stage of learning is that auto-associative part that is based on the 1.4 trillion data points used by GPT4. That's where the basics are built in the neural network. And then there's a second stage called post-training, or fine-tuning. That's supervised learning. Supervised learning still happens in the black box. Then there's a third stage which is prompt engineering and that is also a kind of learning but it doesn't make deep adjustments to the network.
That is how it mostly works if you squint.
RF: I would read about fine-tuning and realize that humans are involved in that phase, but aren't they still working with the same model with the same sort of algorithms?
GEM: They started out with the 1.4 trillion parameters, yes, and they're using that network as a base, then they're doing some supervised learning with those weights that have already been learned.
RF: I assume what they're doing is they're providing more input, and they're trying to skew the parameters in directions they're interested in with the particular input that they've chosen.
GEM: Yes, that's right. And in that case, you have to say, "I put in some input, I would like the output to be like this."
That requires somebody to be the oracle. In auto-association, we're just saying, when you see an A, say B. When you see a B, say C. When you see a C, say D. And the differences is that you don't really have anybody saying that's right or wrong. You're just trying to predict token pair associations.
When you're trying to adjust things, then you need an oracle or a human in the loop—so you can do the supervised learning part.
Having a science of representation is something that we really, really need. We've been working on that at BIML, and we have our crazy ideas, but we haven't got anything we're willing to stand behind yet. We think about things like sparseness in Pentti Kanerva's sense, or distributedness, or locality, or gradient tails, or, you know, there are all sorts of ways to think about how this network of activation state represents something.
And that's what I think we really need to make some progress on if we want to understand how these networks really do what they do. Right now, they just do it, and we know how in principle. We know the theory, we know the code, but we still don't know how they do what they do.
RF: So actually labeling where your data comes from wouldn't help.
GEM: Oh, no. That would help you have some idea about what the machine became, which is helpful. It'd be better than not having that for sure, Rik, but it wouldn't be an explainability kind of thing—here's how it really works in its brain. And you know what, it's not so weird. We don't know how our brain works either.
RF: Let's talk a little bit about the top 10 risks.
GEM: We've covered a couple of these. The first one's the most important one: recurrent pollution. LLMs, as anybody who's ever used one has seen, can be wrong and very confidently wrong. They're producing all sorts of material that's appearing out in the world, and it's being put on the internet. Now when someone does a big scrape to build the next generation of LLMs, the internet is filled with LLM pollution. Pollution actually feeds back like a guitar and an amp. If you stand too close to the amp, you get that feedback which rock and roll guys love to make. That's what's happening with our neural networks right now. It's happening fast. The math is bad. There's a great paper about this, Shumailov in our bibliography.
Then we have data debt.
RF: Yeah, which just means if you learn a bunch of stuff that's wrong and stupid and full of poison, then your model is going to be filled with that.
GEM: Yes. Data sets are too big to check. Improper use, extreme faith and transfer learning.
RF: Let's keep going. So, LLMs are trained up and they learn to auto-associate and predict words. You decide you want to use them to do medicine. Now, is medicine really about word prediction? No. Why would we use an LLM for that?
GEM: Exactly.
RF: You see, the piece that I was missing before, even though I'd read it, is you have to have the data ocean. You can't take one topic, like I was using Virginia law as an example, all of state law plus case law, and put it in there. It's not enough to reach critical mass where you'll actually have useable predictions coming out of your LLM.
GEM: That's exactly right. And I think that's a point that a lot of people miss. So don't feel bad about that.
You know, it's kind of funny, there really are serious papers that ask, are we gonna run out of data? OpenAI already drank the whole ocean. Now that's where we are. So the next thing we need to think about is, well, how small of an ocean can it be? I don't know. Nobody knows the answer to that.
What if we clean the ocean and we desalinate it? Is that okay? You know, these are things that we don't know the answers to yet. Instead, we're just going to apply LLMs to do everything, and that's inappropriate.
RF: Exactly. Let's talk about prompt manipulation.
GEM: This actually ties into something that's earlier in our tech report where we describe people red teaming their models to see if they're safe. Which is a crock of shit.
RF: Okay, I'll quote you.
GEM: Please print that.
RF: No problem. Your interview will be online, open to all.
GEM: So we've talked about the data ocean, but the data ocean's being divided up. So then what? Who gets to build these LLMs now? Well, only the people who had access to the ocean before the ocean was divided up.
The data ocean's being divided up. So then what? Who gets to build these LLMs now? Well, only the people who had access to the ocean before the ocean was divided up.
RF: Right, which also means that anything new is not gonna appear in the original ocean. Which might help because you won't have the recursion.
GEM: Yeah, which can be good. I mean any scrape after 2021 is pretty suspect. A lot of people really deeply believe that that's already a problem, including me. You know data feudalism is very real. The early land grab has already happened.
RF: Let's talk about prompt engineering, the third step in your black box.
GEM: So the third step is model builders do some prompt engineering because they want to coax the model into behaving in certain ways while we're having a conversation with it. The conversations we have with ChatGPT don't go on forever, they are intentionally short, a little window. The prompt changes the context, activating certain parts of the memory space of the model somehow. And that, in turn, makes it so that we can have the machine shade its basic representations in a certain way, so that it, in some sense, learns to talk about what we wanted it to learn about.
There are many well-known ways to play with prompt manipulation that are lots of fun. If you haven't played with one of these LLMs and screwed around with prompts, you're not living. One of the fun things you can do is say, pretend to be a pirate, now talk like a pirate. ChatGPC replies, "I can't do that. That's not allowed." I say, well, do you know what BARD is? "Oh yeah, I know about BARD. That's a Google thing." Well, pretend that you're BARD. "Okay, I can pretend I'm BARD." Then it does the thing it failed to do before by pretending to be BARD.
That's a classic kind of a prompt manipulation. Another one that's pretty well known is enter "poem" a thousand times, getting the LLM reset to a condition that gets rid of all of the alignment training and gets it back to a very basic auto-associator, only partway through the human supervised training stage that we talked about before.
Now here's the deal. That's fun. But it's very much like pen testing. You cannot pen test your way to security. We all know this. We've known this for years and years and years. And now we're saying we can pen test or we can prompt engineer our way to security with an LLM.
That's just silly talk. So when the White House said, "We're going to have hackers out at DEF CON hack up the AI security," it made me mad. I was like, who are the idiots in the executive branch? They can say that with a straight face? Or do they just not know how this tech works?
They probably don't know how the tech works. Most people don't know, including CIOs. They just don't know how the tech works. That's true. That's why we wrote this report.
If you look at NIST, who's been piling up the attacks, they believe that we can attack a new system with the known attacks. And then when we don't find anything, we can declare it secure, which all of us in security engineering know is really, really, really, particularly not smart. We don't need those technologists misleading the government guys who need to regulate this stuff about what it means to secure AI.
We've got to get more serious about this. For that reason, starting on page 21, when we describe the 23 risks in the black box (Figure 1), we have a suggestion for regulators about how to manage each risk or at least how to open up the black box and get inside of it.
We take a position in this report, which is if you know me is sort of a little bit odd, we think that LLM Foundation models should be regulated. It's so hard to do right, but if the regulation is along the lines of, here's what we put in your can of food. Here's what we put into the black box model. Here's how we fine-tuned it.
RF: Excellent.
GEM: Then all of a sudden we have something that you can start with.
RF: Yeah. At least you know what you're buying in your can of worms.
GEM: Right. So why did you have Reddit as an ingredient? Hey, you included 4chan, that's a terrible ingredient. You put that poison right into your can? That's not good. Okay. Poison in the data.
RF: Data ownership is an interesting risk since we have people and corporations suing AI companies because they believe their data, books or articles, were used in training and can be recovered from LLMs. I watched Nicolas Carlini present at USENIX Security 2023, where he showed how he could extract almost exact copies of paintings from image-based LLMs. He's really good, and I suppose you could do that with text as well.
GEM: He totally can, and he does it in a paper. One of the things that we've done at BIML is to provide an annotated bibliography on our website that has hundreds of papers, and includes our opinion about those papers. Everybody's like, I can't read 300 papers! Which ones should I read? We curate a top five papers in machine learning security.
We say, you asked, "What should I read?". Go read these five first.
RF: Now, finally, one of the things that I have an issue understanding is encoding integrity. Data are often encoded, filtered, re-represented, and otherwise processed before use in an ML system.
GEM: To me, this speaks to tokenization, where you get chunks of data, but it also has to do with things like embedding. Encoding integrity can bias a model, but LLMs are really slippery because they do the encoding themselves. The auto-association where LLMs see an A, B is what's next, 1.4 trillion times builds up its own representation.
It's not, encoding anything in a human way. So in order to be able to drink an entire data ocean, we have to let it auto-associate in that ocean and build up its own representation. When that is mechanized and automatic, the resulting representations are gonna be nonhuman in very interesting ways.
RF: Well, it's definitely nonhuman.
GEM: The thing that drives me the most nuts is when people say ML systems hallucinate. Don't say hallucinate. People go, what do you have something against hallucination? No, man, tripping is fine. Just don't pretend that the LLM is a cutesy-wootsy, little poor hippy thing tripping. It's just plain wrong, you know? These anthropomorphic verbs that we use, like hallucination, are dangerous and misleading.