When one of the main distributors of fuel to the Eastern US was shut down by ransomware, I wondered once again: what about backups and disaster recovery (DR)? I asked some of my security friends, but their answers amounted to DR being an IT task that isn't prioritized.
I also wondered how all of your systems can suddenly begin reading, encrypting, and rewriting every file without anyone noticing. Or how backups, another often low-priority duty, turn out not to work at all.
I finally got my answers to these questions, and a lot more, during Ski Kacoroski's keynote at LISA'21: https://www.usenix.org/system/files/lisa21_slides_kacoroski.pdf. Ski explains in detail the recovery from the attack, how their insurance company was involved, how long recovery took (spoiler: months), and the few positive things that came out of the experience. You'll need to read Ski's slides for most of those answers; I had my own questions, and Ski answers a few of them here.
Rik Farrow: In your talk, you present a timeline representing different phases of the attack, starting six months before the actual ransomware attack. How did folks figure out this timeline?
Ski Kacoroski: The timeline was generated by the incident response company from the forensic data. I am not 100% sure how they figured it out, but I assume it was a combination of when Emotet and Trickbot first started appearing on machines, checking through the emails, and their knowledge of how these attacks happen.
RF: During discussions about the Colonial Pipeline ransomware, I asked some security friends why there were no backups or a working disaster recovery plan that would allow Colonial to recover without paying a ransom. Your experience helped me to understand why DR often doesn't help. Could you explain?
SK: Oh yeah, this is a topic near and dear to my heart. When you plan for DR, you make assumptions about the kind of disaster you will face, such as a cyber attack, loss of power, a hurricane, or an earthquake. You then create your backup and DR plans based on those assumptions. In my case, I never even considered that my backup data stores could be destroyed so easily, and I assumed that breaking my airgap for a few days a year was acceptable. Both assumptions were incorrect. I hope my talk will get people thinking about the assumptions they are making in their DR plans.
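(One way to avoid depending on an airgap that has to be broken periodically is a pull model, where the backup server initiates every transfer and clients never hold credentials for the backup store. The sketch below only illustrates that idea; it is not how Ski's district ran its backups, and the hostnames, paths, and schedule are hypothetical.)

```python
#!/usr/bin/env python3
"""Pull-model backup sketch: the backup server initiates each transfer,
so a compromised client never has credentials for the backup store."""

import subprocess
from datetime import date
from pathlib import Path

# Hypothetical client list and destination -- adjust for your environment.
CLIENTS = ["fileserver1.example.org", "dbserver1.example.org"]
BACKUP_ROOT = Path("/backups")

def pull_backup(client: str) -> None:
    """Copy /data from one client into a dated directory on the backup host."""
    dest = BACKUP_ROOT / client / date.today().isoformat()
    dest.mkdir(parents=True, exist_ok=True)
    # rsync over ssh, run from the backup server (pull, not push).
    # Pointing --link-dest at yesterday's copy would deduplicate unchanged
    # files, at the cost of a little extra bookkeeping.
    subprocess.run(
        ["rsync", "-a", f"backup@{client}:/data/", f"{dest}/"],
        check=True,
    )

if __name__ == "__main__":
    for client in CLIENTS:
        pull_backup(client)
```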
RF: You mention that word started getting around during the first full day after the encryption occurred, and both volunteers and contractors started flooding onto the scene. You also say that the two system administrators became the bottleneck in getting work done, and I am guessing you were one of them. How did your management help you?
SK: Yep, I was one of the bottlenecks, as my roles included storage, VMware, and Unix administration :). My management helped in several ways:
- acting as a gatekeeper and filtering out all but the most critical questions.
- taking on tasks that we needed done, such as verifying that a service was running or that files were restored correctly, or training users on how to access temporary spaces, and then making sure each task was completed.
- making sure that whatever we needed or wanted, such as food, supplies, and equipment, was provided for us.
- setting up spaces for the additional people and giving them tasks to work on.
RF: You said that your NAS appliances used for backups had snapshots intact, but the regular backups were useless. Could you explain this? I am guessing that you ran backups every night, so that encrypted data got copied to the NAS. Were snapshots on the NAS appliances automatic, a default, or something that you had set up?
SK: The database backups were on NAS file systems that were snapshotted. The VM backups were on NAS file systems that were not snapshotted, because we did not have enough capacity to snapshot the VM backup storage.
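(Ski's slides don't say which NAS platform was in use, but on a ZFS-backed appliance the protection he describes can come from a scheduled snapshot rotation: a nightly backup job may overwrite the live files with encrypted copies, while the read-only snapshots still hold the pre-attack versions. The sketch below assumes a hypothetical ZFS dataset named backup/db and a 30-day retention window.)

```python
#!/usr/bin/env python3
"""Snapshot rotation sketch for a ZFS-backed backup volume.

Nightly backups written to this file system can be overwritten by a
compromised client, but the snapshots taken here are read-only and keep
the earlier, unencrypted copies available for restore.
"""

import subprocess
from datetime import date

DATASET = "backup/db"   # hypothetical dataset holding the database dumps
KEEP = 30               # days of snapshots to retain

def zfs(*args: str) -> str:
    """Run a zfs subcommand and return its stdout."""
    return subprocess.run(["zfs", *args], check=True,
                          capture_output=True, text=True).stdout

def take_snapshot() -> None:
    zfs("snapshot", f"{DATASET}@nightly-{date.today().isoformat()}")

def prune_snapshots() -> None:
    # List snapshots oldest-first, then destroy everything beyond KEEP.
    names = zfs("list", "-t", "snapshot", "-H", "-o", "name",
                "-s", "creation", "-r", DATASET).splitlines()
    nightly = [n for n in names if "@nightly-" in n]
    for snap in nightly[:-KEEP]:
        zfs("destroy", snap)

if __name__ == "__main__":
    take_snapshot()
    prune_snapshots()
```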
RF: Because most of the district's staff were using Macs, and lots of your services ran on Linux or were hosted, some people were not affected by the attack. In fact, they were wondering what the fuss was about. How did you deal with getting through to these folks, as well as other people who were confused by your progress reports, as services came back online?
SK: Fortunately, I did not have to deal with this. My management took care of it.
I asked our IT director, who said:
"Essentially having a few critical systems hurt that were not used by many people got the message across. Having our Payroll system crunched actually helped here - people got some clarity around the severity of things when we explained that we weren’t immediately sure that they’d get a paycheck at the end of the month … and then later we became sure that they’d get a paycheck, but it might not be correct … and only then did we gain confidence that it would all be correct very, very close to payday."