Exploring the future of storage technologies: Part 2 of our FAST '15 co-chair interview
This is part 2 of an interview with the co-chairs of FAST ’15 about the upcoming conference. In this segment they talk about current topics in storage research and the future of IT. The co-chairs are Jiri Schindler, Principal Engineer at SimpliVity, and Erez Zadok, Associate Professor at Stony Brook University. There is still time to register to attend the conference, February 16-19, 2015, in Santa Clara, CA. The first part of the interview is here.
Q: What are some of the evolving topics at FAST?
Jiri Schindler: One of the topics that is evolving is how we organize data for Big Data applications -- whatever that means.
Also, how do we manage and access data stored on these evolving byte-addressable nonvolatile memory technologies? What does it mean for the system to use them? It is about storing and accessing data, but it is also about the new technology itself. I think FAST is great at reaching out to different areas -- not all the way down to the physics of the devices, but staying aware of what that physics means -- all the way up to how different frameworks for Big Data intersect.
We are pretty close to having some sort of byte-addressable storage or nonvolatile memory products; they are right around the corner. There are a lot of SSD array vendors and a lot of new startups doing exciting work. SSDs and flash memory are by now familiar territory in FAST discussions -- but how do we talk about something that has these new properties, where access is almost the same as DRAM access? Stepping back, looking at the fundamental issues that drive that, and discussing it from the academic perspective is very important.
In the industry, there’s a bit of a chicken-and-egg problem. There are certain vendors who say, “Hey, we can make those, but there’s not really a good use case for that, or there’s no company that would make good use of it, so what do we do?” Academia can bring a fresh view and say, “Hey, if we rearchitect the IO stack and look at it in a different way, maybe we can enable some new applications and a different way of doing it, rather than traditional block-based IO over SCSI or a SAN or what have you.”
Q: People outside the field might say storage is a solved problem. Why is it still interesting?
Erez Zadok: Think about what the storage does. We’ve had a lot of advancements in virtualization, replication, and redundancy. If you think about it, you can replace computer hardware and networking; you can put in all sorts of redundancy. You can even replace people if they’ve left. But the one thing you cannot replace is your intellectual property -- your actual data -- if it is lost, corrupted, or stolen. And who is at the forefront, who is the guardian of that data? It's the storage, right?
And so really the storage is the first line of defense that controls performance and security of the data. That’s why it is so important. I also tell people that we live in a renaissance of storage, of sorts. I happened to teach a special-topics course in storage this semester to graduate students. And I tell them this device, magnetic spinning media, has existed for more than 50 years. A tremendous amount of software, algorithms, and data structures have been created revolving just around that idea. For example, locality -- placing related data close together -- was basically about minimizing latency from things like head seeks. But now we live in a world with a whole lot of new storage technologies that, even though they look the same from a block interface, are radically different internally. We're going to have to rewrite much of the software, especially operating system software, to accommodate how flash actually works internally and how phase-change and other nonvolatile memories operate.
There will be all sorts of tiers and combinations of storage, from local storage close to the CPU all the way to cloud-based storage. We're going to see more and more of those combinations, hybrids, and tiers, and a lot of software is going to have to be radically rewritten and adapted to this new world.
Q: What areas of research excite you these days?
EZ: The first thing that excites me is the realization that I don't think we'll see any one storage technology dominate very quickly. I think we're still going to see tape-based systems out there, and we're going to see regular disks. The hard disk industry is working very hard on shingled drives, which give you relatively slower access but a lot more capacity. The flash vendors are pushing forward as well. So I think we're going to see a lot of those mixes of things, and what excites me is looking at how to create customized storage systems for specific applications or specific workloads. A database, a web server, and a mail server, for example, require very different access patterns from the storage. You really have to customize the entire software stack for these kinds of important and popular applications.
The other thing that interests me is how we are going to secure the massive amount of data that we are producing at amazing rates. We are producing more data electronically than ever before. How are we going to secure it? How are we going to ensure the data's privacy is protected -- that only the right people can access it? How are we going to ensure its integrity and longevity? There is currently no technology we know of that will store a bit correctly and consistently for a hundred years or more, so this is going to be a serious problem. How do we preserve all this information going forward?
Q: Does hyperconvergence play into the future of storage?
JS: What's exciting technology-wise is that people aren’t thinking any more in terms of the traditional tiers or silos: this is a network management problem, this is a CPU/memory/operating systems challenge, this is a storage challenge. They just think: I have an application that needs to run, and a collection of entities that perform some service in that application -- and that application or service includes all of the above. It includes handling CPU and memory resources and network resources, because all of them are distributed; no one writes for a single server anymore unless there's some legacy application or reason for that. And storage is really no longer an hourglass interface with a uniform SCSI block layer or filesystem layer. By being able to reason about what's above the system and combine those silos into one, lots of opportunities arise. In my view, that's the definition of hyperconvergence. Obviously there are different stages along the way, but this is not only because we are thinking fundamentally differently about the role of these traditional OS or system categories; it's rather that there has been such a huge evolution, if not revolution, in these things that we can now bring them together and do much more in totality than just the sum of each part. What that means is still to be determined, but it's an exciting time when new technologies, new interfaces, and new ways of reasoning about things will open that up for us.
Q: How does the job of the IT admin change in the future?
EZ: From a storage angle, I think some things for IT people will become easier. For example, you can now easily provision and virtualize systems, take snapshots, and try a lot of things much more easily and quickly than in the old days, when every little test required reinstalling a physical computer before you could tell if the configuration was correct. But other things are going to be harder, because the amount of data that has to be processed keeps growing and growing.
People ask me what exactly Big Data is. From my perspective, Big Data is when some study or application is processing so much data that it can never fit in all the memory you have. So fundamentally it's going to be an IO-bound workload; otherwise it's not big enough to be Big Data. And when it comes to that, there's a huge difference in how you customize or configure your system. So I think more and more IT folks are going to have to know some of the internals of storage systems and decide: What am I going to put on this device? What am I going to put on that one? How much cache should we keep? What rate of dedupe do I want to allow, and so on? They are going to have to understand some of these internals in order to optimize their systems for their users.
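Zadok's working definition -- data that can never fit in memory -- implies that Big Data code must stream over storage rather than load everything at once. As a minimal illustration of that out-of-core pattern (the chunk size and the newline-counting workload here are hypothetical, not from the interview), a program can aggregate over an arbitrarily large file while holding only one fixed-size chunk in memory at a time:

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time; memory use stays constant


def count_records_streaming(path, chunk_size=CHUNK_SIZE):
    """Count newline-terminated records in a file of any size,
    keeping only one chunk in memory at a time (IO-bound by design)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            total += chunk.count(b"\n")  # per-chunk work, O(chunk) memory
    return total


# Demo on a small synthetic file; a real Big Data workload would be many GB.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"record\n" * 1000)
    path = tmp.name
try:
    print(count_records_streaming(path))  # 1000
finally:
    os.remove(path)
```

Because each pass over the data is dominated by reads rather than computation, choices like chunk size, cache allocation, and device placement -- the knobs Zadok says IT folks will need to understand -- directly determine the workload's performance.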
JS: The human aspect of managing or keeping track of what's going on in a data center or a company will not go away. We are far from the HAL of 2001: A Space Odyssey. But I think the role of the IT admin is evolving, and the developments we see will enable the human to focus on the important aspects and bring to the fore the unique characteristics that humans have: they have history, they learn from past experiences, and they apply that to the future. I think that's a very hard problem in the academic sense, in an abstract or general sense. So I think all these advancements will lead to IT admin specialists not being just networking specialists or storage specialists or application specialists. The manual piece -- how do I make sense of a disk array of 2,000 disks, or how do I slice and dice disks and create LUNs -- will, I hope, go away pretty soon, so that humans and IT admins can focus on what they do best by being human: having intuition, learning from experience, correcting things, and mediating competing requirements from different users, whether those are departments or applications that require lots of CPU horsepower and not so much IO, or vice versa. I think the role of the IT person is evolving, and I hope it will get better and not require detailed knowledge of some specific vendor's implementation, such as how they define disk groups. What's important for them at FAST is to know that there is a whole new category of hardware and resources. These fast nonvolatile memories -- I think it's not at all clear what they’re going to look like to the admins, or to the users of the applications.
I think it's still a problem being defined in the academic space, at venues like FAST. We don't yet have companies that deliver those products, and we don't have the processes and tools for admins to really make sense of them and make them an integral part of the IT shop for their employer.